
European Journal of Operational Research 107 (1998) 507-529

Theory and Methodology

Multi-attribute decision making: A simulation comparison of select methods

Stelios H. Zanakis a,*, Anthony Solomon b, Nicole Wishart a, Sandipa Dublish c

a Decision Sciences and Information Systems Department, College of Business Administration, Florida International University, Miami, FL 33199, USA
b Decision & Information Science Department, Oakland University, Rochester, MI 48309, USA
c Marketing Department, Fairleigh Dickinson University, Teaneck, NJ 07666, USA

* Corresponding author. Fax: +1-305-348-4126; e-mail: zanakis@servms.fiu.edu.

Received 7 August 1996; accepted 18 February 1997

Abstract

Several methods have been proposed for solving multi-attribute decision making problems (MADM). A major criticism
of MADM is that different techniques may yield different results when applied to the same problem. The problem
considered in this study consists of a decision matrix input of N criteria weights and ratings of L alternatives on each
criterion. The comparative performance of some methods has been investigated in a few, mostly field, studies. In this
simulation experiment we investigate the performance of eight methods: ELECTRE, TOPSIS, Multiplicative Exponential
Weighting (MEW), Simple Additive Weighting (SAW), and four versions of AHP (original vs. geometric scale and right
eigenvector vs. mean transformation solution). Simulation parameters are the number of alternatives, criteria and their
distribution. The solutions are analyzed using twelve measures of similarity of performance. Similarities and differences in
the behavior of these methods are investigated. Dissimilarities in weights produced by these methods become stronger in
problems with few alternatives; however, the corresponding final rankings of the alternatives vary across methods more in
problems with many alternatives. Although less significant, the distribution of criterion weights affects the methods
differently. In general, all AHP versions behave similarly and closer to SAW than the other methods. ELECTRE is the least
similar to SAW (except for closer matching the top-ranked alternative), followed by MEW. TOPSIS behaves closer to AHP
and differently from ELECTRE and MEW, except for problems with few criteria. A similar rank-reversal experiment
produced the following performance order of methods: SAW and MEW (best), followed by TOPSIS, AHPs and ELECTRE.
It should be noted that the ELECTRE version used was adapted to the common MADM problem and therefore it did not
take advantage of the method's capabilities in handling problems with ordinal or imprecise information. © 1998 Elsevier Science B.V. All rights reserved.
Keywords: Multiple criteria analysis; Decision theory; Utility theory; Simulation

1. Introduction

Multiple criteria decision making (MCDM) refers to making decisions in the presence of multiple, usually conflicting, criteria. MCDM problems are commonly categorized as continuous or discrete, depending on the domain of alternatives. Hwang and Yoon (1981) classify them as (i) Multiple Attribute Decision Making (MADM), with a discrete, usually limited, number of prespecified alternatives, requiring inter- and intra-attribute comparisons and involving implicit or explicit tradeoffs; and (ii) Multiple Objective Decision Making (MODM), with decision variable values to be determined in a continuous or integer domain, of an infinite or large number of choices, to best satisfy the DM's constraints, preferences or priorities. MADM methods have also been used for combining good MODM solutions based on DM preferences (Kok, 1986; Kok and Lootsma, 1985).

In this paper we focus on MADM as used in a finite 'selection' or 'choice' problem. In the literature, the term MCDM is often used to indicate MADM, and sometimes MODM, methods. To avoid any ambiguity we henceforth use the term MADM when referring to a discrete MCDM problem. Methods involving only the ranking of discrete alternatives with equal criteria weights, like voting choices, will not be examined in this paper.

Churchman et al. (1957) were among the earlier academicians to look at the MADM problem formally, using a simple additive weighting method. Over the years different behavioral scientists, operational researchers and decision theorists have proposed a variety of methods describing how a DM might arrive at a preference judgment when choosing among multiple attribute alternatives. For a survey of MCDM methods and applications see Stewart (1992) and Zanakis et al. (1995).

Gershon and Duckstein (1983) state that the major criticism of MADM methods is that different techniques yield different results when applied to the same problem, apparently under the same assumptions and by a single DM. Comparing 23 cardinal and 9 qualitative aggregation methods, Voogd (1983) found that, at least 40% of the time, each technique produced a different result from any other technique. The inconsistency in such results occurs because:

(a) the techniques use weights differently in their calculations;
(b) algorithms differ in their approach to selecting the 'best' solution;
(c) many algorithms attempt to scale the objectives, which affects the weights already chosen;
(d) some algorithms introduce additional parameters that affect which solution will be chosen.

This is compounded by the inherent differences in experimental conditions and human information processing between DMs, even under similar preferences. Other researchers have argued the opposite; namely that, given a type of problem, the solutions obtained by different MADM methods are essentially the same (Belton, 1986; Timmermans et al., 1989; Karni et al., 1990; Goicoechea et al., 1992; Olson et al., 1995). Schoemaker and Waid (1982) found that different additive utility models produce generally different weights, but predicted equally well on the average. Practitioners seem to prefer simple and transparent methods, which, however, are unlikely to represent weight trade-offs that users are willing to make (Hobbs et al., 1992).

The wide variety of available techniques, of varying complexity and possibly different solutions, confuses potential users. Several MADM methods may appear to be suitable for a particular decision problem. Hence the user faces the task of selecting the most appropriate method from among several feasible alternatives.

The need for comparing MCDM methods and the importance of the selection problem were probably first recognized by MacCrimmon (1973), who suggested a taxonomy of MCDM methods. More recently several authors have outlined procedures for the selection of an appropriate MCDM method, such as Ozernoy (1992), Hwang and Yoon (1981), Hobbs (1986) and Ozernoy (1987). These classifications are primarily driven by the input requirements of the method (the type of information that the DM must provide and the form in which it must be provided). Very often these classifications serve more as a tool for elimination rather than selection of the 'right' method. The use of expert systems has also been advocated for selecting MCDM methods (Jelassi and Ozernoy, 1988).

Our literature search revealed that only a limited amount of work has been done on comparing and integrating the different methods. Denpontin et al. (1983) developed a comprehensive catalogue of the different methods, but concluded that it was difficult to fit the methods into a classification schema since "decision studies varied so much in quantity, quality and precision of information." Many authors stress the validity of the method as the key criterion for choosing it. Validity implies that the method is likely to yield choices that accurately reflect the values of the user (Hobbs et al., 1992). However, there is no absolute, objective standard of validity, as preferences can be contradictory when articulated in different ways. Researchers often measure validity by checking how well a given method predicts the unaided decisions made independently of the judgments used to fit the model (Schoemaker and Waid, 1982; Currim and Sarin, 1984). Decision scientists question the applicability of this criterion, particularly in complex problems that cause users to adopt less rational heuristics and to be inconsistent. Studies in decision making have shown that the efficiency of a decision has an inverted U-shaped relationship with the amount of information provided (Kok, 1986; Gemunden and Hauschildt, 1985).

Researchers who have attempted the task of comparing the different MADM methods have used either real life cases or formulated a real-life-like problem and presented it to a selected group of users (Currim and Sarin, 1984; Gemunden and Hauschildt, 1985; Belton, 1986; Roy and Bouyssou, 1986; Hobbs, 1986; Buchanan and Daellenbach, 1987; Lockett and Stratford, 1987; Stillwell et al., 1987; Karni et al., 1990; Stewart, 1992; Goicoechea et al., 1992). Such field experiments are valuable tools for comparing MADM methods, based on user reactions. If properly designed, they assess the impact of human information processing and judgmental decision making, beyond the nature of the methods employed. Users may compare these methods along different dimensions, such as perceived simplicity, trustworthiness, robustness and quality. However, field studies have the following limitations and disadvantages:

(a) The sample size and range of problems studied is very limited.
(b) The subjects are often students, rather than real decision makers.
(c) The way the information is elicited may influence the results more than the model used (Olson et al., 1995).
(d) The learning effect biases outcomes, especially when a subject employs various methods sequentially (Kok, 1986).
(e) Inherent human differences led Hobbs et al. (1992) to conclude that "decisions can be as or more sensitive to the method used as to which person applies it." However, in a similar study, Goicoechea et al. (1992) concluded that "rankings are not affected significantly by the choice of decision maker or which of these methods is used." The fact that judgments were elicited from working professionals in one study and graduate students in the other may partially explain the discrepancy.
(f) It is impossible or difficult to answer questions like:
1. Which method is more appropriate for what type of problem?
2. What are the advantages/disadvantages of using one method over another?
3. Does a decision change when using different methods? If yes, why and to what extent?

The above limitations may be overcome via simulation. However, since simulations cannot capture human idiosyncrasies, their findings should supplement rather than substitute those of the field experiments. We have found only three simulation studies, all comparing solely AHP type methods.

Zahedi (1986) generated symmetric and asymmetric AHP matrices of size 6 and 22 from uniform, gamma and lognormal distributions, with a multiplicative error term. Criteria weights were derived using six methods: right eigenvalue, row and column geometric means, harmonic mean, simple row average, and row average of columns normalized first by their sum (called the mean transformation method). The accuracy of the corresponding weight and rank estimators was evaluated using MAE, MSE, variance and Theil's coefficient. She concluded that, when the input matrix is symmetric, the mean transformation method outperformed all other methods in accuracy, rank preservation and robustness toward the error distribution. Differences between methods were noticeable only under a gamma error distribution, where the eigenvalue method did poorly, while the row geometric mean exhibited better rank preservation with large-size matrices. All methods (except simple row average) performed equally well, and much better when errors had a uniform rather than a lognormal distribution.

Takeda et al. (1987) conducted an AHP simulation study, with multiplicative random errors, to evaluate different eigen-weight vectors. They advocate using their graded eigenvector method over Saaty's simpler right eigenvector approach.

Triantaphyllou and Mann (1989) simulated random AHP matrices of 3-21 criteria and alternatives. Each problem was solved using four methods: the weighted sum model (WSM), the weighted product model (WPM), right-eigenvector AHP, and AHP revised by normalizing each column by the maximum rather than the sum of its elements, following the Belton and Gear (1984) suggestion for reducing rank reversals. Solutions were compared against the WSM benchmark and against the rate of change in the best alternative when a nonoptimal alternative is replaced by a worse one. They concluded that the revised AHP appears to perform closest to the WSM; that AHP tends to behave like WSM as the number of alternatives increases; and that the rate of change does not depend on the number of criteria.

The first two studies are limited to a single AHP matrix; i.e. different methods for deriving weights only for the criteria, or only for the alternatives under a single criterion - not simultaneously for the entire MADM problem. And all three are limited to variants of the AHP. A further limitation of the third study is that it employs only two measures of performance: the percentage contradiction between a method's rankings and WSM's, and the rate of rank reversal of the top priority. There is clearly a need for a simulation study that also compares other MADM type methods, using various measures of performance. Our work in that regard is explained in the next section. The MADM problem under consideration is depicted by the following DM matrix of preferences for L alternatives rated on N criteria:

              Criterion
Alternative   c_1    c_2    ...   c_j    ...   c_N
1             r_11   r_12   ...   r_1j   ...   r_1N
2             r_21   r_22   ...   r_2j   ...   r_2N
...
i             r_i1   r_i2   ...   r_ij   ...   r_iN
...
L             r_L1   r_L2   ...   r_Lj   ...   r_LN

where c_j is the importance (weight) of the jth criterion and r_ij is the rating of the ith alternative on the jth criterion. As commonly done, we will assume that the latter are column normalized, so that they also add to one. Different MADM methods will be examined for eliciting these judgments and aggregating them into an overall score S_i for each alternative. Then the overall evaluation (weight) of each alternative will be W_i = S_i/ΣS_i, leading to a final ranking of all alternatives. The development of a cardinal measure of overall preference of alternatives (S_i) has been criticized by advocates of outranking methods as not reliably portraying true or incomplete preferences. Such methods establish measures of outranking relationships among pairs of alternatives, leading to a complete or partial ordering of alternatives.

2. Methods compared

Of the many MADM methods available we have chosen the following five for comparison in our research, applied to the same problem with the decision matrix information stated earlier:

1. Simple Additive Weighting (SAW): S_i = Σ_j c_j r_ij.
2. Multiplicative Exponent Weighting (MEW): S_i = Π_j r_ij^c_j.
3. Analytic Hierarchy Process (AHP) - four versions.
4. ELECTRE.
5. TOPSIS (Technique for Order Preference by Similarity to Ideal Solution).

The rationale for this selection is that most of these are among the more popular and widely used methods, and each reflects a different approach to solving MADM problems. SAW's simplicity makes it very popular with practitioners (Hobbs et al., 1992; Zanakis et al., 1995). MEW is a theoretically attractive contrast to SAW; however, it has not been applied often, because of its practitioner-unattractive mathematical concept, in spite of its scale-invariance property (it depends only on the ratios of the ratings of alternatives). TOPSIS (Hwang and Yoon, 1981) is an exception in that it is not widely used; we have included it because it is unique in the way it approaches the problem and is intuitively appealing and easy to understand. Its fundamental premise is that the best alternative, say the ith, should have the shortest Euclidean distance S_i* = [Σ_j (r_ij − r_j*)²]^(1/2) from the ideal solution (r_j*, made up of the best value for each attribute regardless of alternative) and the farthest distance S_i^- = [Σ_j (r_ij − r_j^-)²]^(1/2) from the negative-ideal solution (r_j^-, made up of the worst value for each attribute). The alternative with the highest relative closeness measure S_i^-/(S_i* + S_i^-) is chosen as best. In a sense, S_i* and S_i^- correspond to ELECTRE's concordance and discordance indexes.
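To make the three scoring rules concrete, here is a minimal sketch (our illustration, not the authors' original code), assuming a numpy array R of shape (L, N) holding the column-normalized ratings r_ij and a weight vector c summing to one; topsis_closeness follows the distance formulas exactly as stated in the text, i.e. computed on the normalized ratings directly.

```python
import numpy as np

def saw_scores(R, c):
    # SAW: S_i = sum_j c_j * r_ij
    return R @ c

def mew_scores(R, c):
    # MEW: S_i = prod_j r_ij ** c_j (depends only on ratios of ratings)
    return np.prod(R ** c, axis=1)

def topsis_closeness(R):
    # Ideal r_j* = best value per criterion; negative-ideal r_j^- = worst.
    ideal, anti = R.max(axis=0), R.min(axis=0)
    s_star = np.sqrt(((R - ideal) ** 2).sum(axis=1))   # S_i*: distance from ideal
    s_minus = np.sqrt(((R - anti) ** 2).sum(axis=1))   # S_i^-: distance from negative-ideal
    return s_minus / (s_star + s_minus)                # highest closeness is best

def final_weights(S):
    # Overall evaluation W_i = S_i / sum(S_i), which induces the final ranking.
    return S / S.sum()
```

Ranking the alternatives by final_weights(saw_scores(R, c)), and likewise for the other methods, yields the rank vectors compared in Section 3.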

The ELECTRE method is much more popular in Europe than in the US. Proponents argue that its outranking concept is more relevant to practical situations than the restrictive dominance concept. It elicits from the DM, for each pair of alternatives, a concordance and a discordance index. The first represents the sum of the weights of the attributes for which alternative A is better than B. The second denotes the absolute difference of this pair of attributes divided by the maximum difference over all pairs. By establishing threshold values for these two indices, one can generate a set of alternatives that is not outranked by other alternatives. In our simulation experiments, we set these threshold values as the average of each index over all pairs of alternatives, as suggested by Hwang and Yoon (1981). In order to obtain an overall ranking of the alternatives in our experiment, the procedure was reapplied to all alternatives in the same bracket (dominated or non-dominated).
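As an illustration of the two indexes (a sketch of one common formulation, not necessarily the exact ELECTRE variant implemented in the experiment), the concordance of i over k below sums the weights of the criteria on which i rates at least as well as k, and the discordance takes the largest margin by which k beats i, scaled by the largest rating difference in the matrix; the thresholds are the pairwise averages, per Hwang and Yoon (1981).

```python
import numpy as np

def electre_outranking(R, c):
    L = len(R)
    C = np.zeros((L, L))   # concordance of i over k
    D = np.zeros((L, L))   # discordance of i over k
    scale = np.ptp(R, axis=0).max()        # maximum rating difference anywhere
    for i in range(L):
        for k in range(L):
            if i == k:
                continue
            C[i, k] = c[R[i] >= R[k]].sum()
            D[i, k] = max(float((R[k] - R[i]).max()), 0.0) / scale
    mask = ~np.eye(L, dtype=bool)
    c_bar, d_bar = C[mask].mean(), D[mask].mean()   # average-index thresholds
    return (C >= c_bar) & (D <= d_bar) & mask       # True where i outranks k
```

Alternatives not outranked by any other form the non-dominated bracket; as described above, the procedure is then reapplied within each bracket to obtain a complete ranking.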
In the case of AHP we tested four versions, using two different methods for obtaining the relative weights (right eigenvalue vs. mean transformation, as in Zahedi, 1986) and two different types of scale:

AHP original scale: 1 2 3 4 5 6 7 8 9
AHP geometric scale: 1 e^0.5 e^1.0 e^1.5 e^2.0 e^2.5 e^3.0 e^3.5 e^4.0

Geometric scales have been advocated over Saaty's original AHP scale because of their transitivity and the larger value span found in many physical situations, resulting in more robust selections (Legrady et al., 1984).
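The two weight-extraction rules can be sketched as follows for a positive reciprocal comparison matrix A (our paraphrase of the procedures named above, not the paper's code): the eigenvector method returns the principal right eigenvector, while the mean transformation method (Zahedi, 1986) normalizes each column of A by its sum and averages across each row.

```python
import numpy as np

def eigenvector_weights(A):
    # Principal right eigenvector, rescaled to sum to one.
    vals, vecs = np.linalg.eig(A)
    v = np.abs(vecs[:, np.argmax(vals.real)].real)
    return v / v.sum()

def mean_transformation_weights(A):
    # Normalize each column by its sum, then average the rows.
    return (A / A.sum(axis=0)).mean(axis=1)
```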
Our choice of methods in this simulation study may seem strange at first. They require different input preference information or scales and aim at different outputs. SAW and MEW assume additive and multiplicative weighted preferences on an interval scale. AHP employs a ratio scale to elicit pairwise comparisons of alternatives on each criterion (even without explicitly rating each pair) and an additive aggregation to global weights. Normalization of the decision matrix is necessary to handle different types of attributes (e.g. benefits vs. costs) in all methods, except ELECTRE, which can also handle ordinal or descriptive (imprecise) information and criteria importances not adding up to one. TOPSIS uses the Euclidean norm to normalize the decision matrix, while the regular AHP normalizes weights by dividing them by their sum. ELECTRE's output differs from the other methods in that it does not provide a global preference of alternatives, but a partial (sometimes complete) ranking of alternatives. In that sense, ELECTRE results can be compared to the final ranking of alternatives produced by the other methods. This common-denominator approach overlooks some of ELECTRE's advantages in dealing with different or less precise situations via binary relationships. However, it is of interest in building computerized evaluations and DSS (Pomerol, 1993) for handling the common problem defined by the earlier decision matrix; namely, a decision matrix of explicitly rated alternatives and criteria weights. Many MCDA methods have been developed over the years, but little is known about their relative merits on similar problems. Surveys of MCDM research status point to the need for more validation studies, for the choice of aggregation procedures based on problem characteristics, and for simple, understandable and usable approaches for solving MCDM and MAUT problems (Dyer et al., 1992; Stewart, 1992).
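The two normalizations just mentioned differ only in the denominator; a small sketch for a raw ratings matrix X (assuming benefit-type attributes, where larger is better):

```python
import numpy as np

def normalize_by_sum(X):
    # Column sums become one, as assumed for the decision matrix in Section 1 (SAW, AHP).
    return X / X.sum(axis=0)

def normalize_euclidean(X):
    # Each column is divided by its Euclidean norm, as TOPSIS does.
    return X / np.sqrt((X ** 2).sum(axis=0))
```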

and U.S. Army Corps Engineers evaluate AHP, The following parameters were chosen for our
ELECTRE, SAW and other methods on water supply simulation:
planning studies. Their results were contradictory;
the first found perceived differences across methods 1. Number of criteria N: 5 10 15 20.
and users, while the latter study did not. Finally, 2. Number of alternatives L: 3 5 7 9.
Comes (1989) compared ELECTRE to his method 3. Ratings of alternatives rjj: randomly generated
TODIM (a combination of direct rating, AHP from a uniform distribution in O-l
weighting and dominance ordering rules) on a trans- 4. Weights of criteria c,: set all equal (l/N), ran-
portation problem and concluded that both methods domly generated from a uniform distribution in
produced essentially the same ranking of alterna- O-l (std. dev. l/12) or from a ‘beta’ U-shaped
tives. The above findings highlight our motivation distribution in O-l (std. dev. l/24).
and justification for undertaking this simulation 5. Number of replications: 100 for each combina-
study. Our major objective was to conduct an exten- tion, thus producing 4 criteria levels X 4 alterna-
sive numerical comparison of several MCDA meth- tive levels X 3 weight distributions X 100 replica-
ods, contrasted in several field studies, when applied tions = 4800 problems, resulting in a total of
to a common problem (a decision matrix of explic- 38,400 solutions, across eight approaches - four
itly rated alternatives and criteria weights) and deter- methods plus AHP with four versions.
mine when and how their solutions differ.
An explanation of these choices is in order. The
range for the number of criteria and alternatives is
3. Simulation experiment typical of those found in many applications. This is
representative of a typical MADM problem, where a
few alternatives are evaluated on the basis of a wide
According to Hobbs et al. (1992) a good experi-
set of criteria, as explained below. Many empirical
ment should satisfy the following conditions:
studies on the size of the evoked set in the consumer
and industrial market context have shown that the
(a) Compare methods that are widely used, repre- number of intensely discussed alternatives does not
sent divergent philosophies of decision making or exceed 4-5 (Gemunden and Hauschildt, 1985). In
claimed to represent important methodological im- practice a simple check-list of desirable features will
provements. rule out unacceptable alternatives early, thus leaving
(b) Address the question of appropriateness, ease for consideration only a small number. The number
of use and validity. of criteria, though, can be considerably higher. Three
(c) Well controlled, uses large samples and is distributions for weights were assumed: No distribu-
replicable. tion, i.e. all weights equal to l/N (class of problems
(d) Compares methods across a variety of prob- where criteria are replaced by judges or voters of
lems. equal impact); uniform distribution, which may re-
(e) Problems involved are realistic. flect an unbiased, indecisive or uninformed user; and
a U shape distribution, which may typify a biased
Our simulation experiment satisfies all conditions user, strongly favoring some issues while rigidly
except the second one. opposing others. Under group pressure, similar situa-
Computer simulation was used for the purpose of tion may not arise often in openly supporting pet
comparing the MADM methods. The reason for projects. For this reason and in order to keep this
using simulation was that it is a flexible and versatile simulation size manageable, we considered only one
method which allows us to generate a range of distribution (uniform) for ratings under each crite-
problems, and replicate them several times. This rion.
provides a vast database of results from which we Additional care was taken during the data genera-
can study the patterns of solutions provided by the tion phase. The ratio of any two criteria weights or
different methods. alternative ratings should not be extremely high or
Additional care was taken during the data generation phase. The ratio of any two criteria weights or alternative ratings should not be extremely high or extremely low; this avoids pathological cases or scale-induced imbalances between methods, whose performance then deteriorates (Zahedi, 1986). After some experimentation, this limit was set at 75 (and 1/75), one step beyond the maximum e^4 of the geometric AHP scale. Symmetric reciprocal matrices were obtained from these ratio entries for the AHP methods. No alternative was kept if it dominated all others on every criterion, or if it was dominated by another alternative on all criteria. For each criterion, all weights were normalized to add up to one. A similar normalization was applied to the final weights of the alternatives over all criteria in each problem. The AHP pairwise comparisons a_ij (≥ 1) were generated by selecting the closest original (Saaty) or geometric scale value to the ratio c_i/c_j for two criteria, and r_ik/r_jk for two alternative ratings under criterion k, and then filling the symmetric entries using the reciprocal condition a_ji = 1/a_ij.
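The screening and pairwise-comparison rules just described might look as follows (our reconstruction, not the paper's code; SAATY and GEOMETRIC are the two scales of Section 2, and 75 is the ratio cap quoted above):

```python
import numpy as np

SAATY = np.arange(1.0, 10.0)                    # 1, 2, ..., 9
GEOMETRIC = np.exp(0.5 * np.arange(0.0, 9.0))   # 1, e^0.5, ..., e^4

def ratio_ok(v, cap=75.0):
    # Reject weight or rating vectors whose extreme ratio exceeds 75 (or 1/75).
    v = np.asarray(v, dtype=float)
    return v.max() / v.min() <= cap

def keep_alternative(R, i):
    # An alternative is discarded if it dominates all others on every criterion,
    # or is dominated by some other alternative on all criteria.
    others = [k for k in range(len(R)) if k != i]
    dominates_all = all((R[i] >= R[k]).all() for k in others)
    dominated = any((R[k] >= R[i]).all() for k in others)
    return not dominates_all and not dominated

def pairwise_matrix(v, scale):
    # a_ij = scale value closest to the ratio of the larger entry to the smaller;
    # the symmetric entry is filled with the reciprocal a_ji = 1 / a_ij.
    n = len(v)
    A = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            hi, lo = (i, j) if v[i] >= v[j] else (j, i)
            a = scale[np.abs(scale - v[hi] / v[lo]).argmin()]
            A[hi, lo], A[lo, hi] = a, 1.0 / a
    return A
```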
The generated data were also altered subsequently to simulate rank reversal conditions, when a non-optimal new alternative is introduced. This is a primary criticism of AHP and has created a long and intense controversy among researchers (Belton and Gear, 1984; Saaty, 1984; Saaty, 1990; Dyer, 1990; Harker and Vargas, 1990; Stewart, 1992). This experimentation was applied to each method solution and initial problem, say of L alternatives, as follows: (i) a new alternative is introduced into the problem by randomly generating N ratings, one for each criterion, from the uniform distribution; (ii) the ranks of the L + 1 alternatives in the new problem are determined; (iii) if the new (L + 1)th alternative gets the first rank, it is rejected and another alternative is generated as in step (i); (iv) if the new alternative gets any other rank, the new rank order of the old alternatives is determined after removing the new alternative's rank. Thus an original array of ranks and a new array of ranks are produced for each problem and method. These two rank arrays are used in computing the rank reversal measures.

Two categories of performance measures were employed in our experiment: (1) measures that compare each method with the SAW method, in terms of final weights or ranks of alternatives; and (2) rank reversal measures for each method. In the absence of any other objective standard, the solution provided by SAW was used as the benchmark. The rationale for selecting SAW as the benchmark is that its simplicity makes it extremely popular in practice. For each method, the following measures of similarity were computed on its final evaluation (weights or ranks) against those of the SAW method, averaged over all alternatives in the problem:

1. Mean squared error of weights (MSEW) and the same for ranks (MSER).
2. Mean absolute error of weights (MAEW) and the same for ranks (MAER).
3. Theil's coefficient U for weights (UW) and the same for ranks (UR).
4. Kendall's correlation tau for weights (KWC).
5. Spearman's correlation for ranks (SRC).
6. Weighted rank crossing 1 (WRC1).
7. Weighted rank crossing 2 (WRC2).
8. Top rank matched count (TOP).
9. Number of ranks matched, as % of the number of alternatives L (MATCH%).

The reason for looking at measures of both final weights and ranks is that methods may produce different final weights for the alternatives, yet result in the same or different rank orders. Our last four measures capture this rank disagreement (crossings of rank order); two of them give more weight to differences in the higher ranks:

WRC = Σ_{i=1}^{L} W_i |R_{i,SAW} − R_{i,METH}| / Σ_{i=1}^{L} W_i,

where W_i = L + 1 − i, i = 1, 2, ..., L for WRC1, and W_i = 1/i, i = 1, 2, ..., L for WRC2.

Perfect agreement between a method and SAW would have zero MSEW, MAEW, MSER, MAER, UW, UR, WRC1 and WRC2, as well as perfect correlations KWC = 1 and SRC = 1, and perfect matches TOP = 1 and MATCH% = 1.
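In code, with 1-based rank vectors (numpy arrays) for a problem's L alternatives, the two measures differ only in the weight vector; the normalization by the sum of the weights follows the formula as reconstructed above.

```python
import numpy as np

def wrc(r_saw, r_meth, variant=1):
    # Weighted rank crossing between a method's ranks and SAW's ranks;
    # position i walks down the SAW ranking, so disagreements near the
    # top receive the largest weights.
    r_saw, r_meth = np.asarray(r_saw), np.asarray(r_meth)
    L = len(r_saw)
    order = np.argsort(r_saw)                       # alternatives in SAW rank order
    i = np.arange(1, L + 1)
    W = (L + 1 - i) if variant == 1 else 1.0 / i    # WRC1 vs. WRC2 weights
    diff = np.abs(r_saw[order] - r_meth[order])
    return float((W * diff).sum() / W.sum())        # 0 = perfect agreement
```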
Similar measures of rank reversal were computed on the rank order of the L alternatives before and after the introduction of the additional alternative, for each method and problem: WRC1, WRC2, MSER, MAER and SRC. Additionally, we counted, for each method and problem, the percentage of the time that the top ranked alternative remained the same after the introduction of a new nonoptimal alternative (TOP), and the total number of ranks not altered as a percentage of the number of alternatives (MATCH%) for that problem.
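Steps (i)-(iv) translate into a short loop; a sketch of our reading of the procedure, where rank_method stands for any one of the eight approaches and is assumed to return 1-based ranks (1 = best) as a numpy array:

```python
import numpy as np

rng = np.random.default_rng(1)

def rank_reversal_trial(R, c, rank_method):
    old_ranks = rank_method(R, c)              # ranks of the original L alternatives
    while True:
        new_alt = rng.uniform(0.0, 1.0, R.shape[1])       # (i) random ratings
        ranks = rank_method(np.vstack([R, new_alt]), c)   # (ii) rank L + 1 alternatives
        if ranks[-1] != 1:                                # (iii) reject a top-ranked entrant
            break
    old_subset = ranks[:-1]                               # (iv) drop the newcomer's rank and
    new_ranks = old_subset.argsort().argsort() + 1        # re-rank the old alternatives
    return old_ranks, new_ranks                # the two arrays feed the reversal measures
```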
Here we would like to clarify that the efficiency of a method is not merely a function of the theory supporting it or of how mathematically rigorous it is. Other aspects, which are also very important, relate to its ease of use, user understanding of and faith in the results, and method reliability (consistency) vs. variety. These are important and have been tackled by some authors (Buchanan and Daellenbach, 1987; Hobbs et al., 1992; Stewart, 1992). Such issues cannot be studied in a simulation experiment.
4. Analysis of experimental results

The simulation results were analyzed using the SAS package. Each measure of performance was analyzed via parametric ANOVA and nonparametric (Kruskal-Wallis) tests. The results are summarized in Tables 1 and 3. The nonparametric tests reveal that N, L and distribution type affect all performance measures at the 95% confidence level, except by distribution type for KWC, SRC, MSER and UR, and marginally for MAER, WRC1 and WRC2. According to the parametric ANOVA, the number of alternatives, the number of criteria and the method, as well as most of their interactions, significantly affect all measures of performance. However, the distribution type and a few of its interactions do not significantly influence four performance measures, namely KWC and UR (as was the case with the nonparametric tests), and SRC and MSER at the 95% level.

Table 1
Summary of ANOVA significance levels for factors and interactions

             KWC     MATCH%  WRC1    WRC2    SRC     MSER    MAER    MSEW    MAEW    UW      UR
L            0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
V            -       0.0019  0.0410  0.0373  -       -       0.0607  0.0001  0.0001  0.0001  -
METH         0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
N            0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
L*V          0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  -       0.0001
L*METH       0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
N*L          0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
V*METH       0.0410  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0010  0.0079  0.0001  0.0001
N*V          0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0787  0.0577  0.0138  0.0001
N*METH       0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  -
N*L*V        0.0071  0.0001  0.0058  0.0025  0.0015  0.0998  0.0155  0.0013  0.0004  0.0001  0.0094
N*L*METH     0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  -       -       0.0001  -
N*V*METH     0.0001  0.0498  -       -       0.0503  0.0204  -       0.0329  0.0001  0.0253  -
L*V*METH     -       0.0002  0.0030  -       0.0001  0.0002  -       -       -       -       -
N*L*V*METH   -       -       -       -       -       -       -       -       -       -       -

-: indicates a not significant result (P-value > 0.10).
L: number of alternatives.
N: number of criteria.
V: type of distribution = 1 equal weights; 2 uniform; 3 beta U.
METH: method = 1 Simple Additive Weighting (SAW); 2 AHP with original scale using eigenvector; 3 AHP with geometric scale using eigenvector; 4 AHP with original scale using mean transformation; 5 AHP with geometric scale using mean transformation; 6 Multiplicative Exponential Weighting; 7 TOPSIS; 8 ELECTRE.
Table 2
Summary of ANOVA significance levels for factors and interactions, rank reversal experiment

             MATCH%  WRC1    WRC2    SRC     MSER    MAER
L            0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
V            0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
METH         0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
N            0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
L*V          0.0001  0.0753  0.0001  0.0001  0.0001  0.0089
L*METH       0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
N*L          0.0001  0.0001  0.0030  0.0071  0.0001  0.0001
V*METH       0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
N*V          0.0226  0.0039  0.0185  0.0126  0.0006  0.0041
N*METH       0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
N*L*V        0.0055  0.0077  0.0796  0.0110  0.0161  -
N*L*METH     0.0001  0.0001  0.0001  0.0001  0.0001  -
N*V*METH     0.0051  0.0261  0.0001  0.0004  -       -
L*V*METH     0.0146  0.0001  0.0001  0.0001  0.0001  -
N*L*V*METH   0.0181  0.0433  0.0001  0.0175  -       -

-: indicates a not significant result (P-value > 0.10).
L: number of alternatives.
N: number of criteria.
V: type of distribution = 1 equal weights; 2 uniform; 3 beta U.
METH: method = 1 Simple Additive Weighting (SAW); 2 AHP with original scale using eigenvector; 3 AHP with geometric scale using eigenvector; 4 AHP with original scale using mean transformation; 5 AHP with geometric scale using mean transformation; 6 Multiplicative Exponential Weighting; 7 TOPSIS; 8 ELECTRE.

Table 5 portrays the average performance measures for each method, along with Tukey's studentized range test of mean differences. Performance measures on weights are not given for ELECTRE, since it only rank-orders the alternatives. The four AHP versions produce indistinguishable results on all measures, and they were always closer to SAW than the other three methods. The only exception is the TOP result for ELECTRE, indicating that it matched the top ranked alternative produced by SAW 90% of the time, vs. 82% for the AHPs. Any differences among the four AHP versions are affected more by the scale (original vs. geometric) than by the solution approach (eigenvector vs. mean transformation). The latter contradicts Zahedi's (1986) study, which examined single AHP matrices, possibly due to the aggregating effect of looking at criteria and alternatives together. The MAEW for each AHP version was only about 0.008, implying weights about ±0.8% away from those of SAW on the average. The most dissimilar method to SAW is ELECTRE, followed by MEW, and TOPSIS to a lesser extent. More specifically, the MEW method produces significantly different results from all AHP versions on all measures. MEW and ELECTRE behave similarly on SRC and MSER, but differ according to MAER, UR, WRC1 and WRC2. TOPSIS differs from ELECTRE and MEW on all measures, and agrees with AHP only on SRC and UR (only for the original scale). The rank-order results of all methods mostly agree with those of SAW, as indicated by their high correlations (all SRC > 0.80). In light of the prior comments, SRC gives a stronger impression of similarity than actually exists.

Table 3
Summary of Kruskal-Wallis nonparametric ANOVA significance levels

               SRC     MSER    MAER    UR      WRC1    WRC2    MAEW    MSEW    UW      KWC     MATCH%
Alternatives   0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Criteria       0.0003  0.0006  0.0004  0.0004  0.0010  0.0005  0.0001  0.0001  0.0001  0.0002  0.0001
Distribution   0.0473  0.0151  0.0177  0.0518  0.0260  0.0464  0.0001  0.0001  0.0001  0.1021  0.0234
Method         0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Table 4
Summary of Kruskal-Wallis nonparametric ANOVA significance levels, rank reversal experiment

               SRC     MSER    MAER    WRC1    WRC2    MATCH%
Alternatives   0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Criteria       0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Distribution   0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Method         0.0001  0.0001  0.0001  0.0001  0.0001  0.0001

For the large sample sizes involved, SRC should be below approximately 0.04 to imply no correlation, or above 0.96 to imply perfect rank agreement, neither of which is the case here. SRC results sometimes contradicted those of the other rank performance measures. In those cases we lean towards the latter, since SRC does not consider rank importance, unlike our measures WRC1 and WRC2 (the former giving larger values than the latter by design). Comparing SRC to WRC1 or WRC2, one may observe that although TOPSIS and the four AHPs have similar SRC, the higher WRC values imply that TOPSIS differs from the AHPs more in the higher ranked than the lower ranked alternatives. Similarly, ELECTRE differs from MEW also more in the higher ranked alternatives than the lower ones. An interesting finding is that although ELECTRE matches SAW's top rank more often (90%) than the other methods, its match of all SAW ranks (MATCH%) is far smaller than that of any other method. Many graphs were also drawn to further identify parameter value impacts, mean differences and important interactions. However, space limitations prevent showing all of them.

Fig. 1. KWC by number of alternatives.


Table 5
Average performance measures by method and Tukey's test on differences

                         SRC             WRC1            WRC2
Method                   Mean    Tukey   Mean    Tukey   Mean    Tukey
AHP, Original, eigen     0.8967  A       0.3621  D       0.3253  D
AHP, Geometric, eigen    0.8992  A       0.3507  D       0.3142  D
AHP, Original, MTM       0.8969  A       0.3626  D       0.3258  D
AHP, Geometric, MTM      0.8992  A       0.3500  D       0.3138  D
MEW                      0.8045  B       0.6278  B       0.5726  B
TOPSIS                   0.8921  A       0.4047  C       0.3723  C
ELECTRE                  0.8078  B       0.7267  A       0.6861  A

                         KWC             MSEW            MAEW
Method                   Mean    Tukey   Mean     Tukey  Mean    Tukey
AHP, Original, eigen     0.8257  A       0.00017  B      0.0085  C
AHP, Geometric, eigen    0.8280  A       0.00019  B      0.0087  C
AHP, Original, MTM       0.8257  A       0.00017  B      0.0084  C
AHP, Geometric, MTM      0.8271  A       0.00019  B      0.0087  C
MEW                      0.7329  C       0.00074  A      0.0194  A
TOPSIS                   0.7764  B       0.00077  A      0.0158  B
ELECTRE                  -               -               -

                         MSER            MAER            UW
Method                   Mean    Tukey   Mean    Tukey   Mean    Tukey
AHP, Original, eigen     0.4972  C       0.3590  D       0.023   C
AHP, Geometric, eigen    0.4784  C       0.3481  D       0.0236  C
AHP, Original, MTM       0.4974  C       0.3592  D       0.0232  C
AHP, Geometric, MTM      0.4779  C       0.3474  D       0.0235  C
MEW                      1.1820  A       0.6376  B       0.0565  A
TOPSIS                   0.6747  B       0.4093  C       0.0416  B
ELECTRE                  1.2132  A       0.7250  A       -

                         UR              TOP             MATCH%
Method                   Mean    Tukey   Mean    Tukey   Mean    Tukey
AHP, Original, eigen     0.0663  CD      0.8215  B       0.6910  A
AHP, Geometric, eigen    0.0647  D       0.8246  B       0.6966  A
AHP, Original, MTM       0.0663  CD      0.8206  B       0.6908  A
AHP, Geometric, MTM      0.0646  D       0.8254  B       0.6950  A
MEW                      0.1055  B       0.7548  C       0.5671  C
TOPSIS                   0.0690  C       0.7549  C       0.6343  B
ELECTRE                  0.1168  A       0.9035  A       0.3537  D

Note: The same letter (A, B, C, D) indicates no significant average difference between methods, based on Tukey's test. Letter order A to D is from largest to smallest average value.

Effect of number of alternatives (L): As the number of alternatives L increases, all methods tend to produce overall weights closer to SAW's (especially TOPSIS). This is reflected in higher correlations KWC (except for the insensitive method MEW) and SRC, higher Theil's UW (only for the AHPs), and lower MSEW and MAEW. However, when the number of alternatives is large, rank discrepancies are amplified (to a lesser extent for TOPSIS), as is evident from the higher rank performance measures MAER, MSER, WRC1, WRC2 and, to some extent, UR. In contrast to the clear rank results of MATCH%, WRC1 and WRC2, SRC produces mixed results as L increases; this further demonstrates its inability to account for different rank importance. ELECTRE matched the SAW top (all) ranked alternatives more (less) often than any other method, resulting in larger WRCs, regardless of the number of alternatives. The change in L affects each AHP version the same way. See Figs. 1-6.
Fig. 2. MAEW by number of alternatives.

Fig. 3. MAER by number of alternatives.

Effect of number of criteria (N): Most performance measures (MAER, MSER, SRC, KWC, UR, WRC1, WRC2) for most methods changed only slightly with N, though significantly according to the ANOVA. This is because MEW and the four AHPs are hardly sensitive to changes in N (no change in KWC or any rank performance measure). As the number of criteria N increases, the methods (especially ELECTRE but not TOPSIS) tend to produce rankings of the alternatives different from those of SAW, as documented by higher MAER, MSER, UR, WRC1, WRC2 and lower SRC; and, to some extent, different weights of alternatives, as implied by somewhat smaller KWC. However, differences in the final weights for the alternatives were larger in problems with fewer criteria, as shown by increased MAEW, MSEW, UW and lower KWC. TOPSIS behaved differently from the other methods, more so in its final rankings than in its final weights. TOPSIS rankings differ from those of SAW and the AHPs when N is large (= 20) and, to a lesser extent, when N is small (= 5), where it behaved more like ELECTRE and MEW. This is evident from its increased MAER, MSER, UR, WRC1, WRC2 and reduced TOP, MATCH% and SRC. Again, ELECTRE matched the SAW top (all) ranked alternatives more (less) often than any other method, resulting in larger WRCs, regardless of the number of criteria. The change in L affects each AHP version the same way. See Figs. 7-11.


Fig. 4. TOP by number of alternatives.

Fig. 5. MATCH% by number of alternatives.


Fig. 6. WRC1 by number of alternatives.

Fig. 7. MAEW by number of criteria.


Fig. 8. MAER by number of criteria.

Effect of distribution of criteria weights (V): It does not significantly affect several weight measures (UW, MAEW, MSEW - except for TOPSIS), while its effect is mixed according to the rank measures. As expected, equal criteria weights (V = 1) reduce the alternative weight differences between methods. Surprisingly, however, final weight dissimilarities between methods were higher under the uniform than under the beta distribution. In the case of AHP, the uniform distribution differentiates its final rankings and weights from SAW slightly more when using the original scale rather than the geometric scale. TOPSIS final rankings differ from those of SAW the most (least) under the beta (equal constant) distribution. The ELECTRE and MEW methods differentiate their final rankings the most (least) under the equal constant (uniform) distribution. See Figs. 12-15.

Fig. 9. TOP by number of criteria.


Fig. 10. MATCH% by number of criteria.

Fig. 11. WRC1 by number of criteria.


Fig. 12. MAEW by criterion weight distribution.

4.1. Rank reversal results

Similar analyses were performed on the rank reversal experimental results. Here each method's results were compared to its own (not SAW's), before and after the introduction of a new (not best) alternative. The major findings are summarized in Tables 2, 4 and 6. The parametric and non-parametric ANOVAs reveal that all factors (number of alternatives, number of criteria, distribution and method), and most of their interactions, are highly significant (Tables 2 and 4).

Fig. 13. MAER by criterion weight distribution.


Fig. 14. TOP by criterion weight distribution.

Fig. 15. WRC1 by criterion weight distribution.


Table 6
Average performance measures by method and Tukey's test on differences, rank reversal experiment

                         SRC             WRC1            WRC2
Method                   Mean    Tukey   Mean    Tukey   Mean    Tukey
SAW                      1.0     A       0       D       0       D
AHP, Original, eigen     0.9530  C       0.1532  B       0.1361  B
AHP, Geometric, eigen    0.9499  C       0.1595  B       0.1421  B
AHP, Original, MTM       0.9560  C       0.1520  B       0.1351  B
AHP, Geometric, MTM      0.9511  C       0.1610  B       0.1446  B
MEW                      1.0     A       0       D       0       D
TOPSIS                   0.9692  B       0.1116  C       0.097   C
ELECTRE                  0.9356  D       0.2138  A       0.1996  A

                         MSER     MAER     TOP      MATCH%
SAW                      0        0        1.0      1.0
AHP, Original, eigen     0.1752   0.1522   0.9258   0.8584
AHP, Geometric, eigen    0.1854   0.1581   0.9235   0.8544
AHP, Original, MTM       0.1740   0.1515   0.9258   0.8590
AHP, Geometric, MTM      0.1820   0.1568   0.9165   0.8551
MEW                      0        0        1.0      1.0
TOPSIS                   0.1379   0.1104   0.9531   0.9005
ELECTRE                  0.3479   0.2347   0.4402   0.7501

Note: The same letter (A, B, C, D) indicates no significant average difference between methods, based on Tukey's test. Letter order A to D is from largest to smallest average value.

Fig. 16. Rank reversal MAER by number of alternatives.


Fig. 17. Rank reversal MATCH% by number of alternatives.

As summarized in Table 6, the MEW and SAW methods did not produce any rank reversals, which was expected. The next best method was TOPSIS, followed by the four AHPs, according to all rank reversal performance measures (larger TOP, MATCH% and SRC, and smaller RMSER, RMAER, WRC1 and WRC2). The rank reversal performance of each AHP version was statistically not different from that of the other three AHPs. ELECTRE exhibited the worst rank reversal performance of all the methods in this experiment, more so in TOP than in all ranks (MATCH%). The last finding should be interpreted with caution, since it does not reflect ELECTRE's versatile capabilities when used directly by a human; it is only indicative of its restricted ability to discriminate among several alternatives based on prespecified threshold parameters.

Fig. 18. Rank reversal MAER by number of criteria.


Effect of number of alternatives (L) on rank reversal: In general, more rank reversals occur in problems with more alternatives. This is evident from the lower MATCH% and higher MAER, WRC1 and WRC2 among the AHPs. That increase was a little faster for the AHP with original scale and MTM solution. The MTM AHP has a slight advantage over the eigenvector AHP when there are not many alternatives. Reversals of the top rank occur more often in problems with more alternatives for the AHPs, but with fewer alternatives for ELECTRE. TOPSIS top rank reversals seem to be insensitive to L. See Figs. 16 and 17.

Effect of number of criteria (N) on rank reversal: The number of rank reversals was influenced less by the number of criteria than by the number of alternatives. For all AHP versions, rank reversals for top (all) ranks remained at about 9% (14%) of L, regardless of the number of criteria. However, the geometric scale in AHP seems to reduce rank reversals when the number of criteria is small, as documented by smaller MAER and higher MATCH%. According to the SRC criterion, rank reversals for TOPSIS and the AHPs with original scale are not sensitive to N. Interestingly enough, TOPSIS exhibits its worst rank reversals when N is small, while ELECTRE does the same when N is large. See Fig. 18.

Effect of distribution of criteria weights (V) on rank reversal: In general, more rank reversals were observed under constant weights, and fewer under uniformly distributed weights. This effect was negligible for TOPSIS, but most profound for ELECTRE. See Fig. 19.

Fig. 19. Rank reversal MAER by criterion weight distribution.


5. Conclusion and recommendations

This simulation experiment evaluated eight MADM methods (including four variants of AHP) under different numbers of alternatives (L) and criteria (N) and different weight distributions. The final results are affected by these three factors in that order. In general, as the number of alternatives increases, the methods tend to produce similar final weights, but dissimilar rankings, and more rank reversals (fewer top rank reversals for ELECTRE). The number of criteria had little effect on the AHPs, MEW and ELECTRE. TOPSIS rankings differ from those of SAW more when N is large, when it also exhibits its fewest rank reversals. ELECTRE produces more rank reversals in problems with many criteria.

The distribution of criteria weights affects fewer performance measures than does the number of alternatives or the number of criteria. However, it affects the methods examined differently. Equal criterion weights reduce the final weight differences between methods, differentiate further the rankings produced by ELECTRE and MEW, and produce more rank reversals than the other distributions. Surprisingly, however, final weight dissimilarities between methods were higher under the uniform than under the beta distribution, while the latter produced the fewest rank reversals. A uniform distribution of criteria weights differentiates the AHP final rankings from SAW more when using the original scale rather than the geometric scale. Finally, a beta distribution of criterion weights affects TOPSIS most, whose final rankings then differ even more from those of SAW.

In general, all AHP versions behave similarly and closer to SAW than the other methods. ELECTRE is the least similar to SAW (except for best matching the top-ranked alternative), followed by the MEW method. TOPSIS behaves closer to AHP and differently from ELECTRE and MEW, except for problems with few criteria. In terms of rank reversals, the four AHP versions were uniformly worse than TOPSIS, but more robust than ELECTRE.

The detailed findings of this simulation study can provide useful insights to researchers and practitioners of MADM. A user's interest in evaluating alternatives may lie in one or more of the final outputs, namely their weights, ranking or rank reversals. This experiment reveals when a user's results are likely to be practically the same, regardless of the subset of methods employed, and when and by how much the solutions may differ, thus guiding a user in selecting an appropriate method. SAW was selected as the basis against which to compare the other methods because its simplicity makes it used often by practitioners. Some researchers even argue that SAW should be the standard for comparisons, because "it gives the most acceptable results for the majority of single-dimensional problems" (Triantaphyllou and Mann, 1989).

Some caution, however, must be used when considering our findings. They should not be extrapolated beyond the type of MADM problem considered in this study; namely, a decision matrix input of N criteria weights and explicit ratings of L alternatives on each criterion. Therefore, method variations capable of handling different problems were not considered in this simulation. This 'standardization' hampers ELECTRE more than any of the other methods. It unavoidably did not consider the variety of features of the many versions of this method developed to handle different problem types. It did not take advantage of the method's capabilities in handling problems with ordinal or imprecise information. Even in the form used here, ELECTRE may produce different results for different thresholds of the concordance and discordance indexes (which of course leaves open the question of which index values the user should select). Finally, no MADM method can be considered a tool for discovering an 'objective truth'. Such models should function within a DSS context to aid the user in learning more about the problem and its solutions before reaching the ultimate decision. Such insight-gaining methods are better termed decision aids rather than decision making. MADM methods should not be treated as single-pass techniques, without a posteriori robustness analysis. A sensitivity (robustness) analysis is essential for any MADM method, but this is clearly beyond the scope of this simulation experiment.

References

Belton, V., 1986. A comparison of the analytic hierarchy process and a simple multi-attribute value function. European Journal of Operational Research 26, 7-21.
Belton, V., Gear, T., 1984. The legitimacy of rank reversal - A comment. Omega 13, 143-144.
Buchanan, J.T., Daellenbach, H.G., 1987. A comparative evaluation of interactive solution methods for multiple objective decision models. European Journal of Operational Research 29, 353-359.
Churchman, C.W., Ackoff, R.L., Arnoff, E.L., 1957. Introduction to Operations Research. Wiley, New York.
Currim, I.S., Sarin, R.K., 1984. A comparative evaluation of multiattribute consumer preference models. Management Science 30, 543-561.
Denpontin, M., Mascarola, H., Spronk, J., 1983. A user oriented listing of MCDM. Revue Belge de Recherche Operationelle 23, 3-11.
Dyer, J., 1990. Remarks on the analytic hierarchy process. Management Science 36, 249-258.
Dyer, J., Fishburn, P., Steuer, R., Wallenius, J., Zionts, S., 1992. Multiple criteria decision making, multiattribute utility theory: The next ten years. Management Science 38, 645-654.
Gemunden, H.G., Hauschildt, J., 1985. Number of alternatives and efficiency in different types of top-management decisions. European Journal of Operational Research 22, 178-190.
Gershon, M.E., Duckstein, L., 1983. Multiobjective approaches to river basin planning. Journal of Water Resource Planning 109, 13-28.
Goicoechea, A., Stakhiv, E.Z., Li, F., 1992. Experimental evaluation of multiple criteria decision making models for application to water resources planning. Water Resources Bulletin 28, 89-102.
Gomes, L.F.A.M., 1989. Comparing two methods for multicriteria ranking of urban transportation system alternatives. Journal of Advanced Transportation 23, 217-219.
Harker, P.T., Vargas, L.G., 1990. Reply to "Remarks on the analytic hierarchy process" by J.S. Dyer. Management Science 36, 269-273.
Hobbs, B.F., 1986. What can we learn from experiments in multiobjective decision analysis. IEEE Transactions on Systems, Man, and Cybernetics 16, 384-394.
Hobbs, B.J., Chankong, V., Hamadeh, W., Stakhiv, E., 1992. Does choice of multicriteria method matter? An experiment in water resource planning. Water Resources Research 28, 1767-1779.
Hwang, C.L., Yoon, K.L., 1981. Multiple Attribute Decision Making: Methods and Applications. Springer-Verlag, New York.
Jelassi, M.T.J., Ozernoy, V.M., 1988. A framework for building an expert system for MCDM models selection. In: Lockett, A.G., Islei, G. (Eds.), Improving Decision Making in Organizations. Springer-Verlag, New York, pp. 553-562.
Karni, R., Sanchez, P., Tummala, V., 1990. A comparative study of multiattribute decision making methodologies. Theory and Decision 29, 203-222.
Kok, M., 1986. The interface with decision makers and some experimental results in interactive multiple objective programming methods. European Journal of Operational Research 26, 96-107.
Kok, M., Lootsma, F.A., 1985. Pairwise-comparison methods in multiple objective programming, with applications in a long-term energy-planning model. European Journal of Operational Research 22, 44-55.
Lockett, G., Stratford, M., 1987. Ranking of research projects: Experiments with two methods. Omega 15, 395-400.
Legrady, K., Lootsma, F.A., Meisner, J., Schellemans, F., 1984. Multicriteria decision analysis to aid budget allocation. In: Grauer, M., Wierzbicki, A.P. (Eds.), Interactive Decision Analysis. Springer-Verlag, pp. 164-174.
Lootsma, F.A., 1990. The French and American school in multicriteria decision analysis. Recherche Operationelle 24, 263-285.
MacCrimmon, K.R., 1973. An overview of multiple objective decision making. In: Cochrane, J.L., Zeleny, M. (Eds.), Multiple Criteria Decision Making. University of South Carolina Press, Columbia.
Olson, D.L., Moshkovich, H.M., Schellenberger, R., Mechitov, A.I., 1995. Consistency and accuracy in decision aids: Experiments with four multiattribute systems. Decision Sciences 26, 723-748.
Ozernoy, V.M., 1987. A framework for choosing the most appropriate discrete alternative MCDM in decision support and expert systems. In: Sawaragi, Y., et al. (Eds.), Toward Interactive and Intelligent Decision Support Systems. Springer-Verlag, Heidelberg, pp. 56-64.
Ozernoy, V.M., 1992. Choosing the 'best' multiple criteria decision-making method. INFOR 30, 159-171.
Pomerol, J., 1993. Multicriteria DSS: State of the art and problems. Central European Journal for Operations Research and Economics 2, 197-212.
Roy, B., Bouyssou, D., 1986. Comparison of two decision-aid models applied to a nuclear power plant siting example. European Journal of Operational Research 25, 200-215.
Saaty, T.L., 1984. The legitimacy of rank reversal. OMEGA 12, 513-516.
Saaty, T.L., 1990. An exposition of the AHP in reply to the paper "Remarks on the analytic hierarchy process". Management Science 36, 259-268.
Schoemaker, P.J., Waid, C.C., 1982. An experimental comparison of different approaches to determining weights in additive utility models. Management Science 28, 182-196.
Stewart, T.J., 1992. A critical survey on the status of multiple criteria decision making theory and practice. OMEGA 20, 569-586.
Stillwell, W., Winterfeldt, D., John, R., 1987. Comparing hierarchical and nonhierarchical weighting methods for eliciting multiattribute value models. Management Science 33, 442-450.
Takeda, E., Cogger, K.O., Yu, P.L., 1987. Estimating criterion weights using eigenvectors: A comparative study. European Journal of Operational Research 29, 360-369.
Timmermans, D., Vlek, C., Hendrickx, L., 1989. An experimental study of the effectiveness of computer-programmed decision support. In: Lockett, A.G., Islei, G. (Eds.), Improving Decision Making in Organizations. Springer-Verlag, Heidelberg, pp. 13-23.
Triantaphyllou, E., Mann, S.H., 1989. An examination of the effectiveness of multi-dimensional decision-making methods: A decision-making paradox. Decision Support Systems 5, 303-312.
Voogd, H., 1983. Multicriteria Evaluation for Urban and Regional Planning. Pion, London.
Zahedi, F., 1986. A simulation study of estimation methods in the analytic hierarchy process. Socio-Economic Planning Sciences 20, 347-354.
Zanakis, S., Mandakovic, T., Gupta, S., Sahay, S., Hong, S., 1995. A review of program evaluation and fund allocation methods within the service and government sectors. Socio-Economic Planning Sciences 29, 59-79.
