

a critical perspective

Denis Bouyssou

Thierry Marchant
Ghent University

Marc Pirlot
SMRO, Faculté Polytechnique de Mons

Patrice Perny
LIP6, Université Paris VI

Alexis Tsoukias
LAMSADE - CNRS, Université Paris Dauphine

Philippe Vincke
SMG - ISRO, Université Libre de Bruxelles



1 Introduction
1.1 Motivations
1.2 Audience
1.3 Structure
1.4 Outline
1.5 Who are the authors?
1.6 Conventions
1.7 Acknowledgements

2 Choosing on the basis of several opinions
2.1 Analysis of some voting systems
2.1.1 Uninominal election
2.1.2 Election by rankings
2.1.3 Some theoretical results
2.2 Modelling the preferences of a voter
2.2.1 Rankings
2.2.2 Fuzzy relations
2.2.3 Other models
2.3 The voting process
2.3.1 Definition of the set of candidates
2.3.2 Definition of the set of the voters
2.3.3 Choice of the aggregation method
2.4 Social choice and multiple criteria decision support
2.4.1 Analogies
2.5 Conclusions

3 Building and aggregating evaluations
3.1 Introduction
3.1.1 Motivation
3.1.2 Evaluating students in Universities
3.2 Grading students in a given course
3.2.1 What is a grade?
3.2.2 The grading process
3.2.3 Interpreting grades
3.2.4 Why use grades?
3.3 Aggregating grades
3.3.1 Rules for aggregating grades
3.3.2 Aggregating grades using a weighted average
3.4 Conclusions

4 Constructing measures
4.1 The human development index
4.1.1 Scale Normalisation
4.1.2 Compensation
4.1.3 Dimension independence
4.1.4 Scale construction
4.1.5 Statistical aspects
4.2 Air quality index
4.2.1 Monotonicity
4.2.2 Non compensation
4.2.3 Meaningfulness
4.3 The decathlon score
4.3.1 Role of the decathlon score
4.4 Indicators and multiple criteria decision support
4.5 Conclusions

5 Assessing competing projects
5.1 Introduction
5.2 The principles of CBA
5.2.1 Choosing between investment projects in private firms
5.2.2 From Corporate Finance to CBA
5.2.3 Theoretical foundations
5.3 Some examples in transportation studies
5.3.1 Prevision of traffic
5.3.2 Time gains
5.3.3 Security gains
5.3.4 Other effects and remarks
5.4 Conclusions

6 Comparing on several attributes
6.1 Thierry's choice
6.1.1 Description of the case
6.1.2 Reasoning with preferences
6.2 The weighted sum
6.2.1 Transforming the evaluations
6.2.2 Using the weighted sum on the case
6.2.3 Is the resulting ranking reliable?
6.2.4 The difficulties of a proper usage of the weighted sum
6.2.5 Conclusion
6.3 The additive value model
6.3.1 Direct methods for determining single-attribute value functions
6.3.2 AHP and Saaty's eigenvalue method
6.3.3 An indirect method for assessing single-attribute value functions and trade-offs
6.3.4 Conclusion
6.4 Outranking methods
6.4.1 Condorcet-like procedures in decision analysis
6.4.2 A simple outranking method
6.4.3 Using ELECTRE I on the case
6.4.4 Main features and problems of elementary outranking approaches
6.4.5 Advanced outranking methods: from thresholding towards valued relations
6.5 General conclusion

7 Deciding automatically
7.1 Introduction
7.2 A System with Explicit Decision Rules
7.2.1 Designing a decision system for automatic watering
7.2.2 Linking symbolic and numerical representations
7.2.3 Interpreting input labels as scalars
7.2.4 Interpreting input labels as intervals
7.2.5 Interpreting input labels as fuzzy intervals
7.2.6 Interpreting output labels as (fuzzy) intervals
7.3 A System with Implicit Decision Rules
7.3.1 Controlling the quality of biscuits during baking
7.3.2 Automatising human decisions by learning from examples
7.4 An hybrid approach for automatic decision-making
7.5 Conclusion

8 Dealing with uncertainty
8.1 Introduction
8.2 The context
8.3 The model
8.3.1 The set of actions
8.3.2 The set of criteria
8.3.3 Uncertainties and scenarios
8.3.4 The temporal dimension
8.3.5 Summary of the model
8.4 A didactic example
8.4.1 The expected value approach
8.4.2 Some comments on the previous approach
8.4.3 The expected utility approach
8.4.4 Some comments on the expected utility approach
8.4.5 The approach applied in this case: first step
8.4.6 Comment on the first step
8.4.7 The approach applied in this case: second step
8.5 Conclusions

9 Supporting decisions
9.1 Preliminaries
9.2 The Decision Process
9.3 Decision Support
9.3.1 Problem Formulation
9.3.2 The Evaluation Model
9.3.3 The final recommendation
9.4 Conclusions
Appendix A
Appendix B

10 Conclusion
10.1 Formal methods are all around us
10.2 What have we learned?
10.3 What can be expected?

Bibliography

Index

1 Introduction

1.1 Motivations
Deciding is a very complex and difficult task. Some people even argue that our abil-
ity to make decisions in complex situations is the main feature that distinguishes
us from animals (it is also common to say that laughing is the main difference).
Nevertheless, when the task is too complex or the interests at stake are too impor-
tant, it quite often happens that we do not know or we are not sure what to decide
and, in many instances, we resort to a decision support technique: an informal
one (we toss a coin, we ask an oracle, we visit an astrologer, we consult an expert,
we think) or a formal one. Although informal decision support techniques can be
of interest, in this book, we will focus on formal ones. Among the latter, we find
some well-known decision support techniques: cost-benefit analysis, multiple crite-
ria decision analysis, decision trees, . . . But there are many other ones, sometimes
not presented as decision support techniques, that help in making decisions. Let us
cite but a few examples.

When the director of a school must decide whether a given student will pass
or fail, he usually asks each teacher to assess the merits of the student by
means of a grade. The director then sums the grades and compares the result
to a threshold.

When a bank must decide whether a given client will obtain a credit or not,
a technique, called credit scoring, is often used.

When the mayor of a city decides to temporarily forbid car traffic in a city
because of air pollution, he probably takes the value of some indicators, e.g.
the air quality index, into account.

Groups or committees must also make decisions. In order to do so, they
often use voting procedures.

All these formal techniques are what we call (formal) decision and evaluation
models, i.e. a set of explicit and well-defined rules to collect, assess and process
information in order to be able to make recommendations in decision and/or
evaluation processes. They are so widespread that almost no one can claim he is
not using or suffering the consequences of one of them. These models, probably
because of their formal character, inspire respect and trust: they look scientific.
But are they really well founded? Do they perform as well as we want? Can we
safely rely on them when we have to make important decisions?
That is why we try to look at formal decision and evaluation models with a
critical eye in this book. You guessed it: this book is more than 200 pages long.
So, there is probably a lot of criticism. You are right.
None of the evaluation and decision models that we examined are perfect or
the best. They all suffer from limitations. For each one, we can find situations in
which it will perform very poorly. This is not really new: most decision models have
had their critics for a long time. Do we want to contest all models at the same
time? Definitely not! Our conviction is that there cannot be a best decision or
evaluation model (this has been proved in some contexts, e.g. in voting, and seems
empirically correct in other contexts), but we are convinced as well that formal
evaluation and decision models are useful in many circumstances and here is why:

Formal models provide explicit and, to a large extent, unambiguous
representations of a given problem; they offer a common language for communicating
about the problem. They are therefore particularly well suited for facilitating
communication among the actors of a decision or evaluation process.

Formal models require that the decision maker makes a substantial effort to
structure his perception or representation of the problem. This effort can
only be beneficial as it forces the decision maker to think harder and deeper
about his problem.

Once a formal model has been established, a battery of formal techniques
(often implemented on a computer) becomes available for drawing any kind
of conclusion that can be drawn from the model. For example, hundreds of
"what-if" questions can be answered in a flash. This can be of great help if we
want to devise robust recommendations.

For all these reasons (complexity, usefulness, importance of the interests at
stake, popularity) plus the fact that formal models lend themselves easily to
criticism, we think that it is important to deepen our understanding of evaluation and
decision models and encourage their users to think more thoroughly about them.
Our aim with this book is to foster reflection and critical thinking among all
individuals utilising decision and evaluation models, whether it be for research or

1.2 Audience
Most of us are confronted with formal evaluation and decision models. Very often,
we use them without even thinking about it. This book is intended for the aware
or enlightened practitioner, for anyone who uses decision or evaluation models (for
research or for applications) and is willing to question his practice, to have a deeper
understanding of what he does. We have tried to keep mathematics and formalism
at a very low level so that, hopefully, most of the material will be accessible to
readers who are not mathematically inclined. A rich bibliography will allow the
interested reader to locate the more technical literature easily.

1.3 Structure
There are so many decision and evaluation models that it would be impossible to
deal with all of them within a single book. As will become apparent later, most of
them rely on similar kinds of principles. We decided to present seven examples of
such models. These examples, chosen in a wide variety of domains, will hopefully
allow the reader to grasp these principles. Each example is presented in a chapter
(Chapters 2 to 8), almost independent of the other chapters. Each of these seven
chapters ends with a conclusion, placing what has been discussed in a broader
context and indicating links with other chapters. Chapter 9 is somewhat different
from the seven previous ones: it does not focus on a decision model but presents a
real world application. The aim of this chapter is to emphasise the importance of
the decision aiding process (the context of the problem, the position of the actors
and their interactions, the role of the analyst, . . . ), to show that many difficulties
arise there as well and that a coherence between the decision aiding process and
the formal model is necessary.
Some examples have been chosen because they correspond to decision models
that everyone has experienced and can understand easily (student grades and
voting). We chose some models because they are not often perceived as decision
or evaluation models (student grades, indicators and rule based control). The other
examples (cost-benefit analysis, multiple criteria decision support and choice under
uncertainty) correspond to well identified and popular evaluation and decision

1.4 Outline
Chapter 2 is devoted to the problem of voting. After showing the analogy between
voting and multiple criteria decision support, we present a sequence of twelve
short examples, each one illustrating a problem that arises with a particular voting
method. We begin with simple methods based on pairwise comparisons and we
end up with the Borda method. Although the goal of this book is not to overwhelm
the reader with theory, we informally present two theorems (Arrow and Gibbard-
Satterthwaite) that in one way or another explain why we encountered so many
difficulties in our twelve examples.
Then we turn to the way voters' preferences are modelled. We present many
different models, each one trying to outdo the previous one but suffering from its
own weaknesses. Finally, we explore some issues that are often neglected: who is going
to vote? Who are the candidates? These questions are difficult and we show that
they are important. The construction of the set of voters and the set of candidates,
as well as the choice of a voting method must be considered as part of the voting

After examining voting, we turn in Chapter 3 to another very familiar topic for
the reader: students' marks or grades. Marks are used for different purposes (e.g.
ranking the students, deciding whether a student is allowed to begin the next level
of study, deciding whether a student gets a degree, . . . ). Students are assessed in
a huge variety of ways in different countries and schools. This seems to indicate
that assessing students might not be trivial. We use this familiar topic to discuss
operations such as evaluating a performance and aggregating evaluations.
In Chapter 4, three particular indicators are considered: the Human Devel-
opment Index (used by the United Nations), the ATMO index (an air pollution
indicator used by the French government) and the decathlon score. We present
a few examples illustrating some problems occurring with indicators. We assert
that some difficulties are the consequences of the fact that the role of an indicator
is often manifold and not well defined. An indicator is a measure but, often, it is
also a tool for controlling or managing (in a broad sense).
Chapter 5 deals with cost-benefit analysis (CBA), a decision aiding method that is
extremely popular among economists. Following the CBA approach, a project should only
be undertaken when its benefits outweigh its costs. First we present the principles
of CBA and its theoretical foundations. Then, using an example in transportation
studies, we illustrate some difficulties encountered with CBA. Finally, we clarify
some of the hypotheses at the heart of CBA and criticise the relevance of these
hypotheses in some decision aiding processes.
In Chapter 6, using a well documented example, we present some difficulties
that arise when one wants to choose from or rank a set of alternatives considered
from different viewpoints. We examine several aggregation methods that lead to
a value function on the set of alternatives, namely the weighted sum, the sum of
utilities (direct and indirect assessment) and AHP (the Analytic Hierarchy
Process). Then we turn to the so-called outranking methods. Some of these methods
can be used even when the data are not very rich or precise. The price we pay
for this is that results provided by these methods are not rich either, in the sense
that conclusions that can be drawn regarding a decision are not clear-cut.
Chapter 7 is dedicated to the study of automatic decision systems. These
systems concern the execution of repetitive decision tasks and the great majority
of them are based on more or less explicit decision rules aimed at reflecting
the usual decision policy of humans. The goal of this chapter is to show the interest
of some formal tools (e.g. fuzzy sets) to model decision rules but also to clarify
some problems arising when simulating the rules. Three examples are presented:
the first one concerns the control of an automatic watering system while the others
are about the control of a food process. The first two examples describe decision
systems based on explicit decision rules; the third one addresses the case of implicit
decision rules.
The goal of Chapter 8 is to raise some questions about the modelling of un-
certainty. We present a real-life problem concerning the planning of electricity
production. This problem is characterised by many different uncertainties: for
example, the price of oil or the electricity demand in 20 years' time. This
problem is classically described by using a decision tree and solved with an expected
utility approach. After recalling some well known criticisms directed against this
approach, we present the approach that has been used by the team that solved
this problem. Some of the drawbacks of this approach are discussed as well. The
relevance of probabilities is criticised and other modelling tools, such as belief
functions, fuzzy set theory and possibility theory, are briefly mentioned.
Convinced that there is more to decision aiding than just number crunching,
we devote Chapter 9 to the description of a real world decision aiding process
that took place in a large Italian company a few years ago. It concerns the eval-
uation of offers following a call for tenders for a GIS (Geographical Information
System) acquisition. Some important elements such as the participating actors,
the problem formulation, the construction of the criteria, etc. deserve greater con-
sideration. One should ideally never consider these elements separately from the
aggregation process because they can impact the whole decision process and even
the way the aggregation procedure behaves.

1.5 Who are the authors?

The authors of this book are European academics working in six different universi-
ties, in France and in Belgium. They teach in engineering, business, mathematics,
computer science and psychology schools. Their background is quite varied as
well: mathematics, economics, engineering, law and geology but they are all ac-
tive in decision support and more particularly in multiple criteria decision support.
Among their special interests are preference modelling, fuzzy logic, aggregation
techniques, social choice theory, artificial intelligence, problem structuring, mea-
surement theory, operations research, . . . Besides their interest in multiple criteria
decision support, they share a common view on this field. Five of the six authors
of the present volume presented their thoughts on the past and the objectives of
future research in multiple criteria decision support in the Manifesto of the new
MCDA era (Bouyssou, Perny, Pirlot, Tsoukias and Vincke 1993).
The authors are very active in theoretical research on the foundations of de-
cision aiding, mainly from an axiomatic point of view, but have been involved
in a variety of applications ranging from software evaluation to location of a nu-
clear repository, through the rehabilitation of a sewer network or the location of
high-voltage lines.
In spite of the large number of co-authors, this book is not a collection of
papers. It is a joint work.

1.6 Conventions
To refer to a decision maker, a voter or an individual whose sex is not determined,
we decided not to use the politically correct "he/she" but just "he" in order to
make the text easy to read. The fact that all of the authors are male has nothing
to do with this choice. The same applies for "his/her".
None of the authors is a native English speaker. Therefore, even if we did
our best to write in correct English, the reader should not be surprised to find
some mistakes or inelegant expressions. We beg the reader's leniency for any
incorrectness that might remain.
The adopted spelling is the British and not the American one.

1.7 Acknowledgements
We are ggreatly indebted to our collEague
///////// friend Philippe Fortemps \cite{Fortemps99}
Without him and his knowledge of Late-
x, this book would look like this paragraph.%\newline

The authors also wish to thank J.-L. Ottinger, who contributed to Chapter
8, H. Melot, who laid out the complex diagrams of that chapter, and Stefano
Abruzzini, who gave us a number of references concerning indicators. Chapter 6
is based on a report by Sebastien Clement written to fulfil the requirements of a
course on multiple criteria decision support. A large part of Chapter 9 uses material
already published in Paschetta and Tsoukias (1999).
Special thanks go to Marjorie and Diane Gassner who had the patience to
read and correct our continental approximation of the English language and to
Francois Glineur who helped in solving a great number of LaTeX problems.
We thank Gary Folven from Kluwer Academic Publishers for his constant
support during the preparation of this manuscript.

2 Choosing on the basis of several opinions

Voting is easy! You've voted hundreds of times in committees, in presidential
elections, for the senate, . . . Is there much to say about voting? Well, just think
about the way heads of state or members of parliament are elected in Australia,
France, the UK, . . .

United Kingdom's members of parliament: The territory of the UK is divided
into about 650 constituencies. One representative is elected in each
constituency. Each voter chooses one of the candidates in his constituency.
The winner is the candidate that is chosen by more voters than any other
one. Note that the winner does not have to win an overall majority of votes.

France's members of parliament: As in the UK, the French territory is divided
into single-seat constituencies. In a constituency, each voter chooses one of
the candidates. If one candidate receives more than 50 % of the votes, he
is elected. Otherwise a second stage is organised. During the second stage,
all candidates that were chosen by more than 12.5 % of the registered voters
may compete. Once more, each voter chooses one of the candidates. The
winner is the candidate that received the most votes.

France's president: Each voter chooses one of the candidates. If one candidate
has been chosen by more than 50 % of the voters, he is elected. Otherwise
a second stage is organised. During the second stage, only two candidates
remain: those with the highest scores. Once again, each voter chooses one of
the candidates. The winner is the candidate that has been chosen by more
voters than the other one.

Australia's members of parliament: The territory is divided into single-seat
constituencies called divisions. In a division, each voter is asked to rank all
candidates: he puts a 1 next to his preferred candidate, a 2 next to his second
preferred candidate, then a 3, and so on until his least preferred candidate.
Then the ballot papers are sorted according to the first preference votes. If a
candidate has more than 50 % of the ballot papers, he is elected. Otherwise,
the candidate that received fewer papers than any other is eliminated and
the corresponding ballot papers are transferred to the candidates that got
a 2 on these papers. Once more, if a candidate has more than 50 % of the
ballot papers, he is elected. Otherwise, the candidate that received fewer
papers than any other is eliminated and the corresponding ballot papers are
transferred to the candidates that got a 3 on these papers, etc. In the worst
case, this process ends when all but two candidates are eliminated, because,
unless they are tied, one of the candidates necessarily has more than 50 %
of the papers. Note that, as far as we know, it seems that the case of a tie
is seldom considered in electoral laws.

Canada's members of parliament and prime minister: Every five years, the
Canadian parliament is elected as follows. The territory is divided into about
270 constituencies called counties. In each county, each party can present
one candidate. Each voter chooses one candidate. The winner in a county is
the candidate that is chosen by more voters than any other one. He is thus
the county's representative in the parliament. The leader of the party that
has the most representatives becomes prime minister.
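
The counting rules described above can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: every ballot is taken to be a complete ranking without ties, the two-stage rule follows the French presidential variant (the two leading candidates go to the second stage, not the 12.5 % threshold of the parliamentary elections), voters are assumed to keep the same preferences between stages, and ties are broken arbitrarily since, as noted above, electoral laws seldom say what to do with them. The function names and the 22-voter profile are hypothetical and chosen for this illustration only.

```python
from collections import Counter


def top_choice(ballot, allowed):
    """Return the best-ranked candidate on the ballot among those still allowed."""
    return next(c for c in ballot if c in allowed)


def plurality_winner(ballots):
    """British-style rule: the candidate ranked first by the most voters wins."""
    counts = Counter(ballot[0] for ballot in ballots)
    return counts.most_common(1)[0][0]


def two_round_winner(ballots):
    """Two-stage rule: absolute majority, else a second stage between the top two."""
    counts = Counter(ballot[0] for ballot in ballots)
    leader, votes = counts.most_common(1)[0]
    if 2 * votes > len(ballots):
        return leader
    finalists = {c for c, _ in counts.most_common(2)}
    # Each voter now backs whichever finalist he ranks higher.
    second_round = Counter(top_choice(ballot, finalists) for ballot in ballots)
    return second_round.most_common(1)[0][0]


def alternative_vote_winner(ballots):
    """Australian-style rule: eliminate the weakest candidate until one has a majority."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Start every remaining candidate at zero so that a candidate with no
        # ballot papers can still be found and eliminated.
        counts = Counter({c: 0 for c in sorted(remaining)})
        counts.update(top_choice(ballot, remaining) for ballot in ballots)
        leader, votes = counts.most_common(1)[0]
        if 2 * votes > len(ballots) or len(remaining) == 1:
            return leader
        # Ties for last place are broken alphabetically (an arbitrary choice).
        loser = min(sorted(remaining), key=lambda c: counts[c])
        remaining.remove(loser)


# Hypothetical profile of 22 voters; each tuple is a ranking from best to worst.
ballots = (
    [("a", "b", "c", "d")] * 7
    + [("b", "c", "a", "d")] * 6
    + [("c", "b", "a", "d")] * 5
    + [("d", "c", "b", "a")] * 4
)
print(plurality_winner(ballots))         # a
print(two_round_winner(ballots))         # b
print(alternative_vote_winner(ballots))  # c
```

On this hypothetical profile the three rules elect three different candidates from the same ballots, which gives a first idea of why the choice of a method matters.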

Those interested in voting methods and the way they are applied in various
countries will find valuable information in Farrell (1997) and Nurmi (1987). The
diversity of the methods applied in practice probably reflects some underlying
complexity and, in fact, if you take a closer look at voting, you will be amazed
by the incredible complexity of the subject. In spite of its apparent simplicity,
thousands of papers have been devoted to the problem of voting (Kelly 1991) and
our guess is that many more are to come.
Our aim in this chapter is, on the one hand, to show that many difficult and
interesting problems arise in voting and, on the other hand, to convince the reader
that a formal study of voting might be enlightening. This chapter is organised
as follows. In Section 1, we make the following basic assumption: each voter's
preferences can accurately be represented by a ranking of all candidates from best
to worst, without ties. Then we show some problems occurring when aggregating
the rankings, using classical voting systems such as those applied in France or the
United Kingdom. We do this through the use of small and classical examples. In
Section 2, we consider other preference models than the linear ranking of Section
1. Some models are poorer in information but more realistic. Some are richer and
less realistic. In most cases, the aggregation remains a difficult task. In Section
3, we change the focus and try to examine voting in a much broader context.
Voting is not instantaneous. It is not just counting the votes and performing
some mathematical operation to find the winner. It is a process that begins when
somebody decides that a vote should occur (or even earlier) and ends when the
winner begins his mandate (or even later). In Section 4, we discuss the analogy
with multiple criteria decision support. The chapter ends with a conclusion.

2.1 Analysis of some voting systems

From now on, we will distinguish between the election (the process by which the
voters express their preferences about a set of candidates) and the aggregation
method (the process used to extract the best candidate or a ranking of the
candidates from the result of the election). In many cases, the election is uninominal,
i.e. each voter votes for one candidate only.

2.1.1 Uninominal election

Let us recall the assumption that we mentioned earlier and that will hold
throughout Section 1. Each voter, consciously or not, ranks all candidates from best to
worst, without ties and, when voting, each voter sincerely (or naively) reports his
preferences. Thus, in a uninominal election, we shall assume that each voter votes
for the candidate that he ranks in first position. For example, suppose that a voter
prefers candidate a to b and b to c (in short a P b P c). He votes for a. We are now
ready to present a first example that illustrates a difficulty in voting.

Example 1. Dictatorship of majority

Let {a, b, c, . . . , y, z} be a set of 26 candidates for an election with 100 voters.
Suppose

51 voters have preferences a P b P c P . . . P y P z

and 49 voters have preferences z P b P c P . . . P y P a.
It is clear that 51 voters will vote for a while 49 vote for z. Thus a has an
absolute majority and, in all uninominal systems we are aware of, a wins. But
is a really a good candidate ? Almost half of the voters perceive a as the worst
one. And candidate b seems to be a good candidate for everyone. Candidate b
could be a good compromise. As shown by this example, a uninominal election
combined with the majority rule allows a dictatorship of majority and doesn't
favour a compromise. A possible way to avoid this problem might be to ask the
voters to provide their whole ranking instead of their preferred candidate. This
will be discussed later. Let us continue with some strange problems arising when
using a uninominal election.

Example 2. Respect of majority in the British system

The voting system in the United Kingdom is plurality voting, i.e. the election is
uninominal and the aggregation method is simple majority. Let {a, b, c} be the
set of candidates for a 21 voters election. Suppose that

10 voters have preferences aP bP c,

6 voters have preferences bP cP a
and 5 voters have preferences cP bP a.

Then a (resp. b and c) obtains 10 votes (resp. 6 and 5). Thus a is chosen.
Nevertheless, this might be different from what a majority of voters wanted. In-
deed, an absolute majority of voters prefers any other candidate to a (11 out of
21 voters prefer b and c to a).

Let us see, using the same example, if such a problem would be avoided by the
two-stage French system. After the first stage, as no candidate has an absolute
majority, a second stage is run between candidates a and b. We suppose that the
voters keep the same preferences on {a, b, c}. Thus a obtains 10 votes and b, 11
votes so that candidate b is elected. This time, none of the beaten candidates (a
and c) are preferred to b by a majority of voters. Nonetheless we cannot conclude
that the two-stage French system is superior to the British system from this point
of view, as shown by the following example.
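
The two outcomes of Example 2 can be checked mechanically. The following is only a minimal Python sketch: the encoding of the preferences as (number of voters, ranking) pairs and the names `first_place_votes`, `prefer`, `plurality_winner` and `two_stage_winner` are our own illustrative choices, and ties are resolved in favour of the candidate met first, an arbitrary assumption that Example 2 never puts to the test.

```python
from collections import Counter

# Example 2: each pair is (number of voters, ranking from best to worst).
EXAMPLE_2 = [(10, "abc"), (6, "bca"), (5, "cba")]

def first_place_votes(profile):
    """Count the votes of a uninominal election with sincere voters."""
    votes = Counter()
    for count, ranking in profile:
        votes[ranking[0]] += count
    return votes

def prefer(profile, x, y):
    """Number of voters who rank x before y."""
    return sum(count for count, ranking in profile
               if ranking.index(x) < ranking.index(y))

def plurality_winner(profile):
    """British system: the candidate with the most votes wins."""
    return first_place_votes(profile).most_common(1)[0][0]

def two_stage_winner(profile):
    """French system: absolute majority, otherwise a second stage
    between the two candidates with the most votes."""
    ranked = first_place_votes(profile).most_common()
    total = sum(count for count, _ in profile)
    leader, leader_votes = ranked[0]
    if 2 * leader_votes > total:
        return leader
    challenger = ranked[1][0]
    # Assumption: a tie in the second stage goes to the first-stage leader.
    if prefer(profile, leader, challenger) >= prefer(profile, challenger, leader):
        return leader
    return challenger

print(plurality_winner(EXAMPLE_2))   # a
print(two_stage_winner(EXAMPLE_2))   # b
print(prefer(EXAMPLE_2, "b", "a"))   # 11 voters out of 21 prefer b to a
```

The last line reproduces the count used in the text: 11 voters out of 21 rank b before a, although a collects the most first places.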

Example 3. Respect of majority in the two-stage French system

Let {a, b, c, d} be the set of candidates for a 21 voters election. Suppose that

10 voters have preferences bP aP cP d,

6 voters have preferences cP aP dP b
and 5 voters have preferences aP dP bP c.
After the first stage, as no candidate has absolute majority, a second stage is
run between candidates b and c. Candidate b easily wins with 15 out of 21 votes
though an absolute majority (11/21) of voters prefer a and d to b. Because it
is not necessary to be a mathematician to figure out such problems, some voters
might be tempted not to sincerely report their preferences, as shown in the next example.

Example 4. Manipulation in the two-stage French system

Let us continue with the example used above. Suppose that the six voters having
preferences cP aP dP b decide not to be sincere and vote for a instead of c. Then
candidate a wins after the first stage because there is an absolute majority for
him (11/21). If they had been sincere (as in the previous example), b would have
been elected. Thus, casting a non sincere vote is useful for those 6 voters as they
prefer a to b. Such a system, that may encourage voters to falsely report their
preferences, is called manipulable. This is not the only weakness of the French
system as attested by the three following examples.

Example 5. Monotonicity in the two-stage French system

Let {a, b, c} be the set of candidates for a 17 voters election. A few days before
the election, the results of a survey are as follows:

6 voters have preferences aP bP c,

5 voters have preferences cP aP b,
4 voters have preferences bP cP a
and 2 voters have preferences bP aP c.
With the French system, a second stage would be run, between a and b and
a would be chosen obtaining 11 out of 17 votes. Suppose that candidate a, in
order to increase his lead over b and to lessen the likelihood of a defeat, decides to
strengthen his electoral campaign against b. Suppose that the survey did exactly

reveal the preferences of the voters and that the campaign has the right effect on
the last two voters. Hence we observe the following preferences.

8 voters have preferences aP bP c,

5 voters have preferences cP aP b
and 4 voters have preferences bP cP a.
After the first stage, b is eliminated, due to the campaign of a. The second
stage opposes a to c and c wins, obtaining 9 votes. Candidate a thought that
his campaign would be beneficial. He was wrong. Such a method is called non
monotonic because an improvement of a candidate's position in some of the voters'
preferences can lead to a deterioration of his position after the aggregation. It is
clear with such a system that it is not always interesting or efficient to sincerely
report one's preferences. You will note in the next example that some manipulations
can be very simple.

Example 6. Participation in the two-stage French system

Let {a, b, c} be the set of candidates for an 11 voters election. Suppose that

4 voters have preferences aP bP c,

4 voters have preferences cP bP a
and 3 voters have preferences bP cP a.
Using the French system, a second stage should oppose a to c and c should win
the election obtaining 7 out of 11 votes. Suppose that 2 of the 4 first voters (with
preferences aP bP c) decide not to vote because c, the worst candidate according
to them, is going to win anyway. What will happen ? There will be only 9 voters.

2 voters have preferences aP bP c,

4 voters have preferences cP bP a
and 3 voters have preferences bP cP a.
Contrary to all expectations, candidate c will lose while b will win, obtaining
5 out of 9 votes. Our two lazy voters can be proud of their abstention since they
prefer b to c. Clearly such a method does not encourage participation.

Example 7. Separability in the two-stage French system

Let {a, b, c} be the set of candidates for a 26 voters election. The voters are located
in two different areas: countryside and town. Suppose that the 13 voters located
in the town have the following preferences.

4 voters have preferences aP bP c,

3 voters have preferences bP aP c,
3 voters have preferences cP aP b
and 3 voters have preferences cP bP a.
Suppose that the 13 voters located in the countryside have the following preferences.

4 voters have preferences aP bP c,

3 voters have preferences cP aP b,
3 voters have preferences bP cP a
and 3 voters have preferences bP aP c.
Suppose now that an election is organised in the town, with 13 voters. Candi-
dates a and c will go to the second stage and a will be chosen, obtaining 7 votes.
If an election is organised in the countryside, a will defeat b in the second stage,
obtaining 7 votes. Thus a is the winner in both areas. Naturally we expect a to
be the winner in a global election. But it is easy to observe that in the global
election (26 voters) a is defeated during the first stage. Such a method is called
non separable.
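
Examples 4 to 7 can be replayed with a few lines of Python. This is a sketch under our own assumptions: a profile is a list of (number of voters, ranking) pairs, the function name `two_stage_winner` is ours, ties are broken by order of appearance, and the insincere ranking "acdb" in Example 4 is hypothetical since only its first position matters.

```python
from collections import Counter

def two_stage_winner(profile):
    """Two-stage system on a list of (number of voters, ranking) pairs.

    Assumption: ties are broken by order of appearance in the profile.
    """
    votes = Counter()
    for count, ranking in profile:
        votes[ranking[0]] += count
    ranked = votes.most_common()
    if 2 * ranked[0][1] > sum(votes.values()):
        return ranked[0][0]          # absolute majority in the first stage
    x, y = ranked[0][0], ranked[1][0]
    x_votes = sum(c for c, r in profile if r.index(x) < r.index(y))
    y_votes = sum(c for c, r in profile if r.index(y) < r.index(x))
    return x if x_votes >= y_votes else y

# Example 4 (manipulation): six voters report a first instead of c.
sincere = [(10, "bacd"), (6, "cadb"), (5, "adbc")]
insincere = [(10, "bacd"), (6, "acdb"), (5, "adbc")]
print(two_stage_winner(sincere), two_stage_winner(insincere))        # b a

# Example 5 (monotonicity): two voters move a to the first position.
survey = [(6, "abc"), (5, "cab"), (4, "bca"), (2, "bac")]
after_campaign = [(8, "abc"), (5, "cab"), (4, "bca")]
print(two_stage_winner(survey), two_stage_winner(after_campaign))    # a c

# Example 6 (participation): two voters with preferences aPbPc abstain.
everybody = [(4, "abc"), (4, "cba"), (3, "bca")]
two_abstain = [(2, "abc"), (4, "cba"), (3, "bca")]
print(two_stage_winner(everybody), two_stage_winner(two_abstain))    # c b

# Example 7 (separability): a wins in each area but not globally.
town = [(4, "abc"), (3, "bac"), (3, "cab"), (3, "cba")]
countryside = [(4, "abc"), (3, "cab"), (3, "bca"), (3, "bac")]
print(two_stage_winner(town), two_stage_winner(countryside),
      two_stage_winner(town + countryside))                          # a a b
```

In the global election of Example 7, a obtains 8 votes against 9 for each of b and c, so that a is eliminated after the first stage and the sketch returns b.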
The previous examples showed that, when there are more than 2 candidates, it
is not an easy task to imagine a system that would behave as expected. Note that,
in the presence of 2 candidates, the British system (uninominal and one-stage) is
equivalent to all other systems and it suffers none of the above mentioned problems
(May 1952). Thus we might be tempted by a generalisation of the British system
(restricted to 2 candidates). If there are two candidates, we use the British system;
if there are more than two candidates, we arbitrarily choose two of them and we use
the British system to select one. The winner is opposed (using the British system)
to a new arbitrarily chosen candidate. And so on until no more candidates remain.
This would require n − 1 votes between 2 candidates. Unfortunately, this method
suffers severe drawbacks.

Example 8. Influence of the agenda in sequential voting

Let {a, b, c} be the set of candidates for a 3 voters election. Suppose that

1 voter has preferences aP bP c,

1 voter has preferences bP cP a
and 1 voter has preferences cP aP b.
The 3 candidates will be considered two by two in the following order or agenda:
a and b first, then c. During the first vote, a is opposed to b and a wins with
absolute majority (2 votes against 1). Then a is opposed to c and c defeats a with
absolute majority. Thus c is elected.
If the agenda is a and c first, it is easy to see that c defeats a and is then
opposed to b. Hence, b wins against c and is elected.
If the agenda is b and c first, it is easy to see that, finally, a is elected. Conse-
quently, in this example, any candidate can be elected and the outcome depends
completely on the agenda, i.e. on an arbitrary decision. Let us note that sequential
voting is very common in different parliaments. The different amendments to a
bill are considered one by one in a predefined sequence. The first one is opposed to
the status quo, using the British system; the second one is opposed to the winner,
and so on. Clearly, such a method lacks neutrality. It doesn't treat all candidates
in a symmetric way. Candidates (or amendments) appearing at the end of the
agenda are more likely to be elected than those at the beginning.

Example 9. Violation of unanimity in sequential voting

Let {a, b, c, d} be the set of candidates for a 3 voters election. Suppose that

1 voter has preferences bP aP dP c,

1 voter has preferences cP bP aP d
and 1 voter has preferences aP dP cP b.
Consider the following agenda: a and b first, then c and finally d. Candidate a
is defeated by b during the first vote. Candidate c wins the second vote and d is
finally elected though all voters unanimously prefer a to d. Let us remark that
this cannot happen with the French and British systems.
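
The influence of the agenda in Examples 8 and 9 can be verified with a short Python sketch. The names `majority_prefers` and `sequential_winner` are our own, a ranking is encoded as a string from best to worst, and we assume that the current winner is kept whenever the challenger does not obtain a strict majority (this never matters here since the number of voters is odd).

```python
def majority_prefers(rankings, x, y):
    """True if a strict majority of voters rank x before y."""
    x_votes = sum(1 for r in rankings if r.index(x) < r.index(y))
    return 2 * x_votes > len(rankings)

def sequential_winner(rankings, agenda):
    """Oppose the current winner to each new candidate, in agenda order."""
    winner = agenda[0]
    for challenger in agenda[1:]:
        if majority_prefers(rankings, challenger, winner):
            winner = challenger
    return winner

EXAMPLE_8 = ["abc", "bca", "cab"]
EXAMPLE_9 = ["badc", "cbad", "adcb"]

# Example 8: the three agendas elect three different candidates.
for agenda in ("abc", "acb", "bca"):
    print(agenda, "->", sequential_winner(EXAMPLE_8, agenda))
# abc -> c
# acb -> b
# bca -> a

# Example 9: d is elected although every voter ranks a before d.
print(sequential_winner(EXAMPLE_9, "abcd"))                      # d
print(all(r.index("a") < r.index("d") for r in EXAMPLE_9))       # True
```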
Up to now, we have assumed that the voters are able to rank all candidates
from best to worst without ties but the only information that we collected was the
best candidate. Why not try to palliate the many encountered problems by asking
voters to explicitly rank the candidates ? This idea, though interesting, will lead
us to many other pitfalls that we discuss just below.

2.1.2 Election by rankings

In this kind of election, each voter provides a ranking without ties of the candidates.
Hence the task of the aggregation method is to extract from all these rankings the
best candidate or a ranking of the candidates reflecting the preferences of the
voters as much as possible.
At the end of the 18th century, two aggregation methods for election by rankings
appeared in France. One was proposed by Borda, the other by Condorcet.
Although other methods have been proposed, their methods are still at the heart
of many scientists' concerns. In fact, many methods are variants of the Borda and
Condorcet methods.

The Condorcet method

Condorcet (1785) suggests comparing all candidates pairwise in the following way.
A candidate a is preferred to b if and only if the number of voters ranking a before
b is larger than the number of voters ranking b before a. In case of a tie, candidates
a and b are indifferent. A candidate that is preferred to all other candidates is
called a (Condorcet) winner. In other words, a winner is a candidate that, opposed
to each of the n − 1 other candidates, wins by a majority. It can be shown that
there is never more than one Condorcet winner.
Note that both the British as well as the two-stage French methods are different
from the Condorcet method. In example 2, candidate a is elected by the British
method but b is the Condorcet winner. In example 3, a is the Condorcet winner
although b is chosen by the French method.
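
The pairwise comparisons behind these two claims can be sketched in Python as follows. The function name `condorcet_winner` and the (number of voters, ranking) encoding are our own illustrative choices; the function returns `None` when no candidate beats all the others.

```python
def condorcet_winner(profile):
    """Return the candidate that beats every other candidate in a pairwise
    majority contest, or None if there is no such candidate."""
    candidates = profile[0][1]

    def beats(x, y):
        x_votes = sum(c for c, r in profile if r.index(x) < r.index(y))
        y_votes = sum(c for c, r in profile if r.index(y) < r.index(x))
        return x_votes > y_votes

    for x in candidates:
        if all(beats(x, y) for y in candidates if y != x):
            return x
    return None

EXAMPLE_2 = [(10, "abc"), (6, "bca"), (5, "cba")]
EXAMPLE_3 = [(10, "bacd"), (6, "cadb"), (5, "adbc")]

print(condorcet_winner(EXAMPLE_2))   # b
print(condorcet_winner(EXAMPLE_3))   # a
```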
Although the principle underlying the Condorcet method (the candidate that
beats all other candidates in a pairwise contest is the winner) seems very natural,
close to the concept of democracy and hence very appealing, it is worth noting
that, in some instances, this principle might be questioned: in example 1, a is the
Condorcet winner, although almost half of the voters consider him to be the worst
candidate. Consider also example 10 taken from Fishburn (1977).

Example 10. Critique of the majority principle

Let {a, b, c, d, e, f, g, x, y} be a set of 9 candidates for a 101 voters election. Suppose

19 voters have preferences yP aP bP cP dP eP f P gP x,

21 voters have preferences eP f P gP xP yP aP bP cP d,
10 voters have preferences eP xP yP aP bP cP dP f P g,
10 voters have preferences f P xP yP aP bP cP dP eP g,
10 voters have preferences gP xP yP aP bP cP dP eP f
and 31 voters have preferences yP aP bP cP dP xP eP f P g.
Candidate x wins against every other candidate with a majority of 51 votes.
Thus x is the Condorcet winner. But let us focus on the candidates x and y.
Let us summarise their results in Table 2.1. In view of Table 2.1, it seems that y
should be elected.

k    1   2   3   4   5   6   7   8   9
x    0  30   0  21   0  31   0   0  19
y   50   0  30   0  21   0   0   0   0

Table 2.1: Number of voters who rank the candidate in k-th place in their preferences
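
The figures of Example 10 and Table 2.1 can be recomputed from the six groups of voters. The following Python sketch uses our own names (`rank_distribution`, `votes_for`) and encodes each ranking as a string from best to worst.

```python
from collections import Counter

EXAMPLE_10 = [
    (19, "yabcdefgx"),
    (21, "efgxyabcd"),
    (10, "exyabcdfg"),
    (10, "fxyabcdeg"),
    (10, "gxyabcdef"),
    (31, "yabcdxefg"),
]

def rank_distribution(profile, candidate):
    """Number of voters placing the candidate in each position (1 = best)."""
    places = Counter()
    for count, ranking in profile:
        places[ranking.index(candidate) + 1] += count
    return [places[k] for k in range(1, len(profile[0][1]) + 1)]

def votes_for(profile, x, y):
    """Number of voters who rank x before y."""
    return sum(count for count, ranking in profile
               if ranking.index(x) < ranking.index(y))

print(rank_distribution(EXAMPLE_10, "x"))   # [0, 30, 0, 21, 0, 31, 0, 0, 19]
print(rank_distribution(EXAMPLE_10, "y"))   # [50, 0, 30, 0, 21, 0, 0, 0, 0]
print({c: votes_for(EXAMPLE_10, "x", c) for c in "yabcdefg"})
# x obtains 51 votes out of 101 against each of the eight other candidates
```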

Furthermore, there are cases (called Condorcet paradoxes) where there is no
Condorcet winner. Consider example 8: a is preferred to b, b is preferred to c
and c is preferred to a. No candidate is preferred to all others. In such a case,
the Condorcet method fails to elect a candidate. One might think that example
8 is very bizarre and very unlikely to happen. Unfortunately it isn't. If you
consider an election with 25 voters and 11 candidates, the probability of such a
paradox is high: it is approximately 1/2 (Gehrlein 1983) and the
more candidates or voters, the higher the probability of such a paradox. Note
that, in order to obtain this result, all rankings are supposed to have the same
probability. Such a hypothesis is clearly questionable (Gehrlein 1983).
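
The order of magnitude of this probability can be explored by simulation. The sketch below is a plain Monte Carlo estimate under the same hypothesis (each voter draws his ranking uniformly at random, independently of the others); it is not the computation of Gehrlein (1983), and the names `has_condorcet_winner` and `paradox_frequency`, the number of trials and the seed are our own assumptions.

```python
import random

def has_condorcet_winner(rankings):
    """rankings: list of lists of candidates, best candidate first."""
    n_candidates = len(rankings[0])
    positions = [{c: i for i, c in enumerate(r)} for r in rankings]
    for x in range(n_candidates):
        # x wins if a strict majority ranks it before every other candidate.
        if all(2 * sum(p[x] < p[y] for p in positions) > len(rankings)
               for y in range(n_candidates) if y != x):
            return True
    return False

def paradox_frequency(n_voters, n_candidates, trials, seed=0):
    """Fraction of random elections without a Condorcet winner."""
    rng = random.Random(seed)
    candidates = list(range(n_candidates))
    paradoxes = 0
    for _ in range(trials):
        rankings = [rng.sample(candidates, n_candidates)
                    for _ in range(n_voters)]
        if not has_condorcet_winner(rankings):
            paradoxes += 1
    return paradoxes / trials

# Estimated frequency of the paradox for 25 voters and 11 candidates.
print(paradox_frequency(25, 11, trials=1000))
```

The estimate fluctuates with the seed and the number of trials; it only indicates whether the simulated frequency is compatible with the value quoted above.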
Many methods have been designed that elect the Condorcet winner, if he exists,
and choose a candidate in any case (Fishburn 1977, Nurmi 1987).

The Borda method

Borda (1781) proposed to use the following aggregation method. In each voter's
preference, each candidate has a rank: 1 for the first candidate in the ranking, 2
for the second, . . . and n for the last. Compute the Borda score of each candidate,
i.e. the sum for all voters of that candidate's rank. Then choose the candidate
with the lowest Borda score.

Note that there can be several such candidates. In these cases, the Borda
method does not tell us which one to choose. They are considered as equivalent.
But the likelihood of indifference is rather small and decreases as the number of
candidates or voters increases. For example, for 3 candidates and 2 voters, the
probability of all candidates being tied is 1/3; for 3 candidates and 50 voters, it is
less than 1 %. Note that once again, we supposed that all rankings have the same
probability.
Note that the Borda method allows us not only to choose one candidate but also to
rank them (by increasing Borda scores). If two candidates have the same Borda
score, then they are indifferent.

Example 11. Comparison of the Borda and Condorcet methods

Let {a, b, c, d} be the set of candidates for a 3 voters election. Suppose that

2 voters have preferences bP aP cP d

and 1 voter has preferences aP cP dP b.
The Borda score of a is 5 = 2 × 2 + 1 × 1. For b, it is 6 = 2 × 1 + 1 × 4. Candidates
c and d receive 8 and 11. Thus a is the winner. Using the Condorcet method, the
conclusion is different: b is the Condorcet winner. Thus, when a Condorcet winner
exists, it is not always chosen by the Borda method. Nevertheless, it can be shown
that the Borda method never chooses a Condorcet loser, i.e. a candidate that is
beaten by all other candidates by an absolute majority (contrary to the British
system, see Example 2).
Suppose now that candidates c and d decide not to compete because they
are almost sure to lose. With the Borda method, the new winner is b. Thus b
now defeats a just because c and d dropped out. Thus the fact that a defeats
or is defeated by b depends upon the presence of other candidates. This can be
a problem as the set of the candidates is not always fixed. It can vary because
candidates withdraw, because feasible solutions become infeasible or the converse,
because new solutions emerge during discussions, . . .
With the Condorcet method, b remains the winner and it can be shown that
this is always the case: if a candidate is a Condorcet winner, then he is still a
Condorcet winner after the elimination of some candidates.
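
The Borda scores of Example 11, with and without candidates c and d, can be recomputed with the following Python sketch. The names `borda_scores` and `borda_winners` and the optional `candidates` argument (used to simulate a withdrawal) are our own illustrative choices.

```python
def borda_scores(profile, candidates=None):
    """Sum of ranks (1 = best) over all voters; the lower, the better.

    If candidates is given, each ranking is first restricted to that set,
    which simulates the withdrawal of the other candidates.
    """
    if candidates is None:
        candidates = profile[0][1]
    scores = {c: 0 for c in candidates}
    for count, ranking in profile:
        restricted = [c for c in ranking if c in scores]
        for rank, c in enumerate(restricted, start=1):
            scores[c] += count * rank
    return scores

def borda_winners(scores):
    """All candidates sharing the lowest Borda score."""
    best = min(scores.values())
    return sorted(c for c, s in scores.items() if s == best)

EXAMPLE_11 = [(2, "bacd"), (1, "acdb")]

full = borda_scores(EXAMPLE_11)
print(full["a"], full["b"], full["c"], full["d"])   # 5 6 8 11
print(borda_winners(full))                          # ['a']

# c and d withdraw: b now has the lowest score.
reduced = borda_scores(EXAMPLE_11, candidates="ab")
print(reduced["a"], reduced["b"])                   # 5 4
print(borda_winners(reduced))                       # ['b']
```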

Example 12. Borda and the independence of irrelevant alternatives

Let {a, b, c} be the set of candidates for a 2 voters election. Suppose that

1 voter has preferences aP cP b

and 1 voter has preferences bP aP c.
The alternative with the lowest Borda score is a. Now consider a new election
where the alternatives and voters are identical but the voters have changed their
preferences about c. Suppose that

1 voter has preferences aP bP c

and 1 voter has preferences bP cP a.

It turns out that b has the lowest Borda score. However, neither of the two
voters changed his opinion about the pair {a, b}. The first (resp. second) voter
prefers a (resp. b) in both cases. Only the relative position of c changed and
this was enough to turn b into a winner and a into a loser. This can be seen
as a shortcoming of the Borda method. One says that the Borda method does
not satisfy the independence of irrelevant alternatives. It can be shown that the
Condorcet method satisfies this property.
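
Example 12 can be checked in a few lines of Python. In this sketch, the names `borda_scores` and `restriction` are ours; `restriction` keeps only the relative order of a pair of candidates in each voter's ranking, which is exactly the information that the independence property allows the method to use for that pair.

```python
def borda_scores(rankings):
    """Sum of ranks (1 = best) of each candidate, one ranking per voter."""
    scores = {c: 0 for c in rankings[0]}
    for ranking in rankings:
        for rank, c in enumerate(ranking, start=1):
            scores[c] += rank
    return scores

def restriction(rankings, pair):
    """Each voter's ranking restricted to the given pair of candidates."""
    return ["".join(c for c in r if c in pair) for r in rankings]

BEFORE = ["acb", "bac"]
AFTER = ["abc", "bca"]

before, after = borda_scores(BEFORE), borda_scores(AFTER)
print(before["a"], before["b"], before["c"])   # 3 4 5
print(after["a"], after["b"], after["c"])      # 4 3 5

# The preferences about the pair {a, b} are the same in both elections.
print(restriction(BEFORE, "ab") == restriction(AFTER, "ab"))   # True
```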

2.1.3 Some theoretical results

We could go on and on with examples showing that any method you can think of
suffers severe problems. But we think it is time to stop for at least two reasons.
First, it is not very constructive and, second, each example is related to a particular
method; hence this approach lacks generality. A more general (and thus theoretical)
approach is needed. We should find a way to answer questions like

Do non manipulable methods exist ?

Is it possible for a non separable method to satisfy unanimity ?


In another book, in preparation, we will follow such a general approach but, in

the present volume, we try to present various problems arising in evaluation and
decision models in an informal way and to show the need for formal methods.
Nevertheless, we cannot resist the desire to present now, in an informal way,
some of the most famous results of social choice theory.

Arrow's theorem
Arrow (1963) was interested in the aggregation of rankings with ties into a ranking,
possibly with ties. We will call this ranking the overall ranking. He examined the
methods verifying the following properties.

Universal domain. This property implies that the aggregation method must be
applicable to all cases. Whatever the rankings provided by the voters, the
method must yield an overall ranking of the candidates. This property rules
out methods that would impose some restrictions on the preferences of the
voters.

Transitivity. The result of the aggregation must always be a ranking, possibly
with ties. This implies that, if aP b and bP c in the overall ranking, then
aP c in the overall ranking. Example 8 showed that the Condorcet method
doesn't verify transitivity: a is preferred to b, b is preferred to c and c is
preferred to a.

Unanimity. If all voters are unanimous about a pair of candidates, e.g. if all voters
rank a before b, then a must be ranked before b in the overall preference.
This seems quite reasonable but example 9 showed that some commonly used
aggregation methods fail to respect unanimity. This property is often called
Pareto condition.

Independence. The relative position of two candidates in the overall ranking
depends only on their relative positions in the individuals' preferences. Therefore
other alternatives are considered as irrelevant with respect to that pair.
Note that we observed in example 12 that the Borda method violates the
independence property. This property is often called Independence of irrelevant
alternatives.

Non-dictatorship. None of the voters can systematically impose his preferences
on the other ones. This rules out aggregation methods such that the overall
ranking is always identical to the preference ranking of a given voter. This
may be seen as a minimal requirement for a democratic method.

These five conditions allow us to state Arrow's celebrated theorem.

Theorem 2.1 (Arrow) When the number of candidates is at least 3, there ex-
ists no aggregation method satisfying simultaneously the properties of universal
domain, transitivity, unanimity, independence and non-dictatorship.

To a large extent, this theorem explains why we encountered so many difficulties
when trying to find a satisfying aggregation method. For example, let us
observe that the Borda method satisfies the universal domain, transitivity, unanimity
and non-dictatorship properties. Therefore, as a consequence of theorem
2.1, we can deduce that it cannot satisfy the independence condition. What about
the Condorcet method ? It satisfies the universal domain, unanimity, independence
and non-dictatorship properties. Hence it cannot verify transitivity (see example
8). Note that Arrow's theorem uses only five conditions that, in addition, are
quite weak (at least at first glance). Yet, the result is powerful. If, in addition to
these five conditions, we wish to find a method satisfying neutrality, separability,
monotonicity, non-manipulability, . . . we face an even more puzzling problem.

Gibbard-Satterthwaite's theorem
Gibbard (1973) and Satterthwaite (1975) were very interested in the
(non-)manipulability of aggregation methods, especially those leading
to the election of a unique candidate. Informally, a method is non-manipulable if,
in no case, a voter can improve the result of the election by not reporting his true
preferences. They proved the following result.

Theorem 2.2 (Gibbard-Satterthwaite) When the number of candidates is larger
than two, there exists no aggregation method satisfying simultaneously the properties
of universal domain, non-manipulability and non-dictatorship.

Example 4 concerning the two-stage French system can be revisited bearing
in mind theorem 2.2. The French system satisfies universal domain and
non-dictatorship. Therefore, it is not surprising that it is manipulable.

Many other impossibility results can be found in the literature. But this is not
the place to review them. Besides impossibility results, many characterisations are
available. A characterisation of a given aggregation method is a set of properties
simultaneously satisfied by only that method. These results help to understand
the fundamental principles of a method and to compare different methods.
At the beginning of this chapter, we decided to focus on elections of a unique
candidate. Some voting systems lead to the election of several candidates and
aim towards achieving a kind of proportional representation. One might think
that those systems are the solution to our problems. In fact, they are not. Those
systems raise as many questions (perhaps more) as the ones we considered (Balinski
and Young 1982). Furthermore, suppose that a parliament has been elected, using
proportional representation. This parliament will have to vote on many different
issues and, very often, only one candidate or law or project will have to be chosen.

2.2 Modelling the preferences of a voter

Let us consider the assumption that we made in Section 1: the preferences of each
voter can accurately be represented by a ranking of all candidates from best to
worst, without ties. We all know that this is not always realistic. For example, in
some instances, there are several candidates that a voter cannot rank, just because
he considers them as equivalent. Those candidates are tied. There are many other
reasons to question our assumption. In some cases, a voter is not able to rank
the candidates; in others, he is able to rank them but another kind of modelling of
his preferences would be more accurate. In this section, we list different cases in
which our initial assumption is not valid.

2.2.1 Rankings
To model the preferences of a voter, we can use a ranking without ties. This model
corresponds to the assumption of Section 1. This implies that when you present a
pair of candidates (a, b) to a voter, he is always able to tell if he prefers a to b or
the converse. Furthermore, if he prefers a to b and b to c, he necessarily prefers a
to c (transitivity of preference).

Indifference: rankings with ties

In some cases, a voter is unable to state if he prefers a to b or the converse because
he thinks that both candidates are of equal value. He is indifferent between a and
b. Thus, we need to model his preferences by a ranking with ties. For each pair
of candidates (a, b), we have "a is preferred to b", the converse or "a is indifferent
to b" (which is equivalent to "b is indifferent to a"). Preference still is transitive.
Suppose that a voter prefers a to b, c and d, that he is indifferent between b and c
and, finally, that he prefers a, b and c to d. We can model his preferences by a ranking
with ties. A graphic representation of this model is given in Fig. 2.1 where an arrow
between two candidates (e.g. a and b) means that a is preferred to b and a line
between them means that a is indifferent to b. Note that, in a ranking with ties,
indifference also is transitive. If a voter is indifferent between a and b and between
b and c, he is also indifferent between a and c.

Figure 2.1: A complete pre-order. Arrows implied by transitivity are not represented.

Incomparability: partial rankings

It can also occur that a voter is unable to rank the candidates, not because he
thinks that some of them are equivalent but because he cannot compare some of
them. There can be several reasons for this.
Poor information. Suppose that a voter must compare two candidates a and b
about which he knows almost nothing, except that their names are a and b
and that they are candidates. Such a voter cannot declare that he prefers a
to b nor the converse. If he is forced to express his preferences by means of a
ranking with ties, he will probably rank a and b tied rather than ranking one
above the other. But this would not really reflect his preferences because he
has no reasons to consider that they are equivalent. It is very likely that one
is better than the other but, as he doesn't know which one, he is better off
not stating any preferences about them.

Conflicting information. Suppose that a voter has to compare two candidates a
and b about which he knows a lot. He might be embarrassed when asked to
tell which candidate he prefers because, in some respects, a is far better than
b but, in other respects, b is far better than a. And he does not know how
to balance the pros and cons or he does not want to do so for the moment.

Confidential information. Suppose that your mother invited you and your wife
for dinner. At the end of the meal, your mother says "I have never eaten
such a good pie! Does NameOfYourWife prepare it as well as I do ?" No
matter what your preference is, you would probably be very embarrassed to
answer. And your answer is very likely to be "Well, it is difficult to say.
In fact they are different. I like both but I cannot compare them." Such
situations are very common in real life where people do not tell the truth,
the whole truth and nothing but the truth about their preferences.
Of course, this list is not exhaustive. We therefore need to introduce a new model
in which voters are allowed to express incomparabilities. Hence, when comparing
two candidates a and b, four situations can arise:
1. a is preferred to b,

2. b is preferred to a,

3. a is indifferent to b or

4. a and b are incomparable.

If we keep the transitivity of preference (and indifference), the structure we
obtain is called a partial ranking.

Example 13. Transitivity and coffee: semiorders

Consider a voter who is indifferent between a and b as well as between b and c.
If we use a ranking with ties to model his preferences, he is necessarily indifferent
between a and c, because of the transitivity of indifference. Is this what we want ?
We are going to borrow a small example from Luce (1956) to show that transitivity
of indifference should be dropped, at least in some cases. Let us suppose that I
present two cups of coffee to a voter: one cup without sugar, the other one with
one grain of sugar. Let us also suppose that he likes his coffee with sugar. If I ask
him which cup he prefers, he will tell me that he is indifferent (because he is not
able to detect one grain of sugar). He equally dislikes both. I will then present him
a cup with one grain and another with two. He will still be indifferent. Next, two
grains and three grains, and so on until nine hundred ninety nine and one thousand
grains. The voter will always be indifferent between the two cups that I present
to him because they differ by just one grain of sugar. Because of the transitivity
of indifference, he must also be indifferent between a cup without sugar and a cup
with one thousand grains (2 full spoons). But of course, if I ask him which one
he prefers, he will choose the cup with one thousand grains. Thus transitivity of
indifference is violated. A possible objection to this is that the voter will be tired
before he reaches the cup with one thousand grains. Furthermore (and this is more
serious) the coffee will be cold and he hates that.
There is a structure that keeps transitivity of preference and drops it for
indifference. Consequently, it can model the preferences of our coffee drinker. It is
called a semiorder. For details about semiorders, see Pirlot and Vincke (1997).
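
One common way to obtain such a structure is a threshold model: a number is attached to each object and strict preference requires the numbers to differ by more than a fixed threshold. The following Python sketch applies this idea to the cups of coffee. The threshold of 100 grains and the names `prefers` and `indifferent` are hypothetical choices made for illustration; they do not come from Luce (1956).

```python
THRESHOLD = 100  # hypothetical: a difference of up to 100 grains goes unnoticed

def prefers(x, y):
    """The cup with x grains is strictly preferred to the cup with y grains."""
    return x > y + THRESHOLD

def indifferent(x, y):
    """Neither cup is strictly preferred to the other."""
    return not prefers(x, y) and not prefers(y, x)

# Two cups differing by one grain are always indifferent ...
print(all(indifferent(k, k + 1) for k in range(1000)))   # True

# ... but indifference is not transitive over the whole chain.
print(indifferent(0, 1000))                               # False
print(prefers(1000, 0))                                   # True

# Strict preference, on the other hand, remains transitive.
cups = range(0, 1001, 50)
print(all(prefers(x, z)
          for x in cups for y in cups for z in cups
          if prefers(x, y) and prefers(y, z)))            # True
```

With a constant threshold, x preferred to y and y preferred to z imply that x exceeds z by more than twice the threshold, which is why the last check succeeds.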

Example 14. Transitivity and ponies: more semiorders

Do we need semiorders only when a voter cannot distinguish between two very
similar objects ? The following example, adapted from Armstrong (1939), will give
the answer. Suppose that you ask your child to choose between two presents for
his birthday: a pony and a blue bicycle. As he likes both of them equally, he will
say he is indifferent. Suppose now that you present him a third candidate: a red
bicycle with a small bell. He will probably tell you that he prefers the red one to
the blue one. "So, you prefer the red bicycle to the pony, is that right ?" you
would say if you consider a transitive indifference. However, it is obvious that the
child can still be indifferent between the pony and the red bicycle.


Figure 2.2: The pony vs bicycles semiorder

Other binary relations

Rankings with or without ties, partial rankings and semiorders are all binary
relations. Many other families of binary relations have been considered in the
literature in order to formally model the preferences of individuals as faithfully as
possible (e.g. Roubens and Vincke 1985, Abbas, Pirlot and Vincke 1996). Note
that even the transitivity of strict preference can be questioned due to empirical
observations (e.g. Fishburn 1988, Fishburn 1991, Tversky 1969, Sen 1997). Let us
now focus on another kind of mathematical structure used to model the preferences
of a voter.

2.2.2 Fuzzy relations

Fuzzy relations can be used to model preferences in at least two very different ways.

Fuzzy relations and uncertainty

When a voter is asked to express his preferences by means of a binary relation, he
has to examine each pair and choose "a is preferred to b", "b is preferred to a",
"a is indifferent to b" or "a and b are incomparable" (if indifference and
incomparability are allowed). In fact, reality is more subtle. When facing a question
like "do you prefer a to b ?", a voter might hesitate. It is easy to imagine situations
where a voter would like to say "perhaps". And it is just a step further to imagine
different situations where a voter would hesitate but with various degrees of
confidence: "almost yes but not completely sure", "perhaps but more on the side of
yes", "perhaps", "perhaps but more on the side of no", . . . There can be many reasons
for his hesitations.

He does not have full knowledge about the candidates. For example, in a
legislative election, a voter does not necessarily know what the position of
all candidates is regarding a particular issue.

He does have full knowledge about the candidates but not about some events
that might occur in the future and affect the way he compares the candidates.
For example, again in a legislative election, a voter might ideally know
everything about all candidates. But he does not know if, during the forthcoming
mandate, the representatives will have to vote on a particular issue.
If such a vote is to occur, a voter might prefer candidate a to candidate b.
In the other case, he might prefer b to a because there is just one thing that
he disapproves of in the policy of b: his position about that particular issue.

He does not fully know his preferences. Suppose that the community in
which you live has decided to build a new recreational facility. There are
two options: a tennis court or a playground. You have to vote. You perfectly
know the two options (budget, time to completion, plan, . . . ). You like tennis
and your children would love that playground. You will have access to both
facilities under the same conditions. Can you tell which one you will choose ?
What will you enjoy more ? To play tennis or to let your children play in
the playground ?

These three cases can be seen as three facets of a single problem. The voter is
uncertain about the final consequences of his choice.
Fuzzy relations can be used to model such preferences. The voter must still
answer the above-mentioned question ("do you prefer a to b?"), but with numbers,
no longer with yes or no. If he feels that "a is preferred to b" is definitely true, he
answers 1. If he feels that "a is preferred to b" is definitely false, he answers 0. For
intermediate situations, he chooses intermediate numbers. For example, "perhaps"
could be 0.5 and "almost yes", 0.9. A typical fuzzy relation on three candidates is
illustrated by Fig. 2.3 where a number on the arrow between two candidates (e.g.
a and b) is the answer of the voter to the question "is a preferred to b?".

Figure 2.3: A fuzzy relation (candidates a, b and c; arrow values 0.6, 0.8 and 1.0)
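To make the idea concrete, such a relation can be stored as a table giving one number per ordered pair of candidates. The short Python sketch below is only an illustration under assumptions of ours: the figure shows the values 0.6, 0.8 and 1.0, but the pairing of these values with particular arrows is assumed here; the names `fuzzy_pref`, `degree` and `crisp_cut` are hypothetical; and treating an unanswered pair as 0 is a simplification.

```python
# Degree to which "x is preferred to y" is true, for each ordered pair (x, y).
# Assumption: the pairing of the three values with the arrows is illustrative.
fuzzy_pref = {
    ("a", "b"): 0.6,
    ("b", "c"): 0.8,
    ("a", "c"): 1.0,
}


def degree(relation, x, y):
    """Answer to "is x preferred to y?"; pairs without an answer count as 0.0."""
    return relation.get((x, y), 0.0)


def crisp_cut(relation, threshold):
    """Pairs whose degree reaches the threshold (hypothetical helper)."""
    return sorted(pair for pair, d in relation.items() if d >= threshold)


print(degree(fuzzy_pref, "a", "b"))
print(crisp_cut(fuzzy_pref, 0.8))
```

With these assumed values, asking only for the pairs answered with at least 0.8 would keep (a, c) and (b, c) and drop (a, b). This shows how much the resulting yes/no relation depends on where the threshold is put.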

Note that, in some cases, a probability distribution on the possible
consequences is assumed to exist. In such cases, the problem faced by the voter is no
longer uncertainty but risk. In these cases, probabilities of preference might be
assigned to each pair.

Fuzzy relations and preference intensity

In some cases, when a voter is asked to tell if he prefers a to b, he will tend to
express faint differences in his judgement, not because he is uncertain about his
judgement, but because the concept of preference is vague and not well defined.
For example, a voter might say "I definitely prefer a to b but not as much as I
prefer c to d". This is due to the fact that preference is not a clear-cut concept.
We might then model his preferences by a fuzzy relation and choose 0.5 for (a, b)
and 0.8 for (c, d). A value of 0 would correspond to no preference.

Note that in many cases, uncertainty and vagueness are probably simultane-
ously present. For a thorough review of fuzzy preference modelling, see (Perny
and Roubens 1998).

2.2.3 Other models

Many other models can be conceived or have been described in the literature. An
important one is the utilitarian one: a voter assigns to each candidate a number
(the utility of the candidate). The position of a candidate with respect to any
other candidate is a function only of the utilities of the two candidates. If the
utilities of a and b are respectively 50 and 40, the implication is that a is
preferred to b. In addition, if the utilities of c and d are respectively 30 and 10,
it implies that the preference between c and d is twice as large as the preference
between a and b.
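The arithmetic behind this example can be written out in a few lines of Python. This is a minimal sketch: the utilities are those of the text, while the function names `prefers` and `intensity`, and the reading of intensity as a difference of utilities, are our own illustrative choices.

```python
# Utilities from the example in the text.
utilities = {"a": 50, "b": 40, "c": 30, "d": 10}


def prefers(u, x, y):
    """x is preferred to y when its utility is strictly larger."""
    return u[x] > u[y]


def intensity(u, x, y):
    """Preference intensity, read here as a difference of utilities."""
    return u[x] - u[y]


ratio = intensity(utilities, "c", "d") / intensity(utilities, "a", "b")
print(prefers(utilities, "a", "b"), ratio)
```

The ratio is (30 - 10) / (50 - 40) = 2, which is the "twice as large" of the text. Note that such a statement is meaningful only if differences of utilities are taken seriously, which is a strong requirement to put on a voter.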
Another important model is used in approval voting (Brams and Fishburn
1982). In this voting system, every voter votes for as many candidates as he wants
or approves. Consequently, the preferences of a voter are modelled by a partition
of the set of candidates into two subsets: a subset of approved candidates and
a subset of disapproved candidates. Approval voting received a lot of attention
during the last twenty years and has been adopted by a number of committees.
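The following Python sketch illustrates this model on a made-up case. The ballots are hypothetical, as are the names `approval_tally` and `partitions`; the rule that the candidate with the most approvals wins is the usual approval voting rule.

```python
from collections import Counter

candidates = {"a", "b", "c"}

# Hypothetical ballots: each voter lists the candidates he approves.
ballots = [{"a", "b"}, {"b"}, {"b", "c"}, {"a"}]

# Each ballot partitions the candidates into (approved, disapproved).
partitions = [(approved, candidates - approved) for approved in ballots]


def approval_tally(ballots):
    """Count, for each candidate, the number of voters approving him."""
    tally = Counter()
    for approved in ballots:
        tally.update(approved)
    return tally


tally = approval_tally(ballots)
best = max(tally.values())
winners = sorted(c for c, n in tally.items() if n == best)
print(dict(tally), winners)
```

In this made-up profile b is approved by three voters out of four and wins. The ballot says nothing about how a voter ranks the candidates inside each of the two subsets.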
We will not continue our list of preference models any further. Our aim was
just to give a small overview of the many problems that can arise when trying
to model the preferences of a voter. But there is an important issue that we still
must address. We encountered many problems in Section 2.1. In that section, we
used complete orders to model the preferences of the voters. We then examined
alternative models. Is it easier to aggregate individual preferences modelled by
means of complete pre-orders, semiorders, fuzzy relations, . . . ? Unfortunately,
the answer is no. Many examples, similar to those in Section 2.1, can be built to
demonstrate this (Sen 1986, Salles, Barrett and Pattanaik 1992).

2.3 The voting process

Until now, we considered only modelling the preferences of a voter and aggregating
the preferences of several voters. But voting is much more than that. Here are a
few points that are included in the voting process, even if they are often left aside
in the literature.

2.3.1 Definition of the set of candidates

Who is going to define the candidates or alternatives that will be submitted to a
vote? All the voters, some of them or one of them? In some cases, e.g. presidential
elections, the candidates are voters who become candidates on a voluntary basis.
Nevertheless, there are often some rules: not everyone can be a candidate. Who
should fix these rules and how? There is an even more fundamental question:
who should decide that voting should occur, on what issue, according to which
rules? All these questions have received different answers in different countries and
committees. This may indicate that they are far from trivial.
Let us now be more pragmatic. The board of directors of a company asks the
executive committee to prepare a report on the future investment strategies. A
vote on the proposed strategies will be held during the next board of directors
meeting. How should the executive committee prepare its report? Should they
include all strategies, even infeasible ones? If infeasible ones are to be avoided,
who should decide that they are infeasible? To find all feasible strategies might
be prohibitively resource- and time-consuming. And one can never be sure that
all feasible strategies have been explored. There is no systematic way, no formal
method to do that. Creativity and imagination are needed during this process.
Finally, suppose that the executive committee decides to explore only some
strategies. A more or less arbitrary selection needs to be made. Even if they do
make this selection in a perfectly honest way, it can have far-reaching consequences
on the outcome of the process. Remember Example 11, in which we showed that,
for some aggregation methods, the relative ranking of two candidates depends on
the presence (or absence) of some other candidates. Furthermore, some studies
show that an individual can prefer a to b or b to a depending on the presence or
absence of some other candidate (Sen 1997).
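The first of these two phenomena is easy to reproduce. The Python sketch below does not repeat Example 11. It uses a small profile made up for illustration and the Borda method, in which a candidate receives one point for each candidate ranked below him on a ballot. The function name `borda_scores` and the profile are assumptions of ours.

```python
def borda_scores(rankings, candidates):
    """Borda scores when only `candidates` are submitted to the vote.

    `rankings` is a list of (ranking, number_of_voters) pairs; on each
    ballot the best remaining candidate gets len - 1 points, the worst 0.
    """
    scores = {c: 0 for c in candidates}
    for ranking, count in rankings:
        kept = [c for c in ranking if c in candidates]
        for position, c in enumerate(kept):
            scores[c] += count * (len(kept) - 1 - position)
    return scores


# Hypothetical profile: 3 voters rank a > b > c, 2 voters rank b > c > a.
profile = [(("a", "b", "c"), 3), (("b", "c", "a"), 2)]

with_c = borda_scores(profile, {"a", "b", "c"})
without_c = borda_scores(profile, {"a", "b"})
print(with_c, without_c)
```

With c on the list, b obtains 7 points against 6 for a. If c is left out, a obtains 3 points against 2 for b, although no voter has changed his mind about a and b. A committee that decides whether or not to include c therefore influences the outcome.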

2.3.2 Definition of the set of the voters

Who is going to vote? As in the previous subsection, let us look at different
democracies, past or present. Citizens, rich people, noble people, men, men and
women, everyone, white men, experts who have some knowledge about the
discussed problem, one representative for each faction, a number of representatives
proportional to the size of that faction, . . . There is no universal answer.

2.3.3 Choice of the aggregation method

Even the choice of the aggregation method can be considered as part of the voting
process for, in some cases, the aggregation method is at least as important as the
result of the vote. Consider two countries, A and B: A is ruled by a dictator,
B is a democracy. Suppose that each time a policy is chosen by voting in B,
the dictator of A applies the same policy in his country, without voting. Hence,
all governmental decisions are the same in A and B. The only difference is that
the people in A do not vote; their benevolent dictator decides alone. In what
country would you prefer to live? I guess you would choose B, unless you are
the dictator. And you would probably choose B even if the decisions taken in
B were a little bit worse than the decisions taken in A. What we value in B is
freedom of choice. References and further details on this topic can be found in
(Sen 1997, Suzumura 1999).

2.4 Social choice and multiple criteria decision support

2.4.1 Analogies
There is an interesting analogy between voting and multiple criteria decision
support. Replace "criteria" by "voters" and "alternatives" by "candidates" and you
get it. Let us be more explicit. In multiple criteria decision support, most papers
consider an entity, called the decision-maker, that wants to choose an alternative
from a set of available alternatives. The decision-maker is often assumed to be an
individual, a person. To make his choice, the decision-maker takes several viewpoints,
called criteria, into account. These criteria are often conflicting, i.e. according to
one criterion, a given alternative is the best one while, according to another
criterion, other alternatives are better.
In a large part of the literature on voting, there is an entity called "group"
or "society" that has to choose a candidate from a set of candidates. This entity
consists of individuals and, for reasons that can vary widely from one group to
another, the choice made by this entity must reflect in some way the opinion of
the individuals. And, of course, the individuals often have conflicting views about
the candidates. In other words, the preferences of an individual play the same
role, in social choice, as the preferences along a single viewpoint or criterion in
multiple criteria decision support. The collective or social preferences, in social
choice theory, and the global or multiple criteria preferences, in multiple criteria
decision support, can be compared in the same way.
The main interest of this analogy lies in the fact that voting has been studied
for a long time. The seminal works by Borda (1781), Condorcet (1785), and Arrow
(1963) have led to an important stream of research in the 20th century. Hence we
have a huge amount of results on voting at our disposal for use in multiple criteria
decision support. Besides, this similarity has been widely used (see e.g. Arrow and
Raynaud 1986, Vansnick 1986).
In this chapter, we only discussed elections in which only one candidate must
be chosen (single-seat constituencies, prime ministers or presidents). However, it
is often the case that several candidates must be chosen. For example, in Belgium
and Germany, in each constituency, several representatives are elected so as to
achieve a proportional representation. A committee that must select projects from
a list often selects several ones, according to the available resources. In multiple
criteria decision support, such cases are common. An investor usually invests in a
portfolio of stocks. A human resources manager chooses amongst the candidates
those that will form an efficient team, etc.
In fact, the comparison can be extended to the processes of voting and decision-
making. In multiple criteria decision support, the decision process is much broader
than just the extraction, by some aggregation method, of the best alternative from
a performance tableau.
The very beginning of the process, the problem definition, is a crucial step.
When a decision-maker enters a decision process, he has no clearly defined problem.
He just feels unsatisfied with his current situation. He then tries to structure his
view of the situation, to put labels on different entities, to look for relationships
between entities, etc. Finally he obtains a "problem", as one can find in books.
It is a description, in formal language or not, of the current situation. It usually
contains a description of the reasons for which that situation is not satisfying and
it contains an implicit description of the potential solutions to the problem. That
is, the problem statement contains information that allows one to recognise whether
a given action or course of actions is a potential solution or not. The problem
statement must not be too broad, otherwise anything can be a solution and the
decision-maker is not helped. Conversely, if the statement is too narrow, some actions
are not recognised as potential solutions even if they would be good ones.
Some authors, mainly in the United Kingdom, have developed methods to help
decision-makers to better structure their problem (Rosenhead 1989, Daellenbach
When the problem has been stated, the decision-maker has a problem, but no
solution. He must construct the set of alternatives, like the set of candidates in
social choice. Brainstorming and other techniques promoting and stimulating creativity
have been developed to support this step.
The criteria, like the voters, are not given in a decision process. The decision-
maker needs to identify all the viewpoints that are relevant with respect to his
problem. He then must define a set of criteria that reflect all relevant viewpoints
and that fulfils some conditions. There must not be several criteria reflecting
the same viewpoint. All criteria should be independent except if the aggregation
method to be used thereafter allows dependence between criteria. Depending on
the aggregation method, the scales corresponding to the criteria must have some
properties. And so on. See e.g. Roy (1996) and Keeney and Raiffa (1976).
Last but not least, the aggregation method itself must be chosen by the analyst
and/or the decision-maker. It is hard to imagine how an aggregation procedure
could be scientifically proven to be the best one. The decision-maker must thus
make a choice. He should choose the one that satisfies some properties he judges
important, the one he can understand, the one he trusts.

2.5 Conclusions
In this chapter, we have shown that the operation of voting is far from simple. In
the first section, using small examples, describing very simple situations, we found
that intuition and common sense are not sufficient to avoid the many traps that
await us when using aggregation procedures. In fact, in this domain, common
sense is of very little help. We also presented two theoretical results indicating
that there is no hope of finding a perfect voting procedure. Therefore, if we still
want to use a voting procedure (and this seems hardly avoidable), we must accept
using an imperfect one. But this does not mean that we can use any procedure in any
circumstance and in any way. The flaws of a particular procedure are probably less
damaging in some instances than in others. Some features of a voting procedure
may be highly desirable in a given context while not so important in another one.
So, for each voting context, we have to choose the procedure that best matches our
needs. And, when we have made this choice, we must be aware that this match is
not perfect and that we must use the procedure in such a way that the risk of facing
a problematic situation is kept as low as possible.
In Section 2.2, we found that even the input of voting procedures (the preferences
of the voters) is not a simple thing. Many different models for preferences exist
and can be used in aggregation procedures. This shows that what is usually
considered as data is not really data. When we feed our aggregation procedures
with preferences, these are not given. They are constructed in some more or less
arbitrary way. The choice of a particular model (ranking with ties, fuzzy relations,
. . . ) is itself arbitrary. Nothing in the problem tells us what model to use.
Finally, in Section 2.3, we showed that the voting process itself is highly complex.
Voting procedures are decision models, just like student grades, indicators,
cost-benefit analysis, multiple criteria decision support (this has already been
discussed in Section 2.4), . . . They are decision models devoted to the special case where
a decision must be taken by a group of voters and are mainly concerned with the
case of a finite and small set of alternatives. This peculiarity does not make voting
procedures very different from other decision and evaluation models. As you will
see in the following chapters, most decision models suffer the same kind of problems
that we have met in this chapter: there is no perfect aggregation procedure; the
data are not data, they are imperfect and arbitrary models; the decision models
are too narrow, they do not take into account the fact that decision support occurs
in a human process (the decision making process) and in a complex environment.

3.1 Introduction
3.1.1 Motivation
In Chapter 2, we tried to show that voting, although a familiar activity
to almost everyone, raises many important and difficult questions that are closely
connected to the subject of this book. Our main objective in this chapter is
similar. We all share the more or less pleasant experience of having received
"grades" in order to evaluate our academic performances. The authors of this
book spend part of their time evaluating the performance of students through
grading several kinds of work, an activity that you may also be familiar with. The
purpose of this chapter is to build upon this shared experience. This will allow us
to discuss, based on simple and familiar situations, what is meant by "evaluating
a performance" and "aggregating evaluations", both activities being central to
most evaluation and decision models. Although the entire chapter is based on the
example of grading students, it should be stressed that "grades" are often used
in contexts unrelated to the evaluation of the performance of students: employees
are often graded by their employers, products are routinely tested and graded by
consumer organisations, experts are used to rate the feasibility or the riskiness of
projects, etc. The findings of this chapter are therefore not limited to the realm
of a classroom.
As with voting systems, there is much variance across countries in the way
education is organised. Curricula, grading scales, rules for aggregating grades
and granting degrees, are seldom similar from place to place (for information on
the systems used in the European Union see
This diversity is further increased by the fact that each "instructor" (a word that
we shall use to mean the person in charge of evaluating students) has generally
developed his own policy and habits. The authors of this book have studied in four
different European countries (Belgium, France, Greece and Italy) and obtained
degrees in different disciplines (Maths, Operational Research, Computer Science,
Geology, Management, Physics) and in different Universities. We were not overly
astonished to discover that the rules that governed the way our performances were
assessed were quite different. We were perhaps more surprised to realise that,
although we all teach similar courses in comparable institutions, our grading
policies were quite different even after having accounted for the fact that these
policies are partly contingent upon the rules governing our respective institutions.
Such diversity might indicate that evaluating students is an activity that is perhaps
more complex than it appears at first sight.

3.1.2 Evaluating students in Universities

We shall restrict our attention in this chapter to education programmes with which
we are familiar. Our general framework will be that of a programme at University
level in which students have to take a number of courses or credits. In each
course the performance of students is graded. These grades are then collected and
form the basis of a decision to be taken about each student. Depending on the
programme, this decision may take various forms, e.g. success or failure, success or
failure with possible additional information such as distinctions, ranks or average
grades, success or failure with the possibility of a deferred decision (e.g. the degree is
not granted immediately but there is still a possibility of obtaining it). Quite often
the various grades are "summarised", "amalgamated" or, as we shall say, "aggregated"
in some way before a decision is taken.
In what follows, we shall implicitly have in mind the type of programmes in
which we teach (Mathematics, Computer Science, Operational Research,
Engineering) that are centred around disciplines which, at least at first sight, seem to
raise fewer evaluation problems than if we were concerned with, say, Philosophy,
Music or Sports.
Dealing only with technically-oriented programmes at University level will
clearly not allow us to cover the immense literature that has been developed in
Education Science on the evaluation of the performance of students. For good
accounts in English, we refer to Airaisian (1991), Davis (1993), Lindheim, Morris
and Fitz-Gibbon (1987), McLean and Lockwood (1996), Moom (1997) and Speck
(1998). Note that in Continental Europe, the Piagetian influence, different institu-
tional constraints and the popularity of the classic book by Pieron (1963) have led
to a somewhat different school of thought, see Bonboir (1972), Cardinet (1986),
de Ketele (1982), de Landsheere (1980), Merle (1996) and Noizet and Caverini
(1978). As we shall see, restricting our attention to technically-oriented programmes
will nevertheless allow us to raise several important issues concerning the evaluation
and the aggregation of performances. Two types
of questions prove to be central for our purposes:

• how to evaluate the performance of students in a given course, what is the
  meaning of the resulting grades and how to interpret them?

• how to combine the various grades obtained by a student in order to arrive
  at an overall evaluation of his academic performance?

These two sets of questions structure this chapter into sections.


3.2 Grading students in a given course

Most of you have probably been in the situation of an instructor having to
attribute grades to students. Although this is clearly a very important task, many
instructors share the view that this is far from being the easiest and most pleasant
part of their jobs. We shall try here to give some hints on the process that leads
to the attribution of a grade as well as on some of its pitfalls and difficulties.

3.2.1 What is a grade?

We shall understand a grade as an evaluation of the performance of a student in
a given course, i.e. an indication of the level to which a student has fulfilled the
objectives of the course.
This very general definition calls for some remarks.

1. A grade should always be interpreted in connection with the objectives of a
course. Although it may appear obvious, this implies a precise statement of
the objectives of the course in the syllabus, a condition that is unfortunately
not always perfectly met.

2. Not all grades have the same function. Whereas the final grade of a course
in Universities usually has mainly a "certification" role, intermediate
grades, on which the final grade may be partly based, have a more complex
role that is often both "certificative" and "formative", e.g. the result of a
mid-term exam is included in the final grade but is also meant to be a signal
to a student indicating his strengths and weaknesses.

3. Although this is less obvious in Universities than in elementary schools, it
should be noticed that grades are not only a signal sent by the instructor
to each of his students. They have many other potentially important users:
other students using them to evaluate their position in the class, other
instructors judging your severity and/or performance, parents watching over
their child, administrations evaluating the performance of programmes,
employers looking for all possible information on an applicant for a job.

Thus, it appears that a grade is a complex object with multiple
functions (see Chatel 1994, Laska and Juarez 1992, Lysne 1984, McLean and
Lockwood 1996). Interpreting it necessarily calls for a study of the process that
leads to its attribution.

3.2.2 The grading process

What is graded and how?
The types of work that are graded, the scale used for grading and the way of
amalgamating these grades may vary in significant ways for similar types of courses.

1. The scale that is used for grading students is usually imposed by the pro-
gramme. Numerical scales are often used in Continental Europe with varying
bounds and orientations: 0-20 (in France or Belgium), 0-30 (in Italy), 6-1 (in
Germany and parts of Switzerland), 0-100 (in some Universities). American
and Asian institutions often use a letter scale, e.g. E to A or F to A. Obvi-
ously we would not want to conclude from this that Italian instructors have
come to develop much more sensitive instruments for evaluating performance
than German ones or that the evaluation process is in general more precise
in Europe than it is in the USA. Most of us would agree that the choice of
a particular scale is mainly conventional. It should however be noted that
since grades are often aggregated at some point, such choices might not be
totally without consequences. We shall come back to that point in section

2. Some courses are evaluated on the basis of a single exam. But there are
many possible types of exams. They may be written or oral; they may be
open-book or closed-book. Their duration may vary (45-minute exams are
not uncommon in some countries whereas they may last up to 8 hours in
some French programmes). Their content for similar courses may vary from
multiple choice questions to exercises, case-studies or essays.

3. In most courses the final grade is based on grades attributed to multiple
tests. The number and type of work may vary a lot: final exam, mid-term
exam, exercises, case-studies or even class participation. Furthermore the
way these various grades are aggregated is diverse: simple weighted average,
grade only based on exams with group work (e.g. case-studies or exercises)
counting as a bonus, imposition of a minimal grade at the final exam, etc.
(an overview of grading policies and practices in the USA can be found in
Riley, Checca, Singer and Worthington 1994).

4. Some instructors use raw grades. For reasons to be explained later, others
modify the raw grades in some way before aggregating and/or releasing
them, e.g. standardising them.
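As a rough illustration of how differently such aggregation rules can treat the same student, here is a minimal Python sketch. Everything in it is hypothetical: the weights, the grades on a 0-20 scale, the function names, and in particular the rule used when the minimal grade at the final exam is not reached (the final grade is then capped at the exam grade), which is only one of many conceivable ways of imposing such a minimum.

```python
def weighted_average(grades, weights):
    """Simple weighted average of the grades of one student."""
    total_weight = sum(weights[k] for k in grades)
    return sum(grades[k] * weights[k] for k in grades) / total_weight


def final_grade(grades, weights, min_final_exam=None):
    """Weighted average, with an optional minimal grade at the final exam.

    Assumption: when the minimum is missed, the final grade cannot exceed
    the grade obtained at the final exam.
    """
    average = weighted_average(grades, weights)
    if min_final_exam is not None and grades["final_exam"] < min_final_exam:
        return min(average, grades["final_exam"])
    return average


# Hypothetical weights and grades on a 0-20 scale.
weights = {"mid_term": 3, "final_exam": 5, "case_study": 2}
student = {"mid_term": 14, "final_exam": 8, "case_study": 16}

print(final_grade(student, weights))
print(final_grade(student, weights, min_final_exam=10))
```

With the plain weighted average this student obtains 11.4 out of 20. With a minimal grade of 10 at the final exam, and the capping rule assumed here, he obtains 8. The same raw grades thus lead to a pass or a fail depending on a choice of the instructor.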

Preparing and grading a written exam

Within a given institution suppose that you have to prepare and grade a written,
closed-book exam. We shall take the example of an exam for an "Introduction to
Operational Research (OR)" course, including Linear Programming (LP), Integer
Programming and Network models, with the aim of giving students a basic
understanding of the modelling process in OR and an elementary mastery of some
basic techniques (Simplex Algorithm, Branch and Bound, elementary Network
Algorithms). Many different choices interfere with such a task.

1. Preparing a subject. All instructors know that preparing the subject of an
exam is a difficult and time-consuming task. Is the subject of adequate
difficulty? Does it contain enough questions to cover all parts of the programme?
Do all the questions clearly relate to one or several of the announced
objectives of the course? Will it allow us to discriminate between students? Is there
a good balance between modelling and computational skills? What should
the respective parts of closed vs. open questions be?

2. Preparing a marking scale. The preparation of the marking scale for a given
subject is also of utmost importance. A nice-looking subject might be
impractical in view of the associated marking scale. Will the marking scale
include a bonus for work showing good communication skills and/or will
misspellings be penalised? How to deal with computational errors? How
to deal with computational errors that lead to inconsistent results? How to
deal with computational errors influencing the answers to several questions?
How to judge an LP model in which the decision variables are incompletely
defined? How to judge a model that is only partially correct? How to judge a
model which is inconsistent from the point of view of units? Although much
expertise and/or rules of thumb are involved in the preparation of a good
subject and its associated marking scale, we are aware of no instructor not
having had to revise his judgement after correcting some work and realising
his severity and/or to correct work again after discovering some frequently
given half-correct answers that were unanticipated in the marking scale.

3. Grading. A grade evaluates the performance of a student in completing
the tasks implied by the subject of the exam and, hopefully, will give an
indication of the extent to which a student has met the various objectives
of the course (in general an exam is far from dealing with all the aspects
that have been dealt with during the course). Although this is debatable,
such an evaluation is often thought of as a "measure" of performance. For
this kind of "measure" the psychometric literature (see Ebel and Frisbie
1991, Kerlinger 1986, Popham 1981) has traditionally developed at least
two desirable criteria. A measure should be:

• reliable, i.e. give similar results when applied several times in similar
  conditions,
• valid, i.e. should measure what was intended to be measured and only
  that.
Extensive research in Education Science has found that the process of giving
grades to students is seldom perfect in these respects (a basic reference re-
mains the classic book of Pieron (1963). Airaisian (1991) and Merle (1996)
are good surveys of recent findings). We briefly recall here some of the
difficulties that were uncovered.
The crudest reliability test that can be envisaged is to give similar work to
several instructors to correct and to record whether or not it is
graded similarly. Such experiments were conducted extensively in various
disciplines and at various levels. Not overly surprisingly, most experiments
have shown that even in the more technical disciplines (Maths, Physics,
Grammar), in which it is possible to devise rather detailed marking scales,
there is much difference between correctors. On average the difference
between the more generous and the more severe correctors on Maths work
can be as high as 2 points on a 0-20 scale. Even more strikingly, on some
work in Maths the difference can be as high as 9 points on a 0-20 scale (see
Pieron 1963).
In other experiments the same correctors are asked to correct a work that
they have already corrected earlier. These auto-reliability tests give similar
results since in more than 50% of the cases the second grade is significantly
different from the first one. Although few experiments have been conducted
with oral exams, it seems fair to suppose that they are no more reliable than
written ones.
Other experiments have shown that many extraneous factors may interfere in
the process of grading a paper and therefore question the validity of grades.
Instructors accustomed to grading papers will not be surprised to note that:
• grades usually show much autocorrelation: similar papers handed in
  by a usually "good" student and by a usually "uninterested" student
  are likely not to receive similar grades,
• the order in which papers are corrected greatly influences the grades.
  Near the end of a correction task, most correctors are less generous and
  tend to give grades with a higher variance,
• "anchoring" effects are pervasive: it is always better to be corrected
  after a remarkably poor work than after a perfect one,
• misspellings and poor handwriting prove to have a non-negligible
  influence on the grades even when the instructor declares not to take these
  effects into account or is instructed not to.

4. The influence of correction habits. Experience shows that "correction habits"
tend to vary from one instructor to another. Some of them will tend to give
an equal percentage of all grades and will tend to use the whole range of the
scale. Some will systematically avoid the extremes of the range and the
distribution of their marks will have little variability. Others will tend to give
only extreme marks, e.g. arguing that either the basic concepts are
understood or they are not. Some are used to giving the lowest possible grade after
having spotted a mistake which, in their minds, implies that "nothing has
been understood" (e.g. proposing a "non-linear LP model"). The
distribution of grades for similar papers will tend to be highly different according to
the corrector. In order to cope with such effects, some instructors will tend
to standardise the grades before releasing them (the so-called "z-scores"),
others will tend to equalise average grades from term to term and/or use a
more or less ad hoc procedure.

Defining a grading policy

A syllabus usually contains a section entitled "grading policy". Although instructors
do not generally consider it as the most important part of their syllabus, they
are aware that it is probably the part that is read first and most attentively by all
students. Besides useful considerations on ethics, this section usually describes
the process that will lead to the attribution of the grades for the course in detail.
On top of describing the type of work that will be graded, the nature of exams
and the way the various grades will contribute to the determination of the final
grade, it usually also contains many details that may prove important in order
to understand and interpret grades. Among these details, let us mention:
the type of preparation and correction of the exams: who will prepare the
subject of the exam (the instructor or an outside evaluator)? Will the work
be corrected once or more than once (in some Universities all exams are
corrected twice)? Will the names of the students be kept secret?
the possibility of revising a grade: are there formal procedures allowing
the students to have their grades reconsidered? Do the students have the
possibility of asking for an additional correction? Do the students have
the possibility of taking the same course at several moments in the academic
year? What are the rules for students who cannot take the exam (e.g. because
they are sick)?
the policy towards cheating and other dishonest behaviour (exclusion from
the programme, attribution of the lowest possible grade for the course, at-
tribution of the lowest possible grade for the exam).
the policy towards late assignments (no late assignment will be graded, minus
x points per hour or day).

Determining final grades

The process of the determination of the final grades for a given course can hardly
be understood without a clear knowledge of the requirements of the programme
in order to obtain the degree. In some programmes students are only required
to obtain a satisfactory grade (which may or may not correspond to the middle of
the grading scale that is used) for all courses. In others, an average grade
is computed and this average grade must be over a given limit to obtain the
degree. Some programmes attribute different kinds of degrees through the use of
distinctions. Some courses (e.g. core courses) are sometimes treated apart; a
dissertation may have to be completed.
The freedom of an instructor in arranging his own grading policy is highly
conditioned by this environment. A grade can hardly be interpreted without a
clear knowledge of these rules (note that this sometimes creates serious problems
in institutions allowing students belonging to different programmes with different
sets of rules to attend the same courses). Within a well defined set of rules,
however, many degrees of freedom remain. We examine some of them below.

Weights We mentioned that the final grade for a course was often the combina-
tion of several grades obtained throughout the course: mid-term exam, final exam,
case-studies, dissertation, etc. The usual way to proceed is to give a (numerical)
weight to each piece of work entering into the final grade and to compute a weighted
average, more important pieces of work receiving higher weights. Although this process is
simple and almost universally used, it raises some difficulties that we shall examine
in section 3.3. Let us simply mention here that the interpretation of weights in
such a formula is not obvious. Most instructors would tend to compensate for a
very difficult mid-term exam (weight 30%) by preparing a comparatively easier final
exam (weight 70%). However, if the final exam is so easy that most students
obtain very good grades, the differences in the final grades will be attributable
almost exclusively to the mid term exam although it has a much lower weight
than the final exam. The same is true if the final grade combines an exam with
a dissertation. Since the variance of the grades is likely to be much lower for the
dissertation than for the exam, the former may only marginally contribute towards
explaining differences in final grades independently of the weighting scheme. In
order to avoid such difficulties, some instructors standardise grades before averag-
ing them. Although this might be desirable in some situations, it is clear that the
more or less arbitrary choice of a particular measure of dispersion (why use the
standard deviation and not the inter quartile range? should we exclude outliers?)
may have a crucial influence on the final grades. Furthermore, the manipulation
of such distorted grades seriously complicates the positioning of students with
respect to a minimal passing grade since their use amounts to abandoning any
idea of absolute evaluation in the grades.
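
The point made above about weights and dispersion can be illustrated with a minimal sketch. The grades below are hypothetical (they do not come from the text), and the product of a weight by a standard deviation is used only as a rough, assumed indicator of how much a component can separate students in the final grade.

```python
from statistics import pstdev

# Hypothetical grades on a 0-20 scale for four students (illustrative data only).
midterm = [4, 8, 12, 16]     # difficult exam, weight 30%: grades are spread out
final = [19, 18, 19, 18]     # easy exam, weight 70%: grades are bunched together
W_MIDTERM, W_FINAL = 0.3, 0.7


def weighted_spread(grades, weight):
    """Weight times population standard deviation: a rough indicator of how
    much a component can move the final grade from one student to another."""
    return weight * pstdev(grades)


def ranking(scores):
    """Indices of the students ordered from lowest to highest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


final_grades = [W_MIDTERM * m + W_FINAL * f for m, f in zip(midterm, final)]

# The mid-term has the lower weight but the larger weighted spread.
print(weighted_spread(midterm, W_MIDTERM), weighted_spread(final, W_FINAL))
# The ranking on the final grade simply reproduces the mid-term ranking.
print(ranking(final_grades) == ranking(midterm))
```

With these assumed figures the ordering of the students is decided entirely by the mid-term exam, although it only carries 30% of the weight.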

Passing a course In some institutions, you may either pass or fail a course
and the grades obtained in several courses are not averaged. An essential problem
for the instructor is then to determine which students are above the minimal
passing grade. When the final grade is based on a single exam we have seen
that it is not easy to build a marking scale. It is even more difficult to conceive a
marking scale in connection to what is usually the minimal passing grade according
to the culture of the institution. The question boils down to deciding what amount
of the programme a student should master in order to obtain a passing grade, given
that an exam only gives partial information about the amount of knowledge of the
student.
The problem is clearly even more difficult when the final grade results from the
aggregation of several grades. The use of weighted averages may give undesirable
results since, for example, an excellent group case-study may compensate for a
very poor exam. Similarly weighted averages do not take the progression of the
student during the course into account.
It should be noted that the problem of positioning students with respect to a
minimal passing grade is more or less identical to positioning them with respect
to any other special grades, e.g. the minimal grade for being able to obtain a
distinction, to be cited on the Dean's honour list or the Academic Honour

3.2.3 Interpreting grades

Grades from other institutions
In view of the complexity of the process that leads to the attribution of a grade,
it should not be a surprise that most instructors find it very difficult to interpret
grades obtained in another institution. Consider a student joining your programme
after having obtained a first degree at another University. Arguing that he has
already passed a course in OR with 14 on a 0-20 scale, he wants to have the
opportunity to be dispensed from your class. If you are not aware of the grading policy
of the instructor and of the culture and rules of the University this student previously
attended, knowing that he obtained 14 offers you little information. The knowledge
of his rank in the class may be more useful: if he obtained one of the highest grades
this may be a good indication that he has mastered the contents of the course
sufficiently. However, if you were to know that the lowest grade was 13 and that
14 is the highest, you would perhaps be tempted to conclude that the difference
between 13 and 14 may not be very significant and/or that you should not trust
grades that are so generous and exhibit so little variability.

Grades from colleagues

Being able to interpret the grade that a student obtained in your own institution
is quite important at least as soon as some averaging of the grades is performed in
order to decide on the attribution of a degree. This task is clearly easier than the
preceding one: the grades that are to be interpreted here have been obtained in a
similar environment. However, we would like to argue that this task is not an easy
one either. First it should be observed that there is no clear implication in having
obtained a similar grade in two different courses. Is it possible or meaningful to
assert that a student is equally good in Maths and in Literature? Is it possible
to assert that, given the level of the programme, he has satisfied to a greater
extent the objectives of the Maths course than the objectives of the Literature
course? Our experience as instructors would lead us to answer negatively to such
questions even when talking of programmes in which all objectives are very clearly
stated. Secondly, in section 3.2.2 we mentioned that, even within fixed institutional
constraints, each instructor still had many degrees of freedom to choose his grading
policy. Unless there is a lot of co-ordination between colleagues they may apply
quite different rules e.g. in dealing with late assignments or in the nature and
number of exams. This seriously complicates the interpretation of the profile of
grades obtained by a student.

Interpreting your own grades

The numerical scales used for grades throughout Europe tend to give the impres-
sion that grades are real measures and that, consequently these numbers may
be manipulated as any other numbers. There are many possible kinds of mea-
sure and having a numerical scale is no guarantee that the numbers on that scale
may be manipulated in all possible ways. In fact, before manipulating numbers
supposedly resulting from measurements it is always important to try to figure
out on which type of scale they have been measured. Let us notice that this
is true even in Physics. Saying that Mr. X weighs twice as much as Mr. Y makes
sense because this assertion is true whether mass is measured in pounds or in
kilograms. Saying that the average temperature in city A is twice as high as the
average temperature in city B may be true but makes little sense since the truth
value of this assertion clearly depends on whether temperature is measured using
the Celsius or the Fahrenheit scale.
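
The two assertions above can be checked with a short sketch. The masses and temperatures are hypothetical values chosen for illustration, and the helper names are ours; the pound conversion factor is approximate.

```python
def kg_to_lb(kg):
    """Change of unit only (approximate factor): a ratio-scale transformation."""
    return kg * 2.20462


def celsius_to_fahrenheit(c):
    """Change of unit and of origin: an interval-scale transformation."""
    return c * 9 / 5 + 32


def is_twice(x, y, tol=1e-9):
    """True when x is twice y, up to a small numerical tolerance."""
    return abs(x - 2 * y) < tol


mass_x, mass_y = 100.0, 50.0   # kilograms (hypothetical)
temp_a, temp_b = 20.0, 10.0    # degrees Celsius (hypothetical)

# "Twice as heavy" keeps its truth value when the unit changes.
print(is_twice(mass_x, mass_y), is_twice(kg_to_lb(mass_x), kg_to_lb(mass_y)))

# "Twice as hot" does not: 20 C vs 10 C becomes 68 F vs 50 F.
print(is_twice(temp_a, temp_b),
      is_twice(celsius_to_fahrenheit(temp_a), celsius_to_fahrenheit(temp_b)))
```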

The highest point on the scale An important feature of all grading scales is
that they are bounded above. It should be clear that the numerical value attributed
to the highest point on the scale is somewhat arbitrary and conventional. No loss
of information would be incurred using a 0-100 or a 0-10 scale instead of a 0-20
one. At best it seems that grades should be considered as expressed on a ratio
scale, i.e. a scale in which the unit of measurement is arbitrary (such scales are
frequent in Physics, e.g. length can be measured in meters or inches without loss
of information).
If grades can be considered as measured on a ratio scale, it should be recognised
that this ratio scale is somewhat awkward because it is bounded above. Unless you
admit that knowledge is bounded or, more realistically, that perfectly fulfilling
the objectives of a course makes clear sense, problems might appear at the upper
bound of the scale. Consider two excellent, but not necessarily equally excellent,
students. They cannot obtain more than the perfect grade 20/20. Equality of
grades at the top of the scale (or near the top, depending on grading habits) does
not necessarily imply equality in performance (after a marking scale is devised it is
not exceptional that we would like to give some students more than the maximal
grade, e.g. because some bonus is added for particularly clever answers, whereas
the computer system of most Universities would definitely reject such grades!).

The lowest point on the scale It should be clear that the numerical value that
is attributed to the lowest point of the scale is no less arbitrary and conventional
than was the case for the highest point. There is nothing easier than to transform
grades expressed on a 0-20 scale to grades expressed on a 100-120 scale and this
involves no loss of information. Hence it would seem that a 0-20 scale might
be better viewed as an interval scale, i.e. a scale in which both the origin and
the unit of measurement are arbitrary (think of temperature scale in Celsius or
Fahrenheit). An interval scale allows comparisons of differences in performance;
it makes sense to assert that the difference between 0 and 10 is similar to the
difference between 10 and 20 or that the difference between 8 and 10 is twice as
large as the difference between 10 and 11, since changing the unit and origin of
measurement clearly preserves such comparisons.
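
As a minimal sketch of this claim, the following lines apply two changes of scale to the grades 8, 10 and 11 quoted above. The function names are ours, and the 0-100 scale is an assumed second example alongside the 100-120 scale mentioned in the text.

```python
def rescale(grade, unit=1.0, origin=0.0):
    """Admissible transformation of an interval scale: new unit and new origin."""
    return unit * grade + origin


def difference_ratio(g1, g2, g3, g4):
    """Ratio (g2 - g1) / (g4 - g3) between two differences in grades."""
    return (g2 - g1) / (g4 - g3)


grades = (8, 10, 10, 11)                                    # 0-20 scale
shifted = tuple(rescale(g, origin=100) for g in grades)     # 100-120 scale
percent = tuple(rescale(g, unit=5) for g in grades)         # 0-100 scale

# The comparison of differences is the same on all three scales.
print(difference_ratio(*grades), difference_ratio(*shifted), difference_ratio(*percent))

# The ratio between two grades, on the contrary, depends on the origin.
print(grades[1] / grades[0], shifted[1] / shifted[0])
```

Only statements that survive every such change of unit and origin can be considered meaningful on an interval scale.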
Let us notice that using a scale that is bounded below is also problematic. In
some institutions the lowest grade is reserved for students who did not take the
exam. Clearly this does not imply that these students are equally ignorant.
Even when the lowest grade can be obtained by students having taken the exam,
some ambiguity remains. Knowing nothing, i.e. having completely failed to meet
any of the objectives of the course, is difficult to define and is certainly contingent
upon the level of the course (this is all the more true since in many institutions
the lowest grade is also granted to students having cheated during the exam,
with obviously no guarantee that they are equally ignorant). To a large extent
knowing nothing in the context of a course is somewhat as arbitrary as is
knowing everything. Therefore, if grades are expressed on interval scales, care
should be taken when manipulating grades close to the bounds of the scale.

In between We already mentioned that on an interval scale, it makes sense to
compare differences in grades. The authors of this book (even if their students
should know that they spend a lot of time and energy in grading them!) do
not consider that their own grades always allow for such comparisons. First we
already mentioned that a lot of care should be taken in manipulating grades that
are close to the bounds. Second, in between these bounds, some grades are very
particular in the sense that they play a particular role in the attribution of the
degree. Let us consider a programme in which all grades must be above a minimal
passing grade, say, 10 on a 0-20 scale, in order to obtain the degree. If it is clear
that an exam is well below the passing grade, few instructors will claim that there
is a highly significant difference between 4/20 and 5/20. Although the latter exam
seems slightly better than the former, the essential idea is that they are both
well below the minimal passing grade. On the contrary the gap between 9/20
and 10/20 may be much more important since before putting a grade just below
the passing grade most instructors usually make sure that they will have good
arguments in case of a dispute (some systematically avoid using grades just below
the minimal passing grade). In some programmes, not only the minimal passing
grade has a special role: some grades may correspond to different possible levels
of distinction, others may correspond to a minimal acceptable level below which
there is no possibility of compensation with grades obtained in other courses. In
between these special grades it seems that the reliable information conveyed by
grades is mainly ordinal. Some authors have been quite radical in emphasising this
point, e.g. Cross (1995) stating that: "[...] we contend that the difficulty of nearly
all academic tests is arbitrary and regardless of the scoring method, they provide
nothing more than ranking information" (but see French 1993, Vassiloglou and
French 1982). At first sight this would seem to be a strong argument in favour
of the letter system in use in most American Universities that only distinguishes
between a limited number of classes of grades (usually from F or E to A with, in some
institutions, the possibility of adding + or - to the letters). However, since
these letter grades are usually obtained via the manipulation of a distribution of
numerical grades of some sort, the distinction between letter grades and numerical
grades is not as deep as it appears at first sight. Furthermore the aggregation of
letter grades is often done via a numerical transformation as we shall see in section
Finally it should be observed that, in view of the lack of reliability and validity
of some aspects of the grading process, it might well be possible to assert that small
differences in grades that do not cross any special grades may not be significant at
all. A difference of 1 point on a 0-20 scale may well be due only to chance via the
position of the work, the quality of the preceding papers, the time of correction.

Once more grades appear as complex objects. While they seem to mainly
convey ordinal information (with the possibility of the existence of non significant
small differences) that is typical of a relative evaluation model, the existence of
special grades complicates the situation in introducing some absolute elements
of evaluation in the model (on the measurement-theoretic interpretation of grades
see French 1981, Vassiloglou 1984).

3.2.4 Why use grades?

Some readers, and most notably instructors, may have the impression that we
have been overly pessimistic on the quality of the grading process. We would
like to mention that the literature in Education Science is even more pessimistic
leading some authors to question the very necessity of using grades (see Sager 1994,
Tchudi 1997). We suggest to sceptical instructors the following simple experiment.
Having prepared an exam, ask some of your colleagues to take it with the following
instructions: prepare what you would think to be an exam that would just be
acceptable for passing, prepare an exam that would clearly deserve distinction,
prepare an exam that is well below the passing grade. Then apply your marking
scale to these papers prepared by your colleagues. It would be extremely likely
that the resulting grades show some surprises!
However, none of us would be prepared to abandon grades, at least for the
type of programmes in which we teach. The difficulties that we mentioned would
be quite problematic if grades were considered as measures of performance that
we would tend to make more and more precise and objective. We tend to
consider grades as an evaluation model trying to capture aspects of something
that is subject to considerable indetermination, the performance of students.
As is the case with most evaluation models, their use greatly contributes to
transforming the reality that we would like to measure. Students cannot
be expected to react passively to a grading policy; they will undoubtedly adapt
their work and learning practice to what they perceive to be its severity and
consequences. Instructors are likely to use a grading policy that will depend
on their perception of the policy of the Faculty (on these points, see Sabot and
Wakeman 1991, Stratton, Myers and King 1994). The resulting scale of measure-
ment is unsurprisingly awkward. Furthermore, as with most evaluation models
of this type, aggregating these evaluations will raise even more problems.
This is not to say that grades cannot be a useful evaluation model. If these lines
have led some students to consider that grades are useless, we suggest they try
to build up an evaluation model that would not use grades without, of course,
relying too much on arbitrary judgements. This might not be an impossible task;
we, however, do not find it very easy.

3.3 Aggregating grades

3.3.1 Rules for aggregating grades
In the previous section, we hope to have convinced the reader that grading a
student in a given course is a difficult task and that the result of this process is a
complex object.
Unfortunately, this is only part of the evaluation process of students enrolled
in a given programme. Once they have received a grade in each course, a decision
still has to be made about each student. Depending on the programme, we already
mentioned that this decision may take different forms: success or failure, success
or failure with possible additional information e.g. distinctions, ranks or average
grades, success or failure with the additional possibility of partial success (the
degree is not granted immediately but there remains a possibility of obtaining it),
etc. Such decisions are usually based on the final grades that have been obtained
in each course but may well use some other information, e.g. verbal comments from
instructors or extra-academic information linked to the situation of each student.
What is required from the students to obtain a degree is generally described
in a lengthy and generally opaque set of rules that few instructors, but generally
all students, know perfectly (as an interesting exercise we might suggest that
you investigate whether you are perfectly aware of the rules that are used in the
programmes in which you teach or, if you do not teach, whether you are aware of
such rules for the programmes in which your children are enrolled). These rules
exhibit such variety that it is obviously impossible to exhaustively examine them
here. However, it appears that they are often based on three kinds of principles
(see French 1981).

Conjunctive rules
In programmes of this type, students must pass all courses, i.e. obtain a grade
above a minimal passing grade in all courses in order to obtain the degree. If
they fail to do so after a given period of time, they do not obtain the degree.
This very simple rule has the immense advantage of avoiding any amalgamation
of grades. It is however seldom used as such because:

it is likely to generate high failure rates,

it does not make it possible to discriminate between grades just below the passing
grade and grades well below it,

it offers no incentive to obtain grades well above the minimal passing grade,

it does not make it possible to discriminate (e.g. using several kinds of distinctions)
between students obtaining the degree.

Most instructors and students generally violently oppose such simple systems since
they generate high failure rates and do not promote academic excellence.

Weighted averages
In many programmes, the grades of students are aggregated using a simple weighted
average. This average grade (the so-called GPA in American Universities) is then
compared to some standards e.g. the minimal average grade for obtaining the de-
gree, the minimal average grade for obtaining the degree with a distinction, the
minimal average grade for being allowed to stay in the programme, etc. Whereas
conjunctive rules do not allow for any kind of compensation between the grades
obtained for several courses, all sorts of compensation effects are at work with a
weighted average.

Minimal acceptable grades

In order to limit the scope of compensation effects allowed by the use of weighted
averages, some programmes include rules involving minimal acceptable grades
in each course. In such programmes, the final decision is taken on the basis of
an average grade provided that all grades entering this average are above some
minimal level.
The rules that are used in the programmes we are aware of often involve a
mixture of these three principles, e.g. an average grade is computed for each cat-
egory of courses provided that the grade of each course is above a minimal level
and such average grades per category of courses are then used in a conjunctive
fashion. Furthermore, it should be noticed that the final decision concerning a
student is very often taken by a committee that has some degree of freedom with
respect to the rules and may, for instance, grant the degree to someone who does
not meet all the requirements of the programme e.g. because of serious personal
All these rules are based on grades and we saw in section 3.2 that the very
nature of the grades was highly influenced by these rules. This amounts to aggregating
evaluations that are themselves highly influenced by the aggregation rule, which makes
aggregation a difficult task. Below, we study some aspects of the most common aggregation
rule for grades: the weighted average (more examples and comments
will be found in chapters 4 and 6).
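
The three principles described in this section can be sketched as follows. This is only an illustration under stated assumptions: the function names, the passing grade of 10, the minimal acceptable grade of 8, the equal weights and the two grade profiles are hypothetical, and "above" is read here as "at least equal to".

```python
def conjunctive(grades, passing=10):
    """Conjunctive rule: every course must reach the passing grade."""
    return all(g >= passing for g in grades)


def weighted_average(grades, weights):
    """Weighted average; the weights are normalised by their sum."""
    return sum(w * g for w, g in zip(weights, grades)) / sum(weights)


def passes_on_average(grades, weights, required=10):
    """Weighted-average rule: only the average is compared to a standard."""
    return weighted_average(grades, weights) >= required


def passes_with_minimum(grades, weights, required=10, minimal=8):
    """Average rule restricted by a minimal acceptable grade in each course."""
    return min(grades) >= minimal and passes_on_average(grades, weights, required)


weights = [1, 1, 1]
for profile in ([14, 9, 12], [18, 5, 13]):
    print(profile,
          conjunctive(profile),
          passes_on_average(profile, weights),
          passes_with_minimum(profile, weights))
```

Both hypothetical profiles fail under the conjunctive rule and pass on average; only the first one survives the minimal acceptable grade, which shows how the third principle limits compensation without forbidding it.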

3.3.2 Aggregating grades using a weighted average

The purpose of rules for aggregating grades is to know whether the overall per-
formance of a student is satisfactory taking his various final grades into account.
Using a weighted average system amounts to assessing the performance of a student
by combining his grades using a simple weighting scheme. We shall suppose that
all final grades are expressed on similar scales and note $g_i(a)$ the final grade for
course $i$ obtained by student $a$. The average grade obtained by student $a$ is then
computed as $g(a) = \sum_{i=1}^{n} w_i g_i(a)$, the (positive) weights $w_i$ reflecting the
importance (in academic terms and/or as a function of the length of the course)
of the course for the degree. The weights $w_i$ may, without loss of generality, be
normalised in such a way that $\sum_{i=1}^{n} w_i = 1$. Using such a convention the average
grade $g(a)$ will be expressed on a scale having the same bounds as the scale
used for the $g_i(a)$. The simplest decision rule consists in comparing $g(a)$ with
some standards in order to decide on the attribution of the degree and on possible
distinctions. A number of examples will allow us to understand the meaning of
this rule better and to emphasise its strengths and weaknesses (we shall suppose
throughout this section that students have all been evaluated on the same courses;
for the problems that arise when this is not so, see Vassiloglou (1984)).

Example 1
Consider four students enrolled in a degree consisting of two courses. For each
course, a final grade between 0 and 20 is allocated. The results are as follows:

g1 g2
a 5 19
b 20 4
c 11 11
d 4 6

Student c has performed reasonably well in all courses whereas d has a consis-
tent very poor performance; both a and b are excellent in one course while having
a serious problem in the other. Casual introspection suggests that if the students
were to be ranked, c should certainly be ranked first and d should be ranked last.
Students a and b should be ranked in between, their relative position depending
on the relative importance of the two courses. Their very low performance in 50%
of the courses does not make them good candidates for the degree. The use of
a simple weighted average of grades leads to very different results. Considering that
both courses are of equal importance gives the following average grades:

average grades
a 12
b 12
c 11
d 5

which leads to having both a and b ranked before c. As shown in figure 3.1, we can
say even more: there is no vector of weights $(w, 1-w)$ that would rank c before both
a and b. Ranking c before a implies that $11w + 11(1-w) > 5w + 19(1-w)$ which
leads to $w > 8/14$. Ranking c before b implies $11w + 11(1-w) > 20w + 4(1-w)$, i.e.
$w < 7/16$, which is incompatible with $w > 8/14$ (figure 3.1 should make clear that there is no loss of generality in supposing
that weights sum to 1). The use of a simple weighted sum is therefore not in line
with the idea of promoting students performing reasonably well in all courses.
The exclusive reliance on a weighted average might therefore be an incentive for
students to concentrate their efforts on a limited number of courses and benefit
from the compensation effects at work with such a rule. This is a consequence of
the additivity hypothesis embodied in the use of weighted averages.

Figure 3.1: Use of a weighted sum for aggregating grades

It should finally be noticed that the addition of a minimal acceptable grade
for all courses can decrease but not suppress (unless the minimal acceptable grade
is so high that it turns the system into a nearly conjunctive one) the occurrence of
such effects.
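
The claim of Example 1 can be checked numerically with a minimal sketch. It uses the grades of the example; the grid of weights is our own choice, so the scan is a check on that grid and not a proof.

```python
from fractions import Fraction

# Grades (g1, g2) of the four students of Example 1.
grades = {"a": (5, 19), "b": (20, 4), "c": (11, 11), "d": (4, 6)}


def average(student, w):
    """Weighted average with weight w on course 1 and 1 - w on course 2."""
    g1, g2 = grades[student]
    return w * g1 + (1 - w) * g2


# Equal weights reproduce the table of average grades (12, 12, 11 and 5).
half = Fraction(1, 2)
print({s: average(s, half) for s in grades})

# Scan the weights w = 0, 1/1000, ..., 1 with exact arithmetic.
grid = [Fraction(k, 1000) for k in range(1001)]
c_first = [w for w in grid
           if average("c", w) > average("a", w) and average("c", w) > average("b", w)]
print(c_first)   # no weight on the grid ranks c before both a and b
```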
A related consequence of the additivity hypothesis is that it does not allow
interactions between grades to be taken into account, as shown in the following example.

Example 2
Consider four students enrolled in an undergraduate programme consisting of three
courses: Physics, Maths and Economics. For each course, a final grade between 0
and 20 is allocated. The results are as follows:

Physics Maths Economics

a 18 12 6
b 18 7 11
c 5 17 8
d 5 12 13

On the basis of these evaluations, it is felt that a should be ranked before b. Although
a has a low grade in Economics, he has reasonably good grades in both
Maths and Physics which makes him a good candidate for an Engineering programme;
b is weak in Maths and it seems difficult to recommend him for any
programme with a strong formal component (Engineering or Economics). Using
a similar type of reasoning, d appears to be a fair candidate for a programme in
Economics. Student c has two low grades and it seems difficult to recommend him
for a programme in Engineering or in Economics. Therefore d is ranked before c.
Although these preferences appear reasonable, they are not compatible with
the use of a weighted average in order to aggregate the three grades. It is easy to
observe that:
ranking a before b implies putting more weight on Maths than on Economics
($18w_1 + 12w_2 + 6w_3 > 18w_1 + 7w_2 + 11w_3 \Rightarrow w_2 > w_3$),
ranking d before c implies putting more weight on Economics than on Maths
($5w_1 + 12w_2 + 13w_3 > 5w_1 + 17w_2 + 8w_3 \Rightarrow w_3 > w_2$),
which is contradictory.
In this example it seems that criteria interact. Whereas Maths does not outweigh
any other course (see the ranking of d vis-a-vis c), having good grades in
both Maths and Physics or in both Maths and Economics is better than having
good grades in both Physics and Economics. Such interactions, although not
infrequent, cannot be dealt with using weighted averages; this is another consequence
of the additivity hypothesis. Taking such interactions into account calls
for the use of more complex aggregation models (see Grabisch 1996).
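
The incompatibility shown in Example 2 can also be observed by enumeration. The sketch below uses the grades of the example; the grid of positive weights with denominator 20 is an assumption made for illustration, and the enumeration complements the algebraic argument without replacing it.

```python
from fractions import Fraction
from itertools import product

# Grades (Physics, Maths, Economics) of the four students of Example 2.
grades = {"a": (18, 12, 6), "b": (18, 7, 11), "c": (5, 17, 8), "d": (5, 12, 13)}


def average(student, weights):
    """Weighted average of the three grades of a student."""
    return sum(w * g for w, g in zip(weights, grades[student]))


N = 20  # weights are multiples of 1/20, all positive and summing to 1
compatible = []
for i, j in product(range(1, N), repeat=2):
    k = N - i - j
    if k < 1:
        continue
    w = (Fraction(i, N), Fraction(j, N), Fraction(k, N))
    # Keep the weight vectors ranking a before b and d before c.
    if average("a", w) > average("b", w) and average("d", w) > average("c", w):
        compatible.append(w)

print(compatible)   # empty: no weight vector on the grid gives both rankings
```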

Example 3
Consider two students enrolled in a degree consisting of two courses. For each
course a final grade between 0 and 20 is allocated; both courses have the same
weight and the required minimal average grade for the degree is 10. The results
are as follows:

g1 g2
a 11 10
b 12 9

It is clear that both students will receive an identical average grade of 10.5: the
difference between 11 and 12 on the first course exactly compensates for the oppo-
site difference on the second course. Both students will obtain the degree having
performed equally well.
It is not unreasonable to suppose that since the minimal required average for
the degree is 10, this grade will play the role of a special grade for the instructors,
a grade above 10 indicating that a student has satisfactorily met the objectives
of the course. If 10 is a special grade then, it might be reasonable to consider
that the difference between 10 and 9 which crosses a special grade is much more
significant than the difference between 12 and 11 (it might even be argued that the
small difference between 12 and 11 is not significant at all). If this is the case, we
would have good grounds to question the fact that a and b are equally good. The
linearity hypothesis embodied in the use of weighted averages has the inevitable
consequence that a difference of one point has a similar meaning wherever it lies on the
scale and therefore does not allow for such considerations.
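
A short sketch makes the arithmetic of Example 3 explicit. The grades are those of the example; listing the grades that fall below the special grade 10 is our own way of showing what the average hides.

```python
PASSING = 10  # the special grade of Example 3

profiles = {"a": (11, 10), "b": (12, 9)}


def average(grades):
    """Unweighted average: both courses have the same weight."""
    return sum(grades) / len(grades)


def grades_below(grades, threshold=PASSING):
    """Grades that fall below the special grade."""
    return [g for g in grades if g < threshold]


for name, g in profiles.items():
    print(name, average(g), grades_below(g))
```

Both students obtain the same average, but only b has a grade below 10, a distinction that the average cannot express.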

Example 4
Consider a programme similar to the one envisaged in the previous example. We
have the following results for three students:

g1 g2
a 14 16
b 15 15
c 16 14

All students have an average grade of 15 and they will all receive the degree.
Furthermore, if the degree comes with the indication of a rank or of an average
grade, these three students will not be distinguished: their equal average grade
makes them indifferent. This appears desirable since these three students have
very similar profiles of grades.
The use of linearity and additivity implies that if a difference of one point on
the first grade compensates for an opposite difference on the other grade, then a
difference of x points on the first grade will compensate for an opposite difference
of x points on the other grade, whatever the value of x. However, if x is chosen
to be large enough this may appear dubious since it could lead, for instance, to
viewing the following three students as perfectly equivalent, with an average grade of 15:

      g1   g2
a′    10   20
b     15   15
c′    20   10

whereas we already argued that, in such a case, b could well be judged preferable to
both a′ and c′ even though b is indifferent to a and c. This is another consequence
of the linearity hypothesis embodied in the use of weighted averages.

Example 5
Consider three students enrolled in a degree consisting of three courses. For each
course a final grade between 0 and 20 is allocated. All courses have identical
importance and the minimal passing grade is 10 on average. The results are as follows:

g1 g2 g3
a 12 5 13
b 13 12 5
c 5 13 12

It is clear that all students have an average equal to the minimal passing grade
10. They all end up tied and should all be awarded the degree.
As argued in section 3.2 it might not be unreasonable to consider that final
grades are only recorded on an ordinal scale, i.e. only reflect the relative rank of
the students in the class, with the possible exception of a few special grades
such as the minimal passing grade. This means that the following table might as
well reflect the results of these three students:

g1 g2 g3
a 11 4 12
b 13 13 6
c 4 14 11

since the ranking of students within each course has remained unchanged as well
as the position of grades vis-a-vis the minimal passing grade. In this case, only
b (say the Dean's nephew) gets an average above 10 (10.67) and both a and c fail (with
respective averages of 9 and 9.67). Note that using different transformations, we
could have favoured any of the three students.
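The argument of this example can be checked mechanically. The following minimal Python sketch is only an illustration; the helper names (average, ordinal_profile) are hypothetical. It verifies that the two tables carry the same ordinal information (the same ranking within each course and the same position relative to the passing grade of 10) while leading to different averages.

```python
# Grades of students a, b, c on courses g1, g2, g3 (the two tables of Example 5).
original = {"a": [12, 5, 13], "b": [13, 12, 5], "c": [5, 13, 12]}
transformed = {"a": [11, 4, 12], "b": [13, 13, 6], "c": [4, 14, 11]}
PASS_MARK = 10


def average(grades):
    return sum(grades) / len(grades)


def ordinal_profile(table, course):
    """Ranking of the students on one course, and who reaches the pass mark."""
    ranking = sorted(table, key=lambda s: table[s][course], reverse=True)
    passed = {s for s in table if table[s][course] >= PASS_MARK}
    return ranking, passed


# The ordinal information is identical in both tables ...
same_ordinal_info = all(
    ordinal_profile(original, j) == ordinal_profile(transformed, j)
    for j in range(3)
)

# ... but the averages, and hence the pass/fail decisions, are not.
avg_original = {s: average(g) for s, g in original.items()}
avg_transformed = {s: average(g) for s, g in transformed.items()}

print(same_ordinal_info)
print(avg_original)
print(avg_transformed)
```

Any other transformation that preserves the rankings and the position of the grades relative to 10 could be substituted for the second table in the same way.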
Not surprisingly, this example shows that a weighted average makes use of the
cardinal properties of the grades. This is hardly compatible with grades that
would only be indicators of ranks even with some added information (a view that
is very compatible with the discussion in section 3.2). As shown by the following
example, it does not seem that the use of letter grades, instead of numerical
ones, helps much in this respect.

Example 6
In many American Universities the Grade Point Average (GPA), which is nothing
more than a weighted average of grades, is crucial for the attribution of degrees and
the selection of students. Since courses are evaluated on letter scales, the GPA
is usually computed by associating a number to each letter grade. A common
conversion scheme is the following:

A 4 (outstanding or excellent)
B 3 (very good)
C 2 (good)
D 1 (satisfactory)
E 0 (failure)

in which the difference between two consecutive letters is assumed to be equal.

Such a practice raises several difficulties. First, letter grades for a given course
are generally obtained on the basis of numerical grades of some sort. This implies
using a first conversion scheme of numbers into letters. The choice of such a
scheme is not obvious. Note that when there are no holes in the distribution
of numerical grades it is possible that a very small (and possibly non significant)
difference in numerical grades results in a significant difference in letter grades.
Secondly, the conversion scheme of letters into numbers used to compute the
GPA is somewhat arbitrary. Allowing for the possibility of adding "+" or "-" to
the letter grades generally results in conversion schemes maintaining an equal
difference between two consecutive letter grades. This can have a significant impact
on the ranking of students on the basis of the GPA.
To show how this might happen suppose that all courses are first evaluated
on a 0-100 scale (e.g. indicating the percentage of correct answers to a multiple
choice questionnaire). These numbers are then converted into letter grades using
a first conversion scheme. These letter grades are further transformed, using a
second conversion scheme, into a numerical scale and the GPA is computed. Now
consider three students evaluated on three courses on a 0-100 scale in the following
way:

g1 g2 g3
a 90 69 70
b 79 79 89
c 100 70 69

Using an E to A letter scale, a common conversion scheme (that is used in many
Universities) is

A    90-100%
B    80-89%
C    70-79%
D    60-69%
E    0-59%

This results in the following letter grades:

g1 g2 g3
a A D C
b C C B
c A C D

Supposing the three courses of equal importance and using the conversion scheme
of letter grades into numbers given above, the calculation of the GPA is as follows:

g1 g2 g3 GPA
a 4 1 2 2.33
b 2 2 3 2.33
c 4 2 1 2.33

making the three students equivalent.

Now another common (and actually used) scale for converting percentages into
letter grades is as follows:

A+   98-100%
A    94-97%
A-   90-93%
B+   87-89%
B    83-86%
B-   80-82%
C+   77-79%
C    73-76%
C-   70-72%
D    60-69%
F    0-59%

This scheme would result in the following letter grades:

      g1   g2   g3
a     A-   D    C-
b     C+   C+   B+
c     A+   C-   D

Maintaining the usual hypothesis of a constant difference between two consecutive
letter grades we obtain the following conversion scheme:

A+   10
A     9
A-    8
B+    7
B     6
B-    5
C+    4
C     3
C-    2
D     1
F     0

which leads to the following GPA:

      g1   g2   g3   GPA
a      8    1    2   3.67
b      4    4    7   5.00
c     10    2    1   4.33

In this case, b (again the Dean's nephew) gets a clear advantage over a and c.
It should be clear that standardisation of the original numerical grades before
conversion offers no clear solution to the problem uncovered.
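The two-step conversion of this example is easy to reproduce. The Python sketch below is a minimal illustration under the assumptions stated in the text (equal course weights, equally spaced letter grades); the names COARSE, FINE, to_letter and gpa are hypothetical.

```python
# Percentages obtained by students a, b, c on courses g1, g2, g3 (Example 6).
scores = {"a": [90, 69, 70], "b": [79, 79, 89], "c": [100, 70, 69]}

# Each scheme lists (lower bound in %, letter), from the best to the worst grade.
COARSE = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "E")]
FINE = [(98, "A+"), (94, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
        (77, "C+"), (73, "C"), (70, "C-"), (60, "D"), (0, "F")]


def to_letter(score, scheme):
    """First conversion: percentage -> letter grade."""
    for lower, letter in scheme:
        if score >= lower:
            return letter
    raise ValueError(f"score out of range: {score}")


def gpa(student_scores, scheme):
    """Second conversion: equally spaced points (worst letter = 0), then average."""
    points = {letter: rank for rank, (_, letter) in enumerate(reversed(scheme))}
    letters = [to_letter(s, scheme) for s in student_scores]
    return sum(points[letter] for letter in letters) / len(letters)


gpa_coarse = {s: round(gpa(v, COARSE), 2) for s, v in scores.items()}
gpa_fine = {s: round(gpa(v, FINE), 2) for s, v in scores.items()}

print(gpa_coarse)  # the three students are tied
print(gpa_fine)    # b now comes out clearly ahead
```

Only the first conversion scheme differs between the two runs; the raw percentages and the equal-spacing hypothesis are the same.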

Example 7
We argued in section 3.2 that small differences in grades might not be significant
at all provided they do not involve crossing any special grade. The explicit
treatment of such imprecision is problematic using a weighted average; most often,
it is simply ignored. Consider the following example in which three students are
enrolled in a degree consisting of three courses. For each course a final grade
between 0 and 20 is allocated. All courses have the same weight and the minimal
passing grade is 10 on average. The results are as follows:

g1 g2 g3
a 13 12 11
b 11 13 12
c 14 10 12

All students will receive an average grade of 12 and will all be judged indifferent.
If all instructors agree that a difference of one point in their grades (away from 10)
should not be considered as significant, student a has good grounds to complain.
He can argue that he should be ranked before b: he has a significantly higher grade
than b on g1 while there is no significant difference between the other two grades.
The situation is the same vis-a-vis c: a has a significantly higher grade on g2 and
this is the only significant difference.
In a similar vein, using the same hypotheses, the following table appears even
more problematic:

g1 g2 g3
a 13 12 11
b 11 13 12
c 12 11 13

since, while all students clearly obtain a similar average grade, a is significantly
better than b (he has a significantly higher grade on g1 while there are no significant
differences on the other two grades), b is significantly better than c and c is
significantly better than a (the reader will have noticed that this is a variant of
the Condorcet paradox mentioned in chapter 2).
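The cycle can be exhibited with a few lines of Python. The comparison rule coded below is an assumption made for this sketch, formalising the argument of the example: a student is "significantly better" than another if he has at least one advantage larger than the threshold and suffers no disadvantage larger than it. The function name is hypothetical.

```python
THRESHOLD = 1  # differences of at most one point are not considered significant

# Second table of Example 7.
grades = {"a": [13, 12, 11], "b": [11, 13, 12], "c": [12, 11, 13]}


def significantly_better(x, y):
    """x has a significant advantage on some course and y has none."""
    diffs = [gx - gy for gx, gy in zip(grades[x], grades[y])]
    return any(d > THRESHOLD for d in diffs) and not any(d < -THRESHOLD for d in diffs)


beats = [(x, y) for x in grades for y in grades
         if x != y and significantly_better(x, y)]
averages = {s: sum(g) / len(g) for s, g in grades.items()}

print(averages)  # all equal to 12
print(beats)     # [('a', 'b'), ('b', 'c'), ('c', 'a')]: a cycle
```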
Aggregation rules using weighted sums will be dealt with again in chapters 4
and 6. In view of these few examples, we hope to have convinced the reader that
although the weighted sum is a very simple and almost universally accepted rule,
its use may be problematic for aggregating grades. Since grades are a complex
evaluation model, this is not overly surprising. If it is admitted that there is no
easy way to evaluate the performance of a student in a given course, there is no
reason why there should be an obvious one for an entire programme. In particular,
the necessity and feasibility of using rules that completely rank order all students
might well be questioned.

3.4 Conclusions
We all have been accustomed to seeing our academic performances in courses
evaluated through grades and to seeing these grades amalgamated in one way or
another in order to judge our overall performance. Most of us routinely grade
various kinds of work, prepare exams, write syllabi specifying a grading policy,
etc. Although they are very familiar, we have tried to show that these activities
may not be as simple and as unproblematic as they appear to be. In particular,
we discussed the many elements that may obscure the interpretation of grades
and argued that the common weighted sum rule to amalgamate them may not be
without difficulties. We expect such difficulties to be present in the other types of
evaluation models that will be studied in this book.
We would like to emphasise a few simple ideas to be drawn from this example
that we should keep in mind when working on different evaluation models:

• building an evaluation model is a complex task even in simple situations.
  Actors are most likely to modify their behaviour in response to the
  implementation of the model;

• evaluation operations are complex and should not be confused with
  measurement operations in Physics. When they result in numbers, the
  properties of these numbers should be examined with care; using numbers may
  be only a matter of convenience and does not imply that any operation can
  be meaningfully performed on these numbers;

• the aggregation of the result of several evaluation models should take the
  nature of these models into account. The information to be aggregated may
  itself be the result of more or less complex aggregation operations (e.g.
  aggregating the grades obtained at the mid-term and the final exams) and may
  be affected by imprecision, uncertainty and/or inaccurate determination;

• aggregation models should be analysed with care. Even the simplest and
  most familiar ones may in some cases lead to surprising and undesirable
  conclusions.
Finally we hope that this brief study of the evaluation procedures of students will
also be the occasion for instructors to reflect on their current grading practices.
This has surely been the case for the authors.

Our daily life is filled with indicators: I.Q., Dow Jones, GNP, air quality, physicians
per capita, poverty index, social position index, consumer price index, rate of
return, . . . If you read a newspaper, you could feel that these magic numbers rule
the world.

    The EU countries with a deficit/GNP ratio lower than 3% will be
    allowed to enter the EURO.

    Today's air quality is 7: older persons, pregnant women and young
    children should stay indoors.

    The World Bank threatens country x to suspend its help if it doesn't
    succeed in bringing indicator y to level z.

Note that in many cases, the decisions of the World Bank to withdraw help
are not motivated by economic or financial reasons. Violations of human rights
are often presented as the main factor. But it is worth noting that indicators of
human rights also exist (see e.g. Horn (1993)).
Why are these indicators (often called indices) so powerful ? Probably because
it is commonly accepted that they faithfully reflect reality. This forces us to raise
several questions.
1. Is there one reality, several realities or no reality ? Many philosophers nowa-
days consider that reality is not unique. Each person has a particular per-
ception of the world and, hence, a particular reality. One could argue that
these particular realities are just particular views of the same reality but,
as it is impossible to consider reality independently of our perception of it,
it might be meaningless to consider that reality exists per se (Roy 1990).
As a consequence, an indicator might only be relevant for the person who
constructed it.
2. Whatever the answer to the previous question, can we hope that an indicator
faithfully reflects reality (the reality or a reality) ? Reality is so complex that
this is doubtful. Therefore, we must accept that an indicator accounts only
for some aspects of reality. Hence, an indicator must be designed so as
to reflect those aspects that are relevant with respect to our concerns. As
an illustration, the Human Development index (HDI) defined by the United
Nations Development Programme (UNDP) to measure development (United
Nations Development Programme 1997) is used by many different people in
different continents and in different areas of activity (politicians, economists,
businessmen, . . . ). Can we assume that their concerns are similar ?
In the Human Development Report 1997, UNDP proudly reports that

    The HDI has been used in many countries to rank districts or
    counties as a guide to identifying those most severely disadvantaged
    in terms of human development. Several countries, such as
    the Philippines, have used such analysis as a planning tool. [. . . ]
    The HDI has been used especially when a researcher wants a composite
    measure of development. For such uses, other indicators
    have sometimes been added to the HDI.

This clearly shows that many people used the HDI in completely different ways.
Furthermore, are the concerns of UNDP itself with respect to the HDI clearly
defined ? Why do they need the human development index ? To cut subsidies
to nations evolving in the wrong direction ? To share subsidies among the
poorest countries (according to what key) ? To put some pressure on the
governments performing the worst ? To prove that Western democracies
have the best political systems ?

3. Suppose that the purpose of an indicator is clearly defined. Are we sure that
this indicator indicates what we want it to ? Do the arithmetic operations
performed during the computation of the indicator lead to something that
makes sense ?

Let us now discuss three well known indicators arising in completely different
areas of our lives in detail: the human development index, the air quality index
and the decathlon score.

4.1 The human development index

As stated by the United Nations Development Programme (1997), page 14,

    The human development index measures the average achievements in
    a country in three basic dimensions of human development: longevity,
    knowledge and a decent standard of living. A composite index, the HDI
    thus contains three variables: life expectancy, educational attainment
    (adult literacy and combined primary, secondary and tertiary enrollment)
    and real GDP (Gross Domestic Product) per capita expressed
    in PPP$ (Purchasing Power Parity $).

The HDI's precise definition is presented on page 122 of the 1997 Human Develop-
ment Report. The HDI is a simple average of the life expectancy index, educational
attainment index and adjusted real GDP per capita (PPP$) index. Here is how
each index is computed.

Life Expectancy Index (LEI) This index measures life expectancy at birth. In
order to normalise the scale of this index, a minimum value (25 years) and
a maximum one (85 years) have been defined. The index is defined as
\[ \mathrm{LEI} = \frac{\text{life expectancy at birth} - 25}{85 - 25}. \]
Hence, it is a value between 0 and 1.
Educational Attainment Index (EAI) It is a combination of two other indi-
cators: the Adult Literacy Index (ALI) and the combined primary, secondary
and tertiary Enrollment Ratio Index (ERI). The first one is the proportion
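of literate adults while the second one is the proportion of children of
primary, secondary or tertiary school age who actually attend school. The EAI is a
weighted average of ALI and ERI in which ALI counts twice as much as ERI; it is equal to
\[ \mathrm{EAI} = \frac{2\,\mathrm{ALI} + \mathrm{ERI}}{3}. \]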

Adjusted real GDP per capita (PPP$) Index (GDPI) This index aims at
measuring the income per capita. As the value of one dollar for someone
earning $100 is much larger than the value of one dollar for someone earning
$100 000, the income is first transformed using Atkinson's formula (Atkinson
1970). The transformed value of y, i.e. W(y), is given by one of the following:

\[
W(y) =
\begin{cases}
y & \text{if } 0 < y < y^*, \\
y^* + 2\,[(y - y^*)^{1/2}] & \text{if } y^* \le y < 2y^*, \\
y^* + 2\,(y^*)^{1/2} + 3\,[(y - 2y^*)^{1/3}] & \text{if } 2y^* \le y < 3y^*, \\
\quad\vdots \\
y^* + 2\,(y^*)^{1/2} + 3\,(y^*)^{1/3} + \dots + n\,[(y - (n-1)y^*)^{1/n}] & \text{if } (n-1)y^* \le y < ny^*.
\end{cases}
\]

In this formula, y represents the income, W(y) the transformed income and
y* is set at $5 835 (PPP$) which was the World average annual income per
capita in 1994.
Thereafter, the income scale is normalised, using the maximum value of
$40 000, the minimum value of $100 and the formula
\[ \mathrm{GDPI} = \frac{\text{transformed income} - W(100)}{W(40\,000) - W(100)}. \]
Hence, it is a value between 0 and 1. Note that W(40 000) = 6 154 and
W(100) = 100.

Some words about the data and their collection time: the Human Development
Report is a yearly publication (since 1990). Obviously, the 1997 report does not
contain the 1997 data. Indeed, the HDI computed in the 1997 report is considered
by the UNDP as the HDI of 1994. To make things more complicated, the 199i HDI
(in the 199j report) is an aggregate of data from 199i (for some dimensions) and
from earlier years (for other dimensions). In this volume, we use only data from
the 1997 Human Development Report. We refer to them as HDR97, irrespective
of the collection year.
To illustrate how the HDI works, let us compute the HDI for Greece (HDR97).
Life expectancy in Greece is 77.8 years. Hence, LEI = (77.8 − 25)/(85 − 25) = 0.880.
The ALI is 0.967 and the ERI is 0.820. Hence, EAI = (2 × 0.967 + 0.820)/3 =
0.918. Greece's real GDP per capita, at $11 265, lies between y* and 2y*.
Thus the adjusted real GDP per capita for Greece is $5 982 (PPP$) because
5 982 = 5 835 + 2(11 265 − 5 835)^{1/2}. Hence GDPI = (5 982 − W(100))/(W(40 000) −
W(100)) = (5 982 − 100)/(6 154 − 100) = 0.972. Finally, Greece's HDI is (0.880 +
0.918 + 0.972)/3 = 0.923.
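The whole computation can be sketched in a few lines of Python. This is an illustration of the definitions given above, not the UNDP's own procedure; the function names atkinson and hdi are hypothetical, and the constants are those quoted in the text.

```python
Y_STAR = 5835  # world average income per capita in 1994 (PPP$), as quoted above


def atkinson(y, y_star=Y_STAR):
    """Transformed income W(y), following the piecewise definition in the text."""
    if y < y_star:
        return y
    n = int(y // y_star) + 1  # index such that (n - 1) * y_star <= y < n * y_star
    # Terms y* + 2 (y*)^(1/2) + ... + (n - 1) (y*)^(1/(n - 1)) ...
    full_steps = sum(k * y_star ** (1 / k) for k in range(1, n))
    # ... plus the last, partial term n * (y - (n - 1) y*)^(1/n).
    return full_steps + n * (y - (n - 1) * y_star) ** (1 / n)


def hdi(life_expectancy, ali, eri, real_gdp):
    lei = (life_expectancy - 25) / (85 - 25)
    eai = (2 * ali + eri) / 3
    gdpi = (atkinson(real_gdp) - atkinson(100)) / (atkinson(40_000) - atkinson(100))
    return (lei + eai + gdpi) / 3


greece = hdi(life_expectancy=77.8, ali=0.967, eri=0.820, real_gdp=11_265)

print(round(atkinson(40_000)))   # W(40 000)
print(round(atkinson(11_265)))   # adjusted real GDP per capita for Greece
print(round(greece, 3))          # HDI of Greece
```

The sketch recovers the values W(40 000) = 6 154 and W(11 265) = 5 982 used in the text, which gives some confidence in the reading of the piecewise formula.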

4.1.1 Scale Normalisation

To obtain the LEI and the GDPI, maximum and minimum values have been de-
fined so that, after normalisation, the range of the index is [0,1]. The choice of
these bounds is quite arbitrary. Why 25 and 85 years ? Is 25 years the smallest
observed value ? No, the lowest observed value is 22.6 (Rwanda, HDR97). There-
fore the LEI is negative for Rwanda. The value of 25 was chosen for the first
report (1990), when the lowest observed value was above 35. At that time, no one
would ever have thought that life expectancy could be lower than 25. To avoid
this problem, they could have chosen a much lower value: 20 or 10. The likelihood
of observing a value smaller than the minimum would have been much smaller.
But the choice of the bounds is not without consequences. Consider the following
example. Suppose that the EAI and GDPI have been computed for South Korea and
Costa Rica (HDR97). We also know the life expectancy at birth for South Korea
and Costa Rica (see Table 4.1). If the maximum and minimum for life expectancy

               life expectancy   EAI   GDPI

South Korea         71.5         .93   .97
Costa Rica          76.6         .86   .95

Table 4.1: Bounds: life expectancy, EAI and GDPI for South Korea and Costa
Rica (HDR97)

are set to 85 and 25, then the HDI is 0.890 for South Korea and 0.889 for Costa
Rica. But if the maximum and minimum for life expectancy are set to 80 and 25,
then the HDI is 0.915 for South Korea and 0.916 for Costa Rica. In the first case,
Costa Rica is less developed than South Korea while in the second one, we obtain
the converse: Costa Rica is more developed than South Korea. Hence, the choice
of the bounds matters.
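The reversal can be reproduced from the entries of Table 4.1. In the Python sketch below (function names are hypothetical), the HDI is recomputed for both choices of the upper bound. Since the table gives the EAI and GDPI rounded to two decimals, the third decimal of the results may differ slightly from the values quoted above; the reversal of the ranking is nevertheless the same.

```python
# (life expectancy, EAI, GDPI) from Table 4.1.
countries = {"South Korea": (71.5, 0.93, 0.97), "Costa Rica": (76.6, 0.86, 0.95)}


def lei(life_expectancy, lo, hi):
    """Life expectancy index normalised on the range [lo, hi]."""
    return (life_expectancy - lo) / (hi - lo)


def hdi(life_expectancy, eai, gdpi, lo=25, hi=85):
    return (lei(life_expectancy, lo, hi) + eai + gdpi) / 3


def leader(hi):
    """Country with the highest HDI when the upper bound for life expectancy is hi."""
    scores = {name: hdi(*data, hi=hi) for name, data in countries.items()}
    return max(scores, key=scores.get)


print(leader(85))  # upper bound 85
print(leader(80))  # upper bound 80
```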

In fact narrowing the range of life expectancy from [25,85] to [25,80] increases
the difference between any two values of LEI by a factor (85 − 25)/(80 − 25). Hence
it amounts to increasing the weight of LEI by the same factor. In our example,
Costa Rica performed better than South Korea on life expectancy. Therefore, it
is not surprising that its position is improved when life expectancy is given more
weight (by narrowing its range).
Note that, apparently, no bounds were fixed for the ALI and the ERI. In reality,
this is equivalent to choosing 1 for maximum and 0 for minimum. This is also an
arbitrary choice. It is obvious that values 0 and 1 have not been observed and are
not likely to be observed in a foreseeable future. Hence the range of these scales
is narrower than [0,1] and the scale could be normalised, using other values than
0 and 1.

4.1.2 Compensation
Consider Table 4.2 where the data for two countries (Gabon and the Solomon
Islands, HDR97) are presented. The Solomon Islands perform quite well on all
dimensions; Gabon is slightly better than the Solomon Islands on all dimensions
except life expectancy where it is very bad. For us, this very short life expectancy
is clearly a sign of severe underdevelopment, even if other dimensions are good.
Nevertheless, the HDI is equal to 0.56 for both Gabon and Solomon Islands. Hence,

                   life expectancy   ALI   ERI   real GDP

Gabon                   54.1         .63   .60    3 641
Solomon Islands         70.8         .62   .47    2 118

Table 4.2: Compensation: performances of Gabon and Solomon Islands (HDR97)

in spite of the informal analysis we performed on the table, we should conclude
that Gabon and Solomon Islands are at the same development level. This problem
is due to the fact that we used the usual average to aggregate our data into one
number. Weaknesses on some dimensions are compensated by strengths on other
dimensions. This is probably desirable, to some extent. Yet, extreme weaknesses
should not be compensated, even by very good performances on other dimensions.
Let us go further with compensation. As any weakness can be compensated
by a strength, a decrease in life expectancy by one year can be compensated by
some increase in adjusted real GDP (income transformed by Atkinson's formula).
Let us compute this increase. A decrease by one year yields a decrease of LEI by
1/(85 − 25) = 0.016667. To compensate this, the GDPI must increase by the same
amount. Hence, the adjusted real GDP must be increased by 0.016667 × (6 154 −
100) = 100.9$ (recall that W(40 000) = 6 154 and W(100) = 100). Accordingly, a decrease in life
expectancy by 2 years can be compensated by an increase in adjusted real GDP
by 2 times 100.9$; a decrease in life expectancy by n years can be compensated
by an increase in adjusted real GDP by n times 100.9$. The value of one year of
life is thus 100.9$ (adjusted by Atkinson's formula). The value 100.9 is called the
substitution rate between life expectancy and adjusted real GDP.

Other substitution rates are easy to compute: e.g. the substitution rate between
life expectancy and adult literacy is 0.016667 × (1 − 0) × (3/2) = 0.025 (the factor
3/2 arises because the ALI carries a weight of only 2/3 in the EAI). To compensate
a decrease of n years of life expectancy, you need an increase of the adult literacy
index of n times 0.025.
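The two substitution rates follow directly from the normalisation bounds and the weights. The short Python sketch below simply restates this arithmetic; the variable names are hypothetical.

```python
W_MAX, W_MIN = 6154, 100        # W(40 000) and W(100), as quoted in the text
LIFE_MAX, LIFE_MIN = 85, 25     # bounds used to normalise life expectancy

# One year of life expectancy moves the LEI by this amount.
delta_lei = 1 / (LIFE_MAX - LIFE_MIN)

# LEI and GDPI have the same weight (1/3) in the HDI, so the GDPI must move by
# the same amount; this is converted back into adjusted dollars.
rate_adjusted_gdp = delta_lei * (W_MAX - W_MIN)

# The ALI ranges over [0, 1] and carries 2/3 of the EAI, which itself weighs
# 1/3 in the HDI; hence the factor 3/2.
rate_ali = delta_lei * (1 - 0) * 3 / 2

print(round(rate_adjusted_gdp, 1), round(rate_ali, 3))
```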
Let us now think in terms of real GDP (not adjusted). In a country where
real GDP is 13 071$ (Cyprus, HDR97), a decrease in life expectancy of one year
can be compensated by an increase in real GDP of 21 084$. In a country where
real GDP is 700$ (Chad, HDR97), a decrease of life expectancy by one year can
be compensated by an increase in real GDP by 100.9$. Hence, poor people's life
expectancy has much less value than that of rich ones.

4.1.3 Dimension independence

Consider the example of Table 4.3. Countries x and y perform equally badly on

     life expectancy   ALI   ERI   real GDP

x          30          .80   .65      500
y          30          .35   .40    3 500

Table 4.3: Independence: performances of x and y

life expectancy, y is much lower than x on adult literacy but much higher than
x on income. As life expectancy is very short, one might consider that adult
literacy is not very important (because there are almost no adults) but income is
more important because it improves quality of life in other respects. Furthermore,
health conditions and life expectancy can be expected to improve rapidly due to a
higher income. Hence, one could conclude that y is more developed than x. Our
conclusion is confirmed by the HDI: 0.30 for x and 0.34 for y.
Let us now compare two countries, w and z similar to x and y except that life
expectancy is equal to 70 for both w and z (see Table 4.4). In such conditions, the
performance of z on adult literacy is really bad compared to that of w. The adult
population is very important and its illiteracy is a severe problem. Even if the high
income of z is used to foster education, it will take decades before a significant
part of the population is literate. On the contrary, w's low income doesn't seem to

     life expectancy   ALI   ERI   real GDP

w          70          .80   .65      500
z          70          .35   .40    3 500

Table 4.4: Independence: performances of w and z

be a problem for the quality of life, as life expectancy is high as well as education.
Hence, it might not be unreasonable to conclude that w is more developed than
z. But if we compute the HDI, we obtain 0.52 for w and 0.56 for z! This should
not be a surprise; there is no difference between x and y on the one hand and w
and z on the other hand, except for life expectancy. But the differences in life
expectancy between x and w and between y and z are equal. Hence, this results
in the same increase of the HDI (compared to x and y) for both w and z.
When a sum (or an average) is used to aggregate different dimensions, identical
performances of two items (countries or whatever) on one or more dimensions
are not relevant for the comparison of these items. The identical performances can
be changed in any direction; as long as they remain identical, they do not affect
the way both items compare to each other. This is called dimension independence;
it is inherent to sums and averages. But we saw that this property is not always
desirable. When we compare countries on the basis of life expectancy, education
and income, dimension independence might not be desirable.
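Dimension independence can be observed numerically on the four fictitious countries of Tables 4.3 and 4.4. The Python sketch below uses the definition of the HDI given in Section 4.1 (function names are hypothetical) and shows that the gap between the two countries is the same whether their common life expectancy is 30 or 70.

```python
Y_STAR = 5835  # threshold income y* (PPP$) used in Atkinson's formula


def atkinson(y, y_star=Y_STAR):
    """Transformed income W(y), piecewise definition of Section 4.1."""
    if y < y_star:
        return y
    n = int(y // y_star) + 1
    full_steps = sum(k * y_star ** (1 / k) for k in range(1, n))
    return full_steps + n * (y - (n - 1) * y_star) ** (1 / n)


def hdi(life_expectancy, ali, eri, real_gdp):
    lei = (life_expectancy - 25) / (85 - 25)
    eai = (2 * ali + eri) / 3
    gdpi = (atkinson(real_gdp) - atkinson(100)) / (atkinson(40_000) - atkinson(100))
    return (lei + eai + gdpi) / 3


# (life expectancy, ALI, ERI, real GDP) from Tables 4.3 and 4.4.
profiles = {
    "x": (30, 0.80, 0.65, 500), "y": (30, 0.35, 0.40, 3500),
    "w": (70, 0.80, 0.65, 500), "z": (70, 0.35, 0.40, 3500),
}
scores = {name: hdi(*data) for name, data in profiles.items()}

gap_short_life = scores["y"] - scores["x"]
gap_long_life = scores["z"] - scores["w"]

print({name: round(s, 2) for name, s in scores.items()})
print(round(gap_short_life, 4), round(gap_long_life, 4))  # identical gaps
```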

4.1.4 Scale construction

In a way, we already have discussed this topic in Section 4.1.1 (Scale Normali-
sation). But there is more to scale construction than scale normalisation. For
example, concerning real GDP, before normalising this scale, the real GDP is adjusted
using Atkinson's formula. The goal of this adjustment is obvious: if you
earn 40 000 dollars, one more dollar is negligible. If you earn 100 dollars, one
more dollar is considerable. Atkinson's formula reflects this. But why choose
y* = $5 835 ? Why choose Atkinson's formula ? Other formulas and other values
for y* would work just as well. Once more, an arbitrary choice has been made
and we could easily build a small example showing that another arbitrary (but
defendable) choice would yield a different ranking of the countries.
Note that the fact that life expectancy, adult literacy and enrollment have
not been adjusted is also an arbitrary choice. One could argue that improving
life expectancy by one year in a country where life expectancy is 30 is a huge
achievement while it is a moderate one in a country where life expectancy is 70.
Some could even argue that increasing life expectancy above a certain threshold is
no longer an improvement. It increases the health budget in such proportions that
no more resources are available for other important areas: education, employment
policy, . . .

4.1.5 Statistical aspects

Let us consider the four indices of the HDI from a statistical point of view. The
life expectancy index is the average over the population and for a determined time
period of the length of the lives of the individuals in the population. It is well
known that averages, even if they are useful, cannot reflect the variety present in
the population. A country where approximately everyone lives until 50 has a life
expectancy of 50 years. A country where a part of the population (rural or poor
or of some race) dies early and where another part of the population lives until 80
might also have a life expectancy of 50 years.
Note that this kind of average is quite particular. It is very different from
the average that we perform when, for example, we have several measures of the
weight of an object and we consider the average as a good estimate of its actual
weight. The weight of an object really exists (as far as reality exists). On the
contrary, even if reality exists, the average of the length of life doesn't correspond
to something real. It is the length of life of a kind of average or ideal human, as
if we (the real humans) were imperfect, irregular or noisy copies of that average
human. Until the 19th century, the two kinds of averages were called by different
names (moyenne proportionnelle, for different measures of one object, and valeur
commune, for different objects each measured once) and were considered completely
different. During the 19th century the Belgian astronomer and statistician Quetelet
(1796-1874) invented the concept of the average human and unified both averages
(Desrosieres 1995).
To convince you that the concept of the average human is quite strange (though
possibly useful), consider a country where all inhabitants are right triangles of
different sizes and shapes (example borrowed from Warusfel (1961)). To make it
easy, let us suppose that there are just two kinds of right triangles (see Fig. 4.1),
in the same proportion. A statistician wants to measure the average right triangle.
In order to do so, he computes the average length of each edge. What he gets is
a triangle with edges of length 4, 8 and 9, i.e. a triangle which is not right-angled
since 4^2 + 8^2 ≠ 9^2. The average right triangle is no longer a right triangle! What
looks like a right angle is in fact an angle of approximately 91 degrees. In the same
spirit, Quetelet measured the average size of humans, in all dimensions, including
the liver, heart, spleen and other organs. What he got was an average human in
which it was impossible to fit all its average organs. They were too large!

Figure 4.1: Two right triangles and their average
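The statistician's computation can be checked with the following Python sketch. The side lengths of the two original triangles, (3, 4, 5) and (5, 12, 13), are an assumption made here: they are the right triangles whose edge-wise average is the triangle with edges 4, 8 and 9 mentioned in the text.

```python
import math

# Assumed side lengths of the two right triangles (see the remark above).
triangles = [(3, 4, 5), (5, 12, 13)]

# Edge-wise average, as computed by the statistician.
average = tuple(sum(edges) / len(triangles) for edges in zip(*triangles))


def is_right(a, b, c):
    """Pythagorean test, c being the longest edge."""
    return math.isclose(a * a + b * b, c * c)


a, b, c = average
# Law of cosines: angle opposite the longest edge of the average triangle.
angle = math.degrees(math.acos((a * a + b * b - c * c) / (2 * a * b)))

print(average, is_right(*average), round(angle, 1))
```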

The adult literacy index is quite different: it is just the number of literate
adults, divided by the total adult population to allow comparisons between coun-
tries. Hence one could think it is not an average. In fact it depends on how we
interpret it. If we consider that an ALI of 0.60 means that 60% of the population
is literate, then it is not an average. If we consider that an ALI of 0.60 means that
the average literacy level is 60%, then it is an average. And this last interpretation
is no sillier than computing a life expectancy index. Consider a variable
whose value is 0 for an illiterate adult and 1 for a literate one. Compute the average
of this variable over the population and over some time period. What do
you get ? The adult literacy index!
We can analyse the enrolment ratio index and the adjusted real GDP index in
the same way as the ALI. They are quantities that are measured at country level.
The first one being a proportion and the second one being normalised, they can
also be interpreted at individual level, like averages.
What about the HDI itself ? According to the United Nations Development
Programme (1997), it is designed to

[. . . ] measure the average achievements in a country [. . . ]

Furthermore, the HDI contains an index (LEI) which can only be interpreted bearing
in mind Quetelet's average human. Therefore the ALI, GDPI and HDI should
be interpreted in this way as well. The HDI somehow describes how developed the
average human in a country is.

4.2 Air quality index

Due to the alarming increase in air pollution, mainly in urban areas, during the last
decades, several governments and international organisations have issued norms
concerning pollutant concentrations in the air (e.g., the Clean Air Act in the US).
Usually these norms specify, for each pollutant, a concentration that should not be
exceeded. Naturally, these norms are just norms and they are often exceeded.
Therefore, as good air quality is not guaranteed by norms, different monitoring
systems have been developed in order to provide governments as well as citizens
with some information about air pollution. Two examples of such systems are
the Pollutant Standards Index (PSI), developed by the US Environmental Protection
Agency (Ott 1978), and the ATMO Index, developed by the French Environment
Ministry (SANTE/paracelse/envirtox/Pollatmo/Surveill/atmo.html). These two
indicators are very similar and we will discuss the French ATMO.
The ATMO index is based on the concentration of four major pollutants: sulfur
dioxide (SO2), nitrogen dioxide (NO2), ozone (O3) and particulate matter (soot,
dust, particles). For each pollutant, a sub-index is computed and the final ATMO
index is defined as being equal to the largest sub-index. Here is how each
sub-index is defined. For each pollutant, the concentration is converted into a number
on a scale from 1 to 10. Level 1 corresponds to an air of excellent quality; levels
5 and 6 are just around the EU long term norms, level 8 corresponds to the EU
short term norms and 10 indicates hazardous conditions.
To illustrate, suppose that the sub-indices are as in Table 4.5. The resulting
ATMO index is the largest value, that is 8. Hence the air quality is very bad. In
the following paragraphs, we discuss some problems arising with the ATMO index.

pollutant    NO2   SO2   O3   dust

sub-index     3     3     2     8

Table 4.5: Sub-indices of the ATMO index
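As a minimal sketch of the definition above (written in Python, which is not used in the original text), the index can be computed in one line. The list holds the sub-indices of Table 4.5 in column order; the names `table_4_5` and `atmo_index` are ours and purely illustrative.

```python
# Sub-indices of Table 4.5, in column order (the last two are ozone and dust).
table_4_5 = [3, 3, 2, 8]


def atmo_index(sub_indices):
    """Return the ATMO index: the largest of the pollutant sub-indices."""
    return max(sub_indices)


print(atmo_index(table_4_5))  # 8
```

Only the worst pollutant matters: the three other values could be anything from 1 to 8 without changing the result.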

4.2.1 Monotonicity
Suppose that, due to heavy traffic, the absence of wind and a very sunny day, the
ozone sub-index increases from 2 to 8 for the air described in Table 4.5. Clearly,
this corresponds to a worse air: no pollutant decreased, one of them increased.
In these conditions, we expect the ATMO index to worsen as well. In fact the
ATMO index does not change. The maximum is still 8. Thus some changes, even
significant ones, are not reflected by the index. In our example, the change is very
significant as the ozone sub-index was almost perfect and became very bad.
Note that if the ozone sub-index decreases from 8 to 2, the ATMO index does
not change either though the air quality improves. This shows that the ATMO
index is not strictly monotonic. Some changes, in both directions, are not reflected
by the index.
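The argument can be checked on the numbers of Table 4.5. The sketch below is only an illustration in Python; it takes the ozone value from the table (third column) and raises it to 8, leaving the other sub-indices untouched. The variable names are ours.

```python
def atmo_index(sub_indices):
    """ATMO index: the largest sub-index."""
    return max(sub_indices)


before = [3, 3, 2, 8]  # Table 4.5; third entry is ozone, fourth is dust
after = [3, 3, 8, 8]   # same air, ozone sub-index raised to 8

# The second air is worse on one pollutant and better on none.
worse_somewhere = any(a > b for a, b in zip(after, before))
better_nowhere = all(a >= b for a, b in zip(after, before))

print(worse_somewhere and better_nowhere)      # True
print(atmo_index(before), atmo_index(after))   # 8 8
```

The index never moves in the wrong direction, but it stays flat while the air gets worse, which is the failure described in this section.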

4.2.2 Non compensation

Let us consider the ATMO index for two different airs (x and y), as described by
Table 4.6. Air x is perfect on all measurements but one: it scores just above
the EU long term norm for ozone. Air y is not good on any dimension. It is of
average quality on all dimensions and close to the EU long term norms for three
dimensions. The ATMO index is 6 for air x and 5 for air y. Hence, the quality of
air x is considered to be lower than that of air y. Contrary to what we observed
with the HDI, no compensation at all occurs between the different dimensions.
The small weakness of x (6 compared to 5, for ozone) is not compensated by its
large strengths (1 compared to 4 or 5, for nitrogen dioxide, sulfur dioxide and
dust). In the case of human development, the compensation between dimensions
was too strong. Here, we face another extreme: no compensation at all, which is
probably not better.

pollutant    NO2   SO2   O3   dust

x             1     1     6     1
y             5     4     5     5

Table 4.6: Sub-indices for x and y
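To make the contrast concrete, the sketch below (Python, illustrative only) applies two aggregation rules to the rows of Table 4.6: the maximum, which is what the ATMO index uses, and the arithmetic mean, which we introduce here purely as an example of a fully compensatory rule. The mean is not part of the ATMO index.

```python
from statistics import mean

# Rows of Table 4.6, in column order (third entry is ozone).
x = [1, 1, 6, 1]
y = [5, 4, 5, 5]

# Non-compensatory rule (ATMO): only the worst sub-index counts.
print(max(x), max(y))    # 6 5  -> x is judged worse than y

# Compensatory rule (illustrative contrast): strengths offset weaknesses.
print(mean(x), mean(y))  # 2.25 4.75  -> x is judged better than y
```

The two rules rank x and y in opposite orders, so the choice of aggregation rule, not the data, decides which air is declared cleaner.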

4.2.3 Meaningfulness
Let us forget our criticism of the ATMO index and suppose that it works well.
Consider the statement "Today's ATMO index (6) is twice as high as yesterday's
index (3)". What does it mean ? We are going to show that it is meaningless, in
a certain sense. Let us come back to the definition of the sub-indices. For a given
pollutant, the concentration is measured in µg/m³. The concentration figures are
then transformed into numbers between 1 and 10. This is done in an arbitrary
way. For example, instead of choosing 5-6 for the EU long term norms and 8 for
the short term ones, 6-7 and 9 could have been chosen. The index would work
as well. The relevant information provided by the index is not the figure itself; it
is some information about the fact that we are above or below some norms that
are related to the effects of the pollutants on health (a somewhat similar situation
has been encountered in Chapter 3). But with such an alternative scale, the values
of today's and yesterday's index would be different, say 7 and 4, and 7 is not twice
as large as 4. To conclude, the statement "Today's ATMO index (6) is twice as high
as yesterday's index (3)" would be valid, or meaningful, only in a particular context,
depending upon arbitrary choices. Such a statement is said to be meaningless.
On the contrary, the statement "Today's ATMO sub-index for ozone (6) is
higher than yesterday's sub-index for ozone (3)" is meaningful. Any reasonable
transformation of the concentration figures into numbers between 1 and 10 would
lead to the same conclusion: today's sub-index is higher than yesterday's one. By
"reasonable transformation" we mean a transformation that preserves the order:
a concentration cannot be transformed into an index value lower than the index
value corresponding to a lower concentration. Concentrations of 110 and 180 µg/m³
can be transformed into 3 and 6, or 4 and 6, or 2 and 4 but not 4 and 2.
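The difference between the two kinds of statement can be checked mechanically. The sketch below (Python, illustrative only) takes the three admissible codings of the concentrations 110 and 180 listed above, plus the order-reversing one that is ruled out, and tests which statements survive a change of coding. The dictionary layout and the function name `preserves_order` are our own.

```python
# Admissible (order-preserving) codings of the concentrations 110 and 180.
codings = {
    "3 and 6": {110: 3, 180: 6},
    "4 and 6": {110: 4, 180: 6},
    "2 and 4": {110: 2, 180: 4},
}
rejected = {110: 4, 180: 2}  # reverses the order, hence not admissible


def preserves_order(coding):
    """True if a higher concentration never gets a lower index value."""
    levels = sorted(coding)
    return all(coding[lo] <= coding[hi] for lo, hi in zip(levels, levels[1:]))


for name, coding in codings.items():
    is_higher = coding[180] > coding[110]       # "today is higher than yesterday"
    is_twice = coding[180] == 2 * coding[110]   # "today is twice yesterday"
    print(name, is_higher, is_twice)
# 3 and 6 True True
# 4 and 6 True False
# 2 and 4 True True
```

"Higher than" holds under every admissible coding, whereas "twice as high" holds under some and fails under others, which is what makes the first statement meaningful and the second meaningless.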
More subtle: "Today's ATMO index (6) is larger than yesterday's ATMO index
(3)". Is this sentence meaningful ? In the previous paragraph, we saw that the
arbitrariness involved in the construction of the 1 to 10 scale of a sub-index is not
a problem when we want to compare two values of the same sub-index. But if we
want to compare two values of two different sub-indices, this is no longer true. A
value of 3 on a sub-index could be more dangerous for health than a 6 on another
sub-index. Of course, the scales have been constructed with care: 5 corresponds
to the EU long term norms on all sub-indices and 8 to the short term norms.
This is intended to make all sub-indices commensurable. Comparisons should
thus be meaningful. But can we really assume that a 5 (or the corresponding
concentration in µg/m³) is equivalent on two different sub-indices ? Equivalent in
what terms ? Some pollutants might have short term effects and other pollutants,
long term effects. They can have effects on different parts of the organism. Should
we compare the effects in terms of discomfort, mortality after n years, health care
costs, . . . ?

4.3 The decathlon score

The decathlon is a 10-event athletic contest. It consists of 100-meter, 400-meter,
and 1 500-meter runs, a 110-meter high hurdles race, the javelin and discus throws,
shot put, pole vault, high jump, and long jump. It is usually contested over two
or three days. It was introduced as a three-day event at the Olympic Games
of Stockholm in 1912. To determine the winner of the competition, a score is
computed for each athlete and the athlete with the best score is the winner. This
score is the sum of the single-event scores. The single-event scores are not just
times and distances. It doesn't make sense to add the time of a 100-meter run to
the time of a 1 500-meter run. It is even worse to add the time of a run to the
length of a jump. This should be obvious for everyone.
Until 1908, the single-event scores were just the rank of an athlete in that
event. For example, if an athlete performed the third best high jump, his single-
event score for the high jump was 3. The winner was thus the athlete with the
lowest overall score. Note that this amounts to using the Borda method (see p.14)
to elect the best athlete when there are ten voters and the preferences of each
voter are the rankings defined by each event.
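A minimal sketch of this rank-sum rule, in Python, is given below. The three athletes, the four events and all the ranks are invented for illustration; only the rule itself (add the ranks, lowest total wins) comes from the text.

```python
# Hypothetical ranks of three athletes in four events (1 = best in the event).
ranks = {
    "a": [1, 3, 2, 1],
    "b": [2, 1, 1, 2],
    "c": [3, 2, 3, 3],
}

# Overall score = sum of the single-event ranks; the lowest total wins.
scores = {athlete: sum(event_ranks) for athlete, event_ranks in ranks.items()}
winner = min(scores, key=scores.get)

print(scores)  # {'a': 7, 'b': 6, 'c': 11}
print(winner)  # b
```

Each event acts as a voter who ranks the athletes, which is why the rule coincides with the Borda method.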
The main problem with these single-event scores is that they very poorly reflect
the performances of the athletes. Suppose that an athlete arrived 0.1 second before
the next athlete in the 100-meter run. They have ranks i and i+1. So the difference
in the scores that they receive is 1. Suppose now that the gap between these
two athletes is 1 second. Their ranks are unchanged. Thus the difference in
the scores that they receive is still 1 though a larger difference would be more
appropriate. That is why other tables of single-event scores have been used since
1908 (de Jongh 1992, Zarnowsky 1989). In the tables used after 1908, high scores
are associated with good performances (contrary to scores before 1908). Hence, the
winner is the athlete that has the highest overall score.

Figure 4.2: Decathlon tables for distances: general shape of convex (left) and
concave (right) tables (horizontal axis: distance)
Some of these tables (different versions, in use between 1934 and 1962) are
based on the idea that improving a performance by some amount (e.g. 5 centime-
tres in a long jump) is more difficult if the performance is close to the world record.
Hence, it deserves more points. The general shape of these tables, for distances,
is given in Figure 4.2 (convex table). For times (in runs), the shape is different as
an improvement is a decrease in time.
A problem raised by convex tables is the following: if an athlete decides to
focus on some events (for example the four kinds of runs) and to do much more
training for them than for the other ones, he will have an advantage. He will come
closer to the world record for runs and earn many points. At the same time, he
will be further away from the world record for the other disciplines but that will
make him lose fewer points as the slope of the curve is more gentle in that direction.
The balance will be positive. Thus these tables encourage athletes to focus on
some disciplines, which is contrary to the spirit of the decathlon.
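The incentive can be illustrated with two invented tables. In the Python sketch below, performances are expressed as a fraction of the world record (an assumption of ours), the convex table is p squared and the concave one is the square root of p. Neither is a real decathlon table; they only reproduce the two general shapes of Figure 4.2. Both athletes have the same total performance, spread differently over two events.

```python
import math


def convex_table(p):
    """Hypothetical convex table: gains grow as p approaches the record."""
    return p ** 2


def concave_table(p):
    """Hypothetical concave table: gains shrink as p approaches the record."""
    return math.sqrt(p)


def total(table, performances):
    """Overall score: sum of the single-event scores."""
    return sum(table(p) for p in performances)


specialist = [0.9, 0.3]   # strong in one event, weak in the other
all_rounder = [0.6, 0.6]  # same total performance, evenly spread

print(total(convex_table, specialist) > total(convex_table, all_rounder))    # True
print(total(concave_table, specialist) > total(concave_table, all_rounder))  # False
```

With the convex shape the specialist wins; with the concave shape the all-rounder does, although nothing changed in the performances themselves.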
That is why, since 1962, different concave tables (see Figure 4.2) have been used.
These tables strongly encourage the athletes to be excellent in all disciplines. An
example of a real table, in use in 1998, is presented in Figure 4.3. Note that a new
change occurred: this table is no longer concave. It is almost linear but slightly
convex.
There are many interesting points to discuss about the decathlon score.

How are the minimum and maximum values set ? They can strongly influence
the score, as was shown with the HDI (in Section 4.1.1). Obviously,
the maximum value must somehow be related to the world record. But as
everyone knows, world records are objects that athletes like to break.

Figure 4.3: A plot for the 100 meters run score table in 1998 (horizontal axis:
100 meters time, from 9.5 to 13)

Why add single-event scores ? Other operations might work as well. For
example, multiplication may favour the athletes that perform equally well in
all disciplines. To illustrate this point very simply, consider a 3-event contest
where single-event scores are between 0 and 10. An athlete, say x, obtains 8
in all three events. Another one, y, obtains 9, 8 and 7. If we add the scores,
x and y obtain the same score: 24. If we multiply the scores, x gets 512
while y loses with 504.
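The arithmetic of this small example can be reproduced as follows (a Python sketch; the variable names are ours, the scores are those given above).

```python
from math import prod

x = [8, 8, 8]  # equally good in all three events
y = [9, 8, 7]  # same total, less balanced

print(sum(x), sum(y))    # 24 24
print(prod(x), prod(y))  # 512 504
```

For a fixed sum, the product is largest when all scores are equal, which is why multiplication rewards balanced profiles.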


The point on which we will focus, in this decathlon example, is the role of the
score.

4.3.1 Role of the decathlon score

Although one might think that the role of the overall score is clearly to designate
the winner, we are going to show that it plays many roles (like student grades, see
Chapter 3) and that this is one of the reasons why it changes so often. Of course,
one of the roles is to designate the winner and it was probably the only purpose
that the first designers of the score had in mind. But we can be quite sure that
immediately after the first contest, another role arose. Many people probably used
the scores to assess the performance of the athletes. Such an athlete has a score very
close to that of the winner and is thus a good athlete. Another one is far from the
winner and is consequently not a good one.
Not much later (after the second competition), a third role appeared. How did
the athletes evolve ? "This athlete has improved his score" or "x has a better score in
this contest than y had in the previous contest". This kind of comparison
is not meaningful: suppose that an athlete wins a contest with a score of 16. In
the next contest, he performs very poorly: short jumps, slow runs, short throws.
But his main opponents are absent or perform equally poorly. He might still win
the contest, and even with a better score, although his performance is worse than
the previous time.
After some time, the organisers of decathlons became aware of the second and
third roles. This was probably part of the motivation for abandoning the sum of
ranks and using convex tables. These tables, to some extent, made the comparisons of
scores across athletes and/or competitions meaningful. At the same time, the score
found a new role as a monitoring tool during training. Before 1908, the scores
could be computed only during competitions as they were sums of ranks. And it
was not long before a wise coach used it as a strategic tool, advising his athlete to
focus on some events. For this reason, since 1962, the organisers have conferred a
new role on the score: to foster excellence in all disciplines. This was achieved by the
introduction of concave tables. But it is most likely that the score is still used as
a strategic tool, hopefully in a less perverse way.
It is worth noting that this new role doesn't replace any of the previous ones.
The score aims at rewarding equal performances in all disciplines but it is also
used to assess the performance of an athlete. Even if we consider only these
two roles (the other ones could be seen as side effects), it is amazing to see how
incompatible they are.

4.4 Indicators and multiple criteria decision support

Classically, in a decision aiding process, a decision-maker wants to rank the ele-
ments of the set of alternatives (or to choose the best element). In order to rank,
he selects several dimensions (criteria) that seem relevant with respect to his prob-
lem. Each alternative is characterised by a performance on each criterion (this is
the evaluation matrix or performance tableau). An MCDA method is then used to
rank the alternatives, with respect to the preferences of the decision-maker.
When an indicator is built, several dimensions are also selected. Each item is
characterised by a performance on each dimension. An index that can be used to
rank the items is computed. The analogy between a decision support method and
an index is obvious: both aim at aggregating multi-dimensional information about
a set of objects. But there is a tremendous difference as well: when an indicator is
built, it is often the case that there is no clearly defined decision problem,
decision-maker and, a fortiori, preferences. To work around the absence of
preferences, one could
consider that the preferences are those of the potential users of the indicator.
To some extent, this is possible because very often the preferences of the users
go in the same direction for each dimension taken separately. For example, for
each dimension of the ATMO index, everyone prefers a lower concentration. But
it is definitely not reasonable to assume that the global preferences are similar.
Furthermore, even if single-dimensional preferences go in the same direction, it
does not mean that single-dimensional preferences are identical. Those who are
not very sensitive to a pollutant will value a decrease in concentration much more
if it occurs at high concentration than at low concentration. On the contrary,
sensitive people might value concentration decreases at low and high levels equally.

The relevance of measurement theory

The absence of preferences is crucial. In decision support, many studies and
concepts relate to measurement theory. Measurement theory is the theory that studies
how we can measure objects (assign a number to an object) so as to reflect a
relation on these objects. E.g., how can we assign numbers to physical objects so
as to reflect the relation "heavier than" ? That is, how to assign a number (called
weight) to each object so that "x's weight > y's weight" implies "x is heavier than
y" ? Additional properties may be required. For example, in the case of weight
measurement, one wishes that the number assigned to x and y taken together be
the sum of their individual weights.
Another example is that of distance. How to assign numbers to points in
space so as to reflect the relation "more distant than" with respect to some
reference point ? Contrary to the previous example, this one has several dimensions
(usually two or three: x, y or x, y, z or altitude, longitude, latitude, etc.). Each
object (point) is characterised by a performance (co-ordinate) in each dimension
and one tries to aggregate these performances into one indicator: the distance
to the reference point. This problem is at the core of geometry. Note that the
answer is not unique. Very often the Euclidean distance is chosen (assuming that
the shortest path between two points is the straight line). Sometimes, a Gaussian
distance is more relevant (when you consider points on the earth's surface, unless
you are a mole, the shortest path is no longer a straight line but a curve). In other
circumstances, the Manhattan distance is more appropriate (between two points
in Manhattan, if you are not flying, the shortest path is not a straight line nor a
curve, it is a succession of perpendicular straight lines). And there are many other
distances.
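The three aggregations mentioned above can be sketched in a few lines of Python. For the distance on the earth's surface we use the great-circle distance computed with the haversine formula, assuming a spherical earth of radius 6371 km; this is our choice of illustration and the original text does not name a formula. The function names and the sample points are ours.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Distance along perpendicular streets: sum of coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def great_circle(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Shortest path on a sphere (haversine formula), angles in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


origin, point = (0, 0), (3, 4)
print(euclidean(origin, point))  # 5.0
print(manhattan(origin, point))  # 7
# A quarter of the equator: from longitude 0 to longitude 90 at latitude 0.
print(round(great_circle(0, 0, 0, 90)))
```

The same pair of co-ordinates yields different numbers depending on the aggregation rule chosen, and each rule is the right one in its own context.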
As far as physical properties are concerned ("larger than", "warmer than", "faster
than", . . . ), the problem is easy: good measurements were carried out in Antiquity
without any theory of measurement. But when we consider other kinds of relations,
things are more complex. How to assign numbers to people or alternatives so as to
reflect the relations "more loveable than", "preferable to" or "more risky than" ?
In such cases, measurement theory can be of great assistance but is insufficient to
solve all problems.
In decision support, measuring objects with respect to the relation "is preferred
to" can be of some help because, once the objects have been measured, it is rather
easy to handle numbers. It is often assumed that a preference relation over the
alternatives exists but is not well known and one tries to measure the alternatives
so as to discover the preference relation. Sometimes, the preference relation is not
assumed to completely exist a priori. Preferences can emerge and evolve during
the decision aid process, but some characteristics of the preference relation still
exist a priori. Measurement theory can therefore be used to build or to analyse a
decision support method.
Many indices are built without assuming that a relation over the items exists a
priori and without trying to reflect a pre-existent relation. On the contrary,
it seems that, in many cases, the aim of an index is precisely to build or create
a relation over the items. Therefore, in such a case, measurement theory cannot
tell us much about the index. Measurement theory loses some of its power when
there is no a priori relation to be reflected.

Indicators and reality

The index does not help to uncover reality, that is, a pre-existent relation. It
institutes or settles reality (Desrosières 1995). This is very obvious with the decathlon
score. Between 1908 and 1962, the scores were designed to assess the performances
and to compare them. As one of the most important things for a professional
athlete is to win (contrary to the opinion of de Coubertin), the score was considered as
the true measure of performance. Any athlete that was not convinced of this had
to change his mind and to behave accordingly if he wanted to compete. This is
not particular to the decathlon score. Many governments probably try to exhibit
a good HDI for their country in order to keep international subsidies or to legitimise
their authority to the population of the country or foreign governments. Some city
councils, willing to attract high-salaried residents, claim, among other things, to have
high air quality. The most efficient way for them to make their claim credible is to
exhibit a good ATMO index (or any other index in countries other than France),
even if other policies might be more beneficial to the country.
One might be tempted to reject any indicator that does not reflect reality,
that, in some arbitrary way, institutes reality. Nevertheless, the indicators are
not useless. An indicator can be considered as a kind of language. It is based on
some (more or less necessarily arbitrary) conventions and helps us to efficiently
communicate about different topics or perform different tasks. By "efficiently",
we mean "more efficiently than without any language"; not necessarily "in the
most efficient way". Like any language, it is not always precise and leaves room
for ambiguities and contradictions. If the people that created the decathlon had
decided to wait until a sound theory showed them how to designate the winner, it
is very likely that no decathlon contest would ever have taken place.
But this does not mean that all indicators are equally good. Ambiguities and
contradictions are certainly appropriate in poetry; otherwise we could never enjoy
things like this:

Mis pasos en esta calle
Resuenan
en otra calle
donde
oigo mis pasos
pasar en esta calle
donde
Sólo es real la niebla 1

Wenn ich mich lehn' an deine Brust,
kommt's über mich wie Himmelslust;
doch wenn du sprichst: ich liebe dich!
so muss ich weinen bitterlich. 2

But, when it comes to decision-making, ambiguities and contradictions should
generally be kept at a minimum. When possible, they should be avoided. When
certain elements of preferences are known for sure, all indicators should reflect
them.
Back to multiple criteria decision support

In a decision aiding process, preferences are not perfectly known a priori. Other-
wise, it would be very unlikely that any aid would be required. Therefore, relying
solely on measurement theory is not possible. Most decision aiding processes, like
most indicators, probably cannot avoid some arbitrary elements. They can occur
at different steps of the process: the choice of an analyst, of the criteria, of the
aggregation scheme, to mention a few.
But unlike cases where indicators are built without any decision problem in
mind, most decision aiding processes relate to a more or less precisely defined de-
cision problem. Consequently, at least some elements of preferences are present.
Therefore, if some measurement (associating numbers to alternatives) is performed
during the aiding process, measurement theory can be used to ensure that the
model built during the aiding process does not contradict these elements of pref-
erences, that it reflects them and that all sound conclusions that can be drawn
from the conjunction of these elements are actually drawn.

1 Octavio Paz, "Here", translated by Nims (1990):
My footsteps in this / street / Re-echo / in another street / where / I hear my footsteps /
passing in this street / where / Nothing is real but the fog

2 Heinrich Heine, "Ich liebe dich", translated by Louis Untermeyer (van Doren 1928):
And when I lean upon your breast / My soul is soothed with godlike rest; / But when you
swear: I love but thee! / Then I must weep - and bitterly.

4.5 Conclusions
Among evaluation and decision models, indicators are probably more widespread
than any other model (this is definitely true if you think of cost-benefit analysis or
multiple criteria decision support). Student grades are also very popular, as
almost everyone has faced them at some point in his life. But, besides the fact that
most people use and/or encounter them, indicators are pervasive in many domains
of human activity, contrary to student grades, which are confined to education (note
that student grades could be considered as special cases of indicators).
Indicators are not often thought of as decision support models but, actually,
in many circumstances, they are. Indicators are usually presented as an efficient way
to synthesise information. But what do we need information for ? For making
decisions !
In this chapter, we analysed three different indicators: the human development
index, the ATMO (an air quality index) and the decathlon score.
On the one hand, all three indicators have been shown to present flaws: they
do not always reflect reality or what we consider as reality. This is due to an excess
or a lack of compensation, to non monotonicity, to an inability to deal with
dimension dependence, . . . These problems are not specific to indicators. Some of
them have already been discussed in Chapter 3 and/or will be met in Chapter 6.
On the other hand, we saw that an indicator does not necessarily need to reflect
reality or, at least, it does not need to reflect only reality.

5.1 Introduction
Decision-making inevitably implies, at some stage, the allocation of scarce resources
to some alternatives rather than to others (e.g. deciding how to use one's income).
It is therefore not at all surprising that the question of helping a decision-maker
to choose between competing alternatives, projects, courses of action and/or to
evaluate them, has attracted the attention of economists. Cost-Benefit Analysis
(CBA) is a set of techniques that economists have developed for this purpose. It
is based on the following simple and apparently inescapable idea: a project should
only be undertaken when its benefits outweigh its costs.
CBA is particularly oriented towards the evaluation of public sector projects.
Decisions made by governments, public agencies and firms or international organ-
isations are complex and have a huge variety of consequences. Some examples of
areas in which CBA has been applied will give a hint of the type of projects that
are evaluated:

Economics: determining investment strategies for developing countries,
allocating budgets among agencies, developing an energy policy for a nation
(Dinwiddy and Teal 1996, Kirkpatrick and Weiss 1996, Little and Mirlees
1968, Little and Mirlees 1974),

Transportation: building new roads or motorways (Willis, Garrod and
Harvey 1998), building a high-speed train, reorganising the bus lines in a
city (Adler 1987, Schofield 1989),

Health: building new hospitals, setting up prevention policies, buying new
diagnosis tools, choosing standard treatments for certain types of illnesses
(Folland, Goodman and Stano 1997, Johannesson 1996),

Environment: establishing pollution standards, creating national parks,
approving the human consumption of genetically-modified organisms or
irradiated food (Hanley and Spash 1993, International Atomic Energy Agency
1993, Johansson 1993, Toth 1997).


These types of decision are immensely complex. They affect our everyday
life and are likely to affect that of our children. Most economists view CBA as
the standard way of evaluating such projects and of supporting public decision-
making (numerous examples of practical studies using CBA can easily be found in
applied economics journals, e.g. American Journal of Agricultural Economics, En-
ergy Economics, Environment and Planning, Journal of Environmental Economics
and Management, Journal of Health Economics, Journal of Policy Analysis and
Management, Journal of Public Finance and Public Choice, Journal of Transport
Economics and Policy, Land Economics, Pharmaco-Economics, Public Budgeting
and Finance, Regional Science and Urban Economics, Water Resources Research).
Since fairly different approaches to these problems have been advocated,
it is important to have a clear idea of what CBA is; if the claim of economists
were perfectly well-founded, there would be hardly any need for other
decision/evaluation models.
Although it has distant origins (see Dupuit 1844), the development of CBA
has unsurprisingly coincided with the more active involvement of governments in
economic affairs that started after the Great Depression and climaxed after World
War II in the 1950s and 1960s. A good overview of the early history of CBA can
be found in Dasgupta and Pearce (1972). After having started in the USA in
the field of Water Resource Management (see Krutilla and Eckstein (1958) for
an overview of these pioneering developments), the principles of CBA were soon
adopted in other areas and countries, the UK being the first and most active one.
While research on (and applications of) CBA grew at a very fast rate during the
1950s and 1960s, the principles of CBA were entrenched in a series of very influential
manuals for project evaluation produced by several international organisations
(OECD: Little and Mirlees (1968), Little and Mirlees (1974), UNIDO: Dasgupta,
Marglin and Sen (1972) and, more recently, World Bank: Adler (1987), Asian
Development Bank: Kohli (1993)). In many countries nowadays, the law makes it
an obligation to evaluate projects using the principles of CBA. Research on CBA
is still active and economists have spent considerable time and energy in
investigating its foundations and refining the various tools that it requires in practical
applications (recent references include Boardman 1996, Brent 1996, Nas 1996).
It would be impossible to give a fair account of the immense literature on CBA
in a few pages. Although somewhat old, two excellent introductory references are
Dasgupta and Pearce (1972) and Lesourne (1975). Less ambitiously, we shall try
here to:

give a brief and informal account of the principles underlying CBA,

give an idea of how these principles are applied in practice,

give a few hints on the scope and limitations of CBA.

These three objectives structure the rest of this chapter into sections. Our
aim, while clearly not being to promote the use of CBA, is not to support the
nowadays-fashionable claim (especially among environmentalists) that CBA is an
outdated, useless technique either. In pointing out what we believe to be some
limitations of CBA, we only want to give arguments refuting the claim of some
economists that, under all circumstances, it is the only consistent way to support
decision/evaluation processes (Boiteux 1994).

5.2 The principles of CBA

5.2.1 Choosing between investment projects in private firms
The idea that a project should only be undertaken if its benefits outweigh its
costs is at the heart of CBA. This claim may seem so obvious that it need not
be discussed any further. It is of little practical content however unless we define
more precisely what "costs" and "benefits" are and how to evaluate and compare
them. Some discussion will therefore prove useful.
A simple starting point is to be found in the literature on Corporate Finance on
the choice between investment projects in private firms. An investment project
may usefully be seen as an operation in which money is spent today (the "costs"),
with the hope that this money will produce even more money (the "benefits").
A useful way to evaluate such an investment project is the following. First a
time horizon for its evaluation must be chosen. Although the very nature of the
project may dictate this choice (e.g. because after a certain date the law will change
or equipment will have to be replaced), in general the duration of the
project is chosen more or less conventionally, as the period of time for which it
seems reasonable and useful to perform the evaluation.
Although a continuous evaluation is theoretically possible, real-world appli-
cations imply dividing the duration of the project into time periods of equal length.
This involves some arbitrariness (should we choose years or semesters?) as well as
trade-offs between the depth and the complexity of the evaluation model.
Suppose now that a project is to be evaluated on T time periods of equal length.
The next step is to try to evaluate the consequences of the project in each of these
time periods. Such a task may be more or less easy depending on the nature of
the project, the environment of the firm and the duration of the project. We seek
to obtain an evaluation of the amount of cash that is generated by the project
during each time period, this amount being the difference between the benefits
and the expenses generated by the project (including the residual value of the
project in the last period). Note that these evaluations are relative: they aim at
capturing the influence of the project on the firm and not its overall situation.
Let us denote by b(i) (resp. c(i)) the benefits (resp. the expenses) generated by the
project during the ith period of time. The net effect of the project in period i is
therefore $a(i) = b(i) - c(i)$.
At this stage, the evaluation model of the project has the form of an evaluation
vector with T +1 components (a(0), a(1), . . . , a(T )) where 0 conventionally denotes
the starting time of the project. In general, some of the components of this vector
(most notably a(0)) will be negative (if not, you should enjoy the free lunch and
there is hardly any evaluation problem). Although all components of the evaluation
vector are expressed in identical monetary units (m.u.), the (algebraic) sum a(0)
is to be received today while a(1) will only be received one time period ahead.
Therefore these two numbers, although expressed in the same unit, are not directly
comparable. There is a simple way however to summarise the components of the
evaluation vector using a single number.
Suppose that there is a capital market on which the firm is able to lend or
borrow money at a fixed interest rate of r per time period (this market is assumed
to be perfect: borrowing and lending will not affect r and are not restricted). If
you borrow 1 m.u. for one time period on this market today, you will have to spend
(1 + r) m.u. in period 1 in order to respect your contract. Similarly, if you know
that you will receive 1 m.u. in period 1, you can borrow an amount of 1/(1 + r) m.u.
today: your revenue of 1 m.u. in period 1 will allow you to reimburse exactly what you
have to, i.e. (1 + r) × 1/(1 + r) = 1 m.u. Hence, being sure of receiving 1 m.u. in period
1 corresponds to receiving, here and now, an amount of 1/(1 + r) m.u. Using a similar
reasoning and taking into account compound interest, receiving 1 m.u. in period
i corresponds to an amount of 1/(1 + r)^i m.u. now. This is what is called discounting
and r is called the discounting rate.

This suggests a simple way of summarising the components of the vector
(a(0), a(1), . . . , a(T )) as the sum to be received now that is equivalent to this
cash stream via borrowing and lending operations on the capital market. This
sum, called the Net Present Value (NPV) of the project, is given by:

\[
\mathrm{NPV} = \sum_{i=0}^{T} \frac{a(i)}{(1+r)^{i}} = \sum_{i=0}^{T} \frac{b(i) - c(i)}{(1+r)^{i}} \tag{5.1}
\]

If NPV > 0, the cash stream of the project is equivalent to receiving money
now, i.e. taking into account the costs and the benefits of the project and their
dispersion in time, it appears that the project makes the firm richer and, thus,
should be undertaken. The reverse conclusion obviously holds if NPV < 0. When
NPV = 0, the firm is indifferent between undertaking the project or not.
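
As an illustration, formula 5.1 can be sketched in a few lines of Python. The function name npv and the cash stream used below are hypothetical; they only serve to show how the sign of the NPV, and hence the recommendation, may depend on the discounting rate r.

```python
from typing import Sequence


def npv(cash_flows: Sequence[float], r: float) -> float:
    """Net present value of the vector (a(0), a(1), ..., a(T)) at rate r per period."""
    return sum(a / (1.0 + r) ** i for i, a in enumerate(cash_flows))


# Hypothetical project: 100 m.u. spent now, 60 m.u. received in periods 1 and 2.
flows = [-100.0, 60.0, 60.0]

print(round(npv(flows, 0.10), 2))  # 4.13: the project would be undertaken
print(round(npv(flows, 0.15), 2))  # -2.46: the same project would be rejected
```

With these made-up figures the recommendation is reversed when r moves from 10% to 15%, which already hints at the importance of the hypotheses listed next.
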
This simple reasoning underlies the following well-known rule for choosing
between investment projects in Finance: when projects are independent, choose all
projects that have a strictly positive NPV. In deriving this simple rule, we have
made various hypotheses. Most notably:

a duration for the project was chosen,

the duration was divided into conveniently chosen time periods of equal length,

all consequences of the projects were supposed to be adequately modelled as
benefits b(i) and costs c(i) expressed in m.u. for each time period,

a perfect capital market was assumed to exist,

the effect of uncertainty and/or imprecision was neglected,


other possible constraints were ignored (e.g. projects may be exclusive, syn-

The literature in Finance is replete with extensions of this simple model that make
it possible to cope with less simplistic hypotheses.

5.2.2 From Corporate Finance to CBA

Although the projects that are usually evaluated using CBA are considerably more
complex than the ones we implicitly envisaged in the previous paragraph, CBA
may usefully be seen as using a direct extension of the rule used in Finance. The
main extensions are the following:

in CBA costs and benefits are evaluated from the point of view of society,

in CBA costs and benefits are not necessarily directly expressed in m.u.;
when this happens, conveniently chosen prices are used to convert them
into m.u.,

in CBA the discounting rate has to be chosen from the point of view of society.

Retaining the spirit of the notations used above, the benefits b(i) and costs c(i)
of a project in period i are seen in CBA as vectors with respectively ℓ and ℓ′
components:

\[
b(i) = \bigl(b(1, i), b(2, i), \ldots, b(\ell, i)\bigr),
\qquad
c(i) = \bigl(c(1, i), c(2, i), \ldots, c(\ell', i)\bigr)
\]

where b(j, i) (resp. c(k, i)) denotes the social benefits (resp. the social costs)
on the jth dimension (resp. on the kth dimension), evaluated in units that are
specific to that dimension, generated by the project in period i.
In each period, costs and benefits are converted into m.u. using suitably
chosen prices. We denote by p(j) (resp. p′(k)) the price of one unit of social
benefit on the jth dimension (resp. one unit of the social cost on the kth dimension)
expressed in m.u. (for simplicity, and consistently with real-world applications,
prices are assumed to be independent from the time period). These prices are
used to summarise the vectors b(i) and c(i) into single numbers expressed in m.u.:

\[
b(i) = \sum_{j=1}^{\ell} p(j)\, b(j, i),
\qquad
c(i) = \sum_{k=1}^{\ell'} p'(k)\, c(k, i)
\]

where b(i) (resp. c(i)) denotes the social benefits (resp. costs) generated by the
project in period i converted into m.u.
After this conversion and having suitably chosen a social discounting rate r,
it is possible to apply the standard discounting formula for computing the Net
Present Social Value (NPSV) of a project. We have:

\[
\mathrm{NPSV} = \sum_{i=0}^{T} \frac{b(i) - c(i)}{(1+r)^{i}}
= \sum_{i=0}^{T} \frac{\sum_{j=1}^{\ell} p(j)\, b(j, i) - \sum_{k=1}^{\ell'} p'(k)\, c(k, i)}{(1+r)^{i}} \tag{5.2}
\]

and a project where NPSV > 0 will be interpreted as improving the welfare
of society and, thus, should be implemented (in the absence of other constraints).
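
The two steps of formula 5.2 (pricing out each dimension, then discounting) can be sketched as follows. This is only an illustration under stated assumptions: the function npsv, the two benefit dimensions, the single cost dimension and all prices and quantities are hypothetical and do not come from any real study.

```python
from typing import Sequence


def npsv(benefits: Sequence[Sequence[float]],
         costs: Sequence[Sequence[float]],
         p: Sequence[float],
         p_prime: Sequence[float],
         r: float) -> float:
    """Net present social value, with benefits[i][j] = b(j, i) and costs[i][k] = c(k, i)."""
    total = 0.0
    for i, (b_i, c_i) in enumerate(zip(benefits, costs)):
        b_money = sum(price * qty for price, qty in zip(p, b_i))        # b(i) in m.u.
        c_money = sum(price * qty for price, qty in zip(p_prime, c_i))  # c(i) in m.u.
        total += (b_money - c_money) / (1.0 + r) ** i
    return total


# Hypothetical data for periods 0, 1, 2.
# Benefit dimensions: (hours saved, injuries avoided); cost dimension: (m.u. spent).
benefits = [[0.0, 0.0], [1000.0, 2.0], [1000.0, 2.0]]
costs = [[35000.0], [1000.0], [1000.0]]
p = [13.0, 5000.0]   # assumed price of one hour and of one injury avoided
p_prime = [1.0]      # costs are already expressed in m.u.

print(npsv(benefits, costs, p, p_prime, r=0.08) > 0)
```

Note that every choice made in this sketch (the dimensions retained, the prices, the rate) corresponds to one of the difficulties discussed in the text.
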
It should be observed that the difficulties that we mentioned concerning the
computation of the NPV are still present here. Extra difficulties are easily seen to
arise:

how can one evaluate benefits and costs from a social point of view?

is it always possible to measure the value of benefits and costs in monetary
units and how should the prices be chosen?

how is the social discount rate chosen?

It is apparent that CBA is a mono-criterion approach that uses money as
a yardstick. Clearly the foundations of such a method and the way of using it
in practice deserve to be clarified. Section 5.2.3 presents an elementary theoretical
model that helps to understand the foundations of CBA. It may be skipped
without loss of continuity.

5.2.3 Theoretical foundations

It is obviously impossible to give here a complete account of the vast literature on the
foundations of CBA, which has deep roots in Welfare Economics. We would
however like to give a hint of why CBA consistently insists on trying to price out
every effect of a project. The important point here is that CBA conducts project
evaluation within an environment in which markets are especially important
instruments of social co-ordination.

An elementary theoretical model

Consider a one-period economy in which m individuals consume n goods that are
exchanged on markets. Each individual j is supposed to have completely ordered
preferences for consumption bundles. These preferences can be conveniently represented
using a utility function $U_j(q_{j1}, q_{j2}, \ldots, q_{jn})$ where $q_{ji}$ denotes the quantity
of good i consumed by individual j.
Social preferences are supposed to be well-defined in terms of the preferences
of the individuals through a social utility function (or social welfare function)
$W(U_1, U_2, \ldots, U_m)$. It is useful to interpret W as representing the preferences of a
planner regarding the various social states.
Starting from an initial situation in the economy, consider a project, inter-
preted as an external shock to the economy, consisting in a modification of the
quantities of goods consumed by each individual. These modifications are sup-
posed to be marginal; they will not affect the prices of the various goods. The
impact of such a shock on social welfare is given by (assuming differentiability):

\[
dW = \sum_{j=1}^{m} \sum_{i=1}^{n} W_j\, U_{ji}\, dq_{ji} \tag{5.3}
\]

where $W_j = \partial W / \partial U_j$ and $U_{ji} = \partial U_j / \partial q_{ji}$.
Social welfare will increase following the shock if dW > 0.
The existence of markets for the various goods and the hypothesis that indi-
viduals operate on these markets so as to maximise utility ensure that, before the
shock, we have, for all individuals j and for all goods i and k:

\[
\frac{U_{ji}}{U_{jk}} = \frac{p_i}{p_k} \tag{5.4}
\]

where $p_i$ denotes the price of the ith good. Having chosen a particular good
as numeraire (we shall call that good money), this implies that:

\[
U_{ji} = \lambda_j\, p_i \tag{5.5}
\]

where $\lambda_j$ can be interpreted as the marginal effect on the utility of individual
j of a marginal variation of the consumption of the numeraire good, i.e. as the
marginal utility of income for individual j.
Using 5.5, 5.3 can be rewritten as:

\[
dW = \sum_{j=1}^{m} \sum_{i=1}^{n} \lambda_j\, W_j\, p_i\, dq_{ji} \tag{5.6}
\]

In equation 5.6, the coefficient $\lambda_j W_j$ has a useful interpretation: it represents
the increase in social welfare following a marginal increase of the income of individual j.
Under the hypothesis that, before the shock, the distribution of income is optimal
in the society, the conclusion is that the coefficients $\lambda_j W_j$ are constant over
individuals (otherwise income would have been reallocated in favour of individuals
for which $\lambda_j W_j$ is larger). Under this hypothesis, we may always normalise W
in such a way that $\lambda_j W_j = 1$, for all j. We therefore rewrite equation 5.6 as:

\[
dW = \sum_{j=1}^{m} \sum_{i=1}^{n} p_i\, dq_{ji} \tag{5.7}
\]

which amounts to saying that the social effects of the shock are measured as
the sum over individuals of the variation of their consumption evaluated at market
prices (i.e. the so-called consumer surplus). In this simple model, variations of
social welfare are therefore conveniently measured in money terms using market
prices.
Returning to CBA, the relation 5.7 coincides with the computation of the
NPSV when time is not an issue and the effects (costs or benefits) of a project
can be expressed in terms of consumption of goods exchanged on markets. The
general formula for computing the NPSV may be seen as an extension of 5.7
without these restrictions.
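
The role played by the hypothesis of an optimal distribution of income can be made concrete with a small numerical sketch of equations 5.6 and 5.7. The function welfare_change, the prices, the variations of consumption and the individual weights below are all hypothetical; the weights stand for the coefficients attached to each individual in equation 5.6.

```python
from typing import Optional, Sequence


def welfare_change(prices: Sequence[float],
                   dq: Sequence[Sequence[float]],
                   weights: Optional[Sequence[float]] = None) -> float:
    """dW = sum over individuals j of weights[j] * sum over goods i of prices[i] * dq[j][i].

    With weights omitted (all equal to 1) this is equation 5.7;
    with individual weights it corresponds to equation 5.6.
    """
    if weights is None:
        weights = [1.0] * len(dq)
    return sum(
        w * sum(p * d for p, d in zip(prices, dq_j))
        for w, dq_j in zip(weights, dq)
    )


prices = [1.0, 3.0]              # good 0 is the numeraire
dq = [[-2.0, 0.0], [0.0, 1.0]]   # individual 0 loses, individual 1 gains

print(welfare_change(prices, dq))                      # 1.0: dW > 0 with equal weights
print(welfare_change(prices, dq, weights=[2.0, 0.5]))  # -2.5: dW < 0 with unequal weights
```

In this made-up case the same shock is judged favourably or unfavourably depending on the weights, which is why the normalisation leading to 5.7 is far from innocuous.
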

Extensions and remarks

The limitations of the elementary model presented above are obvious. The most
important ones seem to be the following:

the model only deals with marginal changes in the economy,

the model considers a single-period economy without production,

the economy is closed (no imports or exports) and there is no government
(and in particular no taxes),

the distribution of income was assumed to be optimal.

In spite of all its limitations, our model allows us to understand, through the
simple derivation of equation 5.7, the rationale for trying to price out all effects of
a project in order to assess its contribution to social welfare.
A detailed treatment of the foundations of CBA without our simplifying hy-
potheses can be found in Dreze and Stern (1987). Although we shall not enter
into details, it should be emphasised that the theoretical foundations of CBA are
controversial on some important points. The appropriateness of equation 5.7 and
of related formulas is particularly clear in situations that are fairly different from
the ones in which CBA is currently used as an evaluation tool. These are often
characterised by:

non-marginal changes (think of the construction of a new underground line
in a city),

the presence of numerous public goods for which no market price is available
(think of health services or education),

the presence of numerous externalities (think of the pollution generated by
a new motorway),

markets in which competition is altered in many ways (monopolies, taxes,


effects that are highly complex and may concern a very long period of time
(think of a policy for storing used nuclear fuel),

effects that are very unevenly distributed among individuals and raise im-
portant equity concerns (think of your reaction if a new airport were to be
built close to your second residence in the middle of the countryside),

the overwhelming presence of uncertainty (technological changes, future prices,
long term effects of air pollution on health),

the difficulty of evaluating some effects in well-defined units (think of the
aesthetic value of the countryside) and, thus, of pricing them out.

In spite of these difficulties, CBA still mainly rests on the use of the NPSV (or
some of its extensions) to evaluate projects. Economists have indeed developed an
incredible variety of tools in order to use the NPSV even in situations in which
it would a priori seem difficult to do so. It is impossible to review here the immense
literature that these efforts have generated. It includes: the determination of
prices for goods without markets, e.g. contingent valuation techniques or hedonic
prices (see Scotchmer 1985, Loomis, Peterson, Champ, Brown and Lucero 1998),
the determination of an appropriate social discounting rate (useful references on
this controversial topic include Harvey 1992, Harvey 1994, Harvey 1995, Keeler
and Cretin 1983, Weitzman 1994), the inclusion of equity considerations in the
calculation of the NPSV (Brent 1984), the treatment of uncertainty, the consid-
eration of irreversible effects (e.g. through the use of option values). An overview
of this literature may be found in Sugden and Williams (1983) and in Zerbe and
Dively (1994). We will simply illustrate some of these points in section 5.3.

5.3 Some examples in transportation studies

Public investment in transportation facilities amounts to over 80 × 10^9 FRF annually
in France (around 14 × 10^9 USD or 14 × 10^9 €). CBA is presently the standard
evaluation technique for such projects. It is impossible to give a detailed account
of how CBA is currently applied in France for the evaluation of transportation
investment projects; this would take an entire book even for a project of moderate
importance. In order to illustrate the type of work involved in such studies, we
shall only take a few examples (for more details, see Boiteux (1994) and Syndi-
cat des Transports Parisiens (1998); a useful reference in English is Adler (1987))
based on a number of real-world applications. For concreteness, we shall envisage a
project consisting in the extension of an underground line in the suburbs of Paris.
Effects of such a project are clearly very diverse. We will concentrate on some of
them here, leaving direct financial effects aside (construction costs, maintenance
costs, exploitation costs) although their evaluation may raise problems.

5.3.1 Prevision of traffic

An inevitable step in all studies of this type is to forecast the modification of the
volume and the structure of the traffic that would follow the implementation of the
project. Its main benefits consist in time gains, which are obviously directly
related to traffic forecasts (time gains converted into m.u. frequently account for
more than 50% of the benefits of these types of projects).
Implementing such forecasting models is obviously an enormous task. Local
modifications in the offer of public transportation may have consequences on the
traffic in the whole region. Furthermore, such forecasts are usually made at an
early stage of development of the project, a stage in which all details (concerning
e.g. the tariffing of the new infrastructure or the frequency of the trains) may not
be completely decided yet.
Traffic forecast models usually involve highly complex modal choice modules
coupled with forecasting and/or simulation techniques. Their outputs are clearly
crucial for the rest of the study. Nearly all public transportation firms and gov-
ernmental agencies in France have developed their own tools for generating traffic
forecasts. They differ on many points, e.g. the statistical tools used for modal
choice or the segmentation of the population that is used (Boiteux 1994). Unsur-
prisingly these models lead to very different results.
As far as we know, all these models forecast the traffic for a period of time that
is not too distant from the installation of the new infrastructure. These forecasts
are then more or less mechanically updated (e.g. increased following the observed
rate of growth of the traffic in the past few years) in order to obtain figures for all
the periods of study. None of them seem to integrate the potential modifications
of behaviour of a significant proportion of the population in reaction to the new
infrastructure (e.g. by moving away from the centre of the city) whereas such
effects are well-known and have proved to be overwhelming in the past.
These models are not part of CBA and indicating their limitations should
not be seen as a criticism of CBA. Their results, however, form the basis of the
evaluation model.

5.3.2 Time gains

Traffic forecasts are used to evaluate the time that inhabitants of the Paris region
would gain with the extension of the metro line. Such evaluations, on top of being
technically rather involved, raise some basic difficulties:

is one minute equal to one minute? Such a question may not be as silly as
it seems. In most models time gains are evaluated on the basis of what is
called generalised time i.e. a measure of time that accounts for elements of
(dis)comfort of the journey (e.g. temperature, stairs to be climbed, a more
or less crowded environment). Although this seems reasonable, much less
effort has been devoted to models converting time into generalised time
than to the price of time that will be used afterwards,

is one hour worth 60 times one minute? Most models evaluating and pricing
out time gains are strictly linear. This is dubious since some gains (e.g. 10
seconds per user-day) might well be considered insignificant. Furthermore,
the loss of one hour daily for some users may have a much greater impact
than 60 losses of 1 minute,

what is the value of time and how should time gains be converted into monetary
units? Should we take the fact that people have different salaries into
account? Should we rather use prices based on stated preferences? Should
we take into account the fact that most surveys using stated preferences have
shown that the value of time highly depends on the motive of the journey
(being much lower for journeys not connected to work)?

The present practice in the Paris region is to linearly evaluate all (generalised)
time gains using the average hourly net salary in the Region (74 FRF/hour in 1994
or approximately 13 USD/hour or 13 €/hour). In view of the major uncertainties
surrounding traffic forecasts that are used to compute the time gains and the
arbitrariness of the price of time that is used, it does not seem unfair to consider
that such evaluations give, at best, interesting indications.
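
The linear practice described above, and the objection that very small gains might be insignificant, can be contrasted in a short sketch. Only the price of 74 FRF/hour comes from the text; the function names, the traffic figure and the threshold of one minute are hypothetical and chosen purely for illustration.

```python
VALUE_OF_TIME_FRF_PER_HOUR = 74.0  # average hourly net salary quoted in the text (1994)


def linear_time_benefit(minutes_saved_per_trip: float, trips_per_year: float) -> float:
    """Yearly benefit in FRF when every minute saved is priced identically."""
    hours_saved = minutes_saved_per_trip * trips_per_year / 60.0
    return hours_saved * VALUE_OF_TIME_FRF_PER_HOUR


def thresholded_time_benefit(minutes_saved_per_trip: float,
                             trips_per_year: float,
                             threshold_minutes: float = 1.0) -> float:
    """Illustrative non-linear variant: gains below an assumed threshold count for nothing."""
    if minutes_saved_per_trip < threshold_minutes:
        return 0.0
    return linear_time_benefit(minutes_saved_per_trip, trips_per_year)


# Hypothetical case: 10 seconds saved on each of 6 million trips per year.
print(round(linear_time_benefit(10 / 60, 6_000_000)))       # about 1.2 million FRF
print(round(thresholded_time_benefit(10 / 60, 6_000_000)))  # 0
```

The threshold rule is not a method used in the studies mentioned here; it merely shows how much the result may depend on the linearity hypothesis.
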

5.3.3 Security gains

Important benefits of projects in public transportation are security gains (hope-
fully, using the metro is far less risky than driving a car). A first step consists in
evaluating, based on traffic forecasts, the gain of security in terms of the number
of (statistical) deaths and serious injuries that would be avoided annually by the
project. The following one consists in converting these figures into monetary units
through the use of a price for human life. The following figures are presently
used in France (in 1993 FRF; they should be divided by a little less than 6 in order
to obtain 1993 USD):

Death            3 600 000 FRF
Serious injury     370 000 FRF
Other injury        79 000 FRF

these figures being based on several stated preference studies (it is not without
interest to note that these figures were quite different before 1993, human life
being, at that time, valued at 1 866 000 FRF). Using these figures and combining
them with statistical information concerning the occurrence of car accidents and
their severity, leads to benefits in terms of security which amount to 0.08 FRF per
vehicle-km avoided in the Paris region.
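
The conversion step can be sketched as follows. The prices are the ones given in the table above; the numbers of casualties avoided are hypothetical, and replacing only the price of a death by its pre-1993 value while keeping the other prices unchanged is our own assumption, made for the sake of illustration.

```python
PRICES_FRF_1993 = {
    "death": 3_600_000,
    "serious_injury": 370_000,
    "other_injury": 79_000,
}


def security_benefit(avoided_per_year: dict, prices: dict = PRICES_FRF_1993) -> int:
    """Yearly security gain in FRF: casualties avoided weighted by their price."""
    return sum(prices[kind] * number for kind, number in avoided_per_year.items())


# Hypothetical forecast of (statistical) casualties avoided each year by a project.
avoided = {"death": 2, "serious_injury": 15, "other_injury": 60}

print(security_benefit(avoided))  # 17490000

# Same forecast with the pre-1993 price of human life (other prices kept, by assumption).
old_prices = dict(PRICES_FRF_1993, death=1_866_000)
print(security_benefit(avoided, old_prices))  # 14022000
```
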
Although this might not appear as a very pleasant subject of study, econo-
mists have developed many different methods for evaluating the value of human
life, including methods based on human capital, the value of life insurance con-
tracts, sums granted by courts following accidents, stated preference approaches,
revealed preference approaches including smoking and driving behaviour, wages
for activities involving risk (Viscusi 1992). Besides raising serious ethical difficulties
(Broome 1985), these studies exhibit incredible variations across techniques
and across seemingly similar countries (this explains why in many medical studies,
in which benefits mainly include lives saved, cost-effectiveness analysis
is often preferred to CBA since it does not require pricing out human life; see
Johannesson 1995, Weinstein and Stason 1977). We reproduce below some significant
figures for the value of life used in several European countries (this table
is adapted from Syndicat des Transports Parisiens 1998); all figures are in 1993
European Currency Units (ECU), one 1993 ECU being approximately one 1993 USD:

Country     Price of human life
Denmark         628 147 ECU
Finland       1 414 200 ECU
France          600 000 ECU
Germany         406 672 ECU
Portugal         78 230 ECU
Spain           100 529 ECU
Sweden          984 940 ECU
UK              935 149 ECU

5.3.4 Other effects and remarks

The inclusion of other effects in the computation of the NPSV of a project in such
studies raises difficulties similar to the ones mentioned for time gains and security
gains. Their evaluation is subject to much uncertainty and inaccurate determina-
tion. Moreover the prices that are used to convert them into monetary units can
be obtained using many different methods leading to significantly different results.
As is apparent in Syndicat des Transports Parisiens (1998), prices used to
monetarise effects like:
local air pollution,
contribution to the greenhouse effect,
are mainly conventional.
The social discounting rate used for such projects is determined by the govern-
ment (the Commissariat General du Plan). Presently a rate of 8% is used (note
that this rate is about twice as high as the rate commonly used in Germany). A
period of evaluation of 30 years is recommended for this type of project.
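
To give an idea of what such a rate implies, the sketch below computes the weight given to 1 m.u. received in period i. The rate of 8% and the horizon of 30 years come from the text; the rate of 4% is only our reading of "about half" of the French rate, and the function name is hypothetical.

```python
def discount_weight(r: float, i: int) -> float:
    """Present value of 1 m.u. received in period i at rate r per period."""
    return 1.0 / (1.0 + r) ** i


for r in (0.04, 0.08):
    for i in (10, 30, 60):
        print(f"r = {r:.0%}, period {i:2d}: weight = {discount_weight(r, i):.4f}")
```

At 8%, an effect occurring at the end of the 30-year period of evaluation receives a weight of roughly 0.10, and an effect occurring after 60 years a weight of roughly 0.01.
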
The conclusions and recommendations of a recent official report (Boiteux 1994)
on the evaluation of public transportation projects stated that:

although CBA has limitations, it remains the best way to evaluate such
projects,

all effects that can reasonably be monetarised should be included in the
computation of the NPSV,

all other effects should be described verbally. Monetarised effects and
non-monetarised ones should not be included in a common table that would
give the same status and, implicitly, importance to all. A multiple criteria
presentation would furthermore attribute an unwarranted scientific value to
such tables,

extensive sensitivity analyses should be conducted,

all public firms and administrations should use a similar methodology in
order to allow meaningful comparisons,

an independent group of CBA experts should evaluate all important projects,

CBA studies should remain as transparent as possible.

In view of:

the immense complexity of such evaluation studies,

the unavoidable elements of uncertainty and inaccurate determination entering
the evaluation model,

the rather unconvincing foundations of CBA for this type of project,

the conclusion that CBA remains the best method seems unwarranted. CBA
has often been criticised on purely ideological grounds, which seems ridiculous.
However the insistence on seeing CBA as a "scientific", "rational" and "objective"
evaluation model, all words that are frequently spotted in texts on CBA (Boiteux
1994), seems no more convincing.

5.4 Conclusions
CBA is an important decision/evaluation method. We would like to note in par-
ticular that:

it has a sound, although limited and controversial on some points, theoretical
basis. Contrary to many other decision/evaluation methods that are more or
less ad hoc, the users of CBA can rely on more than 50 years of theoretical
and practical investigations,

CBA emphasises the fact that decision and/or evaluation methods are not
context-free. Having emerged from economics, it is not surprising that mar-
kets and prices are viewed as the essential parts of the environment in CBA.
More generally, any decision/evaluation method that would claim to be
context-free would seem of limited interest to us,

CBA emphasises the need for consistency in decision-making. It aims at
providing simple tools that make it possible, in a decentralised way, to ensure a
minimal consistency between decisions taken by various public bodies. Any
decision/evaluation model should tackle this problem,

CBA explicitly acknowledges that the effects of a project may be diverse
and that all effects should be taken into account in the model. In view of
the popularity of purely financial analyses for public sector projects, this is
worth recalling (Johannesson 1995),

although the implementation of CBA may involve highly complex models
(e.g. traffic forecasts), the underlying logic of the method is simple and easily
understood,

CBA is a formal method of decision/evaluation. It is the belief and experience
of the authors of this book that such methods may have a highly
beneficial impact on the treatment of highly complex questions. Although
other means of evaluation and of social co-ordination (e.g. negotiation, elec-
tions, exercise of power) clearly exist, formal methods based on an explicit
logic can provide invaluable contributions allowing sensitivity analyses, pro-
moting constructive dialogue and pointing out crucial issues.
We already mentioned that we disagree with the view held by some economists
that CBA is the only "rational", "scientific" and "objective" method for helping
decision-makers (such views are explicitly or implicitly present in Boiteux (1994)
or Mishan (1982)). We strongly recommend Dorfman (1996) as an antidote to this
radical position.
We shall stress here why we think that decision/evaluation models should not
be confused with CBA:
supporting decision/evaluation processes involves many more activities than
just evaluation. As we shall see in chapter 9, formulation is a basic
activity of any analyst. The determination of the frontiers of the study and
of the various stakeholders, the modelling of their objectives, the invention
of alternatives, form an important (we would tend to say a crucial) part of
any decision/evaluation support study. CBA offers little help at this stage.
Even worse, too radical an interpretation of CBA might lead (Dorfman 1996)
to an excessive attention given to monetarisation, which may be detrimental
to an adequate formulation,

having sound theoretical foundations, as CBA does, is probably a necessary
but insufficient condition to build useful decision/evaluation tools (let alone
the best ones). A recurrent theme in OR is that a successful implementation
of a model is contingent on many other factors than just the quality
of the underlying method. Creativity, flexibility and reactivity are essential
ingredients of the process. They do not always seem to be compatible
with too rigid a view of what a good decision/evaluation model should
be. Furthermore, the foundations of CBA are especially strong in situations
that are at variance with the usual context of public sector projects:
non-marginal changes, public goods, externalities are indeed pervasive (see
Brekke 1997, Holland 1995, Laslett 1995),

a decision/evaluation tool will be all the more useful as it lends itself
easily to an insertion into a decision process. Decision processes involving
public sector projects are usually extremely complex. They last for years
and involve many stakeholders generally having conflicting objectives. CBA
tries to summarise the effects of complex projects into a single number. The
complex calculations leading to the NPSV use a huge amount of data with
varying levels of credibility. Merging rather uncontroversial information (e.g.
the number of deaths per vehicle-km in a given area) with much more sensitive
and debatable information (e.g. the price of human life) from the start might
not give many opportunities to stakeholders for reaching partial agreements
and/or for starting negotiations. This might also result in a model that might
not appear transparent enough to be really convincing (Nyborg 1998),

CBA is a mono-criterion approach. Although this allows outputs to be produced
in simple terms (the NPSV) it might be argued that the efforts that have to
be made in order to monetarise all effects may not always be needed. On the
basis of less ambitious methods, it is not unlikely that some projects may be
easily discarded and/or that some clearly superior project will emerge. Even
when monetarisation is reasonably possible, it may not always be necessary,

in CBA the use of prices supposedly revealed by markets (most often
in market-like mechanisms) tends to obscure the implicit weighting of the
various effects of a project. This leaves little room for political debate, which
might be an incentive for some stakeholders to simply discard CBA,

the additive linear structure of the implicit aggregation rule used in CBA
can be subjected to the familiar criticisms already mentioned in chapters 3
and 4. Probably all users of CBA would agree that an accident killing 10 000
people might result in a dramatic situation in which the costs incurred
have little relation with the costs of 10 000 accidents each resulting in one
loss of life (think of a serious nuclear accident compared to ordinary car
accidents). Similarly, they might be prepared to accept that there may exist
air pollution levels above which all mammal life on earth could be endangered
and that although these levels are multiples of those currently manipulated
in the evaluation of transportation projects, they may have to be priced out
quite differently. If there are limits to linearity, CBA offers almost no clue
as to where to place these limits. It would seem to be a heroic hypothesis to
suppose that such limits are simply never reached in practice,

the implicit position of CBA vis-a-vis distributional considerations is puzzling.
Although the possibility of including in the computation of the NPSV
individual weights (capturing a different impact on social welfare of individual
variations of income) exists (Brent 1984), it is hardly ever used in practice.
Furthermore, this possibility is at much variance with more subtle views
on equity and distributional considerations (see Fishburn 1984, Fishburn and
Sarin 1991, Fishburn and Sarin 1994, Fishburn and Straffin 1989, Gafni and
Birch 1997, Schneider, Schieber, Eeckhoudt and Gollier 1997, Weymark 1981),

the use of a simple social discounting rate as a surrogate for taking a clear
position on inter-generational equity issues is open to discussion. Even ac-
cepting the rather optimistic view of a continuous increase of welfare and of
technical innovation, taking decisions today that will have important conse-
quences in 1000 years (think of the storage of used nuclear fuel) while using a
method that gives almost no weight to what will happen 60 years from now
($1.08^{-60} \approx 1\%$) seems debatable (see Harvey 1992, Harvey 1994, Weitzman

the very idea that social preferences exist is open to question. We showed
in chapter 2 that elections were not likely to give rise to such a concept.
It seems hard to think of other forms of social co-ordination that could do
much better. We doubt that markets are such particular institutions that
they always make it possible to solve or bypass the problem in an undebatable way. But
if social preferences are ill-defined, the meaning of the NPSV of a project
is far from being obvious. We would argue that it gives, at best, a partial
and highly conventional view of the desirability of the project,
decision/evaluation models can hardly lead to convincing conclusions if el-
ements of uncertainty and inaccurate determination entering the model are
not explicitly dealt with. This is especially true in the context of the eval-
uation of public sector projects. Practical texts on CBA always insist on
the need for sensitivity analysis before coming to conclusions and recom-
mendations. Due to the amount of data of varying quality included in the
computation of the NPSV, sensitivity analysis is often restricted to studying
the impact of the variation of a few parameters on the NPSV, one parameter
varying at a time. This is rather far from what we could expect in such
situations; a true robustness analysis should combine simultaneous variations
of all parameters in a given domain.

These limitations should not be interpreted as implying a condemnation of
CBA. We consider them as arguments showing that, in spite of its many qual-
ities, CBA is far from exhausting the activity of supporting decision/evaluation
processes (Watson 1981). We are afraid to say that if you disagree on this point,
you might find the rest of this book of extremely limited interest. On the other
hand, if you expect to discover in the next chapters formal decision/evaluation
tools and methodologies that would solve all problems and avoid all difficulties
you should also realise that your chances of being disappointed are very high.

6.1 Thierry's choice

How to choose a car is probably the multiple criteria problem example that has
been most frequently used to illustrate the virtues and possible pitfalls of multiple
criteria decision aiding methods. The main advantage of this example is that the
problem is familiar to most of us (except for one of the authors of this book who is
definitely opposed to owning a car) and it is especially appealing for male decision-
makers and analysts for some psychological reason. However, one can object that
in many illustrations, the problem is too roughly stated to be meaningful; the
motivations, needs, desires and/or fantasies of the potential buyer of a new or
second-hand car can be so diverse that it will be very difficult to establish a list
of relevant points of view and build criteria on which everybody would agree; the
price, for instance, is a very delicate criterion since the amount of money the buyer
is ready to spend clearly depends on his social condition. The relative importance
of the criteria also very much depends on the personal characteristics of the buyer:
there are various "ideal types" of car buyers, for instance people who like sporty
driving, or large comfortable cars, or reliable cars, or cars that are cheap to run.
One point should be made very clear: it is unlikely that a car could be universally
recognised as the best, even if one restricts oneself to a segment of the market;
this is a consequence of the existence of decision-makers with many different value
systems.

Despite these facts, we have chosen to use the "Choosing a car" example,
in a properly defined context, for illustrating the hypotheses underlying various
elementary methods for modelling and aggregating evaluations in a decision aiding
process. The case is simple enough to allow for a short but complete description;
it also offers sufficient potential for reasoning on quite general problems raised by
the treatment of multi-dimensional data in view of decision and evaluation. We
describe the context of the case below and will invoke it throughout this chapter
for illustrating a sample of decision aiding methods.


Trademark and type

1 Fiat Tipo 20 ie 16V
2 Alfa 33 17 16V
3 Nissan Sunny 20 GTI 16
4 Mazda 323 GRSI
5 Mitsubishi Colt GTI
6 Toyota Corolla GTI 16
7 Honda Civic VTI 16
8 Opel Astra GSI 16
9 Ford Escort RS 2000
10 Renault 19 16S
11 Peugeot 309 GTI 16V
12 Peugeot 309 GTI
13 Mitsubishi Galant GTI 16
14 Renault 21 20 turbo

Table 6.1: List of the cars selected as alternatives

6.1.1 Description of the case

Our example is adapted from an unpublished report by a Belgian engineering
student who describes how he decided which car he would buy. The story dates
back to 1993; our student, call him Thierry, aged 21, is passionate about sports
cars and driving (he has taken lessons in sports car driving and participates in car
races). Being a student, he can afford neither a new car nor a luxury
second-hand sports car; so he decides to explore the middle range segment, 4-year-old
cars with powerful engines. Thierry intends to use the car in everyday life and
occasionally in competitions. His strategy is first to select the make and type of
the car on the basis of its characteristics, estimated costs and performances, then
to look for such a car in second-hand car sale advertisements. This is what he
actually did, finding "the rare pearl" about twelve months after he made up his
mind as to which car he wanted.

Selecting the alternatives

The initial list of alternatives was selected taking an additional feature into ac-
count. Thierry lives in town and does not have a garage to park the car in at
night. So he does not want a car that would be too attractive to thieves. This
explains why he discards cars like VW Golf GTI or Honda CRX. He thus limits
his selection of alternatives to the 14 cars listed in Table 6.1.
Selecting the relevant points of view and looking for or constructing indices
that reflect the performances of the alternatives for each of the viewpoints often
constitutes a long and delicate task; it is moreover a crucial one since the quality
of the modelling will determine the relevance of the model as a decision aiding
tool. Many authors have advocated a hierarchical approach to criteria building,
each viewpoint being decomposed into sub-points that can be further decomposed
(Keeney and Raiffa (1976), Saaty (1980)). A thorough analysis of the properties
required of the family of criteria selected in any particular context (consistent
family, i.e. exhaustive, non-redundant and monotonic) can be found in Roy and
Bouyssou (1993) (see also Bouyssou (1990), for a survey).
We shall not emphasise the process of selecting viewpoints in this chapter,
although it is a matter of importance. It is sufficient to say that Thierry's concerns
are very particular and that he accordingly selected five viewpoints related to cost
(criterion 1), performance of the engine (criteria 2 and 3) and safety (criteria 4
and 5).
Evaluations of the cars on these viewpoints have been obtained from monthly
journals specialised in the benchmarking of cars. The official quotation of second
hand vehicles of various ages is also published in such journals.

Evaluating the alternatives

Evaluating the expenses incurred by buying and using a specific car is not as
straightforward as it may seem. Large variations from the estimation may oc-
cur due to several uncertainty and risk factors such as actual life-length of the
car, actual selling price (in contrast to the official quotation), actual mileage per
year, etc. Thierry evaluates the expenses as the sum of an initial fixed cost and
expenses resulting from using the car. The fixed costs are the amount paid for
buying the car, estimated by the official quotation of the 4-year old vehicle, plus
various taxes. The yearly costs involve another tax, insurance and petrol consump-
tion. Maintenance costs are considered roughly independent of the car and hence
neglected. Petrol consumption is estimated on the basis of three figures that are
highly conventional: the number of litres of petrol burned in 100 km is taken from
the magazine benchmarks; Thierry somehow estimates his mileage at 12 000 km
per year and the price of petrol at 0.9 € per litre (1 €, the European currency
unit, is approximately equivalent to 1 USD). Finally he expects (hopes) to use the
car for 4 years. On the basis of these hypotheses he gets the estimations of his
expenses for using the car during 4 years that are reported in Table 6.2 (Criterion
1 = Cost). The resale value of the car after 8 years is not taken into account due
to the high risk of accidents resulting from Thierry's offensive driving style. Note
that the petrol consumption cost which is estimated with a rather high degree of
imprecision counts for about one third of the total cost. The purchase cost is also
highly uncertain.
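
The petrol component of criterion 1 follows directly from the three conventional
figures quoted above. The following is a minimal sketch in Python; the consumption
of 10 litres per 100 km is a purely hypothetical value (the per-car consumption
figures are not reproduced in the text) and the function name petrol_cost is ours.

```python
def petrol_cost(litres_per_100km, km_per_year=12_000, price_per_litre=0.9, years=4):
    """Petrol expense (in euros) over the whole period of use."""
    return litres_per_100km / 100 * km_per_year * price_per_litre * years


# Hypothetical consumption; mileage, price and duration are Thierry's estimates.
base = petrol_cost(10)
# Same car if the yearly mileage turns out to be 15 000 km instead of 12 000 km.
higher_mileage = petrol_cost(10, km_per_year=15_000)

print(round(base), round(higher_mileage))
```

Changing a single conventional figure (here the mileage) shifts the petrol estimate
by a quarter, which illustrates why this part of the cost criterion should be read
with caution.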
For building the other criteria Thierry has a large number of performance in-
dices whose value is to be found in the magazine benchmarks at his disposal.
Thierry's particular interest in sporty cars is reflected in his definition of the other
criteria. Car performances are evaluated by their acceleration; criterion 2 ("Accel"
in Table 6.2) encodes the time (in seconds) needed to cover a distance of one
kilometre starting from rest. One could alternatively have taken other indicators such
as power of the engine, time needed to reach a speed of 100 km/h or to cover 400
metres that are also widely available. Some of these values may be imprecisely
determined: they may be biased when provided by the car manufacturer (the
procedures for evaluating petrol consumption are standardised but usually
underestimate the actual consumption for everyday use); when provided by specialised
journalists in magazines, the procedures for measuring are generally unspecified
and might vary since the cars are not all evaluated by the same person.

Nr  Name of car          Crit1      Crit2       Crit3         Crit4    Crit5
                         Cost (€)   Accel (s)   Pick up (s)   Brakes   Road-h
 1  Fiat Tipo            18 342     30.7        37.2          2.33     3
 2  Alfa 33              15 335     30.2        41.6          2        2.5
 3  Nissan Sunny         16 973     29          34.9          2.66     2.5
 4  Mazda 323            15 460     30.4        35.8          1.66     1.5
 5  Mitsubishi Colt      15 131     29.7        35.6          1.66     1.75
 6  Toyota Corolla       13 841     30.8        36.5          1.33     2
 7  Honda Civic          18 971     28          35.6          2.33     2
 8  Opel Astra           18 319     28.9        35.3          1.66     2
 9  Ford Escort          19 800     29.4        34.7          2        1.75
10  Renault 19           16 966     30          37.7          2.33     3.25
11  Peugeot 309 16V      17 537     28.3        34.8          2.33     2.75
12  Peugeot 309          15 980     29.6        35.3          2.33     2.75
13  Mitsubishi Galant    17 219     30.2        36.9          1.66     1.25
14  Renault 21           21 334     28.9        36.7          2        2.25

Table 6.2: Data of the "choosing a car" problem

The third criterion that Thierry took into consideration is linked with the
"pick up" or suppleness of the engine in urban traffic; this dimension is considered
important since Thierry also intends to use his car in normal traffic. The indicator
selected to measure this dimension ("Pick up" in Table 6.2) is the time (in seconds)
needed for covering one kilometre when starting in fifth gear at 40 km/h. Again
other indicators could have been chosen (e.g. the torque). This dimension is not
independent of the second criterion, since they are generally positively correlated
(powerful engines generally lead to quick response times on both criteria); cars
that are specially prepared for competition may however lack suppleness in low
operation conditions which is quite unpleasant in urban traffic. So, from the point
of view of the user, i.e. in terms of preferences, criteria 2 and 3 reflect different
requirements and are thus both necessary. For a short discussion about the notions
of independence and interaction, the reader is referred to Section 6.2.4.
In the magazine's evaluation report, several other dimensions are investigated
such as comfort, brakes, road-holding behaviour, equipment, body, boot, finish,
maintenance, etc. For each of these, a number of aspects are considered: 10 for
comfort, 3 for brakes, 4 for road-holding, . . . . In view of Thierry's particular
motivations, only the qualities of braking and of road-holding are of concern to
him and lead to the building of criteria 4 and 5 (resp. "Brakes" and "Road-h"
in Table 6.2). The 3 or 4 partial aspects of each viewpoint are evaluated on an
ordinal scale the levels of which are labelled "serious deficiency", "below average",
"average", "above average", "exceptional". To get an overall indicator of braking
quality (and also for road-holding), Thierry re-codes the ordinal levels with integers
from 0 to 4 and takes the arithmetic mean of the 3 or 4 numbers; this results in
the figures with 2 decimals provided in the last two columns of Table 6.2. Obviously
these numbers are also imprecise, not necessarily because of imprecision in the
evaluations but because of the arbitrary character of the cardinal re-coding of
the ordinal information and its aggregation via an arithmetic mean (postulating
implicitly that, in some sense, the 3 or 4 components of each viewpoint are equally
important and the levels of each scale are equally spaced). We shall
however consider that these figures reflect, in some way, the behaviour of each car
from the corresponding viewpoint; it is clear however that not too much confidence
should be awarded to the precision of these evaluations.
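
The re-coding just described can be sketched as follows. The two rating profiles
are hypothetical (the magazine's partial ratings are not given in the text); they
merely show that with 3 aspects the indicator is a multiple of 1/3 and with 4
aspects a multiple of 1/4, which is the pattern visible in the last two columns of
Table 6.2. The names LEVELS, CODE and indicator are ours.

```python
LEVELS = ["serious deficiency", "below average", "average",
          "above average", "exceptional"]
# Equally spaced integer codes 0..4, as in Thierry's re-coding.
CODE = {label: value for value, label in enumerate(LEVELS)}


def indicator(ratings):
    """Arithmetic mean of the integer codes of the partial ratings."""
    return sum(CODE[r] for r in ratings) / len(ratings)


# Hypothetical partial ratings for one car.
brakes = ["above average", "average", "average"]                      # 3 aspects
road_holding = ["above average", "above average", "above average",
                "average"]                                            # 4 aspects

print(round(indicator(brakes), 2), round(indicator(road_holding), 2))
```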
Note that the first 3 criteria have to be minimised while the last 2 must be
maximised.
This completes the description of the data which, obviously, are not "given"
but selected and elaborated on the basis of the available information. Being in-
trinsically part of this data is an appreciation (more or less explicit) of their degree
of precision and their reliability.

6.1.2 Reasoning with preferences

In the second part of the presentation of this case, Thierry will provide information
about his preferences. In fact, in the relatively simple decision situation he was
facing (no wife, no boss, Thierry decides for himself and the consequences of his
decision should not affect him crucially), he was able to make up his mind without
using any formal aggregation method. Let us follow his reasoning.
First of all he built a graphic representation of the data. Many types of
representations can be thought of; popular spreadsheet software offers a large number
of graphical options for representing multi-dimensional data. Figure 6.1 shows
such a representation. Note that the evaluations for the various criteria have been
re-scaled in view of a better readability of the figure. The values for all criteria
have been mapped (linearly) onto intervals of length 2, the first criterion being
represented in the [0, 2] interval, the second criterion, in the [2, 4] interval and so
on. For each criterion, the lowest evaluation observed for the sample of cars is
mapped on the lower bound of the interval while the highest value is represented
on the upper bound of the interval. Such a transformation of the data is not
always innocent; we briefly discuss this point below.
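
The linear re-scaling used for the figure can be sketched as below, here for
criterion 2 mapped onto the [2, 4] interval. The function name rescale is ours. The
second call restricts the sample to cars 3, 7, 11 and 12 and is only meant to
illustrate that the picture of a given car depends on the set of cars used to fix
the bounds.

```python
def rescale(values, low, high):
    """Map values linearly so that min(values) -> low and max(values) -> high."""
    vmin, vmax = min(values), max(values)
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


# Criterion 2 (acceleration, in seconds) for cars 1 to 14 of Table 6.2.
accel = [30.7, 30.2, 29, 30.4, 29.7, 30.8, 28, 28.9,
         29.4, 30, 28.3, 29.6, 30.2, 28.9]

all_cars = rescale(accel, 2, 4)                    # bounds fixed by the 14 cars
candidates = rescale([29, 28, 28.3, 29.6], 2, 4)   # bounds fixed by cars 3, 7, 11, 12

# Car 3 (Nissan): same raw value of 29 s, two different positions on the axis.
print(round(all_cars[2], 2), round(candidates[0], 2))
```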
In view of reaching a decision, Thierry first discards the cars whose braking
efficiency and road-holding behaviour are definitely unsatisfactory, i.e. car numbers
4, 5, 6, 8, 9, 13. The reason for such an elimination is that a powerful engine is
useless in competition if the chassis is not good enough and does not guarantee
good road-holding; efficient brakes are also needed to keep the risk inherent to
competition at a reasonable level. The rules for discarding the above mentioned
cars have not been made explicit by Thierry in terms of unattained levels on the
corresponding scales. Rules that would restate the set of remaining cars are for
instance:

criterion 4 ≥ 2
criterion 5 ≥ 2

with at least one strict inequality.

[Figure 6.1 panels: "Criteria to be minimised" (above) and "Criteria to be maximised" (below)]

Figure 6.1: Performance diagram of all cars along the first three criteria (above;
to be minimised) and the last two (below; to be maximised)

Looking at the performances of the remaining cars, those labelled 1, 2, 10 are
further discarded. The set of remaining cars is restated for instance by the rule:

criterion 2 < 30

Finally, the car labelled 14 is eliminated since it is dominated by car number 11.
"Dominated by car 11" means that car 11 is at least as good on all criteria and
better on at least one criterion (here all of them!). Notice that car number 14
would not have been dominated if other criteria had been taken into consideration
such as comfort or size: this car is indeed bigger and more classy than the other
cars in the sample.
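
The screening rules and the dominance check described above can be replayed on the
data of Table 6.2. This is only a sketch: Thierry did not state his rules
explicitly, so the thresholds below are those suggested in the text as restating
his choices, and the function names are ours.

```python
# Table 6.2: car number -> (cost, accel, pick up, brakes, road-holding)
CARS = {
    1: (18342, 30.7, 37.2, 2.33, 3.0),
    2: (15335, 30.2, 41.6, 2.0, 2.5),
    3: (16973, 29.0, 34.9, 2.66, 2.5),
    4: (15460, 30.4, 35.8, 1.66, 1.5),
    5: (15131, 29.7, 35.6, 1.66, 1.75),
    6: (13841, 30.8, 36.5, 1.33, 2.0),
    7: (18971, 28.0, 35.6, 2.33, 2.0),
    8: (18319, 28.9, 35.3, 1.66, 2.0),
    9: (19800, 29.4, 34.7, 2.0, 1.75),
    10: (16966, 30.0, 37.7, 2.33, 3.25),
    11: (17537, 28.3, 34.8, 2.33, 2.75),
    12: (15980, 29.6, 35.3, 2.33, 2.75),
    13: (17219, 30.2, 36.9, 1.66, 1.25),
    14: (21334, 28.9, 36.7, 2.0, 2.25),
}


def passes_safety(car):
    """Criteria 4 and 5 both at least 2, with at least one strict inequality."""
    brakes, road = car[3], car[4]
    return brakes >= 2 and road >= 2 and (brakes > 2 or road > 2)


def passes_acceleration(car):
    """Criterion 2 strictly below 30 seconds."""
    return car[1] < 30


def dominates(a, b):
    """a is at least as good as b on all criteria and strictly better on one.

    Criteria 1 to 3 are minimised, criteria 4 and 5 are maximised."""
    sa = (-a[0], -a[1], -a[2], a[3], a[4])
    sb = (-b[0], -b[1], -b[2], b[3], b[4])
    return all(x >= y for x, y in zip(sa, sb)) and any(x > y for x, y in zip(sa, sb))


step1 = {nr for nr, car in CARS.items() if passes_safety(car)}
step2 = {nr for nr in step1 if passes_acceleration(CARS[nr])}
step3 = {nr for nr in step2
         if not any(dominates(CARS[m], CARS[nr]) for m in step2 if m != nr)}

print(sorted(set(CARS) - step1))  # cars discarded on braking and road-holding
print(sorted(step1 - step2))      # cars discarded on acceleration
print(sorted(step2 - step3))      # dominated cars
print(sorted(step3))              # remaining candidates
```

The rules are combined conjunctively: a car is kept only if it passes every test in
turn.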
The cars left after the above elimination process are those labelled 3, 7, 11, 12;
their performances are shown in Figure 6.2. In these "star" diagrams each car is
represented by a pentagon; their values on each criterion have all been linearly
re-scaled, being mapped on the [1, 3] interval. The choice of interval [1, 3] instead
of interval [0, 2] is dictated by the mode of representation: the value 0 plays a
special role since it is common to all axes; if an alternative was to receive a 0 value
on several criteria, those evaluations would all be represented by the origin, which
makes the graph less readable. On each axis, the value 1 corresponds to the lowest
value for one of the cars in the initial set of 14 alternatives on each criterion; the
value 3 corresponds to the highest value for one of the 14 cars. In interpreting the
diagrams, remember that criteria 1, 2 and 3 are to be minimised while the others
have to be maximised.
Thierry did not use the latter diagram (Figure 6.2); he drew the same diagram
as in Figure 6.1 instead after reordering the cars; the 4 candidate cars were all
put on the right of the diagram as shown in Figure 6.3; in this way Thierry was
still able to compare the difference in the performances of two candidate cars for a
criterion to typical differences for that criterion in the initial sample. This suggests
that the evaluations of the selected cars should not be transformed independently
of the values of the cars in the initial set; these still constitute reference points in
relation to which the selected cars are evaluated. In Figure 6.4, for the reader's
convenience, we show a close-up of Figure 6.3 that is focused on the 4 selected
cars only.
Thierry first eliminates car number 12 on the basis of its relative weakness
on the second criterion (acceleration). Among the 3 remaining cars the one he
chooses is number 11. Here are the reasons for this decision.
1. Comparing cars 3 and 11, Thierry considers that the price difference (about
500 €) is worth the gain (0.7 second) on the acceleration criterion.
2. Comparing cars 7 and 11, he considers that the cost difference (car 7 about
1 500 € more expensive) is not balanced by the small advantage on
acceleration (0.3 second) coupled with a definite disadvantage (0.8 second) on
suppleness.

[Figure 6.2 panels: Honda Civic VTI 16; Nissan Sunny 20 GTI 16V; Peugeot 309 GTI 16; Peugeot 309 GTI. Axes of each star: crit 1 (cost), crit 2 (accel), crit 3 (supple), crit 4 (brakes), crit 5 (road-h), graduated from 0 to 3]

Figure 6.2: Star graph of the performances of the 4 cars left after the elimination

Name of car Crit1 Crit2 Crit3 Crit4 Crit5

Cost Acc Pick Brakes Road
3 Nissan Sunny 16 973 29 34.9 2.66 2.5
7 Honda Civic 18 971 28 35.6 2.33 2
11 Peugeot 16V 17 537 28.3 34.8 2.33 2.75
12 Peugeot 15 980 29.6 35.3 2.33 2.75

Table 6.3: Performances of the 4 candidate cars



[Figure 6.3 horizontal axis: Fiat (1), Alfa (2), Mazda (4), Mitsu Colt (5), Toyota (6), Opel (8), Ford (9), R19 (10), Mitsu Gal (13), R21 (14), Nissan (3), Honda (7), Peu16 (11), Peu (12). Series: Cost (min), Accel (min), Pick up (min), Brakes (Max), Roadh (Max)]

Figure 6.3: Performance diagram of all cars; the 4 candidate cars stand on the
right


[Figure 6.4 horizontal axis: Nissan (3), Honda (7), Peu16 (11), Peu (12). Series: Cost (min), Accel (min), Pick up (min), Brakes (Max), Roadh (Max)]

Figure 6.4: Detail of Figure 6.3: the 4 cars remaining after initial screening

Thierry's reasoning process can be analysed as being composed of two steps. The
first one is a screening process in which a number of alternatives are discarded
on the basis of the fact that they do not reach aspiration levels on some criteria.
Notice that these levels have not been set a priori as minimal levels of
satisfaction; they have been set after having examined the whole set of alternatives, to
a value that could be described as both desirable and accessible. The rules that
have been used for eliminating certain alternatives have exclusively been combined
in conjunctive mode since an alternative is discarded as soon as it fails to fulfil
at least one of the rules.
More sophisticated modes of combinations may be envisaged, for instance mix-
ing up conjunctive and disjunctive modes with aspiration levels defined for sub-
sets of criteria (see Fishburn (1978) and Roy and Bouyssou (1993), pp. 264-266).
Another elementary method that has been used is the elimination of dominated
alternatives (car 11 dominates car 14).
In the second step of Thierry's reasoning,

1. Criteria 4 and 5 were not invoked; there are several possible reasons for this:
criteria 4 and 5 might be of minor importance or considered satisfactory
once a certain level is reached; they could be insufficiently discriminating for
the considered subset of cars (this is certainly the case for criterion 4); the
values of the differences for the set of candidate cars could be such that they
are not large enough to balance the differences on other criteria.

2. Subtle considerations were made on whether the balance of differences in performance
between pairs of cars on 2 or 3 criteria results in an advantage to one of the
cars in the pair.

3. The reasoning is not made on the basis of re-coded values like those used
in the graphics; more intuition is needed, which is better supported by the
original scales. Since criteria 4 and 5 are aggregates and, thus, are not
expressed in directly interpretable units, this might also have been a reason
for not exploiting them in the final selection.

This kind of reasoning that involves comparisons of differences in evaluations
is at the heart of the activity of modelling preferences and aggregating them in
order to have an informed decision process. In the simple case we are dealing with
here, the small number of alternatives and criteria has allowed Thierry to make up
his mind without having to build a formal model of his preferences. We have seen,
however, that after the first step consisting in the elimination of unsatisfactory
alternatives, the analysis of the remaining four cars has been much more delicate.
Note also that if Thierry's goal had been to rank order the cars in order of
decreasing preference, it is not certain that the kind of reasoning he used for just
choosing the best alternative for him would have fit the bill. In more complex
situations (when more alternatives remain after an initial elimination or more
criteria have to be considered or if a ranking of the alternatives is wanted), it may
appear necessary to use tools for modelling preferences.
There is another rather frequent circumstance in which more formal methods
are mandatory; if the decision-maker is bound to justify his decision to other per-
sons (shareholders, colleagues, . . . ), the evaluation system should be more system-
atic, for instance being able to cope with new alternatives that could be suggested
by the other people.
In the rest of this chapter, we discuss a few formal methods commonly used for
aggregating preferences. We report on how Thierry applied some of them to his
case and extrapolate on how he could have used the others. This can be viewed as
an ex post analysis of the problem, since the decision was actually made well before
Thierry became aware of multiple criteria methods. In his ex post justification
study, Thierry has in addition tried to derive a ranking of the alternatives that
would reflect his preferences.

6.2 The weighted sum

When dealing with multi-dimensional evaluations of alternatives, the basic and
almost natural (or perhaps, cultural?) attitude consists in trying to build a one-
dimensional synthesis, which would reflect the value of the alternatives on a syn-
thetic super scale of evaluation. This attitude is perhaps inherited from school
practice where all other performance evaluations of the pupils have long been (and
often still are) summarised in a single figure, a weighted average of their grades in
the various subjects. The problems raised by such a practice have been discussed
in depth in Chapter 3. We discuss the application of the weighted sum to the car
example below, emphasising the very strong hypotheses underlying the use of this
type of approach.
Starting from the standard situation of a set of alternatives $a \in A$ evaluated
on $n$ points of view by a vector $g(a) = (g_1(a), g_2(a), \ldots, g_n(a))$, we consider the
value $f(a)$ obtained by linearly combining the components of $g$, i.e.

(6.1)  $f(a) = k_1 g_1(a) + k_2 g_2(a) + \ldots + k_n g_n(a)$

Suppose, without loss of generality, that all criteria are to be maximised, i.e. the
larger the value $g_i(a)$, the better the alternative $a$ on criterion $i$ (if, on the contrary,
$g_i$ were to be minimised, substitute $-g_i$ for $g_i$ or use a negative weight $k_i$). Once the
weights $k_i$ have been determined, choosing an alternative becomes straightforward:
the best alternative is the one associated with the largest value of $f$. Similarly,
a ranking of the alternatives is obtained by ordering them in decreasing order of
the value of $f$.
This simple and most commonly used procedure relies however on very strong
hypotheses that can seldom be considered plausibly satisfied. These problems
appear very clearly when trying to use the weighted sum approach on the car
example.

6.2.1 Transforming the evaluations

A look at the evaluations of the cars (see Table 6.2) prompts a remark that was
already made when we considered representing the data graphically. The ranges
of variation on the scales are very heterogeneous: from 13 841 to 21 334 on the cost
criterion; from 1.33 to 2.66 on criterion 4. Clearly, asking for values of the weights
$k_i$ in terms of the relative importance of the criteria without referring to the
scales would yield absurd results. The usual way out consists in normalising the
values on the scales but there are several ways of doing this. One consists in
dividing $g_i$ by the largest value on the $i$th scale, $g_{i,\max}$; alternatively one might
subtract the minimal value $g_{i,\min}$ and divide by the range $g_{i,\max} - g_{i,\min}$. These
normalisations of the original $g_i$ functions are respectively denoted $g_i'$ and $g_i''$ in
the following formulae:

(6.2)  $g_i'(a) = \dfrac{g_i(a)}{g_{i,\max}}$

(6.3)  $g_i''(a) = \dfrac{g_i(a) - g_{i,\min}}{g_{i,\max} - g_{i,\min}}$

For simplicity, we suppose here that the $g_i$ are positive. In the former case the maximal
value of $g_i'$ will be 1 while value 0 is kept fixed, which means that the ratio of the
evaluations of any pair $a$, $b$ of alternatives remains unaltered:

(6.4)  $\dfrac{g_i'(a)}{g_i'(b)} = \dfrac{g_i(a)}{g_i(b)}$

This transformation can be advocated when using ratio scales, in which the value
0 plays a special role. Statements such as "alternative a is twice as good as b on
criterion i" remain valid after transformation.
In the case of $g_i''$, the top evaluation will be mapped onto 1 while the bottom
one goes onto 0; ratios are not preserved but ratios of differences of evaluations
are: for all alternatives $a, b, c, d$,

(6.5)  $\dfrac{g_i''(a) - g_i''(b)}{g_i''(c) - g_i''(d)} = \dfrac{g_i(a) - g_i(b)}{g_i(c) - g_i(d)}$

Such a transformation is appropriate for interval scales; it does not alter the
validity of statements like "the difference between a and b on criterion i is twice
the difference between c and d".
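
The two normalisations and the invariants they preserve can be checked
numerically. The sketch below uses the cost of cars 3, 7, 11 and 12 from Table 6.2
and, as an assumption made for brevity, takes the minimum and maximum over these
four cars only; the function names are ours.

```python
def normalise_by_max(values):
    """Formula (6.2): divide by the largest value; ratios are preserved."""
    vmax = max(values)
    return [v / vmax for v in values]


def normalise_by_range(values):
    """Formula (6.3): subtract the minimum and divide by the range;
    ratios of differences are preserved."""
    vmin, vmax = min(values), max(values)
    return [(v - vmin) / (vmax - vmin) for v in values]


# Criterion 1 (cost) of cars 3, 7, 11 and 12.
cost = [16973, 18971, 17537, 15980]
g1 = normalise_by_max(cost)
g2 = normalise_by_range(cost)

a, b, c, d = 0, 1, 2, 3
ratio_raw = cost[a] / cost[b]
ratio_g1 = g1[a] / g1[b]                      # same as ratio_raw, as in (6.4)
ratio_g2 = g2[a] / g2[b]                      # not the same in general
diff_raw = (cost[a] - cost[b]) / (cost[c] - cost[d])
diff_g2 = (g2[a] - g2[b]) / (g2[c] - g2[d])   # same as diff_raw, as in (6.5)

print(ratio_raw, ratio_g1, ratio_g2)
print(diff_raw, diff_g2)
```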
Note that the above are not the only possible options for transforming the data;
note also that these transformations depend on the set of alternatives: considering
the 14 cars of the initial sample or the 4 cars retained after the first elimination
would yield substantially different results since the values $g_{i,\min}$ and $g_{i,\max}$ depend
on the set of alternatives.

6.2.2 Using the weighted sum on the case

Suppose we consider that 0 plays a special role in all scales and we choose the first
transformation option. The values of the $g_i'$ that are obtained are shown in Table
6.4. A set of weights has been chosen which is, to some extent, arbitrary but seems
compatible with what is known about Thierry's preferences and priorities. The
first three criteria receive negative weights, namely and respectively -1, -2, -1
(since they have to be minimised), while the last two are given the weight 0.5. The
alternatives are listed in Table 6.4 in decreasing order of the values of $f$. As can
be seen in the last column of Table 6.4, this rough assignment of weights yields
car number 3 as first choice followed immediately by car number 11 which was
actually Thierry's choice. Moreover, the difference in the values of $f$ for those two
cars is tiny (less than 0.01) but we have no idea as to whether such a difference is
meaningful; all we can do is be very prudent in using such a ranking since the
weights were chosen in a rather arbitrary manner. It is likely that by varying the
weights slightly from their present value, one would readily get rank reversals, i.e.
permutations of alternatives in the order of preference; in other words, the ranking
is not very stable. Varying the values that are considered imprecisely determined
is what is called sensitivity analysis; it helps to detect what the stable conclusions
in the output of a model are; this is certainly a crucial activity in a decision aiding
process.
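
The computation behind this ranking can be sketched as follows, using the data of
Table 6.2, the normalisation by the largest value and the weights quoted above. The
second call is a one-parameter sensitivity check with a hypothetical, slightly
heavier weight on acceleration (-2.2 instead of -2); the function name rank is ours,
and computed values may differ from the printed tables in the last digit because
of rounding.

```python
# Table 6.2: car number -> (cost, accel, pick up, brakes, road-holding)
CARS = {
    1: (18342, 30.7, 37.2, 2.33, 3.0),
    2: (15335, 30.2, 41.6, 2.0, 2.5),
    3: (16973, 29.0, 34.9, 2.66, 2.5),
    4: (15460, 30.4, 35.8, 1.66, 1.5),
    5: (15131, 29.7, 35.6, 1.66, 1.75),
    6: (13841, 30.8, 36.5, 1.33, 2.0),
    7: (18971, 28.0, 35.6, 2.33, 2.0),
    8: (18319, 28.9, 35.3, 1.66, 2.0),
    9: (19800, 29.4, 34.7, 2.0, 1.75),
    10: (16966, 30.0, 37.7, 2.33, 3.25),
    11: (17537, 28.3, 34.8, 2.33, 2.75),
    12: (15980, 29.6, 35.3, 2.33, 2.75),
    13: (17219, 30.2, 36.9, 1.66, 1.25),
    14: (21334, 28.9, 36.7, 2.0, 2.25),
}
WEIGHTS = (-1, -2, -1, 0.5, 0.5)


def rank(cars, weights):
    """Normalise each criterion by its largest value, then order by weighted sum."""
    maxima = [max(car[i] for car in cars.values()) for i in range(5)]
    scores = {nr: sum(k * g / m for k, g, m in zip(weights, car, maxima))
              for nr, car in cars.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


order, scores = rank(CARS, WEIGHTS)
print(order[:3], round(scores[3] - scores[11], 3))

# Sensitivity check: hypothetical weight of -2.2 on acceleration.
order_alt, _ = rank(CARS, (-1, -2.2, -1, 0.5, 0.5))
print(order_alt[:2])
```

With the original weights cars 3 and 11 are separated by a very small margin, and
the modest change of one weight is enough to exchange them.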

6.2.3 Is the resulting ranking reliable?

Weights depend on scaling
To illustrate the lack of stability of the ranking obtained, let us consider Table
6.5 where the set of alternatives is reduced to the 4 cars remaining after the
elimination procedure; the re-scaling of the criteria yields values of $g_i'$ that are
not the same as in Table 6.4 since $g_{i,\max}$ depends on the set of alternatives. This
perturbation, without any change in the values of the weights, is sufficient to cause
a rank reversal between the leading two alternatives. Of course, one could prevent
such a drawback by using a normalising constant that would not depend on the
set of alternatives, for instance the worst acceptable value (minimal requirement
for a performance to be maximised; maximal level of a variable to be minimised,
a cost, for instance) on each criterion; with such an option, the source of the lack
of stability would be the imprecision in the determination of the worst acceptable
value. Notice that the above problem has already been discussed in Chapter 4,
Section 4.1.1.

                     Weights k_i                            Value
                     -1     -2     -1     0.5    0.5        f
Nr  Name of car      Cost   Accel  Pick   Brak   Road
 3  Nissan Sunny     0.80   0.94   0.84   1.00   0.77       -2.63
11  Peugeot 16V      0.82   0.92   0.84   0.88   0.85       -2.64
12  Peugeot          0.75   0.96   0.85   0.88   0.85       -2.66
10  Renault 19       0.80   0.97   0.91   0.88   1.00       -2.71
 7  Honda Civic      0.89   0.91   0.86   0.88   0.62       -2.82
 1  Fiat Tipo        0.86   1.00   0.89   0.88   0.92       -2.85
 5  Mitsu Colt       0.71   0.96   0.86   0.62   0.54       -2.91
 2  Alfa 33          0.72   0.98   1.00   0.75   0.77       -2.92
 8  Opel Astra       0.86   0.94   0.85   0.62   0.62       -2.96
 6  Toyota           0.65   1.00   0.88   0.50   0.62       -2.97
 4  Mazda 323        0.72   0.99   0.86   0.62   0.46       -3.02
 9  Ford Escort      0.93   0.95   0.83   0.75   0.54       -3.03
14  Renault 21       1.00   0.94   0.88   0.75   0.69       -3.04
13  Mitsu Galant     0.81   0.98   0.89   0.62   0.38       -3.15

Table 6.4: Normalising then ranking through a weighted sum

                     Weights k_i                            Value
                     -1     -2     -1     0.5    0.5        f
Nr  Name of car      Cost   Accel  Pick   Brak   Road
11  Peugeot 16V      0.92   0.96   0.98   0.88   1.00       -2.876
 3  Nissan Sunny     0.89   0.98   0.98   1.00   0.91       -2.890
12  Peugeot          0.84   1.00   0.99   0.88   1.00       -2.896
 7  Honda Civic      1.00   0.95   1.00   0.88   0.73       -3.090

Table 6.5: Normalising then ranking a reduced set of alternatives
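
The rank reversal between Tables 6.4 and 6.5 can be reproduced with a few lines.
The sketch below scores the 4 candidate cars twice with the same weights, first
dividing by the largest values observed among the 14 cars of Table 6.2, then by
the largest values among the 4 candidates only. The names are ours.

```python
WEIGHTS = (-1, -2, -1, 0.5, 0.5)

# Cars 3, 7, 11 and 12: (cost, accel, pick up, brakes, road-holding).
CANDIDATES = {
    3: (16973, 29.0, 34.9, 2.66, 2.5),
    7: (18971, 28.0, 35.6, 2.33, 2.0),
    11: (17537, 28.3, 34.8, 2.33, 2.75),
    12: (15980, 29.6, 35.3, 2.33, 2.75),
}
# Largest value on each criterion among the 14 cars of Table 6.2.
MAX_ALL_14 = (21334, 30.8, 41.6, 2.66, 3.25)
# Largest value on each criterion among the 4 candidates only.
MAX_CANDIDATES = tuple(max(car[i] for car in CANDIDATES.values())
                       for i in range(5))


def ranking(maxima):
    """Order the candidates by decreasing weighted sum of g_i / g_i,max."""
    def score(nr):
        return sum(k * g / m for k, g, m in zip(WEIGHTS, CANDIDATES[nr], maxima))
    return sorted(CANDIDATES, key=score, reverse=True)


print(ranking(MAX_ALL_14))       # normalised on the initial sample
print(ranking(MAX_CANDIDATES))   # normalised on the reduced set
```

Only the normalising constants differ between the two calls; the weights and the
raw evaluations are identical.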

Conventional codings
Another comment concerns the figures used for evaluating the performances of the
cars on criteria 4 and 5. Recall that those were obtained by averaging equally
spaced numerical codings of an ordinal scale of evaluation. The obtained figures
presumably convey a less quantitative and more conventional meaning than for
instance acceleration performances measured in seconds in standardisable (if not
standardised) trials. These figures however are treated in the weighted sum just
like the more quantitative ones associated with the first three criteria. In par-
ticular, other codings of the ordinal scale might have been envisaged, for instance
codings with unequal intervals separating the levels on the ordinal scale. Some of
these codings could obviously have changed the ranking.
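
A small sketch makes the last remark concrete. The two rating profiles and the
unequally spaced coding are hypothetical and chosen only for illustration; with
equally spaced codes car B gets the higher indicator, whereas a coding that sets
"exceptional" further apart puts car A first.

```python
LEVELS = ["serious deficiency", "below average", "average",
          "above average", "exceptional"]
EQUAL = dict(zip(LEVELS, [0, 1, 2, 3, 4]))     # the coding used for Table 6.2
UNEQUAL = dict(zip(LEVELS, [0, 1, 2, 3, 6]))   # hypothetical alternative coding


def indicator(ratings, coding):
    """Arithmetic mean of the coded partial ratings."""
    return sum(coding[r] for r in ratings) / len(ratings)


# Hypothetical partial ratings on the three braking aspects.
car_a = ["average", "average", "exceptional"]
car_b = ["above average", "above average", "above average"]

best = {}
for name, coding in (("equal", EQUAL), ("unequal", UNEQUAL)):
    a, b = indicator(car_a, coding), indicator(car_b, coding)
    best[name] = "A" if a > b else "B"

print(best)
```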

6.2.4 The difficulties of a proper usage of the weighted sum

The meaning of the weights

What is the exact significance of the weights in the weighted sum model? The
weights have a very precise and quantitative meaning; they are trade-offs: to
compensate for a disadvantage of ki units for criterion j, you need an advantage
of kj units for criterion i. An important consequence is that the weights depend
on the determination of the unit on each scale. In a weighted sum model that
would directly use the evaluations of the alternatives given in Table 6.2, it is clear
that the weight of criterion 2 (acceleration time) has to be multiplied by 60 if
times are expressed in minutes instead of seconds. This was implicitly a reason
for normalising the evaluations as was done through formulae 6.2 and 6.3. After
transformation, both g_i' and g_i'' are independent of the choice of a unit; yet they are
not identical and, in a consistent model, their weights should be different. Indeed,
we have
(6.6)  g_i''(a) = \frac{g_{i,\max}}{g_{i,\max} - g_{i,\min}} g_i'(a) + \delta_i = \kappa_i g_i'(a) + \delta_i
where \delta_i is a constant. Additive constants do not matter since they do not alter
the ranking. So, unless g_{i,min} = 0, g_i'' is essentially related to g_i' by a multiplicative
factor \kappa_i ≠ 1; in order to model the same preferences through a weighted sum of
the g_i'' and a weighted sum of the g_i', the weight k_i'' of g_i'' should be obtained by
dividing the weight k_i' by \kappa_i. Obviously, the weights have to be assessed in relation
to a particular determination of the evaluations on each scale and eliciting them
in practice is a complex task. In any case, they certainly cannot be evaluated in a
meaningful manner through naive questions about the relative importance of the
criteria; reference to the underlying scale is essential.
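The dependence of the weights on the units can be checked numerically. The following is a minimal sketch with made-up figures: the two cars "A" and "B", their evaluations and the weights are hypothetical and are not taken from Table 6.2. It shows that expressing acceleration times in minutes instead of seconds preserves the weighted sum only if the weight of that criterion is multiplied by 60.

```python
# Illustrative data only: two hypothetical cars and hypothetical weights.
def weighted_sum(evaluations, weights):
    """Return f = sum of k_i * g_i for one alternative."""
    return sum(k * g for k, g in zip(weights, evaluations))

def ranking(cars, weights):
    """Names of the cars sorted by decreasing weighted sum."""
    return sorted(cars, key=lambda name: weighted_sum(cars[name], weights),
                  reverse=True)

# Criteria: (cost in thousands of euros, acceleration time in seconds).
cars_seconds = {"A": (17.0, 29.0), "B": (16.0, 30.5)}
weights_seconds = (-1.0, -2.0)   # both criteria are to be minimised

# Same cars with the acceleration time expressed in minutes.
cars_minutes = {name: (cost, t / 60.0)
                for name, (cost, t) in cars_seconds.items()}

# Keeping the old weight silently changes the model;
# multiplying it by 60 restores the original trade-off.
weights_unchanged = weights_seconds
weights_rescaled = (weights_seconds[0], weights_seconds[1] * 60.0)

rank_reference = ranking(cars_seconds, weights_seconds)
rank_unchanged = ranking(cars_minutes, weights_unchanged)
rank_rescaled = ranking(cars_minutes, weights_rescaled)
```

With these invented figures, `rank_unchanged` differs from `rank_reference`, while `rank_rescaled` coincides with it.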
Up to this point we have considered the influence on the weights of multiplying
the evaluations by a positive constant. Note that translating the origin of a scale
has no influence on the ranking of the alternatives provided by the weighted sum
since it results in adding a (positive or negative) constant to f , the same for all
alternatives. There is still a very important observation that has to be made:
all scales used in the model are implicitly considered linear in the sense that
equal differences in values on a criterion result in equal differences in the overall
evaluation function f and this does not depend on the position of the interval
of values corresponding to that difference on the scale. For instance in the car
example, car number 12 is finally eliminated because it accelerates too slowly. The
difference between car 12 and car 3 with respect to acceleration is 0.6 between 29
seconds and 29.6 seconds. Does Thierry perceive this difference as almost equally
important as a difference of 0.7 between cars 11 and 3, the latter difference being
positioned between 28.3 seconds and 29 seconds on the acceleration scale? It seems
rather clear from Thierry's motivations that coming close to a performance of 28
seconds is what matters to him while cars above 29 seconds are unworthy. This
means that the gain for passing from 29.6 seconds to 29 seconds has definitely less
value than a gain of similar amplitude, say from 29 to 28.3 seconds. As will be
confirmed in the sequel (see Section 6.3 below), it is very unlikely that Thierry's
preferences are correctly modelled by a linear function of the current scales of
evaluation.

Independence or interaction

The next issue is more subtle. Evaluations of the alternatives for the various points
of view taken into consideration by the decision-maker often show correlations; this
is because the attributes that are used to reflect these viewpoints are often linked
by logical or factual interdependencies. For instance, indicators of cost, comfort
and equipment, which may be used as attributes for assessing the alternatives for
those viewpoints, are likely to be positively correlated. This does not mean that
the corresponding points of view are redundant and that one should eliminate
some of them. One is perfectly entitled to work with attributes that are (even
strongly) correlated. That is the first point.
A second point is about independence. In order to use a weighted sum, the
viewpoints should be independent, but not in the statistical sense implying that
the evaluations of the alternatives should be uncorrelated! They should be in-
dependent with respect to preferences. In other words, if two alternatives that
share the same profile on a subset of criteria compare in a certain way in terms of
overall preferences, their relative position should not be altered when the profile
they share on a subset of criteria is substituted by any other common profile. On
the contrary, a famous example of dependence in the sense of preferences in a
gastronomic context is the following: the preference for white wine or red wine
usually depends on whether you are eating fish or meat. There are relatively simple
tests for independence in the sense of preferences, which consist in asking the
decision-maker about his preferences on pairs of alternatives that share the same
profile for a subset of attributes; varying the common profile should not reverse
the preferences when the points of view are independent. Independence is a nec-
essary condition for the representation of preferences by a weighted sum; it is not
a sufficient one of course.
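The test described above can be sketched in a few lines for the case of two attributes. The function name, the scores given to the wine and dish combinations and the additive example are all hypothetical; the sketch only flags strict reversals of preference, which is the situation described in the text.

```python
from itertools import combinations

def sign(x):
    """Return 1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)

def is_preference_independent(value, levels_a, levels_b):
    """Check that the comparison of two levels of attribute A is never
    reversed when the common level of attribute B is changed.

    `value(a, b)` is any numerical representation of the overall
    preference (higher is better); only the induced order is used.
    """
    for a1, a2 in combinations(levels_a, 2):
        signs = {sign(value(a1, b) - value(a2, b)) for b in levels_b}
        if 1 in signs and -1 in signs:
            return False          # a strict preference has been reversed
    return True

# Hypothetical scores for the wine/dish example (invented for illustration).
meal_scores = {("white", "fish"): 3, ("red", "fish"): 1,
               ("white", "meat"): 1, ("red", "meat"): 3}

def meal_value(wine, dish):
    return meal_scores[(wine, dish)]

def additive_value(wine, dish):
    """A sum of one term per attribute, hence independent by construction."""
    return {"white": 1, "red": 2}[wine] + {"fish": 0, "meat": 5}[dish]

wines, dishes = ["white", "red"], ["fish", "meat"]
wine_depends_on_dish = not is_preference_independent(meal_value, wines, dishes)
additive_is_independent = is_preference_independent(additive_value, wines, dishes)
```

In practice the role of `value` is played by the decision-maker's answers; the function is only a convenient way of storing them.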
There is a different concept that has been recently implemented for modelling
preferences. It is the concept of interacting criteria that was already discussed
in example 2 of Chapter 3. Suppose that in the process of modelling the prefer-
ences of the decision-maker, he declares that the influence of positively correlated
aspects should be dimmed and that conjoint good performances for negatively
correlated aspects should be emphasised. In our case for instance, criteria 2 and
3, respectively acceleration and suppleness, may be thought of as being positively
correlated. It may then prove impossible to model some preferences by means of a
weighted sum of the evaluations such as those in Table 6.2 (and even of transfor-
mations thereof such as obtained through formulae like 6.3). This does not mean
that no additive model would be suitable and it does not imply that the prefer-
ences are not independent (in the above-defined sense). In the next section we
shall study an additive model, more general than the weighted average, in which
the evaluations gi may be re-coded using value functions ui. With
appropriate choices of u2 and u3 it may be possible to take the decision-maker's
preferences about positively and negatively correlated aspects into account,
provided they satisfy the independence property. If no re-coding is allowed (like in
the assessment of students, see Chapter 3) there is a non-additive variant of the
weighted average that could help modelling interactions among the criteria; in
such a model the weight of a coalition of criteria may be larger or smaller than
the sum of the weights of its components (see Grabisch (1996), for more detail on
non-additive averages).
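One well-known non-additive average is the Choquet integral, in which a weight is attached to every coalition of criteria. It is an assumption made here, not a statement of the text above, that this is the kind of model the reference has in mind. The sketch below uses invented scores (to be maximised) and invented coalition weights for criteria 2 and 3.

```python
def choquet(scores, capacity):
    """Choquet integral of `scores` (dict: criterion -> value >= 0) with
    respect to `capacity` (dict: frozenset of criteria -> weight in [0, 1]).
    The capacity must give weight 1 to the set of all criteria."""
    ordered = sorted(scores, key=scores.get)       # by increasing score
    total, previous = 0.0, 0.0
    for position, criterion in enumerate(ordered):
        # Coalition of the criteria scoring at least as much as this one.
        coalition = frozenset(ordered[position:])
        total += (scores[criterion] - previous) * capacity[coalition]
        previous = scores[criterion]
    return total

both = frozenset({"acceleration", "suppleness"})

# Additive capacity: the coalition weighs the sum of its components,
# so the Choquet integral reduces to the ordinary weighted average.
additive = {frozenset({"acceleration"}): 0.5,
            frozenset({"suppleness"}): 0.5,
            both: 1.0}

# Sub-additive capacity: the coalition weighs less than the sum (0.6 + 0.6).
sub_additive = {frozenset({"acceleration"}): 0.6,
                frozenset({"suppleness"}): 0.6,
                both: 1.0}

scores = {"acceleration": 10.0, "suppleness": 20.0}   # hypothetical values
plain_average = choquet(scores, additive)
non_additive = choquet(scores, sub_additive)
```

With an additive capacity the result is the usual weighted average; any other capacity lets the weight of a coalition differ from the sum of the weights of its members.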

Arbitrariness, imprecision and uncertainty

In the above discussion as well as in the presentation of our example we have
emphasised the many sources of uncertainty (lack of knowledge) and of imprecision
that bear on the figures used as input in the weighted sum. Let us summarise some
of them:

1. Uncertainty in the evaluation of the cost: the buying price as well as the
life-length of a second hand car are not known. This uncertainty can be
considered of stochastic nature; statistical data could help to master, to
some extent, such a source of uncertainty; in practice, it will generally be
very difficult to get sufficient relevant and reliable statistical information
for this kind of problem.

2. Imprecision in the measurement of some quantities: for instance, how precise
is the measurement of the acceleration? Such an imprecision can be reduced
by making the conditions of the measurement as standard as possible and
can then be estimated on the basis of the precision of the measurement

3. Arbitrary coding of non-quantitative data: re-coding of ordinal scales of
appreciation of braking and road-holding behaviour. Any re-coding that
respects the order of the categories would in principle be acceptable. To
master such an imprecision one could try to build quantitative indicators
for the criteria or try to get additional information on the comparison between
differences of levels on the ordinal scale: for instance, is the difference
between "below average" and "average" larger than the difference between
"above average" and "exceptional"?

4. Imprecision in the determination of the trade-offs (weights k_i); the ratios of
weights k_j/k_i must be elicited as conversion rates: a unit for criterion j is
worth k_j/k_i units for criterion i; of course, the scales must first be re-coded in
order that one unit difference on a criterion has the same value everywhere
on the scale (linearisation); these operations are far from obvious and as a
consequence, the imprecision of the linearisation process combines with the
inaccuracy in the determination of weights.

Making a decision
All these sources of imprecision have an effect on the precision of the determination
of the value of f that is almost impossible to quantify; contrary to what can (often)
be done in physics, there is generally little information on the size of the impre-
cisions; quite often, there is not even probabilistic information on the accuracy
of the evaluations. As a consequence, the apparently straightforward decision
(choosing the alternative with the highest value of f or ranking the alternatives in
decreasing order of the values of f) might be ill-considered, as illustrated above.
The usual way out is extensive sensitivity analysis, which could be described as
part of the validation of the model. This part of the job is seldom carried out
with the required exhaustivity because it is a delicate task at least in two respects.
On the one hand there are many possible strategies for varying the values of the
imprecisely determined parameters; usually parameters are varied one at a time
which is not sufficient but is possibly tractable; the range in which the parameters
must be varied is not even clear as suggested above. On the other hand, once
the sensitivity analysis has been performed, one is likely to be faced with several
almost equally valuable alternatives; in the car problem for instance, the simple
remarks made above strongly suggest that it will be very difficult to discriminate
between cars 3 and 11.
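A one-at-a-time variation of the weights, as mentioned above, can be sketched as follows. The evaluations and weights are copied from Table 6.5; since the table entries are rounded to two decimals, the recomputed values of f need not agree with the last column of the table in the third decimal. The relative change of 20% and the function names are arbitrary choices made for this illustration.

```python
# Normalised evaluations copied from Table 6.5 (rounded to two decimals).
cars = {
    11: (0.92, 0.96, 0.98, 0.88, 1.00),
    3:  (0.89, 0.98, 0.98, 1.00, 0.91),
    12: (0.84, 1.00, 0.99, 0.88, 1.00),
    7:  (1.00, 0.95, 1.00, 0.88, 0.73),
}
base_weights = (-1.0, -2.0, -1.0, 0.5, 0.5)

def weighted_sum(evaluations, weights):
    """Return f = sum of k_i * g_i for one car."""
    return sum(k * g for k, g in zip(weights, evaluations))

def best_car(weights):
    """Number of the car with the highest weighted sum."""
    return max(cars, key=lambda nr: weighted_sum(cars[nr], weights))

def one_at_a_time(relative_change=0.2):
    """Scale each weight in turn by (1 - change) and (1 + change) and
    return the set of cars that come first in at least one scenario."""
    winners = {best_car(base_weights)}
    for i in range(len(base_weights)):
        for factor in (1 - relative_change, 1 + relative_change):
            weights = list(base_weights)
            weights[i] *= factor
            winners.add(best_car(weights))
    return winners

winners = one_at_a_time()
```

The set `winners` collects every car that is ranked first in at least one scenario; a set with more than one element signals that the choice of the "best" car is not robust to the tested variations. Varying several weights simultaneously would require a richer exploration strategy than this sketch.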
In view of the previous discussion, there are two main approaches to solve the
difficulties raised by the weighted sum:

1. Either one tries to prepare the inputs of the model (linearised evaluations and
trade-offs) as carefully as possible, paying permanent attention to reducing
imprecision and finishing with extensive sensitivity analysis;

2. Or one takes imprecision into account from the start, by refraining from
exploiting precise values that are known to be unreliable and working instead
with classes of values and ordered categories. Note that imprecision may well
lie in the link between evaluations and preferences rather than in the evaluations
themselves; detailed preferential information, even extracted from
perfectly precise evaluations, may prove rather difficult to elicit.

The former option will lead us to the construction of multi-attribute value or
utility functions, while the latter leads to the outranking approach. These two
approaches will be developed in the sequel. There is however a whole family of
methods that we shall not consider here, the so-called interactive methods (Steuer
(1986), Vincke (1992), Teghem (1996)). These implement various strategies for
exploring the efficient boundary, i.e. the set of non-dominated solutions; the ex-
ploration jumps from one solution to another; it is guided by the decision-maker
who is asked to tell, for instance, which characteristics of the current solution he
would like to see improved. Such methods are mainly designed for dealing with
infinite and even continuous sets of alternatives; moreover, they do not lead to an
explicit model of the decision-maker's preferences. By contrast, we have settled
on problems with a (small) finite number of alternatives and we concentrate
on obtaining explicit representations of the decision-maker's preferences.

6.2.5 Conclusion
The weighted sum is useful for obtaining a quick and rough draft of an overall
evaluation of the alternatives. One should however keep in mind that there are
rather restrictive assumptions underlying a proper use of the weighted sum. As a
conclusion to this section we summarise these conditions.

1. Cardinal character of the evaluations on all scales. The evaluations
of the alternatives for all criteria are numbers and these values are used as
such even if they result from the re-coding of ordinal data.

2. Linearity of each scale. Equal differences between values on scale i,
whatever the location of the corresponding intervals on the scale (at the
bottom, in the middle or at the top of the scale), produce the same effect on
the overall evaluation f: if alternatives a, b, c, d are such that
g_i(a) - g_i(b) = g_i(c) - g_i(d) for all i, then f(a) - f(b) = f(c) - f(d).

3. The weights are trade-offs. Weights depend on the scaling of the criteria;
transforming the (linearised) scales results in a related transformation
of the weights. Weights tell how many units on the scale of criterion i are
needed to compensate one unit of criterion j.

4. Preference independence. Criteria do not interact. This property,
called preference independence, can be formulated as follows. Consider two
alternatives that share the same evaluation on at least one criterion, say
criterion i. Varying the level of that common value on criterion i does not
alter the way the two alternatives compare in the overall ranking.

6.3 The additive multi-attribute value model

Our analysis of the weighted sum brought us very close to the requirements for
additive multi-attribute value functions. The most common model in multiple
criteria decision analysis is a formalisation of the idea that the decision-maker,
when making a decision, behaves as if he was trying to maximise a quantity called
utility or value (the term "utility" tends nowadays to be used preferably in the
context of decision under risk, but we shall use it sometimes for "value").
This postulates that all alternatives may be evaluated on a single "super-scale"
reflecting the value system of the decision-maker and his preferences. In other
words, the alternatives can be "measured", in terms of "worth", on a synthetic
dimension of value or utility. Accordingly, if we denote by ≿ the overall preference
relation of the decision-maker on the set of alternatives, this relation relates to the
values u(a), u(b) of the alternatives in the following way:
(6.7)  a \succsim b \iff u(a) \geq u(b)
As a consequence, the preference relation ≿ on the set of alternatives is a complete
preorder, i.e. a complete ranking possibly with ties. Of course, the value u(a)
usually is a function of the evaluations {g_i(a), i = 1, ..., n}. If this function is
a linear combination of g_i(a), i = 1, ..., n, we get back to the weighted sum. A
slightly more general case is the following additive model:
(6.8)  u(a) = \sum_{i=1}^{n} u_i(g_i(a))

where the function u_i (single-attribute value function) is used to re-code the
original evaluation g_i in order to linearise it in the sense described in the previous
section; the weights k_i are incorporated in the u_i functions. The additive value
function model can thus be viewed as a clever version of the weighted sum since it
allows us to take some of the objections against a naive use of it (mainly the second
hypothesis in Section 6.2.5) into account. Note however that the imprecision
issue is not dealt with inside the model (sensitivity analysis has to be performed
in the validation phase, but is neither part of the model nor straightforward in
practice); the elicitation of the partial value functions u_i may also be a difficult
task.
Much effort has been devoted to characterising various systems of conditions
under which the preferences of a decision-maker can be described by means of
an additive value function model. Depending on the context, some systems of
conditions may be interpretable and tested, at least partially, i.e. it may be possible
to ask the decision-maker questions that will determine whether an additive value
model is compatible with what can be perceived of his system of preferences. If the
preferences of the decision-maker are compatible with an additive value model, a
method of elicitation of the u_i's may then be used; if not, another model should be
looked for: a multiplicative model or, more generally, a non-additive one, a non-
independent one, a model that takes imprecision more intrinsically into account,
etc. (see Krantz, Luce, Suppes and Tversky (1971), Chapter 7, Luce, Krantz,
Suppes and Tversky (1990), Vol. 3, Chapter 19).

6.3.1 Direct methods for determining single-attribute value functions
A large number of methods have been proposed to determine the u_i's in an additive
value function model. For an accessible account of such methods, the reader is
referred to von Winterfeldt and Edwards (1986), Chapter 8.
There are essentially two families of methods, one based on direct numerical
estimations and the other on indifference judgements. We briefly describe the
application of a technique of the latter category, relying on what are called dual
standard sequences (Krantz et al. (1971), von Winterfeldt and Edwards (1986),
Wakker (1989)), that builds a series of equally spaced intervals on the scale of
values.

An assessment method based on indifference judgments

Suppose we want to assess the u_i's in an additive model for the Cars case. It is
assumed that the suitability of such a model for representing the decision-maker's
preferences has been established. Consider a pair of criteria, say Cost and
Acceleration. We are going to outline a simulated dialogue between an analyst and
a decision-maker that could yield an assessment of u1 and u2, the corresponding
single-attribute value functions, for ranges of evaluations corresponding to
acceptable cars. Note that we start the construction of the sequence from a "central
point" instead of taking a "worst point" (see for instance von Winterfeldt and
Edwards (1986), pp. 267 sq. for an example starting from a "worst point").
The range for the cost will be the interval from 21 500 € to 13 500 €, and the range
for acceleration from 28 to 31 seconds. First ask the decision-maker to select a
"central point" corresponding to medium range evaluations on both criteria. In
view of the set of alternatives selected by Thierry, let us start with (17 500, 29.5) as
"average" values for cost and acceleration. Also ask the decision-maker to define a
unit step on the cost criterion; this step will consist, say, of passing from a cost of
17 500 € to 16 500 €. Then the standard sequence is constructed by asking which
value x1 for the acceleration would make a car costing 16 500 € and accelerating
in 29.5 seconds indifferent to a car costing 17 500 € and accelerating in x1 seconds.
Suppose the answer is 29.2, meaning that from the chosen starting point, a gain
of 0.3 second on the acceleration time is worth an increase of 1 000 € in cost. The
answer could be explained by the fact that at the starting level of performance
for the acceleration criterion, the decision-maker is quite interested in a gain in
acceleration time. Relativising the gains as percentages of the half range from
the central to the best values on each scale, this means that the decision-maker
is ready to lose 1000/4000 = 25% of the potential reduction in cost for gaining
0.3/1.5 = 20% of acceleration time. We will say in the sequel that the parity is
equal when the decision-maker agrees to exchange a percentage of the half range
on a criterion against an equal percentage on another criterion.
The second step in the construction of the standard sequence is asking the
decision-maker which value to assign to x2 to have (16 500, 29.2) ∼ (17 500, x2),
where ∼ denotes "indifferent to". The answer might be, for instance, 28.9. Continuing
along the same line would for instance yield the following sequence of
indifferences:

(16 500, 29.5) ∼ (17 500, 29.2)
(16 500, 29.2) ∼ (17 500, 28.9)
(16 500, 28.9) ∼ (17 500, 28.7)
(16 500, 28.7) ∼ (17 500, 28.5)
(16 500, 28.5) ∼ (17 500, 28.3)
(16 500, 28.3) ∼ (17 500, 28.1)

[Figure 6.5: Single-attribute value function for acceleration criterion (half range);
horizontal axis: acceleration (sec), from 28 to 29.5; vertical axis: value]

Such a sequence gives the analyst an approximation of the single-attribute
value function u2 , on the half range from 28 to 29.5 seconds but it is easy to devise
a similar procedure for the other half range, from 29.5 to 31. Figure 6.5 shows the
re-coding u2 of the evaluations g2 on the interval [28, 29.5]; there are two linear
parts in the graph: one ranging from 28 to 28.9 where the slope is proportional to
1/0.2 and the other valid between 28.9 and 29.5 with a slope proportional to 1/0.3.
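The re-coding obtained from the standard sequence can be written down as a piecewise linear function. In the sketch below, the acceleration times are those of the sequence given above; the conventions that the central point 29.5 receives the value 0 and that each step of the sequence is worth one unit of value are assumptions of this illustration, as is the restriction to the interval actually covered by the sequence (28.1 to 29.5 seconds).

```python
from bisect import bisect_right

# Acceleration times elicited in the standard sequence, starting from the
# central point 29.5 s; each step is taken to be worth one unit of value.
sequence = [29.5, 29.2, 28.9, 28.7, 28.5, 28.3, 28.1]

# (time, value) pairs sorted by increasing time.
points = sorted((t, float(step)) for step, t in enumerate(sequence))

def u2(t):
    """Piecewise linear value of an acceleration time t (in seconds),
    defined only on the range covered by the standard sequence."""
    times = [p[0] for p in points]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the assessed range")
    # Index of the right end of the segment containing t.
    j = min(bisect_right(times, t), len(times) - 1)
    (t0, v0), (t1, v1) = points[j - 1], points[j]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Value gained per second on each of the two linear parts.
slope_fast_part = (u2(28.5) - u2(28.7)) / 0.2   # between 28.1 and 28.9
slope_slow_part = (u2(28.9) - u2(29.2)) / 0.3   # between 28.9 and 29.5
```

The two slopes reproduce the proportions 1/0.2 and 1/0.3 mentioned in the text; the unit of value itself remains arbitrary at this stage.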
From there, using the same idea, one is able to re-code the scale of the cost cri-
terion into the single-attribute value function u1 . Then, considering (for instance)
the cost criterion with criteria 3, 4 and 5 in turn, one obtains a re-coding of each
gi into a single-attribute value function ui .
The trade-off between u1 and u2 is easily determined through solving the following
equation that just expresses the initial indifference in the standard sequence,
(16 500, 29.5) ∼ (17 500, 29.2):

k_1 u_1(16\,500) + k_2 u_2(29.5) = k_1 u_1(17\,500) + k_2 u_2(29.2)

from which we get

\frac{k_2}{k_1} = \frac{u_1(16\,500) - u_1(17\,500)}{u_2(29.2) - u_2(29.5)}.

If we set k1 to 1, this formula yields k2 and the trade-offs k3 , k4 and k5 are obtained
similarly. Notice that the re-coding process of the original evaluations into value
functions results in a formulation in which all criteria have to be maximised (in
The above procedure, although rather intuitive and systematic, is also quite
complex; the questions are far from easy to answer; starting from one reference
point or another (worst point instead of central point) may result in variations in
the assessments. There are however many possibilities for checking for inconsisten-
cies. Assume for instance that a single-attribute value function has been assessed
by means of a standard sequence that links its scale to the cost criterion; one
may validate this assessment by building a standard sequence that links its scale
to another criterion and compare the two assessments of the same value function
obtained in this way; hopefully they will be consistent; otherwise some sort of
retroaction is required.
Note finally that such methods may not be used when the scale on which the
assessments are made only has a finite number of degrees instead of being the set
of real numbers; at least numerous and densely spaced degrees are needed.

Methods relying on numerical judgements

In another line of methods, simplicity and direct intuition are more praised than
scrupulous satisfaction of theoretical requirements, although the theory is not
ignored. An example is SMART (Simple Multi-Attribute Rating Technique),
developed by W. Edwards, which is more a collection of methods than a single
one. We just outline here a variant referring to von Winterfeldt and Edwards
(1986), pp. 278 sq., for more details. In order to re-code, say, the evaluations for
the acceleration criterion, one initially fixes two anchor points that may be the
extreme values of the evaluations on the set of acceptable cars, here 28 and 31
seconds. On the value scale, the anchor points are associated to the endpoints of a
conventional interval of values, for instance 31 to 0 and 28 to 100. Since 29 seconds
seems to be the value under which Thierry considers that a car becomes definitely
attractive from the acceleration viewpoint, the interval [28, 29] should be assigned
a range of values larger than 1/3, its size (in relative terms) on the original
scale. Thierry could for instance assign 29 seconds to 50 on the value scale. Then
28.5 and 30 could be located respectively at 70 and 10, yielding the initial sketch
of a value function shown in Figure 6.6(a) (with linear interpolation between the
specified values). This picture can be further improved by asking Thierry to check
whether the relative spacings of the locations correctly reflect the strength of his
preferences. Thierry might say that almost the same gain in value (40) from 30
seconds to 29 as from 29 to 28 (gain of 50) is unfair and he could consequently
propose to lower to 40 the value associated with 29 seconds; he also lowers to 65
the value of 28.5 seconds. Suppose he is then satisfied with all other differences
of values; the final version is drawn in Figure 6.6(b). Similar work has to be
carried out for all criteria and the weights must be assessed.
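The two sketches of the value function can be compared numerically. The anchor points below are those quoted in the text; it is assumed here that the value of 30 seconds stays at 10 in the final version, and the function names are hypothetical.

```python
def interpolate(anchors, t):
    """Linear interpolation between (time, value) anchors sorted by time."""
    for (t0, v0), (t1, v1) in zip(anchors, anchors[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the anchored range")

# Anchors (acceleration time in seconds, value on the 0-100 scale).
initial_sketch = [(28.0, 100.0), (28.5, 70.0), (29.0, 50.0),
                  (30.0, 10.0), (31.0, 0.0)]
final_sketch = [(28.0, 100.0), (28.5, 65.0), (29.0, 40.0),
                (30.0, 10.0), (31.0, 0.0)]

def gain(anchors, t_from, t_to):
    """Gain in value when the acceleration time improves from t_from to t_to."""
    return interpolate(anchors, t_to) - interpolate(anchors, t_from)

# Gains discussed in the text: from 30 s to 29 s and from 29 s to 28 s.
initial_gains = (gain(initial_sketch, 30.0, 29.0), gain(initial_sketch, 29.0, 28.0))
final_gains = (gain(final_sketch, 30.0, 29.0), gain(final_sketch, 29.0, 28.0))
```

Comparing `initial_gains` with `final_gains` shows how lowering the value of 29 seconds widens the gap between the two gains, which is the correction Thierry asks for.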
[Figure 6.6: Value function for acceleration criterion: (a) initial sketch; (b) final,
with initial sketch in dotted line. In both panels the horizontal axis is
acceleration (sec), from 28 to 31, and the vertical axis is value, from 0 to 100.]

The weights are usually derived through direct numerical judgements of relative
attribute importance. Thierry would be asked to rank-order the attributes; an
"importance" of 10 could be arbitrarily assigned to the least important criterion
and the importance of each other criterion should be assessed in relation to the least
important one, directly as an estimation of the ratio of weights. This approach
in terms of "importance" can be and has been criticised. In assessing the relative
weights no reference is made to the underlying scales. This is not appropriate
since weights are trade-offs between units on the various value scales and must
vary with the scaling.
For instance, on the acceleration value scale that is normalised in the 0-100
range, the meaning of one unit varies depending on the range of original evaluations
(acceleration measured in seconds) that are represented between value 0 and
value 100 of the value scale. If we had considered that the acceleration evaluations
of admissible cars range from 27 to 32 seconds, instead of from 28 to 31, we would
have constructed a value function u_2' with u_2'(32) = 0 and u_2'(27) = 100; a difference
of one unit of value on the scale u_2 illustrated in Figure 6.6 corresponds to a
(less-than-unit) difference of (u_2'(28) - u_2'(31))/100 on the scale u_2'. The weight
attached to that criterion must vary in inverse proportion to the previous factor when
passing from u_2 to u_2'. It is unlikely that a decision-maker would take the range of
evaluations into account when asked to assess weights in terms of relative "importance"
of criteria, a formulation that seems independent of the scalings of the criteria.
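The effect of the range on the weight can be made concrete with a small computation. For the sake of illustration only, the value function on the wider range is taken to be linear (the value functions of Figure 6.6 are not), and the weight 2.0 is a hypothetical figure.

```python
def linear_value(worst, best):
    """Return a value function mapping `worst` to 0 and `best` to 100 linearly."""
    def u(t):
        return 100.0 * (worst - t) / (worst - best)
    return u

# Plays the role of u_2' in the text: range from 27 to 32 seconds.
u2_wide = linear_value(worst=32.0, best=27.0)

# One unit on the 28-31 scale corresponds to this many units on the 27-32 scale.
factor = (u2_wide(28.0) - u2_wide(31.0)) / 100.0

k2 = 2.0                  # hypothetical weight attached to the 28-31 scale
k2_wide = k2 / factor     # weight expressing the same trade-offs with u_2'

# Contribution to the overall value of the swing from 31 s to 28 s,
# computed on each of the two scales.
swing_narrow = k2 * 100.0
swing_wide = k2_wide * (u2_wide(28.0) - u2_wide(31.0))
```

The two swings coincide, which is what dividing the weight by the factor is meant to achieve; a decision-maker who states the same "importance" for both ranges would implicitly change the model.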
A way of avoiding these difficulties is to give up the notion of importance that
seems misleading in this context and to use a technique called swing-weighting;
the decision-maker is asked to compare alternatives that swing between the
worst and the best level for each attribute in terms of their contribution to the
overall value. The argument of simplicity in favour of SMART is then lost since
the questions to be answered are similar, both in difficulty and in spirit, to those
raised in the approach based on indifference judgements.

6.3.2 AHP and Saaty's eigenvalue method

The eigenvalue method for assessing attribute weights and single-attribute value
functions is part of a general methodology called Analytic Hierarchy Process; it
consists in structuring the decision problem in a hierarchical manner (as it is also
advocated for building value functions, for instance in Keeney and Raiffa (1976)),
constructing numerical evaluations associated with all levels of the hierarchy and
aggregating them in a specific fashion, formally a weighted sum of single-attribute
value functions (see Saaty (1980), Harker and Vargas (1987)).
In our case, the top level of the hierarchy is Thierry's goal of finding the best
car according to his particular views. The second level consists in the 5 criteria
into which his global goal can be decomposed. The last level can be described as
the list of potential cars. Thus the hierarchical tree is composed of 1 first level
node, 5 second level nodes and 5 times 14 third level nodes also called leaves.
What we have to determine is the strength or priority of each element of a level
in relation to their importance for an element in the next level.
The assessment of the nodes may start (as is usually done) from the bottom
nodes; all nodes linked to the same parent node are compared pairwise; in our
case this amounts to comparing all cars from the point of view of a criterion and
repeating this for all criteria. The same is then done for all criteria in relation to the
top node; the influence of all criteria on the global goal are also compared pairwise.
At each level, the pairwise comparison of the nodes in relation to the parent node is
done by means of a particular method that makes it possible, to some extent, to
detect and correct inconsistencies. For each pair of nodes a, b, the decision-maker
is asked to assess the "priority" of a as compared to the "priority" of b. The questions
are expressed in terms of "importance" or "preference" or "likelihood" according
to the context. The decision-maker is asked, for instance, how much alternative a is preferred to
alternative b from a certain point of view. The answers may be formulated either
on a verbal or a numerical scale. The levels of the verbal scale correspond to
numbers and are dealt with as such in the computations. The conversion of verbal
levels into numerical levels is described in Table 6.6. There are five main levels on
the verbal scale, but 4 intermediary levels that correspond to numerical codings
2, 4, 6, 8 can also be used. For instance, the level "Moderate" corresponds to an
alternative that is preferred 3 times more than another or a criterion that is 3
times more important than another. Such an interpretation of the verbal levels
has very strong implications; it means that preference, importance and likelihood
are considered as perceived on a ratio scale (much like sound intensity). This is
indeed Saaty's basic assumption; what the decision-maker expresses as a level on
the scale is postulated to be the ratio of values associated to the alternatives or
the criteria. In other words, a number f(a) is assumed to be attached to all a;
when comparing a to b, the decision-maker is assumed to give an approximation
of the ratio f(a)/f(b). Since verbal levels are automatically translated into numbers in
Saaty's method, we shall concentrate on assessing directly on the numerical scale.
Let α(a, b) denote the level of preference (or of relative importance) of a over b
expressed by the decision-maker; the results of the pairwise comparisons may thus
be encoded in a square matrix α.

Verbal    Equal   Moderate   Strong   Very strong   Extreme
Numeric   1       3          5        7             9

Table 6.6: Conversion of verbal levels into numbers in Saaty's pairwise comparison
method; e.g. "Moderate" means "3 times more preferred"

If Saaty's hypotheses are correct, there should be some sort of consistency between
elements of α, namely, for all a, b, c,

(6.9)  \alpha(a, c) \approx \alpha(a, b) \times \alpha(b, c)

and in particular,

(6.10)  \alpha(a, b) \approx \frac{1}{\alpha(b, a)}

In view of the latter relation, only one half (roughly) of the matrix has to be
elicited, which amounts to answering n(n-1)/2 questions.
Relation (6.9) implies that all columns of matrix α should be approximately
proportional to f. The pairwise comparisons make it possible to

1. detect departure from the basic hypothesis in case the columns of α are too
far from proportional;

2. correct errors made in the estimation of the ratios; some sort of averaging of
the columns is performed yielding an estimation of f .

A test based on statistical considerations allows the user to determine whether the
assessments in the pairwise comparison matrix show sufficient agreement with the
hypothesis that they are approximations of f(a)/f(b), for an unknown f. If the test
conclusion is negative, it is recommended either to revise the assessments or to
choose another approach more suitable for the type of data.
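The averaging of the columns and the detection of inconsistencies can be sketched as follows. The estimate of f is taken here to be the principal eigenvector of the comparison matrix, computed by power iteration, and the departure from consistency is measured by Saaty's consistency index (lambda_max - n)/(n - 1); that the test mentioned above is built on this index is an assumption of the sketch. The judgements used are invented.

```python
def reciprocal_matrix(n, judgments):
    """Build the full comparison matrix from n(n-1)/2 judgements given as
    {(i, j): ratio} with i < j, filling the other half with reciprocals."""
    m = [[1.0] * n for _ in range(n)]
    for (i, j), ratio in judgments.items():
        m[i][j] = ratio
        m[j][i] = 1.0 / ratio
    return m

def principal_eigenvector(matrix, iterations=100):
    """Power iteration: return (priorities summing to 1, estimate of lambda_max)."""
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(product)        # valid because the entries of w sum to 1
        w = [x / lam for x in product]
    return w, lam

def consistency_index(matrix):
    """Saaty's consistency index; it is 0 for a perfectly consistent matrix."""
    n = len(matrix)
    _, lam = principal_eigenvector(matrix)
    return (lam - n) / (n - 1)

# Judgements that are exact ratios of f = (4, 2, 1): fully consistent.
consistent = reciprocal_matrix(3, {(0, 1): 2.0, (0, 2): 4.0, (1, 2): 2.0})
# Judgements on the 1-9 scale that violate (6.9): 3 x 3 differs from 5.
judged = reciprocal_matrix(3, {(0, 1): 3.0, (0, 2): 5.0, (1, 2): 3.0})

priorities, lambda_max = principal_eigenvector(consistent)
index_consistent = consistency_index(consistent)
index_judged = consistency_index(judged)
```

For the consistent matrix the priorities are proportional to f and the index vanishes; for the second matrix the index is strictly positive, and its size is what a consistency test would have to assess.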
If one wants to apply AHP in a multiple criteria decision problem, pairwise
comparisons of the alternatives must be performed for each criterion; criteria must
also be compared in a pairwise manner to model their importance. This process
results in functions $u_i$ that evaluate the alternatives on each criterion i and in
coefficients of importance $k_i$. Each alternative a is then assigned an overall value
$v(a)$ computed as

(6.11) $v(a) = \sum_{i} k_i u_i(a)$

and the alternatives can be ranked according to the values of v.

Applying AHP to the case

Since Thierry did not apply AHP to his analysis of the case, we have answered
the questions on pairwise comparisons on the basis of the information contained in
his report. For instance, when comparing cars on the cost criterion, more weight
will be put on a particular cost difference, say 1 000 €, when located in the range
from 17 500 € to 21 500 € than when lying between 13 500 € and 17 500 €. This
corresponds to the fact that Thierry said he is rather insensitive to cost differences
up to about 17 500 €, which is the amount of money he had budgeted for his car.
For the sake of concision, we have restricted our comparisons to a subset of cars,
namely the top four cars plus the Renault 19, Mazda 323 and Toyota Corolla.
A major issue in the assessment of pairwise comparisons, for instance of
alternatives in relation to a criterion, is to determine "how many times" a is preferred to b
on criterion i from looking at the evaluations $g_i(a)$ and $g_i(b)$. Of course the (ratio)
scale of preference on i is not in general the scale of the evaluations $g_i$. For
example, Car 11 costs approximately 17 500 € and Car 12 costs about 16 000 €. The
ratio of these costs, 17 500/16 000, is equal to 1.09375 but this does not necessarily mean
that Car 12 is preferred 1.09375 times more than Car 11 on the cost criterion; this
is because the cost evaluation does not measure the preferences directly. Indeed, a
transformation (re-scaling) is usually needed to go from evaluations to preferences;
for the cost, according to Thierry himself, the transformation is not linear since
equal ratios corresponding to costs located either below or above 17 500 € do not
correspond to equal ratios of preference. But even in linear parts, the question
is not easily answered. A decision-maker might very well say that Car 12 is 1.5
times more preferred than Car 11 for the cost criterion; or he could say 2 times or
4 times. All depends on what the decision-maker would consider as the minimum
possible cost; for instance (supposing that the transformation of cost into
preference is linear), if Car 12 is declared to be 1.5 times more preferred than Car 11, the
zero of the cost scale x would be such that
$$\frac{17\,500 - x}{16\,000 - x} = 1.5,$$

i.e. x = 13 000 €. The problem is even more crucial for transforming scales such
as those on which braking or road-holding are evaluated. For instance, how many
times is Car 3 preferred to Car 10 with respect to the braking criterion? In other
words, how many times is 2.66 better than (preferred to) 2.33?
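How strongly the implied zero depends on the declared ratio can be seen by solving the equation above for a general ratio r, which gives x = (16 000 r - 17 500)/(r - 1). The following Python sketch tabulates x for the three ratios mentioned in the text. The function name `implied_zero` is ours, and the linear transformation of cost into preference is the simplifying assumption already made above.

```python
def implied_zero(cost_a: float, cost_b: float, ratio: float) -> float:
    """Return x such that (cost_a - x) / (cost_b - x) == ratio."""
    if ratio == 1:
        raise ValueError("a ratio of 1 is only compatible with equal costs")
    return (ratio * cost_b - cost_a) / (ratio - 1)


# Costs of Car 11 and Car 12, and the three ratios quoted in the text.
zeros = {r: implied_zero(17_500, 16_000, r) for r in (1.5, 2, 4)}
for r, x in zeros.items():
    print(f"ratio {r}: zero of the cost scale at {x:.0f} euros")
```

Each declared ratio thus corresponds to a different "minimum possible cost", which is why the question cannot be answered from the evaluations alone.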
Similar questions arise for the comparison of importance of criteria. We discuss
the determination of the weights $k_i$ of the criteria in formula 6.11 below. For
computing those weights, the relative importance of each criterion with respect
to all others must be assessed. Our assessments are shown in Table 6.7. We
made them directly in numerical terms taking into account a set of weights that
Thierry considered as reflecting his preferences; those weights have been obtained
using the Prefcalc software and a method that is discussed in the next section.
By default, the blanks on the diagonal should be interpreted as 1's; the blanks
below the diagonal are supposed to be 1 over the corresponding value above the
diagonal, according to equation 6.10.
Relative importance   Cost   Accel   Pick-up   Brakes   Road-h

Cost                         1.5     2         3        3
Acceleration                         1.5       2        2
Pick-up                                        1.5      1.5
Brakes                                                  1

Table 6.7: Assessment of the comparison of importance for all pairs of criteria.
For instance, the number 2 at the intersection of the 1st row and the 3rd column means
that "Cost" is considered twice as important as "Pick-up"

Once the matrix in Table 6.7 has been filled, several algorithms can be proposed
to compute the priority of each criterion with respect to the goal symbolised by
the top node of the hierarchy (under the hypothesis that the elements of the
assessment matrix are approximations of the ratios of those priorities). The most
famous algorithm, which was initially proposed by Saaty, consists in computing
the eigenvector of the matrix corresponding to the largest eigenvalue (see Harker
and Vargas (1987) for an interpretation of the eigenvector method as a way
of "averaging ratios along paths"). Since eigenvectors are determined up to a
multiplicative factor, the vector of priorities is the normalised eigenvector whose
components sum up to unity; the special structure of the matrix (reciprocal matrix)
guarantees that all priorities will be positive. Alternative methods for correcting
inconsistencies have been elaborated; most of them are based on some sort of
least squares criterion or on computing averages (see e.g. Barzilai, Cook and
Golany (1987) who argue in favour of a geometric mean). Applying the eigenvector
method to the matrix in Table 6.7, one obtains the following values that reflect
the importance of the criteria:

(.352, .241, .172, .117, .117)
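The computation can be reproduced with a few lines of Python. The sketch below is only an illustration: it fills in the reciprocal matrix from the upper triangle of Table 6.7 and approximates the normalised principal eigenvector by power iteration, a standard numerical technique that we use here for convenience (the text does not say how the eigenvector was computed). The function names are ours.

```python
def reciprocal_matrix(upper):
    """Build a full reciprocal matrix from its strict upper triangle.

    upper[i] lists the entries of row i located to the right of the diagonal.
    """
    n = len(upper) + 1
    m = [[1.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + 1 + offset
            m[i][j] = value
            m[j][i] = 1.0 / value  # equation 6.10
    return m


def principal_eigenvector(m, iterations=200):
    """Approximate the principal eigenvector, normalised to sum to 1."""
    n = len(m)
    w = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]
    return w


# Upper triangle of Table 6.7
# (rows: Cost, Acceleration, Pick-up, Brakes; last column: Road-holding).
table_6_7 = [
    [1.5, 2, 3, 3],
    [1.5, 2, 2],
    [1.5, 1.5],
    [1],
]
weights = principal_eigenvector(reciprocal_matrix(table_6_7))
print([round(w, 3) for w in weights])
```

The printed weights should be close to the values reported above, up to rounding.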

Note that only the lowest degrees of the 1 to 9 scale have been used in Table 6.7.
This means that the weights are not perceived as very contrasted; in order to get
the sort of gradation of the weights as above (the ratio of the highest to the lowest
value is about 3), some comparisons have been assessed by non-integer degrees,
which normally are not available on the verbal counterpart of the 1 to 9 scale
described in Table 6.6. When the assessments are made through this verbal scale,
approximations should be made, for instance by saying that cost and acceleration
are equally important and replacing 1.5 by 1. Note that the labelling of the
degrees on the verbal scale may be misleading; one would quite naturally qualify
the degree to which Cost is more important than Acceleration as "Moderate"
until it is fully realised that "Moderate" means "three times as important"; using
the intermediary level between "Equal" and "Moderate" would still mean "twice
as important".
It should be emphasised that the "eigenvalue method" is not linear. What
would have changed if we had scaled the importance differently, for instance
assessing the comparisons of importance by degrees twice as large as those in Table
6.7 (except for 1's that remain constant)? Would the coefficients of importance
have been twice as large? Not at all! The resulting weights would have been much
more contrasted, namely:

(.489, .254, .137, .060, .060) .


Name of car        Nr    7      11     3      12     10     4      6

Honda Civic         7    1.0    1.0    2.0    4.0    4.0    5.0    5.0
Peugeot 309/16V    11    1.0    1.0    2.0    3.0    4.0    4.0    4.0
Nissan Sunny        3    0.50   0.50   1.0    1.50   2.0    3.0    3.0
Peugeot 309        12    0.25   0.33   0.67   1.0    1.0    2.0    2.0
Renault 19         10    0.25   0.25   0.5    1.0    1.0    1.0    1.5
Mazda 323           4    0.2    0.25   0.33   0.5    1.0    1.0    1.0
Toyota Corolla      6    0.2    0.25   0.33   0.5    0.67   1.0    1.0

Table 6.8: Pairwise comparisons of preferences of 7 cars on the acceleration criterion


Using the latter set of weights instead of the former would substantially change the
values attached to the alternatives through formula 6.11 and might even alter their
ordering. So, contrary to the determination of the trade-offs in an additive value
model (which may be re-scaled through multiplying them by a positive number,
without altering the way in which alternatives are ordered by the multi-attribute
value function), there is no degree of freedom in the assessment of the ratios in
AHP; in other words, these assessments are made on an absolute scale.
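The non-linearity discussed above can be checked numerically. The following Python sketch is an illustration under the same assumptions as the text: the judgements of Table 6.7 are doubled, except those equal to 1, and the priorities are recomputed. Power iteration is used as a convenient way of approximating the principal eigenvector; the function name `priorities` and its `scale` parameter are ours.

```python
def priorities(upper, scale=1.0, iterations=200):
    """Eigenvector priorities of the reciprocal matrix built from `upper`.

    Judgements different from 1 are multiplied by `scale`; judgements
    equal to 1 are left unchanged, as in the text.
    """
    n = len(upper) + 1
    m = [[1.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + 1 + offset
            scaled = value if value == 1 else value * scale
            m[i][j] = scaled
            m[j][i] = 1.0 / scaled
    w = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]
    return w


table_6_7 = [[1.5, 2, 3, 3], [1.5, 2, 2], [1.5, 1.5], [1]]
original = priorities(table_6_7)
doubled = priorities(table_6_7, scale=2.0)

# Ratio of the highest to the lowest weight in each case.
contrast_original = max(original) / min(original)
contrast_doubled = max(doubled) / min(doubled)
print([round(w, 3) for w in doubled])
print(round(contrast_original, 1), round(contrast_doubled, 1))
```

Doubling the judgements does not double the weights (they still sum to 1); it increases the contrast between the highest and the lowest weight.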
As a further example, we now apply the method to determine the evaluation
of the alternatives in terms of preference on the Acceleration criterion. Suppose
the pairwise comparison matrix has been filled as shown in Table 6.8, in a way
that seems consistent with what we know of Thierry's preferences. Applying the
eigenvalue method yields the following priorities attached to each of the cars in
relation to acceleration:

(.2987, .2694, .1507, .0934, .0745, .0584, .0548).
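As a cross-check, one can apply to Table 6.8 the averaging alternative mentioned earlier, namely normalised row geometric means (the kind of method advocated by Barzilai, Cook and Golany). This is a different technique from the eigenvector method used in the text, so the figures need not coincide exactly; in our hand calculation they agree to roughly three decimal places. The Python sketch below uses the entries of Table 6.8 as printed (including the rounded values 0.33 and 0.67); the function name is ours.

```python
import math

# Rows of Table 6.8, in the order of the table (cars 7, 11, 3, 12, 10, 4, 6).
table_6_8 = {
    "Honda Civic": [1.0, 1.0, 2.0, 4.0, 4.0, 5.0, 5.0],
    "Peugeot 309/16V": [1.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0],
    "Nissan Sunny": [0.50, 0.50, 1.0, 1.50, 2.0, 3.0, 3.0],
    "Peugeot 309": [0.25, 0.33, 0.67, 1.0, 1.0, 2.0, 2.0],
    "Renault 19": [0.25, 0.25, 0.5, 1.0, 1.0, 1.0, 1.5],
    "Mazda 323": [0.2, 0.25, 0.33, 0.5, 1.0, 1.0, 1.0],
    "Toyota Corolla": [0.2, 0.25, 0.33, 0.5, 0.67, 1.0, 1.0],
}


def geometric_mean_priorities(rows):
    """Normalised geometric mean of each row of a pairwise comparison matrix."""
    means = {name: math.prod(row) ** (1 / len(row)) for name, row in rows.items()}
    total = sum(means.values())
    return {name: mean / total for name, mean in means.items()}


gm = geometric_mean_priorities(table_6_8)
for name, priority in gm.items():
    print(f"{name:16s} {priority:.4f}")
```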

A picture of the resulting re-scaling of that criterion is provided in Figure
6.7; the solid line is a linear interpolation of the priorities in the eigenvector. A
re-scaling of the same criterion had been obtained through the construction of a
standard sequence (see Figure 6.5). Comparing these scales is not straightforward.
Notice that the origin is arbitrary in the single-attribute value model; one may add
any constant number to the values without changing the ranking of the alternatives
(a term equal to the constant number times the trade-off associated to the attribute
would just be added to the multi-attribute value function); since trade-offs depend
on the scaling of their corresponding single-attribute value function, changing the
unit on the vertical axis amounts to multiplying $u_i$ by a positive number; the
corresponding trade-off must then be divided by the same number. In the
multi-attribute value model, the scaling of the single-attribute value function is related
to the value of the trade-off; transformation of the former must be compensated
for by transforming the latter. In AHP, since the assessments of all nodes are
made independently, no transformation is allowed. In order to compare the two
figures, one may transform the value function of Figure 6.5 so it coincides with
AHP priority on the extreme values of the acceleration half range, i.e. 28 and 29.5.
Figure 6.7 shows the transformed single-attribute value function superimposed
(dotted line) on the graph of the priorities. There seems to be a good fit of the
two curves but this is only an example from which no general conclusion can be
drawn.

[Figure: priorities (solid) and value (dotted) plotted against acceleration (sec), from 28 to 31]

Figure 6.7: Priorities relative to acceleration as obtained through the eigenvector
method are represented by the solid line; the linearly transformed single-attribute
values of Figure 6.5 are represented by the dotted line on the range from 28 to
29.5 seconds

Comments on AHP
Although the models for describing the overall preferences of the decision-maker
are identical in multi-attribute value theory and in AHP, this does not mean that
applying the respective methodologies of these theories normally yields the same
overall evaluation of the alternatives. There are striking differences between the
two approaches from the methodological point of view. The ambition of AHP is
to help construct evaluations of the alternatives for each viewpoint (in terms of
preferences) and of the viewpoints with regard to the overall goal (in terms of
importance); these evaluations are claimed to belong to a ratio scale, i.e. to be
determined up to a positive multiplicative constant. Since the eigenvalue method
yields a particular determination of this constant and this determination is not
taken into account when assessing the relative importance of the various criteria,
the evaluations in terms of preference must be considered as if they were made on
an absolute scale, which has been repeatedly criticised in the literature (see for
instance Belton (1986) and Dyer (1990)). This weakness (that can also be blamed
on direct rating techniques, as mentioned above) could be corrected by asking the
decision-maker about the relative importance of the viewpoints in terms of passing
from the least preferred value to the most preferred value on criterion i compared
to a similar change on criterion j (Dyer 1990). Taking this suggestion into account
would however go against one of the basic principles of Saaty's methodology, i.e.
the assumption that the assessments at all levels of the hierarchy can be made
along the same procedure and independently of the other levels. That is probably
why the original method, although seriously attacked, has remained unchanged.
AHP has been criticised in the literature in several other respects. Besides the
fact already mentioned that it may be difficult to reliably assess comparisons of
preferences or of importance on the standard scale described in Table 6.6, there
is an issue about AHP that has been discussed quite a lot, namely the possibility
of rank reversal. Suppose alternative x is removed from the current set and
nothing is changed in the pairwise assessments of the remaining alternatives; it
may happen that an alternative a among the remaining ones is now
ranked below an alternative b whilst it was ahead of b in the initial situation. This
phenomenon was discussed in Belton and Gear (1983) and Dyer (1990) (see also
Harker and Vargas (1987) for a defence of AHP).
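A small numerical illustration may help to see how rank reversal can arise. The Python sketch below uses hypothetical figures that are not taken from Thierry's case: four alternatives A, B, C, D are evaluated on three equally weighted criteria, and we assume perfectly consistent pairwise comparisons, in which case the eigenvector priorities on a criterion are simply the evaluations normalised to sum to 1. The overall value is then computed as in formula 6.11. The function name `ahp_scores` is ours.

```python
def ahp_scores(values, weights):
    """Overall AHP values under perfectly consistent judgements.

    Priorities on each criterion are the evaluations normalised to sum to 1
    over the current set of alternatives; they are then combined by a
    weighted sum.
    """
    names = list(values)
    totals = [sum(values[a][i] for a in names) for i in range(len(weights))]
    return {
        a: sum(w * values[a][i] / totals[i] for i, w in enumerate(weights))
        for a in names
    }


weights = (1 / 3, 1 / 3, 1 / 3)
# Hypothetical evaluations on three criteria; D is a copy of B.
four = {"A": (1, 9, 8), "B": (9, 1, 9), "C": (1, 1, 1), "D": (9, 1, 9)}
three = {name: v for name, v in four.items() if name != "D"}

with_d = ahp_scores(four, weights)
without_d = ahp_scores(three, weights)
print({a: round(s, 3) for a, s in with_d.items()})
print({a: round(s, 3) for a, s in without_d.items()})
```

With D present, A is ahead of B; once D is removed, B overtakes A although none of the comparisons between A, B and C has changed. The reversal comes from the normalisation of the priorities over the current set of alternatives.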

6.3.3 An indirect method for assessing single-attribute value functions and trade-offs
Various methods have been conceived in order to avoid direct elicitation of a
multi-attribute value function. A class of such methods consists in postulating
an additive value model (as described in formulae 6.7 and 6.8) and simultaneously
inferring the shapes of all single-attribute value functions and the values of all
the trade-offs from declared global preferences on a subset of well-known
alternatives. The idea is thus to infer a general preference model from partial holistic
information about the decision-maker's preferences.
Thierry used a method of disaggregation of preferences described in Jacquet-Lagrèze
and Siskos (1982); it is implemented in a software package called Prefcalc, which
computes piece-wise linear single-attribute value functions and is based on linear
programming (see also Jacquet-Lagrèze (1990), Vincke (1992)). More precisely,
the software helps to build a function

$u(a) = \sum_{i} u_i(g_i(a))$

such that $a \succsim b \Leftrightarrow u(a) \geq u(b)$. Without loss of generality, the lowest (resp.
highest) value of u is conventionally set to 0 (resp. 1); 0 (resp. 1) is the value of a
(fictitious) alternative whose assessment on each criterion would be equal to the worst
(resp. best) evaluation attained for the criterion on the current set of alternatives.
This fictitious alternative is sometimes called the "anti-ideal" (resp. "ideal") point.
In our example, the "anti-ideal" car costs 21 334 €, needs 30.8 seconds to cover
1 km starting from rest and 41.6 seconds starting in fifth gear at 40 km/h; its
performances regarding brakes and road-holding are respectively 1.33 and 1.25.
The "ideal" car, on the opposite side of the range, costs 13 841 €, needs 28 seconds
to cover 1 km starting from rest and 34.7 seconds starting in fifth gear at 40 km/h;
its performances regarding brakes and road-holding are respectively 2.66 and 3.25.

[Figure: five panels, one per criterion, each labelled with its trade-off: Cost .43 (axis 13.84, 17.59, 21.33); Acc .23 (axis 28, 29, 30); Pick .13 (axis 34, 38, 42); Brake .1 (axis 1.3, 2.0, 2.7); Road .1 (axis 1.2, 2.2, 3.2)]

Figure 6.8: Single-attribute value functions computed by means of Prefcalc in the
"Choosing a car" problem; the value of the trade-off is written in the right upper
corner of each box

The shape of the single-attribute value function for the cost criterion, for
instance, is modelled as follows. The user fixes the number of linear pieces; suppose
that you decide to set it to 2 (which is a parsimonious option and the default
value proposed in Prefcalc); the single-attribute value function of the cost could
for instance be represented as in Figure 6.8. Note that the maximal value of the
utility (reached for a cost of 13 841 €) is scaled in such a way that it corresponds
to the value of the trade-off associated with the cost criterion, i.e. .43 in the
example shown in Figure 6.8. Note also that with two linear pieces, one for each half
of the cost range, the single-attribute value function is completely determined by
two numbers, i.e. the utility value at mid-range and the maximal utility. Those
values, say $u_{1,1}$, $u_{1,2}$, are variables of the linear program that Prefcalc writes and
solves. The pieces of information on which the formulation of the linear program
relies are obtained from the user. The user is asked to select a few alternatives
that he is familiar with and feels able to rank-order according to his overall
preferences. The ordering of these alternatives, which include the fictitious ideal
and anti-ideal ones, induces the corresponding order on their overall value and
hence generates constraints of the linear program. Prefcalc then tries to find
levels $u_{i,1}$, $u_{i,2}$ for each criterion i, which will make the additive value function
compatible with the declared information. If the program is not contradictory,
i.e. if an additive value function (with 2-piece piece-wise linear single-attribute
value functions) proves compatible with the preferences, the system tries to find a
solution, among all feasible solutions, that maximises the discrimination between
the selected alternatives. If no feasible solution can be found, the system proposes
to increase the number of variables of the model, for instance by using a higher
number of linear pieces in the description of the single-attribute value functions.
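To make the parametrisation concrete, the following Python sketch evaluates a 2-piece piece-wise linear value function for the cost criterion. It is not Prefcalc's code, only an illustration of the model. The end points (21 334 € and 13 841 €) and the maximal value .43 come from the text; the value 0.40 at mid-range is a hypothetical figure chosen to mimic a function that is nearly flat over the cheaper half of the range, and the function name is ours.

```python
def piecewise_linear_value(g, breakpoints, levels):
    """Interpolate linearly the value attached to the evaluation g.

    breakpoints: evaluations ordered from worst to best (decreasing for cost)
    levels: value attached to each breakpoint (0 for the worst evaluation)
    """
    segments = zip(zip(breakpoints, breakpoints[1:]), zip(levels, levels[1:]))
    for (g0, g1), (u0, u1) in segments:
        if min(g0, g1) <= g <= max(g0, g1):
            return u0 + (u1 - u0) * (g - g0) / (g1 - g0)
    raise ValueError("evaluation outside the range of the criterion")


worst, best = 21_334, 13_841
mid = (worst + best) / 2
cost_breakpoints = [worst, mid, best]
cost_levels = [0.0, 0.40, 0.43]  # 0.40 is hypothetical; .43 is the trade-off

for cost in (21_334, 19_000, 17_500, 16_000, 13_841):
    value = piecewise_linear_value(cost, cost_breakpoints, cost_levels)
    print(cost, round(value, 3))
```

In the additive model, the overall value of a car is the sum of five such functions, one per criterion; the levels at the breakpoints are the unknowns that the linear program determines.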
This method could be described as a learning process; the system fits the
parameters of the model on the basis of partial information about the user's
preferences; the set of alternatives on which the user declares his global preferences
may be viewed as a learning set. For more details on the method, the reader is
referred to Vincke (1992), Jacquet-Lagrèze and Siskos (1982).
In his ex post study, Thierry selects five cars, besides the ideal and anti-ideal
ones, and ranks them in the following order:

1. Peugeot 309 GTI 16 (Car 11)

2. Nissan Sunny (Car 3)

3. Mitsubishi Galant (Car 13)

4. Ford Escort (Car 9)

5. Renault 21 (Car 14)

This ranking is compatible with an additive value function. Such a compatible
value function is described in Figure 6.8.
Thierry examines this result and makes the following comments. He agrees
with many features of the fitted single-attribute value functions and in particular

1. the lack of sensitivity to the price in the range from 13 841 € to 17 576 € (he
was a priori estimating his budget at about 17 500 €);

2. the high importance (weight = .23) given to approaching 28 seconds on the
acceleration criterion (above 29 seconds, the car is useless since a difference
of 1 second in acceleration results in the faster car being two car lengths
ahead of the slower one at the end of the test); Thierry declares this criterion
to be the second most important after cost (weight = .43);

3. the importance (weight = .13) of getting as close as possible to 34 seconds in
the acceleration test starting from 40 km/h (above 38 seconds he agrees that
the car loses all attractiveness; the car is not only used in competition; it
must be pleasant in everyday use and hence, the third criterion has a certain
importance although it is of less importance than the second one);

4. the modelling of the road-holding criterion.

However, Thierry disagrees with the modelling of the braking criterion, which
he considers as important as road-holding. He believes that the relative
importance of the fourth and fifth criteria should be revised. Thierry then looks
at the ranking of the cars according to the computed value function. The ranking
as well as the multi-attribute value assigned to each car are given in Table 6.9.

Rank   Cars                            Value

 1     * Peugeot 309/16 (Car 11)       0.84
 2     * Nissan Sunny (Car 3)          0.68
 3       Renault 19 (Car 10)           0.66
 4       Peugeot 309 (Car 12)          0.65
 5       Honda Civic (Car 7)           0.61
 6       Fiat Tipo (Car 1)             0.54
 7       Opel Astra (Car 8)            0.54
 8       Mitsubishi Colt (Car 5)       0.53
 9       Mazda 323 (Car 4)             0.52
10       Toyota Corolla (Car 6)        0.50
11       Alfa 33 (Car 2)               0.49
12     * Mitsubishi Galant (Car 13)    0.48
13     * Ford Escort (Car 9)           0.32
14     * R 21 (Car 14)                 0.16

Table 6.9: Ranking obtained using Prefcalc. The cars ranked by Thierry are those
marked with a *

Thierry feels that Car 10 (Renault 19) is ranked too high while Car 7 (Honda
Civic) should be in a better position.
In view of these observations, Thierry modifies the single-attribute value
functions for criteria 4 and 5. For the braking criterion, the utility (0.01) associated
with the level 2 remains unchanged while the utility of the level 2.7 is raised to 0.1 instead
of 0.01. The road-holding criterion is also modified; the value (0.2) associated with
the level 3.2 is lowered to 0.1 (see Figure 6.9). Note that Prefcalc normalises the
value function in order that the ideal alternative is always assigned the value 1;
of course, due to the number display format with two decimal positions, the sum
of the maximal values of the single-attribute value functions may be only
approximately equal to 1. Running Prefcalc with the altered value functions returns the
ranking in Table 6.10 and the revised multi-attribute value after each car name.

After he sees the modified ranking yielded by Prefcalc, Thierry feels that the
new ranking is fully satisfactory. He observes that if he had used Prefcalc a few
years earlier, he would have made the same choice as he actually did; he considers
this as a good point as far as Prefcalc is concerned. He finally makes the following
comment: "Using Prefcalc has enhanced my understanding of both the data and
my own preferences; in particular I am more conscious of the relative importance
I give to the various criteria".

Comments on the method

First let us emphasise an important psychological aspect of the empirical validation
of a method or a tool, which is common in human practice: the fact that previous
intuition or previous more informal analyses are confirmed by using a tool, here
Prefcalc, contributes to raising the level of confidence the user puts in the tool.

[Figure: two panels, Brake (trade-off .1; axis 1.3, 2.0, 2.7) and Road (trade-off .1; axis 1.2, 2.2, 3.2)]

Figure 6.9: Modified single-attribute value functions for the braking and
road-holding criteria

Rank   Cars                            Value

 1     * Peugeot 309/16 (Car 11)       0.85
 2     * Nissan Sunny (Car 3)          0.75
 3       Honda Civic (Car 7)           0.66
 4       Peugeot 309 (Car 12)          0.65
 5       Renault 19 (Car 10)           0.61
 6       Opel Astra (Car 8)            0.55
 7       Mitsubishi Colt (Car 5)       0.54
 8       Mazda 323 (Car 4)             0.53
 9       Fiat Tipo (Car 1)             0.51
10       Toyota Corolla (Car 6)        0.50
11     * Mitsubishi Galant (Car 13)    0.48
12       Alfa 33 (Car 2)               0.47
13     * Ford Escort (Car 9)           0.32
14     * R 21 (Car 14)                 0.16

Table 6.10: Modified ranking using Prefcalc. The cars ranked by Thierry are those
marked with a *

Observe that the user may well have a very vague understanding of the method
itself; he simply validates the method by using it to reproduce results that he has
confidence in. After such a successful empirical validation step he will be more
prone to use the method in new situations that he does not master that well.
What are the drawbacks and traps of Prefcalc? Obviously Prefcalc can only be
used in cases where the overall preference of the decision-maker can be represented
by an additive multi-attribute value function (as described by Equation 6.8). In
particular, this is not the case when preferences are not transitive or not complete
(for arguments supporting the possible observation of non-transitive preferences,
see the survey by Fishburn (1991)). There are some additional restrictions due
to the fact that the shapes of the single-attribute value functions that can be
modelled by Prefcalc are limited to piece-wise linear functions. This is hardly a
restriction when dealing with a finite set of alternatives; by adapting the number
of linear pieces one can obtain approximations of any continuous curve that can
be as accurate as desired. When limited to a small number of pieces, this may
however be a more serious restriction.

Stability of ranking
The main problem raised by the use of such a tool is the indeterminacy of the
estimated single-attribute value functions (including the estimation of the
trade-offs). Usually, if the preferences declared on the set of well-known alternatives are
compatible with an additive value model, there will be several value functions that
can represent these preferences. Prefcalc chooses one such representation according
to the principles outlined above, i.e. the most discriminating (in a sense). Other
choices of a model, although compatible with the declared preferences on the learning
set, may lead to variations in the rankings of the remaining alternatives. Slight
variations in the trade-off values can yield rank reversals. For instance, with all
trade-offs within .02 of their value in Figure 6.9, changes already occur. Passing
from the set of trade-offs (.43, .23, .13, .10, .10) to (.45, .21, .11, .12, .10) results in
exchanging the positions of Honda Civic and Peugeot 309, which are ranked 3rd
and 4th respectively after the change. This rank reversal is obtained by putting
slightly more emphasis on cost and slightly less on performance. Note that such
a slight change in the trade-offs has an effect on the ranking of the top 4 cars,
those on which Thierry focused after his preliminary analysis (see Table 6.3). It
should thus be very clear that in practice, determining the trade-offs with sufficient
accuracy could be both crucial and challenging. It is therefore of prime importance
to carry out a lot of sensitivity analyses in order to identify which parts of the
result remain reasonably stable.
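The mechanism behind such a reversal is easy to reproduce. In the Python sketch below, the two sets of trade-offs are those quoted above, but the two alternatives ("car X", "car Y") and their partial values are hypothetical: each number is the fraction of the maximal single-attribute value obtained on a criterion, not data from Thierry's case. Car X is better on performance, car Y on cost. The function names are ours.

```python
def additive_value(partial_values, trade_offs):
    """Weighted sum of normalised single-attribute values."""
    return sum(k * u for k, u in zip(trade_offs, partial_values))


def ranking(alternatives, trade_offs):
    """Names of the alternatives sorted by decreasing overall value."""
    return sorted(
        alternatives,
        key=lambda name: additive_value(alternatives[name], trade_offs),
        reverse=True,
    )


# Hypothetical partial values on (cost, acceleration, pick-up, brakes, road-holding).
alternatives = {
    "car X": (0.56, 0.9, 0.9, 0.6, 0.5),
    "car Y": (0.80, 0.6, 0.6, 0.6, 0.5),
}
initial_trade_offs = (0.43, 0.23, 0.13, 0.10, 0.10)
perturbed_trade_offs = (0.45, 0.21, 0.11, 0.12, 0.10)

before = ranking(alternatives, initial_trade_offs)
after = ranking(alternatives, perturbed_trade_offs)
print(before, after)
```

A systematic sensitivity analysis would repeat this comparison over many perturbed sets of trade-offs and record which parts of the ranking never change.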

Dependence on the learning set

In view of the fact that small variations of the trade-offs may even result in changes
in the ranking of the top alternatives, one may question the influence of the
selection of a learning set. In the case under examination, the top two alternatives
were chosen to be in the learning set and hence are constrained to appear in the
correct order in the output of Prefcalc. What would have happened if the learning
set had been different?

Let us take another subset of 5 cars and declare preferences that agree with
the ranking validated by Thierry (Table 6.10). When replacing the top 2 cars
(Peugeot 309/16V, Nissan Sunny) by Renault 19 and Mitsubishi Colt, two cars in the
middle segment of the ranking, the vector of trade-offs is (.53, .06, .08, .08, .25)
and the top four in the new ranking are Renault 19 (1), Peugeot 309 (2), Peugeot
309/16V (3), and Nissan Sunny (4); Honda Civic is relegated to the 12th position.
With the present learning set, stronger emphasis has been put on cost
and safety (brakes and road-holding) and much less on performance (acceleration
and pick-up); three of the former top cars remain in the top four; Honda recedes
due to its higher cost and its weakness on road-holding; Renault 19 is heading the
race mainly due to excellent road-holding.

Further experiments have been performed, reintroducing in turn one of the 4
top cars and removing Renault 19. Clearly, the value of the trade-offs may depend
drastically on the learning set. Some sort of preliminary analysis of the user's
preferences can help to choose the learning set or understand the variations in the
ranking and the trade-offs a posteriori. In the present case, one can be relatively
satisfied with the results since the top 3 cars are usually well-ranked; the ranking
of the Honda Civic is much more unstable and it is not difficult to understand why
(weakness on road-holding and relatively high cost). The Renault 19 appears as
an outsider due to excellent road-holding. Of course for the rest of the cars huge
variations may appear in their ranking, but one is usually more interested in the
top ranked alternatives.

From a general point of view, the option implemented in the mathematical
programming model to reduce the indeterminacies (essentially, by choosing to
maximise the contrast between the evaluations of the alternatives in the learning
set) is not aimed at being as insensitive as possible with regard to the selection
of a learning set. Other options could be experimentally investigated in order to
see whether some could consistently yield more stable evaluations. It should be
noted however that stability, which may be a desirable property in the perspective
of uncovering an objective model of preferences measurement, is not necessarily
a relevant requirement when the goal is to exploit partial available information.
One may expect that the decision-maker will naturally choose alternatives that
he considers as clearly distinct from one another as members of the learning set;
the analyst might alternatively instruct the decision-maker to do so. In a learning
process, where, typically, information is incomplete, it must be decided how to
complement the available facts by some arbitrary default assumptions. The
information should then be collected while taking the assumptions made into account;
one may consider that in the case of Prefcalc, the analyst's instruction to select
alternatives that are as contrasted as possible is in good agreement with the
implementation options.

6.3.4 Conclusion
This section has been devoted to the construction of a formal model that represents
preferences on a numerical scale. Such a model can only be expected to exist when
preferences satisfy rather demanding hypotheses; it thus relies on firm theoretical
bases, which is undoubtedly part of the intellectual appeal of the method. There
is at least one additional advantage to theoretically well-founded decision models;
such models can be used to legitimate a decision to persons that have not been
involved in the decision making process. Once the hypotheses of the model have
been accepted or proved valid in a decision context and provided the process of
elicitation of the various parameters of the model has been conducted correctly,
the decision becomes transparent.
The additive multi-attribute value model is rewarding, when established and
accepted by the stake-holders, since it is directly interpretable in terms of decision;
the best decision is the one the model values most (provided the imprecisions in
the establishment of the model and the uncertainties in the evaluation information
make it possible to discriminate at least between the top alternatives). The counterpart of
the clear-cut character of the conclusions that can be drawn from the model is
that establishing the model requires a lot of information and of a very precise and
particular type. This means that the model may be inadequate not only because
the hypotheses could not be fulfilled but also because the respondents might feel
unable to answer the questions or because their answers might not be reliable.
Indirect methods based on exploiting partial information and extrapolating it (in
a recursive validation process) may help when the information is not available in
explicit form; it remains that the quality of the information is crucial and that a lot
of it is needed. In conclusion, direct assessment of multi-attribute value functions
is a narrow road between the practical problem of obtaining reliable answers to
difficult questions and the risks involved in building a model on answers to simpler
but ambiguous questions.
In the next section we shall explore a very different formal approach that may
be less demanding with regard to the precision of the information, but also provides
less conclusive outputs.

6.4 Outranking methods

6.4.1 Condorcet-like procedures in decision analysis
Is there any alternative way of dealing with multiple criteria evaluation in view
of a decision to the one described above for building a one-dimensional synthetic
evaluation on some sort of super-scale? To answer this question (positively), in-
spiration can be gained from the voting procedures discussed in Chapter 2 (see
also Vansnick (1986)). Suppose that each voter expresses his preferences through
a complete ranking of the candidates. With Bordas method, each candidate is
assigned a rank for each of the voters (rank 1 if candidate is ranked first by a voter,
rank 2 if he is ranked second, and so on); the Borda score of a candidate is the
sum of the ranks assigned to him by the voters; the winner is the candidate with

Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 5 3 1 2 2 3 3 2 3 2 2 2 2 3
2 2 5 2 4 2 3 2 3 3 1 1 1 4 3
3 4 4 5 4 4 4 4 4 4 3 2 3 5 4
4 3 1 1 5 1 3 1 2 1 2 1 1 4 2
5 3 3 1 5 5 3 2 2 2 3 1 1 5 2
6 2 2 1 2 2 5 2 2 2 2 1 1 3 2
7 3 3 1 4 4 4 5 3 4 3 2 2 4 4
8 3 2 1 4 4 4 3 5 3 2 0 2 4 3
9 2 3 1 4 4 3 1 2 5 2 1 2 4 3
10 4 4 2 3 2 3 2 3 3 5 3 2 4 3
11 4 4 3 4 4 4 4 5 4 3 5 4 4 5
12 4 4 2 4 4 4 4 4 3 4 3 5 5 4
13 3 2 0 2 1 2 1 2 1 1 1 0 5 1
14 2 3 1 3 3 3 1 3 3 2 0 1 4 5

Table 6.11: Number of criteria in favour of a when compared to b for all pairs of
cars a, b in the "Choosing a car" problem

the smallest Borda score. This method can be seen as a method of construction of
a synthetic evaluation of the alternatives in multiple criteria decision analysis, the
points of view corresponding to the voters and the alternatives to the candidates;
all criteria-voters have equal weight and coding by the rank number of the position
of the candidate in a voter's preference looks like a form of evaluation.
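To make the Borda computation concrete, here is a minimal Python sketch. The function name `borda_scores` and the three voters ranking the candidates a, b and c are hypothetical and serve only as an illustration; each ranking lists the candidates from best to worst.

```python
def borda_scores(rankings):
    """Sum, for each candidate, the ranks given by the voters (rank 1 = best)."""
    scores = {}
    for ranking in rankings:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0) + rank
    return scores


# Hypothetical profile: three voters, three candidates, best candidate first.
rankings = [
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["a", "c", "b"],
]

scores = borda_scores(rankings)
# The winner is the candidate with the smallest sum of ranks.
winner = min(scores, key=scores.get)
print(scores, winner)
```

In a multiple criteria reading of the same sketch, each list would be the ranking of the alternatives according to one criterion, all criteria having equal weight.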
Condorcet's method consists of a kind of tournament where all candidates
compete in pairwise contests. A candidate is declared to be preferred to another
according to a majority rule, i.e. if more voters rank him before the latter than
the converse. The result of such a procedure is a preference relation on the set
of candidates that in general is neither transitive nor acyclic. A further step is
thus needed in order to exploit this relation in view of the selection of one or
several candidates or in view of ranking all the candidates. This idea can of course
be transposed in the multiple criteria decision context. We do this below, using
Thierry's case again for illustrative purposes; we show how the problems raised by
a direct transposition rather naturally lead to elementary outranking methods.
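The pairwise majority rule can be sketched in a few lines of Python. The function name `majority_relation` and the profile of three voters are hypothetical; the profile is the classical one for which the majority relation is cyclic, which illustrates why a further exploitation step is needed.

```python
from itertools import permutations


def majority_relation(rankings):
    """Return the set of pairs (x, y) such that more voters rank x before y
    than y before x. Each ranking lists the candidates from best to worst."""
    candidates = rankings[0]
    preferred = set()
    for x, y in permutations(candidates, 2):
        votes_for_x = sum(r.index(x) < r.index(y) for r in rankings)
        votes_for_y = len(rankings) - votes_for_x
        if votes_for_x > votes_for_y:
            preferred.add((x, y))
    return preferred


# Hypothetical profile of three voters leading to a cycle.
rankings = [
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["c", "a", "b"],
]

relation = majority_relation(rankings)
print(sorted(relation))
```

With this profile, a is preferred to b, b to c and c to a, each time by two voters against one, so that no candidate beats all the others.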
For each pair of cars a and b, we count the number of criteria according to
which a is at least as good as b. This yields the matrix given in Table 6.11; the
elements of the matrix are integers ranging from 0 to 5. Note that we might have
alternatively decided to count the criteria for which a is better than b, not taking
into account criteria for which a and b are tied.
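The two ways of counting mentioned above can be contrasted on a small example. In the following sketch the evaluations of the two cars are hypothetical and, as a simplifying assumption, all five criteria are to be maximised; the function names are ours.

```python
def count_at_least_as_good(a, b):
    """Number of criteria on which a is at least as good as b (ties count)."""
    return sum(x >= y for x, y in zip(a, b))


def count_strictly_better(a, b):
    """Number of criteria on which a is strictly better than b (ties ignored)."""
    return sum(x > y for x, y in zip(a, b))


# Hypothetical evaluations on 5 criteria, all assumed to be maximised.
car_a = (3, 7, 5, 2, 4)
car_b = (3, 6, 8, 2, 1)

weak_ab = count_at_least_as_good(car_a, car_b)
weak_ba = count_at_least_as_good(car_b, car_a)
strict_ab = count_strictly_better(car_a, car_b)
strict_ba = count_strictly_better(car_b, car_a)
print(weak_ab, weak_ba, strict_ab, strict_ba)
```

The two cars are tied on two criteria; these ties count for both cars in the first way of counting and for neither in the second.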
What we could call the Condorcet preference relation is obtained by determining,
for each pair of alternatives a, b, whether or not there is a (simple) majority
of criteria for which a is at least as good as b. Since there are 5 criteria, the majority
is reached as soon as at least 3 criteria favour alternative a when compared
to b. The preference matrix is thus obtained by substituting 1 for any number
greater than or equal to 3 in Table 6.11 and 0 for any number smaller than 3, yielding the

Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 1 1 0 0 0 1 1 0 1 0 0 0 0 1
2 0 1 0 1 0 1 0 1 1 0 0 0 1 1
3 1 1 1 1 1 1 1 1 1 1 0 1 1 1
4 1 0 0 1 0 1 0 0 0 0 0 0 1 0
5 1 1 0 1 1 1 0 0 0 1 0 0 1 0
6 0 0 0 0 0 1 0 0 0 0 0 0 1 0
7 1 1 0 1 1 1 1 1 1 1 0 0 1 1
8 1 0 0 1 1 1 1 1 1 0 0 0 1 1
9 0 1 0 1 1 1 0 0 1 0 0 0 1 1
10 1 1 0 1 0 1 1 1 1 1 1 0 1 1
11 1 1 1 1 1 1 1 1 1 1 1 1 1 1
12 1 1 0 1 1 1 1 1 1 1 0 1 1 1
13 1 0 0 0 0 0 0 0 0 0 0 0 1 0
14 0 1 0 1 1 1 0 1 1 0 0 0 1 1

Table 6.12: Condorcet preference relation for the "Choosing a car" problem. A
1 at the intersection of the a row and the b column means that a is rated not
lower than b on at least 3 criteria

relation described by the 0-1 matrix in Table 6.12. Note that a criterion counts
both in favour of a and in favour of b only if a and b are tied on that criterion;
the relation is reflexive since any alternative is at least as good as itself along all
criteria.
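The passage from the counts to the 0-1 matrix is a simple thresholding operation. As a minimal sketch, the function below (whose name is ours) is applied to the rows of Table 6.11 corresponding to Cars 1 and 13 only.

```python
def majority_rows(count_rows, majority):
    """Replace each count by 1 if it reaches the required majority, else by 0."""
    return [[1 if count >= majority else 0 for count in row] for row in count_rows]


# Rows of Table 6.11 for Car 1 and Car 13 (columns are Cars 1 to 14).
count_rows = [
    [5, 3, 1, 2, 2, 3, 3, 2, 3, 2, 2, 2, 2, 3],  # Car 1
    [3, 2, 0, 2, 1, 2, 1, 2, 1, 1, 1, 0, 5, 1],  # Car 13
]

relation_rows = majority_rows(count_rows, majority=3)
for row in relation_rows:
    print(row)
```

The two rows obtained are those of Cars 1 and 13 in Table 6.12; the diagonal entries equal 1 because the count of an alternative against itself is always 5.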

Majority rule and cycles

It is not immediately apparent that this relation has cycles and even cycles that
go through all alternatives; an instance of such a cycle is 1, 7, 10, 11, 3, 12, 5, 2,
14, 8, 9, 4, 6, 13, 1. Obviously it is not straightforward to suggest a good choice on
the basis of such a relation since one can find 3 criteria (out of 5) saying that 1 is
at least as good as 7, 3 (possibly different) criteria saying that 7 is at least as good
as 10, . . . , and finally 3 criteria saying that 13 is at least as good as 1. How can
we possibly obtain something from this matrix in view of our goal of selecting the
best car? A closer look at the preference relation reveals that some alternatives
are preferred to most others while some are preferred to only a few; among the former are
alternatives 11 (preferred to all), 3 (preferred to all but one), 12 (preferred to all
but 2), 7 and 10 (preferred to all but 3). The same alternatives appear as seldom
beaten: 3 and 11 (only once, not counting themselves), 12 (twice), then come 10
(5 times) and 7 (6 times).
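These observations can be checked mechanically. The sketch below copies the 0-1 matrix of Table 6.12, verifies that the sequence quoted above is indeed a cycle and counts, for each car, how many other cars it beats and by how many it is beaten; the helper names are ours.

```python
# 0-1 matrix of Table 6.12 (row a, column b; cars are numbered from 1 to 14).
TABLE_6_12 = """
1 1 0 0 0 1 1 0 1 0 0 0 0 1
0 1 0 1 0 1 0 1 1 0 0 0 1 1
1 1 1 1 1 1 1 1 1 1 0 1 1 1
1 0 0 1 0 1 0 0 0 0 0 0 1 0
1 1 0 1 1 1 0 0 0 1 0 0 1 0
0 0 0 0 0 1 0 0 0 0 0 0 1 0
1 1 0 1 1 1 1 1 1 1 0 0 1 1
1 0 0 1 1 1 1 1 1 0 0 0 1 1
0 1 0 1 1 1 0 0 1 0 0 0 1 1
1 1 0 1 0 1 1 1 1 1 1 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 0 1 1 1 1 1 1 1 0 1 1 1
1 0 0 0 0 0 0 0 0 0 0 0 1 0
0 1 0 1 1 1 0 1 1 0 0 0 1 1
"""
relation = [[int(x) for x in line.split()]
            for line in TABLE_6_12.splitlines() if line.strip()]
cars = range(1, len(relation) + 1)


def is_cycle(rel, path):
    """True if path is closed and each car is related to the next one."""
    closed = path[0] == path[-1]
    return closed and all(rel[a - 1][b - 1] for a, b in zip(path, path[1:]))


cycle = [1, 7, 10, 11, 3, 12, 5, 2, 14, 8, 9, 4, 6, 13, 1]
cycle_ok = is_cycle(relation, cycle)

# Number of other cars beaten by each car, and number of other cars beating it
# (the diagonal entry is subtracted in both cases).
beats = {a: sum(relation[a - 1]) - 1 for a in cars}
beaten_by = {b: sum(row[b - 1] for row in relation) - 1 for b in cars}
print(cycle_ok, beats, beaten_by)
```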
To make things appear more clearly, by avoiding cycles as much as possible,
one might decide to impose more demanding levels of majority in the definition of
a preference relation. We might require that an alternative be at least as good as
another on at least 4 criteria. The new preference relation is shown in Table 6.13.
All cycles in the previous relation have disappeared. When ranking the alternatives

Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 1 0 1 0 0 0 0 0 0 0 0 1 0
3 1 1 1 1 1 1 1 1 1 0 0 0 1 1
4 0 0 0 1 0 0 0 0 0 0 0 0 1 0
5 0 0 0 1 1 0 0 0 0 0 0 0 1 0
6 0 0 0 0 0 1 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 1 0 1 0 0 0 1 1
8 0 0 0 1 1 0 0 1 0 0 0 0 1 0
9 0 0 0 0 0 0 0 0 1 0 0 0 1 0
10 1 1 0 0 0 0 0 0 0 1 0 0 1 0
11 1 1 0 1 1 0 1 1 1 0 1 1 1 1
12 1 1 0 1 1 1 1 1 0 1 0 1 1 1
13 0 0 0 0 0 0 0 0 0 0 0 0 1 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 1

Table 6.13: Condorcet preference relation for the "Choosing a car" problem. A
1 at the intersection of the a row and the b column means that a is rated not
lower than b on at least 4 criteria

by the number of those they beat (i.e. are at least as good on 4 criteria or more)
one sees that 3, 11 and 12 come in the first position (they are preferred to 10 other
cars), then there is a big gap after which come 7, 8 and 10 that beat only 3 other
cars. Conversely, there are two non-beaten cars, 3 and 11, then come 10 and 12
(beaten by one car); 7 is beaten by 3 cars.
In the present case, we see that the simple approach that was used essentially
makes the same cars emerge as the methods used so far. There are, however, at least two
radical differences between this approach and those based on the weighted sum or on some more
sophisticated way of assessing each alternative by a single number that synthesises
all the criteria values. One is that all criteria have been considered equally important;
it is possible however to take information on the relative importance of the
criteria into account as will be seen in section 6.4.3.
The second difference is more in the nature of the type of approach; the most
striking point is that the size of the differences in the evaluations of a and b for all
criteria does not matter; only the signs of those differences do. In other words, had
the available information been rankings of the cars with respect to each criterion
(instead of numeric evaluations), the result of the Condorcet procedure would
have been exactly the same. More precisely, suppose that all that we know (or
that Thierry considers relevant in terms of preferences) about the cost criterion is
the ordering of the cars according to the estimated cost, i.e.

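$\text{Car 6} \succ_1 \text{Car 5} \succ_1 \text{Car 2} \succ_1 \text{Car 4} \succ_1 \text{Car 12} \succ_1$
$\text{Car 10} \succ_1 \text{Car 3} \succ_1 \text{Car 13} \succ_1 \text{Car 11} \succ_1 \text{Car 8} \succ_1$
$\text{Car 1} \succ_1 \text{Car 7} \succ_1 \text{Car 9} \succ_1 \text{Car 14}$

where $\succ_1$ represents "is preferred to . . . on Criterion 1", i.e. "is cheaper than
. . . ". Suppose that similar hypotheses are made for the other 4 criteria; if this were
the case we would have obtained the same matrices as in Tables 6.12 and 6.13.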
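The purely ordinal character of the procedure is easy to verify numerically. The following sketch uses randomly generated, hence hypothetical, evaluations of 14 alternatives on 5 criteria (all assumed to be maximised) and replaces each evaluation by its position in the ordering of the values observed on the criterion; the matrix of counts is unchanged. The function names are ours.

```python
import random


def count_matrix(evaluations):
    """Entry (a, b) is the number of criteria on which a is at least as good as b."""
    return [[sum(x >= y for x, y in zip(row_a, row_b)) for row_b in evaluations]
            for row_a in evaluations]


def to_ranks(evaluations):
    """Replace every evaluation by the position of its value among the distinct
    values observed on the same criterion (tied values share a position)."""
    ranked = [list(row) for row in evaluations]
    for i in range(len(evaluations[0])):
        levels = sorted({row[i] for row in evaluations})
        position = {value: k for k, value in enumerate(levels)}
        for row in ranked:
            row[i] = position[row[i]]
    return ranked


# Hypothetical numeric evaluations: 14 alternatives, 5 criteria to be maximised.
random.seed(0)
evaluations = [[random.randint(0, 20) for _ in range(5)] for _ in range(14)]

same_counts = count_matrix(evaluations) == count_matrix(to_ranks(evaluations))
print(same_counts)
```

Any strictly increasing recoding of a criterion would lead to the same conclusion, since only the sign of each difference is used.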
Of course, neglecting the size of the differences for a criterion such as cost may
appear as misusing the available information; there are at least two considerations
that could mitigate this commonsense reaction:

• the assessments for the cars on the cost criterion are rather rough estimations
of an expected cost (see section 6.1.1); in particular it is presumed that on
average the lifetimes of all alternatives are equal; is it reasonable in those
circumstances to rely on precise values of differences of these estimations to
select the best alternative?

• estimations of cost, even reliable ones, are not necessarily related to preferences
on the cost criterion in a simple way.

Such issues were discussed extensively in section 6.2.4. The whole analysis carried
out there was aimed towards the construction of a multiple criteria value function,
which implies making any difference in evaluations on a criterion equivalent to
some uniquely defined difference for any other criterion. The many methods that
can be used to build a value function by questioning a decision-maker about his
preferences may well fail however; let us list a few reasons for the possible failure
of these methods:

• time pressure may be so intense that there is not enough time available to
engage in the lengthy elicitation process of a multiple criteria value function;

• it may be that the importance of the decision to be made does not justify
such an effort;

• the decision-maker might not know how to answer the questions or might
try to answer but prove inconsistent or might feel discomfort in being forced
to give precise answers where things are vague to him;

• in case of group decision, the analyst may be unable to make the various
decision-makers agree on the answers to be given to some of the questions
raised in the elicitation process.

In such cases it may be inappropriate or inefficient to try building a value function
and other approaches may be preferred. This is perhaps clearer if we consider
the more artificial scales associated with criteria 4 and 5 (see section 6.1.1 concern-
ing the construction of these scales). Take, for instance, criterion 4 (Brakes). Does
the difference between the levels 2.33 and 2.66 have a quantitative meaning? If it
does, is this difference, in terms of preferences, more than, less than or equal to
the difference between the levels 1.66 and 2? How much would you be willing to pay
(in terms of criterion 1) to raise the value for criterion 4 from 2.33 to 2.66 or from
1.33 to 2.33? Of course questions raised for eliciting value functions are more indi-
rect but they still require a precise perception of the meaning of the levels on the
scale of criterion 4 by the decision-maker. Such a perception can only be obtained
by having experienced the braking behaviour of specific cars rated at the various
levels of the scale, but such knowledge cannot be expected from a decision-maker
(otherwise there would be no room on the marketplace for all the magazines that
evaluate goods in order to help consumers spend their money while making the
best choice). Also remember that braking performance has been described by the
average of 3 indices evaluating aspects of the cars' braking behaviour; this does
not favour a deep intuitive perception of what the levels on that scale may really
mean. So, one has to admit that in many cases the definition of the levels on scales
is quite far from precise in quantitative terms and it may be hygienic not to use
the fallacious power of numbers. This is definitely the option chosen in the meth-
ods discussed in the present section. Not that these methods are purely ordinal;
but differences between levels on a scale are carefully categorised, yet usually in a
coarse-grained fashion, in order not to take into account differences that are only
due to the irrelevant precision of numbers.

6.4.2 A simple outranking method

The Condorcet idea for a voting procedure has been transposed in decision analysis
under the name of outranking methods. Such a transposition takes the peculiari-
ties of the decision analysis context into account, in particular the fact that criteria
may be perceived as unequally important; additional elements such as the notion
of discordance have also been added. The principle of these methods is as follows.
Each pair of alternatives is considered in turn independently of third-party
alternatives; when looking at alternatives a and b, it is claimed that a outranks
b if there are enough arguments to decide that a is at least as good as b, while
there is no essential reason to refute that statement (Roy (1974), cited by Vincke
(1992), p. 58). Note that taking strong arguments against declaring a preference
into account is typically what is called discordance and is original with respect
to the simple Condorcet rule. Such an approach has been operationalised through
various procedures and particularly the family of ELECTRE methods associated
with the name of B. Roy. (For an overview of outranking methods, the reader is
referred to the books by Vincke (1992) and Roy and Bouyssou (1993)). Below, we
discuss an application of the simplest of these methods, ELECTRE I, to Thierry's
case; ELECTRE I is a tool designed to be used in the context of a choice decision
problem; it builds up a set of which the best alternative, according to the
decision-maker's preferences, should be a member. Let us emphasise that this set
cannot be described as the set of best alternatives, not even a set of good alternatives,
but just a set that contains the best alternatives. We shall then show
how the fundamental ideas of ELECTRE I can be made more sophisticated, in particular in
view of helping to rank the alternatives. Our goal is not to make a survey of all
outranking methods; we just want to present the basic ideas of such methods and
illustrate some problems they may raise.

The lack of transitivity, acyclicity and completeness issues

As a preamble, it may be useful to emphasise the fact that outranking methods
(and more generally methods based on pairwise comparisons) do not generally yield
preferences that are transitive (not even acyclic). This point was already made in
Chapter 2 about Condorcet's method. Since the hypotheses of Arrow's theorem
can be re-formulated to be relevant in the framework of multiple criteria decision
analysis (through the correspondence candidate-alternative, voter-criterion; see
also Bouyssou (1992) and Perny (1992)), it is no wonder that methods based on
comparisons of alternatives by pairs, independently of the other alternatives, will
seldom directly yield a ranking of the alternatives. The pairs of alternatives that
belong to the outranking relation are normally those between which the preference
is established with a high degree of confidence; contradictions are reflected either in
cycles (a outranks b that outranks c that . . . that outranks a) or incomparabilities
(neither a outranks b nor the opposite).
Let us emphasise that the lack of transitivity or of completeness, although rais-
ing operational problems, may be viewed not as a weakness but rather as faithfully
reflecting preferences as they can be perceived at the end of the study. Defenders
of the approach support the idea that forcing preferences to be expressed in the
format of a complete ranking is in general too restrictive; there is experimental
evidence that backs their viewpoint (Tversky (1969), Fishburn (1991)). Explicit
recognition that some alternatives are incomparable may be an important piece of
information for the decision-maker.
In addition, as repeatedly stressed in the writings of B. Roy, the outranking
relation should be interpreted as what is clear-cut in the preferences of the decision-
maker, something like the surest and most stable expression of a complex, vague
and evolving object that is named, for simplicity, "the preferences of the decision-
maker". In this approach very few hypotheses are made on preferences (like
rationality hypotheses); one may even doubt that preferences pre-exist the process
from which they emerge.
The analysis of a decision problem is conceived as an informational process,
in which, carefully, prudently and interactively, models are built that reflect, to
some extent, the way of thinking, the feelings and the values of a decision-maker;
in this concept, the concern is not making a decision but helping a decision-maker
to make up his mind, helping him to understand a decision problem while taking
his own values into account in the modelling of the decision situation.
The approach could be called constructive; it has many features in common
with a learning process; however, in contrast with most artificial intelligence prac-
tice, the model of preferences is built explicitly and formally; preferences are not
simply described through rules extracted from partial information obtained on a
learning set. For more about the constructive approach including comparisons with
the classical normative and descriptive approaches (see Bell, Raiffa and Tversky
(1988)), the reader is referred to Roy (1993).
Once the outranking relation has been constructed, the job of suggesting a
decision is thus not straightforward. A phase of exploitation of the outranking
relation is needed in order to provide the decision-maker with information more
directly interpretable in terms of a decision. Such a two-stage process offers the
advantage of good control on the transformation of the multi-dimensional information
into a model of the decision-maker's preferences including a certain degree
of inconsistency and incompleteness.

6.4.3 Using ELECTRE I on the case

We briefly review the principles of the ELECTRE I method. For each pair of
alternatives a and b, the so-called concordance index is computed; it measures
the strength of the coalition of criteria that support the idea that a is at least as
good as b. The strength of a coalition is just the sum of the weights associated with
the criteria that constitute the coalition. The notion of weights will be discussed
below. If all criteria are equally important, the concordance index is proportional
to the number of criteria in favour of a as compared to b as in the Condorcet-like
method discussed above. The level from which a coalition is judged strong enough
is determined by the so-called concordance threshold; in the Condorcet voting
method, with the simple majority rule, this threshold is just half the number of
criteria and in general one will choose a number above half the sum of the weights
of all criteria. Another feature that contrasts ELECTRE with pure Condorcet but
also with purely ordinal methods, is that some large differences in evaluation, when
in disfavour of a, might be pinpointed as preventing a from outranking b. One
therefore checks whether there is any criterion for which b is so much better than a
that it would make it meaningless for a to be declared preferred overall to b; if this
happens for at least one criterion one says that there is a veto to the preference of a
over b. If the concordance index passes some threshold (concordance threshold)
and there is no veto of b against a, then a outranks b. Note that the outranking
relation is not asymmetric in general; it may happen that a outranks b and that b
outranks a.
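The outranking test just described can be sketched as follows. This is only an illustration under stated assumptions: the evaluations, the weights and the concordance threshold are hypothetical, all criteria are assumed to be maximised, and the veto is modelled in the simplest possible way, namely by a maximal tolerated gap per criterion (`veto_gaps`), whereas the method allows a veto to be defined by any set of ordered pairs of evaluations.

```python
def concordance(a, b, weights):
    """Sum of the weights of the criteria on which a is at least as good as b."""
    return sum(p for p, x, y in zip(weights, a, b) if x >= y)


def has_veto(a, b, veto_gaps):
    """Assumed veto rule: b is better than a by at least the veto gap
    on at least one criterion."""
    return any(y - x >= gap for x, y, gap in zip(a, b, veto_gaps))


def outranks(a, b, weights, threshold, veto_gaps):
    """a outranks b if the concordant coalition is strong enough and no veto applies."""
    return concordance(a, b, weights) >= threshold and not has_veto(a, b, veto_gaps)


# Hypothetical data: three criteria, normalised weights, threshold .60.
weights = (0.4, 0.3, 0.3)
threshold = 0.6
a = (8, 6, 2)
b = (5, 5, 9)

without_veto = outranks(a, b, weights, threshold, veto_gaps=(10, 10, 10))
with_veto = outranks(a, b, weights, threshold, veto_gaps=(10, 10, 5))
reverse = outranks(b, a, weights, threshold, veto_gaps=(10, 10, 10))

# The relation is not asymmetric: two close alternatives may outrank each other.
x = (8, 6, 2)
y = (8, 6, 3)
both_ways = (outranks(x, y, weights, threshold, (10, 10, 10))
             and outranks(y, x, weights, threshold, (10, 10, 10)))
print(without_veto, with_veto, reverse, both_ways)
```

In this hypothetical example a is supported by a coalition of weight .7 and outranks b as long as the gap of 7 units on the third criterion is tolerated; lowering the veto gap on that criterion to 5 removes the outranking without creating an outranking of a by b.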
This process yields a binary relation on the set of alternatives, which may
have cycles and be incomplete (neither a outranks b nor the opposite). In order
to propose a set of alternatives of particular interest to the decision-maker from
which the best compromise alternative should emerge, one extracts the kernel of
the graph of the outranking relation after the cycles have been reduced; in other words,
all alternatives in a cycle are considered to be equivalent; they are substituted by
a unique representative node; in the resulting relation without cycles, the kernel
is defined as a subset of alternatives that do not outrank one another and such
that each alternative not in the kernel is outranked by at least one alternative in
the kernel; in particular all non-outranked alternatives belong to the kernel. In a
graph without cycles, a unique kernel always exists. It should be emphasised that
not all alternatives in the kernel are necessarily good candidates for selection; an
alternative incomparable to all others is always in the kernel; alternatives in the
kernel may be beaten by alternatives not in the kernel. So, the kernel may be
viewed as a set of alternatives on which the decision-maker's attention should be
focused.
In order to apply the method to Thierry's case, we successively have to determine

• weights for the criteria;

• a concordance threshold;

• ordered pairs of evaluations that lead to a veto (and this for every criterion).

Evaluating coalitions of criteria

The concordance index $c(a, b)$, which measures the coalition of criteria along which
a is at least as good as b, may be computed by the formula

(6.12)    $c(a, b) = \sum_{i \,:\, g_i(a) \geq g_i(b)} p_i$

where the $p_i$'s are normalised weights that reflect the relative importance of the
criteria; $g_i(a)$ denotes, as usual, the evaluation of alternative a for criterion i (which
is assumed to be maximised; if it were to be minimised, the weight $p_i$ would be
added when the converse inequality holds, i.e. $g_i(a) \leq g_i(b)$). So, whenever the
evaluation of a exceeds or equals that of b on a criterion, the weight of that criterion
enters (additively) into the weight of the coalition in favour of a. A criterion can count both
for a against b and the opposite if and only if $g_i(a) = g_i(b)$.
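Formula (6.12), including the case of criteria to be minimised, may be sketched as follows. The weights, the evaluations and the choice of the first criterion as the only minimised one are hypothetical; exact fractions are used so that sums of weights can be compared without rounding problems.

```python
from fractions import Fraction


def concordance_index(g_a, g_b, weights, maximise):
    """Sum of the weights p_i of the criteria for which a is at least as good as b.
    For a criterion to be minimised, the converse inequality is used."""
    total = Fraction(0)
    for x, y, p, to_max in zip(g_a, g_b, weights, maximise):
        at_least_as_good = (x >= y) if to_max else (x <= y)
        if at_least_as_good:
            total += p
    return total


# Hypothetical data: 5 criteria, the first one (a cost) being minimised.
weights = [Fraction(w, 10) for w in (3, 2, 2, 2, 1)]
maximise = (False, True, True, True, True)
a = (12, 7, 5, 3, 4)
b = (15, 7, 6, 2, 4)

c_ab = concordance_index(a, b, weights, maximise)
c_ba = concordance_index(b, a, weights, maximise)

# Criteria on which a and b are tied count on both sides.
tied_weight = sum(p for x, y, p in zip(a, b, weights) if x == y)
print(c_ab, c_ba, tied_weight)
```

Since the weights are normalised, the two indices add up to 1 plus the weight of the criteria on which the alternatives are tied.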
In the context of outranking, the weights are not trade-offs; they are completely
independent of the scales for the criteria. A practical consequence is that one
may question the decision-maker in terms of relative importance of the criteria
without reference to the scales on which the evaluations for the various viewpoints
are expressed. This does not mean however that they are independent of the
method and that one could use values given spontaneously by the decision-maker
or through questioning in terms of importance without care, without reference
to the evaluations as is done in Saaty's procedure. It is important to bear in mind
how the weights will be used, in this case to measure the strength of coalitions
in pairwise comparisons and decide on the preference only on the basis of the
To be more specific and contrast the meaning of these weights with that of the weights used
in weighted sums, let us first consider those suggested by Thierry in section 6.2.2,
i.e. (1, 2, 1, 0.5, 0.5). Note that these were not obtained through questioning on the
relative importance of criteria but in the context of the weighted sum with Thierry
bearing re-scaled evaluations in mind: the evaluations on each criterion had been
divided by the maximal value $g_{i,\max}$ attained for that criterion. Dividing the
weights by their sum (= 5) yields the normalised weights (.2, .4, .2, .1, .1). Using
these weights in outranking methods would lead to an overwhelming predominance
of criteria 2 (Acceleration) and 3 (Pick-up), which are also linked since they are
facets of the car's performance. With such weights and a concordance threshold of
at least .5, it is impossible for a car to be outranked when it is better on criteria 2
and 3 even if all other criteria are in favour of an opponent. It was never Thierry's
intention that once a car is better on criteria 2 and 3, there is no need for looking
at the other criteria; the whole initial analysis shows on the contrary, that a fast
and powerful car is useless, for instance, if it is bad on the braking or road-holding
criterion. Such a feature of the preference structure could indeed be reflected
through the use of vetoes, but only in a negative manner, i.e. by removing the
outranking of a safe car by a powerful one, not by allowing a safe car to outrank
a powerful one. Note that the above weights may nevertheless be appropriate
for a weighted sum because in such a method, the weights are multiplied by the
evaluations (or re-coded evaluations). To make it clearer, consider the following
reformulation of the condition under which a is preferred to b in the weighted sum
model (a similar formulation is straightforward in the additive value model)
(6.13)    $a \succsim b \iff \sum_{i} k_i \, (g_i(a) - g_i(b)) \geq 0.$

If a is slightly better than b on a point of view i, the influence of this fact in the
comparison between a and b is reflected by the term $k_i \, (g_i(a) - g_i(b))$, which
is presumably small. Hence, important criteria count for little in pairwise comparisons
when the differences between the evaluations of the alternatives are small
enough. On the contrary, in outranking methods, weights are not divided; when a
is better than b on some criterion, the full weight of the criterion counts in favour
of a, whether a is slightly or by far better than b.
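The contrast between the two uses of the weights can be illustrated numerically. The sketch below uses the normalised weights (.2, .4, .2, .1, .1) quoted above; the two alternatives and their evaluations, supposed to be re-scaled between 0 and 1 and to be maximised, are hypothetical.

```python
WEIGHTS = (0.2, 0.4, 0.2, 0.1, 0.1)


def weighted_sum_difference(a, b, weights):
    """Left-hand side of condition (6.13): sum of k_i * (g_i(a) - g_i(b))."""
    return sum(k * (x - y) for k, x, y in zip(weights, a, b))


def concordance(a, b, weights):
    """Weight of the coalition of criteria on which a is at least as good as b."""
    return sum(p for p, x, y in zip(weights, a, b) if x >= y)


# Hypothetical re-scaled evaluations: a is marginally better than b on criterion 2,
# tied with b on criteria 1 and 3, and much worse on criteria 4 and 5.
a = (0.50, 0.81, 0.50, 0.20, 0.20)
b = (0.50, 0.80, 0.50, 0.90, 0.90)

difference = weighted_sum_difference(a, b, WEIGHTS)
c_ab = round(concordance(a, b, WEIGHTS), 2)
c_ba = round(concordance(b, a, WEIGHTS), 2)
print(round(difference, 3), c_ab, c_ba)
```

In the weighted sum the tiny advantage of a on criterion 2 contributes only .004 and b is preferred; in terms of coalitions, the same tiny advantage brings the full weight .4 to a, whose coalition (.8) is stronger than that of b (.6).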
Since the weights in a weighted sum depend on the scaling of each criterion and
there is no acknowledged standard scaling, it makes no sense in principle to use
the weights initially provided by Thierry as coefficients measuring the importance
of the criteria in an outranking method. If we nevertheless try to use them, we
might consider the weights used with the normalised criteria of Table 6.4. We see
that the importance of the safety coalition (Criteria 4 and 5) would be negligible
(weight = .20), while the importance of the performance coalition (Criteria 2
and 3) would be overwhelming (weight = .60). There is another reasonable normalisation
of the criteria that does not fix the zero of the scale but rather maps
the smallest attained value $g_{i,\min}$ onto 0 and the largest $g_{i,\max}$ onto 1. Transforming
the weights accordingly (i.e. multiplying them by the inverse of the range of
the values for the corresponding criterion prior to the transformation) one would
obtain (.28, .14, .13, .20, .25) as a weight vector. With these values as coefficients of
importance, the safety coalition (Criteria 4 and 5; weight = .45) becomes more
important than the performance coalition (Criteria 2 and 3; weight = .27), which
Thierry may consider unfair. As an additional conclusion, one may note that the
values of the weights vary tremendously depending on the type of normalisation
Now look at the weights (.35, .24, .17, .12, .12) obtained through Saaty's questioning
procedure in terms of importance (see section 6.3.2). Using these weights
for measuring strength of coalitions does not seem appropriate, since the predominance
of criteria 1 and 2 is too strong (joint weight = .35 + .24 = .59).
Due to the all-or-nothing character of the weights in ELECTRE I, one is
inclined to choose less contrasted weights than those examined above. Although
there are procedures that have been proposed to elicit such weights (see Mousseau
(1993), Roy and Bouyssou (1993)), we will just choose a set of weights in an
intuitive manner; let us take weights proportional to (10, 8, 6, 6, 6) as reflecting the
relative importance of the criteria. At least the ordering of the values seems to be

Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 1 .5 .17 .33 .33 .56 .61 .33 .61 .33 .33 .33 .33 .61
2 .49 1 .44 .83 .33 .56 .44 .61 .61 .28 .28 .28 .83 .61
3 .83 .73 1 .73 .73 .73 .78 .78 .83 .56 .44 .56 1 .78
4 .66 .17 .28 1 .17 .56 .28 .44 .28 .44 .28 .28 .78 .44
5 .66 .66 .28 1 1 .56 .44 .44 .44 .66 .28 .28 1 .44
6 .44 .44 .28 .44 .44 1 .44 .44 .44 .44 .28 .28 .61 .44
7 .56 .56 .22 .73 .73 .73 1 .56 .83 .56 .39 .39 .73 .83
8 .66 .39 .22 .73 .73 .73 .61 1 .66 .39 0 .39 .73 .66
9 .39 .56 .17 .73 .73 .56 .17 .33 1 .39 .17 .39 .73 .61
10 .83 .73 .44 .56 .33 .56 .61 .61 .61 1 .61 .33 .83 .61
11 .83 .73 .56 .73 .73 .73 .78 1 .83 .56 1 .73 .73 1
12 .83 .73 .44 .73 .73 .73 .78 .78 .61 .83 .61 1 1 .78
13 .66 .39 0 .39 .17 .39 .28 .44 .28 .17 .28 0 1 .28
14 .39 .56 .22 .56 .56 .56 .17 .56 .56 .39 0 .22 .73 1

Table 6.14: Concordance index (rounded to two decimals) for the "Choosing a
car" problem

in agreement with what is known about Thierry's perceptions. Normalising the
weight vector yields (.27, .22, .17, .17, .17) after rounding in such a way that the
normalised weights sum up to 1.00. The weights of the three groups of criteria
are rather balanced; .27 for cost, .39 for performance and .34 for safety. The
concordance matrix c(a, b) computed with these weights is shown in Table 6.14.

Determining which coalitions are strong enough

At this stage we have to build the concordance relation, a binary relation obtained
through deciding which coalitions in Table 6.14 are strong enough; this is done by
selecting a concordance threshold above which we consider that they are. If we set
the concordance threshold at .60, we obtain a concordance relation with a cycle
passing through all alternatives but one, which is Car 3. This tells us something
about coalitions that we did not know. Previous analysis with equal weights (see
Section 6.4.1) showed that the relation in Table 6.12, obtained through looking at
concordant coalitions involving at least three criteria, had a cycle passing through
all alternatives; with the weights we have now chosen, the lightest coalition
of three criteria involves criteria 3, 4 and 5 and weighs .51; then, in increasing
order, we have three different coalitions weighing .56 (two of the criteria 3, 4, 5
with criterion 2), and three coalitions weighing .61 (two of the criteria 3, 4, 5 with
criterion 1); finally there are three coalitions weighing .66 (one of the three criteria
3, 4, 5 together with criteria 1 and 2). Cutting the concordance index at .60 thus
only keeps the 3-coalitions that contain criterion 1 with the coalitions involving at
least 4 criteria.
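The enumeration of the coalitions and of their weights can be reproduced with a few lines of Python. The sketch uses the rounded weights (.27, .22, .17, .17, .17), expressed in hundredths to keep the arithmetic exact; the function name is ours.

```python
from itertools import combinations

# Rounded normalised weights of criteria 1 to 5, in hundredths.
WEIGHTS = {1: 27, 2: 22, 3: 17, 4: 17, 5: 17}


def coalition_weights(size):
    """List of (weight, coalition) for all coalitions of the given size,
    sorted by increasing weight."""
    return sorted((sum(WEIGHTS[i] for i in coalition), coalition)
                  for coalition in combinations(WEIGHTS, size))


three = coalition_weights(3)
four = coalition_weights(4)

# Coalitions of three criteria that survive a cut of the concordance index at .60.
kept = [coalition for weight, coalition in three if weight >= 60]
print(three)
print(four)
print(kept)
```

The six coalitions of three criteria that pass the cut at .60 all contain criterion 1, while every coalition of four criteria weighs at least .73.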
The new thing that we can learn is the following: the relation obtained by
looking at coalitions of at least 4 criteria plus coalitions of three that involve
criterion 1 has a big cycle. When we cut above .62 there is no longer a cycle. The
lightest 4-coalition weighs .73 and there is only one value of the concordance
index between .61 and .73, namely .66. So cutting between .66 and .72 will yield
the relation in Table 6.13, which we have already looked at; a poorer relation
(i.e. with fewer arcs) is obtained when cutting above .73. In the sequel we will
concentrate on two values of the concordance threshold, .60 and .65, that are
on both sides of the borderline separating concordance relations with and without
cycles; above these values, concordance relations tend to become increasingly poor;
below, they are less and less discriminating.
In the above presentation the weights sum up to 1. Note that multiplying
all the weights by a positive number would yield the same concordance relations
provided the concordance threshold is multiplied by the same factor; the weights
in ELECTRE I may be considered as being assessed on a ratio scale, i.e. up to a
positive scaling factor.
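This invariance can be checked on all possible coalitions of criteria. In the sketch below, the non-normalised weights (10, 8, 6, 6, 6) are used together with a threshold of .65 multiplied by their sum, and compared with the exactly normalised weights used with the threshold .65; exact fractions avoid rounding effects and the function name is ours.

```python
from fractions import Fraction
from itertools import combinations

RAW_WEIGHTS = (10, 8, 6, 6, 6)
TOTAL = sum(RAW_WEIGHTS)
NORMALISED_WEIGHTS = tuple(Fraction(w, TOTAL) for w in RAW_WEIGHTS)
THRESHOLD = Fraction(65, 100)


def strong_coalitions(weights, threshold):
    """Set of coalitions (tuples of criterion indices) whose weight reaches the threshold."""
    n = len(weights)
    return {coalition
            for size in range(n + 1)
            for coalition in combinations(range(n), size)
            if sum(weights[i] for i in coalition) >= threshold}


with_normalised = strong_coalitions(NORMALISED_WEIGHTS, THRESHOLD)
with_raw = strong_coalitions(RAW_WEIGHTS, THRESHOLD * TOTAL)
same_relation = with_normalised == with_raw
print(same_relation, len(with_raw))
```

Since the same coalitions are judged strong enough in both cases, the concordance relations obtained are identical.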

Supporting choice or ranking

Before studying discordance and veto we show how a concordance relation, which
is just an outranking relation without veto, can be used for supporting a choice
or a ranking in a decision process. Introducing vetoes will just remove arcs from
the concordance relation but the operations performed on the outranking relation
during the exploitation phase are exactly those that are applied below to the
concordance relation.
In view of supporting a choice process, the exploitation procedure of ELECTRE
I first consists in reducing the cycles, which amounts to considering all alternatives
in a cycle as equivalent. The kernel of the resulting acyclic relation is then searched
for and it is suggested that the kernel contains all alternatives on which the at-
tention of the decision-maker should be focused. Obviously, reducing the cycles
involves some drawbacks. For example, cutting the concordance relation of Table
6.14 at .60 yields a concordance relation with cycles involving all alternatives but
Car 3; there is no simple cycle passing once through all alternatives except Car 3;
an example of a (non-simple) cycle is 1, 7, 9, 5, 10, 11, 12, 2, 14, 13, 1 plus, starting
from 12, the paths 12, 8, 4, 1 and 12, 6, 13, 1. Reducing the cycles of this concordance
relation results in considering two classes of equivalent alternatives; one class is
composed of the single Car 3 while the other class comprises all other alternatives.
Besides the fact that this partition is not very discriminating, it also considers as
equivalent alternatives that are not in the same simple cycle. Moreover, the infor-
mation on how the alternatives compare with respect to all others is completely
lost; for instance Car 12, which beats almost all other alternatives in the cut at
.60 of the concordance relation, would be considered as equivalent to Car 6 which
beats almost no other car.
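
Reducing the cycles amounts to grouping the alternatives that can be reached from one another, i.e. to computing the strongly connected components of the relation. The sketch below does this on a small hypothetical relation (not the .60 cut discussed above); the function name `reduce_cycles` is ours. The example is chosen so that a and c end up in the same class although no simple cycle passes through both.

```python
def reduce_cycles(nodes, arcs):
    """Partition nodes into classes of alternatives linked by cycles."""
    # reach[u]: nodes reachable from u along arcs (u itself included).
    reach = {u: {u} | {v for (s, v) in arcs if s == u} for u in nodes}
    changed = True
    while changed:  # naive transitive closure, adequate for small relations
        changed = False
        for u in nodes:
            wider = set().union(*(reach[v] for v in reach[u]))
            if not wider <= reach[u]:
                reach[u] |= wider
                changed = True
    classes = []
    for u in nodes:
        # u and v are merged when each can be reached from the other.
        cls = frozenset(v for v in nodes if v in reach[u] and u in reach[v])
        if cls not in classes:
            classes.append(cls)
    return classes

# Hypothetical relation: two simple cycles (a, b) and (b, c) sharing b, plus c -> d.
nodes = ["a", "b", "c", "d"]
arcs = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d")}
classes = reduce_cycles(nodes, arcs)
print([sorted(cls) for cls in classes])
```
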
For illustrative purposes, we consider the cut at level .65 of the concordance
index, which is the largest acyclic concordance relation that can be obtained; this
relation is shown in Table 6.15. Its kernel is composed of cars 3, 10 and 11. Cars 3
and 11 are not outranked and car 10 is the only alternative that is not outranked
either by car 3 or by car 11. This seems to be an interesting set in a choice process,
in view of the analysis of the problem carried out so far.
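
The kernel of an acyclic relation can be obtained by a simple peeling procedure: the alternatives that are not outranked enter the kernel, everything they outrank is discarded, and the operation is repeated on what remains. The sketch below applies this to the rows of Table 6.15; the function name `kernel` and the peeling formulation are ours, and the procedure is only valid for acyclic relations.

```python
# Rows of Table 6.15: entry (i, j) is 1 when car i outranks car j (cut at .65).
TABLE_6_15 = """
1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 1 0 0 0 0 0 0 0 0 1 0
1 1 1 1 1 1 1 1 1 0 0 0 1 1
1 0 0 1 0 0 0 0 0 0 0 0 1 0
1 1 0 1 1 0 0 0 0 1 0 0 1 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 1 0 0 0 1 1
1 0 0 1 1 1 0 1 1 0 0 0 1 1
0 0 0 1 1 0 0 0 1 0 0 0 1 0
1 1 0 0 0 0 0 0 0 1 0 0 1 0
1 1 0 1 1 1 1 1 1 0 1 1 1 1
1 1 0 1 1 1 1 1 0 1 0 1 1 1
1 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1
"""
matrix = [[int(x) for x in line.split()] for line in TABLE_6_15.strip().splitlines()]

def kernel(matrix):
    """Kernel of an acyclic outranking relation, as a sorted list of car numbers."""
    remaining = set(range(len(matrix)))
    chosen = set()
    while remaining:
        # Alternatives that no other remaining alternative outranks.
        sources = {j for j in remaining
                   if not any(matrix[i][j] for i in remaining if i != j)}
        if not sources:
            raise ValueError("the relation has a cycle; reduce the cycles first")
        chosen |= sources
        # Discard the sources together with everything they outrank.
        beaten = {j for j in remaining for i in sources if matrix[i][j]}
        remaining -= sources | beaten
    return sorted(i + 1 for i in chosen)

print(kernel(matrix))  # [3, 10, 11]
```
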
Rankings of the alternatives may also be obtained from Table 6.15 in a rather
simple manner. For instance, consider the alternatives either in decreasing order
of the number of alternatives they beat in the concordance relation or in increasing
order of the number of alternatives by which they are beaten in the concordance

Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 1 0 1 0 0 0 0 0 0 0 0 1 0
3 1 1 1 1 1 1 1 1 1 0 0 0 1 1
4 1 0 0 1 0 0 0 0 0 0 0 0 1 0
5 1 1 0 1 1 0 0 0 0 1 0 0 1 0
6 0 0 0 0 0 1 0 0 0 0 0 0 0 0
7 0 0 0 1 1 1 1 0 1 0 0 0 1 1
8 1 0 0 1 1 1 0 1 1 0 0 0 1 1
9 0 0 0 1 1 0 0 0 1 0 0 0 1 0
10 1 1 0 0 0 0 0 0 0 1 0 0 1 0
11 1 1 0 1 1 1 1 1 1 0 1 1 1 1
12 1 1 0 1 1 1 1 1 0 1 0 1 1 1
13 1 0 0 0 0 0 0 0 0 0 0 0 1 0
14 0 0 0 0 0 0 0 0 0 0 0 0 1 1

Table 6.15: Concordance relation for the Choosing a car problem with weights
.28, .22, .17, .17, .17 and concordance threshold .65

Class   1       2       3     4      5     6          7      8        9
A       11      3, 12   8     7      5     9, 10      2, 4   13, 14   1, 6
        (11)    (10)    (7)   (6)    (5)   (3)        (2)    (1)      (0)
B       3, 11   12      10    7, 8   9     2, 6, 14   5      1, 4     13
        (0)     (1)     (2)   (3)    (4)   (5)        (6)    (8)      (11)

Table 6.16: Rankings obtained from counting how many alternatives are beaten
(ranking A) or beat (ranking B) each alternative in the concordance relation
(threshold .65); the numbers between parentheses in the second row of ranking A
(resp. ranking B) are the numbers of beaten (resp. beating) alternatives for each
alternative of the same column in the first row

relation. This amounts to counting the 1s respectively in rows and columns of
Table 6.15 and ranking the alternatives accordingly (we do not count the 1s on
the diagonal since the coalition of criteria saying that an alternative is at least
as good as itself always encompasses all criteria); the corresponding rankings are
respectively labelled A and B in Table 6.16. We observe that the usual group
of good alternatives forms the top two classes of these rankings.
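
The counting behind rankings A and B is easy to reproduce. The sketch below reads the rows of Table 6.15, subtracts the diagonal, and groups the cars by score; the helper name `classes` is ours.

```python
# Rows of Table 6.15: entry (i, j) is 1 when car i outranks car j (cut at .65).
TABLE_6_15 = """
1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 1 0 0 0 0 0 0 0 0 1 0
1 1 1 1 1 1 1 1 1 0 0 0 1 1
1 0 0 1 0 0 0 0 0 0 0 0 1 0
1 1 0 1 1 0 0 0 0 1 0 0 1 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 1 0 0 0 1 1
1 0 0 1 1 1 0 1 1 0 0 0 1 1
0 0 0 1 1 0 0 0 1 0 0 0 1 0
1 1 0 0 0 0 0 0 0 1 0 0 1 0
1 1 0 1 1 1 1 1 1 0 1 1 1 1
1 1 0 1 1 1 1 1 0 1 0 1 1 1
1 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1
"""
matrix = [[int(x) for x in line.split()] for line in TABLE_6_15.strip().splitlines()]

# Number of cars beaten by each car (row sums) and beating each car (column
# sums); the 1 on the diagonal is not counted.
beats = [sum(row) - 1 for row in matrix]
beaten_by = [sum(col) - 1 for col in zip(*matrix)]

def classes(scores, reverse):
    """Group car numbers by score and order the groups by score."""
    groups = {}
    for car, score in enumerate(scores, start=1):
        groups.setdefault(score, []).append(car)
    return [(score, groups[score]) for score in sorted(groups, reverse=reverse)]

ranking_a = classes(beats, reverse=True)       # most cars beaten first
ranking_b = classes(beaten_by, reverse=False)  # least often beaten first
print(ranking_a)
print(ranking_b)
```
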
There are more sophisticated ways of obtaining rankings from outranking re-
lations. ELECTRE II, which we do not describe here, was designed for fulfilling
this goal. To some extent, it makes better use of the information contained in the
concordance index, since the ranking is based on two cuts, one linked with a weak
preference threshold, the other, with a strong preference threshold; for instance
in our case, one could consider that the .60 cut corresponds to weak preference
(or weak outranking) while the .65 cut corresponds to strong preference. In the
above method, the information contained in other cutting levels has been totally
ignored although the rankings obtained from them may not be identical. They
may even differ significantly as can be seen when deriving a ranking from the .60
cut by using the method we applied to the .65 cut.

Thresholding


To this point, both in the Condorcet-like method and the basic ELECTRE I
method (without veto), we treated the assessments of the alternatives as if they
were ordinal data, i.e. we could have obtained exactly the same results (kernel
or ranking) by working with the orders induced from the set of alternatives by
their evaluations on the various criteria. Does this mean that outranking methods
are purely ordinal? Not exactly! More sophisticated outranking methods exploit
information that is richer than purely ordinal but not as demanding as cardinal.
This is done through what we shall call thresholding. Thresholding amounts to
identifying intervals on the criteria scales, which represent the minimal difference
in evaluation above which a particular property holds. For instance, consider that
the assessment of b on criterion i, $g_i(b)$, is given and criterion i is to be maximised;
from which value $g_i(b) + t_i(g_i(b))$ onwards will an alternative a be said to be
preferred to b? Implicitly, we have considered previously that b was preferred to a
on criterion i as soon as $g_i(b) \ge g_i(a)$, i.e. we have considered that $t_i(g_i(b)) = 0$. In
view of imprecision in the assessments and since it is not clear for all criteria that
there is a marked preference when the difference $|g_i(a) - g_i(b)|$ is small, one may be
led to consider a non-null threshold to model preference. In our case, for instance,
it is not likely that Thierry would really mark a preference between cars 3 and 10
on the Cost criterion since their estimated costs are within 10 € (see Table 6.2).
Thresholding is all the more important since, as mentioned at the end of section
6.4.1, the size of the interval between the evaluations is not taken into account
when deciding that a is overall preferred to b. Hence one should be prudent when
deciding that a criterion is or is not an argument for saying that a is at least as
good as b; therefore, it is reasonable to determine a threshold function $t_i$ and say
that criterion i is such an argument as soon as $g_i(a) \ge g_i(b) + t_i(g_i(b))$; since we
examine reasons for saying that a is at least as good as b, not for saying that a is
(strictly) better than b, the function $t_i$ should be negatively valued.
Determining such a threshold function is not necessarily an easy task. One
could ask the decision-maker to tell, ideally for each evaluation $g_i(a)$ of each
alternative on each criterion, from which value onwards an evaluation should be
considered at least as good as $g_i(a)$. Things may become simpler if the threshold
may be considered constant or proportional to $g_i(a)$ (e.g. $t_i(g_i(a)) = .05\, g_i(a)$).
Note that constant thresholds could be used when a scale is linear in the sense
that equal differences throughout a scale have the same meaning and consequences
(see end of section 6.2.3); however this is not a necessary condition since some
differences, but not all, need to be equivalent throughout the scale. In any case,
Definition 6.12 of the concordance index is adapted in a straightforward manner
as follows and the method for building an outranking relation remains unchanged:

$$c(a, b) = \sum_{i \,:\, g_i(a) \ge g_i(b) + t_i(g_i(b))} p_i . \tag{6.14}$$
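
As an illustration of formula (6.14), here is a minimal sketch assuming that all criteria are to be maximised. The evaluations, the weights (expressed as integers, which is harmless since weights are defined up to a positive factor) and the constant threshold of -5 on the first criterion are hypothetical.

```python
def concordance(ga, gb, weights, thresholds):
    """Concordance index c(a, b) of (6.14), all criteria to be maximised.

    thresholds[i] is the function t_i, applied to g_i(b); it returns a value <= 0.
    """
    return sum(p for p, x, y, t in zip(weights, ga, gb, thresholds)
               if x >= y + t(y))

# Hypothetical evaluations of a and b on three criteria, and hypothetical weights.
ga = (100, 40, 7)
gb = (103, 35, 7)
weights = [50, 30, 20]

no_threshold = [lambda g: 0, lambda g: 0, lambda g: 0]
# Constant threshold of -5 on the first criterion only.
with_threshold = [lambda g: -5, lambda g: 0, lambda g: 0]

print(concordance(ga, gb, weights, no_threshold))    # 50
print(concordance(ga, gb, weights, with_threshold))  # 100
```

With the threshold, the small disadvantage of a on the first criterion (100 against 103) no longer prevents that criterion from joining the coalition in favour of a.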

Note that preference thresholds, that lead to indifference zones, are used in
a variant of the ELECTRE I method called ELECTRE IS (see Roy and Skalka
(1984) or Roy and Bouyssou (1993)).
Thresholding is a key tool in the original outranking methods; it allows one
to bypass the necessity of transforming the original evaluations to obtain linear
scales. There is another occasion for invoking thresholds, which is in the analysis
of discordance.

Discordance and vetoes

Remember that the principle of the outranking methods consists in examining the
validity of the proposition "a outranks b"; the concordance index measures the
arguments in favour of saying so, but there may be arguments strongly against that
assertion (discordant criteria). These discordant voices can be viewed as vetoes;
there is a veto against declaring that a outranks b if b is so much better than
a on some criterion that it becomes disputable or even meaningless to claim
that a might be better overall than b. Let us emphasise that the effect of a veto
is quite radical, just like in the voting context. If a veto threshold is passed on
a criterion when comparing two alternatives, then the alternative against which
there is a veto, say a, may not outrank the other one, say b; this may result in
incomparabilities in the outranking relation if in addition b does not outrank a,
either because the coalition of criteria stating that b is at least as good as a is not
strong enough or because there is also a veto of a against b on another criterion.
To be more precise, a veto threshold on criterion i is in general a function $v_i$
encoding a difference in evaluations so big that it would be out of the question to
say that a outranks b if

$$g_i(a) > g_i(b) + v_i(g_i(b)) \tag{6.15}$$

when criterion i is to be minimised, or

$$g_i(a) < g_i(b) - v_i(g_i(b)) \tag{6.16}$$

when criterion i is to be maximised. Of course it may be the case that the function
$v_i$ is a constant.
In our case, in view of Thierry's particular interest in sporty cars, the criterion
most likely to yield a veto is acceleration. Although there was no precise indication
on setting vetoes in Thierry's preliminary analysis (section 6.1.2), one might
speculate that on the acceleration criterion, pairs such as (28, 29.6), (28.3, 30),
(29, 30.4), (29, 30.7) (all evaluations expressed in seconds) and all intervals wider
than those listed, lead to a veto (against claiming that the alternative with the
higher evaluation could be preferred to the other one, since here, the criterion is
to be minimised). If this seems reasonable then we would not be far from
accepting a constant veto threshold of about 1.5 or 1.6 seconds. If we decide that
there is a veto with a constant threshold on the acceleration criterion for differences
exceeding 1.5 seconds, it means that a car that accelerates from 0 to 100
km/h in 29.6 seconds (as is the case of Peugeot 309 GTI) could not conceivably
outrank a car which does it in 28 (as Honda Civic does) whatever the evaluations
on the other criteria might be. Of course, setting the veto threshold to 1.5 implies
that a car needing 30.4 seconds (like Mazda 323) may not outrank a car that
accelerates in 28.9 (like Opel Astra or Renault 21) but might very well outrank
a car that accelerates in 29 (like Nissan Sunny) if the performances on the other
criteria are superior. Using 1.5 as a veto threshold thus implies that differences
of at least 1.5 from 28 to 29.6 or from 28.9 to 30.4 have the same consequences
in terms of preference. Setting the value of the veto threshold obviously involves
some degree of arbitrariness; why not set the threshold at 1.4 seconds, which would
imply that Mazda 323 may not outrank Nissan Sunny? In such cases, it must be
verified that small variations around the chosen value of a parameter (such as
a veto threshold) do not influence the conclusions in a dramatic manner; if small
variations do have a strong influence, detailed investigation is needed in order to
decide which setting of the parameter's value is most appropriate. A related facet
of using thresholds is that growing differences that are initially not significant,
brutally crystallise into significant ones as soon as a crisp threshold is passed; ob-
viously methods using thresholds may show discontinuities in their consequences
and that is why sensitivity analysis is even more crucial here than with more clas-
sical methods. However, the underlying logic is quite similar to that on which
statistical tests are based; here as well, conventional levels of significance (like the
famous 5% rejection intervals) are widely used to decide whether a hypothesis
must be rejected or not. We will allude in the next section to more gradual
methods that can be designed on the basis of concordance-discordance principles
similar to those outlined above.
In order not to be too long we do not develop the consequences of introducing
veto thresholds in our example. It suffices to say that the outranking relation, its
kernel and the derived rankings are not dramatically modified in the present case.
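
The veto test of (6.15) and (6.16) and its sensitivity to the threshold can be sketched as follows. The acceleration times are those quoted above; the alternative thresholds 1.3 and 1.7 are hypothetical values chosen for the illustration, and the function name `veto` is ours. Differences exactly equal to the threshold are deliberately not exercised, since the outcome then hinges on whether the inequality is taken as strict.

```python
def veto(ga, gb, v, minimise=True):
    """True if the criterion vetoes 'a outranks b', following (6.15) and (6.16).

    v is the veto threshold function, applied to g_i(b); a constant threshold
    is passed as a function returning that constant.
    """
    if minimise:
        return ga > gb + v(gb)
    return ga < gb - v(gb)

# Acceleration times in seconds (criterion to be minimised).
peugeot_309, honda_civic = 29.6, 28.0
mazda_323, nissan_sunny = 30.4, 29.0

# Small sensitivity analysis around the 1.5 second threshold.
for threshold in (1.3, 1.5, 1.7):
    constant = lambda g, t=threshold: t
    print(threshold,
          veto(peugeot_309, honda_civic, constant),  # against Peugeot outranking Honda
          veto(mazda_323, nissan_sunny, constant))   # against Mazda outranking Nissan
```

With the threshold at 1.5 the first pair is vetoed and the second is not; moving the threshold by two tenths of a second in either direction changes one of the two answers, which is precisely why such parameters call for a sensitivity analysis.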

6.4.4 Main features and problems of elementary outranking

The ideas behind the methods analysed above may be summarised as follows. For
each pair of alternatives (a, b) it is determined whether a outranks b by comparing
their evaluations $g_i(a)$ and $g_i(b)$ on each point of view i. The pairs of evaluations
are compared to intervals that can be viewed as typical of classes of ordered pairs of
evaluations on each criterion (for instance the classes "indifference", "preference"
and "veto"). On the basis of the list of classes to which it belongs for each criterion
(its profile), the pair (a, b) is declared to be or not to be in the outranking
relation. Note that
• a credibility index of outranking (for instance "weak" and "strong" outranking)
  may be defined; to each value of the index corresponds a set of profiles;
  if the profile of the pair (a, b) is one of those associated with a particular
  value of credibility of outranking, then the outranking of b by a is assigned
  this value of credibility index; there are of course rationality requirements
  for the sets of profiles associated with the various values of the credibility
  index; this credibility index is to be interpreted in logical terms; it models
  the degree to which it is true that there are enough arguments in favour of
  saying that a is better than b while there is no strong reason for refuting this
  statement (see the definition of outranking in Section 6.4.2);
• thresholds may be used to determine the classes of differences of preference
  on each criterion, provided differences $g_i(a) - g_i(b)$ equal to such thresholds
  have the same meaning independently of their location on the scale of
  criterion i (linearity property);
• the rules for determining whether a outranks b (possibly to some degree of
  a credibility index) generally involve weights that describe the relative
  importance of the criteria; these weights are typically used additively to measure
  the importance of coalitions of criteria independently of the evaluations of
  the alternatives.
The result of the construction, i.e. the outranking relation (possibly qualified
with a degree of a credibility index), is then exploited in view of a specific type of
decision problems (choice, ranking, . . . ). It is supposed to include all the relevant
and sure information about preference that could be extracted from the data and
the questions answered by the decision-maker.
Since outranking relations may lack transitivity and acyclicity, procedures are
needed to derive a ranking or a choice set from them. In the process of deriving
a complete ranking from the outranking relation, the property of independence
of irrelevant alternatives (see Chapter 2 where this property is evoked) is lost;
this property was satisfied in the construction of the outranking relation since
outranking is decided by looking in turn at the profiles of each pair of alternatives,
independently of the rest. Since this is a hypothesis of Arrow's theorem and it is
violated, the conclusion of the theorem is not necessarily valid and one may hope
that there is no criterion playing the role of dictator.
The various procedures that have been proposed for exploiting the outrank-
ing relation (for instance transforming it into a complete ranking) are not above
criticism; it is especially difficult to justify them rigorously since they operate on
an object that has been constructed, the outranking relation. Since the decision-
maker has no direct intuition of this object, one can hardly expect to get reliable
answers when questioning him about the properties of this relation. On the other
hand, a direct characterisation of the ranking produced by the exploitation of an
outranking relation seems out of reach.

The weights count entirely or not at all in the comparison of two alternatives; the
smaller or larger difference in evaluations between alternatives does not matter
once a certain threshold is passed. This fact, which was discussed in the second
paragraph of section 6.4.3, is sometimes called the non-compensation property
of outranking methods. A large difference in favour, say, of a over b on some
criterion is of no use to compensate for small differences in favour of b on many
criteria since all that counts for deciding that a outranks b is the list of criteria in
favour of a. Vetoes only have a negative action, preventing outranking from being
declared. The reader interested in the non-compensation property is referred to
Fishburn (1976), Bouyssou and Vansnick (1986), Bouyssou (1986).

Incomparability and indifference

For some pairs (a, b) it may be the case that neither a outranks b nor the opposite;
this can occur not only because of the activation of a veto but alternatively because
neither the outranking of a by b nor that of b by a is sufficiently credible.
In such a case a and b are said to be incomparable. This may be interpreted in
two different ways. One may advance that some alternatives are too contrasted to
be compared. It has been argued, for instance, that comparing a Rolls Royce with
a small and cheap car proves impossible because the Rolls Royce is incomparably
better on many criteria but is also incomparably more expensive. Another example
concerns the comparison of projects that involve the risk of loss of human life;
should one prefer a more expensive project with a lower risk or a less expensive one
with higher risk (see Chapter 5, Section 5.3.3, for evaluations of the cost of human
losses in various countries)? Other people support the idea that incomparability
results from insufficient information; the available information sometimes does not
allow one to make up one's mind on whether a is preferred to b or the converse.
In any case, incomparability should not be assimilated to indifference. Indiffer-
ence occurs when alternatives are considered as almost equivalent; incomparability
is more concerned with very contrasted alternatives. The treatment of the two cat-
egories is quite different in the exploitation phase; indifferent alternatives should
appear in the same class of a ranking or in neighbouring ones, while incomparable
alternatives may be ranked in classes quite far apart.

6.4.5 Advanced outranking methods: from thresholding towards valued relations

Looking at the variants of the ELECTRE method suggests that there is a general
pattern on which they are all built:
• alternatives are considered in pairs and eventually, outranking is determined
  on the basis of the profiles of performance of the pair only;
• the differences between the evaluations of a pair of alternatives for each
  criterion are categorised in discrete classes delimited by thresholds (preference,
  veto, . . . );
• rules are invoked to decide which combinations of these classes lead to
  outranking; more generally, there are several grades of outranking (weak, strong
  in ELECTRE II, . . . ) and rules associate specific combinations of classes to
  each grade;
• specialised procedures are used to exploit the various grades of outranking
  in view of supporting the decision process.
Defining the classes through thresholding raises the problem of discontinuity
alluded to in the previous section. It is thus appealing to work with continuous
classes of differences of preference for each criterion, i.e. directly with valued re-
lations. A value cj (a, b) on arc (a, b) models the degree to which alternative a is
preferred to alternative b on criterion j. These degrees are often interpreted in
logical fashion as a degree of credibility of the preference. Then each combination
of values of the credibility index on the various criteria may be assigned an overall
value of the credibility index for outranking; the outranking relation is also valued
in such a context.
Dealing with valued relations and especially combining values raises a question:
which operations may be meaningfully (or just reasonably) performed on
them? Our analysis of the weighted sum in section 6.2 has taught us that operations
that may appear natural rely on strong assumptions that suppose very
detailed information on the preferences.
Consider the following formula which is used in ELECTRE III, a method
leading to a valued outranking relation (see Roy and Bouyssou (1993) or Vincke
(1992)), to compute the overall degree of credibility S(a, b) of the outranking of b
by a.

$$
S(a, b) =
\begin{cases}
c(a, b) & \text{if } D_j(a, b) \le c(a, b) \ \ \forall j, \\[4pt]
c(a, b) \displaystyle\prod_{j \,:\, D_j(a, b) > c(a, b)} \frac{1 - D_j(a, b)}{1 - c(a, b)} & \text{otherwise.}
\end{cases}
$$

In the above formula, $D_j(a, b)$ is a degree of credibility of discordance. We do not
enter into the detail of how c(a, b) or $D_j(a, b)$ can be computed; just remember
that they are valued between 0 and 1.
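
The formula above translates directly into a few lines of code. The sketch below takes c(a, b) and the list of the $D_j(a, b)$ as given numbers in [0, 1] (the input values are hypothetical) and shows how the result reacts when discordance grows; the function name `credibility` is ours.

```python
from math import prod

def credibility(c, discordances):
    """Degree of credibility S(a, b) from c(a, b) and the list of D_j(a, b).

    Only the discordance degrees exceeding c contribute a factor; when there
    are none, the product is empty and S(a, b) = c(a, b). Since the D_j lie in
    [0, 1], no factor is computed when c = 1, so there is no division by zero.
    """
    return c * prod((1 - d) / (1 - c) for d in discordances if d > c)

# Hypothetical values of the indices.
print(credibility(0.6, [0.2, 0.5]))  # no D_j exceeds c: S equals c
print(credibility(0.6, [0.8]))       # one discordant criterion lowers S
print(credibility(0.6, [0.9]))       # stronger discordance lowers it further
print(credibility(0.6, [1.0]))       # maximal discordance: S is 0
```
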
The justification of such a formula is mainly heuristic in the sense that the response
of the formula to the variation of some inputs is not counter-intuitive: when
discordance rises, outranking decreases; the converse holds with concordance; when
discordance is maximal there may not be any degree of outranking at all. This does
not mean that the formula is fully justified. Other formulae might have been
chosen with similarly good heuristic behaviour. The weighted sum also has good
heuristic properties at first glance, but deeper investigation shows that the val-
ues it yields cannot be trusted as a valid representation of the preferences unless
additional information is requested from the decision-maker and used to re-code
the original evaluations gj . The formula above involves operations such as mul-
tiplication and division that suppose that concordance and discordance indices
are plainly cardinal numbers and not simply labels of ordered categories. This is
indeed a strong assumption that does not seem to us to be supported by the rest
of the approach, in particular by the manner in which the indices are elaborated;
in the elementary outranking methods (ELECTRE I and II) much care was taken,
for instance, to avoid performing arithmetical operations on the evaluations gi (a);
only cuts of the concordance index were considered (which is typically an operation
valid for ordinal data); vetoes were used in a very radical fashion. No special
attention, comparable to what was needed to build value functions from the evaluations,
was paid to building concordance and discordance indices; in particular,
nothing guarantees that these indices can be combined by means of arithmetic
operations and produce an overall index S representative of a degree of credibility
of an outranking. For instance, consider the following two cases which lead to an
outranking degree of .4:
• the concordance index c(a, b) is equal to .40 and there is no discordance (i.e.
  $D_j(a, b) = 0$ for all j);
• the concordant coalition weighs .80 but there is a strong discordance on
  criterion 1; $D_1(a, b) = .90$ while $D_j(a, b) = 0$ for all $j \neq 1$.
For both, the formula yields a degree of outranking of .40. Obviously another
formula with similar heuristic behaviour might have resulted in quite different
outputs. Consider for instance the following:

$$S(a, b) = \min\{c(a, b),\ \min\{1 - D_j(a, b),\ j = 1, \ldots, n\}\}$$

In the first case, it yields an outranking degree of .40 as well but in the second
case, the degree falls to .10. It is likely that in some circumstances a decision-maker
might find the latter model more appropriate. Note also that the latter formula
does not involve arithmetic operations on c(a, b) and the $1 - D_j(a, b)$'s but only
ordinal operations, namely taking the minimum. This means that transforming
c(a, b) and the $1 - D_j(a, b)$'s by an increasing transformation of the [0, 1] interval
would just amount to transforming the original value of S(a, b) by the same
transformation. This is not the case with the former formula. Hence, if the information
content of the c(a, b) and the $1 - D_j(a, b)$'s just consists in the ordering of their
values in the [0, 1] interval, then the former formula is not suitable. For a survey
of possible ways of aggregating preferences into a valued relation, the reader is
referred to chapters 2 and 3 of the book edited by Slowinski (1998).
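
The comparison of the two formulae, and their different behaviour under an increasing transformation, can be verified on the two cases discussed above. In the sketch below both formulae are written in terms of c(a, b) and of the values $u_j = 1 - D_j(a, b)$; three criteria are assumed, and the transformation $\varphi(x) = x^2$ is an arbitrary choice of increasing transformation of [0, 1] made for the illustration.

```python
from math import isclose, prod

def s_product(c, u):
    """ELECTRE III formula, written with u_j = 1 - D_j (D_j > c iff u_j < 1 - c)."""
    return c * prod(x / (1 - c) for x in u if x < 1 - c)

def s_min(c, u):
    """Purely ordinal alternative: minimum of c and of the u_j."""
    return min([c] + list(u))

# The two cases of the text, with three criteria (an assumption).
case_1 = (0.40, [1.0, 1.0, 1.0])   # no discordance
case_2 = (0.80, [0.10, 1.0, 1.0])  # D_1 = .90 on criterion 1

for c, u in (case_1, case_2):
    print(round(s_product(c, u), 2), round(s_min(c, u), 2))

def phi(x):
    """An arbitrary increasing transformation of [0, 1]."""
    return x ** 2

c, u = case_2
# The minimum commutes with phi ...
print(s_min(phi(c), [phi(x) for x in u]) == phi(s_min(c, u)))
# ... whereas the product formula does not.
print(isclose(s_product(phi(c), [phi(x) for x in u]), phi(s_product(c, u))))
```
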
The fact that the value obtained for the outranking degree may involve some de-
gree of arbitrariness did not escape Roy and Bouyssou (1993) who explain (p.417)
that the value of the degree of outranking obtained by a formula like the above
should be handled with care; they advocate that thresholds be used when com-
paring two such values: the outranking of b by a can be considered to be more
credible than the outranking of d by c only if S(a, b) is significantly larger than
S(c, d). We agree with this statement but unfortunately it seems quite difficult to
assign a value to a threshold above which the difference $S(a, b) - S(c, d)$ could be
claimed as significant.
There are thus two directions that can be followed for taking the objections to
the formula of ELECTRE III into account. In the first option, one considers that
the meaning of the concordance and discordance degrees is ordinal and one tries to
determine a family of aggregation formulae that fulfil basic requirements including
compatibility with the ordinal character of concordance and discordance. The
other option consists in revising the way concordance and discordance indices are
constructed in order to give them a quantitative meaning that allows arithmetic
operations to be used for aggregating them. That is, at least tentatively, the option
followed in the PROMETHEE methods (see Brans and Vincke (1985) or Vincke (1992));
these methods may be interpreted as aiming towards building a value function
on the pairs of alternatives; this function would represent the overall difference in
preference between any two alternatives. The way that this function is constructed
in practice however, leaves the door open to remarks analogous to those addressed
to the weighted sum in Section 6.2.

6.5 General conclusion

This long chapter has enabled us to travel through the continent of formal methods
of decision analysis; by "formal" we mean those methods relying on an explicit
mathematical model of the decision-maker's preferences. We neither looked into
all methods nor did we explore those we looked into completely. There are other
continents that have been almost completely ignored, in particular all the methods
that do not rely on a formal modelling of the preferences (see for instance the
book edited by Rosenhead (1989) in which various approaches are presented for
structuring problems in view of facilitating decision making).
On the particular topic of multi-attribute decision analysis, we may summarise
our main conclusions as follows:
• Numbers do not always mean what they seem to. It makes no sense to manipulate
  raw evaluations without taking the context into account. Numbers
  may have an ordinal meaning, in which case it cannot be recommended to
  perform arithmetic operations on them; they may be evaluations on an interval
  scale or a ratio scale and there are appropriate transformations that are
  allowed for each type of scale. We have also suggested that the significance
  of a number may be intermediate between ordinal and cardinal; in that case,
  the interval separating two evaluations might be given an interpretation: one
  might take into consideration the fact that intervals are e.g. large, medium
  or small. Evaluations may also be imprecise and knowing that should influence
  the way they will be handled. Preference modelling is specifically the
  activity that deals with the meaning of the data in a decision context.
• Preference modelling does not only take objective information linked with
  the evaluations or with the data, such as the type of scale or the degree
  of precision or the degree of certainty into account. It also incorporates
  subjective information in relation to the preferences of the decision maker.
  Even if numeric evaluations actually mean what they seem to, their significance
  is not immediately in terms of preferences: the interval separating two
  evaluations must be reinterpreted in terms of difference in preferences.
• The (vague) notion of importance of the criteria and its implementation are
  strongly model-dependent. Weights and trade-offs should not be elicited in
  the same manner depending on the type of model since e.g. they may or may
  not depend on the scaling of the criteria.
• There are various types of models that can be used in a decision process.
  There is no best model; all have their strong points and their weak points.
  The choice of a particular approach (including a type of model) should be the
  result of an evaluation, in a given decision situation, of the chances of being
  able to elicit the parameters of the corresponding model in a reliable manner;
  these chances obviously depend on several factors including the type and
  precision of the available data, the way of thinking of the decision-maker,
  his knowledge of the problem. Another factor that should be considered for
  choosing a model, is the type of information that is wanted as output: the
  decision maker needs different information when he has to rank alternatives
  to when he has to choose among alternatives or when he has to assign them
  to predefined (ordered) categories (we put the latter problem aside in our
  discussion of the car choosing case). So, in our view, the ideal decision
  analyst should master several methodologies for building a model. Notice
  that additional dimensions make the choice and the construction of a model
  in group decision making even more difficult; the dynamics of such decision
  processes is far more complex, involving conflicts and negotiation aspects;
  constructing complete formal models in such contexts is not always possible,
  but it remains that using problem structuring tools (such as cognitive maps)
  may prove profitable.
• A direct consequence of the possibility of using different models is that the
  output may be discordant or even contradictory. We have encountered such
  a situation several times in the above study; cars may be ranked in different
  positions according to the method that is used. This does not puzzle us too
  much. First of all, because the observed differences appear more as variants
  than as contradictions; the various outputs are remarkably consistent
  and the variants can be explained to some extent. Second, the approaches
  use different concepts and the questions the decision maker has to answer
  are accordingly expressed in different languages; this of course induces
  variability. This is no wonder since the information that decision analysis aims
  at capturing cannot usually be precisely measured. It is sufficient to recall
  that experiments have shown that there is much variability in the answers
  of subjects submitted to the same questions at time intervals. Does this
  mean that all methods are acceptable? Not at all. There are several criteria
  of validity. One is that the method has to be accepted in a particular
  decision situation; this means that the questions asked to the decision-maker
  must make sense to him and he should not be asked for information he is
  unable to provide in a reliable manner. There are also internal and external
  consistency criteria that a method should fulfil. Internal consistency implies
  making explicit the hypotheses under which data form an acceptable input
  for a method; then the method should perform operations on the input that
  are compatible with the supposed properties of the input; this in turn induces
  an output which enjoys particular properties. External consistency consists
  in checking whether the available information matches the requirements of
  acceptable inputs and whether the output may help in the decision process.
  The main goal of the above study was to illustrate the issue of internal and
  external validity on a few methods in a specific simple problem.

Besides the above points that are specific to multiple criteria preference models,
more general lessons can also be drawn.
If we consider our trip from the weighted sum to the additive multi-attribute
value model in retrospect, we see that much self-confidence and therefrom
much convincing power can be gained by eliciting conditions under which
an approach such as the weighted sum would be legitimate. The analysis
is worth the effort because precise concepts (like trade-offs and values) are
sculptured through analysis that also results in methods for eliciting the
parameters of the model. Another advantage of theory is to provide us
with limits, i.e. conditions under which a model is valid and a method is
applicable. From this viewpoint and although the outranking methods have
not been fully characterised, it is worth noticing that their study has recently
made theoretical progress (see e.g. Arrow and Raynaud (1986), Bouyssou
and Perny (1992), Vincke (1992), Fodor and Roubens (1994), Tsoukias and
Vincke (1995), Bouyssou (1996), Marchant (1996), Bouyssou and Pirlot
(1997), Pirlot (1997)).
An advantage of formal models that cannot be overemphasised is that
they favour communication. In the course of the decision process, the con-
struction of the model requires that pieces of information, knowledge and
priorities that are usually implicit or hidden, be brought into light and taken
into account; also, the choice of the model reflects the type of available in-
formation (more or less certain, precise, quantitative). The result is often
a synthesis of what is known and what has been learnt about the decision
problem in the process of elaborating the model. The fact that a model is
formal also allows for some sort of calculation; in particular, formal models
make it possible to test to what extent the conclusions remain stable when the
evaluations of imprecise data are varied. Once a decision has been made, the
model does not lose its utility. It can provide grounds for arguing in favour
or against a decision. It can be adapted to make ulterior decisions in similar
The decisiveness of the output depends on the richness of the infor-
mation available. If the knowledge is uncertain, imprecise or simply non-
quantitative in nature, it may be difficult to build a very strong model;
by strong, we mean a model that clearly suggests a decision as, for in-
stance, those that produce a ranking of the alternatives. Other models
(and especially those based on pairwise comparisons of alternatives and
satisfying the independence of irrelevant alternatives property) are
structurally unable to produce a ranking; they may nevertheless be the best
possible synthesis of the relevant information in particular decision situations. In
any case, even if the model leads to a ranking, the decision is to be taken by
the decision-maker and it is not in general an automatic consequence of the
model (due for instance to imprecisions in the data that call for a
relativisation of the model's prescription). As will be illustrated in greater detail in
Chapter 9, the construction of a model is not all of the decision process.

7.1 Introduction
The increasing development of automatic systems in most sectors of human ac-
tivities (e.g. manufacturing, management, medicine, etc.) has progressively led
to involving computers in many tasks traditionally reserved to humans, even the
more strategic ones such as control, evaluation and decision-making. The main
function of automatic decision systems is to act as a substitute for humans (deci-
sion makers, experts) in the execution of repetitive decision tasks. Such systems
can be in charge of all or part of the decision process. The main tasks to be per-
formed by automatic decision systems are collecting information (e.g. by sensors),
making a diagnosis of the current situation, selecting relevant actions, executing
and controlling these actions. Automatisation of these tasks requires the elabo-
ration of computational models able to simulate human reasoning. Such models
are, in many respects, comparable to those involved in the scientific preparation
of human decisions. Indeed, deciding automatically is also a matter of representa-
tion, evaluation and comparison. For this reason, we introduce and discuss some
very simple techniques used to design rule-based decision/control systems. This is
one more opportunity for us to address some important issues linked to descrip-
tive, normative and constructive aspects of mathematical modelling for decision

descriptive aspects: the function of automatic decision systems is, to some
extent, to be able to predict, simulate and extrapolate human reasoning and
decision-making in an autonomous way. This requires different tasks such
as the collection of human expertise, the representation of knowledge, the
extraction of rules and the modelling of preferences. For all these activities,
the choice of appropriate formal models, symbolical as well as numerical, is
crucial in order to describe situations and process information.

constructive aspects: in most fields of application, there is no completely
fixed and well formalised body of knowledge that could be exploited by the
analyst responsible for the implementation of a decision system. Valuable
information can be obtained from human experts, but this expertise is often
very complex and ill-structured, with a lot of exceptions. Hence, the
formal model handling the core of human skill in decision-making must be
constructed by the analyst, in close cooperation with experts. They must
decide together what type of input should be used, what type of output is
needed, and what type of consideration should play a role in linking output
to input. One must also decide how to link subjective symbolic information
(close to the language of the expert) and objective numeric data that can be
accessible to the system.

normative aspects: it is generally not possible to ask the expert to produce
an exhaustive list of situations with their adequate solution. Usually, this
type of information is given only for a sample of typical situations, which
implies that only a partial model can be constructed. To be fully efficient,
this model must be completed with some general principles and rules used
by the expert. In order to extrapolate examples as well as expert decision
rules in a reasonable way, there is a need for normative principles putting
constraints on inference so as to decide what can seriously be inferred by the
system from any new input. Hence, the analysis of the formal properties of
our model is crucial for the validation of the system.

These three points show how the use of formal models and the analysis of the
mathematical properties of the models are crucial in automatic decision-making.
In this respect, the modelling exercise discussed here is comparable to those treated
in the previous chapters, concerning human decision-making, but includes spe-
cial features due to the automatisation (stable pre-existing knowledge and pref-
erences, real-time decision-making, closed system completely autonomous, etc.).
We present a critical introduction to the use of simple formal tools such as fuzzy
sets and rule-based systems to model human knowledge and decision rules. We also
make explicit multiple criteria aggregation problems arising in the implementation
of these rules and discuss some important issues linked to rule aggregation.
For the sake of illustration, we consider two types of automatic decision systems
in this chapter:

decision systems based on explicit decision rules: such systems are used in
practical situations where the decision-maker or the expert is able to make
explicit the principles and rules he uses to make a decision. It is also assumed
that these rules constitute a consistent body of knowledge, sufficiently ex-
haustive to reproduce, predict and explain human decisions. Such systems
are illustrated in section 7.2 where the control of an automatic watering sys-
tem is discussed, and in section 7.4 where a decision problem in the context
of the automatic control of a food process is briefly presented. In the first
case, the decision problem concerns the choice of an appropriate duration for
watering, whereas in the second case, it concerns the determination of oven
settings aimed at preserving the quality of biscuits.

decision systems based on implicit decision rules: such systems are used in
practical applications for which it is not possible to obtain explicit decision
rules. This is very frequent in practice. The main possible reasons for it are
the following:

the decision-maker or the expert is unable to provide sufficiently clear
information to construct decision rules, or his expertise is too complex
to be simply representable by a consistent set of decision rules,
the decision-maker or the expert is able to provide a set of decision rules,
but these decision rules are not easily expressible using variables that
can be observed by the system. A typical example of such a situation
occurs in the domain of subjective evaluation (see Grabisch, Guely and
Perny 1997) where the quality of a product is defined on the basis of
human perception.
the decision-maker or the expert does not want to reveal his own strat-
egy for making decisions. This can be due to the existence of strategic
or confidential information that cannot be revealed or alternatively be-
cause this expertise represents his only competence making him indis-
pensable to his organisation.

Such systems are illustrated in section 7.3, also in the context of the auto-
matic control of food processes. We will use the problem of controlling the
biscuit quality during baking as an illustrative case where numerical deci-
sion models based on pattern matching procedures can be used to perform
a diagnosis of dysfunction and a regulation of the oven, without any explicit

7.2 A System with Explicit Decision Rules

Automatising human decision-making is often a difficult task because of the com-
plexity of the information involved in human reasoning. In some cases, however,
the decision making process is repetitive and well-known so that automatisation
becomes feasible. In this section, we would like to consider an interesting sub-
class of easy problems where human decisions can be explained by a small set
of decision rules of the type:

if X is A and Y is B then Z is C

where the X and Y variables are used to describe the current decision context
(input variables) and Z is a variable representing the decision (output variable).
Whenever X and Y can be automatically observed by the decision system (e.g.
using sensors), human skill and experience in problem solving can be approximated
and simulated using the fuzzy control approach (see e.g. Nguyen and Sugeno
1998). Such an approach is based on the use of fuzzy sets and multiple criteria
aggregation functions. Our purpose is to emphasise the interest as well as the
difficulty of resorting to such formal notions on real practical examples.

7.2.1 Designing a decision system for automatic watering

Let us consider the following case: the owner of a nice estate has the responsibility
of watering the family garden, and this task must be performed several times
per week. Every evening, the man usually estimates the air temperature and the
ground moisture so as to decide the appropriate time required for watering his
garden. This amount of time is determined so as to satisfy a twofold objective:
on the one hand he wants to preserve the nice aspect of his garden (especially the
dahlias put in by his wife at the beginning of the summer) but on the other hand,
he does not want to use too much water for this, preferring to allocate his financial
resources to more essential activities. Because this small decision problem is very
repetitive and also because the occasional gardener does not want to delegate the
responsibility of the garden to somebody else, he decided to purchase an automatic
watering system. The function of this system is first to check every evening,
whether watering is necessary or not, and second to determine automatically the
watering time required. The implicit aim of the occasional gardener is to obtain
a system that implements the same rules as he does; in his mind, this is the best
way to really preserve the current beautiful aspect of the garden.
In this case, we need a system able to periodically measure the air temperature
and the soil moisture and a decision module able to determine the appropriate
duration of watering, as shown in Figure 7.1.

Figure 7.1: The Decision Module of the Watering System

7.2.2 Linking symbolic and numerical representations

Let t denote the current temperature of the air (in degrees Celsius), and m the
moisture of the ground defined as the water content of the soil. This second
quantity, expressed in centigrams per gram (cg/g), corresponds to the ratio:

    $m = 100 \times \frac{x_1 - x_2}{x_2}$

where x1 is the weight of a soil sample and x2 the weight of the same sample
after drying in a low-temperature oven (75-105 °C). Assuming the quantities t
and m can be observed automatically, they will constitute the input data of the
decision module in charge of the computation of the watering time w (expressed
in minutes), which is the sole output of the module.
Clearly, w must be defined as a function of the input parameters. Thus, we are
looking for a function f such that w = f (t, m) that can simulate the usual decisions
of the gardener. Function f must be defined so as to include the subjectivity of
the gardener both in diagnosis steps (evaluation of the current situation) and in
decision-making steps (choice of an appropriate action). A common way to achieve
this task is to elicit decision rules from the gardener using a very simple language,
as close as possible to the natural language used by the gardener to explain his
decision. For instance, we can use propositional logic and define rules of the
following form:

If T is A and M is B then W is C
where T and M are descriptive variables used for temperature and soil moisture, W
is an output variable used to represent the decision and A, B, C are linguistic values
(labels) used to describe temperature, moisture and watering time respectively.
For example, suppose the gardener is able to formulate the following empirical
decision rules:

Decision rules provided by the gardener:

R1 if air temperature is Hot and soil moisture is Low
then watering time is VeryLong;
R2 if air temperature is Warm and soil moisture is Low
then watering time is Long;
R3 if air temperature is Cool and soil moisture is Low
then watering time is Long;
R4 if air temperature is Hot and soil moisture is Medium
then watering time is Long;
R5 if air temperature is Warm and soil moisture is Medium
then watering time is Medium;
R6 if air temperature is Cool and soil moisture is Medium
then watering time is Medium;
R7 if air temperature is Hot and soil moisture is High
then watering time is Medium;
R8 if air temperature is Warm and soil moisture is High
then watering time is Short;
R9 if air temperature is Cool and soil moisture is High
then watering time is VeryShort;
R10 if air temperature is Cold then watering time is Zero.

Notice that the elicitation of such rules is usually not straightforward, even if it
is the result of a close collaboration with experts in that domain. Indeed, general
rules used by experts may appear to be partially inconsistent and must often
include explicit exceptions to be fully operational. Even without any inconsistency,
the individual acceptance of each rule is not sufficient to validate the whole set
of rules. In some situations, unsuitable conclusions may appear, resulting from
several inferences due to the coexistence of apparently reasonable rules. This
makes the validation of a set of rules particularly difficult. Even in the case of
control rules where there is no need for chaining inferences (we assume here that
the rules directly link inputs (observations) to outputs (decisions)), structuring
the expert knowledge so as to obtain a synthesis of the expert rules in the form
of a decision table (table linking outputs to inputs) requires a significant effort.
We will show alternative approaches that do not require the explicit formulation
of decision rules in Section 7.3.
Now, assuming that the above set of decision rules has been obtained, the
problem is the following: suppose the current air temperature and soil moisture
are known, how can a watering time be computed from these sentences, in other
words how can f be defined so as to properly reflect the strategy underlying these
rules? Some partial answers could be obtained if we could define a formal relation
linking the various labels occurring in the decision rules and the physical quantities
observable by the system. We can observe that the decision rules are expressed
using only three variables, i.e. the air temperature T , the soil moisture M , and
the watering time W . Moreover, they all take the following form:
either if T is Ti then W is Wk
or if T is Ti and M is Mj then W is Wk
The possible labels Ti , Mj and Wk for temperature, moisture and watering
time are given by the sets Tlabels, Mlabels and Wlabels respectively:
Tlabels = {Cold, Cool, Warm, Hot}. These labels can be seen as different
words used to specify different areas on the temperature scale.
Mlabels = {Low, Medium, High}. These labels can be seen as words used to
specify different areas on the moisture scale.
Wlabels = {Zero, VeryShort, Short, Medium, Long, VeryLong}. These labels
can be seen as different words used to specify different areas on the time
scale.
Using these labels, the rules can be synthesised by the following decision table
(see Table 7.1):

Mj \ Ti    Cold          Cool              Warm           Hot

Low        Zero (R10)    Long (R3)         Long (R2)      VeryLong (R1)
Medium     Zero (R10)    Medium (R6)       Medium (R5)    Long (R4)
High       Zero (R10)    VeryShort (R9)    Short (R8)     Medium (R7)

Table 7.1: The decision table of the gardener
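
As an illustration only, the symbolic function F of Table 7.1 can be written down as a lookup table. The sketch below is a minimal one in Python; the names `F`, `M_LABELS` and `decide` are ours and do not come from the gardener's rules. Rule R10, which has no moisture premise, is expanded over the three moisture labels, exactly as in the first column of the table.

```python
# Symbolic decision table F of Table 7.1:
# (temperature label, moisture label) -> watering-time label.
M_LABELS = ("Low", "Medium", "High")

F = {
    ("Hot", "Low"): "VeryLong",     # R1
    ("Warm", "Low"): "Long",        # R2
    ("Cool", "Low"): "Long",        # R3
    ("Hot", "Medium"): "Long",      # R4
    ("Warm", "Medium"): "Medium",   # R5
    ("Cool", "Medium"): "Medium",   # R6
    ("Hot", "High"): "Medium",      # R7
    ("Warm", "High"): "Short",      # R8
    ("Cool", "High"): "VeryShort",  # R9
}

# R10 has no moisture premise: Cold implies Zero for every moisture label.
for m_label in M_LABELS:
    F[("Cold", m_label)] = "Zero"


def decide(t_label, m_label):
    """Return the watering-time label Wk = F(Ti, Mj)."""
    return F[(t_label, m_label)]


print(decide("Hot", "Medium"))  # rule R4
```

Such a table only answers questions asked in the language of labels; the numerical translation of inputs and outputs is a separate matter.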

This decision table represents a symbolic function F linking Tlabels and Mla-
bels to Wlabels (Wk = F (Ti , Mj )). Now, we need to produce a numerical trans-
lation of function F in order to construct a numerical function f called transfer
function, whose role is to compute a watering time w from any input (t, m). To
build such a function, the standard process consists in the following stages:

1. identify the current state (diagnosis) and provide a symbolic description of
this state,

2. activate the relevant decision rules for the current state (inference),

3. synthesise the recommendations induced from the rules and derive a numer-
ical output (decision).

The diagnosis stage consists in identifying the current state of the system using
numerical measures and describing this state in the language used by the expert
to express his decision rules. The inference stage consists of an activation of the
rules whose premises match the description of the current state. The decision
stage consists of a synthesis of the various conclusions derived from the rules and
the selection of the most appropriate action (at this stage, the selected action is
precisely defined by numerical output values). Thus, the definition of the decision
function f relies on a symbolic translation of the initial numerical information in
the diagnosis stage, a purely symbolic inference implementing the usual decision-
making reasoning and then a numerical translation of the conclusions derived
from the rules. The symbolic/numerical translation possibly includes the subjec-
tivity of the decision-maker (perceptions, beliefs, etc), both in the diagnosis and
decision stages. For example, in the gardener example, the subjectivity of the
decision-maker is not only expressed in choosing particular decision rules, but also
in linking input labels (Tlabels and Mlabels) to observable values chosen on the
basis of the temperature and moisture scales. In the decision step, the expert's or
decision-maker's subjectivity can also be expressed by linking output labels
(Wlabels) with elements of the time scale. There are several ways of establishing the
symbolic/numeric translation first in the diagnosis stage and then in the decision
stage. In both stages, symbols can be linked to scalars, intervals or fuzzy sets,
depending on the level of sophistication of the model. In the following subsections,
we present the main basic possibilities and discuss the associated representation
and aggregation problems.

7.2.3 Interpreting input labels as scalars

A first and simple way of building the symbolic/numerical correspondence is by
asking the decision-maker to associate a typical scalar value to each input label
used in the rules. Note that the simplicity of the task is only apparent. An
individual, expert or not, may feel uncomfortable in specifying the scalar transla-
tion precisely. This is particularly true concerning parameters like soil moisture
which are not easily perceived by humans and whose qualification requires an im-
portant cognitive effort. Even for apparently simpler notions such as temperature
and duration, the expert may be reluctant to make a categorical symbolic/scalar
translation. If nevertheless he is constrained to produce scalars, he will have to
sacrifice a large part of his expertise and the resulting model may lose much of its
relevance to the real situation. We will see later how the difficulty can partly be
overcome by the use of non-scalar translations of labels. Let us assume now, for
the sake of illustration, that the following numerical information has been provided
by the expert (see Tables 7.2, 7.3 and 7.4).
A possible way of constructing such tables is to put the expert in various
situations, to ask him to qualify each situation with one of the admissible labels, and
to measure the observable parameters with gauges so as to make the
correspondence. Of course, the reliability of the information elicited with such a process is
questionable. The analyst must be aware of the share of arbitrariness attached to
such a symbolic/numerical translation. He must keep it in mind during the whole
construction of the system and also later in interpreting the outputs of the system.

Tlabels               Cold    Cool    Warm    Hot

Temperatures (°C)     10      20      25      30

Table 7.2: Typical temperatures associated to labels Ti

Mlabels                     Low    Medium    High

Soil water content (cg/g)   10     20        30

Table 7.3: Typical moisture levels associated to labels Mj

Wlabels        VeryShort    Short    Medium    Long    VeryLong

Times (min)    5            10       20        35      60

Table 7.4: Typical times associated to Wk

From the above tables of scalars, the rules allow the following reference points
to be constructed:

t 30 25 20 30 25 20 30 25 20 10 10 10
m 10 10 10 20 20 20 30 30 30 10 20 30
w 60 35 35 35 20 20 20 10 5 0 0 0

Table 7.5: Typical reference points
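
The construction of Table 7.5 is mechanical once the scalar translations are fixed, as the following Python sketch suggests. The dictionary and variable names are ours; the value 0 for the label Zero does not appear in Table 7.4 and is taken here from the last three columns of Table 7.5.

```python
# Scalar translations of the labels (Tables 7.2, 7.3 and 7.4).
TYPICAL_T = {"Cold": 10, "Cool": 20, "Warm": 25, "Hot": 30}   # degrees Celsius
TYPICAL_M = {"Low": 10, "Medium": 20, "High": 30}             # cg/g
TYPICAL_W = {"VeryShort": 5, "Short": 10, "Medium": 20,
             "Long": 35, "VeryLong": 60}                      # minutes
TYPICAL_W["Zero"] = 0  # assumption: not in Table 7.4, read off Table 7.5

# Rules R1..R9 as (temperature, moisture, watering) labels; R10 (Cold -> Zero)
# is expanded over the three moisture labels.
RULES = [
    ("Hot", "Low", "VeryLong"), ("Warm", "Low", "Long"), ("Cool", "Low", "Long"),
    ("Hot", "Medium", "Long"), ("Warm", "Medium", "Medium"), ("Cool", "Medium", "Medium"),
    ("Hot", "High", "Medium"), ("Warm", "High", "Short"), ("Cool", "High", "VeryShort"),
    ("Cold", "Low", "Zero"), ("Cold", "Medium", "Zero"), ("Cold", "High", "Zero"),
]

# Each rule yields one reference point (t, m, w) of the transfer function f.
reference_points = [
    (TYPICAL_T[t], TYPICAL_M[m], TYPICAL_W[w]) for t, m, w in RULES
]

for point in reference_points:
    print(point)
```

The twelve triples are listed in the same order as the columns of Table 7.5.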

Hence, the transfer function f linking watering time w to the pair (t, m) is
known for a finite list of cases and must be extrapolated to the entire range of
possible inputs (t, m). This leads to a well-known mathematical problem since
function f must be defined so as to interpolate points of type (t, m, w) where
w = f (t, m). Of course, the solution is not unique and some additional assumptions
are necessary to define precisely the surface we are looking for. There is no space
in this chapter to discuss the relative interest of the various possible interpolation
methods that could be used to obtain f . The simplest method is to perform a linear
interpolation from the reference points given in Table 7.5. This implies averaging
the outputs associated to the reference points located in the neighbourhood of the
observed parameters (t, m). For instance, if the observation is (t, m) = (29, 16), the
neighbourhood is given by 4 reference points obtained from rules R1, R2, R4 and
R5. This yields points P1 = (30, 10), P2 = (25, 10), P4 = (30, 20) and P5 = (25, 20)
with the respective weights 0.32, 0.08, 0.48 and 0.12, the weight $\omega_{ij}$ of point
(xi, yj) being defined by:

(7.1)    $\omega_{ij} = \left(1 - \frac{|29 - x_i|}{30 - 25}\right)\left(1 - \frac{|16 - y_j|}{20 - 10}\right)$

The watering times associated to points P1, P2, P4 and P5 are 60, 35, 35 and 20
minutes; the final time obtained by a weighted linear aggregation is therefore
0.32 × 60 + 0.08 × 35 + 0.48 × 35 + 0.12 × 20 = 41.2 minutes, i.e. 41 minutes
and 12 seconds. Applying the same approach to any possible input (t, m) leads
to the following piecewise linear approximation of function f, see Figure 7.2.

Figure 7.2: Approximation of f by linear interpolation
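
The computation can be checked with a short Python sketch of the piecewise linear interpolation, under the assumption that the reference points of Table 7.5 are arranged on a rectangular grid. The names `T_GRID`, `M_GRID`, `W` and `interpolate` are ours; the weights are those of formula (7.1), generalised to the grid cell containing the observation.

```python
from bisect import bisect_right

T_GRID = [10, 20, 25, 30]   # typical temperatures (Table 7.2)
M_GRID = [10, 20, 30]       # typical moisture levels (Table 7.3)

# W[i][j]: watering time (minutes) at (T_GRID[i], M_GRID[j]), from Table 7.5.
W = [
    [0, 0, 0],      # t = 10 (Cold)
    [35, 20, 5],    # t = 20 (Cool)
    [35, 20, 10],   # t = 25 (Warm)
    [60, 35, 20],   # t = 30 (Hot)
]


def _cell(grid, x):
    """Index i of the grid cell [grid[i], grid[i + 1]] containing x."""
    if not grid[0] <= x <= grid[-1]:
        raise ValueError(f"{x} lies outside the reference grid")
    return min(bisect_right(grid, x) - 1, len(grid) - 2)


def interpolate(t, m):
    """Weighted average of the four reference points surrounding (t, m)."""
    i, j = _cell(T_GRID, t), _cell(M_GRID, m)
    dt = T_GRID[i + 1] - T_GRID[i]
    dm = M_GRID[j + 1] - M_GRID[j]
    total = 0.0
    for a in (i, i + 1):
        for b in (j, j + 1):
            weight = (1 - abs(t - T_GRID[a]) / dt) * (1 - abs(m - M_GRID[b]) / dm)
            total += weight * W[a][b]
    return total


print(round(interpolate(29, 16), 2))  # about 41.2 minutes
```

The sketch deliberately refuses inputs outside the grid: the reference points say nothing about temperatures below 10 °C or above 30 °C, and any extrapolation rule would be one more arbitrary choice.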

This piecewise linear interpolation method is however not completely
satisfactory. First of all, no information justifies that function f is linear between
points to be interpolated. Many other interpolation methods could be used as
well, making a non-linear f possible. For example, one can use more sophisticated
interpolating methods based on B-spline functions that produce very smooth sur-
faces with good continuity and locality properties (see e.g. Bartels, Beatty and
Barsky 1987). Moreover, as mentioned above, the definition of reference points
from the gardener's rules is far from being easy and other relevant sets of scalar
values could be considered as well. As a consequence, the need to interpolate
the reference points given in Table 7.5 is itself questionable. Instead of performing
an exact interpolation of these points, one may prefer to modify the link between
symbols and numerical scales in order to allow symbols to be represented by sub-
sets of plausible numerical values. Thus, reference points are replaced by reference
areas in the parameters space (t, m, w), and the interpolation problem must be
reformulated. This point is discussed below.

7.2.4 Interpreting input labels as intervals

In the gardener's example, substituting labels Ti and Mj by scalar values on the
temperature and moisture scales has the advantage of simplicity. However, it
does not provide a complete solution since function f is only known for a finite
sample of inputs and requires interpolation to be extended to the entire set of
possible inputs. Moreover, in many cases, each label represents a range of values
rather than a single value on a numerical scale. In such cases, representing the
different labels used in the rules by intervals seems preferable. If the intervals are
defined so as to cover all plausible values, any possible input belongs to at least
one interval and therefore, can be translated into at least one label. Basically, we
can distinguish two cases, depending on whether the intervals associated to labels
partially overlap or not.

Labels represented by disjoint intervals

Suppose that the gardener is able to divide the temperature scale into consecutive
intervals, each corresponding to the most plausible values attached to a label
Ti . Assuming this is also possible for the moisture scale, these intervals form a
partition of the temperature and moisture scales respectively. Hence, each input
(t, m) corresponds to a pair {Ti , Mj } where Ti (resp. Mj ) is the label associated
to the interval containing t (resp. m). In this case, there is a unique active rule
in Table 7.1 and the conclusion is easy to reach. For example, let us consider the
following intervals:

Tlabels               Cold          Cool            Warm            Hot

Temperatures (°C)     (−∞, 17.5)    [17.5, 22.5)    [22.5, 27.5)    [27.5, +∞)

Table 7.6: Intervals associated to labels Ti

Mlabels Low Medium High

Soil water content (cg/g) [0, 15) [15, 25) [25, 100]

Table 7.7: Intervals associated to labels Mj

If (t, m) = (29, 16), then the associated labels are {Hot, Medium} and
therefore the only active rule is R4, whose conclusion is "watering time is Long". Thus,
if we keep the interpretation of Long given in Table 7.4, the numerical output is
35 minutes.
This process is simple but has serious drawbacks. The granularity of the
language used to describe the current state of the system is poor and many
significantly different states are seen as equivalent. This is the case, for example, of
the two inputs (17.5, 15) and (22.4, 24.9) that both translate as (Cool, Medium).
On the contrary, for some other pairs of inputs that are very similar, the
translation diverges. This is the case of (17.4, 14.9) and (17.5, 15) that respectively
give (Cold, Low) and (Cool, Medium). In the first case, rule R10 is activated and
a zero watering time is decided. In the second case, rule R6 is activated and a
medium watering time is recommended, 20 minutes according to Table 7.4. Such
discontinuities cannot really be justified and make the output f (t, m) arbitrarily
sensitive to the inputs (t, m). This is not suitable because such decision systems
are often included in a permanent observation/reaction loop. Suppose for example
that, in a stable situation, several consecutive measurements of temperature and
moisture yield different values for parameters t and m due to the imperfection of
gauges, and that these variations occur around a point of discontinuity in the system.
This can produce alternating sequences of outputs such as Short, Zero, Medium,
Zero, leading to repeated starts and stops of the system, and possibly leading to
It is true that narrowing the intervals and multiplying the labels would reduce
these drawbacks and refine the granularity of the description, but the number
of rules necessary to characterise f would grow significantly with the number of
labels. Expressing so many labels and rules requires a very important cognitive
effort that cannot reasonably be expected from the expert. Nevertheless, reducing
discontinuity induced by interval boundaries without multiplying labels is possible.
A first option for this is allowing for overlap between consecutive intervals, as
shown below.
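
The discontinuity discussed above is easy to reproduce. The following Python sketch implements the disjoint intervals of Tables 7.6 and 7.7, the decision table of Table 7.1 and the scalar times of Table 7.4; all identifiers are ours, and the value 0 for the label Zero is taken from Table 7.5. Moisture values outside [0, 100] are not checked.

```python
from bisect import bisect_right

# Interval boundaries of Tables 7.6 and 7.7; each interval is closed on the left.
T_BOUNDS, T_NAMES = [17.5, 22.5, 27.5], ["Cold", "Cool", "Warm", "Hot"]
M_BOUNDS, M_NAMES = [15, 25], ["Low", "Medium", "High"]

# Decision table of Table 7.1, indexed by moisture label, then temperature label.
TABLE = {
    "Low":    {"Cold": "Zero", "Cool": "Long", "Warm": "Long", "Hot": "VeryLong"},
    "Medium": {"Cold": "Zero", "Cool": "Medium", "Warm": "Medium", "Hot": "Long"},
    "High":   {"Cold": "Zero", "Cool": "VeryShort", "Warm": "Short", "Hot": "Medium"},
}

# Scalar times of Table 7.4 (minutes); Zero = 0 as in Table 7.5.
MINUTES = {"Zero": 0, "VeryShort": 5, "Short": 10,
           "Medium": 20, "Long": 35, "VeryLong": 60}


def crisp_label(x, bounds, names):
    """Label of the unique interval containing x."""
    return names[bisect_right(bounds, x)]


def watering_time(t, m):
    """Watering time given by the single rule activated by (t, m)."""
    t_label = crisp_label(t, T_BOUNDS, T_NAMES)
    m_label = crisp_label(m, M_BOUNDS, M_NAMES)
    return MINUTES[TABLE[m_label][t_label]]


print(watering_time(29, 16))       # Hot, Medium: rule R4
print(watering_time(17.4, 14.9))   # Cold, Low: rule R10
print(watering_time(17.5, 15))     # Cool, Medium: rule R6
```

The last two calls differ by a tenth of a degree and a tenth of a cg/g, yet the recommended times are 0 and 20 minutes respectively.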

Labels represented by overlapping intervals

In order to improve on the previous solution, we have to specify the links between
the values of physical variables describing the system and the symbolic labels used
to describe the current state of the system more carefully. Since it is difficult
to separate such intervals with precise boundaries, one can make them partially
overlap. As a consequence, in some intermediary areas of the temperature scale,
two consecutive labels are associated to a given temperature, reflecting the possible
hesitation of the gardener in the choice of a unique label. Typically, if Warm and
Hot are represented by intervals [20, 30] and [25, +∞) respectively, 29 °C becomes
a temperature compatible with the two labels. More precisely, from 20 °C to
25 °C, Warm is a valid label (a possible source of rule activation) but not Hot;
from 25 °C to 30 °C both labels are valid; and above 30 °C, Hot is valid but not
Warm. This progressive transition between the two states Warm and Hot refines
the initial sharp transition from Warm to Hot by introducing an intermediary state
corresponding to a hesitation between the two labels. This is more realistic,
especially because there is no reasonable way of separating Warm and Hot
with a precise boundary. Note however that measuring a temperature of 29 °C
may allow several rules to be active at the same time. This raises a new
problem since these rules may lead to diverging recommendations
from which a synthesis must be derived. Any output label (labels Wk in the
example) must be translated into numbers and these numbers must be aggregated
to obtain the numerical output of the system (the value of w in the example). Thus,
the definition of a numerical output can be seen as an aggregation problem, where
aggregation is used to interpolate between conflicting rules. As an illustration, we
assume now that the labels are represented by the intervals given in Tables 7.8
and 7.9:

Tlabels               Cold        Cool        Warm        Hot

Temperatures (°C)     (−∞, 20]    [15, 25]    [20, 30]    [25, +∞)

Table 7.8: Intervals associated to labels Ti

Mlabels Low Medium High

Soil water content (cg/g) [0, 20] [10, 30] [20, 100]

Table 7.9: Intervals associated to labels Mj

If the observation of the current situation is t = 29 °C and m = 16 cg/g, the
relevant labels are {Warm, Hot} for temperature and {Low, Medium} for
moisture. These qualitative labels allow some of the gardener's rules to be activated,
namely R1, R2, R4 and R5. This gives several symbolic values for the watering
duration, namely Medium (by R5), Long (by R2 and R4) and VeryLong (by R1). Therefore,
we can observe 3 conflicting recommendations and the final decision must be
derived from a synthesis of these results. Of course, defining what could be a fair
synthesis of conflicting qualitative outputs is not an easy task. Deriving a
numerical duration from this synthesis is not any easier.
A simple idea is to process symbols as numbers. For this, one can link symbolic
and numerical information using Table 7.4. In the example, we obtain three
different durations, i.e. 20, 35 and 60 minutes, that must be aggregated. For example,
one can calculate the arithmetic mean of the three outputs. More generally, we can
define a weight $\omega(R)$ for each decision rule R in the gardener's database B. This
weight represents the activity of the rule and, by convention, for any state (t, m),
we set $\omega(R) = 1$ when the decision rule R is activated and $\omega(R) = 0$ otherwise.
Let $B(\tau)$ denote the subset of rules concluding to a watering time $\tau$. For any
possible value $\tau$ of w, a weight $\omega(\tau)$ measuring the activity or importance of the
set $B(\tau)$ can be defined as a continuous and increasing function of the quantities
$\omega(R)$, $R \in B(\tau)$. For example, we can choose:

(7.2)    $\omega(\tau) = \sup_{R \in B(\tau)} \omega(R)$

Hence, each watering time activated by at least one rule receives the weight 1
and any other time receives the weight 0. For example, with the observation
(t, m) = (29, 16), we have seen that the active rules are R1, R2, R4 and R5 and
therefore $\omega(R_1) = \omega(R_2) = \omega(R_4) = \omega(R_5) = 1$ whereas $\omega(R) = 0$ for any
other rule R. Let us now present in detail the calculation of $\omega(35)$. Since 35
(minutes) is the scalar translation of Long, we obtain from the gardener's rules
$B(35) = \{R_2, R_3, R_4\}$. Hence $\omega(35) = \sup\{\omega(R_2), \omega(R_3), \omega(R_4)\} = 1$. Similarly
we get $\omega(20) = 1$ thanks to R5 and $\omega(60) = 1$ thanks to R1. Because there are no
active rules left, $\omega(\tau) = 0$ for all other $\tau$.
Another option taking account of the number of rules supporting each time $\tau$
could be:

(7.3)    $\omega(\tau) = \sum_{R \in B(\tau)} \omega(R)$

Coming back to the example, we now obtain $\omega(35) = \omega(R_2) + \omega(R_3) + \omega(R_4) = 2$,
whereas the other weights $\omega(\tau)$ remain unchanged. This second option gives more
importance to a time $\tau$ supported by several rules than to a time $\tau'$ supported by
a single rule. Everything works as if each active rule were voting for a time. The
more a given time is supported by the set of active rules, the more important it
becomes in the calculation of the final watering time. Option (7.3) could
be preferred when the activations of the various rules are independent. On the
contrary, when the activation of a subset of rules necessarily implies that another
subset of rules is also active, one could prefer resorting to (7.2) so as to avoid
possible overweighting due to redundancy in the set of rules. In a practical situation,
one can easily imagine that the choice of one of these options is not easy to justify.
Since there is a finite number of rules, there is only a finite number of times
activated by the rules in a given state. In order to synthesise these different times,
the most popular approach is the centre of gravity method, which amounts to
performing a weighted sum (see also chapter 6) of all possible times $\tau$. Formally
the final output is defined by:

(7.4)    $w = \frac{\sum_{\tau} \omega(\tau)\, \tau}{\sum_{\tau} \omega(\tau)}$

From the observation (t, m) = (29, 16), equations (7.2) and (7.4) yield a watering
time of (60 + 35 + 20)/3, i.e. 38 minutes and 20 seconds, whereas equations
(7.3) and (7.4) yield w = 0.25 × (60 + 35 + 35 + 20), which amounts to 37 minutes and 30
seconds. Note that the choice of a weighted sum as final aggregator in equation
(7.4) is questionable and one could formulate criticisms similar to those addressed
to the weighted average in the previous chapters (especially in chapter 6).
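
The computation above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the rule base and the scalar durations are
not restated in this section, so they are inferred here from the worked examples of
the chapter, and all names (`RULES`, `active_rules`, `watering_time`) are
hypothetical.

```python
INF = float("inf")

# Crisp intervals of Tables 7.8 and 7.9 (closed where finite).
T_LABELS = {"Cold": (-INF, 20), "Cool": (15, 25), "Warm": (20, 30), "Hot": (25, INF)}
M_LABELS = {"Low": (0, 20), "Medium": (10, 30), "High": (20, 100)}

# ASSUMPTION: rule base and scalar durations (minutes) inferred from the
# worked examples of the chapter; Table 7.4 is not restated in this section.
# A moisture label of None means "whatever the moisture".
RULES = {
    "R1": ("Hot", "Low", 60),      # VeryLong
    "R2": ("Warm", "Low", 35),     # Long
    "R3": ("Cool", "Low", 35),     # Long
    "R4": ("Hot", "Medium", 35),   # Long
    "R5": ("Warm", "Medium", 20),  # Medium
    "R6": ("Cool", "Medium", 20),  # Medium
    "R7": ("Hot", "High", 20),     # Medium
    "R8": ("Warm", "High", 10),    # Short
    "R9": ("Cool", "High", 5),     # VeryShort
    "R10": ("Cold", None, 0),      # no watering
}


def valid_labels(x, intervals):
    """Labels whose interval contains the observation x."""
    return {name for name, (lo, hi) in intervals.items() if lo <= x <= hi}


def active_rules(t, m):
    """Names of the rules whose premises hold for the state (t, m)."""
    t_ok, m_ok = valid_labels(t, T_LABELS), valid_labels(m, M_LABELS)
    return [name for name, (tl, ml, _) in RULES.items()
            if tl in t_ok and (ml is None or ml in m_ok)]


def watering_time(t, m, combine):
    """Equation (7.4), with omega(tau) given by `combine` applied to the
    weights of the active rules concluding tau: max for (7.2), sum for (7.3)."""
    votes = {}
    for name in active_rules(t, m):
        tau = RULES[name][2]
        votes.setdefault(tau, []).append(1)  # omega(R) = 1 for an active rule
    omega = {tau: combine(ws) for tau, ws in votes.items()}
    return sum(w * tau for tau, w in omega.items()) / sum(omega.values())


print(active_rules(29, 16))        # ['R1', 'R2', 'R4', 'R5']
print(watering_time(29, 16, max))  # about 38.33, i.e. 38 min 20 s
print(watering_time(29, 16, sum))  # 37.5, i.e. 37 min 30 s
```

Passing `max` or `sum` as the `combine` argument is simply a compact way of
switching between options (7.2) and (7.3) without duplicating the rest of the
computation.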

In this approach, as in the linear interpolation approach used in the previous
subsection, the final result has been obtained through the following sequence:

1. read the current values of input parameters t and m

2. find the symbolic qualifiers that best fit these values

3. detect the decision rules activated by these observations

4. collect the symbolic outputs resulting from the inferences

5. translate symbols into quantitative numerical outputs

6. aggregate these numerical outputs


This process is perhaps the most elementary way of using a set of symbolic decision
rules to build a numerical decision function. It gives a simple illustration of the
so-called "computing with words" paradigm advocated by Zadeh (see Zadeh 1999).
The main advantages of such a process are the following:

• it relies on simple decision rules expressed in a language close to the natural
  language used by the expert,
• it allows one to define a reasonable decision function allowing numerical
  outputs to be computed from any possible numerical input,
• if necessary, any decision can be explained very simply. The outputs can
  always be presented as a compromise between recommendations derived from
  several of the expert's decision rules.

Nevertheless, interpreting labels as intervals does not really prevent
discontinuous transfers from inputs to outputs. In fact, it is not easy to describe a continuum
of states (characterised by all pairs (t, m) in the gardener example) with a finite
number of labels of type ($T_i$, $M_j$). This induces arbitrary choices in the
description of the current state which could disrupt the diagnosis stage and make the
automatic decision process discontinuous, as shown by the following example.

Example (1). Consider two very similar states s1 and s2 characterised by the
observations (t, m) = (25.01, 19.99) and (t, m) = (24.99, 20.01). According to
Tables 7.8 and 7.9, state s1 validates the labels {Warm, Hot} for temperature,
and {Low, Medium} for soil moisture. This activates rules R1, R2, R4 and R5
whose recommendations are VeryLong, Long, Long, Medium respectively. The
resulting watering time obtained from equations (7.3) and (7.4) is therefore 37 minutes
and 30 seconds. Things are quite different for s2, however. The valid labels are
{Cool, Warm} for temperature, and {Medium, High} for soil moisture. This
activates rules R5, R6, R8 and R9 whose recommendations are Medium, Medium,
Short, VeryShort respectively. The resulting watering time obtained by equation
(7.4) is therefore 13 minutes and 45 seconds. It is worth noting that, despite the
close similarity between states s1 and s2, there is a significant difference in the
watering times computed from the two input vectors. This is due to the discontinuity,
at (t, m) = (25, 20), of the transfer function that defines the watering time from
the input (t, m). In the right neighbourhood of this entry (t > 25 and m < 20),
the decision rules R1, R2 and R4 are fully active but this is no longer the case in
the left neighbourhood of the point (t < 25 and m > 20) where they are replaced
by rules R6, R8 and R9, thus leading to a much shorter time. The activations
and computations performed for s1 and s2 differ significantly. They lead to very
different outputs, despite the similarity of the states.
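
The jump can be observed with a short script. It is a minimal sketch, assuming the
rule durations inferred from the worked examples of this chapter (Table 7.4 is not
restated here), a single rule concluding "no watering" for Cold, and the vote
counting of equation (7.3); `MINUTES` and `crisp_time` are illustrative names.

```python
INF = float("inf")

# Crisp intervals of Tables 7.8 and 7.9.
T_LABELS = {"Cold": (-INF, 20), "Cool": (15, 25), "Warm": (20, 30), "Hot": (25, INF)}
M_LABELS = {"Low": (0, 20), "Medium": (10, 30), "High": (20, 100)}

# ASSUMPTION: minutes recommended for each (temperature, moisture) premise,
# inferred from the worked examples of the chapter.
MINUTES = {
    ("Hot", "Low"): 60, ("Warm", "Low"): 35, ("Cool", "Low"): 35,
    ("Hot", "Medium"): 35, ("Warm", "Medium"): 20, ("Cool", "Medium"): 20,
    ("Hot", "High"): 20, ("Warm", "High"): 10, ("Cool", "High"): 5,
}


def crisp_time(t, m):
    """Mean of the outputs of all active rules, i.e. (7.3) followed by (7.4).

    The moisture m is expected to lie in [0, 100].
    """
    t_ok = [l for l, (lo, hi) in T_LABELS.items() if lo <= t <= hi]
    m_ok = [l for l, (lo, hi) in M_LABELS.items() if lo <= m <= hi]
    outputs = [MINUTES[tl, ml] for tl in t_ok if tl != "Cold" for ml in m_ok]
    if "Cold" in t_ok:
        outputs.append(0)  # assumed single rule for Cold: no watering
    return sum(outputs) / len(outputs)


s1, s2 = (25.01, 19.99), (24.99, 20.01)
print(crisp_time(*s1), crisp_time(*s2))
print(crisp_time(*s1) - crisp_time(*s2))  # size of the jump across (25, 20)
```

In this sketch, moving the input by 0.02 units on each axis changes the output by
more than twenty minutes.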

This criticism is serious, but the difficulty can partly be overcome. It is true
that, depending on the choice of the numerical encoding of the labels, the numer-
ical outputs resulting from the decision rules may vary significantly. Since the
numerical/symbolic and then symbolic/numerical translations are both sources of
arbitrariness, the following question can be raised: why not use numbers directly?

There are two partial answers: first, in many decision contexts, the possibility
of justifying decisions is a great advantage. Although this is not crucial in our
illustrative example, the ability of automatic decision systems to simulate human
reasoning and explain decisions by rules is generally seen as an important
advantage. This argument often justifies the use of rule-based systems to automatise
decision-making, even if each decision considered separately is of marginal impor-
tance. Second, there are several ways of improving the process proposed above
and of refining the formal relationship between qualitative labels and numerical
values. It is not our purpose to cover all possibilities in detail. We only present and
discuss some very simple and intuitive ideas used to construct more sophisticated
models and tools in this context.

7.2.5 Interpreting input labels as fuzzy intervals

One step back in the modelling process, we can redefine more precisely the
relationship between a given label and the numerical scale associated to the label. As an
expert, the gardener can easily specify the typical temperatures associated with
each label. He can also define areas that are definitely not concerned with each
label. For example, he could explain that Warm means between 20 and 30 degrees
with 25 as the most plausible value. More precisely, one can define the relative
likelihood of each temperature when the temperature has been qualified as Hot,
Warm, Cool or Cold. In this case, each label $T_i$ is represented by a [0, 1]-valued
function $\mu_{T_i}$ defined on the temperature scale in such a way that $\mu_{T_i}(t)$ represents
the compatibility degree between temperature t and label $T_i$. As a convention,
we set $\mu_{T_i}(t) = 0$ when temperature t is not connected to the label $T_i$, and
$\mu_{T_i}(t) = 1$ when t is perfectly representative of the label. Thus, each label $T_i$ is defined
with fuzzy boundaries and characterised by the function $\mu_{T_i}$. These fuzzy labels
can partially overlap but they must be defined in such a way that any part of the
temperature scale is covered by at least one label. A simple example of such fuzzy
labels is represented in Figures 7.3 and 7.4.

Figure 7.3: Fuzzy labels for the air temperature

Note that sometimes, the fuzzy labels are defined in such a way that membership
values add up to 1 for any possible value of the numerical parameter. This is the
case of the labels defined in Figure 7.4, for which we have:

(7.5)    $\forall m \geq 0, \quad \mu_{Low}(m) + \mu_{Medium}(m) + \mu_{High}(m) = 1$

Figure 7.4: Fuzzy labels for the soil moisture

Property (7.5) is the numerical translation of a natural condition requiring
that the fuzzy labels Low, Medium and High form a partition of the set of possible
moistures. Note however that this property makes sense only when membership
values have a cardinal meaning.
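
Fuzzy labels of this kind are easy to encode. The sketch below is an illustration
only: Figures 7.3 and 7.4 are not reproduced here, so the piecewise-linear shapes
and their breakpoints are assumptions, chosen to agree with the membership degrees
quoted in this section (for instance Hot to degree 0.8 at 29 °C and Low to degree
0.4 at 16 cg/g). The helper names are hypothetical.

```python
def up(x, a, b):
    """0 below a, 1 above b, linear in between."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def down(x, a, b):
    """1 below a, 0 above b, linear in between."""
    return 1.0 - up(x, a, b)


def peak(x, a, p, b):
    """Triangular label: 0 outside (a, b), 1 at p."""
    return min(up(x, a, p), down(x, p, b))


# ASSUMPTION: breakpoints chosen to agree with the membership degrees quoted
# in the text; the figures themselves are not available here.
MU_T = {
    "Cold": lambda t: down(t, 10, 20),
    "Cool": lambda t: peak(t, 15, 20, 25),
    "Warm": lambda t: peak(t, 20, 25, 30),
    "Hot": lambda t: up(t, 25, 30),
}
MU_M = {
    "Low": lambda m: down(m, 10, 20),
    "Medium": lambda m: peak(m, 10, 20, 30),
    "High": lambda m: up(m, 20, 30),
}


def degrees(x, labels):
    """Membership degree of x in every label, rounded for display."""
    return {name: round(mu(x), 3) for name, mu in labels.items()}


def is_partition(labels, grid, tol=1e-9):
    """Check property (7.5): the degrees add up to 1 at every grid point."""
    return all(abs(sum(mu(x) for mu in labels.values()) - 1.0) <= tol
               for x in grid)


grid = [k / 10 for k in range(0, 1001)]  # 0.0, 0.1, ..., 100.0
print(degrees(29, MU_T))                 # Warm 0.2 and Hot 0.8, others 0
print(degrees(16, MU_M))                 # Low 0.4 and Medium 0.6, High 0
print(is_partition(MU_M, grid))          # True
print(is_partition(MU_T, grid))          # False
```

With these assumed shapes the moisture labels satisfy (7.5), whereas the
temperature labels do not (at 18 °C the degrees add up to 0.8 only).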
With such fuzzy labels, each decision rule can be activated to a certain degree.
This is the degree to which the numerical inputs match the premises of the rule.
More precisely, for any rule $R_{ij}$ of type:

    if T is $T_i$ and M is $M_j$ then W is $W_k$

where $W_k = F(T_i, M_j)$, and for any numerical observation (t, m), the weight (or
activation degree) $\omega_{ij}$ of the rule $R_{ij}$ reflects the importance (or relevance) of the
rule in the current situation. This importance depends on the matching of the
input (t, m) and the premise ($T_i$, $M_j$). It is therefore natural to state:

(7.6)    $\omega_{ij} = h(\mu_{T_i}(t), \mu_{M_j}(m))$

where h is an aggregation function representing the logical "and" used in the rule,
e.g. h(x, y) = min(x, y).

As a numerical example, consider the gardener's rule R1. The observation
(t, m) = (29, 16) leads to $\mu_{Hot}(t) = 0.8$ and $\mu_{Low}(m) = 0.4$. Thus, the temperature
is Hot to the degree 0.8 and the moisture is Low to the degree 0.4 and therefore,
the weight of the rule R1 is min(0.8, 0.4) = 0.4. Using this approach for each rule
with h = min yields the following activation weights (see Table 7.10):

$\omega_{ij}$   $T_i$                              Cold      Cool     Warm       Hot
$M_j$           $\mu_{M_j}(m)$ \ $\mu_{T_i}(t)$    0         0        0.2        0.8
Low             0.4                                0 (R10)   0 (R3)   0.2 (R2)   0.4 (R1)
Medium          0.6                                0 (R10)   0 (R6)   0.2 (R5)   0.6 (R4)
High            0                                  0 (R10)   0 (R9)   0 (R8)     0 (R7)

Table 7.10: The weights of the rules when (t, m) = (29, 16)

Hence, from equation (7.4) we get

    $w = \frac{0.4 \times 60 + 0.2 \times 35 + 0.6 \times 35 + 0.2 \times 20}{0.4 + 0.2 + 0.6 + 0.2} = \frac{56}{1.4} = 40$

and therefore the watering time is 40 minutes.
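
The same computation can be written as a small script. It is only a sketch: the
membership degrees are read from the margins of Table 7.10 rather than computed,
the durations are those appearing in the computation above, and the names
`activation` and `centre_of_gravity` are illustrative.

```python
# Membership degrees read from the margins of Table 7.10 for (t, m) = (29, 16).
MU_T = {"Cold": 0.0, "Cool": 0.0, "Warm": 0.2, "Hot": 0.8}
MU_M = {"Low": 0.4, "Medium": 0.6, "High": 0.0}

# Scalar outputs (minutes) of the four rules with a non-zero weight, as they
# appear in the computation of the text.
MINUTES = {
    ("Hot", "Low"): 60,      # R1
    ("Warm", "Low"): 35,     # R2
    ("Hot", "Medium"): 35,   # R4
    ("Warm", "Medium"): 20,  # R5
}


def activation(h=min):
    """Equation (7.6): weight of each rule from the degrees of its premises."""
    return {(ti, mj): h(MU_T[ti], MU_M[mj]) for (ti, mj) in MINUTES}


def centre_of_gravity(weights):
    """Weighted mean of the rule outputs, each rule voting with its weight."""
    total = sum(weights.values())
    return sum(w * MINUTES[rule] for rule, w in weights.items()) / total


weights = activation()
print(weights)
print(round(centre_of_gravity(weights), 6))  # 40.0
```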
Note that the definition of an aggregation function yields a compromise solution
between the various active decision rules whose outputs are partially conflicting.
In the additive formulation characterised by equation (7.4), everything works as
if each active rule were voting for one candidate chosen in the set Wlabels. The
more the premise of the rule matches the current situation, the more important
the rule is in the voting process. The activation level of each rule is graduated
on the [0, 1] scale and the weights directly reflect the adequacy of the rule in the
current situation. This enables a soft control of the output, which is well
illustrated by the example discussed at the end of subsection 7.2.4. If we consider
the two neighbouring states s1 and s2 introduced in this example, and if we choose
h = min in equation (7.6), the resulting activation weights are those given in
Tables 7.11 and 7.12.

$\omega_{ij}$   $T_i$                              Cold      Cool     Warm         Hot
$M_j$           $\mu_{M_j}(m)$ \ $\mu_{T_i}(t)$    0         0        0.998        0.002
Low             0.001                              0 (R10)   0 (R3)   0.001 (R2)   0.001 (R1)
Medium          0.999                              0 (R10)   0 (R6)   0.998 (R5)   0.002 (R4)
High            0                                  0 (R10)   0 (R9)   0 (R8)       0 (R7)

Table 7.11: The weights of rules when (t, m) = (25.01, 19.99)

$\omega_{ij}$   $T_i$                              Cold      Cool         Warm         Hot
$M_j$           $\mu_{M_j}(m)$ \ $\mu_{T_i}(t)$    0         0.002        0.998        0
Low             0                                  0 (R10)   0 (R3)       0 (R2)       0 (R1)
Medium          0.999                              0 (R10)   0.002 (R6)   0.998 (R5)   0 (R4)
High            0.001                              0 (R10)   0.001 (R9)   0.001 (R8)   0 (R7)

Table 7.12: The weights of rules when (t, m) = (24.99, 20.01)
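
The weights of Tables 7.11 and 7.12 and the watering times they lead to through
equation (7.4) can be checked numerically. The sketch below is written under
explicit assumptions: the membership functions are piecewise linear with
breakpoints chosen to agree with the degrees quoted in this section, the durations
are inferred from the worked examples of the chapter, and each cell of the tables is
treated as one weighted vote. `fuzzy_time` is an illustrative name.

```python
def up(x, a, b):
    """0 below a, 1 above b, linear in between."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def down(x, a, b):
    """1 below a, 0 above b, linear in between."""
    return 1.0 - up(x, a, b)


# ASSUMPTION: shapes and breakpoints agree with the degrees quoted in the
# text; the original figures are not reproduced here.
MU_T = {
    "Cold": lambda t: down(t, 10, 20),
    "Cool": lambda t: min(up(t, 15, 20), down(t, 20, 25)),
    "Warm": lambda t: min(up(t, 20, 25), down(t, 25, 30)),
    "Hot": lambda t: up(t, 25, 30),
}
MU_M = {
    "Low": lambda m: down(m, 10, 20),
    "Medium": lambda m: min(up(m, 10, 20), down(m, 20, 30)),
    "High": lambda m: up(m, 20, 30),
}

# ASSUMPTION: scalar outputs in minutes, inferred from the worked examples.
# Premises involving Cold are absent and default to 0 (no watering).
MINUTES = {
    ("Hot", "Low"): 60, ("Warm", "Low"): 35, ("Cool", "Low"): 35,
    ("Hot", "Medium"): 35, ("Warm", "Medium"): 20, ("Cool", "Medium"): 20,
    ("Hot", "High"): 20, ("Warm", "High"): 10, ("Cool", "High"): 5,
}


def fuzzy_time(t, m, h=min):
    """Rule weights from (7.6), then weighted mean of the rule outputs."""
    num = den = 0.0
    for ti, mu_t in MU_T.items():
        for mj, mu_m in MU_M.items():
            weight = h(mu_t(t), mu_m(m))
            num += weight * MINUTES.get((ti, mj), 0)
            den += weight
    return num / den


print(round(fuzzy_time(25.01, 19.99), 3))  # about 20.085, i.e. 20 min 5 s
print(round(fuzzy_time(24.99, 20.01), 3))  # about 19.975, i.e. 19 min 58 s
```

The two outputs differ by a few seconds only, in contrast with the gap obtained
with crisp intervals for the same pair of states.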

Hence, using equation (7.4) and Table 7.4, we get w(s1) = 20 minutes and
5 seconds as the final output. Similarly, for state s2, the activation weights of the rules
obtained from equation (7.6) are only slightly different from those for s1 and
the final output derived from Table 7.12 using equation (7.4) is w(s2) = 19
minutes and 58 seconds. Here, we notice that the activity of each rule does not
vary significantly when passing from state s1 to state s2. This is due to the
way activation weights are defined and used in the process. These weights depend
continuously on input parameters t and m, and the membership functions defining
the labels have soft variations. As a consequence, since the aggregation function
used to derive the final watering time w is also a continuous function of the quantities
$\omega(R)$ (see equation (7.4)), w depends continuously on the input parameters t
and m. This explains the observed improvement with respect to the previous model
based on the use of "all or nothing" activation rules. Thus, the use of fuzzy labels
to interpret input labels has a significant advantage: it makes it possible to define
a continuous transformation of numerical input data (temperature, moisture) into
symbolic variables used in decision rules. The resulting decision system is more
realistic and robust to slight variations of inputs. This advantage is due to the
use of fuzzy sets and has greatly contributed to the practical success of the fuzzy
approach in automatic control (fuzzy control; see e.g. Mamdani 1981, Sugeno
1985, Bouchon 1995, Gacogne 1997, Nguyen and Sugeno 1998). However, several
criticisms can be addressed to the small fuzzy decision module presented above.
Among them, let us mention the following:

• the choice h = min in equation (7.6) requires that quantities of type $\mu_{T_i}(t)$
  and $\mu_{M_j}(m)$ are commensurate. This assumption, which is rarely explicit,
  is very strong because it requires much more than comparing the relative
  fit of two temperatures (resp. two moistures) to a label $T_i$ (resp. $M_j$).
  It also requires comparing the fit of any temperature to any label $T_i$ with
  the fit of any moisture to any label $M_j$. A perfectly sound definition of
  such membership values would require more information than can easily be
  obtained in practice. Moreover, the choice of min is often justified by the fact
  that h is used to evaluate a conjunction between several premises of a given
  rule (a conjunction of type "temperature is $T_i$ and moisture is $M_j$"). Note
  however that the idea of the conjunction is captured by any other t-norm (see,
  for instance, Fodor and Roubens (1994)). Thus, the product could perhaps
  replace the min and the particular choice of the min is not straightforward.
  This is problematic because this choice is not without consequence on the
  definition of the watering time.

• the interpretation of symbolic labels used to describe outputs of the rules as
  scalar values is not easy to justify. Why not use a description of these labels
  as intervals, in the same way as for input labels?
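
The first criticism can be made concrete on the observation (t, m) = (29, 16). The
sketch below recomputes the watering time with three t-norms in the role of h:
min, the product, and the Lukasiewicz t-norm max(x + y − 1, 0). The degrees are
those of the margins of Table 7.10 and the durations those used in the computation
of the 40 minutes; only the four rules with non-zero degrees are listed, since any
t-norm gives weight 0 to a rule having a premise of degree 0. Names are
illustrative.

```python
# Degrees for (t, m) = (29, 16), read from the margins of Table 7.10.
MU_T = {"Warm": 0.2, "Hot": 0.8}
MU_M = {"Low": 0.4, "Medium": 0.6}

# Scalar outputs (minutes) of the rules concerned, as used in the text.
MINUTES = {
    ("Hot", "Low"): 60,      # R1
    ("Warm", "Low"): 35,     # R2
    ("Hot", "Medium"): 35,   # R4
    ("Warm", "Medium"): 20,  # R5
}

T_NORMS = {
    "min": min,
    "product": lambda x, y: x * y,
    "Lukasiewicz": lambda x, y: max(x + y - 1.0, 0.0),
}


def watering_time(h):
    """Rule weights from (7.6) with conjunction h, then weighted mean."""
    weights = {(ti, mj): h(MU_T[ti], MU_M[mj]) for (ti, mj) in MINUTES}
    total = sum(weights.values())
    return sum(w * MINUTES[rule] for rule, w in weights.items()) / total


for name, h in T_NORMS.items():
    print(name, round(watering_time(h), 2))
# min 40.0
# product 41.2
# Lukasiewicz 43.33
```

On this single observation, the three conjunctions already give outputs spread over
more than three minutes.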

The last criticism suggests an improvement of the current system. We have
to refine the previous construction so as to improve the output processing.
Paralleling the treatment of symbolic inputs, we can use intervals or fuzzy
intervals later in the process so as to continuously link symbolic outputs of the rules
(Wlabels) to numerical outputs (watering times). This point is discussed in the
next subsection.

7.2.6 Interpreting output labels as (fuzzy) intervals

Suppose for example that Wlabels are no longer described by scalar values but by
subsets of the time scale. For instance, the labels $W_k$ could be represented by a
set of intervals (overlapping or not) with advantages similar to those mentioned
for input labels $T_i$ and $M_j$. More generally, we assume here that Wlabels are
represented by fuzzy intervals of the time scale. For the sake of illustration, let
us consider the labels represented in Figure 7.5.

Figure 7.5: Fuzzy labels for the watering time

For any state (t, m) of the system, the range of relevant watering times is the
union of all values compatible with labels $W_k$ derived from active rules. In the
example, the active rules are R1, R2, R4, R5, and therefore the Wlabels concerned
are Medium, Long and VeryLong. Hence the set of relevant watering times
is [10, 70]. However, not all times are equivalent inside this set. Each of them
represents a possible numerical translation of a label $W_k$ obtained by the
activation of one or several rules. To be fully considered, a time must be perfectly
representative of a label $W_k$ that has been obtained by a fully active rule. In more
nuanced situations, the weight attached to a possible time is a function of the fitness
of the times activated to a certain degree by the rules. For example, by analogy
with Mamdani's approach to fuzzy control (Mamdani 1981), the weight of any
watering time $\tau$ can be defined by:

(7.7)    $\omega_{t,m}(\tau) = \sup_{R_{ij} \in B} h(\mu_{T_i}(t), \mu_{M_j}(m), \mu_{W_k}(\tau))$

where B represents the set of rules (here the gardener's rules), $R_{ij}$ represents
the rule:

    If T = $T_i$ and M = $M_j$ then W = $W_k$

and h is a non-decreasing function of its arguments (in Mamdani's approach,
h = min). The idea in equation (7.7) is that a watering time $\tau$ must receive
an important weight when there is at least one rule $R_{ij}$ whose premises ($T_i$, $M_j$)
are valid for the observation (t, m) and whose conclusion $W_k$ is compatible with
$\tau$. This explains why $\omega_{t,m}(\tau)$ is defined as an increasing function of the quantities
$\mu_{T_i}(t)$, $\mu_{M_j}(m)$ and $\mu_{W_k}(\tau)$. Notice that equation (7.7) is a natural extension
of equation (7.2). In our example, the observation (t, m) = (29, 16) leads to a
function $\omega_{29,16}(\tau)$ represented in Figure 7.6.

Figure 7.6: Weighted times induced by rules

In order to obtain a precise watering time, we can use an equation similar to
(7.4). However, this equation must be generalised because there may be an infinity
of times activated by the rules (e.g. a whole interval). The usual extension of the
weighted average to an infinite set of values is given by the following integral:

(7.8)    $w = \frac{\int \omega_{t,m}(\tau)\, \tau \, d\tau}{\int \omega_{t,m}(\tau)\, d\tau}$

that can be approximated by the following quantity:

(7.9)    $w = \frac{\sum_i \omega_{t,m}(\tau_i)\, \tau_i}{\sum_i \omega_{t,m}(\tau_i)}$

where $(\tau_i)$ is a strictly increasing sequence of times resulting from a fine
discretisation of the time scale. In our example, a discretisation with step 0.1 gives a final
time of 37 minutes and 32 seconds.
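
Equations (7.7) and (7.9) can be sketched as follows for the observation
(t, m) = (29, 16). Since Figure 7.5 is not reproduced here, the shapes of the output
labels are largely hypothetical: only Long follows the description given in the text
(support [20, 55], core [30, 40]), whereas the trapezoids chosen for Medium and
VeryLong are assumptions constrained by the range [10, 70] only. The sketch is
therefore not expected to return exactly the value quoted above; it merely shows the
mechanics of the computation.

```python
def trapezoid(x, a, b, c, d):
    """Fuzzy interval with support [a, d] and core [b, c], linear transitions."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Long follows the description given in the text.
# ASSUMPTION: the shapes of Medium and VeryLong are hypothetical; only the
# outer bounds 10 and 70 of the relevant range come from the text.
MU_W = {
    "Medium": lambda tau: trapezoid(tau, 10, 17.5, 22.5, 35),
    "Long": lambda tau: trapezoid(tau, 20, 30, 40, 55),
    "VeryLong": lambda tau: trapezoid(tau, 40, 55, 65, 70),
}

# Active rules for (t, m) = (29, 16): (mu_Ti(t), mu_Mj(m), output label),
# with the degrees of Table 7.10.
ACTIVE = [
    (0.8, 0.4, "VeryLong"),  # R1
    (0.2, 0.4, "Long"),      # R2
    (0.8, 0.6, "Long"),      # R4
    (0.2, 0.6, "Medium"),    # R5
]


def omega(tau, h=min):
    """Equation (7.7): best support given to the time tau by any active rule."""
    return max(h(mu_t, mu_m, MU_W[label](tau)) for mu_t, mu_m, label in ACTIVE)


def centre_of_gravity(step=0.1, lo=0.0, hi=80.0):
    """Equation (7.9): discretised version of the integral (7.8)."""
    n = int(round((hi - lo) / step))
    taus = [lo + k * step for k in range(n + 1)]
    weights = [omega(tau) for tau in taus]
    return sum(w * tau for w, tau in zip(weights, taus)) / sum(weights)


print(omega(35))                      # 0.6, through rule R4
print(round(centre_of_gravity(), 2))  # watering time in minutes
```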

This last sophistication meets our objective because it provides a transfer
function f with good continuity properties. However, the use of equations (7.7) to (7.9)
can be seriously criticised:

• the definition of $\omega_{t,m}(\tau)$ proposed in equation (7.7) from an increasing
  aggregation function h is not very natural. Indeed, bearing in mind the form
  of rule $R_{ij}$, the quantity $h(\mu_{T_i}(t), \mu_{M_j}(m), \mu_{W_k}(\tau))$ stands for the numerical
  translation of the proposition:

      ($T_i = t$ and $M_j = m$) implies $W_k = \tau$

  In the fields of multi-valued logic and fuzzy set theory, admissible functions
  used to translate implications are required to be non-increasing with
  respect to the value of the left-hand side of the implication and non-decreasing
  with respect to the value of the right-hand side (Fodor and Roubens 1994,
  Bouchon 1995, Perny and Pomerol 1999). As an example the value attached
  to the sentence "A implies B" can be defined by the Lukasiewicz implication
  $\min(1 - v(A) + v(B), 1)$ where v(A) and v(B) are the values of A and B
  respectively. In our case, the joint use of the min operator to interpret
  the conjunction on the left-hand side and of the Lukasiewicz implication
  would lead to the following h function:

      $h(x, y, z) = \min(1 - \min(x, y) + z, 1)$

  Note that this function is not increasing in its arguments, as required above in
  the text. However, resorting to implication operators instead of conjunctions
  in order to implement an inference via rule $R_{ij}$ also seems legitimate. This
  is usual in the field of fuzzy inference and approximate reasoning where a
  formula like (7.7) is used to generalise the so-called "modus ponens" inference
  rule (Zadeh 1979), (Baldwin 1979), (Dubois and Prade 1988), (Bouchon
  1995). To go further in this direction, one could also discuss the use of min
  to interpret a conjunction whereas the Lukasiewicz implication is used to
  interpret implications. A reasonable alternative to min(x, y) could be the
  Lukasiewicz t-norm: $\max(x + y - 1, 0)$. As a conclusion, the definition of h is
  not straightforward and must be justified in the context of the application.
  Some general guidelines for choosing a suitable h are given in (Bouchon

• Equation (7.7) requires even more commensurability than equation (7.6).
  Now, inequalities of type $\mu_{T_i}(t) > \mu_{W_k}(\tau)$ play a role in the process. Thus,
  we should be able to determine whether any temperature t is a better
  representative of a label $T_i$ than time $\tau$ is representative of label $W_k$. This
  is a very strong assumption, especially if we consider the way these labels
  are represented in the model. Usually, a label thought of as a fuzzy interval is
  assessed on the basis of three elements:

    - the support, i.e. the interval of all numerical values compatible with
      the label; their membership must be strictly positive,
    - the core, i.e. the interval of all numerical values perfectly representative
      of the label (the core is a subset of the support); their membership is
      equal to 1,
    - the membership function making a continuous transition from the
      border of the support to the border of the core.

  For example, the label Long in Figure 7.5 is defined by support [20, 55], core
  [30, 40] and two linear transitions (membership to non-membership) in the
  range [20, 30] ∪ [40, 55]. One could expect that the decision-maker is able to
  specify the support and core of each fuzzy label, as well as the trend of the
  membership function (increasing from the border of the support to the border
  of the core). Even with this information, however, the choice of a precise
  membership function often remains arbitrary. The above information leaves
  room for an infinity of functions. In practice, the shape of the membership
  function in the transition area is often chosen as linear or Gaussian (for
  differentiability) but rarely justified by questioning the decision-maker. Thus,
  in many cases, the only reliable information contained in the membership
  function is the relative adequacy of each temperature, moisture or time to
  each label. For example, $\mu_{Long}(21) = 0.1$ and $\mu_{Long}(25) = 0.5$ only means
  that 25 minutes is a better numerical translation of the qualifier Long than
  21 minutes. This does not necessarily mean that 25 minutes is more Long
  than 30 minutes is Medium, even if $\mu_{Medium}(30) = 0.4$, nor that 25 minutes
  is more Long than 26 °C is Hot, even if $\mu_{Hot}(26) = 0.2$. However, without
  such assumptions, the definition of weights $\omega_{t,m}(\tau)$ in equation (7.7) with
  h = min is difficult to justify.

• Bearing in mind that the weights $\omega_{t,m}$ are used as cardinal weights in (7.4)
  while they are defined from membership values $\mu_{T_i}(t)$, $\mu_{M_j}(m)$ and $\mu_{W_k}(\tau)$,
  the membership values should have a cardinal interpretation. This is one
  more very strong hypothesis. For example, we need to consider that 25
  minutes is 5 times better than 21 minutes to represent Long, because the
  membership value is 5 times larger. Even when the commensurability
  assumption on membership scales is realistic, the weights cannot necessarily
  be interpreted as cardinal values and the weighted aggregation proposed in
  equation (7.8) is questionable.

As an illustration of the latter, consider the following example showing the
impact of an increasing transformation of membership values on the output watering
time.

Example (2). Consider the two following input vectors i1 = (29, 29) and i2 =
(18, 16). These two inputs lead to the activation weights given in Tables 7.13 and
7.14. For the sake of simplicity, we use the non-fuzzy labels given in Table
7.4 for the interpretation of labels $W_k$. Then, assuming we use equations (7.2) and
(7.4) to define the watering time w, we obtain the following result: w(i1) = 19
minutes and 33 seconds and w(i2) = 21 minutes and 40 seconds. Notice that
the times are not so different, despite the important difference between inputs i1
and i2. This can be easily explained by observing that, in the second case, the
temperature is lower, but the soil water content is also lower, and the two aspects
compensate each other. Now, we transform all membership functions of the labels
by the function $\varphi(x) = \sqrt[3]{x}$. This preserves the support and the core of each label,
as well as the slope (increasing or decreasing) of membership functions. In fact,
it represents the same ordinal information about membership degrees. However,
the activation tables are altered as shown in Tables 7.15 and 7.16. This gives the
following watering times: w(i1) = 20 minutes and 34 seconds, w(i2) = 19 minutes
and 42 seconds. Note that we now have w(i1) > w(i2) whereas it was just the
opposite before the transformation of membership values.
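
The reversal described in Example (2) can be replayed with a short script. The
membership degrees are those of the margins of Tables 7.13 and 7.14. The scalar
outputs of the rules are an assumption: Table 7.4 is not restated in this section,
so they are inferred from the worked examples of the chapter (in particular 20
minutes for R7 and no watering for R10). Names are illustrative.

```python
# Membership degrees of the two inputs (margins of Tables 7.13 and 7.14).
I1 = ({"Warm": 0.2, "Hot": 0.8}, {"Medium": 0.1, "High": 0.9})  # (29, 29)
I2 = ({"Cold": 0.2, "Cool": 0.6}, {"Low": 0.4, "Medium": 0.6})  # (18, 16)

# ASSUMPTION: scalar outputs in minutes, inferred from the worked examples.
MINUTES = {
    ("Cold", "Low"): 0, ("Cold", "Medium"): 0, ("Cold", "High"): 0,  # R10
    ("Cool", "Low"): 35, ("Cool", "Medium"): 20, ("Cool", "High"): 5,
    ("Warm", "Low"): 35, ("Warm", "Medium"): 20, ("Warm", "High"): 10,
    ("Hot", "Low"): 60, ("Hot", "Medium"): 35, ("Hot", "High"): 20,
}


def watering_time(state, phi=lambda x: x):
    """Rule weights with h = min, time weights by sup as in (7.2), then (7.4).

    phi is an increasing transformation applied to every membership degree.
    """
    mu_t, mu_m = state
    omega = {}
    for (ti, mj), tau in MINUTES.items():
        weight = min(phi(mu_t.get(ti, 0.0)), phi(mu_m.get(mj, 0.0)))
        omega[tau] = max(omega.get(tau, 0.0), weight)
    return sum(w * tau for tau, w in omega.items()) / sum(omega.values())


def cube_root(x):
    """Increasing transformation of [0, 1] leaving 0 and 1 unchanged."""
    return x ** (1.0 / 3.0)


print(round(watering_time(I1), 2), round(watering_time(I2), 2))
# 19.55 21.67
print(round(watering_time(I1, cube_root), 2), round(watering_time(I2, cube_root), 2))
# 20.56 19.7
```

The ranking of the two inputs is reversed although the transformation leaves the
order of all membership degrees unchanged.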

$\omega_{ij}$   $T_i$                              Cold      Cool     Warm       Hot
$M_j$           $\mu_{M_j}(m)$ \ $\mu_{T_i}(t)$    0         0        0.2        0.8
Low             0                                  0 (R10)   0 (R3)   0 (R2)     0 (R1)
Medium          0.1                                0 (R10)   0 (R6)   0.1 (R5)   0.1 (R4)
High            0.9                                0 (R10)   0 (R9)   0.2 (R8)   0.8 (R7)

Table 7.13: The weights of the rules for input i1

$\omega_{ij}$   $T_i$                              Cold        Cool       Warm     Hot
$M_j$           $\mu_{M_j}(m)$ \ $\mu_{T_i}(t)$    0.2         0.6        0        0
Low             0.4                                0.2 (R10)   0.4 (R3)   0 (R2)   0 (R1)
Medium          0.6                                0.2 (R10)   0.6 (R6)   0 (R5)   0 (R4)
High            0                                  0 (R10)     0 (R9)     0 (R8)   0 (R7)

Table 7.14: The weights of the rules for input i2

$\omega_{ij}$   $T_i$                              Cold      Cool     Warm         Hot
$M_j$           $\mu_{M_j}(m)$ \ $\mu_{T_i}(t)$    0         0        0.585        0.928
Low             0                                  0 (R10)   0 (R3)   0 (R2)       0 (R1)
Medium          0.464                              0 (R10)   0 (R6)   0.464 (R5)   0.464 (R4)
High            0.965                              0 (R10)   0 (R9)   0.585 (R8)   0.928 (R7)

Table 7.15: The modified weights of the rules for input i1

$\omega_{ij}$   $T_i$                              Cold          Cool         Warm     Hot
$M_j$           $\mu_{M_j}(m)$ \ $\mu_{T_i}(t)$    0.585         0.843        0        0
Low             0.737                              0.585 (R10)   0.737 (R3)   0 (R2)   0 (R1)
Medium          0.843                              0.585 (R10)   0.843 (R6)   0 (R5)   0 (R4)
High            0                                  0 (R10)       0 (R9)       0 (R8)   0 (R7)

Table 7.16: The modified weights of the rules for input i2

This example shows that the comparison of output values is not invariant under
monotonic transformations of membership values, which reveals the more than
ordinal interpretation given to membership values in the computation of w. Although this
inversion of durations is not a crucial problem in the case of the watering system,
it could be more problematic in other contexts. For instance, if we use a similar
system (based on fuzzy rules) to rank candidates in a competition, the choice of
a particular shape for membership functions must be well justified because it may really
change the winner.
Another possibility is resorting to other aggregation methods that do not re-
quire the same level of information. Several alternatives to the weighted sum are
compatible with ordinal weights, e.g. Sugeno integrals (see Sugeno 1977, Dubois
and Prade 1987), and could be used advantageously to process ordinal weights.
However, they also have some limitations. They are not as discriminating as the
weighted sum and they cannot completely avoid commensurability problems (see
Dubois, Prade and Sabbadin 1998, Fargier and Perny 1999).

There is no room here to discuss the use of numerical representations in
rule-based automatic decision systems further.
To go further with rule-based systems using fuzzy sets, the reader should
consult the literature about fuzzy inference and fuzzy control, which has received
much attention in the past decades. As a first set of references for theory and
applications, one can consult (Mamdani 1981), (Sugeno 1985), (Bouchon 1995),
(Gacogne 1997) and (Nguyen and Sugeno 1998) for a recent synthesis on the
subject. These works present formal models but also empirical principles derived from
practical applications and thus provide a variety of techniques that have proved
efficient in practice. Moreover, some theoretical justifications of choices of
representations and operators are now available, bringing justifications to some methods
used by engineers in practical applications and also suggesting multiple
improvements (see Dubois, Prade and Ughetto 1999).

7.3 A System with Implicit Decision Rules

7.3.1 Controlling the quality of biscuits during baking
The control of food processes is a typical example where humans traditionally play
an important role to preserve the standard quality of the product. The overall
efficiency of production lines and the quality of the final product highly depend
on the ability of human supervisors to identify a degradation of the quality of
the final product and on their aptitude to best fit the control parameters to the
current situation.
As an example, let us report some elements of an application concerning the
control of the quality of biscuits through oven regulation during baking (for more
details see Trystram, Perrot and Guely 1995, Perrot, Trystram, Le Guennec and
Guely 1996, Perrot 1997, Grabisch et al. 1997).
In the field of biscuit manufacturing, human operators controlling biscuit
baking lines have the possibility of regulating the ovens during the baking process.
This implies periodic evaluation, diagnosis and decision tasks that could perhaps
be automatised. However, such automatisation is not obvious because human
expertise in oven control during the baking of biscuits mainly relies on a subjective
evaluation, e.g. a visual inspection of the general aspect and the colour of the
biscuits, and on the operator's skill in reacting to possible perturbations of the baking
process. For instance, when an overcooked biscuit is detected, the operator
acts on the oven settings accordingly after checking its current temperature. In the case
of an automatic system, the only information accessible to the system consists of
physical objective parameters obtained from measures and sensors, which are not
easily linked to human perception.
In the example of automatic diagnosis during baking, the only available
measures are the following:

• a sensor located in the oven measures the air moisture, within the oven, near
  the biscuit line. The evaluation m is given in cg/g (centigrams per gram
  of dry matter) in the range [0, 10], the desired values being around 4,

• the thickness t of the biscuit is measured every 10 minutes; t is defined as
  the mean of 6 consecutive measures performed on biscuits and expressed in
  mm, and the desired values are about 33 or 34 mm,

• concerning the biscuit aspect, a colour sensor is located in the oven. It
  measures colours with 3 parameters, which are the luminance L, a level a on
  the red-green axis and a level b on the yellow-blue axis. The desired colour is
  not easy to specify.

Moreover, it is not always possible to obtain sufficiently explicit knowledge
from the expert to construct a satisfactory rule database (in section 7.4 we will
see an approach integrating expert rules in the control of baking). Sometimes, the
only information accessible must be directly inferred from observation of the expert
during his control activity. Hence, following the approach adopted in section 7.2
seems problematic, especially concerning the aspect of the biscuit that cannot
be easily linked by the expert to the physical parameters (L, a, b) measured by
an automatic system. The following subsection presents an alternative way of
establishing this link using similarity from known examples.

7.3.2 Automatising human decisions by learning from examples

In performing oven control, the decision-making process consists of two consecutive
stages: a diagnosis stage, which consists in evaluating the state of the last biscuits,
and a decision stage, which must determine a regulation action on the oven, if
necessary. As in many other domains, the diagnosis task performed by the
expert controlling baking can be seen as a pattern recognition task. It is not
unrealistic to assume that the usual dysfunctions have been identified and categorised
by the expert and that for each of them, a standard regulation action is known.
Thus, assuming that a finite list of categories is implicitly used by the expert
(each of them being associated to a pattern, i.e. a characteristic set of irregular
biscuits) the diagnosis stage consists in identifying the relevant pattern for any
irregular biscuit and the decision stage consists in performing the regulation action
appropriate to the pattern.
In this context, the patterns are implicit and subjective. They can be approx-
imated by observing the action of a human controller on the oven in a variety of
cases. However, we can construct an explicit representation of patterns in a more
objective space formed by the observable variables. In this space, subjective
evaluation of biscuits can be partially explained by their objective description.
Assuming a representative sample of biscuits is available, we can use the sensors to
represent each biscuit i of the sample by a vector $x_i = (m_i, t_i, L_i, a_i, b_i)$ in the
multiple attribute space of physical variables used to describe biscuits. Then, each
biscuit can be evaluated by the expert and a diagnosis of dysfunction $d(x_i)$ can
be obtained for each description $x_i$, explaining the bad quality of biscuit i (e.g.
"oven too hot", "oven not hot enough"). Hence, a pattern associated with each
dysfunction z is defined by the set of points $x_i$ such that $d(x_i) = z$. Determining
the right pattern for any new input vector x is a classification problem where the
categories $C_1, \ldots, C_q$ are the q possible dysfunctions and the objects to be assigned
are vectors x = (m, t, L, a, b).

Let X be the set of all possible vectors x = (m, t, L, a, b) describing an object
(e.g. a biscuit). A classification procedure can be seen as a function assigning to
each vector $x \in X$ the vector $(\mu_{C_1}(x), \ldots, \mu_{C_q}(x))$ giving the membership of x
to each category (e.g. possible dysfunction of the oven). One of the most popular
classification methods is the so-called Bayes rule, which is known to minimise the
expected error rate. However, the rule requires knowing the prior and conditional
probability densities of all categories, which are rarely available in practice. When this
information is not available (this is the case in our example) the nearest neighbour
algorithm is very useful. The basic principle of the k-Nearest Neighbour assignment
rule (kNN) introduced in (Fix and Hodges 1951) is to assign an object to
the class to which the majority of its k nearest neighbours belong.
More precisely, for any sample $S \subset X$ of n vectors whose correct assignment is
known, if $N_k(x)$ represents the subset of S formed by the k nearest neighbours of
x within S, the kNN rule is defined for any $k \in \{1, \ldots, n\}$ by:

\[
\mu_{C_j}(x) =
\begin{cases}
1 & \text{if } j = \operatorname{Arg\,max}_i \left\{ \sum_{y \in N_k(x)} \mu_{C_i}(y) \right\} \\
0 & \text{otherwise}
\end{cases}
\tag{7.10}
\]

where $\operatorname{Arg\,max}_i\, g(i)$ represents the value i for which g(i) is maximal. This
supposes that the maximum is reached for a unique i. When this is not the
case, one can use a second criterion for discriminating between all g-maximal
indices or, alternatively, choose all of them. In equation (7.10), function g(i)
equals $\sum_{y \in N_k(x)} \mu_{C_i}(y)$ and represents the total number of vectors, among the
k nearest neighbours of x, that have been assigned to category i.
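The kNN rule (7.10) can be illustrated by a minimal Python sketch. The function name `knn_assign`, the use of the Euclidean distance and the sample values below are assumptions made for illustration only; ties are handled by returning all maximal categories, which is one of the two options mentioned above.

```python
import math
from collections import Counter


def knn_assign(x, sample, k):
    """Return the categories most represented among the k nearest neighbours of x.

    sample: list of (vector, category) pairs whose correct assignment is known.
    The result is a list so that ties (several g-maximal categories) stay visible.
    """
    # N_k(x): the k points of the sample closest to x (Euclidean distance, assumed).
    neighbours = sorted(sample, key=lambda pair: math.dist(x, pair[0]))[:k]
    # g(i): number of neighbours assigned to category i.
    votes = Counter(category for _, category in neighbours)
    best = max(votes.values())
    return sorted(c for c, count in votes.items() if count == best)


# Hypothetical (m, t, L, a, b) descriptions labelled by an expert.
sample = [
    ((3.1, 31.0, 40.0, 12.0, 20.0), "oven too hot"),
    ((3.3, 32.0, 42.0, 11.0, 21.0), "oven too hot"),
    ((5.2, 36.0, 70.0, 4.0, 30.0), "oven not hot enough"),
    ((5.5, 37.0, 72.0, 3.0, 31.0), "oven not hot enough"),
    ((5.0, 35.0, 68.0, 5.0, 29.0), "oven not hot enough"),
]
print(knn_assign((3.2, 31.5, 43.0, 11.0, 20.0), sample, k=3))
```

Note that the raw Euclidean distance mixes attributes expressed in different units; the sketch makes no attempt to rescale them.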
It has been proved that the error rate of the kNN rule tends towards the
optimal Bayes error rate when both k and n tend to infinity while k/n tends to 0
(see Cover and Hart 1967). The main drawback of the kNN procedure is that
all elements of $N_k(x)$ are equally weighted. Indeed, in most cases, the neighbours
are not equally distant from x and one may prefer to give less importance to
neighbours very distant from x. For this reason, several weighted extensions of
the kNN algorithm have been proposed (see Keller, Gray and Givens 1985, Bezdek,
Chuah and Leep 1986, Bereau and Dubuisson 1991). For example, the fuzzy kNN
rule proposed by Keller et al. (1985) is defined by:

\[
\mu_{C_j}(x) =
\frac{\displaystyle \sum_{y \in N_k(x)} \frac{\mu_{C_j}(y)}{\|x - y\|^{2/(m-1)}}}
     {\displaystyle \sum_{y \in N_k(x)} \frac{1}{\|x - y\|^{2/(m-1)}}}
\tag{7.11}
\]

where $m \in (1, +\infty)$ is a technical parameter. Note that membership induction of a
new input x is also a matter of aggregation. Indeed, the membership value $\mu_{C_j}(x)$
is defined as the weighted average of quantities $\mu_{C_j}(y)$, $y \in N_k(x)$, weighted by
coefficients inversely proportional to a power of the Euclidean distance between x
and y. This formula seems natural but several points are questionable. Firstly,
the choice of the weighted sum as an aggregator of membership values $\mu_{C_j}(y)$
for all y in the neighbourhood $N_k$ is not straightforward. It includes several
implicit assumptions that are not necessarily valid (see chapter 6) and alternative
compromise aggregators could possibly be used advantageously. The choice of a
compromise operator itself can be criticised and one can readily imagine cases
where a disjunctive or a conjunctive operator should be preferred. Moreover, even
when the weighted arithmetic mean seems convenient, the use of weights linked to
distances of type $\|x - y\|$ and to parameter m is not obvious. Indeed, the norm
of $x - y$ is not necessarily a good measure of the relative dissimilarity between the
two biscuits represented by x and y. This is the case, for instance, when units
are different and non-commensurate on the various axes. In order to distinguish
between significant and non-significant differences on each dimension, one may
include discrimination thresholds (see chapter 6) in the comparison, making it possible to
separate differences that are significant for the expert from those that are
negligible. This is particularly suitable in the field of subjective evaluation, in which
preferences and perceptions of the expert (or decision-maker) are not usually
linearly related to the observable parameters. For instance, one could define a fuzzy
similarity relation $\sigma(x, y)$ as a function of quantities of type $|x_i - y_i|$ for any
attribute i, representing the relative closeness of x and y for the expert. Then, we
can use a general aggregation rule of type:

\[
\mu_{C_j}(x) = \psi\bigl(\mu_{C_j}(y_1), \ldots, \mu_{C_j}(y_k);\ \sigma(x, y_1), \ldots, \sigma(x, y_k)\bigr)
\tag{7.12}
\]

where $N_k(x) = \{y_1, \ldots, y_k\}$ and $\psi$ is an aggregation function.
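The fuzzy kNN rule (7.11) is one instance of this general scheme: the aggregation function is a weighted mean and the closeness of x and y is an inverse power of their Euclidean distance. A minimal Python sketch follows; the function name `fuzzy_knn`, the dictionary representation of memberships and the treatment of zero distances are assumptions made for illustration, not part of the cited method.

```python
import math


def fuzzy_knn(x, sample, k, m=2.0):
    """Fuzzy kNN memberships of x in the spirit of equation (7.11).

    sample: list of (vector, memberships) pairs, memberships being a dict
    {category: degree in [0, 1]}. m > 1 is the technical parameter.
    """
    neighbours = sorted(sample, key=lambda pair: math.dist(x, pair[0]))[:k]
    distances = [math.dist(x, y) for y, _ in neighbours]
    if 0.0 in distances:
        # Assumption: when x coincides with reference points, only those count.
        weights = [1.0 if d == 0.0 else 0.0 for d in distances]
    else:
        # Weights inversely proportional to the distance raised to 2 / (m - 1).
        weights = [1.0 / d ** (2.0 / (m - 1.0)) for d in distances]
    total = sum(weights)
    categories = {c for _, mu in neighbours for c in mu}
    return {
        c: sum(w * mu.get(c, 0.0) for w, (_, mu) in zip(weights, neighbours)) / total
        for c in categories
    }
```

With m = 2 the weights are inverse squared distances; larger values of m flatten the weights, so that distant neighbours count almost as much as close ones.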

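This is the proposition made in (Henriet 1995), (Henriet and Perny 1996) and
(Perny and Zucker 1999), where the membership $\mu_{C_j}(x)$ is defined by:

\[
\mu_{C_j}(x) = 1 - \prod_{i=1}^{k} \bigl(1 - \sigma(x, y_i)\,\mu_{C_j}(y_i)\bigr)
\tag{7.13}
\]

and $\sigma(x, y)$ is the weighted average of one-dimensional similarity indices
$\sigma_i(x, y)$ (one per attribute i) defined as follows:

\[
\sigma_i(x, y) =
\begin{cases}
1 & \text{if } |x_i - y_i| \le q_i \\[4pt]
1 - \dfrac{|x_i - y_i| - q_i}{p_i - q_i} & \text{if } q_i < |x_i - y_i| < p_i \\[4pt]
0 & \text{if } |x_i - y_i| \ge p_i
\end{cases}
\tag{7.14}
\]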

In the above formula, $q_i$ and $p_i$ are thresholds (possibly varying with the level
$x_i$ or $y_i$) used to define a continuous transition from full similarity to dissimilarity
as shown in the example given in Figure 7.7. It should be noted however that the
definition of similarity indices $\sigma_i(x, y)$ is very demanding. It requires assessing
two thresholds for each attribute level $x_i$. Moreover the linear transition from similarity
to non-similarity is not easy to justify and a full justification of the shape of the
similarity function $\sigma_i$ would require a lot of information about differences of type
$x_i - y_i$. Usually, the construction of such similarity functions is only based on
empirical evidence and common sense principles.
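The threshold-based construction of (7.13) and (7.14) can be sketched in Python as follows. The sketch reads (7.13) as one minus a product over the k neighbours and takes the similarity index to fall linearly from 1 at $q_i$ to 0 at $p_i$; both readings, as well as the function names and the attribute weights, are assumptions made for illustration.

```python
def similarity_1d(xi, yi, q, p):
    """One-dimensional similarity index with thresholds q < p, as in (7.14)."""
    gap = abs(xi - yi)
    if gap <= q:
        return 1.0  # difference judged negligible
    if gap >= p:
        return 0.0  # difference judged fully significant
    return 1.0 - (gap - q) / (p - q)  # linear transition in between


def similarity(x, y, q, p, weights):
    """Weighted average of the one-dimensional indices (weights are hypothetical)."""
    indices = [similarity_1d(xi, yi, qi, pi) for xi, yi, qi, pi in zip(x, y, q, p)]
    return sum(w * s for w, s in zip(weights, indices)) / sum(weights)


def membership(x, neighbours, q, p, weights):
    """Membership of x in a category, reading (7.13) as 1 - prod(1 - sigma * mu).

    neighbours: list of (y, mu) pairs, mu being the membership of y in the category.
    """
    product = 1.0
    for y, mu in neighbours:
        product *= 1.0 - similarity(x, y, q, p, weights) * mu
    return 1.0 - product
```

Under this reading the aggregation is disjunctive: a single neighbour that is fully similar to x and fully belongs to the category is enough to give x a membership of 1.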

Figure 7.7: One-dimensional similarity indices $\sigma_i(x, y)$ (plotted against $y_i$, with marks at $x_i - p_i$, $x_i - q_i$, $x_i + q_i$ and $x_i + p_i$)

Coming back to the example, the kNN algorithm can be used for periodically
computing two coefficients $\mu_{\text{too hot}}(x)$ and $\mu_{\text{not hot enough}}(x)$. These coefficients
evaluate the necessity for a regulation action by analysing the measure x
of the last biscuit. For instance, $\mu_{\text{too hot}}(x) = 1$ and $\mu_{\text{not hot enough}}(x) = 0$
means that decreasing the oven temperature is necessary. The decision process
is improved if we use the fuzzy version of the kNN algorithm in the diagnosis
stage. In this case, the values $\mu_{\text{too hot}}(x)$ and $\mu_{\text{not hot enough}}(x)$ can
take any value within the unit interval, and these values can be interpreted as
indicators of the amplitude of the regulation and help the system in choosing a
soft regulation action. The main drawback of this automatic decision process is
the absence of explicit decision rules explaining the regulation actions. This is
not a real drawback in this context because the quality of biscuits is a sufficient
argument for validation. However, in many other decision problems involving an
automatic system, e.g. the automatic pre-filtering of loan files in a bank, the need
for explanations is more crucial, first to validate a priori the system, and secondly
to explain decisions a posteriori to the clients. The use of rules in the context of
baking control is discussed in the next section.

7.4 A hybrid approach for automatic decision-making

In the case reported in (Perrot 1997) about the control of biscuits during baking,
the diagnosis stage was not solely based on the kNN algorithm. Indeed, in
this application, it was possible to elicit decision rules for the diagnosis stage.
Actually, the quality of the biscuit is evaluated by the expert on the basis of
3 attributes, subjectively evaluated, which are the moisture (m), the thickness
(t) and the aspect of the biscuit (colour). The qualifiers used for labelling these
attributes are:
    moisture: dry, normal, humid;
    thickness: too thin, good, too thick;
    aspect: burned, overdone, done, underdone, not done.

Then, the human expertise in the diagnosis stage is expressed using these labels
by rules of type:

    If moisture is normal or dry and colour is overdone
    then the oven is too hot

    If moisture is humid or normal and colour is underdone
    then the oven is not hot enough
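To see how such rules can produce graded diagnoses, suppose each label already has a membership degree in [0, 1] for the biscuit being inspected. The Python sketch below fires the two rules above using max for "or" and min for "and"; this choice of operators is only one common option and is an assumption here, as are the function name `diagnose` and the sample degrees.

```python
def diagnose(moisture, colour):
    """Fire the two diagnosis rules with 'or' = max and 'and' = min (an assumption).

    moisture, colour: dicts mapping each label to a membership degree in [0, 1].
    """
    too_hot = min(max(moisture["normal"], moisture["dry"]), colour["overdone"])
    not_hot_enough = min(max(moisture["humid"], moisture["normal"]), colour["underdone"])
    return {"too hot": too_hot, "not hot enough": not_hot_enough}


# Hypothetical membership degrees for one biscuit.
moisture = {"dry": 0.2, "normal": 0.8, "humid": 0.0}
colour = {"burned": 0.0, "overdone": 0.6, "done": 0.4, "underdone": 0.1, "not done": 0.0}
print(diagnose(moisture, colour))
```

Other conjunctive and disjunctive operators would give different degrees for the same inputs, which is precisely why the choice of operators deserves scrutiny.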

It has therefore been decided to construct membership functions linking parameters
(m, t, L, a, b) to the labels used in the rules, in order to be able to implement
a hybrid approach based on the kNN algorithm, to get a fuzzy symbolic description
of the biscuit, and on the fuzzy rule-based approach presented in section 7.2, to
infer a regulation action. The numeric-symbolic translation is natural for moisture
and thickness. The labels used for these two parameters are represented by the
following fuzzy sets (see Figures 7.8 and 7.9).

Figure 7.8: Fuzzy labels used to describe biscuit moisture (labels dry, normal and humid on the m axis, with marks at 3, 3.8, 4.7 and 5.8)

Figure 7.9: Fuzzy labels used to describe biscuit thickness (labels too thin, good and too thick on the t axis, with marks at 28, 32, 35 and 38 mm)
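As an illustration of such a numeric-symbolic translation, the Python sketch below builds piecewise linear membership functions for the moisture labels from the four marks on the moisture axis (3, 3.8, 4.7 and 5.8). The exact shapes are not recoverable from the text, so the linear transitions between consecutive marks are an assumption, as are the function names.

```python
def ramp_up(v, lo, hi):
    """0 below lo, 1 above hi, linear in between."""
    if v <= lo:
        return 0.0
    if v >= hi:
        return 1.0
    return (v - lo) / (hi - lo)


def moisture_labels(m):
    """Membership degrees of a moisture value m (in cg/g) in the three labels.

    Assumed shapes: 'dry' fades out between 3 and 3.8 while 'normal' fades in,
    'normal' fades out between 4.7 and 5.8 while 'humid' fades in.
    """
    rising = ramp_up(m, 3.0, 3.8)
    falling = 1.0 - ramp_up(m, 4.7, 5.8)
    return {"dry": 1.0 - rising, "normal": min(rising, falling), "humid": 1.0 - falling}


print(moisture_labels(4.0))  # a value in the middle of the 'normal' plateau
```

The same construction would apply to thickness with the marks 28, 32, 35 and 38 mm.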

The translation is more difficult for labels used for the biscuit aspect because
the aspect is represented by a fuzzy subset of the 3-dimensional space characterised
by the components (L, a, b). This problem has been solved by the fuzzy kNN
algorithm. It is indeed sufficient to ask an expert in baking control to qualify,
with a label $y_i$, each element i of a representative sample of biscuits, using only
the 5 labels introduced to describe aspect. At the same time, the sensors assess
the vector $x_i = (L_i, a_i, b_i)$ describing the biscuit i in the physical space. Then the
fuzzy kNN algorithm is applied with reference points $(x_i, y_i)$ for all biscuits i
in the sample. For any input x = (L, a, b) it gives the membership values $\mu_{y_j}(x)$
for any label $y_j$, $j \in \{1, \ldots, 5\}$, used to describe the biscuit's aspect. The fuzzy
nearest neighbour algorithm provides a representation of labels $y_j$, j = 1, ..., 5, by
fuzzy subsets of the (L, a, b) space. This makes it possible to resort to the fuzzy
control approach presented in section 7.2.

In the biscuit example, the integration of the kNN algorithm into a fuzzy
rule-based system provides a soft automatic decision system whose action can be
explained by the expert's rules. This control system can be integrated within a
continuous regulation loop, alternating action and retroaction steps, as illustrated
in Figure 7.10.

Figure 7.10: The action-retroaction loop controlling baking (the measures x taken on the baking oven feed the diagnosis module, whose outputs $\mu_{\text{too hot}}(x)$ and $\mu_{\text{not hot enough}}(x)$ feed the decision module acting on the oven)

7.5 Conclusion
We have presented simple examples illustrating some basic techniques used to
simulate human diagnosis, reasoning and decision-making, in the context of
repeated decision problems that are suitable for automatisation. We have shown the
importance of constructing suitable mathematical representations of knowledge and
decision rules. The task is difficult because human diagnosis is mainly based on
human perception whereas sensors naturally give numerical measures, and be-
cause human reasoning is mainly based on words and propositions drawn from the
natural language, whereas computers are basically suited to perform numerical
computations. As shown in this chapter, some simple and intuitive formal models
have been proposed, making it possible to establish a formal correspondence between
symbolic and numeric information. They are based on the definition of fuzzy sets
linking labels to observable numerical measures through membership functions.
However, a proper use of these fuzzy sets requires a very careful analysis. Indeed,
we have shown that many apparently natural choices in the modelling process
possibly hide strong assumptions that can turn out to be false in practice. For
instance, small numerical examples given in the chapter show that, in the context
of rule based control systems, the output of the system highly depends on the
choice of numbers used to represent symbolic knowledge. In particular, one must
be aware that multiplying arbitrary choices in the construction of membership
functions can make the output of the system completely meaningless.

Moreover, we have shown that, at any level of computation, there is a need
to weight propositions and aggregate numerical information. This shows the
great importance of mastering the variety of aggregation operations, their proper-
ties and the constraints to be satisfied in order to preserve the meaningfulness of
conclusions. It must be clear that by not thoroughly respecting these constraints,
the outputs of any automatic decision system are more the consequences of arbi-
trary choices in the modelling process than those of a sound deduction justified
by the observations and the decision rules. Designing an automatic decision pro-
cess in which the arbitrary choice of numbers used to represent knowledge is more
decisive than the knowledge itself is certainly the main pitfall of the modelling
process. Since one cannot reasonably expect to avoid all arbitrary choices in the
modelling process, both theoretical and empirical validations of the decision system are
necessary. The theoretical validation consists in investigating the mathematical
properties of the transfer function that forms the core of the decision module. This
is the opportunity to control the continuity and the derivatives of the function, but
also to check whether the computation of the outputs is meaningful with respect
to the nature of the information given to the system as input. The empirical or
practical validation consists in testing the decisional behaviour of the system in
various typical states of the system. It takes the form of trial-and-error sequences
enabling a progressive tuning of the fuzzy rule-based model to better approximate
the expected decisional behaviour. This can be used to determine suitable
membership functions characterising the rules. This can even be used to learn
the rules themselves. Indeed, when a sufficiently rich basis of examples is available,
the rules and the membership values can be learned automatically (see e.g.
Bouchon-Meunier and Marsala 1999, or Nauck and Kruse 1999 for neuro-fuzzy
methods in fuzzy rule generation). The neuro-fuzzy approach is very interesting
for designing an automatic decision system, because it takes advantage of the
efficiency of neural networks while preserving the easy-to-interpret character of a
rule-based system. Notice however that, due to the need for learning examples to
show the system what the right decisions in a great number of situations are, the
learning-oriented approach is only possible when the decision task is completely
understood and mastered by a human. This is usually the case when the automa-
tisation of a decision task is expected, but one should be aware that this approach
is not easily transposable to more complex decision situations where preferences
as well as decision rules are still to be constructed.

8.1 Introduction

In this chapter, we describe an application that was the theme of a research col-
laboration between an academic institution and a large company in charge of the
production and distribution of electricity. We do not give an exhaustive descrip-
tion of the work that was done and of the decision-aiding tool that was developed.
A detailed presentation of the first discussions, of the progressive formulation of
the problem, of the assumptions chosen, of the hesitations and backtrackings,
of the difficulties encountered, of the methodology adopted and of the resulting
software would require nearly a whole book. Our purpose is to point out some
characteristics of the problem, especially on the modelling of uncertainties. The
description was thus deliberately simplified and some aspects, of minor interest in
the framework of this book, were neglected. The main purpose of this presenta-
tion is to show how difficult it is to build (or to improvise) a pragmatic decision
model that is consistent and sound. It illustrates the interest and the importance
of having well-studied formal models at our disposal when we are confronted with
a decision problem. Sections 8.2 and 8.3 present the context of the application
and the model that was established. Section 8.4 is based on a didactic example:
it first illustrates and comments on some traditional approaches that could have been
used in the application; then it gives a detailed description of the approach that
was applied in the concrete case. Section 8.5 provides some general comments on
the advantages and drawbacks of this approach.

8.2 The context

The company must periodically make some choices for the construction or closure
of coal, gas and nuclear power stations, in order to ensure the production of
electricity and satisfy demand. Due to the diversity of points of view to be taken into
account, the managers of the production department wanted to develop a multiple
criteria approach for evaluating and comparing potential actions. They considered
that aggregating financial, technical and environmental points of view into a type
of generalised cost (see Chapter 5) was neither possible nor very serious. A
collaboration was established between the company and an academic department (we
will call it "the analyst") that rapidly discovered that, besides the multiple criteria
aspect, an enormous set of potential actions, a significant temporal dimension and
a very high level of uncertainty on the data needed to be managed. The next
section points out these aspects through the description of the model as it was
formulated in collaboration with the company's engineers.

8.3 The model

8.3.1 The set of actions
In this chapter, we call decision a choice made at a specific point in time: it
consists in choosing the number of production units of the different types of fuel
(Nuclear, Coal, Gas) to be planned and in specifying whether the downgrade plan
(previously defined by another department of the company) has to be followed,
or partially anticipated (A) or delayed (D). In terms of electricity production and
delay, each unit and modification of the downgrade plan has different specificities
(see Table 8.1).

Type   Power (MW)   Delay (years)
N         900            9
C         400            6
G         350            3
A        −300            0
D        +300            0

Table 8.1: Power and construction delay for the different types of production unit

For simplicity, the decisions are only taken at chosen milestones, separated by
a time period of about 3 years (this period between two decisions is called a block).
At most one unit of each type per year may be ordered, and the choice concerning
the downgrade plan (follow, anticipate or delay) is of course exclusive. A decision
for a block of 3 years could thus be for example

{1N, 1C, 2G, A},

meaning that one nuclear, one coal and two gas production units are planned and
that the downgrade plan has to be anticipated.
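The effect of such a decision on the production capacity can be read off Table 8.1. The short Python sketch below does this for the example decision; the dictionary encoding of a decision, the function names and the negative sign attached to an anticipation of the downgrade plan are assumptions made for illustration.

```python
# Power (MW) and delay (years) per type, after Table 8.1.
# Assumption: anticipating the downgrade plan (A) removes 300 MW, delaying it (D) adds 300 MW.
POWER_MW = {"N": 900, "C": 400, "G": 350, "A": -300, "D": 300}
DELAY_YEARS = {"N": 9, "C": 6, "G": 3, "A": 0, "D": 0}


def power_changes(decision):
    """List (delay in years, power change in MW) for a decision {type: count}."""
    return sorted((DELAY_YEARS[t], n * POWER_MW[t]) for t, n in decision.items())


def total_power(decision):
    """Net power change (MW) once every unit of the decision is in service."""
    return sum(change for _, change in power_changes(decision))


decision = {"N": 1, "C": 1, "G": 2, "A": 1}  # the decision {1N, 1C, 2G, A}
print(power_changes(decision))
print(total_power(decision))
```

The delays matter as much as the totals: the nuclear unit ordered in this block only contributes nine years later, that is, three blocks after the decision.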

Each decision is irrevocable and naturally has consequences for the future, not
only on the production of electricity, as seen in Table 8.1, but also in terms of
investment, exploitation cost, safety, environmental effects, ... (see Section 8.3.2).
An action is a succession of decisions over the whole time period concerned by
the simulation (the horizon), i.e. a period of about 20-25 years or 7 blocks. An
action is thus for example
({1N, 1C, 2G, A}, {1C}, {2G}, {}, {3G}, {1G, 1C}, {1N, 2G}).

The number of possible actions is of course enormous. Even after adding some
simple rules (only one, or zero, nuclear unit is allowed, exclusively in the first and
last blocks; anticipation and delay are only allowed in the first and second blocks;
an anticipation followed by a delay, or the inverse, is forbidden), the number of
actions is still around $10^8$. Many of these actions are completely unrealistic,
for example "no new unit for 20 years" or "3G and 3C in every block": they can
be eliminated by fixing reasonable limits on the power production of the park.
In this problem, the decision-maker only kept the actions such that, for each block,
the surplus is less than 1 000 MW and the deficit is less than 200 MW. These
limitations led to a set of approximately 100 000 potential actions. The temporal
dimension of the problem naturally leads to a tree structure for these actions, built
on decision nodes (represented by squares in Figure 8.1). Depending on the block
considered, there are typically between 3 and 30 branches leaving each decision
node.
8.3.2 The set of criteria

The list of criteria was defined by the industrial partner in order to avoid unbear-
able difficulties in data collection and to work on a sufficiently realistic situation.
Remember that the purpose of the study was to build a decision-aiding method-
ology and was not to make a decision. It was important to test the methodology
with a realistic set of criteria but it was also clear that the methodology should be
independent of the criteria chosen. In the application described here, the following
eight criteria were taken into account, for the time period of the simulation:

• fuel cost, in Belgian Francs (BEF), to minimise;

• exploitation cost, in BEF, to minimise;

• investment cost, in BEF, to minimise;

• marginal cost, i.e. the variation of the total cost for a variation of 1 GWh, in BEF,
  to minimise;

• deficient power, in TWh, to minimise;

• CO2 emissions, in tons, to minimise;

• SO2 and NOx emissions, in tons, to minimise;

• purchase and sales balance, in BEF, to maximise.

A : {}, {2G}, {3G}, {2G}, {3G}, {}, {}
B : {1N, 2G, 2C}, {2C, 1G}, {3C}, {2C}, {1N}, {}, {}

Criterion                    A          B      Unit
Fuel cost                 33 500     31 000    MBEF
Exploitation cost         45 000     49 000    MBEF
Investment cost          360 000    770 000    MBEF
Marginal cost                730        620    KBEF/GWh
Deficient power             16.7       10.3    TWh
CO2 emissions             22 000     16 000    Ktons
SO2 + NOx emissions           70         48    Ktons
Sales balance             23 000     30 000    MBEF

Table 8.2: The evaluations of two particular actions

The evaluations of the actions on these criteria are of course not known with
certainty, because they depend on many factors that are not or not well known by
the decision-maker. The uncertainties have an impact on the evaluations, which
can be direct (the prices of the raw materials influence their total costs) or indirect
(if the gas price increases more than the coal price, the coal power stations will be
more intensively exploited than the gas ones; this will have an impact on the fuel
costs and the environmental impacts of the production park). Table 8.2 presents
an example of evaluations for two particular actions in a scenario where the fuel
price is low and the demand for electricity is relatively weak. Other scenarios must
be envisaged in order to improve the realism and usefulness of the model.

8.3.3 Uncertainties and scenarios

Generally speaking, the determination of the value of a parameter at a given
moment can lead to the following situations:

• the value is not known: the value is relative to the past and was not measured,
  the value is relative to the present but is technically impossible or very
  expensive to obtain, or the value is relative to the future for a parameter with
  a completely erratic evolution;

• the value can be approximated by an interval: the bounds result from the
  properties of the system considered, or the interval is due to the imprecision of
  the measure or to the use of a forecasting method; sometimes, a probability,
  a possibility or a confidence index can be associated with each value of the
  interval;

• the value is not unique: several measures did not yield the same value, or several
  scenarios are possible; again a probability, a possibility, a confidence index
  or the result of a voting process can be associated with each value;

• the value is unique but not reliable, with some information on the degree
  of reliability.

In the particular situation described here, the industrial partner was already
using stochastic programming for the management of the production park. The partner
wanted to have another methodology in order to take better account of the number
of potential actions and the multiple criteria aspects. For the uncertainties,
however, the partner was used to working with probabilities and the framework of the
study did not allow anything else to be suggested. So, scenarios were defined and subjective
probabilities were assigned to them by the company's experts. More precisely,
two types of uncertainties were distinguished and respectively called aleas and
major uncertainties: the difference between them is based on the more or less
strong dependence between the past and the future. The industrial partner con-
sidered that nuclear availability in the future was completely independent of the
knowledge of the past and called this type of uncertainty alea: this means that
the level of nuclear availability was completely open for each period of three years
(a breakdown at a given time does not imply that there will be no breakdown in
the near future). The selling price of electricity was also considered as an alea
in order to be able to capture the deregulation phenomena due to a forthcoming
new legislation.
The major uncertainties (for which some dependence can exist between the
values at different moments) were the fuel price (the market presents global ten-
dencies and a high price for the first two blocks reinforces the probability of having
a high price for the third one), the demand for electricity (same reasoning) and
the legislation concerning pollution (in this example, the law may change for the
third block, and the uncertain parameters after this block are thus strongly re-
lated: either the same as for the first blocks, or more severe, but in both cases,
constant over all blocks after block 2).
The major uncertainties allow for a learning process that must be taken into
account in the analysis: each decision, at a given time, may use the previous values
of the uncertain parameters and deduce information from them about the future.
This information may modify the choices of the decision-maker. Suppose for in-
stance that a variable x may be equal to 0 or 1 in the future. The corresponding
probabilities are assessed as follows:

P (x = 0) > 0.5, after past scenario A,
P (x = 0) < 0.5, after past scenario B,

where the past scenario is known at the time of decision. The decision-maker
has to choose between two decisions: a and b. If he prefers a when x = 0 and b
when x = 1, a reasonable decision will be to choose a after scenario A and b after
scenario B.
The previous explanation is not valid for aleas, because their independence
does not allow for direct inference from the past.
Because of the statistical dependence and of the possible learning process in
the major uncertainty case, a complete treatment and a tree-structure for these
scenarios (a scenario is a succession of observed uncertainties) are necessary. If
there are 3 levels for the fuel price, 3 levels for the demand, 2 levels for the
legislation, and if the horizon is divided into 7 blocks, there are, a priori,
$(3 \times 3 \times 2)^7 \simeq 6 \times 10^8$ possible scenarios. Fortunately, most of these scenarios are negligible
because the probability of a very fluctuating scenario is very small: the major
uncertainty scenarios are rather strongly correlated, and a sequence of levels for
the fuel price such as HHLMHLH (H for high, M for medium and L for low) is
much less probable than a sequence HHHMMMM. In practice, two sequences were
retained for legislation (MMMMMMM and MMHHHHH ), it was imposed that
scenarios could only change after two blocks, and each modification was penalised
so that very fluctuating scenarios were hardly possible. The analyst finally retained
around 200 representative scenarios that were gathered in a tree-structure of major
uncertainty nodes (represented by circles in Figure 8.1).
Of course, the complete scenario for a decision node at time t is not known, but
a probability is associated with each complete scenario, which makes it possible to compute
the conditional probability of each complete scenario given the partial scenario
already observed at time t.
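This conditioning step amounts to keeping the complete scenarios compatible with what has been observed and renormalising their probabilities. A minimal Python sketch is given below; encoding a scenario as a string of levels (one letter per block), the function name and the probabilities used are assumptions made for illustration.

```python
def conditional_probabilities(scenarios, observed):
    """Probabilities of the complete scenarios given the observed partial scenario.

    scenarios: dict {complete scenario string: prior probability}.
    observed: string of levels observed so far (a prefix of the complete scenario).
    """
    compatible = {s: p for s, p in scenarios.items() if s.startswith(observed)}
    total = sum(compatible.values())
    if total == 0:
        raise ValueError("no retained scenario is compatible with the observations")
    return {s: p / total for s, p in compatible.items()}


# Hypothetical fuel price scenarios over 3 blocks (H = high, M = medium, L = low).
scenarios = {"HHH": 0.4, "HHM": 0.1, "MMM": 0.3, "MML": 0.2}
print(conditional_probabilities(scenarios, "HH"))
```

Before any observation (an empty prefix), the function simply returns the prior probabilities.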
In contrast, the aleas are by nature uncorrelated and there is no reason
to neglect any scenario. If there are 3 levels for the selling price and 2 levels for the
availability of nuclear units, then the number of scenarios is $(3 \times 2)^7 = 279\,936$.
Fortunately, the tree structure of the aleas is obvious: each node gives rise to
the same possibilities, with the same probability distribution. For these reasons,
the aleas act much more simply than the major uncertainties, and it is possible to
take the whole set of scenarios into account.
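The two scenario counts quoted in this section follow from elementary arithmetic, as the short Python check below shows (the variable names are chosen for illustration only).

```python
BLOCKS = 7

# Major uncertainties: 3 fuel price levels, 3 demand levels, 2 legislation levels per block.
major_scenarios = (3 * 3 * 2) ** BLOCKS

# Aleas: 3 selling price levels and 2 nuclear availability levels per block.
alea_scenarios = (3 * 2) ** BLOCKS

print(major_scenarios)  # 612220032, i.e. about 6 x 10^8
print(alea_scenarios)   # 279936
```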

8.3.4 The temporal dimension

Independently of the dependence between the past and the future in the modelling
of the uncertainties, the temporal dimension plays an important role in this kind
of problem.
First, the time period between the decision to build a certain type of power
station and the beginning of the exploitation of that station is far from being
negligible. Second, some consequences of the decisions appear after a very long
time (the environmental consequences, for example). Third, the consequences
themselves can be dispersed over rather long periods and vary within these
periods. Fourth, the consequences of a decision can be different according to the
moment that decision is taken. It is rather usual, in planning models, to introduce
a discounting rate that decreases the weight of the evaluations for distant
consequences (see Chapter 5) and the industrial partner did this here. However, for a
long term decision problem with important consequences for future generations,
such an approach may not be the best one and the decision-maker could be more
confident in the flexible approach and the richness of the scenarios. That is why
the analyst kept the possibility to introduce discounting or not.

Figure 8.1: The decision tree (stages from left to right: first decisions, first period, second decisions, ..., last decisions, last period, consequences)

8.3.5 Summary of the model

The complete model can be described by a tree structure including decision nodes
(squares) and uncertainty nodes (circles), as illustrated in Figure 8.1. At t = 0
(square node at the beginning of block 1), a first decision is made (a branch is
chosen) without any information on the scenario, leading to a circle node. During
block 1, one may observe the actual values of the uncertain parameters (nuclear
availability, electricity selling price, fuel price, electricity demand and environmental
legislation), determining one branch leaving the considered circle node and
leading to one of the decision nodes at time t = 1. A new decision is then made,
taking the previous information into account, and so on until the last decision
(square) node and the last scenario (circle) node that determine the whole action
and the whole observed scenario. In the resulting tree (Figure 8.1), the decision
nodes (squares) correspond to active parts of the analysis where the decision-maker
has to establish his strategy, while the uncertainty nodes (circles) correspond to
passive parts of the analysis where the decision-maker undergoes the modifications
of the parameters.

8.4 A didactic example

Consider Figure 8.2 describing two successive time periods. At time t = 0, two
decisions A and B are eligible; during the first period, two events S and T are
possible, each with probability 1/2. At the beginning of the second period, two
decisions C and D are eligible if the first decision was A and three decisions E,
F, G are eligible if the first decision was B. During the second period, two events
U and V are possible after S (with respective probabilities 1/4 and 3/4) and two
events Y and Z are possible after T (with respective probabilities 3/4 and 1/4).
Figure 8.2 presents the tree and the evaluation of each action (set of decisions)
for each complete scenario. Note that this didactic example contains only one
evaluation for each action (problem with one criterion). We do not dwell on the
multiple criteria aspect of the problem here (this was treated in Chapter 6) and
focus on the treatment of uncertainty.

8.4.1 The expected value approach

In the traditional approach, the nodes of the tree are considered from the leaves
to the root (folding back) and the decisions are taken at each node in order
to maximise their expected values, i.e. the mean of the corresponding probability
distributions for the evaluations. Of course, this is only possible when the evaluations
are elements of a numerical scale. At node N2 (beginning of the second
period), the expected value of decision C is $(1/4 \times 7 + 3/4 \times 4.5) = 41/8$ while
the expected value of decision D is $(1/4 \times 4.5 + 3/4 \times 5.5) = 42/8$. So, the best
decision at node N2 is D and the expected value associated to N2 is 42/8. Making
similar calculations for N3, N4 and N5, one obtains the tree represented in Figure 8.3.
At node N1, the expected values of decisions A and B are respectively 39/8
and 5, so the best decision is B.
In conclusion, the optimal action obtained by the traditional approach will
consist in applying decision B at the beginning of the first period and decision
E or G at the beginning of the second period, depending on whether the event
that occurred in the first period was S or T.
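The folding back procedure can be sketched in a few lines of Python. The data are the evaluations and probabilities of the didactic example; the dictionary layout and the names `TREE`, `P_FIRST`, `P_SECOND` and `fold_back` are our own choices for illustration, not part of the original study.

```python
from fractions import Fraction as F

P_FIRST = {"S": F(1, 2), "T": F(1, 2)}
P_SECOND = {"U": F(1, 4), "V": F(3, 4), "Y": F(3, 4), "Z": F(1, 4)}

# Evaluations of the didactic example:
# first decision -> first event -> second decision -> second event -> evaluation.
TREE = {
    "A": {"S": {"C": {"U": "7", "V": "4.5"}, "D": {"U": "4.5", "V": "5.5"}},
          "T": {"C": {"Y": "4.5", "Z": "4.5"}, "D": {"Y": "1", "Z": "5"}}},
    "B": {"S": {"E": {"U": "3.5", "V": "5.5"}, "F": {"U": "3", "V": "1"},
                "G": {"U": "1", "V": "1"}},
          "T": {"E": {"Y": "6", "Z": "1"}, "F": {"Y": "2", "Z": "2"},
                "G": {"Y": "5", "Z": "5"}}},
}

def fold_back(tree):
    """Fold the tree back from the leaves, maximising the expected value at each node."""
    plan, first_values = {}, {}
    for first, by_event in tree.items():
        first_values[first] = F(0)
        for event, decisions in by_event.items():
            expected = {d: sum(P_SECOND[e] * F(x) for e, x in leaves.items())
                        for d, leaves in decisions.items()}
            best = max(expected, key=expected.get)
            plan[(first, event)] = (best, expected[best])
            first_values[first] += P_FIRST[event] * expected[best]
    return max(first_values, key=first_values.get), first_values, plan

best_first, first_values, plan = fold_back(TREE)
print(best_first, first_values, plan)
```

Exact fractions are used so that the results can be compared directly with the values 42/8, 39/8 and 5 quoted in the text.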

8.4.2 Some comments on the previous approach

Just as the weighted sum (already discussed in the other chapters of this book), the
expected value presents some characteristics that the user must be aware of. For
example, probabilities intervene as tradeoffs between the values for different events:
the difference of one unit in favour of C over D for event V, whose probability is 3/4,
would be completely compensated by a difference of three units in favour of D over
C for event U because its probability is 1/4. A consequence is that a big difference
in favour of a specific decision in some scenario could be sufficient to overcome a
systematic advantage for another decision in all the other scenarios, as illustrated
in the example presented in Figure 8.4. In this example, if the probabilities of S, T
and U are all equal to 1/3, the expected value will give preference to A, although
B is better than A in two scenarios out of three.
Recall the famous St. Petersburg game (see for example Sinn 1983)
showing that the expected value approach does not always represent the attitude
of the decision-maker towards risk very well. The game consists of tossing a coin
repeatedly until the first time it lands on heads; if this happens on the $k$th toss,
the player wins $2^k$ €. The question is to find out how much a player would be
ready to bet in such a game. Of course, the answer depends on the player but,
in any case, the amount would not be very big. However, applying the expected
value approach, we see that the expected gain is

$$\sum_{k=1}^{\infty} \frac{1}{2^k} \cdot 2^k = +\infty.$$
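The divergence is easy to observe numerically. The sketch below, in which the function name `truncated_expected_gain` is ours, computes the expected gain of a game that is stopped after a given number of tosses (with a gain of 0 if no head has appeared): every toss contributes exactly 1 to the sum, so the truncated expectation grows without bound.

```python
from fractions import Fraction

def truncated_expected_gain(max_tosses):
    """Expected gain when the game is stopped after max_tosses tosses."""
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, max_tosses + 1))

for n in (10, 100, 1000):
    print(n, truncated_expected_gain(n))
```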


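First     First-period  Node  Second    Second-period  Leaf  Evaluation
decision  event               decision  event

A         S (1/2)       N2    C         U (1/4)        N6    7
A         S (1/2)       N2    C         V (3/4)        N7    4.5
A         S (1/2)       N2    D         U (1/4)        N8    4.5
A         S (1/2)       N2    D         V (3/4)        N9    5.5
A         T (1/2)       N3    C         Y (3/4)        N10   4.5
A         T (1/2)       N3    C         Z (1/4)        N11   4.5
A         T (1/2)       N3    D         Y (3/4)        N12   1
A         T (1/2)       N3    D         Z (1/4)        N13   5
B         S (1/2)       N4    E         U (1/4)        N14   3.5
B         S (1/2)       N4    E         V (3/4)        N15   5.5
B         S (1/2)       N4    F         U (1/4)        N16   3
B         S (1/2)       N4    F         V (3/4)        N17   1
B         S (1/2)       N4    G         U (1/4)        N18   1
B         S (1/2)       N4    G         V (3/4)        N19   1
B         T (1/2)       N5    E         Y (3/4)        N20   6
B         T (1/2)       N5    E         Z (1/4)        N21   1
B         T (1/2)       N5    F         Y (3/4)        N22   2
B         T (1/2)       N5    F         Z (1/4)        N23   2
B         T (1/2)       N5    G         Y (3/4)        N24   5
B         T (1/2)       N5    G         Z (1/4)        N25   5

Figure 8.2: A didactic example (decision tree, shown here as a table of its leaves)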


The expected utility model, which is the subject of the next section, makes it
possible to resolve this paradox and, more generally, to take different possible attitudes
towards risk into account.

8.4.3 The expected utility approach

As the preferences of the decision-maker are not necessarily linearly linked to the
evaluations of the actions, it may be useful to replace these evaluations by the
psychological values they have for the decision-maker through so-called utility
functions (Fishburn 1970).
Denoting by $u(x_i)$ the utility of the evaluation $x_i$, the expected utility value of
a decision leading to the evaluation $x_i$ with probability $p_i$ ($i = 1, 2, \ldots, n$) is given by

$$\sum_{i=1}^{n} p_i \, u(x_i).$$

This model dates back at least to Bernoulli (1954) but the basic axioms, in
terms of preferences, were only studied in the present century (see for instance von
Neumann and Morgenstern 1944).
In the case of the St. Petersburg game, if we denote by $u(x)$ the utility of
winning $x$ €, the expected utility of refusing the game is $u(0)$, while the expected
utility of betting an amount of $s$ € in the game is

$$\sum_{k=1}^{\infty} \frac{1}{2^k} \, u(2^k - s).$$

As an exercise, the reader can verify that for a utility function defined by

$$u(x) = \begin{cases} x/2^{20} & \text{if } x \le 2^{20}, \\ 1 & \text{if } x > 2^{20}, \end{cases}$$

the expected utility of betting $s$ € in the game is positive (hence superior to the
expected utility of refusing the game) as long as $s$ is less than $21/(1 - 1/2^{20})$ €,
and is negative for larger values. The expected utility can also be finite with an
unbounded utility function such as, for example, the logarithmic function.
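The exercise can be checked with exact arithmetic. In the sketch below (the names `CAP`, `expected_utility_of_bet` and `threshold` are ours), the first 60 terms of the series are summed exactly; for later terms the net gain exceeds $2^{20}$ whenever $s < 2^{20}$, so the utility equals 1 and the remaining tail sums to $2^{-60}$. Under this reading the expected utility is $[21 - s(1 - 2^{-20})]/2^{20}$, which vanishes at $s = 21/(1 - 2^{-20})$.

```python
from fractions import Fraction

CAP = 2**20

def u(x):
    """Utility of the exercise: linear up to 2**20, constant afterwards."""
    return Fraction(x, CAP) if x <= CAP else Fraction(1)

def expected_utility_of_bet(s, exact_terms=60):
    """Expected utility of paying s to play; intended for 0 <= s < 2**20.

    Terms k <= exact_terms are summed exactly. For larger k the net gain
    2**k - s exceeds the cap, so u = 1 and the tail sums to 2**-exact_terms.
    """
    head = sum(Fraction(1, 2**k) * u(2**k - s) for k in range(1, exact_terms + 1))
    tail = Fraction(1, 2**exact_terms)
    return head + tail

threshold = Fraction(21) / (1 - Fraction(1, CAP))
print(float(threshold), expected_utility_of_bet(threshold))
```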
In the example in Figure 8.2 and with a utility function defined by

u(1) = u(2) = 1,
u(3) = u(3.5) = 2,

u(4.5) = u(5) = u(5.5) = 3,
u(6) = u(7) = 4,

we obtain the tree given in Figure 8.5.

The optimal action is then to apply decision A at the beginning of the first
period and decision C at the beginning of the second period, contrary to what was
obtained with the expected value approach.
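The same folding back can be repeated with utilities in place of the raw evaluations. The sketch below uses the utility function given above; the data layout and the names `UTILITY`, `TREE` and `fold_back` are ours.

```python
from fractions import Fraction as F

P_FIRST = {"S": F(1, 2), "T": F(1, 2)}
P_SECOND = {"U": F(1, 4), "V": F(3, 4), "Y": F(3, 4), "Z": F(1, 4)}

# Utility function of the text, indexed by evaluation.
UTILITY = {"1": 1, "2": 1, "3": 2, "3.5": 2, "4.5": 3, "5": 3, "5.5": 3, "6": 4, "7": 4}

# first decision -> first event -> second decision -> second event -> evaluation.
TREE = {
    "A": {"S": {"C": {"U": "7", "V": "4.5"}, "D": {"U": "4.5", "V": "5.5"}},
          "T": {"C": {"Y": "4.5", "Z": "4.5"}, "D": {"Y": "1", "Z": "5"}}},
    "B": {"S": {"E": {"U": "3.5", "V": "5.5"}, "F": {"U": "3", "V": "1"},
                "G": {"U": "1", "V": "1"}},
          "T": {"E": {"Y": "6", "Z": "1"}, "F": {"Y": "2", "Z": "2"},
                "G": {"Y": "5", "Z": "5"}}},
}

def fold_back(tree):
    """Fold the tree back, maximising the expected utility at each decision node."""
    plan, first_values = {}, {}
    for first, by_event in tree.items():
        first_values[first] = F(0)
        for event, decisions in by_event.items():
            expected = {d: sum(P_SECOND[e] * UTILITY[x] for e, x in leaves.items())
                        for d, leaves in decisions.items()}
            best = max(expected, key=expected.get)
            plan[(first, event)] = (best, expected[best])
            first_values[first] += P_FIRST[event] * expected[best]
    return max(first_values, key=first_values.get), first_values, plan

best_first, first_values, plan = fold_back(TREE)
print(best_first, first_values, plan)
```

With these data the expected utilities of A and B at node N1 are 25/8 and 3, which is why A is now preferred.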

First decision   Event     Node   Best decision   Value
A                S (1/2)   N2     D               5.25
A                T (1/2)   N3     C               4.5
B                S (1/2)   N4     E               5
B                T (1/2)   N5     G               5

Figure 8.3: Application of the expected value approach





Figure 8.4: Illustration of the compensation effect


First decision   Event     Node   Best decision   Value
A                S (1/2)   N2     C               13/4
A                T (1/2)   N3     C               3
B                S (1/2)   N4     E               11/4
B                T (1/2)   N5     E               13/4

Figure 8.5: Application of the expected utility approach

8.4.4 Some comments on the expected utility approach

Much literature is devoted to this approach, the probabilities being objective or
subjective: see for example Savage (1954), Luce and Raiffa (1957), Ellsberg (1961),
Fishburn (1970) and Fishburn (1982), Allais and Hagen (1979), McCord and de
Neufville (1983), Loomes (1988), Bell et al. (1988), Barbera, Hammond and Seidl
(1998).
We simply recall one or two characteristics here that every user should be aware
of. As in every model, the expected utility approach implicitly assumes that the
preferences of the decision-maker satisfy some properties that can be violated in
practice. The following example illustrates the well-known Allais paradox (see
Allais 1953). It is not unusual to prefer a guaranteed gain of 500 000 € to an
alternative providing 500 000 € with probability 0.89, 2 500 000 € with probability
0.1 and 0 € with probability 0.01. Applying the expected utility model leads to
the following inequality

u(500 000) > 0.89u(500 000) + 0.1u(2 500 000) + 0.01u(0),

hence, grouping terms,
0.11u(500 000) > 0.1u(2 500 000) + 0.01u(0).
At the same time, it is reasonable to prefer an alternative providing 2 500 000 €
with probability 0.1 and 0 € with probability 0.9 to an alternative providing
500 000 € with probability 0.11 and 0 € with probability 0.89. In this case, the
expected utility model yields
0.1u(2 500 000) + 0.9u(0) > 0.11u(500 000) + 0.89u(0),

        W     R     G
A      100    0     0
B       0    100    0

C      100    0    100
D       0    100   100

Table 8.3

hence, grouping the terms,

0.1u(2 500 000) + 0.01u(0) > 0.11u(500 000),

which is in contradiction with the inequality obtained above. So, the expected
utility model cannot explain the two previous preference situations simultaneously.
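The contradiction can also be seen by brute force. The sketch below normalises the utility function so that u(0) = 0 and u(2 500 000) = 1 (an assumption that loses no generality for an increasing utility, since expected utility is invariant under positive affine transformations) and scans candidate values v = u(500 000). The function names are ours.

```python
from fractions import Fraction as F

def first_preference(v):
    """Sure 500 000 preferred to the lottery (0.89, 0.1, 0.01)."""
    return v > F(89, 100) * v + F(10, 100) * 1 + F(1, 100) * 0

def second_preference(v):
    """2 500 000 with probability 0.1 preferred to 500 000 with probability 0.11."""
    return F(10, 100) * 1 + F(90, 100) * 0 > F(11, 100) * v + F(89, 100) * 0

# Candidate values of u(500 000) between u(0) = 0 and u(2 500 000) = 1.
grid = [F(i, 1000) for i in range(1001)]
compatible = [v for v in grid if first_preference(v) and second_preference(v)]
print(len(compatible))
```

The first preference requires v > 10/11 and the second v < 10/11, so no value of v satisfies both.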
A possible attitude in this case is to consider that the decision-maker should
revise his judgment in order to be more rational, that is, in order to satisfy the
axioms of the model. Another interpretation is that the expected utility approach
sometimes implies unreasonable constraints on the preferences of the decision-
maker (in the previous example, the violated property is the so-called independence
axiom of Von Neumann and Morgenstern). This last interpretation led scientists
to propose many variants of the expected utility model, as in Kahneman and
Tversky (1979), Machina (1982, 1987), Bell et al. (1988), Barbera et al. (1998).
Before explaining why the expected utility model (or one of its variants) was
not applied by the analyst in the electricity production planning problem, let us
mention why using probabilities may cause some trouble in modelling uncertainties
or risk. The following example illustrates the so-called Ellsberg paradox and is
extracted from Fishburn (1970, p.172). An urn contains one white ball (W) and
two other balls. You only know that the two other balls are either both red (R),
or both green (G), or one is red and one is green. Consider the two situations in
Table 8.3 where W, R, and G represent the three states according to whether one
ball drawn at random is white, red or green. The figures are what you will be paid
(in Euros) after you make your choice and a ball is drawn.
Intuition leads many people to prefer A to B and D to C, while the expected
utility approach leads to indifference between A and B as well as between C
and D.
This type of situation shows that the use of the probability concept may be
debatable for representing attitude towards risk or uncertainty; other tools (pos-
sibility theory, belief functions or fuzzy integrals) can also be envisaged.

Events Probab. C D
U 1/4 7 4.5
V 3/4 4.5 5.5

Table 8.4

8.4.5 The approach applied in this case: first step

We will now present the approach that was applied in the electricity production
planning problem. This approach is certainly not ideal (some drawbacks will be
pointed out in the presentation). However, it does not aggregate the multiple
criteria consequences of the decisions into a single dimension, thus avoiding some
of the pitfalls mentioned in Chapter 6 on the multi-attribute value functions.
Moreover, it does not introduce a discounting rate for the dynamic aspect (see
Chapter 5) and it makes it possible to model the particular preferences of the
decision-maker along each evaluation scale.
In the electricity production planning problem described in Section 8.3, the
analyst did not know whether the probabilities given by the company were really
probabilities (and not plausibility coefficients) and it was not certain that the
consequences of one scenario were really comparable to the consequences of another.
On the one hand, it was definitely excluded to transform all the consequences into
money and to aggregate them with a discounting rate (as in Chapter 5). On the
other hand, the company was not prepared to devote much time to the clarification
of the probabilities and to long discussions about the multiple criteria and dynamic
aspects of the problem, so that it was impossible to envisage an enriched variant of
the expected utility model. The analyst decided to propose a paired comparison
of the actions, scenario by scenario, as illustrated below for the didactic example
presented in Figure 8.2.
At node N2, we have to consider Table 8.4.
The comparison between C and D was made on the basis of the differences
in preference between them for each of the considered events, similarly to what
is done in the Promethee method (Brans and Vincke 1985). Let us consider a
preference function defined by

$$f(x) = \begin{cases} 1 & \text{if } x > 1, \\ 0 & \text{elsewhere,} \end{cases}$$

where x is the difference in the evaluations of two decisions. Other functions can
be defined similarly to what is done in the Promethee method. This function
expresses the fact that a difference which is smaller than or equal to 1 is considered
to be non significant. As we see, an advantage of this approach is to enable the
introduction of indifference thresholds.
The analyst proposed the following index to measure the preference of C over
D, on the basis of the data contained in Table 8.4:

$$\frac{1}{4} f(7 - 4.5) + \frac{3}{4} f(4.5 - 5.5) = \frac{1}{4},$$


      C     D
C     0    1/4
D     0     0

Table 8.5

Events Probab. C D
Y 3/4 4.5 1
Z 1/4 4.5 5

Table 8.6

while the preference of D over C is given by

$$\frac{1}{4} f(4.5 - 7) + \frac{3}{4} f(5.5 - 4.5) = 0.$$

These preference indices are summarised in Table 8.5. The score of each decision
is then the sum of the preferences of this decision over the others minus the
sum of the preferences of the others over it. In the case of Table 8.5, this trivially
gives 1/4 and -1/4 as respective scores for C and D. The maximum score determines
the chosen decision. So, the chosen decision at node N2 is C. Note that,
despite the analyst's doubt about the real nature of the probabilities, he used
them to calculate a sort of expected index of preference for each decision over each
other decision. This is certainly a weak point of the method and other tools, which
will be described in a volume in preparation, could have been used here. Note also
that, in the multiple criteria case, a (possibly weighted) sum is computed for all
the criteria in order to obtain the global score of a decision.
At node N3, we have to consider Table 8.6, leading to the preference indices
presented in Table 8.7. For example, the preference index of C over D is

$$\frac{3}{4} f(4.5 - 1) + \frac{1}{4} f(4.5 - 5) = \frac{3}{4}.$$

The scores of C and D are respectively 3/4 and -3/4, so that the chosen
decision at node N3 is also C.
At node N4, decision E dominates F and G and is thus chosen (where "dominates"
means "is better in each scenario").
At node N5, we must consider Table 8.8.
The preference index of G over E (for example) is

      C     D
C     0    3/4
D     0     0

Table 8.7

Events  Probab.  E    F    G
Y       3/4      6    2    5
Z       1/4      1    2    5

Table 8.8

      E     F     G
E     0    3/4    0
F     0     0     0
G    1/4    1     0

Table 8.9

$$\frac{3}{4} f(5 - 6) + \frac{1}{4} f(5 - 1) = \frac{1}{4}.$$

The other preference indices are presented in Table 8.9; they yield 1/2, -7/4
and 5/4 as respective scores for E, F and G, so that G is the chosen decision at
node N5.
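The computation of the preference indices and of the scores can be sketched as follows, using the threshold function f defined earlier in this section. The function names `preference_index` and `net_scores` are ours; the data are those of Tables 8.4 and 8.8.

```python
from fractions import Fraction as F

def f(x):
    """Preference function of the text: a difference counts only if it exceeds 1."""
    return 1 if x > 1 else 0

def preference_index(a, b, probs):
    """Probability-weighted preference of a over b, event by event."""
    return sum(p * f(x - y) for p, x, y in zip(probs, a, b))

def net_scores(evaluations, probs):
    """Preferences of each decision over the others minus those of the others over it."""
    return {a: sum(preference_index(evaluations[a], evaluations[b], probs)
                   - preference_index(evaluations[b], evaluations[a], probs)
                   for b in evaluations if b != a)
            for a in evaluations}

# Node N2 (Table 8.4): events U and V.
scores_n2 = net_scores({"C": [7, 4.5], "D": [4.5, 5.5]}, [F(1, 4), F(3, 4)])

# Node N5 (Table 8.8): events Y and Z.
scores_n5 = net_scores({"E": [6, 1], "F": [2, 2], "G": [5, 5]}, [F(3, 4), F(1, 4)])

print(scores_n2, scores_n5, max(scores_n5, key=scores_n5.get))
```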
We can now consider Table 8.10 associated to N1. The values in this table are
those that correspond to the chosen decisions at the nodes N2 to N5 (they are
indicated in parentheses).

Scenarios  Probab.  A        B
S-U        1/8      7(C)     3.5(E)
S-V        3/8      4.5(C)   5.5(E)
T-Y        3/8      4.5(C)   5(G)
T-Z        1/8      4.5(C)   5(G)

Table 8.10

On the basis of this table, the preference of A over B is

$$\frac{1}{8} f(3.5) + \frac{3}{8} f(-1) + \frac{3}{8} f(-0.5) + \frac{1}{8} f(-0.5) = \frac{1}{8},$$

while the preference of B over A is

$$\frac{1}{8} f(-3.5) + \frac{3}{8} f(1) + \frac{3}{8} f(0.5) + \frac{1}{8} f(0.5) = 0,$$

giving A as the best first decision.

In conclusion, the optimal action obtained through this first step consists
in choosing A at the beginning of the first period and C at the beginning of the
second period.
This approach makes it possible to take the comparisons of the decisions separately
for each scenario into account. Let us illustrate this point for the example of Figure
8.4, where 9 has been replaced by 10 in the evaluation of B for event U. If the
probabilities of S, T and U are equal to 1/3, the expected utility approach gives
the same value $\frac{1}{3}\,[u(10) + u(15) + u(20)]$ to A and B that are thus considered
as indifferent. However, if we compare A and B separately for each event, we see
that B is better than A for events S and T, with a probability equal to 2/3. The
approach described in this section will give a preference index of A over B equal to

$$\frac{1}{3} f(10 - 15) + \frac{1}{3} f(15 - 20) + \frac{1}{3} f(20 - 10)$$

and a preference index of B over A equal to

$$\frac{1}{3} f(15 - 10) + \frac{1}{3} f(20 - 15) + \frac{1}{3} f(10 - 20).$$

With the same function f as before, this will lead to the choice of B. Making
the (natural) assumption that f(x) = 0 when x is negative, we see that this
approach will lead to indifference between A and B only with a function f such
that $f(20 - 10) = f(15 - 10) + f(20 - 15)$.
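The whole first step can be sketched as a small program: local choices are made at the second-period nodes, the retained evaluations are collected scenario by scenario, and the first decisions are then compared. The data are those of the didactic example; the evaluations used for the last comparison (A = 10, 15, 20 and B = 15, 20, 10 for events S, T, U) are read off the two indices displayed above. All names (`index`, `choose`, `TREE`, ...) are ours.

```python
from fractions import Fraction as F

def f(x):
    """Preference function of the text: only differences larger than 1 count."""
    return 1 if x > 1 else 0

def index(a, b, probs):
    """Probability-weighted preference index of a over b, scenario by scenario."""
    return sum(p * f(x - y) for p, x, y in zip(probs, a, b))

def choose(evals, probs):
    """Return the decision with the highest net score, together with all scores."""
    scores = {a: sum(index(evals[a], evals[b], probs) - index(evals[b], evals[a], probs)
                     for b in evals if b != a)
              for a in evals}
    return max(scores, key=scores.get), scores

P1 = {"S": F(1, 2), "T": F(1, 2)}
P2 = {"S": [F(1, 4), F(3, 4)], "T": [F(3, 4), F(1, 4)]}  # (U, V) after S, (Y, Z) after T
TREE = {
    "A": {"S": {"C": [7, 4.5], "D": [4.5, 5.5]}, "T": {"C": [4.5, 4.5], "D": [1, 5]}},
    "B": {"S": {"E": [3.5, 5.5], "F": [3, 1], "G": [1, 1]},
          "T": {"E": [6, 1], "F": [2, 2], "G": [5, 5]}},
}

# Local choices at N2..N5, then one row of retained evaluations per first decision.
plan, rows = {}, {}
for first, by_event in TREE.items():
    rows[first] = []
    for event, evals in by_event.items():
        plan[(first, event)], _ = choose(evals, P2[event])
        rows[first] += evals[plan[(first, event)]]

scenario_probs = [P1[e] * p for e in ("S", "T") for p in P2[e]]
best_first, first_scores = choose(rows, scenario_probs)

# Comparison of A and B in the modified example of Figure 8.4.
variant_best, variant_scores = choose({"A": [10, 15, 20], "B": [15, 20, 10]}, [F(1, 3)] * 3)
print(plan, best_first, first_scores, variant_best, variant_scores)
```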

8.4.6 Comment on the first step

As this approach is based on successive pairwise comparisons, it also presents some
pitfalls which must be mentioned. The example presented in Figure 8.6 illustrates
a first drawback. In this example, three periods of time are considered,
but there are no uncertainties during the first two periods. Two decisions A and
B are possible at the beginning of the first period. At the beginning of the second
period, two decisions C and D are possible after A and only one decision is possible
after B. At the beginning of the third period, two decisions E and F are possible
after C while only one decision is possible in each of the other cases. During the
last period, three events S, T and U can occur, each with a probability of 1/3.
Let us apply the approach described in Section 8.4.5 with the same function f.
At node N4, the preference index of E over F will be

$$\frac{1}{3} f(10 - 15) + \frac{1}{3} f(15 - 20) + \frac{1}{3} f(20 - 0) = \frac{1}{3},$$

while the preference index of F over E will be

$$\frac{1}{3} f(15 - 10) + \frac{1}{3} f(20 - 15) + \frac{1}{3} f(0 - 20) = \frac{2}{3},$$

so that F will be the decision chosen at node N4.

At node N2, we must consider Table 8.11, where the values of C are those of
F (decision chosen at node N4).
On the basis of Table 8.11, we compute the preference index of C over D by

$$\frac{1}{3} f(15 - 20) + \frac{1}{3} f(20 - 0) + \frac{1}{3} f(0 - 5) = \frac{1}{3},$$

and the preference of D over C by


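Action      Decision nodes   S (1/3)    T (1/3)    U (1/3)
(A, C, E)   N1, N2, N4       N7: 10     N8: 15     N9: 20
(A, C, F)   N1, N2, N4       N10: 15    N11: 20    N12: 0
(A, D)      N1, N2, N5       N13: 20    N14: 0     N15: 5
B           N1, N3, N6       N16: 0     N17: 5     N18: 10

Figure 8.6: A pitfall of the first step (decision tree, shown here as a table of its leaves)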

Events Probab. C D
S 1/3 15 20
T 1/3 20 0
U 1/3 0 5

Table 8.11

Events Probab. A B
S 1/3 20 0
T 1/3 0 5
U 1/3 5 10

Table 8.12

Events Probab. B (A,C,E)

S 1/3 0 10
T 1/3 5 15
U 1/3 10 20

Table 8.13

$$\frac{1}{3} f(20 - 15) + \frac{1}{3} f(0 - 20) + \frac{1}{3} f(5 - 0) = \frac{2}{3},$$

so that D will be the decision chosen at node N2.

At node N1, we must consider Table 8.12, where the values of A are those
of D (decision chosen at node N2).
On the basis of Table 8.12, the preference index of A over B is given by

$$\frac{1}{3} f(20 - 0) + \frac{1}{3} f(0 - 5) + \frac{1}{3} f(5 - 10) = \frac{1}{3},$$

while the preference index of B over A is

$$\frac{1}{3} f(0 - 20) + \frac{1}{3} f(5 - 0) + \frac{1}{3} f(10 - 5) = \frac{2}{3},$$

so that B will be chosen at node N1.

In conclusion, the methodology leads to the choice of the action B despite the
fact that it is dominated by the action (A,C,E) as is shown in Table 8.13.
This is due to the fact that the comparisons are too local in the tree; in the
concrete application described in this chapter another drawback was the fact that,
for decisions at nodes relative to the last periods, the evaluations were not very
different, due to the large common part of the actions and scenarios preceding
these decisions. The result was many indifferences between the decisions at
each decision node.
To improve the methodology, the analyst proposed to introduce a second step
that is the subject of the next section.
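The pitfall can be reproduced with a short sketch. The evaluations are those of Figure 8.6 for events S, T and U; the names `LEAVES`, `index` and `choose` are ours. The local comparisons select F, then D, then B, although B is worse than the action (A, C, E) in every event.

```python
from fractions import Fraction as F

def f(x):
    """Preference function of the text: only differences larger than 1 count."""
    return 1 if x > 1 else 0

def index(a, b, probs):
    return sum(p * f(x - y) for p, x, y in zip(probs, a, b))

def choose(evals, probs):
    """Decision with the highest net score among the compared decisions."""
    scores = {a: sum(index(evals[a], evals[b], probs) - index(evals[b], evals[a], probs)
                     for b in evals if b != a)
              for a in evals}
    return max(scores, key=scores.get), scores

PROBS = [F(1, 3)] * 3   # events S, T, U
LEAVES = {"E": [10, 15, 20], "F": [15, 20, 0], "D": [20, 0, 5], "B": [0, 5, 10]}

# First step: purely local comparisons, from the last decision node to the first.
at_n4, _ = choose({"E": LEAVES["E"], "F": LEAVES["F"]}, PROBS)
at_n2, _ = choose({"C": LEAVES[at_n4], "D": LEAVES["D"]}, PROBS)
values_of_a = LEAVES[at_n4] if at_n2 == "C" else LEAVES["D"]
at_n1, _ = choose({"A": values_of_a, "B": LEAVES["B"]}, PROBS)

# Dominance check: (A, C, E) is better than B in every event.
b_is_dominated = all(x > y for x, y in zip(LEAVES["E"], LEAVES["B"]))
print(at_n4, at_n2, at_n1, b_is_dominated)
```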

8.4.7 The approach applied in this case: second step

In order to introduce more information into the comparisons of local decisions and
to take the tree as a whole into account, a second step was added by the analyst.

Events Probab. C D E(N4)

U 1/4 7 4.5 3.5
V 3/4 4.5 5.5 5.5

Table 8.14

At each decision node, the local decisions are also compared to the best actions in
the same scenarios in each of the branches of the tree.
In Figure 8.2, at node N2, C and D are also compared to the best decision in
N4, i.e. to E (after event S).
This leads to the consideration of Table 8.14.
Using the same preference function as before, the preference of C over D is
still 1/4 (see Section 8.4.5), the preference of D over C is still 0, the preference of
C over E is $[1/4\, f(3.5) + 3/4\, f(-1)] = 1/4$, the preference of E over C is
$[1/4\, f(-3.5) + 3/4\, f(1)] = 0$, the preference of D over E is
$[1/4\, f(1) + 3/4\, f(0)] = 0$ and the preference of E over D is
$[1/4\, f(-1) + 3/4\, f(0)] = 0$.
Table 8.15 summarises these values.

      C     D     E
C     0    1/4   1/4
D     0     0     0
E     0     0     0

Table 8.15

The scores for C and D are respectively 1/2 and -1/4; C is therefore chosen
at node N2.
At node N3, we compare C and D with the best decision in N5, i.e. with G
(after event T), on the basis of Table 8.16.
Table 8.17 gives the preference indices.
The scores of C and D are respectively 3/4 and -3/2, so that C is also chosen
in N3.
The analysis of N4 (comparison of E, F, G and C (N2)) and of N5 (comparison
of E, F, G and C (N3)) lead to the same conclusions as in the first step, so that,
in this example, the second step does not change anything.
However, the interest of this second step is to choose, at each decision node,
a decision leading to a final result that is strong, not only locally, but also in

Events Probab. C D G
Y 3/4 4.5 1 5
Z 1/4 4.5 5 5

Table 8.16

      C     D     G
C     0    3/4    0
D     0     0     0
G     0    3/4    0

Table 8.17

Prob. E F D B
1/3 10 15 20 0
1/3 15 20 0 5
1/3 20 0 5 10

Table 8.18

comparison with the strongest results obtained during the first step in the other
branches of the tree (always in the same scenarios). This is illustrated by the
example in Figure 8.6 where the second step works as follows. At node N4, we
compare E and F with D and B (the best actions in the other branches as they
are unique), through Table 8.18.
Table 8.19 presents the preference indices.
The scores of E and F respectively become 1 and 1/3, so that the best decision
at N4 is now E.
At N2, we have to compare C (followed by E) with D and B (best action in
the other branch): the scores of C and D are respectively 4/3 and -2/3, so that
the best decision in N2 is now C.
At N1, we have to compare A (followed by C and E) with B and we choose
A (that dominates B). So we see that this second step helps to avoid choosing
dominated actions, although this property is not guaranteed in all cases.
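The second step on the example of Figure 8.6 can be sketched as follows: at each decision node, the local decisions are scored against each other and against the reference actions of the other branches, but only a local decision can be chosen. The names `choose_local`, `net_scores` and `LEAVES` are ours.

```python
from fractions import Fraction as F

def f(x):
    """Preference function of the text: only differences larger than 1 count."""
    return 1 if x > 1 else 0

def index(a, b, probs):
    return sum(p * f(x - y) for p, x, y in zip(probs, a, b))

def net_scores(evals, probs):
    return {a: sum(index(evals[a], evals[b], probs) - index(evals[b], evals[a], probs)
                   for b in evals if b != a)
            for a in evals}

def choose_local(local, references, probs):
    """Score local decisions against each other and the references; pick a local one."""
    scores = net_scores({**local, **references}, probs)
    return max(local, key=lambda name: scores[name]), scores

PROBS = [F(1, 3)] * 3   # events S, T, U
LEAVES = {"E": [10, 15, 20], "F": [15, 20, 0], "D": [20, 0, 5], "B": [0, 5, 10]}

at_n4, scores_n4 = choose_local({"E": LEAVES["E"], "F": LEAVES["F"]},
                                {"D": LEAVES["D"], "B": LEAVES["B"]}, PROBS)
at_n2, scores_n2 = choose_local({"C": LEAVES[at_n4], "D": LEAVES["D"]},
                                {"B": LEAVES["B"]}, PROBS)
values_of_a = LEAVES[at_n4] if at_n2 == "C" else LEAVES["D"]
at_n1, scores_n1 = choose_local({"A": values_of_a, "B": LEAVES["B"]}, {}, PROBS)
print(at_n4, at_n2, at_n1)
```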

      E     F     D     B
E     0    1/3   2/3    1
F    2/3    0    1/3   2/3
D    1/3   2/3    0    1/3
B     0    1/3   2/3    0

Table 8.19

8.5 Conclusions

This approach (first and second steps) was successfully implemented and applied
by the company (after many difficulties due to the combinatorial aspects of the
problem) and some visual tools were developed in order to facilitate the
decision-maker's understanding of the problem.

Let us now summarise the characteristics of this approach. It presents the
following advantages:

- it compares the consequences of a decision in a scenario with the consequences
  of another decision in the same scenario;

- it makes it possible to introduce indifference thresholds or, more generally, to
  model the preferences of the decision-maker for each evaluation scale.

However, this approach also presents some mysterious aspects that should be
more thoroughly investigated:

- it computes a sort of expected index for preference of each action over each
  other action, although the role of the so-called probabilities is not that clear
  in the modelling of uncertainty;

- it is a rather bizarre mixture of local (first step) and global (second step)
  comparisons of the actions, but it does not guarantee that the chosen action
  is non-dominated.

The literature on the management of uncertainty is probably one of the most
abundant in decision analysis. Besides the expected utility model (traditional
approach), a lot of other approaches were studied by many authors, such as Dekel
(1986), Jaffray (1989), Munier (1989), Quiggin (1993), Gilboa and Schmeidler
(1993), ... They pointed out more or less desirable properties: linearity, replacement
separability, mixture separability, different kinds of independence, stochastic
dominance, ... Moreover, as mentioned by Machina (1989), it is important to make
the distinction between what he calls static and dynamic choice situations. A dynamic
choice problem is characterised by the fact that at least one uncertainty
node is followed by a decision node (this is typically the case of the application
described in this chapter). In such a context, an interesting property is the so-called
dynamic consistency: a decision-maker is said to be dynamically inconsistent
if his actual choice when arriving at a decision node differs from his previously
planned choice for that node.

Figure 8.7: The dynamic consistency (decision tree with decision node N1 and decisions A and B)
Let us illustrate this concept by a short example. Assume that a decision-maker
prefers a game where he wins 50 € with probability 0.1 (and nothing with
probability 0.9) to a game where he wins 10 € with probability 0.2 (and nothing
with probability 0.8). At the same time, he prefers to receive 10 € with certainty
to a game where he wins 50 € with probability 0.5 (and nothing with probability
0.5). Note that these preferences violate the independence axiom of Von Neumann
and Morgenstern. Now consider the tree of Figure 8.7.
According to the previous information, the actual choice of the decision-maker,
at node N1, will be B. However, if he has to plan the choice between A and B
before knowing the first choice of nature, he can easily calculate that if he chooses
A, he wins 50 € with probability 0.1 (and nothing with probability 0.9), while if he
chooses B, he wins 10 € with probability 0.2 (and nothing with probability 0.8),
so that the best choice for him (before knowing the first choice of nature) is A.
So, the actual choice at N1 differs from the planned choice for that node,
illustrating the so-called dynamic inconsistency. It can be shown that any depar-
ture from the traditional approach can lead to dynamic inconsistency. However,
Machina (1989) showed that this argument relies on a hidden assumption con-
cerning behaviour in dynamic choice situations (the so-called consequentialism)
and argued that this assumption is inappropriate when the decision-maker is a
non-expected utility maximiser.
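The two lotteries compared at the planning stage can be derived mechanically from the tree. The sketch below relies on our reading of Figure 8.7, which is an assumption: nature first leads to node N1 with probability 0.2 (otherwise the gain is 0); at N1, decision A yields 50 € with probability 0.5 and decision B yields 10 € with certainty. The names `P_REACH_N1`, `AT_N1` and `seen_from_root` are ours.

```python
from fractions import Fraction as F

P_REACH_N1 = F(2, 10)   # assumed probability that nature leads to decision node N1

# Lotteries available at N1, written as {gain: probability}.
AT_N1 = {"A": {50: F(1, 2), 0: F(1, 2)}, "B": {10: F(1)}}

def seen_from_root(lottery):
    """Compound lottery obtained when the decision at N1 is planned in advance."""
    result = {0: 1 - P_REACH_N1}
    for gain, p in lottery.items():
        result[gain] = result.get(gain, 0) + P_REACH_N1 * p
    return result

planned = {name: seen_from_root(lottery) for name, lottery in AT_N1.items()}
print(planned)
```

Seen from the root, A becomes the lottery (50 € with probability 0.1) and B the lottery (10 € with probability 0.2), which are exactly the two games of the first stated preference, whereas at N1 the decision-maker faces the games of the second stated preference.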
This example shows that no approach can be considered as ideal in the context
of decision under uncertainty. As for the other situations studied in this book,
each model, each procedure, can present some pitfalls that have to be known by
the analyst. Knowing the underlying assumptions of the decision-aid model which
will be used is probably the only way, for the analyst, to guarantee an approach
to the decision problem that is as scientific as possible. It is a fact that, due to lack of
time and other priorities, many decision tools are developed in real applications
without taking enough precautions (this is also the case in the example presented
in this chapter, due to the tight deadlines and to the necessity of overcoming the
combinatorial aspects of the problem). This is why we consider it important to
provide analysts with some guidelines for modelling a decision problem: this will be
the subject of a volume in preparation.

In this chapter¹ we report on a real world decision aiding process which took place
in a large Italian firm, in late 1996 and early 1997, concerning the evaluation
of offers following a call for tenders for a very important software acquisition.
We will try to present extensively the decision process for which the decision
support was requested, the actors involved, the decision aiding process, including
the problem structuring and formulation, the evaluation model created and the
multiple criteria method adopted. The reader should be aware of the fact that
very few real world cases of decision support are reported in the literature although
many more occur in reality (for noteworthy exceptions see Belton, Ackermann and
Shepherd 1997, Bana e Costa, Ensslin, Correa and Vansnick 1999, Vincke 1992,
Roy and Bouyssou 1993).
We introduce such a real case description for two reasons.
1. The first reason is our wish to give an account of what providing decision
support in a real context means and to show the importance of elements
such as the participating actors, the problem formulation, the construction of the
criteria etc., often neglected in many conventional decision aiding methodologies
and in operational research. From this point of view the reader may find questions
already introduced in previous chapters of the book, but here they are discussed
from a decision aiding process perspective.
2. The second reason is our wish to introduce the reader to some concepts and
problems that will be extensively discussed in a forthcoming volume by the authors.
Our objective is to stimulate the reader to reflect on how decision support
tools and concepts are used in real life situations and how theoretical research
may contribute to aiding real decision makers in real decision situations.
More precisely, the chapter is organised as follows. Section 9.1 introduces and defines
some preliminary concepts that will be used in the rest of the chapter such as
decision process, actors, decision aiding process, problem formulation, evaluation
model etc. Section 9.2 presents the decision process for which the decision support
was requested, the actors involved and their concerns (stakes), the resources
involved and the timing. Section 9.3 describes the decision aiding process, mainly
through the different products of such a process that are specifically analysed
(the problem formulation, the evaluation model and the final recommendation)
and discusses the experience conducted. The client's comments on the experience
are also included in this section. Section 9.4 summarises the lessons learned in such
an experience. All technical details are included in Appendix A (an ELECTRE-TRI
type procedure is used), while the complete list of the evaluation attributes
is provided in Appendix B.

¹ A large part of this chapter uses material already published in Paschetta and Tsoukias (1999).

9.1 Preliminaries
In this chapter we will make extensive use of some terms (like actor, decision
process etc.) that, although present in the literature (see Simon 1957, Mintzberg,
Raisinghani and Theoret 1976, Jacquet-Lagreze, Moscarola, Roy and Hirsch 1978,
Checkland 1981, Heurgon 1982, Masser 1983, Humphreys, Svenson and Vari 1993,
Moscarola 1984, Nutt 1984, Rosenhead 1989, Ostanello 1990, Ostanello 1997,
Ostanello and Tsoukias 1993), can have different interpretations. In order to help
the reader understand how such terms are used in this presentation we introduce
some informal definitions.

Decision Process: a sequence of interactions amongst persons and/or organ-

isations characterising one or more objects or concerns (the problems).

Actors: the participants in a decision process.

Client: an actor in a decision process who asks for a support in order to

define his behaviour in the process. The term decisionmaker is also used
in the literature and in other chapters of this book, but in this context we
prefer to use the term client.

Analyst: an actor in a decision process who supports a client in a specific


Decision Aiding Process: part of the decision process and more precisely the
interactions occurring at least between the client and the analyst.

Problem Situation: a descriptive model of what happens in the decision process
when the decision support is requested and what the client is expecting
to obtain from the decision support (this is one of the products of the decision
aiding process).

Problem Formulation: a formal representation of the problem for which the
client asked the analyst to support him (this is one of the products of the
decision aiding process).

Evaluation Model: a model creating a specific instance of the problem formulation
for which a specific decision support method can be used (this is
one of the products of the decision aiding process).

9.2 The Decision Process

In early 1996 a very large Italian company operating a network-based service
decided, as part of a strategic development policy, to equip itself with a Geographical
Information System (GIS) on which all information concerning the structure of the
network and the services provided all over the country was to be transferred.
However, since (at that time) this was quite a new technology, the company's
Information Systems Department (ISD) asked the affiliated research and development
agency (RDA) and more specifically the department concerned with this type of
information technology (GISD) to perform a pilot study of the market in order to
orient the company towards an acquisition. The GISD of the RDA noticed that:

the market offered a very large variety of software which could be used as a
GIS for the company's purposes;

the company required a very particular version of GIS that did not exist as
a ready made product on the market, but had to be created by customising
and combining different modules of existing software, with the addition of
ad-hoc written software for the purpose of the company;

the question asked by the ISD was very general, but also very committing,
because it included an evaluation prior to an acquisition and not just a simple
description of the different products;

the GISD felt able to describe and evaluate different GIS products based on
a set of attributes (several hundred in the end), but was not able to provide
a synthetic evaluation, the purpose of which was just as obscure (the use
of a weighted sum was immediately set aside because it was perceived as

At this point of the process the GISD found out that a unit concerned with
the use of the MCDA (Multiple Criteria Decision Analysis) methodology in soft-
ware evaluation (MCDA/SE) was operating within the RDA and presented this
problem as a case study opening a specific commitment. The MCDA/SE unit
responsible then decided to activate its links with an academic institution in order
to get more insight and advice on the problem that soon appeared to exceed the
knowledge level of the unit at that time. At this point we can make the following
observations.

The decision process for which the decision aid was provided concerned the
acquisition of a GIS for X (the company). The actors involved at this level
are the company's IS manager, acquisition (AQ) manager, the RDA, different
suppliers of GIS software, and some of the company's external consultants
concerned with software engineering.

A first decision aiding process was established where the client was the IS
manager and the analyst was the GIS department of the RDA.

A second decision aiding process was established where the client was the
GIS department of the RDA and the analyst was the MCDA/SE unit. A
third actor involved in this process was the supervisor of the analyst in
the sense of someone supporting the analyst in different tasks, providing him
with expert methodological knowledge and framing his activity.

We will focus our attention on this second decision aiding process where four
actors are involved: the IS manager, the GISD (or team of analysts) as the client
(bear in mind their particular position of clients and analysts at the same time),
the MCDA/SE unit as the analyst and the supervisor.

The first advice by the analyst to the GISD was to negotiate a more specific
commitment such that their task could be more precise and better defined with
their client. After such a negotiation the GISD's activity was defined as
technical assistance to the IS manager in a bid, concerning the acquisition of a
GIS for the company and its specific task was to provide a technical evaluation
of the offers that were expected to be submitted. For this purpose the GISD drafted
a decision aiding process outline where the principal activities to be performed were
specified, as well as the timing, and submitted this draft to its client (see figure
9.1). At this point it is important to note the following.

1. The call for tenders concerned the acquisition of hundreds of software li-
censes, plus the hardware platforms on which such software was expected to
run, the whole budget being several million €. From a financial point of view
it represented a large stake for the company and a high level of responsibility
for the decisionmakers.
2. From a procedural point of view the administration of a bid of this type is
delegated to a committee which in this case included the IS manager, the
AQ manager, a delegate of the CEO and a lawyer from the legal staff. From
such a perspective the task of the GISD (and of the decision aiding process)
was to provide the IS manager with a global technical evaluation of the
offers that could be used in the negotiations with the AQ manager (inside
of the committee) and the suppliers (outside of the committee).
3. As already noted, the bid concerned software that was not ready made,
but a collection of existing modules of GIS software which was expected to
be used in order to create ad-hoc software for the specific necessities of the
company. Two difficulties arose from this:

the a priori evaluation of the software behaviour and its performance
without being able to test it on specific company-related cases;
the timing of the evaluation (including testing the offers) could be ex-
tremely long compared with the rapidity of the technological evolution
of this type of software.

[Figure 9.1: the bid process. Flowchart running from "Bid Start" to "Final Choice" through the preparation of the call for tenders, two sets of answers from suppliers with the corresponding selections, the definition of prototype requirements and the prototype evaluation, and the final sorting and ranking.]


Once the call for tenders had been prepared (including the software requirements
sections, the tenderers' requirements section, the timing and the evaluation
procedure), a set of offers was presented to the company and the technical evaluation
activity was settled. It is interesting to notice that the GISD staff charged with this
evaluation was supported by external consultants, software engineering experts
in the company's sector who practically acted as the IS manager's delegates in the
group. It is this extended group that signed the final recommendation presented
to the IS manager and that we will hereafter call the "team of analysts" (for the IS
manager) or the "client" (for the MCDA/SE unit and for us).

A second step in the decision aiding process was the generation of a problem
formulation and of an evaluation model. Although we formally consider the two
as two distinct products of the process, in reality and in this case specifically, they
have been generated contemporaneously. We will discuss the problem formulation
and the evaluation model in detail in the next section, but we can anticipate that
the final formulation consisted in an absolute evaluation of the offers under a set
of points of view that could be divided into two parts: the quality evaluation
and the performance evaluation. Although the set of alternatives was relatively
small (only six alternatives were considered), the set of attributes was extremely
complex (as often happens in software evaluation). Actually there were seven basic
evaluation dimensions, expanded in a hierarchy with 134 leaves, resulting in 183
evaluation nodes (see Appendix B).

A third and final step in the decision aiding process was the elaboration of the
final recommendation after all the necessary information for the evaluation had
been obtained and the evaluation performed. We will discuss such constructions
in detail in the next sections, but we can anticipate that such an elaboration
highlighted some questions (substantial and methodological) that had not been
considered before.
Some months after the end of the process and the delivery of the final report
we asked our client (the team of analysts) to discuss their experience with us and
to answer some questions concerning the methodology used, how they perceived it,
what they learned and what their appreciation was. The discussion was conducted
in a very informal way, but the client provided us with some written remarks that
were also reported during a conference presentation (see Fiammengo, Buosi, Iob,
Maffioli, Panarotto and Turino 1997). Such remarks are introduced in the following

9.3 Decision Support

We present the three products of the decision aiding process here: the problem
formulation, the evaluation model and the final recommendation. We should re-
member that the problem formulation and a first outline of the evaluation model
were established while the call for tenders was under elaboration for two reasons:
- for legal reasons, an outline of the evaluation model has to be included in
the call for tenders;
- the evaluation model implicitly contains the software requirements of the
offers, which in turn define the information to be provided by the tenderers.
For instance, the call for tenders specified that a prototype was requested in
order to test some performances. The tenderers therefore knew that they had
to produce a prototype within a certain time frame. The choice to introduce
some tests was made during the definition of the evaluation model.

9.3.1 Problem Formulation

From the presentation of the process we can make the following observations:

1. It was extremely important for the client (the team of analysts) to understand
his role in the process, what his client expected and what they were
able to provide. In fact, at the beginning of the process, the problem situation
was absolutely unclear. Moreover, the client considered that being able to
understand the expectations of the other actors involved in the process
was extremely relevant both for strategic reasons (having to do with organisational
problems of the company) and operational reasons (recommending
something reliable in a clear and sound way for all the actors involved in the
process).
Reporting the client's remarks: "....MCDA (Multi Criteria Decision Analysis)
was very useful in organising the overall process and structure of the bid:
what were the important steps to do, how to define the call for tenders,....",
"....MCDA was used as a background for the whole decision process. With
such a perspective it turned out to be very useful because every activity had a
justification....", "a formal process MCDA guaranteed greater control
and transparency to the process....", "A complex process, such as a bid, could
be greatly eased by the use of any process centred methodology".
It is this last sentence which clearly highlights the client's need for
support along the whole process and for all its aspects, support that is
able to take what is happening in the decision process into account. We
actually agree with their comment that any process modelled methodology
could be useful and we consider that their positive perception of MCDA is
based on the fact that it was the first decision support approach they
came to know.

2. We recall the client's remarks: "a formal approach MCDA generated
greater control and transparency....". Complex decision processes are based
on human interactions and these are based on the intrinsic ambiguity of
human communication (thanks to ambiguity human communication is also
very efficient). However, such an ambiguity might make it impossible
to understand and ultimately to propose viable solutions. Moreover, when
significant stakes are considered (as in our case), decisionmakers may consider
it dangerous to make a decision without having a clear idea of the
consequences of their acts. The use of a formal approach enables the reduction
of ambiguity (without completely eliminating it) and thus appears to
be an important support to the decision process.

It is clear that defining a precise problem formulation became a key issue for
the client because it clarified his role in the decision process (the bid management),
his relation with the IS manager (his client) and gave him a precise activity to
perform.
We define (Morisio and Tsoukias 1997) a problem formulation as the collection
of: a set of actions, a set of points of view and a problem statement. The only point
of the problem formulation that caused a discussion in the team of analysts
was the problem statement. The set of alternatives was considered to be the set of
offers submitted after the call for tenders. A first idea, to evaluate the tenderers
as well as the offers, was eliminated because in this particular technology no
consolidated producers exist. The set of points of view was defined using the
team of analysts' technical knowledge and can be viewed as two basic sets: one
concerning quality, including specific technical features required for the software
plus some ISO/IEC 9126 (1991) based dimensions, and the second concerning the
performance of the offered software, to be tested on prototypes. Such points of
view formed a huge hierarchy (see further on for details). No cost estimates were
required by the client and so they were not considered in this set.
After some discussion the problem statement adopted was the one of an absolute
evaluation of the offers both on a disaggregated level and on a global one.
Actually, the team of analysts interpreted the client's demand as a question of
whether the offers could be considered as intrinsically "good", "bad" etc. and not
as a request to compare the bids amongst themselves. There were two reasons for this choice.

1. A simple ranking of the offers could conceal the fact that all of them could
be of very poor quality or satisfy the software requirements to a very low
level. In other words it could happen that the best bid could be bad and
this was incompatible with the importance and cost of the acquisition.

2. The team of analysts felt uncomfortable with the idea of comparing the
merits (or de-merits) of an offer with merits (or de-merits) of another offer.
A first informal discussion of the problem of compensation convinced them
to overcome this question by comparing the offers to profiles about which
they had sufficient knowledge.

If we interpret the concept of measurement in a wide sense (comparing the
offers to pre-established profiles can be viewed as a measurement procedure) the
result that the team of analysts was looking for appeared to be the conclusion
of repeated aggregations of measures. Using the terminology introduced by Roy
(1996), the problem statement appeared to be a hierarchically organised sorting
of the offers, the sorting being repeated at all levels of the hierarchy.
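
As a minimal sketch of what a sorting "repeated at all levels of the hierarchy" amounts to, the following Python fragment sorts one offer bottom-up through a small tree of nodes. The node names, the three class labels and the `worst_child` rule are hypothetical placeholders chosen for illustration only; they are not the rules used in the case, which are presented in Section 9.3.2.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Ordered classes, worst first (illustrative labels).
CLASSES = ["bad", "acceptable", "good"]


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    # Rule turning the classes of the children into the class of this node.
    rule: Optional[Callable[[List[str]], str]] = None


def worst_child(values: List[str]) -> str:
    """Hypothetical rule: a parent is only as good as its worst child."""
    return min(values, key=CLASSES.index)


def sort_offer(node: Node, leaf_values: Dict[str, str]) -> Dict[str, str]:
    """Sort one offer at every node of the hierarchy, bottom-up."""
    if not node.children:
        return {node.name: leaf_values[node.name]}
    result: Dict[str, str] = {}
    for child in node.children:
        result.update(sort_offer(child, leaf_values))
    result[node.name] = node.rule([result[c.name] for c in node.children])
    return result


hierarchy = Node("offer", rule=worst_child, children=[
    Node("quality", rule=worst_child, children=[Node("q1"), Node("q2")]),
    Node("performance"),
])
classes = sort_offer(hierarchy, {"q1": "good", "q2": "acceptable",
                                 "performance": "good"})
print(classes["quality"], classes["offer"])
```

Each parent node carries its own rule, so a different aggregation procedure can be attached to each node without changing the traversal.
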
As far as the problem formulation is concerned, an ex-post remark made by the
team of analysts concerned the length of the evaluation process. They considered
that such a process was so long that the information available at the beginning and
the formulation itself could no longer be valid at the end of the process. This was
partly due to the very rapid evolution of GIS technology that could completely
innovate the state of the art in six months. Another observation made by part of
the team of analysts was that towards the end of the process, due to the knowledge
acquired in this period (mainly due to the process itself), they could revise some
of their judgements. Actually, the length of the evaluation was considered as a
negative critical issue in the client's remarks.
The final report did not consider any revision of the formulation and the eval-
uations since in the context of a call for tenders, it could be considered unfair to
modify the evaluations just before the final recommendation.
We consider that this is a critical issue for decision support and decision aiding
processes. Information is valid only for a limited period of time and consequently
the same is true for all evaluations based on such information. Moreover the
client himself may revise the problem formulation or update his perception of the
information and modify his judgements. This is rarely considered in decision aiding
methodologies. While for relatively short decision aiding processes the problem
may be irrelevant, it is certain that in long processes such a problem cannot be
neglected and requires specific consideration.

9.3.2 The Evaluation Model

The different components of the evaluation model were specified in an iterative
fashion. In the following we present their definition as they occurred in the decision
aiding process. We may notice that despite the fact that we had a large amount
of information to handle in our model, the case did not present any exogenous
uncertainty since the client considered the basic data and its judgements reliable
and felt confident with them.
The set of alternatives was identified as the set of offers legally accepted by the
company in reply to the call for tenders. No preliminary screening of the offers
was expected to be made. Although each offer was composed of different modules
and software components, each offer was considered as a whole.
The set of evaluation dimensions was a complex hierarchy with seven root
nodes, 134 leaves and 183 nodes in total (the complete list is available in Appendix
B). This is a typical situation in software evaluation (see Morisio and Tsoukias
1997, Blin and Tsoukias 1998, Stamelos and Tsoukias 1998). The key idea was that
each node of the hierarchy was an evaluation model itself for which the evaluation
dimensions to aggregate and the aggregation procedure had to be defined. Each
node was subject to extensive discussion before arriving at a final version. Basically
two issues have been considered in such discussions:
- the choice of the attributes to use;
- the semantics of each attribute.
Regarding the first issue, a frequent attitude of technical committees charged
with evaluating complex objects (as in our case) is to define an "excellence list"
where every possible aspect of the object is considered. Such a list is generally
provided by the literature, experience, international standards etc. The result
is that such a list is an abstract collection of attributes, independent from the
specific problem at hand, thus containing redundancies and conceptual dependencies
which can invalidate the evaluation. Our client was aware of the problem, but had
no knowledge and no tools to enable him to simplify and reduce the first version
of the list they had defined. The repeated use of a coherence test (in the sense
of Roy and Bouyssou 1993) for each intermediate node of the hierarchy made it
possible to eliminate a significant number of redundant and dependent attributes
(more than 30%) and to better understand the semantics of each attribute used.
Verifying the separability of each subdimension with respect to the parent node
was very helpful, in the sense that each subnode should be able, on its own, to
discriminate among the offers with respect to the evaluation considered at the parent level.
Despite this work, the client wrote, in his ex-post considerations: "....it was
not necessary to be so detailed in the evaluation; the whole process could be faster
because we needed the software for a due date; it could be preferable to use a limited
number of criteria....". On the other hand it is also true that it is only after the
process that the client was able to determine which were the really significant
criteria that discriminated among the alternatives.
With respect to the second issue we pushed the client to provide us with a short
description of each attribute and when a preference model was associated to it, a
short description of the model (why a certain value was considered as better than
another). Such an approach helped the client both to eliminate redundancies (be-
fore using the coherence test which is time consuming) and in better understanding
the contents of the evaluation model.
For instance, at a certain point in the hierarchy definition process, there was
a discussion about some attributes that could also be considered as leaves at the
top level of the hierarchy. These were the so called process attributes, i.e. they
were intended to evaluate special functionality inside different processes (in this
context process means a chunk of functionality aiming towards supporting a
stream of activities of a software). In fact, one can consider a process attribute (at
the final level) and then subdivide it in quality aspects, or alternatively consider
single independent quality aspects whose evaluation depends on how the process
attribute is considered. The final choice was to put process attributes at the top
level because they emanate directly from the evaluation scope.
Such an activity also helped the client to realise that they needed an absolute
evaluation of the alternatives for almost all the intermediate nodes of the hierarchy
thus implicitly defining the problem statement of the model.
The basic information available was of the "subjective ordinal measurement"
type. With this term we want to indicate that each alternative could be described
by a vector of the 134 elementary pieces of information that were in the large
majority either subjective evaluations by experts (mostly part of the team of
analysts, the client) of the "good", "acceptable" etc. type or descriptions of the
"operating system X", "compatible with graphic engine Y" etc. type. The latter
were expressed on nominal scales, while the former were expressed on ordinal
scales. It was almost impossible for the experts to give more information
than such an order and it was exactly this type of information that
pushed the client to look for an evaluation model other than the usual weighted
sum widely used in software evaluation manuals and standards (see ISO/IEC
9126 1991, IEEE 92 1992).

Obtaining the information was not a difficult task, but a time consuming pro-
cess that required the establishment of an ad-hoc procedure during the process
(see figure 9.1). We consider that this is also a critical issue in a decision aiding
process. Gathering and obtaining the relevant information for an evaluation model
is often considered as a second level activity and therefore neglected from further
specific considerations. But such a problem can invalidate the problem formulation
adopted. Moreover, the information used in an evaluation model results from the
manipulation of the rough information available at the beginning of the process.
We can consider that the information is constructed during the decision aiding
process and cannot be viewed as a simple input.
Before continuing the definition of the model associated to each node the prob-
lem of the aggregation procedure was faced since it could influence the construction
of such models. An important discussion with the client concerned the distinction
between measures and preferences.
As already reported, the basic information consisted either in observations
concerning the offers (expressed on nominal scales) or in expert judgements (expressed
on ordinal scales of value of the "good", "acceptable" etc. type). All the
intermediate nodes were expected to provide information of the second type. Clearly all
nominal scales had to be transformed into ordinal ones, associating a preference
model to the elements of the nominal scale of the attribute. From such a perspective
it was important for the client to understand what they were expressing
their preferences on.
Actually, the client did not compare the alternatives amongst themselves, but
to standards of "good", "acceptable" etc. defined a priori (by the client). When
asked to formulate preferences, the client expressed them on the elements of the
nominal scales and not on the alternatives themselves. The preference among the
alternatives was expected to be induced once the alternatives could be "measured"
by the attributes.
From a certain point of view we can claim that, except for the final aggregation
level, the client needed to aggregate ordinal measures and not preferences (in the
sense that they had to aggregate the ordinal measures obtained when comparing
the alternatives to the standards and not to compare the alternatives amongst
themselves). Such an observation greatly helped the client to understand the
nature and scope of the evaluation model and ultimately to define the problem
statement of the model. Moreover, the discussion on the different typologies of
measurement scales helped the client to understand the problem of choosing an
appropriate aggregation procedure.
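
The difficulty with ordinal scales can be illustrated by a small sketch. The five grade labels, the two offers and the two numeric codings below are hypothetical and were not part of the case; the point is only that two codings respecting the same order of grades can reverse the result of a weighted sum, which is why such a sum is not meaningful on purely ordinal information.

```python
# Ordinal grades, worst first. Only the order carries information.
GRADES = ["U", "A", "G", "VG", "E"]

# Two numeric codings that both respect the order of the grades.
coding_1 = {"U": 0, "A": 1, "G": 2, "VG": 3, "E": 4}
coding_2 = {"U": 0, "A": 1, "G": 5, "VG": 6, "E": 7}


def weighted_sum(grades, coding, weights):
    """Weighted sum of the numeric codes attached to the grades."""
    return sum(w * coding[g] for g, w in zip(grades, weights))


# Two hypothetical offers judged on two equally weighted attributes.
offer_a = ["E", "A"]
offer_b = ["G", "G"]
weights = [1, 1]

for name, coding in [("coding_1", coding_1), ("coding_2", coding_2)]:
    score_a = weighted_sum(offer_a, coding, weights)
    score_b = weighted_sum(offer_b, coding, weights)
    preferred = "a" if score_a > score_b else "b"
    print(name, score_a, score_b, "-> preferred offer:", preferred)
```

Under the first coding offer a obtains the higher score, under the second offer b does, although the grades themselves have not changed.
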
In our case, the presence of ordinal information for almost all leaves and the
problem statement that required a repeated sorting of the offers, oriented the
team of analysts to choose an aggregation procedure based on the ELECTRE-TRI
method (see Yu 1992). See also appendix A for a presentation of the procedure.
At this point the team was ready to define their specific evaluation models for all
nodes. In particular we had the following cases.

1. For all leaf nodes an ordinal scale was established. The available technical
knowledge consisted in different possible states in which an offer could find
itself. For instance, consider the leaf nodes 1.1.1 (type of presentation on
the user interface in the land-base management), 1.1.2 (graphic engine of
the user interface in the land-base management), 1.1.3 (customisation of the
user interface in the land-base management). The possible states on these
characteristics were:
1.1.1: standard graphics (SG), non standard graphics (NSG);
1.1.2: station M (M; graphic engine already adopted in other software used
in the company), other acceptable graphic engine (OA), other non acceptable
graphic engine (ON);
1.1.3: availability of a graphic tool (T), availability of an advanced graphic
language (E), availability of a standard programming language (S), no
customisation available (N). In this case different combinations were
possible (for instance a software could provide both an advanced graphic
language and a standard programming language: value E,S). The three ordinal
scales associated to the three nodes were (≻ representing the scale order):
1.1.1: SG ≻ NSG;
1.1.2: M ≻ OA ≻ ON;
1.1.3: T,E,S ≻ T,E ≻ T,S ≻ T ≻ E,S ≻ E ≻ S ≻ N.
2. For all parent nodes, a brief descriptive text of what the node was expected
to evaluate was provided. All parent nodes were equipped with the same
number of classes: unacceptable (U), acceptable (A), good (G), very good
(VG), excellent (E). Then, two possibilities for defining the relationship between
the values on the subnodes and the values on the parent nodes were
established.
2.1 When possible, an exhaustive combination of the values of the subnodes
was provided. For instance consider node 1.1 (user interface
of the land-base management) which has the three evaluation models
introduced in the previous example as subnodes. In this case we have
the following evaluation model:
- E: T,E,S;M;SG or T,E;M;SG or T,S;M;SG;
- VG: T;M;SG or T,E,S;OA;SG or T,E;OA;SG or T,S;OA;SG;
- G: T;OA;SG or E,S;M;SG or E;M;SG;
- A: all remaining cases except the unacceptable;
- U: all cases where 1.1.1 is NSG or 1.1.2 is ON or 1.1.3 is N.
2.2 When an exhaustive combination of the values was impossible, an ELECTRE-
TRI procedure was used. For this purpose, the following information
was requested:
- the relative importance of the different sub nodes;
- the concordance threshold for the establishment of the outranking re-
lation among the offers and the profiles;
- a veto condition on the sub node such that the value on the parent
node could be limited (possibly unacceptable).

The relative importance of the subnodes and the concordance threshold
were established using a reasoning on coalitions (for details see Chapter
6). In other words the team of analysts established the characteristics of the
subnodes for which an offer could be considered very good (therefore should
outrank the very good profile) and consequently compared the values of the
parameters of relative importance and of the concordance threshold. The
veto condition was established as the presence of the value unacceptable
at a subnode. The presence of a veto also produced an unacceptable
value at the level of the parent node. In other words, the team of analysts
considered any unacceptable value to be a severe technical limitation of the
offer. The reader may notice that this is a very strong interpretation of a veto
condition among the ones used in the outranking based sorting procedures,
but it was the one with which the team of analysts felt comfortable at the
time of construction of the evaluation model. The team of analysts also
established very high concordance thresholds (never less than 80%, very
often around 90%) that result in very severe evaluations. Such a choice
reflected the conviction, of at least a part of the team of analysts, that very
strong reasons were required to qualify an offer as very good. Since the whole
model was calibrated starting from the very good value, this conviction had
wider effects than the team of analysts could imagine. For example we can
take node 1 (land-base management) which has eight sub nodes:
1.1: User interface;
1.2: Functionality;
1.3: Development environment;
1.4: Administration tools;
1.5: Work flow connection;
1.6: Interoperability;
1.7: Integration between land-base products and the Spatial Data manager;
1.8: Integration among land-base products;
The relative importance parameters were established as follows: w(1.1) =
4, w(1.2) = 8, w(1.3) = 5, w(1.4) = 4, w(1.5) = 1, w(1.6) = 4, w(1.7) =
8, w(1.8) = 2 and the concordance threshold was fixed as 29/36 (around
0.8). Such choices imply that no coalition that excluded nodes 1.2 or 1.7
was acceptable and that the smallest acceptable coalition should necessarily
include the nodes 1.2, 1.7, 1.3 and any two of the nodes 1.1, 1.4 and 1.6.
The analyst and the supervisor explained this aspect to the client who, on
this basis, revised the importance parameters several times.

3. As already mentioned, the set of dimensions was built around two basic
points of view: the "quality" and the "performances". The first generated
six evaluation dimensions, which will be called the "quality attributes" or
"quality criteria" or "quality part of the hierarchy" hereafter, corresponding
to six (among seven) of the root nodes of the model. The seventh root
node (node 7, subnodes 7.1, 7.2, 7.3, 7.4) concerned the evaluation of the
performances of the prototypes submitted to tests by the team of analysts.
Such performances are basically measured in the time necessary to execute
a set of specific tasks under certain conditions and with some external fixed

parameters. For instance, consider node 7.3 (performance under load). The
dimension is expected to evaluate the performance of the prototype while
the quantity of data that have to be elaborated increases. The value v(x)
(x being an offer) combines an observed measure $W_x(t)$ and an interpolated
one $T_x(t)$ (t representing the data load; the interpolation is not necessarily
linear). The combination is obtained, in this case, through the following
formula:
$$ v(x) = \int W_x(t)\, T_x(t)\, dt $$

In this case there are no external profiles with which to compare the performances
because the prototypes are created ad-hoc, the technology is quite
new and there are no standards of what a "very good" performance could be.
An ordinal scale was created considering the best performances as "first",
all performances presenting a difference of more than 5% and less than 20%
"second", all performances presenting a difference of more than 20% and less
than 25% "third", all performances presenting a difference of more than 25%
and less than 50% "fourth" and all performances presenting a difference of
more than 50% "fifth". The same model was applied to all subnodes of
node 7. A sorting procedure could then be established to obtain the final
evaluation.
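The performance model described in this point can be sketched as follows. This is only an illustration under stated assumptions, not the spreadsheet used in the case study: the integral is approximated with the trapezoidal rule, a smaller value (less time) is assumed to be better, gaps of at most 5% are assumed to share the first rank, and values falling exactly on a boundary are assigned to the better class. The function names and the sample values are hypothetical.

```python
from typing import Callable, Dict, Sequence


def combined_value(loads: Sequence[float],
                   observed: Callable[[float], float],
                   interpolated: Callable[[float], float]) -> float:
    """Trapezoidal approximation of the integral of W_x(t) * T_x(t) over the loads."""
    total = 0.0
    for t0, t1 in zip(loads, loads[1:]):
        f0 = observed(t0) * interpolated(t0)
        f1 = observed(t1) * interpolated(t1)
        total += 0.5 * (f0 + f1) * (t1 - t0)
    return total


def ordinal_rank(value: float, best: float) -> int:
    """Map the relative gap to the best value onto the five-level scale (1 = first)."""
    gap = abs(value - best) / best
    if gap <= 0.05:  # assumption: gaps of at most 5% share the first rank
        return 1
    if gap <= 0.20:
        return 2
    if gap <= 0.25:
        return 3
    if gap <= 0.50:
        return 4
    return 5


def rank_offers(values: Dict[str, float]) -> Dict[str, int]:
    best = min(values.values())  # assumption: smaller (faster) is better
    return {name: ordinal_rank(v, best) for name, v in values.items()}


# Hypothetical values v(x), for illustration only.
example = {"O1": 100.0, "O2": 110.0, "O3": 123.0, "O4": 140.0, "O5": 170.0}
ranks = rank_offers(example)
```

With these made-up values the five offers fall into the five successive classes of the scale.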

This process was repeated for all the intermediate nodes up to the seven root
nodes representing the seven basic evaluation dimensions. It took four to five
months for all the nodes to be equipped with their evaluation model and the
process generated several discussions inside the team of analysts, mainly of a
technical nature (concerning the specific contents of the values for each node).
The most discussed concepts of the model were the concordance threshold and the
veto condition, since part of the team considered that the required levels were
extremely severe. However, since such an approach corresponded to a cautious
attitude, it prevailed in the team and was finally accepted. The length of the
process is explained not only by the number of nodes to define, but also by the
fact that the team of analysts was obliged to define a new measurement scale and
a precise measurement aggregation procedure for each node. Although this process
can often be qualified as "subjective measurement", it was the only way to obtain
meaningful values for the offers. The set of criteria to be used, if a preference
aggregation comparing the alternatives amongst themselves was requested, was
defined as the seven root nodes equipped with a simple preference model: the
weak order induced by the ordinal scale associated with each of these nodes.
No exogenous uncertainty was considered in the evaluation model. The
information provided by the tenderers concerning their offers was considered to
be reliable and the use of ordinal scales made it possible to avoid the problems
of imprecision or of measurement errors. This reasoning, however, holds less well
for node 7 and its subnodes, but the team of analysts felt sufficiently confident
with the tests and did not analyse the problem further. Some endogenous
uncertainty appeared as soon as the model was put into practice (the offers being
available). We shall discuss this problem in more detail in the next section
(concerning the elaboration of the final recommendation), but we can anticipate
that the problem was created by the double evaluation provided by the chosen
ELECTRE-TRI type aggregation, which consists of an optimistic and a pessimistic
evaluation that do not necessarily coincide.
The evaluation model was recorded in a formal document that was submitted
(and explained) to the final client, who gave his consent. It is worth
noting that the final client was not able to participate in the elaboration of the
model (technical details, establishment of the parameters etc.). Part of the team
of analysts (some of the external consultants) acted as his delegates. The
establishment of the evaluation model and its acceptance by the client opened the
way for its application on the set of offers received and for the elaboration of the
final recommendation.
The client greatly appreciated their involvement in the establishment of the
evaluation model, which turned out to be a product they considered their own
(from their ex-post remarks: "....this (the involvement) turned out to be
important....for the acceptability of the evaluation results"). The fact that
each node of the hierarchy was discussed, analysed and finally defined by the
team of analysts allowed them to understand the consequences for the global
level, to explain the contents of the model to their client and to justify the
final result on the grounds of their own knowledge and experience, not of the
procedure adopted.
In other words we can claim that the model was validated during its construc-
tion. Such an approach helped both the acceptability of the model and the final
result, eased the discussion when the question of the final aggregation was settled
and definitely legitimated the model in the eyes of the client.

9.3.3 The final recommendation

The evaluation of the six offers that were actually submitted after the call
for tenders was carried out in two main steps. The first consisted in evaluating
the six quality attributes and the second in testing the prototypes provided by
the tenderers.
The method adopted to aggregate the information and construct the final eval-
uations was a variant of the ELECTRE TRI procedure (see Yu 1992). The reader
can also see Appendix A and refer to Chapter 6 for more details. We have the
following remarks on the use of such a method.

1. The key parameters used in the method are the profiles (to which the al-
ternatives are compared in order to be classified in a specific class), the
importance of each criterion for each parent criterion classification and the
concepts of concordance thresholds and veto conditions.
For each intermediate node such parameters were extensively discussed
before reaching a precise numerical representation. As already mentioned in
section 9.3.2, the relative importance of each criterion and the concordance
threshold were established using a reasoning based on the identification of
the winning coalitions enabling the outranking relation to hold. The veto
condition was initially perceived as a theoretical possibility of no practical
use and then as an eliminatory threshold, but the client soon realised its
importance, mainly when an incomparability was needed instead of an
indifference, the latter being counterintuitive when very different objects
were compared. Later on, as soon as the veto conditions were understood by the
client, they decided to introduce a similar concept each time they wanted to
distinguish between positive reasons (for the establishment of the outranking
relation) and negative reasons (against the establishment of the outranking
relation), since these are not necessarily complementary and must be evaluated
separately and independently.
The profiles were established using the knowledge of the team of analysts
(experts in their domain), who were able to identify the minimal requirements
to qualify an object in a certain class. It is interesting to notice that, for
the client, the intuitive idea of a profile was that of a typical object of a
class and not of the lower bound of the class. The shift from the intuitive idea
to the one used in the case study was immediate and presented no problems.
The fact remains that the distinction between the two concepts of profile
is crucial and that the lower bound approach appears to be less intuitive than
the typical element one.

2. The whole method (and the model) was implemented on a spreadsheet. This
was of great importance because spreadsheets are a basic tool for communication
and work in all companies and enable an immediate understanding of the results.
Moreover, they enabled on-line what-if operations when specific problems,
concerning precise information and/or evaluation, appeared during the
discussions inside the team of analysts. The experimental validation of the
model was greatly eased by the use of the spreadsheet.
Furthermore, it helped the acceptability and legitimation of the model through
the idea that "if it can be implemented on a spreadsheet it is sufficiently
simple and easy to be used by our company". In fact some of the critiques by
the client about the approach adopted in this case were that "....MCDA is
not yet a universally known method....", "....seems less intuitive than other
well known techniques such as the weighted sum...", "is time consuming
to apply a new methodology....", all these problems limiting the acceptability
of the methodology towards the client's client (the IS manager) and the
company more generally. Being able to implement the method and the model
on a spreadsheet was, for them, a proof that, although new, complex and
apparently less intuitive, the method was simple and easy and therefore
legitimately used in the decision process.

A specific problem raised in the first step was the generation of uncertainty
due to the aggregation procedure. The ELECTRE-TRI type procedure adopted
produces an interval evaluation consisting of a lower value (the pessimistic
evaluation) and an upper value (the optimistic evaluation). When an alternative
has a profile on the subnodes that is very different from the profiles of the
classes on the parent node then, due to the incomparabilities that occur when
comparing the alternative to the profiles, it may happen that the two values do
not coincide (see more details in Appendix A). When the user of the model is not
able to choose one of the two evaluations, this can be a problem in a
hierarchical aggregation, since at the next aggregation the subnodes may have
evaluations expressed on an interval. This is a typical case of endogenous
uncertainty created by the method itself and not by the available information.
The client was keen to consider the pessimistic and optimistic evaluations as
bounds of the "real" value, but there was no uncertainty distribution on the
interval. For this purpose, the following procedure was adopted. Two distinct
aggregations were made, one where the lower values were used and the other where
the upper values were used. Each of these, in turn, may produce a lower value
and an upper value. At the next aggregation step, the lowest of the two lower
values and the highest of the two upper values are used. This is a cautious
attitude and has the drawback of widening the intervals as the aggregation goes
up the hierarchy. However, this effect did not occur here and the final result
for the six dimensions is represented in table 9.1 (from here on we will
represent the criteria by Ci and the alternatives by Oi).

O1 O2 O3 O4 O5 O6

Table 9.1: the values of the alternatives on the six quality criteria (U: unacceptable,
A: acceptable, G: good, VG: very good, E: excellent)
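The cautious handling of interval evaluations described above (two aggregation runs, then the lowest lower value and the highest upper value) can be sketched as follows. The function `toy_aggregate` is a hypothetical stand-in for the ELECTRE-TRI type procedure, introduced only to make the example runnable; grades are encoded as positions on the scale U, A, G, VG, E, and the child intervals are invented.

```python
from typing import Callable, List, Sequence, Tuple

# (pessimistic, optimistic) positions on an ordinal scale, worst = 0
Interval = Tuple[int, int]

SCALE = ["U", "A", "G", "VG", "E"]


def propagate(children: Sequence[Interval],
              aggregate: Callable[[List[int]], Interval]) -> Interval:
    """Aggregate once with the lower values and once with the upper values,
    then keep the lowest lower value and the highest upper value."""
    low_run = aggregate([low for low, _ in children])
    high_run = aggregate([high for _, high in children])
    return (min(low_run[0], high_run[0]), max(low_run[1], high_run[1]))


def toy_aggregate(values: List[int]) -> Interval:
    """Hypothetical aggregation: (worst child, median child)."""
    ordered = sorted(values)
    return (ordered[0], ordered[len(ordered) // 2])


# Three subnodes evaluated G-VG, A-A and VG-E (invented data).
children = [(2, 3), (1, 1), (3, 4)]
parent = propagate(children, toy_aggregate)
parent_labels = (SCALE[parent[0]], SCALE[parent[1]])
```

In this invented case the parent interval (A to VG) is wider than either single run would give, which is the widening effect mentioned in the text.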

We consider that the problem of interval evaluation on ordinal scales is an open
theoretical problem that deserves future consideration (to our knowledge, very
little literature on the subject is available: see Roubens and Vincke 1985,
Vincke 1988, Pirlot and Vincke 1997, Tsoukias and Vincke 1999).
Another modification introduced in the aggregation procedure concerned the
use of the veto concept. As already mentioned, a strong veto concept was used
in the evaluation model such that the presence of an unacceptable value on any
node (among the ones endowed with such veto power) could result in a global
unacceptable value. However, during the evaluation of the offers, weaker
concepts of veto appeared necessary. The idea was that certain values could have
a limitation effect of the type: "if an offer has the value x on a subnode then
it cannot be more than y on the parent node".
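Such a limitation effect amounts to a ceiling applied to the parent value. A minimal sketch, assuming the five-grade scale used in the tables; the node names and the rules in `caps` are hypothetical and serve only to illustrate the mechanism:

```python
SCALE = ["U", "A", "G", "VG", "E"]  # worst to best
RANK = {label: i for i, label in enumerate(SCALE)}


def apply_caps(parent_value, child_values, caps):
    """caps maps (child, child_value) to the highest parent value still allowed."""
    result = parent_value
    for child, value in child_values.items():
        ceiling = caps.get((child, value))
        if ceiling is not None and RANK[result] > RANK[ceiling]:
            result = ceiling
    return result


# Hypothetical rules: U on node 1.6 forces U (strong veto),
# A on node 1.6 limits the parent to G (weak veto).
caps = {("1.6", "U"): "U", ("1.6", "A"): "G"}

weak = apply_caps("VG", {"1.2": "G", "1.6": "A"}, caps)
strong = apply_caps("VG", {"1.2": "G", "1.6": "U"}, caps)
untouched = apply_caps("A", {"1.2": "G", "1.6": "A"}, caps)
```

The strong veto of the evaluation model appears here as the special case in which the ceiling is the unacceptable grade.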

The results on node 7 concerning the performances of the prototypes are
presented in table 9.2.

       O1    O2    O3    O4    O5    O6
C7     A-A   G-G   G-G   A-A   E-E   A-A

Table 9.2: the values of the alternatives on the performance criterion (U:
unacceptable, A: acceptable, G: good, VG: very good, E: excellent)

Remember that such a result is an ordinal scale obtained by
aggregating the four scales defined as explained in the previous section. Therefore,
it could be considered more as a ranking than as an absolute evaluation. For this
reason the team of analysts decided to use such an attribute only to rank the
different offers after their sorting obtained by using the six quality attributes. For
this purpose the team of analysts tested three different aggregation scenarios
corresponding to three different hypotheses about the importance of the
performance attribute.

1. The performance attribute is considered to have the same importance as the
set of six quality attributes. This scenario represents the idea that the tests
on the software performances correspond to the only "real" or "objective"
measurement of the offers and that they should therefore be viewed as a
validation of the result obtained through the "subjective" measurement carried
out on the six quality attributes. The aggregation procedure consisted in using
the six quality attributes as criteria equipped with a weak order from which to
obtain a final ranking. Since the evaluations for some of the six attributes
were in the form of an interval, an extended ordinal scale was defined in order
to induce the weak order:
$E \succ VG \succ G\text{-}VG \succ G \succ A\text{-}VG \succ A\text{-}G \succ A \succ U$.
The importance parameters are w(1.) = 2, w(2.) = 2, w(3.) = 4, w(4.) =
1, w(5.) = 4, w(6.) = 2 and the concordance threshold is 12/15 (0.8). The six
orders are the following (x,y standing for indifference between x and y):
- $O5 \succ O2 \succ O3 \succ O4 \succ O1, O6$;
- $O2 \succ O5 \succ O3 \succ O4 \succ O6 \succ O1$;
- $O2 \succ O4 \succ O3 \succ O5, O1, O6$;
- $O2, O4 \succ O3, O5 \succ O1, O6$;
- $O2, O5 \succ O3, O4 \succ O1, O6$;
- $O3 \succ O2 \succ O6, O4 \succ O5 \succ O1$.
The final result is presented in table 9.3. In order to rank the alternatives
a score is computed for each of them. It is the difference between the number
of alternatives to which this specific alternative is preferred and the number
of alternatives preferred to it. Then, the alternatives are ranked by
decreasing value of this score. The final ranking thus obtained is given in
figure 9.2 2a (it is worthwhile noting that the indifference obtained in the
final ranking corresponds to incomparabilities obtained in the aggregation
step). An intersection was therefore operated with the ranking obtained on
node 7, resulting in the final ranking reported in figure 9.2 2b.

Figure 9.2: 2a: the final ranking using the six quality criteria. 2b: the final ranking
as intersection of the six quality criteria and the performance criterion

2. The performance attribute is considered to be of secondary importance, to
be used in order to distinguish among the alternatives assigned to the same
class using the six quality attributes. In other words, the principal
evaluation was to be considered as the one using the six quality attributes and
the performance evaluation was only a supplement enabling a possible further
distinction. Such an approach reflected the low confidence awarded to the
performance evaluation and the undesirability of assigning it high importance.
A lexicographic aggregation was therefore applied, using the six quality
criteria as in the previous scenario and applying the performance criterion to
the equivalence classes of the global ranking. The final ranking is
$O2 \succ O5 \succ O3 \succ O4 \succ O6 \succ O1$.
3. A third approach consisted in considering the seven attributes as seven
criteria to be aggregated to obtain a final ranking, assigning them a reasoned
importance parameter. The idea was that while the client could be interested in
having the absolute evaluation of the offers (a result obtainable only using the
six quality attributes), he could also be interested in a ranking of the
alternatives that could help him in the final choice. From this point of view
the absolute evaluations of the six quality attributes were transformed into
rankings as in the first scenario, adding the seventh attribute as a seventh
criterion. The seven weak orders are the following:
- $O5 \succ O2 \succ O3 \succ O4 \succ O1, O6$;
- $O2 \succ O5 \succ O3 \succ O4 \succ O6 \succ O1$;
- $O2 \succ O4 \succ O3 \succ O5, O1, O6$;
- $O2, O4 \succ O3, O5 \succ O1, O6$;
- $O2, O5 \succ O3, O4 \succ O1, O6$;
- $O3 \succ O2 \succ O6, O4 \succ O5 \succ O1$;
- $O5 \succ O2, O3 \succ O4, O6, O1$.
The importance parameters are w(1.) = 2, w(2.) = 2, w(3.) = 4, w(4.) =
1, w(5.) = 4, w(6.) = 2, w(7.) = 4 and the concordance threshold is 16/19
(more than 0.8). The final result is reported in table 9.4.
Using the same ranking procedure the final ranking is now:
$O2 \succ O5 \succ O3, O4 \succ O6 \succ O1$.

      O1  O2  O3  O4  O5  O6
O1     1   0   0   0   0   0
O2     1   1   1   1   1   1
O3     1   0   1   0   0   1
O4     1   0   0   1   0   1
O5     1   0   0   0   1   1
O6     1   0   0   0   0   1

Table 9.3: the outranking relation aggregating the six quality criteria

      O1  O2  O3  O4  O5  O6
O1     1   0   0   0   0   0
O2     1   1   1   1   0   1
O3     1   0   1   0   0   1
O4     1   0   0   1   0   1
O5     1   0   0   0   1   1
O6     1   0   0   0   0   1

Table 9.4: the outranking relation aggregating the seven criteria

Finally, after some discussions with the client, the third scenario was
adopted and used as the final result. The two basic reasons were:
- while it was meaningful to interpret the ordinal measures for the six quality
attributes as weak orders representing the client's preferences, it was not
meaningful to translate the weak order obtained for the performance attribute
as an ordinal measurement of the offers;
- the first and second scenarios implicitly adopted two extreme positions
concerning the importance of the performance attribute that correspond to two
different philosophies present in the team of analysts, but not to the client's
perception of the problem. The importance parameters and the concordance
threshold adopted in the final version made it possible to define a compromise
between these two extreme positions expressed during the decision aiding process.
In fact the performance criterion is associated with an importance parameter
of 4 which combined with the concordance threshold of 16/19 implies that it is
impossible for an alternative to outrank another if its value on the performance
criterion is worse (and this satisfied the part of the team of analysts that considered
the performance criterion as a critical evaluation of the offers). Giving a regular
importance parameter to the performance criterion avoided the extreme situation
in which all other evaluations could become irrelevant. The final ranking obtained
respects this idea and the outranking table could be understood by all the members
of the team of analysts. As already reported, the client considered the approach
to be useful because every activity was justified. A major concern for people
involved in complex decision processes is to be able to justify their behaviour,
recommendations and decisions towards a director, a superior in the hierarchy of
the company, an inspector, a committee etc. Such a justification applies both to
how a specific result was obtained and to how the whole evaluation was conducted.
In this case, for instance, the choice of the final aggregation was justified by
a specific attitude towards the two basic evaluation points of view: the quality
information and the performance of the prototypes. It was extremely important
for the client to be able to summarise the correspondence between an aggregation
procedure and an operational attitude because it enabled them to better argue
against the possible objections of their client.
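The blocking effect of the performance criterion stated at the beginning of this paragraph can be checked directly from the importance parameters of the third scenario (a short verification using only the figures given in the text):

$$\sum_{j=1}^{7} w_j = 2 + 2 + 4 + 1 + 4 + 2 + 4 = 19, \qquad \sum_{j=1}^{6} w_j = 19 - 4 = 15 < 16 .$$

Even when all six quality criteria support an alternative, the concordance reaches only 15/19 (about 0.79), below the threshold of 16/19 (about 0.84), so an alternative that is worse on the performance criterion cannot outrank the other. Conversely, a weight of 4 is far from sufficient on its own, which is why the quality evaluations remain relevant.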

A final question that arose while the final recommendation was being
elaborated was whether it would be possible to provide a numerical
representation of the values obtained by the offers and of the final ranking. It
soon became clear that the question originated from the wish of the final client
to be able to negotiate with the AQ manager on a monetary basis, since it was
expected that the latter would introduce the cost dimension into the final
decision.
For this purpose an appendix was included in the final recommendation where
the following was emphasised:
- it is possible to give a numerical representation to both the ordinal measurement
obtained using the six quality attributes and the final ranking obtained using
the seven criteria, but it is meaningless to use such a numerical representation
in order to establish implicit or explicit trade-offs with a cost criterion;
- it is possible to compare the result with a cost criterion in one of two
possible ways:
1.) either induce an ordinal scale from the cost criterion and then, using an
ordinal aggregation procedure, construct a final choice (then the negotiation should
concentrate on defining the importance parameters, the thresholds etc.);
2.) or establish a value function of the client using one of the usual protocols
available in the literature (see also Chapter 6) to obtain the trade-offs between the
quality evaluations, the performance evaluations and the cost criterion (then the
negotiations should concentrate on a value function);
- the team of analysts was also available to conduct this part of the decision aiding
process if the client desired it.
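The first point of this appendix can be illustrated with a small sketch: two numerical encodings that both respect the final ranking lead to different "best" offers once they are traded off against a cost figure. The cost values and the two encodings are invented for the purpose of the example; only the ranking comes from the text.

```python
# Final ranking of the third scenario, best class first.
CLASSES = [["O2"], ["O5"], ["O3", "O4"], ["O6"], ["O1"]]

# Hypothetical costs in arbitrary units (not taken from the case study).
COST = {"O1": 0.5, "O2": 3.0, "O3": 2.0, "O4": 2.0, "O5": 1.0, "O6": 1.0}


def encode(levels):
    """Attach one number per class; any strictly decreasing list respects the ranking."""
    return {offer: value for cls, value in zip(CLASSES, levels) for offer in cls}


def best_trade_off(values, cost):
    """Offer maximising value minus cost, i.e. a naive explicit trade-off."""
    return max(values, key=lambda offer: values[offer] - cost[offer])


encoding_a = encode([5, 4, 3, 2, 1])
encoding_b = encode([10, 4, 3, 2, 1])

choice_a = best_trade_off(encoding_a, COST)
choice_b = best_trade_off(encoding_b, COST)
```

Both encodings represent exactly the same ordinal information, yet the first selects O5 and the second O2, which is why such numbers cannot support a negotiation on trade-offs.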
The final client was very satisfied with the final recommendation and was also
able to understand the reply about the numerical representation. He nevertheless
decided to conduct the negotiations with the AQ manager personally and so the
team of analysts terminated its task with the delivery of the final recommendation.
A final remark is that there was certainly room (but no time) to experiment
with more variants and methods for the aggregation procedure and the
construction of the final recommendation. Valued relations, valued
similarity relations, interval comparisons using extended preference structures,
dynamic assignment of alternatives to classes and other innovative techniques
were considered too new by the client, who already considered the use of an
approach different from the usual grid and weighted sum a revolution (compared
with the company's standards). In their view, being able to aggregate the
ordinal information available in a correct and meaningful way was more than
satisfactory, as they report in their ex-post remarks: "....pointed out that it
was not necessary to always use ratio scales and weighted sums, as we thought
before, but that it was possible to use judgements and aggregate them.....".

9.4 Conclusions
Concluding this chapter we may try to summarise the lessons learned in this real
experience of decision support.
The most important lesson perhaps concerns the process dimension of decision
support. What the client needed was continuous assistance and support during
the decision process (the management of the call for tenders) enabling them to
understand their role, the expected results, and the way to provide a useful
contribution. If the support had been limited to answering the client's demand
on how to define a global evaluation (based on the weighted sum of their notes
on the products), we might have provided them with an excellent multi-attribute
value model that would have been of no interest for their problem. This is not
an argument against multi-attribute value based methods, which in other decision
aiding processes can be extremely useful, but an emphasis on a process based
decision aiding activity. A careful analysis of the problem situation, a
consensual problem formulation, a correct definition of the evaluation model and
an understandable and legitimated final recommendation are the products that we
have to provide in a decision aiding process.
A second lesson learned concerns the "ownership" of the final recommendation.
By this we mean that the client will be much more confident in the result and
much more ready to apply it if he feels that he owns the result, in the sense
that it is a product of his own convictions, values, computations, experience,
simulations and whatever else. Such ownership can be achieved if the client not
only participates in elaborating the parameters of the evaluation model, but
actually builds the model with the help of the analyst (which has been the case
in our experience). Although the specific case may be considered exceptional
(due to the specific dimension of the evaluation model and the double role of
the client, being analyst for another client at the same time), we claim that it
is always possible to include the client in the construction of the evaluation
model in a way that allows him to feel responsible for and to own the final
recommendation. Such ownership greatly eases the legitimisation of the
recommendation, since it is not just the "advice recommended by the experts who
do not understand anything". It might be interesting to notice that a customised
implementation of the model on the tools to which the client is accustomed (as
in our case the company spreadsheet) greatly improves the acceptance and
legitimisation of the evaluation model.
A third lesson concerns the key issue of meaningfulness. The construction of
the evaluation model must obey two dimensions of meaningfulness. The first is
a theoretical and conceptual one and refers to the necessity to manipulate the
information in a sound and correct way. The second is a practical one and refers
to the necessity to manipulate the information in a way understandable by the
client and corresponding to his intuitions and concerns. It is possible that these
two dimensions conflict. However, the evaluation model has to satisfy both
requirements, thus implying a process of adaptation guided by reciprocal learning
for the client and the analyst. The existence of clear and sound theoretical re-
sults for the use of specific preference modelling tools, preference and/or measure
aggregation procedures and other modelling tools definitely helps such a process.
A fourth lesson concerns the importance of the distinction between measures
and preferences. The former refer to observations made on the set of alternatives
either through "objective" or through "subjective" measures. The latter refer
to the client's values, are always subjective and depend on the problem situation.
Moving from one to the other might be possible, but it is not obvious and has to be
carefully studied. Knowing that a piece of software has n function points, while
another has m function points, does not imply any particular preference between
them. We hope that the case study offered an introduction to this problem.
A fifth lesson concerns the definition of the aggregation procedure in the evalu-
ation model. The previous chapters of this book provide enough evidence that uni-
versal methods for aggregating preferences and/or measures do not exist. There-
fore, the aggregation procedures included in an evaluation model are choices that
have to be carefully studied and justified.
A sixth lesson is about uncertainty. Even when the available information is
considered reliable, uncertainty may appear (as in our case). Moreover, uncer-
tainty can appear in a very qualitative way and not necessarily in the form of an
uncertainty distribution. It is necessary to have a large variety of uncertainty rep-
resentation tools in order to include the relevant one in the evaluation model. Last,
but not least, we emphasise the significant number of open theoretical problems
the case study highlights (interval evaluation, ordinal measurement, hesitation
modelling, hierarchical measurement, ordinal value theory etc.).

Appendix A
The basic concepts adopted in the procedure used (based on ELECTRE TRI) are
the following.

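A set $A$ of alternatives $a_i$, $i = 1 \cdots m$.

A set $G$ of criteria $g_j$, $j = 1 \cdots n$. A relative importance $w_j$ (usually
normalised in the interval $[0, 1]$) is attributed to each criterion $g_j$.

Each criterion $g_j$ is equipped with an ordinal scale $E_j$ with degrees $e_j^l$,
$l = 1 \cdots k$.

A set $P$ of profiles $p_h$, $h = 1 \cdots t$, $p_h$ being a collection of degrees,
$p_h = \langle e_1^h \cdots e_n^h \rangle$, such that if $e_j^h$ belongs to profile
$p_h$, $e_j^{h+1}$ cannot belong to profile $p_{h-1}$.

A set $C$ of categories $c_\lambda$, $\lambda = 1 \cdots t + 1$, such that the profile
$p_h$ is the upper bound of category $c_h$ and the lower bound of category $c_{h+1}$.

An outranking relation $S \subseteq (A \times P) \cup (P \times A)$, where $s(x, y)$
should be read as "x is at least as good as y".

A set of preference relations $\langle P_j, I_j \rangle$ for each criterion $g_j$ such that:

- $\forall x \in A \quad P_j(x, e_j^h) \Leftrightarrow g_j(x) \succ e_j^h$
- $\forall x \in A \quad P_j(e_j^h, x) \Leftrightarrow g_j(x) \prec e_j^h$
- $\forall x \in A \quad I_j(x, e_j^h) \Leftrightarrow g_j(x) \approx e_j^h$

$\succ$, $\prec$ and $\approx$ being induced by the ordinal scale associated with criterion $g_j$.

The procedure works in two basic steps.

1. Establish the outranking relation on the basis of the following rule:

$$s(x, y) \Leftrightarrow C(x, y) \text{ and not } D(x, y)$$

where

$$\forall x \in A,\ y \in P: \quad C(x, y) \Leftrightarrow \sum_{j \in G^{\pm}} w_j \geq c \ \text{ and } \Big( \sum_{j \in G^{+}} w_j \geq \sum_{j \in G^{-}} w_j \Big)$$

$$\forall y \in A,\ x \in P: \quad C(x, y) \Leftrightarrow \Big( \sum_{j \in G^{\pm}} w_j \geq c \ \text{ and } \sum_{j \in G^{+}} w_j \geq \sum_{j \in G^{-}} w_j \Big) \ \text{ or } \ \Big( \sum_{j \in G^{+}} w_j > \sum_{j \in G^{-}} w_j \Big)$$

$$\forall (x, y) \in (A \times P) \cup (P \times A): \quad \text{not } D(x, y) \Leftrightarrow \sum_{j \in G^{-}} w_j \leq d \ \text{ and } \ \forall g_j \ \text{not } v_j(x, y)$$

where

- $G^{+} = \{g_j \in G : P_j(x, y)\}$
- $G^{-} = \{g_j \in G : P_j(y, x)\}$
- $G^{=} = \{g_j \in G : I_j(x, y)\}$
- $G^{\pm} = G^{+} \cup G^{=}$
- $c$: the concordance threshold, $c \in [0.5, 1]$
- $d$: the discordance threshold, $d \in [0, 1]$
- $v_j(x, y)$: veto, expressed on criterion $g_j$, of $y$ on $x$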

2. When the relation $S$ is established, assign any element $a_i$ on the basis of the
following rules.

2.1 pessimistic assignment

- $a_i$ is iteratively compared with $p_t \cdots p_1$,
- as soon as $s(a_i, p_h)$ is established, assign $a_i$ to category $c_h$.
2.2 optimistic assignment
- $a_i$ is iteratively compared with $p_1 \cdots p_t$,
- as soon as $s(p_h, a_i) \wedge \neg s(a_i, p_h)$ is established, assign $a_i$ to
category $c_{h-1}$.

The pessimistic procedure finds the profile for which the element is not
worse. The optimistic procedure finds the profile against which the element
is surely worse. If the optimistic and pessimistic assignments coincide,
then no uncertainty exists for the assignment. Otherwise, an uncertainty
exists and should be considered by the user.

In order to better understand how the procedure works consider the following
example.

Four criteria $g_1 \cdots g_4$, of equal importance ($\forall j \ w_j = 1/4$), each of
them equipped with an ordinal scale $A \succ B \succ C \succ D$.

Two profiles $p_1 = \langle C, C, C, C \rangle$ and $p_2 = \langle A, B, B, B \rangle$
defining three categories: unacceptable (U), acceptable (A) and good (G) ($p_2$ being
the minimum profile for category G, $p_1$ being the minimum profile for category A).

Three alternatives:
$a_1 = \langle D, B, B, B \rangle$, $a_2 = \langle B, C, A, A \rangle$, $a_3 = \langle A, B, B, C \rangle$.

Further on, fix $c = 0.75$, $d = 0.40$ and $\forall j \ v_j(x, y) \Leftrightarrow x = D$.

With such information it is possible to establish the outranking relation, that is
$S = \{(p_2, a_1), (p_2, a_2), (p_2, a_3), (a_2, p_1), (a_3, p_1)\}$. The reader can
easily check that the pessimistic assignment puts alternative $a_1$ in category U and
alternatives $a_2$ and $a_3$ in category A, while the optimistic assignment puts all
three alternatives in category A.
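The mechanics of the example can be retraced with the sketch below. It rests on our reading of the rule (weights summing to 1, "at least as good" support compared with c, opposing weight compared with d, and a veto whenever x takes the value D on some criterion), which is an assumption; the function names are ours. Under this reading every pair of the stated relation is reproduced except (p2, a2), where the weights for and against are tied at 0.5, so the two assignment functions take S exactly as stated in the text. Category indices follow the convention of the example, where each profile is the minimum profile of the category above it.

```python
SCALE = "ABCD"  # best to worst
WEIGHTS = [0.25, 0.25, 0.25, 0.25]
C_THRESHOLD, D_THRESHOLD = 0.75, 0.40

PROFILES = {"p1": "CCCC", "p2": "ABBB"}
ALTERNATIVES = {"a1": "DBBB", "a2": "BCAA", "a3": "ABBC"}
PROFILE_NAMES = ["p1", "p2"]
CATEGORIES = ["U", "A", "G"]  # p1 is the minimum profile of A, p2 of G


def outranks(x, y, x_is_profile):
    """Assumed reading of the rule: concordance holds and no discordance."""
    better = sum(w for w, a, b in zip(WEIGHTS, x, y) if SCALE.index(a) < SCALE.index(b))
    worse = sum(w for w, a, b in zip(WEIGHTS, x, y) if SCALE.index(a) > SCALE.index(b))
    at_least = sum(WEIGHTS) - worse
    concordance = at_least >= C_THRESHOLD and better >= worse
    if x_is_profile:  # extra clause when a profile is compared with an alternative
        concordance = concordance or better > worse
    no_discordance = worse <= D_THRESHOLD and "D" not in x  # veto when x has a D
    return concordance and no_discordance


computed = set()
for a_name, a in ALTERNATIVES.items():
    for p_name, p in PROFILES.items():
        if outranks(a, p, x_is_profile=False):
            computed.add((a_name, p_name))
        if outranks(p, a, x_is_profile=True):
            computed.add((p_name, a_name))

# Relation S exactly as stated in the text.
S_TEXT = {("p2", "a1"), ("p2", "a2"), ("p2", "a3"), ("a2", "p1"), ("a3", "p1")}


def pessimistic(a, s):
    """Scan profiles from the highest; stop at the first one that a outranks."""
    for h in range(len(PROFILE_NAMES) - 1, -1, -1):
        if (a, PROFILE_NAMES[h]) in s:
            return CATEGORIES[h + 1]
    return CATEGORIES[0]


def optimistic(a, s):
    """Scan profiles from the lowest; stop at the first one strictly outranking a."""
    for h, p in enumerate(PROFILE_NAMES):
        if (p, a) in s and (a, p) not in s:
            return CATEGORIES[h]
    return CATEGORIES[-1]


pess = {a: pessimistic(a, S_TEXT) for a in ALTERNATIVES}
opt = {a: optimistic(a, S_TEXT) for a in ALTERNATIVES}
```

With S taken from the text, the pessimistic assignment gives U, A, A and the optimistic assignment gives A, A, A for a1, a2, a3, as stated in the example.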

Appendix B
The complete list of the attributes used in the evaluation model


1.1 User interface

1.1.1 Graphics type
1.1.2 Graphics engine adequacy
1.1.3 Interface personalisation
1.2 Functionality
1.2.1 Availability
1.2.2 Adequacy
    - Planes analysis functions
    - Topological connectivity functions
    - Graphical rendering functions
1.3 Development environment
1.3.1 Libraries personalisation
1.3.2 Development support tools
1.3.3 Debugging support tools
1.3.4 Code documentation
    - Documentation support tools
    - Code browsing
1.3.5 Documentation
    - Quality
    - Completeness
    - Documentation support type
    - Information retrieval ease
    - Contextual help
1.4 Administration tools
1.4.1 User administration functions
1.4.2 Software configuration management
1.4.3 Performance data collection
1.5 Work flow connection
1.6 Interoperability
1.7 Integration between Land-base products and the Spatial Data Manager
1.7.1 Vectorial data products integration
1.7.2 Descriptive data products integration
1.7.3 Raster data products integration
1.7.4 Digital Terrain Model products integration
1.8 Integration among Land-base products
1.8.1 Interfaces integration
1.8.2 Data sharing


2.1 User interface

2.1.1 Graphics type
2.1.2 Graphics engine adequacy
2.1.3 Interface personalisation
2.2 Functionality
2.2.1 Availability
2.2.2 Adequacy
    - Planes analysis functions
    - Graphical rendering functions
2.3 Development environment
2.3.1 Libraries personalisation
2.3.2 Development support tools
2.3.3 Debugging support tools
2.3.4 Code documentation
    - Documentation support tools
    - Code browsing
2.3.5 Documentation
    - Quality
    - Completeness
    - Documentation support type
    - Information retrieval ease
    - Contextual help
2.4 Administration tools
2.4.1 Software configuration management
2.5 Interoperability
2.6 Integration between Geomarketing products and the Spatial Data Manager
2.6.1 Vectorial data products integration
2.6.2 Descriptive data products integration
2.6.3 Raster data products integration
2.7 Integration among Geomarketing products
2.7.1 Interfaces integration
2.7.2 Data sharing


3.1 User interface

3.1.1 Graphics type
3.1.2 Graphics engine adequacy
3.1.3 Interface personalisation
3.2 Functionality
3.2.1 Availability
3.2.2 Adequacy: Planes analysis functions; Topological connectivity functions; Graphical rendering functions; Network schema creation
3.3 Development environment
3.3.1 Libraries personalisation
3.3.2 Development support tools
3.3.3 Debugging support
3.3.4 Code documentation: Documentation support tools; Code browsing
3.3.5 Documentation: Quality; Completeness; Documentation support type; Information retrieval ease; Contextual help
3.4 Administration tools
3.4.1 User administration functions
3.4.2 Software configuration management
3.4.3 Performance data collection
3.5 Work flow connection
3.6 Interoperability
3.7 Integration between this process products and the Spatial Data Manager
3.7.1 Vectorial data products integration
3.7.2 Descriptive data products integration
3.7.3 Raster data products integration
3.7.4 Digital Terrain Model products integration
3.8 Integration among this process products
3.8.1 Interfaces integration
3.8.2 Data sharing


4.1 User interface

4.1.1 Graphics type
4.1.2 Graphics engine adequacy
4.1.3 Interface personalisation
4.2 Functionality
4.2.1 Availability
4.2.2 Adequacy: Planes analysis functions; Topological connectivity functions; Graphical rendering functions; Network schema creation
4.3 Development environment
4.3.1 Libraries personalisation
4.3.2 Development support tools
4.3.3 Debugging support
4.3.4 Code documentation: Documentation support tools; Code browsing
4.3.5 Documentation: Quality; Completeness; Documentation support type; Information retrieval ease; Contextual help
4.4 Administration tools
4.4.1 Software configuration management
4.4.2 Performance data collection
4.5 Interoperability
4.6 Integration between this process products and the Spatial Data Manager
4.6.1 Vectorial data products integration
4.6.2 Descriptive data products integration
4.6.3 Raster data products integration
4.7 Integration among this process products
4.7.1 Interfaces integration
4.7.2 Data sharing


5.1 Data base properties

5.1.1 Fundamental properties
5.1.2 Transaction typology support
5.1.3 Data / Function association
5.1.4 Client data access libraries
5.2 Basic properties of the Spatial Data Manager
5.2.1 Data model
5.2.2 Data management
5.2.3 Data integration
5.2.4 Spatial operators
5.2.5 Coordinate systems
5.2.6 Vectorial data continuous management
5.3 Special properties of the Spatial Data Manager
5.3.1 Data sharing constraints
5.3.2 Feature versioning
5.3.3 Feature life-cycle management
5.3.4 Data distribution
5.4 Integration between the Spatial Data Manager and the Data Layer
5.4.1 Server data access libraries: Public libraries for feature manipulation; Structured Query Language to access descriptive data
5.4.2 Independence from features structure
5.4.3 Integration with Oracle
5.4.4 Integration with Unix and MVS relational databases
5.4.5 Integration with Oracle Designer 2000
5.4.6 Logical scheme import capability
5.4.7 Spatial Data Manager platform
5.5 Data administration tools
5.5.1 Database distribution
5.5.2 Database access control
5.5.3 Backup


6.1 Robustness
6.2 Maturity
6.3 Easiness of installation and maintenance


7.1 Single transaction under different data volume
7.2 Data Manager under different operation typology
7.3 Data Manager under different concurrent transactions
7.4 Graphical interfaces performances

10.1 Formal methods are all around us

The aim of this book was to provide a critical introduction to a number of for-
mal decision and evaluation methods. By this, we mean a set of explicit and
well-defined rules to collect, assess and process information in order to make rec-
ommendations in decision and/or evaluation processes. Although these methods
may not be entirely formalised, their underlying logic should be explicit contrary
to, say, astrology or graphology. Such methods emanate from many different
disciplines (Political Science, Education Science, Statistics, Economics, Operational
Research, Computer Science, Decision Theory, Engineering, etc.) and are used to
support numerous kinds of decision or evaluation processes. It is not an overstate-
ment to say that nowadays nearly everyone is, implicitly or explicitly, confronted
with such methods.
We briefly summarise below the main methods presented in this book and the
difficulties that have been encountered.
Following a democratic election Mr. X has been elected
As citizens, we hopefully have to cast several kinds of votes. As mentioned in
chapter 2, elections are governed by rules that are very far from being innocuous.
Similar votes may well lead to very different results depending on the rules used
to process them. Such electoral rules contribute towards shaping the entire
political debate in a country and, thus, influence the type of democracy we live in.
Therefore, under a slightly different electoral system, Mr. X might not have been
elected.
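The claim that similar votes may lead to different results under different rules can be checked on a small example. The sketch below is only an illustration: the profile of nine voters and three candidates (x, y, z) is hypothetical, and the function names are ours. It counts the same ballots once with a uninominal (plurality) rule and once with a Borda count.

```python
from collections import Counter

# Hypothetical profile: (number of voters, ranking from best to worst).
PROFILE = [
    (4, ["x", "y", "z"]),
    (3, ["y", "z", "x"]),
    (2, ["z", "y", "x"]),
]

def plurality_scores(profile):
    """Each voter gives one vote to the candidate ranked first."""
    scores = Counter()
    for count, ranking in profile:
        scores[ranking[0]] += count
    return dict(scores)

def borda_scores(profile):
    """On each ballot, a candidate earns one point per candidate ranked below it."""
    scores = Counter()
    for count, ranking in profile:
        for position, candidate in enumerate(ranking):
            scores[candidate] += count * (len(ranking) - 1 - position)
    return dict(scores)

def winner(scores):
    """Candidate with the highest score (no tie occurs in this profile)."""
    return max(scores, key=scores.get)

plurality = plurality_scores(PROFILE)
borda = borda_scores(PROFILE)
print("plurality:", plurality, "->", winner(plurality))
print("Borda:    ", borda, "->", winner(borda))
```

With this profile, plurality elects x (4 votes against 3 and 2), whereas the Borda count elects y (12 points against 8 and 7).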
Your child has a GPA of 9.54. Therefore we cannot allow him to continue
with this programme
Our early life at school was governed to a large extent by the grades we ob-
tained, the exams we passed or not. It is likely that the present professional life
of many readers is still governed by some type of formal evaluation method that
somehow uses grades (this is clearly the case for most academics). In chapter 3
we saw that a grade, although a very familiar concept, is in fact a complex
evaluation model. Not surprisingly, the aggregation of such evaluations is not
an obvious task. Therefore, the decision made concerning your child might well
have been significantly different depending on the grading policy and/or correction
habits of some teachers, the fact that his exams were corrected late at night or on
the way his various grades were aggregated.
Things are going well since the well-being index in our country rose by
more than 10% over the last three years
Statisticians have elaborated an incredible number of indicators or indices aiming
at capturing many aspects of reality (including the quality of the air we breathe,
the richness of a country, its state of development, etc.) by using numbers. Not
only are our newspapers full of these kinds of figures but they are also routinely
used to make important political or economic decisions. In chapter 4, we saw that
such measures should not be confounded with the familiar measurement oper-
ations in Physics. The resulting numbers do not appear to be measured on some
well-defined type of scale. Their properties are sometimes intriguing and they
surely should be manipulated with care. Therefore, claiming that the well-being
index has increased by 10% gives, at best, a very crude indication.
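Why a percentage change can be a crude indication is easy to illustrate. The sketch below assumes, purely for the sake of the example, that the index is defined only up to the choice of a unit and an origin (an interval scale); the index values 50 and 55 and the function names are hypothetical.

```python
def percent_change(before, after):
    """Relative variation of the index, in percent."""
    return 100.0 * (after - before) / before

def rescale(value, unit=1.0, origin=0.0):
    """Re-express the index with another unit and/or another origin."""
    return unit * value + origin

before, after = 50.0, 55.0

original = percent_change(before, after)
# Changing the unit leaves the percentage unchanged ...
unit_changed = percent_change(rescale(before, unit=3.0), rescale(after, unit=3.0))
# ... but moving the origin, which is just as arbitrary under our assumption, does not.
origin_moved = percent_change(rescale(before, origin=100.0), rescale(after, origin=100.0))

print(original, unit_changed, round(origin_moved, 2))
```

Under this assumption, the same evolution reads as a 10% rise with one origin and as a rise of about 3.3% with another, so the figure says little beyond the direction of the change.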
Calculations show that it is not profitable to equip this hospital with a mater-
nity department
The quality of the roads on which we drive, the tariffing of public transporta-
tion, the way our electricity is produced, the safety regulations applied to factories
near our homes, the quality of our social security system, etc., depend on partic-
ular ways of assessing and summarising the costs and the benefits of alternative
projects. Cost-benefit analysis evaluates such projects using money as a yardstick.
This raises many difficulties outside simple cases: how to convert the various
consequences of a complex project into monetary units, how to cope with equity
considerations in the distribution of costs and benefits, how to take the distribution
in time of these consequences into account? In chapter 5 we saw that cost-benefit
analysis can hardly claim to always solve all these difficulties in a satisfactory
manner. Therefore, the apparently objective calculations invoked to refuse the
creation of a maternity department in our hospital, are highly dependent on nu-
merous debatable hypotheses (e.g. the pricing of a number of statistical delivery
incidents due to a longer transportation time for some mothers). It is not unlikely
that other reasonable hypotheses may have led to an opposite decision.
Based on numerous tests it appears that the best buy is car Z
How to take several, generally conflicting, criteria into account when making
a decision? This area, known as Multiple Criteria Decision Making (MCDM),
is the subject of chapter 6. We showed that, in most cases, the analyst has the
choice between several aggregation strategies that could lead to different results.
Furthermore, apparently familiar concepts, like the importance of criteria, are
shown to have little (if any) clear meaning outside a well-defined aggregation
strategy. Each of these strategies requires the assessment of more or less rich
and precise inter-criteria information. Since such assessments shape preference
information as much as they collect it, the comparison of these strategies raises
many problems. Therefore, because each potential buyer has his own preferences
and interests and there are many different and yet reasonable ways to aggregate
them, the very notion of a best buy is highly debatable.

Relax, our new camera will choose the optimal focus for you
Our washing machines, our cameras, our TV sets often take decisions on their
own, e.g. concerning the amount of water or energy to use, the right focus, the
supposedly optimal tuning of channels, the clarity of an image. The decision
modules underlying such automatic decisions were studied in chapter 7. We saw
that they are based on concepts and techniques that are very similar to the ones
examined in chapter 6 and, thus, raise similar problems and questions. Contrary
to the situation in chapter 6 however, they are used in real time without human
intervention after the implementation stage. This raises new difficulties and issues.
Therefore, relying on the automatic decisions taken by the new camera might not
always be your best option.
Given what you told me about your preferences and beliefs, you should not
invest in this project in view of its expected utility
Standard decision analysis techniques (see e.g. Raiffa 1970) are often seen as
synonymous with decision support methods in risky and/or uncertain situations.
Using a real example in electricity production planning, in chapter 8, we showed
why the implementation of these standard techniques may not be as straightfor-
ward as is often believed. Besides possible computational problems, the assessment
and revision of (subjective) probability distributions in highly ambiguous environ-
ments and in situations involving a long period of time, is an enormous task.
Alternative tools, such as possibilities, belief functions, fuzzy sets and other
kinds of non-additive uncertainty measures may appear as good contenders al-
though their theoretical basis may be seen as less firm than the one underlying
standard Bayesian analysis. Furthermore, important considerations, like the dy-
namic consistency of choices and the aggregation of consequences over time were
shown to be largely open questions. Therefore, there might be more than one
way to assess preferences and beliefs and to combine them in order to make a
Whether we like it or not, it seems difficult nowadays to escape from formal
decision and evaluation methods. We may ignore them. The authors of this book
believe that it may be interesting and profitable to give them a closer look. The
real case-study presented in chapter 9 has shown that their proper use can have a
significant impact on real complex decision or evaluation processes.

10.2 What have we learned?

Although the methods examined in this book are apparently very different and
emanate from various disciplines, they appear to have a lot in common. This
should not be much of a surprise since these methods have the common objective of
providing recommendations in complex decision and evaluation processes. What
might be slightly more surprising, is that most of these methods and tools are
plagued with many difficulties. Let us try to summarise the main findings and
problems encountered in the preceding chapters here.

Objective and scope of formal decision/evaluation models


Formal decision and evaluation models are implemented in complex
decision/evaluation processes. Using them rarely amounts to solving a
well-defined formal problem. Their usefulness not only depends on their
intrinsic formal qualities but also on the quality of their implementation
(structuration of the problem, communication with actors involved in
the process, transparency of the model, etc.). Having a sound theoretical
basis is therefore a necessary but insufficient condition for their
usefulness (see chapter 9).
The objective of these models may be different from recommending the
choice of a best course of action. More complex recommendations,
e.g. ranking the possible courses of action or comparing them to stan-
dards, are also frequently needed (see chapters 3, 4, 6 and 7). Moreover,
the usefulness of such models is not limited to the elaboration of sev-
eral types of recommendations. When properly used, they may provide
support at all steps of a decision process (see chapter 9).

Collecting data

All models imply collecting and assessing data of various types and
qualities and manipulating these data in order to derive conclusions that
will hopefully be useful in a decision or evaluation process. This more or
less inevitably implies building evaluation models trying to capture
aspects of reality that are difficult to define with great precision (see
chapters 3, 4, 6 and 9).
The numbers resulting from such evaluation models often appear as
constructs that are the result of multiple options. The choice between
these various possible options is only partly guided by scientific con-
siderations. These numbers should not be confounded with numbers
resulting from classical measurement operations in Physics. They are
measured on scales that are difficult to characterise properly. Further-
more, they are often plagued with imprecision, ambiguity and/or un-
certainty. Therefore, more often than not, these numbers seem, at best,
to give an order of magnitude of what is intended to be captured (see
chapters 3, 4, 6, 8).
The properties of the numbers manipulated in such models should be
examined with care; using numbers may only be a matter of con-
venience and does not imply that any operation can be meaningfully
performed on them (see chapters 3, 4, 6 and 7).
The use of evaluation models greatly contributes to shaping and trans-
forming the reality that we would like to measure. Implementing a
decision/evaluation model only rarely implies capturing aspects of re-
ality that can be considered as independent of the model (see chapters
6 and 9).

Aggregating evaluations

Aggregating the results of complex evaluation models is far from being
an easy task. Although many aggregation models amount to summarising
these numbers into a single one, this is not the only possible
aggregation strategy (see chapters 3, 4, 5 and 6).
The pervasive use of simple tools such as weighted averages may lead to
disappointing and/or unwanted results. The use of weighted averages
should in fact be restricted to rather specific situations that are seldom
met in practice.
Devising an aggregation technique is not an easy task. Apparently
reasonable principles can lead to a model with poor properties. A formal
analysis of such models may therefore prove of utmost importance (see
chapters 2, 4 and 6).
Aggregation techniques often call for the introduction of preference
information. The type of aggregation model that is used greatly con-
tributes to shaping this information. Assessment techniques, therefore,
not only collect but shape and/or create preference information (see
chapter 6).
Many different tools can be envisaged to model the preferences of an
actor in a decision/evaluation process (see chapters 2 and 6).
Intuitive preference information, e.g. concerning the relative importance
of several points of view, may be difficult to interpret within a well-
defined aggregation model (see chapter 6).
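One way in which a weighted average can produce unwanted results is sketched below. The three alternatives and their evaluations on two criteria (both to be maximised) are hypothetical and the function names are ours. The balanced alternative c is not dominated by a or b, yet no choice of weights makes it the best one.

```python
# Hypothetical evaluations on two criteria, both to be maximised.
ALTERNATIVES = {"a": (10, 2), "b": (2, 10), "c": (5, 5)}

def weighted_sum(evaluation, w):
    """Weighted average with weight w on criterion 1 and 1 - w on criterion 2."""
    return w * evaluation[0] + (1 - w) * evaluation[1]

def best(w):
    """Alternative with the highest weighted average for the weight w."""
    return max(ALTERNATIVES, key=lambda name: weighted_sum(ALTERNATIVES[name], w))

def is_dominated(name):
    """True if another alternative is at least as good on every criterion."""
    x = ALTERNATIVES[name]
    return any(
        y != x and all(yi >= xi for xi, yi in zip(x, y))
        for y in ALTERNATIVES.values()
    )

# Try weights 0.00, 0.01, ..., 1.00 and record which alternatives ever come first.
winners = {best(k / 100) for k in range(101)}
print("winners over all weights:", sorted(winners))
print("c dominated?", is_dominated("c"))
```

Whatever the weight, either a or b scores at least 6 while c scores exactly 5. A decision maker looking for a compromise would therefore never be offered c by this aggregation model, although it is a perfectly defensible choice.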

Dealing with imprecision, ambiguity and uncertainty

In order to allow the analyst to derive convincing recommendations,
the model should explicitly deal with imprecision, uncertainty and inaccurate
determination. Modelling all these elements into the classical
framework of Decision Theory using probabilities may not always lead
to an adequate model. It is not easy to create an alternative framework
in which problems such as dynamic consistency or respect of (first or-
der) stochastic dominance are dealt with in a satisfactory manner (see
chapters 6 and 8).
Deriving robust conclusions on the basis of such aggregation models
requires a lot of work and care. The search for robust conclusions may
imply analyses much more complex than simple sensitivity analyses
varying one parameter at a time in order to test the stability of a
solution (see chapters 6 and 8).
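The limits of varying one parameter at a time can be illustrated as follows. The sketch is built on hypothetical data: two alternatives P and Q evaluated on three criteria, aggregated by a weighted sum with nominal weights of 1, each weight being allowed to move by at most 0.25. All names and numbers are ours.

```python
from itertools import product

# Hypothetical evaluations on three criteria, to be maximised.
P = (8, 4, 4)
Q = (5, 5, 5)
NOMINAL = (1.0, 1.0, 1.0)   # nominal weights
DELTA = 0.25                # largest deviation allowed for each weight

def score(evaluation, weights):
    return sum(w * g for w, g in zip(weights, evaluation))

def p_beats_q(weights):
    return score(P, weights) > score(Q, weights)

def stable_one_at_a_time():
    """Move a single weight to either end of its range, the others staying nominal."""
    for i in range(len(NOMINAL)):
        for shift in (-DELTA, DELTA):
            weights = list(NOMINAL)
            weights[i] += shift
            if not p_beats_q(weights):
                return False
    return True

def stable_jointly():
    """Move all weights together; the score is linear in the weights,
    so checking the corners of the box of admissible weights is enough."""
    for shifts in product((-DELTA, DELTA), repeat=len(NOMINAL)):
        weights = [w + s for w, s in zip(NOMINAL, shifts)]
        if not p_beats_q(weights):
            return False
    return True

print("one at a time:", stable_one_at_a_time())
print("jointly:      ", stable_jointly())
```

Here P remains ahead of Q whenever a single weight is modified, which a simple sensitivity analysis would report as a stable conclusion. With weights (0.75, 1.25, 1.25), all within the admissible ranges, Q overtakes P (16.25 against 16).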

We saw that the methods reviewed in chapters 2 to 8 are far from being without
problems. Indeed these chapters can be seen as a collection of the defects of these
methods. Some readers may think that, faced with such evidence, this type of
method should be abandoned and that intuition or expertise are not likely to
do much worse, at lower cost and with less effort. In our opinion, this would be a
totally unwarranted conclusion. It is the firm belief and conviction of the authors
that the use of formal decision and evaluation tools is both inevitable and useful.
Three main arguments can be proposed to support this claim.
First, it should not be forgotten that formal tools lend themselves more easily
to criticism and close examination than other kinds of tools. However, whenever
intuition or expertise has been subjected to close scrutiny, it has been more or
less always shown that such types of judgements are based on heuristics that are
likely to neglect important aspects of the situation and/or are affected by many
biases (see the syntheses of Kahneman, Slovic and Tversky 1981, Bazerman 1990,
Russo and Schoemaker 1989, Hogarth 1987, Poulton 1994, Thaler 1991).
Second, formal methods have a number of advantages that often prove crucial
in complex organisational and/or social processes:

they promote communication between the actors of a decision or evaluation
process by offering them a common language;

they require building models of certain aspects of reality; this implies
concentrating efforts on crucial matters. Thus, formal methods are often
indispensable structuration instruments;

they lend themselves easily to what-if types of questions. These exploration
capabilities are crucial in order to devise robust recommendations.

Although these advantages may have little weight compared to the obvious draw-
backs of formal methods in terms of effort involved, money and time consumed
in some situations (e.g. a very simple decision/evaluation process involving a single
actor), they appear fundamental to us in most social or organisational
processes (see chapter 9).
Third, casual observation suggests that there is an increasing demand for such
tools in various domains (going from executive information systems, decision sup-
port systems and expert systems to standardised evaluation tests and impact stud-
ies). It is our belief that the introduction of such tools may have quite a beneficial
impact in many areas in which they are not commonly used. Although many com-
panies use tools such as graphology and/or astrology in order to select between
applicants for a given position, we are more than inclined to say that the use of
more formal methods could improve such selection processes (let alone on issues
such as fairness and equity) in a significant way. Similarly, the introduction of
more formal evaluation tools in the evaluation of public policies, laws and regu-
lations (e.g. policy against crime and drugs, policy towards the carrying of guns,
fiscal policy, the establishment of environmental standards, etc.), an area in which
they are strikingly absent in many countries, would surely contribute to a more
transparent and effective government.
We would thus answer a clear and definite yes to the question of whether
formal decision and evaluation tools are useful.

10.3 What can be expected?

Our plea for the introduction of more formal decision and evaluation tools may
appear paradoxical in view of the content of this book. Have we been overly
critical then? Certainly not. Our willingness to keep mathematics and formalism
to the lowest possible level has not allowed us to explore many technical details
and difficulties. Indeed, a thorough critical examination of each of the methods
covered in chapters 2 to 8 could be the subject of an entire book.
The paradox between our conviction in the usefulness of formal methods and
the content of this book is only apparent and results from a misunderstanding. The
fact that many decision and evaluation tools are plagued with serious difficulties is
troublesome. It should not be unexpected however, unless one believes that there
is a single best way to provide support in each type of decision or evaluation
process. We doubt that this is a reasonable belief. Indeed, the very way in which
a good formal decision/evaluation method is defined, is anything but clear. Two
main, non-exclusive, paths have often been suggested for this purpose. Neither of
them appears totally convincing to us.

the engineering route that amounts to saying that a method is good because
it works, i.e. has been applied several times in real-world problems and
has been well accepted by the actors in the process. Although we would
definitely not favour a method that would be unable to pass such a test, we
doubt that the engineering argument is sufficient to define what would dis-
tinguish good formal decision or evaluation methods. First, it is important
to remember that the quality of the support provided by a formal tool is
very difficult to separate from considerations linked to the implementation of
the method. As should be apparent from chapter 9, the formal tools used
by an analyst are implemented in decision or evaluation processes that may
be highly complex (involving many different actors, lasting a long time and
being governed by complex rules and/or regulations). The resulting deci-
sion/evaluation aid process is therefore conditioned by many factors outside
the realm of a formal method: the quality of the structuration of the prob-
lem, of communication with stakeholders, the availability of user-friendly
software, the timing and costs of the study, etc. are elements of utmost
importance in the quality of a decision/evaluation aid process. Supporting
a decision or an evaluation process should not be confounded with solving
a well-defined formal problem. Although it may make sense to associate with
such a problem a good method for solving it, supporting real decision
and evaluation processes should not be confounded with this formal exercise.
Second, in practice, it is often difficult to know whether the proposed
model worked or not. Even if the final decision is at variance with
the recommendations derived from the model, the very presence of analysts,
the questions they raised, the type of reasoning they have promoted could
have had a significant impact on the decision process. Should we say then
that the method has worked or not?
A close variant of the engineering route could be called the naive route.
It amounts to saying that a formal tool is adequate if it consistently leads
to good decisions. The literature on decision (see Raiffa 1970, Russo
and Schoemaker 1989, Keeney, Hammond and Raiffa 1999), however, has
always insisted on the fact that good decisions do not necessarily lead to
good outcomes. This literature shows that it is very difficult to define what
would constitute a good decision a priori (good in which state of nature?
good for whom? good according to what criteria? at what moment in time?
etc.) and that the essential idea is to promote a good decision process.

the rational route, which amounts to saying that a method is adequate if it
is backed by a sound theory of rational choice. Although we find theories
most useful, the criteria for separating sound from unsound theories of ratio-
nal choice do not appear obvious to us. A striking example of this difficulty
can be found in the area of decision under risk and uncertainty. While, until
the beginning of the eighties, expected utility theory was considered almost
unanimously as the rational theory of choice under risk, the proliferation
of alternative theories since then (see e.g. Dubois, Fargier and Prade 1997,
Fishburn 1988, Gilboa and Schmeidler 1989, Jaffray 1988, Jaffray 1989, Kah-
neman and Tversky 1979, Loomes and Sugden 1982, Machina 1982, Quiggin
1982, Schmeidler 1989, Wakker 1989, Yaari 1987) fostered by the results
of numerous empirical experiments (see e.g. Allais 1953, Hershey, Kun-
reuther and Schoemaker 1982, Johnson and Schkade 1989, Kahneman and
Tversky 1979, McCord and de Neufville 1982, McCrimmon and Larsson
1979) presently results in a very complex situation in which it is not easy
to discriminate between theories either from an empirical (see e.g. Abdellaoui
and Munier 1994, Carbone and Hey 1995, Harless and Camerer 1994, Hey
and Orme 1994, Sopher and Gigliotti 1993) or a normative point of view
(see e.g. Hammond 1988, Machina 1989, McClennen 1990, Nau 1995, Nau
and McCardle 1991). This is true even though most, if not all, of these
theories have been axiomatically characterised (i.e. a set of conditions is
known that completely characterises the proposed choice or evaluation mod-
els). Having axioms is certainly useful in order to compare theories but the
rational content of the axioms and their interpretation remain much debated.
Furthermore, the relation between the formal axiomatic theory and
the assessment technologies derived from it is far from being obvious (see
e.g. Bouyssou 1984).
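The kind of empirical evidence that challenged expected utility can be illustrated with the lotteries usually associated with Allais (1953). The sketch below uses the standard textbook version of these lotteries (payoffs in millions); the utility values are drawn at random and the function names are ours. Many subjects prefer A to B and D to C. The code checks numerically that, for any utility function u, the difference EU(A) - EU(B) equals EU(C) - EU(D), so that no expected utility maximiser can exhibit this pair of preferences.

```python
import random

# Lotteries as {payoff: probability}, payoffs in millions.
A = {1: 1.00}
B = {5: 0.10, 1: 0.89, 0: 0.01}
C = {1: 0.11, 0: 0.89}
D = {5: 0.10, 0: 0.90}

def expected_utility(lottery, u):
    return sum(p * u[x] for x, p in lottery.items())

def preference_gaps(u):
    """Return (EU(A) - EU(B), EU(C) - EU(D)) for the utility values u."""
    return (expected_utility(A, u) - expected_utility(B, u),
            expected_utility(C, u) - expected_utility(D, u))

# Draw arbitrary increasing utilities with u(0) = 0 and compare the two gaps.
rng = random.Random(0)
largest_discrepancy = 0.0
for _ in range(1000):
    u1 = rng.uniform(0.0, 1.0)
    u5 = rng.uniform(u1, 2.0)
    gap_ab, gap_cd = preference_gaps({0: 0.0, 1: u1, 5: u5})
    largest_discrepancy = max(largest_discrepancy, abs(gap_ab - gap_cd))

print("largest |gap_AB - gap_CD|:", largest_discrepancy)
```

Both gaps reduce to 0.11 u(1) - 0.10 u(5) - 0.01 u(0), which is why they coincide up to rounding error. Preferring A to B therefore forces preferring C to D under expected utility, whatever the attitude towards risk encoded in u.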

Analysts implementing formal decision and evaluation tools are in a position sim-
ilar to that of an engineer. Contrary to most engineers, however, these decision
engineers often lack clear criteria for appreciating the success or failure of
their models.
At this point it should be apparent that research on formal decision and evalu-
ation methods should not be guided by the hope of discovering models that would
be ideal under certain types of circumstances. Can something be done then? In
view of the many difficulties encountered with the models envisaged in this book
and the many fields in which no formal decision and evaluation tools are used, we
do think that this area will be rich and fertile for future research.

Freed from the idea that we will discover the method, we can, more modestly
and more realistically, expect to move towards:

structuring tools that will facilitate the implementation of formal decision

and evaluation models in complex and conflictual decision processes;

flexible preference models able to cope with data of poor or unknown quality,
conflicting or lacking information;

assessment protocols and technologies able to cope with complex and unsta-
ble preferences, uncertain trade-offs, hesitation and learning;

tools for comparing aggregation models in order to know what they have
in common and whether one is likely to be more appropriate in view of the
quality of the data;

tools for defining and deriving robust conclusions.

To summarise, the future as we see it: structuration methodologies allowing for an
explicit involvement and participation of all stakeholders, flexible preference models
tolerating hesitations and contradictions, flexible tools for modelling imprecision
and uncertainty, evaluation models fully taking incommensurable dimensions
into account in a meaningful way, assessment technologies incorporating framing
effects and learning processes, exploration techniques allowing robust
recommendations to be built (see Bouyssou et al. 1993). Thus, thanks to rigorous
concepts, well-formulated models, precise calculations and axiomatic considerations,
we should be able to clarify decisions by separating what is objective from what
is less objective, by separating strong conclusions from weaker ones, by dissipat-
ing certain forms of misunderstanding in communication, by avoiding the trap
of illusory reasoning, by bringing out certain counter-intuitive results (Roy and
Bouyssou 1991).
This utopia calls for a vast research programme requiring many different
types of research (axiomatic analyses of models, experimental studies of models,
clinical analyses of decision/evaluation processes, conceptual reflections on the
notions of rationality and performance, production of new pieces of software,
etc.).
The authors are preparing another book that will hopefully contribute to this
research programme. It will cover the main topics that we believe to be useful in
order to successfully implement formal decision/evaluation models in real-world
processes:

structuration methods and concepts,

preference modelling tools,

uncertainty and imprecision modelling tools,

aggregation models,

tools for deriving robust recommendations.


If we managed to convince you that formal decision and evaluation models are an
important topic and that the hope of discovering ideal methods is somewhat
chimerical, it is not unlikely that you will find the next book valuable.

[1] Abbas, M., Pirlot, M. and Vincke, Ph. (1996). Preference structures and
co-comparability graphs, Journal of Multi-Criteria Decision Analysis 5: 81-98.
[2] Abdellaoui, M. and Munier, B. (1994). The closing in method: An
experimental tool to investigate individual choice patterns under risk, in
B. Munier and M.J. Machina (eds), Models and experiments in risk and
rationality, Kluwer, Dordrecht, pp. 141-155.
[3] Adler, H.A. (1987). Economic appraisal of transport projects: A manual with
case studies, Johns Hopkins University Press for the World Bank, Baltimore.
[4] Airasian, P.W. (1991). Classroom assessment, McGraw-Hill, New York.
[5] Allais, M. and Hagen, O. (eds) (1979). Expected utility hypotheses and the
Allais paradox, D. Reidel, Dordrecht.
[6] Allais, M. (1953). Le comportement de l'homme rationnel devant le risque :
Critique des postulats et axiomes de l'école américaine, Econometrica
21: 503-46.
[7] Armstrong, W.E. (1939). The determinateness of the utility function, The
Economic Journal 49: 453-467.
[8] Arrow, K.J. and Raynaud, H. (1986). Social choice and multicriterion
decision-making, MIT Press, Cambridge.
[9] Arrow, K.J. (1963). Social choice and individual values, 2nd edn, Wiley, New
York.
[10] Atkinson, A.B. (1970). On the measurement of inequality, Journal of
Economic Theory 2: 244-263.
[11] Baldwin, J.F. (1979). A new approach to approximate reasoning using a fuzzy
logic, Fuzzy Sets and Systems 2: 309-325.
[12] Balinski, M.L. and Young, H.P. (1982). Fair representation, Yale University
Press, New Haven.
[13] Bana e Costa, C.A., Ensslin, L., Correa, E.C. and Vansnick, J.-C. (1999).
Decision support systems in action: Integrated application in a
multicriteria decision aid process, European Journal of Operational Research
113: 315-335.
[14] Barbera, S., Hammond, P. and Seidl, C. (eds) (1998). Handbook of utility
theory, Vol. 1: Principles, Kluwer, Dordrecht.


[15] Bartels, R.H., Beatty, J.C. and Barsky, B.A. (1987). An introduction
to splines for use in computer graphics and geometric modeling, Morgan
Kaufmann, Los Altos.
[16] Barzilai, J., Cook, W.D. and Golany, B. (1987). Consistent weights for
judgments matrices of the relative importance of alternatives, Operations
Research Letters 6: 131-134.
[17] Bazerman, M.H. (1990). Judgment in managerial decision making, Wiley,
New York.
[18] Bell, D., Raiffa, H. and Tversky, A. (eds) (1988). Decision making:
Descriptive, normative and prescriptive interactions, Cambridge University Press,
Cambridge.
[19] Belton, V., Ackermann, F. and Shepherd, I. (1997). Integrated support
from problem structuring through alternative evaluation using COPE and
VISA, Journal of Multi-Criteria Decision Analysis 6: 115-130.
[20] Belton, V. and Gear, A.E. (1983). On a shortcoming of Saaty's analytic
hierarchies, Omega 11: 228-230.
[21] Belton, V. (1986). A comparison of the analytic hierarchy process and a simple
multi-attribute value function, European Journal of Operational Research
26: 7-21.
[22] Bereau, M. and Dubuisson, B. (1991). A fuzzy extended k-nearest neighbor
rule, Fuzzy Sets and Systems 44: 17-32.
[23] Bernoulli, D. (1954). Specimen theoriae novae de mensura sortis,
Commentarii Academiae Scientiarum Imperialis Petropolitanae (5, 175-192, 1738),
Econometrica 22: 23-36. Translated by L. Sommer.
[24] Bezdek, J., Chuah, S.K. and Leep, D. (1986). Generalised k-nearest neighbor
rules, Fuzzy Sets and Systems 18: 237-256.
[25] Blin, M.-J. and Tsoukias, A. (1998). Multicriteria methodology contribution
to the software quality evaluation, Technical report, Cahier du LAMSADE
No 155, Universite Paris-Dauphine, Paris.
[26] Boardman, A. (1996). Cost benefit analysis: Concepts and practices, Prentice-
Hall, New-York.
[27] Boiteux, M. (1994). Transports : Pour un meilleur choix des investissements,
La Documentation Francaise, Paris.
[28] Bonboir, A. (1972). La docimologie, PUF, Paris.
[29] Borda, J.-Ch. (1781). Memoire sur les elections au scrutin, Comptes Rendus
de l'Academie des Sciences. Translated by Alfred de Grazia as
Mathematical derivation of an election system, Isis, Vol. 44, pp. 42–51.
[30] Bouchon, B. (1995). La logique floue et ses applications, Addison Wesley, New
York.

[31] Bouchon-Meunier, B. and Marsala, C. (1999). Learning fuzzy decision rules,
in D. Dubois, J. Bezdek and H. Prade (eds), Fuzzy sets in approximate
reasoning and information systems, Vol. 3 of Handbook of Fuzzy Sets, Kluwer,
Dordrecht, chapter 4, pp. 279–304.
[32] Bouyssou, D., Perny, P., Pirlot, M., Tsoukias, A. and Vincke, Ph. (1993).
A manifesto for the new MCDM era, Journal of Multi-Criteria Decision
Analysis 2: 125–127.
[33] Bouyssou, D. and Perny, P. (1992). Ranking methods for valued preference
relations: A characterization of a method based on entering and leaving
flows, European Journal of Operational Research 61: 186–194.
[34] Bouyssou, D. and Pirlot, M. (1997). Choosing and ranking on the basis of
fuzzy preference relations with the Min in Favor, in G. Fandel and T. Gal
(eds), Multiple criteria decision making: Proceedings of the twelfth
international conference, Hagen, Germany, Springer Verlag, Berlin, pp. 115
[35] Bouyssou, D. and Vansnick, J.-C. (1986). Noncompensatory and generalized
noncompensatory preference structures, Theory and Decision 21: 251–266.
[36] Bouyssou, D. (1984). Decision-aid and expected utility theory: A critical
survey, in O. Hagen and F. Wenstøp (eds), Progress in utility and risk
theory, Kluwer, Dordrecht, pp. 181–216.
[37] Bouyssou, D. (1986). Some remarks on the notion of compensation in MCDM,
European Journal of Operational Research 26: 150–160.
[38] Bouyssou, D. (1990). Building criteria: A prerequisite for MCDA, in
C.A. Bana e Costa (ed.), Readings in multiple criteria decision aid,
Springer Verlag, Berlin, pp. 58–80.
[39] Bouyssou, D. (1992). On some properties of outranking relations based on
a concordance-discordance principle, in A. Goicoechea, L. Duckstein and
S. Zionts (eds), Multiple criteria decision making, Springer-Verlag, Berlin,
pp. 93–106.
[40] Bouyssou, D. (1996). Outranking relations: Do they have special properties?,
Journal of Multi-Criteria Decision Analysis 5: 99–111.
[41] Brams, S.J. and Fishburn, P.C. (1982). Approval voting, Birkhauser, Basel.
[42] Brans, J.-P. and Vincke, Ph. (1985). A preference ranking organization
method, Management Science 31: 647–656.
[43] Brekke, K.A. (1997). The numeraire matters in cost-benefit analysis, Journal
of Public Economics 64: 117–123.
[44] Brent, R.J. (1984). Use of distributional weights in cost-benefit analysis: A
survey of schools, Public Finance Quarterly 12: 213–230.
[45] Brent, R.J. (1996). Applied cost-benefit analysis, Elgar, Aldershot, Hants.
[46] Broome, J. (1985). The economic value of life, Economica 52: 281–294.

[47] Carbone, E. and Hey, J.D. (1995). A comparison of the estimates of expected
utility and non-expected utility preference functionals, Geneva Papers on
Risk and Insurance Theory 20: 111–133.
[48] Cardinet, J. (1986). Evaluation scolaire et mesure, De Boeck, Brussels.
[49] Chatel, E. (1994). Qu'est-ce qu'une note : recherche sur la pluralite des
modes d'education et d'evaluation, Les Dossiers d'Education et
Formations 47: 183–203.
[50] Checkland, P. (1981). Systems thinking, systems practice, Wiley, New York.
[51] Condorcet, M.J.A.N.C., marquis de. (1785). Essai sur l'application de
l'analyse a la probabilite des decisions rendues a la pluralite des voix,
Imprimerie Royale, Paris.
[52] Cover, T.M. and Hart, P.E. (1967). Nearest neighbor pattern classification,
IEEE Transactions on Information Theory IT-13(1): 21–27.
[53] Cross, L.H. (1995). Grading students, Technical Report Series EDO-TM-95-5,
ERIC/AE Digest.
[54] Daellenbach, H.G. (1994). Systems and decision making. A management sci-
ence approach, Wiley, New York.
[55] Dasgupta, P.S., Marglin, S. and Sen, A.K. (1972). Guidelines for project eval-
uation, UNIDO, New York.
[56] Dasgupta, P.S. and Pearce, D.W. (1972). Cost-benefit analysis: Theory and
practice, Macmillan, Basingstoke.
[57] Davis, B.G. (1993). Tools for teaching, Jossey-Bass, San Francisco.
[58] de Jongh, A. (1992). Theorie du mesurage, agregation des criteres et
application au decathlon, Master's thesis, SMG, Universite Libre de Bruxelles,
Brussels.
[59] Dekel, E. (1986). An axiomatic characterization of preference under
uncertainty: Weakening the independence axiom, Journal of Economic Theory
40: 304–318.
[60] Desrosieres, A. (1995). Refleter ou instituer : L'invention des indicateurs
statistiques, Technical Report 129/J310, INSEE, Paris.
[61] de Ketele, J.-M. (1982). La docimologie, Cabay, Louvain-La-Neuve.
[62] de Landsheere, G. (1980). Evaluation continue et examens. Precis de doci-
mologie, Labor-Nathan, Paris.
[63] Dinwiddy, C. and Teal, F. (1996). Principles of cost-benefit analysis for de-
veloping countries, Cambridge University Press, Cambridge.
[64] Dorfman, R. (1996). Why benefit-cost analysis is widely disregarded and what
to do about it?, Interfaces 26: 1–6.
[65] Dreze, J. and Stern, N. (1987). The theory of cost-benefit analysis, in
A.J. Auerbach and M. Feldstein (eds), Handbook of public economics,
Elsevier, Amsterdam, pp. 909–989.

[66] Dubois, D., Fargier, H. and Prade, H. (1997). Decision-making under ordinal
preferences and uncertainty, in D. Geiger and P.P. Shenoy (eds), Proceedings
of the 13th conference on uncertainty in artificial intelligence, Morgan
Kaufmann, Los Altos, pp. 157–164.
[67] Dubois, D., Prade, H. and Sabbadin, R. (1998). Qualitative decision theory
with Sugeno integrals, Proceedings of the 14th conference on uncertainty
in artificial intelligence, Morgan Kaufmann, Los Altos, pp. 121–128.
[68] Dubois, D., Prade, H. and Ughetto, L. (1999). Fuzzy logic, control
engineering and artificial intelligence, in H.B. Verbruggen, H.J. Zimmermann
and R. Babuska (eds), Fuzzy algorithms for control, Kluwer, Dordrecht,
pp. 17–58.
[69] Dubois, D. and Prade, H. (1987). The mean value of a fuzzy number, Fuzzy
Sets and Systems 24: 279–300.
[70] Dubois, D. and Prade, H. (1988). Possibility theory, Plenum Press, New-York.
[71] Dupuit, J. (1844). De la mesure de l'utilite des travaux publics, Annales des
Ponts et Chaussees (8).
[72] Dyer, J.S. (1990). Remarks on the analytic hierarchy process, Management
Science 36: 249–258.
[73] Ebel, R.L. and Frisbie, D.A. (1991). Essentials of educational measurement,
Prentice-Hall, New-York.
[74] Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms, Quarterly
Journal of Economics 75: 643–669.
[75] Fargier, H. and Perny, P. (1999). Qualitative decision models under
uncertainty without the commensurability hypothesis, in K.B. Laskey and
H. Prade (eds), Proceedings of the 15th conference on uncertainty in
artificial intelligence, Morgan Kaufmann, Los Altos, pp. 188–195.
[76] Farrell, D.M. (1997). Comparing electoral systems, Contemporary Political
Studies, Prentice-Hall, New-York.
[77] Fiammengo, A., Buosi, D., Iob, I., Maffioli, P., Panarotto, G. and Turino, M.
(1997). Bid management of software acquisition for cartography applica-
tions. Presented at AIRO 97 Conference, Aosta.
[78] Fishburn, P.C. and Sarin, R.K. (1991). Dispersive equity and social risk,
Management Science 37: 751–769.
[79] Fishburn, P.C. and Sarin, R.K. (1994). Fairness and social risk I:
Unaggregated analyses, Management Science 40: 1174–1188.
[80] Fishburn, P.C. and Straffin, P.D. (1989). Equity considerations in public risks
evaluation, Operations Research 37: 229–239.
[81] Fishburn, P.C. (1970). Utility theory for decision-making, Wiley, New York.
[82] Fishburn, P.C. (1976). Noncompensatory preferences, Synthese 33: 393–403.
[83] Fishburn, P.C. (1977). Condorcet social choice functions, SIAM Journal on
Applied Mathematics 33: 469–489.

[84] Fishburn, P.C. (1978). A survey of multiattribute/multicriteria evaluation
theories, in S. Zionts (ed.), Multicriteria problem solving, Springer Verlag,
Berlin, pp. 181–224.
[85] Fishburn, P.C. (1982). The foundations of expected utility, D. Reidel,
Dordrecht.
[86] Fishburn, P.C. (1984). Equity axioms for public risks, Operations Research
32: 901–908.
[87] Fishburn, P.C. (1988a). Nonlinear preference and utility theory, Johns Hop-
kins University Press, Baltimore.
[88] Fishburn, P.C. (1988b). Normative theories of decision making under risk
and under uncertainty, in J. Kacprzyk and M. Roubens (eds),
Non-conventional preference relations in decision making, Springer Verlag,
Berlin, pp. 469–489.
[89] Fishburn, P.C. (1991). Nontransitive preferences in decision theory, Journal
of Risk and Uncertainty 4: 113–134.
[90] Fix, E. and Hodges, J.L. (1951). Discriminatory analysis, non-parametric
discrimination: consistency properties, Technical report, USAF School of
aviation and medicine, Randolph Field. 4.
[91] Fodor, J.C. and Roubens, M. (1994). Fuzzy preference modelling and multi-
criteria decision support, Kluwer, Dordrecht.
[92] Folland, S., Goodman, A.C. and Stano, M. (1997). The economics of health
and health care, Prentice-Hall, New-York.
[93] French, S. (1981). Measurement theory and examinations, British Journal of
Mathematical and Statistical Psychology 34: 38–49.
[94] French, S. (1993). Decision theory: An introduction to the mathematics of
rationality, Ellis Horwood, London.
[95] Gacogne, L. (1997). Elements de logique floue, Hermes, Paris.
[96] Gafni, A. and Birch, S. (1997). Equity considerations in utility-based
measures of health outcomes in economic appraisals: An adjustment
algorithm, Journal of Health Economics 10: 329–342.
[97] Gehrlein, W.V. (1983). Condorcet's paradox, Theory and Decision 15: 161
[98] Gibbard, A. (1973). Manipulation of voting schemes: A general result,
Econometrica 41: 587–601.
[99] Gilboa, I. and Schmeidler, D. (1989). Maxmin expected utility with a
non-unique prior, Journal of Mathematical Economics 18: 141–153.
[100] Gilboa, I. and Schmeidler, D. (1993). Updating ambiguous beliefs, Journal of
Economic Theory 59: 33–49.
[101] Grabisch, M., Guely, F. and Perny, P. (1997). Evaluation subjective, Les
cahiers du Club CRIN - Association ECRIN, Paris.

[102] Grabisch, M. (1996). The application of fuzzy integrals to multicriteria
decision making, European Journal of Operational Research 89: 445–456.
[103] Hammond, P.J. (1988). Consequentialist foundations for expected utility,
Theory and Decision 25: 25–78.
[104] Hanley, N. and Spash, C.L. (1993). Cost-benefit analysis and the
environment, Elgar, Aldershot, Hants.
[105] Harker, P.T. and Vargas, L.G. (1987). The theory of ratio scale estimation:
Saaty's analytic hierarchy process, Management Science 33: 1383–1403.
[106] Harless, D. and Camerer, C.F. (1994). The utility of generalized expected
utility theories, Econometrica 62: 1251–1289.
[107] Harvey, C.M. (1992). A slow-discounting model for energy conservation,
Interfaces 22: 47–60.
[108] Harvey, C.M. (1994). The reasonableness of non-constant discounting,
Journal of Public Economics 53: 31–51.
[109] Harvey, C.M. (1995). Proportional discounting of future costs and benefits,
Mathematics of Operations Research 20: 381–399.
[110] Henriet, L. and Perny, P. (1996). Methodes multicriteres non-compensatoires
pour la classification floue d'objets, Proceedings of LFA'96, pp. 9–15.
[111] Henriet, L. (1995). Problemes d'affectation et methodes de classification,
Memoire du DEA 103, Universite Paris Dauphine.
[112] Hershey, J.C., Kunreuther, H.C. and Schoemaker, P.J.H. (1982). Sources of
bias in assessment procedures for utility functions, Management Science
28: 936–953.
[113] Heurgon, E. (1982). Relationships between decision making process and
study process in OR interventions, European Journal of Operational
Research 10: 230–236.
[114] Hey, J.D. and Orme, C. (1994). Investigating generalizations of expected
utility theory using experimental data, Econometrica 62: 1251–1289.
[115] Hogarth, R. (1987). Judgement and choice: The psychology of decision, Wi-
ley, New York.
[116] Holland, A. (1995). The assumptions of cost-benefit analysis: A
philosopher's view, in K.G. Willis and J.T. Corkindale (eds), Environmental
valuation: New perspectives, CAB International, Oxford, pp. 21–38.
[117] Horn, R.V. (1993). Statistical indicators, Cambridge University Press,
Cambridge.
[118] Humphreys, P.C., Svenson, O. and Vari, A. (1993). Analysis and aiding
decision processes, North-Holland, Amsterdam.
[119] IEEE 92 (1992). Standard for a software quality metrics methodology, Tech-
nical report, The Institute of Electrical and Electronics Engineers.
[120] International Atomic Energy Agency (1993). Cost-benefit aspects of food ir-
radiation processing, Bernan Associates, Washington D.C.

[121] ISO/IEC 9126 (1991). Information technology: Software product
evaluation, quality characteristics and guidelines for their use, Technical report,
ISO, Geneve.
[122] Jacquet-Lagreze, E., Moscarola, J., Roy, B. and Hirsch, G. (1978).
Description d'un processus de decision, Technical report, Cahier du LAMSADE
No 13, Universite Paris-Dauphine, Paris.
[123] Jacquet-Lagreze, E. and Siskos, J. (1982). Assessing a set of additive utility
functions for multicriteria decision making: The UTA method, European
Journal of Operational Research 10: 151–164.
[124] Jacquet-Lagreze, E. (1990). Interactive assessment of preferences using
holistic judgments. The PREFCALC system, in C.A. Bana e Costa (ed.),
Readings in multiple criteria decision aid, Springer Verlag, Berlin, pp. 335–350.
[125] Jaffray, J.-Y. (1988). Choice under risk and the security factor: An axiomatic
model, Theory and Decision 24: 169–200.
[126] Jaffray, J.-Y. (1989a). Some experimental findings on decision making
under risk and their implications, European Journal of Operational Research
38: 301–306.
[127] Jaffray, J.-Y. (1989b). Utility theory for belief functions, Operations Research
Letters 8: 107–112.
[128] Johannesson, M. (1995a). A note on the depreciation of the societal
perspective in economic evaluation in health care, Health Policy 33: 59–66.
[129] Johannesson, M. (1995b). The relationship between cost-effectiveness
analysis and cost-benefit analysis, Social Science and Medicine 41: 483–489.
[130] Johannesson, M. (1996). Theory and methods of economic evaluation of
health care, Kluwer, Dordrecht.
[131] Johansson, P.O. (1993). Cost-benefit analysis of environmental change, Cam-
bridge University Press, Cambridge.
[132] Johnson, E.J. and Schkade, D.A. (1989). Bias in utility assessments: Further
evidence and explanations, Management Science 35: 406–424.
[133] Kahneman, D., Slovic, P. and Tversky, A. (1981). Judgement under
uncertainty: Heuristics and biases, Cambridge University Press, Cambridge.
[134] Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of
decision under risk, Econometrica 47: 263–291.
[135] Keeler, E.B. and Cretin, S. (1983). Discounting of life-saving and other
non-monetary effects, Management Science 29: 300–306.
[136] Keeney, R.L., Hammond, J.S. and Raiffa, H. (1999). Smart choices: A guide
to making better decisions, Harvard University Press, Boston.
[137] Keeney, R.L. and Raiffa, H. (1976). Decisions with multiple objectives: Pref-
erences and value tradeoffs, Wiley, New York.

[138] Keller, J., Gray, M. and Givens, J. (1985). A fuzzy k-nearest neighbor
algorithm, IEEE Transactions on Systems, Man and Cybernetics 15: 580
[139] Kelly, J.S. (1991). Social choice bibliography, Social Choice and Welfare
8: 97–169.
[140] Kerlinger, F.N. (1986). Foundations of behavioral research, 3rd edn, Holt,
Rinehart and Winston, New York.
[141] Kirkpatrick, C. and Weiss, J. (1996). Cost-benefit analysis and project
appraisal in developing countries, Elgar, Aldershot, Hants.
[142] Kohli, K.N. (1993). Economic analysis of investment projects: A practical
approach, Oxford University Press for the Asian Development Bank,
Oxford.
[143] Krantz, D.H., Luce, R.D., Suppes, P. and Tversky, A. (1971). Foundations of
measurement, Vol. 1: Additive and polynomial representations, Academic
Press, New York.
[144] Krutilla, J.V. and Eckstein, O. (1958). Multiple purpose river development,
Johns Hopkins University Press, Baltimore.
[145] Laska, J.A. and Juarez, T. (1992). Grading and marking in American
schools: Two centuries of debate, Charles C. Thomas, Springfield.
[146] Laslett, R. (1995). The assumptions of cost-benefit analysis, in K.G. Willis
and J.T. Corkindale (eds), Environmental valuation: New perspectives,
CAB International, Oxford, pp. 5–20.
[147] Lesourne, J. (1975). Cost-benefit analysis and economic theory, North-
Holland, Amsterdam.
[148] Lindheim, E., Morris, L.L. and Fitz-Gibbon, C.T. (1987). How to measure
performance and use tests, Sage Publications, Thousand Oaks.
[149] Little, I.M.D. and Mirrlees, J.A. (1968). Manual of industrial project analysis
in developing countries, O.E.C.D, Paris.
[150] Little, I.M.D. and Mirrlees, J.A. (1974). Project appraisal and planning for
developing countries, Basic books, New York.
[151] Loomes, G. and Sugden, R. (1982). Regret theory: An alternative theory of
rational choice under uncertainty, Economic Journal 92: 805–824.
[152] Loomes, G. (1988). Different experimental procedures for obtaining
valuations of risky actions: Implications for utility theory, Theory and Decision
25: 1–23.
[153] Loomis, J., Peterson, G., Champ, P., Brown, T. and Lucero, B. (1998).
Paired comparisons estimates of willingness to accept and contingent
valuation estimates of willingness to pay, Journal of Economic Behavior and
Organisation 35: 501–515.
[154] Luce, R.D., Krantz, D.H., Suppes, P. and Tversky, A. (1990). Foundations
of measurement, Vol. 3: Representation, axiomatisation and invariance,
Academic Press, New York.

[155] Luce, R.D. and Raiffa, H. (1957). Games and Decisions, Wiley, New York.
[156] Luce, R.D. (1956). Semiorders and a theory of utility discrimination,
Econometrica 24: 178–191.
[157] Lysne, A. (1984). Grading of students' attainment: Purposes and functions,
Scandinavian Journal of Educational Research 28: 149–165.
[158] Machina, M.J. (1982). Expected utility without the independence axiom,
Econometrica 50: 277–323.
[159] Machina, M.J. (1989). Dynamic consistency and non-expected utility models
of choice under uncertainty, Journal of Economic Literature 27: 1622
[160] Mamdani, E.H. and Gaines, B.R. (eds) (1981). Fuzzy reasoning and its
applications, Academic Press, New York.
[161] Marchant, Th. (1996). Valued relations aggregation with the Borda method,
Journal of Multi-Criteria Decision Analysis 5: 127–132.
[162] Masser, I. (1983). The representation of urban planning-processes: An
exploratory review, Environment and Planning B 10: 47–62.
[163] May, K.O. (1952). A set of independent necessary and sufficient conditions
for simple majority decisions, Econometrica 20: 680–684.
[164] McClennen, E.F. (1990). Rationality and dynamic choice: Foundational ex-
plorations, Cambridge University Press, Cambridge.
[165] McCord, M. and de Neufville, R. (1983). Fundamental deficiency of expected
utility analysis, in S. French, R. Hartley, L.C. Thomas and D.J. White
(eds), Multiobjective decision making, Academic Press, London, pp. 279
[166] McCord, M. and de Neufville, R. (1982). Empirical demonstration that
expected utility decision analysis is not operational, in B. Stigum and
F. Wenstøp (eds), Foundations of utility and risk theory, D. Reidel,
Dordrecht, pp. 181–199.
[167] MacCrimmon, K.R. and Larsson, S. (1979). Utility theory: Axioms versus
paradoxes, in M. Allais and O. Hagen (eds), Expected utility hypotheses
and the Allais paradox, D. Reidel, pp. 27–145.
[168] McLean, J.E. and Lockwood, R.E. (1996). Why and how should we assess
students? The competing measures of student performance, Sage Publica-
tions, Thousand Oaks.
[169] Merle, P. (1996). L'evaluation des eleves. Enquete sur le jugement
professoral, PUF, Paris.
[170] Mintzberg, H., Raisinghani, D. and Theoret, A. (1976). The structure of un-
structured decision processes, Administrative Science Quarterly 21: 246
[171] Mishan, E. (1982). Cost-benefit analysis, Allen and Unwin, London.

[172] Moom, T.M. (1997). How do you know they know what they know? A hand-
book of helps for grading and evaluating student progress, Grove Publish-
ing, Westminster.
[173] Morisio, M. and Tsoukias, A. (1997). IUSWARE: A formal methodology for
software evaluation and selection, IEE Proceedings on Software
Engineering 144: 162–174.
[174] Moscarola, J. (1984). Organizational decision processes and OR/SA
intervention, in R. Tomlinson and I. Kiss (eds), Rethinking the process of
operational research and systems analysis, Pergamon Press, Oxford, pp. 169
[175] Mousseau, V. (1993). Problemes lies a l'evaluation de l'importance en aide
multicritere a la decision : Reflexions theoriques et experimentations,
PhD thesis, LAMSADE, Universite Paris-Dauphine, Paris.
[176] Munier, B. (1989). New models of decisions under uncertainty, European
Journal of Operational Research 38: 307–317.
[177] Nas, T.F. (1996). Cost-benefit analysis: Theory and application, Sage Pub-
lications, Thousand Oaks.
[178] Nauck, D. and Kruse, R. (1999). Neuro-fuzzy methods in fuzzy rule
generation, in D. Dubois, J. Bezdek and H. Prade (eds), Fuzzy sets in approximate
reasoning and information systems, Vol. 3 of Handbook of Fuzzy Sets,
Kluwer, Dordrecht, chapter 5, pp. 305–333.
[179] Nau, R.F. and McCardle, K.F. (1991). Arbitrage, rationality and
equilibrium, Theory and Decision 31: 199–240.
[180] Nau, R.F. (1995). Coherent decision analysis with inseparable probabilities
and utilities, Journal of Risk and Uncertainty 10: 71–91.
[181] Nguyen, H.T. and Sugeno, M. (1998). Modelling and control, Kluwer,
Dordrecht.
[182] Nims, J.F. (1990). Poems in translation: Sappho to Valery, The University
of Arkansas Press, Arkansas.
[183] Noizet, G. and Caverni, J.-P. (1978). La psychologie de l'evaluation scolaire,
PUF, Paris.
[184] Nurmi, H. (1987). Comparing voting systems, D. Reidel, Dordrecht.
[185] Nutt, P.C. (1984). Types of organizational decision processes, Administrative
Science Quarterly 19: 414–450.
[186] Nyborg, K. (1998). Some Norwegian politicians' use of cost-benefit analysis,
Public Choice 95: 381–401.
[187] Ostanello, A. and Tsoukias, A. (1993). An explicative model of public
interorganizational interactions, European Journal of Operational Research
70: 67–82.
[188] Ostanello, A. (1990). Action evaluation and action structuring: Different
decision aid situations reviewed through two actual cases, in C.A. Bana
e Costa (ed.), Readings in multiple criteria decision aid, Springer Verlag,
Berlin, pp. 36–57.
[189] Ostanello, A. (1997). Validation aspects of a prototype solution
implementation to solve a complex MC problem, in J. Clímaco (ed.), Multi-criteria
analysis, Springer Verlag, Berlin, pp. 61–74.
[190] Ott, W.R. (1978). Environmental indices: Theory and practice, Ann Arbor
Science, Ann Arbor.
[191] Paschetta, E. and Tsoukias, A. (1999). A real world MCDA application:
Evaluating software, Technical report, Document du LAMSADE No 113,
Universite Paris-Dauphine, Paris.
[192] Perny, P. and Pomerol, J.-Ch. (1999). Use of artificial intelligence in
multi-criteria decision making, in T. Gal, Th.J. Stewart and Th. Hanne (eds),
Advances in MCDM models, algorithms, theory, and applications, Kluwer,
Dordrecht, pp. 15.1–15.43.
[193] Perny, P. and Roubens, M. (1998). Fuzzy preference modelling, in
R. Slowinski (ed.), Fuzzy sets in decision analysis, operations research
and statistics, Kluwer, Dordrecht, pp. 3–30.
[194] Perny, P. and Zucker, J.D. (1999). Collaborative filtering methods based on
fuzzy preference relations, Proceedings of EUROFUSE-SIC'99, pp. 279
[195] Perny, P. (1992). Sur le non-respect de l'axiome d'independance dans les
methodes de type ELECTRE, Cahiers du CERO 34: 211–232.
[196] Perrot, N., Trystram, G., Le Guennec, D. and Guely, F. (1996). Sensor
fusion for real time quality evaluation of biscuit during baking.
Comparison between Bayesian and fuzzy approaches, Journal of Food Engineering
29: 301–315.
[197] Perrot, N. (1997). Maitrise des procedes alimentaires et theorie des
ensembles flous, PhD thesis, Ecole Nationale Superieure des Industries Agricoles
[198] Pieron, H. (1963). Examens et docimologie, PUF, Paris.
[199] Pirlot, M. and Vincke, Ph. (1997). Semiorders. Properties, representations,
applications, Kluwer, Dordrecht.
[200] Pirlot, M. (1997). A common framework for describing some outranking
procedures, Journal of Multi-Criteria Decision Analysis 6: 86–93.
[201] Popham, W.J. (1981). Modern educational measurement, Prentice-Hall,
New-York.
[202] Poulton, E.C. (1994). Behavioral decision theory: A new approach, Cam-
bridge University Press, Cambridge.
[203] Quiggin, J. (1982). A theory of anticipated utility, Journal of Economic
Behaviour and Organization 3: 323–343.

[204] Quiggin, J. (1993). Generalized expected utility theory: The rank-dependent
model, Kluwer, Dordrecht.
[205] Raiffa, H. (1970). Decision analysis: Introductory lectures on choices under
uncertainty, Addison-Wesley, New York.
[206] Riley, H.J., Checca, R.C., Singer, T.S. and Worthington, D.F. (1994).
Grades and grading practices: The results of the 1992 AACRAO survey,
American Association of Collegiate Registrars and Admissions Officers,
Washington D.C.
[207] Rosenhead, J. (1989). Rational analysis of a problematic world, Wiley, New
York.
[208] Roubens, M. and Vincke, Ph. (1985). Preference modelling, Springer Verlag,
Berlin.
[209] Roy, B. and Bouyssou, D. (1991). Decision-aid: an elementary introduction
with emphasis on multiple criteria, Investigacion Operativa 2: 95–110.
[210] Roy, B. and Bouyssou, D. (1993). Aide multicritere a la decision : Methodes
et cas, Economica, Paris.
[211] Roy, B. and Skalka, J.-M. (1984). ELECTRE IS : Aspects methodologiques
et guide d'utilisation, Technical report, Document du LAMSADE No 30,
Universite Paris-Dauphine, Paris.
[212] Roy, B. (1974). Criteres multiples et modelisation des preferences : l'apport
des relations de surclassement, Revue d'Economie Politique 1: 1–44.
[213] Roy, B. (1990). Science de la decision ou science de l'aide a la decision ?,
Technical report, Cahier du LAMSADE No 97, Universite Paris-Dauphine,
Paris.
[214] Roy, B. (1993). Decision science or decision-aid science?, European Journal
of Operational Research 66: 184–204.
[215] Roy, B. (1996). Multicriteria methodology for decision aiding, Kluwer,
Dordrecht. Original version in French: Methodologie multicritere d'aide a la
decision, Economica, Paris, 1985.
[216] Russo, J.E. and Schoemaker, P.J.H. (1989). Confident decision making, Pi-
atkus, London.
[217] Saaty, T.L. (1980). The analytic hierarchy process, McGraw-Hill, New York.
[218] Sabot, R. and Wakeman, L.J. (1991). Grade inflation and course choice,
Journal of Economic Perspectives 5: 159–170.
[219] Sager, C. (1994). Eliminating grades in schools: An allegory for change,
ASQ Quality Press, Milwaukee.
[220] Salles, M., Barrett, C.R. and Pattanaik, P.K. (1992). Rationality and
aggregation of preferences in an ordinally fuzzy framework, Fuzzy Sets and
Systems 49: 9–13.

[221] Satterthwaite, M.A. (1975). Strategy proofness and Arrow's conditions:
Existence and correspondence theorems for voting procedures and social
welfare functions, Journal of Economic Theory 10: 187–217.
[222] Savage, L. (1954). The foundations of statistics, 1972, 2nd revised edn, Wiley,
New York.
[223] Schmeidler, D. (1989). Subjective probability and expected utility without
additivity, Econometrica 57: 571–587.
[224] Schneider, Th., Schieber, C., Eeckhoudt, L. and Gollier, C. (1997).
Economics of radiation protection: Equity considerations, Theory and
Decision 43: 241–251.
[225] Schofield, J. (1989). Cost-benefit analysis in urban and regional planning,
Unwin and Hyman, London.
[226] Scotchmer, S. (1985). Hedonic prices and cost-benefit analysis, Journal of
Economic Theory 37: 55–75.
[227] Sen, A.K. (1986). Social choice theory, in K.J. Arrow and M.D. Intriligator
(eds), Handbook of mathematical economics, Vol. 3, North-Holland,
Amsterdam, pp. 1073–1181.
[228] Sen, A.K. (1997). Maximization and the act of choice, Econometrica 65: 745
[229] Simon, H.A. (1957). A behavioural model of rational choice, in Models of
man, Wiley, New York, pp. 241–260.
[230] Sinn, H.W. (1983). Economic decisions under uncertainty, North-Holland,
Amsterdam.
[231] Slowinski, R. (ed.) (1998). Fuzzy sets in decision analysis, operations research
and statistics, Kluwer, Dordrecht.
[232] Sopher, B. and Gigliotti, G. (1993). A test of generalized expected utility
theory, Theory and Decision 35: 75–106.
[233] Speck, B.W. (1998). Grading student writing: An annotated bibliography,
Greenwood Publishing Group, Westport.
[234] Stamelos, I. and Tsoukias, A. (1998). Software evaluation problem situa-
tions, Technical report, Cahier du LAMSADE No 156, Universite Paris-
Dauphine, Paris.
[235] Steuer, R.E. (1986). Multiple criteria optimisation: Theory, computation,
and application, Wiley, New York.
[236] Stratton, R.W., Myers, S.C. and King, R.H. (1994). Faculty behavior, grades
and student evaluations, Journal of Economic Education 25: 5–15.
[237] Sugden, R. and Williams, A. (1983). The principles of practical cost-benefit
analysis, Oxford University Press, Oxford.
[238] Sugeno, M. (1977). Fuzzy measures and fuzzy integrals: a survey, in
M.M. Gupta, G.N. Saridis and B.R. Gaines (eds), Fuzzy automata and
decision processes, North Holland, Amsterdam, pp. 89–102.

[239] Sugeno, M. (1985). An introductory survey on fuzzy control, Information
Sciences 36: 59–83.
[240] Suzumura, K. (1999). Consequences, opportunities and procedures, Social
Choice and Welfare 16: 17–40.
[241] Syndicat des Transports Parisiens (1998). Methodes d'evaluation des projets
d'infrastructures de transports collectifs en region Ile-de-France, Technical
report, Syndicat des Transports Parisiens, Paris.
[242] Tchudi, S. (1997). Alternatives to grading student writing, National Council
of Teachers of English, Urbana.
[243] Teghem, J. (1996). Programmation lineaire, Editions de l'Universite de
Bruxelles-Editions Ellipses, Brussels.
[244] Thaler, R.H. (1991). Quasi rational economics, Russell Sage Foundation,
New York.
[245] Toth, F.L. (1997). Cost-benefit analysis of climate change: The broader per-
spectives, Birkhauser, Basel.
[246] Trystram, G., Perrot, N. and Guely, F. (1995). Application of fuzzy logic for
the control of food processes, Processing Automation 4: 504–512.
[247] Tsoukias, A. and Vincke, Ph. (1995). A new axiomatic foundation of partial
comparability, Theory and Decision 39: 79–114.
[248] Tsoukias, A. and Vincke, Ph. (1999). A characterization of PQI interval
orders, Proceedings OSDA 98, Electronic Notes on Discrete Mathematics,
to appear also in Discrete Applied Mathematics.
[249] Tversky, A. (1969). Intransitivity of preferences, Psychological Review
76: 31–48.
[250] United Nations Development Programme (1997). Human Development
Report 1997, Oxford University Press, Oxford.
[251] van Doren, M. (1928). An anthology of world poetry, Albert and Charles
Boni, New York.
[252] Vansnick, J.-C. (1986). De Borda et Condorcet à l'agrégation multicritère,
Ricerca Operativa (40): 7–44.
[253] Vassiloglou, M. and French, S. (1982). Arrow's theorem and examination
assessment, British Journal of Mathematical and Statistical Psychology
35: 183–192.
[254] Vassiloglou, M. (1984). Some multi-attribute models in examination assess-
ment, British Journal of Mathematical and Statistical Psychology 37: 216
[255] Vincke, Ph. (1988). P, Q, I preference structures, in J. Kacprzyk and
M. Roubens (eds), Non conventional preference relations in decision
making, Springer Verlag, Berlin, pp. 72–81.

[256] Vincke, Ph. (1992a). Exploitation of a crisp binary relation in a ranking
problem, Theory and Decision 32: 221–241.
[257] Vincke, Ph. (1992b). Multi-criteria decision aid, Wiley, New York.
Original version in French: L'Aide Multicritère à la Décision, Editions de
l'Université de Bruxelles-Editions Ellipses, Brussels, 1989.
[258] Viscusi, W.K. (1992). Fatal tradeoffs: Public and private responsibilities for
risk, Oxford University Press, Oxford.
[259] von Neumann, J. and Morgenstern, O. (1944). Theory of games and
economic behavior, Princeton University Press, Princeton.
[260] von Winterfeldt, D. and Edwards, W. (1986). Decision analysis and
behavioral research, Cambridge University Press, Cambridge.
[261] Wakker, P.P. (1989). Additive representations of preferences: A new
foundation of decision analysis, Kluwer, Dordrecht.
[262] Warusfel, A. (1961). Les nombres et leurs mystères, Points Sciences, Seuil,
[263] Watson, S.R. (1981). Decision analysis as a replacement for cost-benefit
analysis, European Journal of Operational Research 7: 242–248.
[264] Weinstein, M.C. and Stason, W.B. (1977). Foundations of cost-effectiveness
analysis for health and medical practices, New England Journal of
Medicine 296: 716–721.
[265] Weitzman, M.L. (1994). On the environmental discount rate, Journal of
Environmental Economics and Management 26: 200–209.
[266] Weymark, J.A. (1981). Generalized Gini inequality indices, Mathematical
Social Sciences 1: 409–430.
[267] Willis, K.G., Garrod, G.D. and Harvey, D.R. (1998). A review of cost-benefit
analysis as applied to the evaluation of new road proposals in the U.K.,
Transportation Research D 3: 141–156.
[268] Yaari, M.E. (1987). The dual theory of choice under risk, Econometrica
55: 95–115.
[269] Yu, W. (1992). Aide multicritère à la décision dans le cadre de la
problématique du tri: Méthodes et applications, PhD thesis, LAMSADE,
Université Paris-Dauphine, Paris.
[270] Zadeh, L.A. (1979). A theory of approximate reasoning, in J.E. Hayes,
D. Michie and L.I. Mikulich (eds), Machine intelligence, Elsevier,
Amsterdam, pp. 149–194.
[271] Zadeh, L.A. (1999). From computing with numbers to computing with words:
from manipulation of measurement to manipulation of perceptions,
Proceedings of EUROFUSE-SIC99, pp. 1–2.
[272] Zarnowsky, F. (1989). The decathlon: A colorful history of track and field's
most challenging event, Leisure Press, Champaign.

[273] Zerbe, R.O. and Dively, D.D. (1994). Benefit-cost analysis in theory and
practice, Harper Collins, New York.

absolute scale, 115 astrology, 237

action, 212 attributes
actor, 206 hierarchy, 214
acyclic, 126 attributes hierarchy, 213
aggregation, 30, 41, 51, 148, 245 automatic decision, 239
weighted sum, 172 automatic decision systems, 148
additive, 44 axiomatic analysis, 244
compensation, 46, 57, 212
conjunctive rule, 41, 96 bayesian decision theory, 239, 244
constructive approach, 130 binary relation
disaggregation, 117 acyclic, 126
dominance, 93 fuzzy, 21
linearity, 46, 80 incomparability, 19, 130, 220
monotonicity, 10, 61 outranking, 105
multi-attribute value function, 105 semiorder, 20
transitivity, 18–20
non-compensation, 141
Borda, 14
paired comparison, 193
Borda's method, 124
procedure, 215
rank reversal, 117 call for tenders, 208
screening process, 96 cardinal, 47
single-attribute value function, 106 client, 206
tournament, 125 coalition, 219
utility, 105 coherence test, 214
utility function, 106 communication, 242
value function, 106 compensation, 46, 57, 212
weight, 35 computer science, 30, 237
weighted average, 42, 59, 85, 241 concordance, 216, 219
weighted sum, 155, 159, 166 concordance threshold, 134
AHP, 111 Condorcet, 13
rank reversal, 117 paradox, 51
air quality, 61, 63 Condorcet's method, 125
Allais paradox, 191 conjunctive rule, 41, 96
ambiguity, 212 consistency, 84
analyst, 206 constructive approach, 130
anchoring effect, 34 corporate finance, 73
Arrow, 16 correlation, 102
aspiration level, 96 cost-benefit analysis, 71, 238


externalities, 78 expected value, 187

markets, 76 St. Petersburg game, 187
net present social value, 76 disaggregation, 117
price, 75, 85 discordance, 138
price of human life, 81 discounting, 74, 86, 239
price of time, 80 net present value, 74
public goods, 78 social rate, 76, 82
social benefits, 75 dominance, 51, 93, 96
social costs, 75 dynamic consistency, 202, 239
social welfare, 77, 86
credibility index, 139 economics, 71, 237
criteria education science, 237
coalition, 219 elections, 237
coherence test, 214 ELECTRE-TRI, 215, 228
coherent family, 214 Ellsberg's paradox, 192
hierarchy, 213, 214 engineering, 30, 237
interaction, 103 environment, 71, 82
point of view, 212 equity, 79, 86
relative importance, 216, 219 evaluation
cycle reduction, 135 absolute, 212
model, 1, 40, 51, 206, 213
decathlon, 63, 66 problem statement, 215
decision software, 207, 213
dynamic consistency, 202, 239 evaluation model
legitimation, 220 problem statement, 212
model, 1 expected value, 187
decision aiding process, 206 externalities, 78
decision model, formal, 84, 212, 237
decision process, 85, 206 final recommendation, 219
actor, 206 forecasting, 80
analyst, 206 fuzzy, 21
client, 206 control, 149, 165
decision rule, 149, 151 implication, 166
decision support, 210 interval, 161
evaluation model, 206, 213 labels, 161
final recommendation, 219 rule, 169
learning process, 119 set, 161, 169
problem formulation, 206, 211,
212 GPA, 48
problem situation, 206 grade, 29, 237
problem statement, 212, 215 anchoring effect, 34
decision table, 152 GPA, 48
decision theory, 237 marking scale, 33
Allais paradox, 191 minimal passing, 36, 42
Ellsberg's paradox, 192 standardised score, 34
expected utility, 189, 201 graphology, 237

health, 71 nominal scale, 214

heuristics, 242 ordinal, 39, 47, 214
hierarchy, 213, 214 ratio scale, 98
human development, 54, 61 reliability, 33
scale, 32, 38, 79
ideal point, 117 standard sequences, 107
implication, 166 subjective, 214
imprecision, 51, 103 validity, 33
incomparability, 19, 130, 220 model, 40, 245
independence, 58, 102 structuration, 84, 242, 245
of irrelevant alternatives, 15 mono-criterion analysis, 76, 85
separability, 214 monotonicity, 10, 61
indicator, 238
indices, 173 nearest neighbours, 172
indifference threshold, 201 net present social value, 76
interaction, 103 net present value, 74
interactive methods, 105 nominal scale, 214
interpolation, 155 non-compensation, 141
interval scale, 99
intuition, 242 operational research, 30, 237
ordinal, 39, 47, 214
kernel, 131 outranking, 105
outranking methods, 124, 129
learning process, 119 discordance, 138
legitimation, 220 concordance, 216, 219
linear scale, 104 concordance threshold, 134
credibility index, 139
majority rule, 125 cycle reduction, 135
manipulability, 17 ELECTRE-TRI, 215, 228
markets, 76 incomparability, 130
marking scale, 33 indifference threshold, 201
mathematics, 30 majority rule, 125
ideal point, 117 veto, 216, 219
interactive methods, 105
sorting, 212 paired comparison, 193
profiles, 220 point of view, 212
substitution rate, 57 political science, 237
swing-weight, 110 preference
trade-off, 101 model, 214, 245
meaningfulness, 62, 227 nontransitive, 130
measurement, 38, 51, 67, 212 relation, 125
absolute scale, 115 threshold, 50
cardinal, 47 price, 75, 85
interval scale, 99 price of human life, 81
linear scale, 104 price of time, 80
meaningfulness, 62, 227 priority, 111

probability, 239 uncertainty, 51, 79, 103, 179, 182,

problem formulation, 206, 211, 212 201, 239
problem situation, 206 endogenous, 218, 221
problem statement, 212, 215 exogenous, 218
PROMETHEE, 193 utility, 105, 106
public goods, 78 expected, 189, 201

rank reversal, 117 value function, 106

ranking, 212 multi-attribute, 105
ratio scale, 98 single-attribute, 106
relative importance, 216, 219 veto, 216, 219
risk, 239 voting procedure
robustness, 86, 242, 245 Borda's method, 124
rule Condorcet paradox, 51
aggregation, 148 Condorcet's method, 125
manipulability, 17
scale, 32, 38, 79 unanimity, 13
screening process, 96
security, 81 weight, 35
semiorder, 20 weighted average, 42, 59, 85, 241
sensitivity analysis, 83 weighted sum, 155, 159, 166, 172
stability, 99, 101
separability, 214
indices, 173
relation, 173
social benefits, 75
social costs, 75
social rate, 76, 82
social welfare, 77, 86
software, 207, 213
sorting, 212
St. Petersburg game, 187
stability, 99, 101
statistics, 237
structuration, 84, 242, 245
subjective, 214
substitution rate, 57

t-norm, 164
threshold, 50, 173
tournament, 125
trade-off, 101
transitivity, 18–20
transportation, 71, 79

unanimity, 13