EVALUATION AND

DECISION MODELS:

a critical perspective

Denis Bouyssou

ESSEC

Thierry Marchant

Ghent University

Marc Pirlot

SMRO, Faculté Polytechnique de Mons

Patrice Perny

LIP6, Université Paris VI

Alexis Tsoukiàs

LAMSADE - CNRS, Université Paris Dauphine

Philippe Vincke

SMG - ISRO, Université Libre de Bruxelles

Boston/London/Dordrecht

Contents

1 Introduction
1.1 Motivations
1.2 Audience
1.3 Structure
1.4 Outline
1.5 Who are the authors ?
1.6 Conventions
1.7 Acknowledgements

2.1 Analysis of some voting systems
2.1.1 Uninominal election
2.1.2 Election by rankings
2.1.3 Some theoretical results
2.2 Modelling the preferences of a voter
2.2.1 Rankings
2.2.2 Fuzzy relations
2.2.3 Other models
2.3 The voting process
2.3.1 Definition of the set of candidates
2.3.2 Definition of the set of the voters
2.3.3 Choice of the aggregation method
2.4 Social choice and multiple criteria decision support
2.4.1 Analogies
2.5 Conclusions

3.1 Introduction
3.1.1 Motivation
3.1.2 Evaluating students in Universities
3.2 Grading students in a given course
3.2.1 What is a grade?
3.2.2 The grading process
3.2.3 Interpreting grades
3.2.4 Why use grades?
3.3 Aggregating grades
3.3.1 Rules for aggregating grades
3.4 Conclusions

4 Constructing measures
4.1 The human development index
4.1.1 Scale Normalisation
4.1.2 Compensation
4.1.3 Dimension independence
4.1.4 Scale construction
4.1.5 Statistical aspects
4.2 Air quality index
4.2.1 Monotonicity
4.2.2 Non compensation
4.2.3 Meaningfulness
4.3 The decathlon score
4.3.1 Role of the decathlon score
4.4 Indicators and multiple criteria decision support
4.5 Conclusions

5.1 Introduction
5.2 The principles of CBA
5.2.1 Choosing between investment projects in private firms
5.2.2 From Corporate Finance to CBA
5.2.3 Theoretical foundations
5.3 Some examples in transportation studies
5.3.1 Prevision of traffic
5.3.2 Time gains
5.3.3 Security gains
5.3.4 Other effects and remarks
5.4 Conclusions

6.1 Thierry's choice
6.1.1 Description of the case
6.1.2 Reasoning with preferences
6.2 The weighted sum
6.2.1 Transforming the evaluations
6.2.2 Using the weighted sum on the case
6.2.3 Is the resulting ranking reliable?
6.2.4 The difficulties of a proper usage of the weighted sum
6.2.5 Conclusion
6.3 The additive value model
6.3.1 Direct methods for determining single-attribute value functions
6.3.3 An indirect method for assessing single-attribute value functions and trade-offs
6.3.4 Conclusion
6.4 Outranking methods
6.4.1 Condorcet-like procedures in decision analysis
6.4.2 A simple outranking method
6.4.3 Using ELECTRE I on the case
6.4.4 Main features and problems of elementary outranking approaches
6.4.5 Advanced outranking methods: from thresholding towards valued relations
6.5 General conclusion

7.1 Introduction
7.2 A System with Explicit Decision Rules
7.2.1 Designing a decision system for automatic watering
7.2.2 Linking symbolic and numerical representations
7.2.3 Interpreting input labels as scalars
7.2.4 Interpreting input labels as intervals
7.2.5 Interpreting input labels as fuzzy intervals
7.2.6 Interpreting output labels as (fuzzy) intervals
7.3 A System with Implicit Decision Rules
7.3.1 Controlling the quality of biscuits during baking
7.3.2 Automatising human decisions by learning from examples
7.4 An hybrid approach for automatic decision-making
7.5 Conclusion

8.1 Introduction
8.2 The context
8.3 The model
8.3.1 The set of actions
8.3.2 The set of criteria
8.3.3 Uncertainties and scenarios
8.3.4 The temporal dimension
8.3.5 Summary of the model
8.4 A didactic example
8.4.1 The expected value approach
8.4.2 Some comments on the previous approach
8.4.3 The expected utility approach
8.4.4 Some comments on the expected utility approach
8.4.5 The approach applied in this case: first step
8.4.6 Comment on the first step
8.4.7 The approach applied in this case: second step

9.1 Preliminaries
9.2 The Decision Process
9.3 Decision Support
9.3.1 Problem Formulation
9.3.2 The Evaluation Model
9.3.3 The final recommendation
9.4 Conclusions
Appendix A
Appendix B

10 Conclusion
10.1 Formal methods are all around us
10.2 What have we learned?
10.3 What can be expected?

Bibliography

Index

1

INTRODUCTION

1.1 Motivations

Deciding is a very complex and difficult task. Some people even argue that our ability to make decisions in complex situations is the main feature that distinguishes us from animals (it is also common to say that laughing is the main difference). Nevertheless, when the task is too complex or the interests at stake are too important, it quite often happens that we do not know or we are not sure what to decide and, in many instances, we resort to a decision support technique: an informal one (we toss a coin, we ask an oracle, we visit an astrologer, we consult an expert, we think) or a formal one. Although informal decision support techniques can be of interest, in this book, we will focus on formal ones. Among the latter, we find some well-known decision support techniques: cost-benefit analysis, multiple criteria decision analysis, decision trees, . . . But there are many others, sometimes not presented as decision support techniques, that help in making decisions. Let us cite but a few examples.

- When the director of a school must decide whether a given student will pass or fail, he usually asks each teacher to assess the merits of the student by means of a grade. The director then sums the grades and compares the result to a threshold.

- When a bank must decide whether a given client will obtain a credit or not, a technique, called credit scoring, is often used.

- When the mayor of a city decides to temporarily forbid car traffic in a city because of air pollution, he probably takes the value of some indicators, e.g. the air quality index, into account.

- Groups or committees must also make decisions. In order to do so, they often use voting procedures.

All these formal techniques are what we call (formal) decision and evaluation models, i.e. a set of explicit and well-defined rules to collect, assess and process information in order to be able to make recommendations in decision and/or evaluation processes. They are so widespread that almost no one can pretend he is

because of their formal character, inspire respect and trust: they look scientific. But are they really well founded ? Do they perform as well as we want ? Can we safely rely on them when we have to make important decisions ?

That is why we try to look at formal decision and evaluation models with a

critical eye in this book. You guessed it: this book is more than 200 pages long.

So, there is probably a lot of criticism. You are right.

None of the evaluation and decision models that we examined are perfect or

the best. They all suffer limitations. For each one, we can find situations in which

it will perform very poorly. This is not really new: most decision models have

had contenders for a long time. Do we want to contend all models at the same

time ? Definitely not ! Our conviction is that there cannot be a best decision or evaluation model (this has been proved in some contexts, e.g. in voting, and seems empirically correct in other contexts), but we are convinced as well that formal evaluation and decision models are useful in many circumstances and here is why:

tations of a given problem; they offer a common language for communicating

about the problem. They are therefore particularly well suited for facilitating

communication among the actors of a decision or evaluation process.

Formal models require that the decision maker makes a substantial effort to

structure his perception or representation of the problem. This effort can

only be beneficial as it forces the decision maker to think harder and deeper

about his problem.

(often implemented on a computer) become available for drawing any kind

of conclusion that can be drawn from the model. For example, hundreds of

what-if questions can be answered in a flash. This can be of great help if we

want to devise robust recommendations.

stake, popularity) plus the fact that formal models lend themselves easily to criticism, we think that it is important to deepen our understanding of evaluation and decision models and encourage their users to think more thoroughly about them.

Our aim with this book is to foster reflection and critical thinking among all

individuals utilising decision and evaluation models, whether it be for research or

applications.

1.2 Audience

Most of us are confronted with formal evaluation and decision models. Very often,

we use them without even thinking about it. This book is intended for the aware or enlightened practitioner, for anyone who uses decision or evaluation models (for research or for applications) and is willing to question his practice, to have a deeper understanding of what he does. We have tried to keep mathematics and formalism at a very low level so that, hopefully, most of the material will be accessible to readers who are not mathematically inclined. A rich bibliography will allow the interested reader to locate the more technical literature easily.

1.3 Structure

There are so many decision and evaluation models that it would be impossible to

deal with all of them within a single book. As will become apparent later, most of

them rely on similar kinds of principles. We decided to present seven examples of

such models. These examples, chosen in a wide variety of domains, will hopefully

allow the reader to grasp these principles. Each example is presented in a chapter

(Chapters 2 to 8), almost independent of the other chapters. Each of these seven

chapters ends with a conclusion, placing what has been discussed in a broader

context and indicating links with other chapters. Chapter 9 is somewhat different

from the seven previous ones: it does not focus on a decision model but presents a

real world application. The aim of this chapter is to emphasise the importance of

the decision aiding process (the context of the problem, the position of the actors

and their interactions, the role of the analyst, . . . ), to show that many difficulties

arise there as well and that a coherence between the decision aiding process and

the formal model is necessary.

Some examples have been chosen because they correspond to decision models

that everyone has experienced and can understand easily (student grades and

voting). We chose some models because they are not often perceived as decision

or evaluation models (student grades, indicators and rule based control). The other

examples (cost-benefit analysis, multiple criteria decision support and choice under

uncertainty) correspond to well identified and popular evaluation and decision

models.

1.4 Outline

Chapter 2 is devoted to the problem of voting. After showing the analogy between

voting and multiple criteria decision support, we present a sequence of twelve

short examples, each one illustrating a problem that arises with a particular voting

method. We begin with simple methods based on pairwise comparisons and we

end up with the Borda method. Although the goal of this book is not to overwhelm

the reader with theory, we informally present two theorems (Arrow and Gibbard-

Satterthwaite) that in one way or another explain why we encountered so many

difficulties in our twelve examples.

Then we turn to the way voters' preferences are modelled. We present many

different models, each one trying to outdo the previous one but suffering its own

weaknesses. Finally, we explore some issues that are often neglected: who is going

to vote? Who are the candidates? These questions are difficult and we show that

they are important. The construction of the set of voters and the set of candidates,

as well as the choice of a voting method must be considered as part of the voting

process.


After examining voting, we turn in Chapter 3 to another very familiar topic for

the reader: students' marks or grades. Marks are used for different purposes (e.g.

ranking the students, deciding whether a student is allowed to begin the next level

of study, deciding whether a student gets a degree, . . . ). Students are assessed in

a huge variety of ways in different countries and schools. This seems to indicate

that assessing students might not be trivial. We use this familiar topic to discuss

operations such as evaluating a performance and aggregating evaluations.

In Chapter 4, three particular indicators are considered: the Human Development

Index (used by the United Nations), the ATMO index (an air pollution

indicator used by the French government) and the decathlon score. We present

a few examples illustrating some problems occurring with indicators. We assert

that some difficulties are the consequences of the fact that the role of an indicator

is often manifold and not well defined. An indicator is a measure but, often, it is

also a tool for controlling or managing (in a broad sense).

Cost-benefit analysis (CBA), the topic of Chapter 5, is a decision aiding method that is extremely

popular among economists. Following the CBA approach, a project should only

be undertaken when its benefits outweigh its costs. First we present the principles

of CBA and its theoretical foundations. Then, using an example in transportation

studies, we illustrate some difficulties encountered with CBA. Finally, we clarify

some of the hypotheses at the heart of CBA and criticise the relevance of these

hypotheses in some decision aiding processes.

In Chapter 6, using a well documented example, we present some difficulties

that arise when one wants to choose from or rank a set of alternatives considered

from different viewpoints. We examine several aggregation methods that lead to

a value function on the set of alternatives, namely the weighted sum, the sum of

utilities (direct and indirect assessment) and AHP (the Analytic Hierarchy Process).

Then we turn to the so-called outranking methods. Some of these methods

can be used even when the data are not very rich or precise. The price we pay

for this is that results provided by these methods are not rich either, in the sense

that conclusions that can be drawn regarding a decision are not clear-cut.

Chapter 7 is dedicated to the study of automatic decision systems. These

systems concern the execution of repetitive decision tasks and the great majority of them are based on more or less explicit decision rules aimed at reflecting the usual decision policy of humans. The goal of this chapter is to show the usefulness of some formal tools (e.g. fuzzy sets) for modelling decision rules but also to clarify some problems arising when simulating the rules. Three examples are presented:

the first one concerns the control of an automatic watering system while the others

are about the control of a food process. The first two examples describe decision

systems based on explicit decision rules; the third one addresses the case of implicit

decision rules.

The goal of Chapter 8 is to raise some questions about the modelling of uncertainty. We present a real-life problem concerning the planning of electricity production. This problem is characterised by many different uncertainties: for example, the price of oil or the electricity demand in 20 years' time. This problem is classically described by using a decision tree and solved with an expected utility approach. After recalling some well known criticisms directed against this approach, we present the approach that has been used by the team that solved this problem. Some of the drawbacks of this approach are discussed as well. The relevance of probabilities is criticised and other modelling tools, such as belief functions, fuzzy set theory and possibility theory, are briefly mentioned.

Convinced that there is more to decision aiding than just number crunching, we devote Chapter 9 to the description of a real world decision aiding process that took place in a large Italian company a few years ago. It concerns the evaluation of offers following a call for tenders for a GIS (Geographical Information System) acquisition. Some important elements such as the participating actors, the problem formulation, the construction of the criteria, etc. deserve greater consideration. One should ideally never consider these elements separately from the aggregation process because they can impact the whole decision process and even the way the aggregation procedure behaves.

1.5 Who are the authors ?

The authors of this book are European academics working in six different universities, in France and in Belgium. They teach in engineering, business, mathematics, computer science and psychology schools. Their background is quite varied as well: mathematics, economics, engineering, law and geology but they are all active in decision support and more particularly in multiple criteria decision support. Among their special interests are preference modelling, fuzzy logic, aggregation techniques, social choice theory, artificial intelligence, problem structuring, measurement theory, operations research, . . . Besides their interest in multiple criteria decision support, they share a common view on this field. Five of the six authors of the present volume presented their thoughts on the past and the objectives of future research in multiple criteria decision support in the Manifesto of the new MCDA era (Bouyssou, Perny, Pirlot, Tsoukiàs and Vincke 1993).

The authors are very active in theoretical research on the foundations of decision aiding, mainly from an axiomatic point of view, but have also been involved in a variety of applications ranging from software evaluation to the location of a nuclear repository, through the rehabilitation of a sewer network or the location of high-voltage lines.

In spite of the large number of co-authors, this book is not a collection of

papers. It is a joint work.

1.6 Conventions

To refer to a decision maker, a voter or an individual whose sex is not determined, we decided not to use the politically correct "he/she" but just "he" in order to make the text easy to read. The fact that all of the authors are male has nothing to do with this choice. The same applies for "his/her".

None of the authors is a native English speaker. Therefore, even if we did our best to write in correct English, the reader should not be surprised to find some mistakes or inelegant expressions. We beg the reader's leniency for any incorrectness that might remain.

The adopted spelling is the British and not the American one.

1.7 Acknowledgements

We are ggreatly indebted to our collEague

///////// friend Philippe Fortemps \cite{Fortemps99}

.

Without him and his knowledge of Late-

x, this book would look like this paragraph.%\newline

The authors also wish to thank J.-L. Ottinger, who contributed to Chapter 8, H. Mélot, who laid out the complex diagrams of that chapter, and Stefano Abruzzini, who gave us a number of references concerning indicators. Chapter 6 is based on a report by Sébastien Clément written to fulfil the requirements of a course on multiple criteria decision support. A large part of Chapter 9 uses material already published in (Paschetta and Tsoukiàs 1999).

Special thanks go to Marjorie and Diane Gassner who had the patience to read and correct our continental approximation of the English language and to François Glineur who helped in solving a great number of LaTeX problems.

We thank Gary Folven from Kluwer Academic Publishers for his constant support during the preparation of this manuscript.

2

CHOOSING ON THE BASIS OF

SEVERAL OPINIONS: THE

EXAMPLE OF VOTING

elections, for the senate, . . . Is there much to say about voting ? Well, just think

about the way heads of state or members of parliament are elected in Australia,

France, the UK, . . .

vided into about 650 constituencies. One representative is elected in each

constituency. Each voter chooses one of the candidates in his constituency.

The winner is the candidate that is chosen by more voters than any other

one. Note that the winner does not have to win an overall majority of votes.

into single-seat constituencies. In a constituency, each voter chooses one of

the candidates. If one candidate receives more than 50 % of the votes, he

is elected. Otherwise a second stage is organised. During the second stage,

all candidates that were chosen by more than 12.5 % of the registered voters

may compete. Once more, each voter chooses one of the candidates. The

winner is the candidate that received the most votes.

France's president: Each voter chooses one of the candidates. If one candidate

has been chosen by more than 50 % of the voters, he is elected. Otherwise

a second stage is organised. During the second stage, only two candidates

remain: those with the highest scores. Once again, each voter chooses one of

the candidates. The winner is the candidate that has been chosen by more

voters than the other one.

constituencies called divisions. In a division, each voter is asked to rank all

candidates: he puts a 1 next to his preferred candidate, a 2 next to his second

preferred candidate, then a 3, and so on until his least preferred candidate.

Then the ballot papers are sorted according to the first preference votes. If a

candidate has more than 50 % of the ballot papers, he is elected. Otherwise,

the candidate that received fewer papers than any other is eliminated and the corresponding ballot papers are transferred to the candidates that got a 2 on these papers. Once again, if a candidate has more than 50 % of the ballot papers, he is elected. Otherwise, the candidate that received fewer papers than any other is eliminated and the corresponding ballot papers are transferred to the candidates that got a 3 on these papers, etc. In the worst

case, this process ends when all but two candidates are eliminated, because,

unless they are tied, one of the candidates necessarily has more than 50 %

of the papers. Note that, as far as we know, it seems that the case of a tie

is seldom considered in electoral laws.

Canada's members of parliament and prime minister: Every five years, the Canadian parliament is elected as follows. The territory is divided into about 270 constituencies called counties. In each county, each party can present one candidate. Each voter chooses one candidate. The winner in a county is the candidate that is chosen by more voters than any other one. He is thus the county's representative in the parliament. The leader of the party that has the most representatives becomes prime minister.
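The single-winner rules described above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not the legal procedure of any country: every ballot is assumed to be a complete ranking without ties, voters are assumed to vote according to that ranking at every stage, and ties are broken arbitrarily. The function names and the 11-voter profile are hypothetical, and the 12.5 % threshold of the two-stage parliamentary system is not modelled.

```python
from collections import Counter


def first_choice_tally(ballots, remaining):
    """For each ballot, count the best-ranked candidate still in the race."""
    return Counter(next(c for c in ballot if c in remaining) for ballot in ballots)


def plurality(ballots):
    """One stage: the candidate chosen by more voters than any other wins."""
    tally = first_choice_tally(ballots, set(ballots[0]))
    return tally.most_common(1)[0][0]


def two_round(ballots):
    """Two stages: absolute majority in stage one, else a run-off between the top two."""
    tally = first_choice_tally(ballots, set(ballots[0]))
    leader, votes = tally.most_common(1)[0]
    if 2 * votes > len(ballots):
        return leader
    finalists = {c for c, _ in tally.most_common(2)}
    return first_choice_tally(ballots, finalists).most_common(1)[0][0]


def alternative_vote(ballots):
    """Repeatedly eliminate the candidate with the fewest papers and transfer them."""
    remaining = set(ballots[0])
    while True:
        tally = first_choice_tally(ballots, remaining)
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > len(ballots) or len(remaining) <= 2:
            return leader
        # A candidate with no papers is absent from the Counter and counts as 0.
        weakest = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(weakest)


# Hypothetical profile: 5 voters a > b > c, 4 voters b > c > a, 2 voters c > b > a.
ballots = [("a", "b", "c")] * 5 + [("b", "c", "a")] * 4 + [("c", "b", "a")] * 2

print(plurality(ballots), two_round(ballots), alternative_vote(ballots))
```

On this small profile the one-stage rule and the two multi-stage rules already disagree, which is the kind of discrepancy the examples of this chapter explore.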

Those interested in voting methods and the way they are applied in various

countries will find valuable information in Farrell (1997) and Nurmi (1987). The

diversity of the methods applied in practice probably reflects some underlying

complexity and, in fact, if you take a closer look at voting, you will be amazed

by the incredible complexity of the subject. In spite of its apparent simplicity,

thousands of papers have been devoted to the problem of voting (Kelly 1991) and

our guess is that many more are to come.

Our aim in this chapter is, on the one hand, to show that many difficult and

interesting problems arise in voting and, on the other hand, to convince the reader

that a formal study of voting might be enlightening. This chapter is organised

as follows. In Section 1, we make the following basic assumption: each voter's

preferences can accurately be represented by a ranking of all candidates from best

to worst, without ties. Then we show some problems occurring when aggregating

the rankings, using classical voting systems such as those applied in France or the

United Kingdom. We do this through the use of small and classical examples. In

Section 2, we consider other preference models than the linear ranking of Section

1. Some models are poorer in information but more realistic. Some are richer and

less realistic. In most cases, the aggregation remains a difficult task. In Section

3, we change the focus and try to examine voting in a much broader context.

Voting is not instantaneous. It is not just counting the votes and performing

some mathematical operation to find the winner. It is a process that begins when

somebody decides that a vote should occur (or even earlier) and ends when the

winner begins his mandate (or even later). In Section 4, we discuss the analogy

with multiple criteria decision support. The chapter ends with a conclusion.

2.1 Analysis of some voting systems

From now on, we will distinguish between the election (the process by which the voters express their preferences about a set of candidates) and the aggregation method (the process used to extract the best candidate or a ranking of the candidates from the result of the election). In many cases, the election is uninominal, i.e. each voter votes for one candidate only.

2.1.1 Uninominal election

Let us recall the assumption that we mentioned earlier and that will hold throughout Section 1. Each voter, consciously or not, ranks all candidates from best to worst, without ties and, when voting, each voter sincerely (or naively) reports his preferences. Thus, in a uninominal election, we shall assume that each voter votes for the candidate that he ranks in first position. For example, suppose that a voter prefers candidate a to b and b to c (in short a P b P c). He votes for a. We are now ready to present a first example that illustrates a difficulty in voting.

Let {a, b, c, . . . , y, z} be a set of 26 candidates for a 100-voter election. Suppose

that

and 49 voters have preferences zP bP cP . . . P yP a.

It is clear that 51 voters will vote for a while 49 vote for z. Thus a has an

absolute majority and, in all uninominal systems we are aware of, a wins. But

is a really a good candidate ? Almost half of the voters perceive a as the worst

one. And candidate b seems to be a good candidate for everyone. Candidate b

could be a good compromise. As shown by this example, a uninominal election

combined with the majority rule allows a dictatorship of the majority and does not

favour a compromise. A possible way to avoid this problem might be to ask the

voters to provide their whole ranking instead of their preferred candidate. This

will be discussed later. Let us continue with some strange problems arising when

using a uninominal election.

The voting system in the United Kingdom is plurality voting, i.e. the election is

uninominal and the aggregation method is simple majority. Let {a, b, c} be the

set of candidates for a 21-voter election. Suppose that

6 voters have preferences bP cP a

and 5 voters have preferences cP bP a.

Then a (resp. b and c) obtains 10 votes (resp. 6 and 5). Thus a is chosen.

Nevertheless, this might be different from what a majority of voters wanted. In-

deed, an absolute majority of voters prefers any other candidate to a (11 out of

21 voters prefer b and c to a).
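The tallies of this example can be checked with a short Python sketch. The representation of a profile as (count, ranking) pairs and the function names are illustrative choices, not notation from the text. The ranking a P b P c given to the ten voters who put a first is an assumption; only their first choice matters for the counts computed here.

```python
from collections import Counter

# (number of voters, ranking from best to worst).
# The ranking of the first group is assumed; the text only implies that
# these 10 voters rank a first.
profile = [(10, "abc"), (6, "bca"), (5, "cba")]

def plurality_tally(profile):
    """Count first-place votes for each candidate."""
    tally = Counter()
    for count, ranking in profile:
        tally[ranking[0]] += count
    return tally

def prefer_count(profile, x, y):
    """Number of voters who rank x before y."""
    return sum(count for count, ranking in profile
               if ranking.index(x) < ranking.index(y))

tally = plurality_tally(profile)
winner = max(tally, key=tally.get)
print(winner, dict(tally))                 # a {'a': 10, 'b': 6, 'c': 5}
print(prefer_count(profile, "b", "a"))     # 11 voters out of 21 prefer b to a
print(prefer_count(profile, "c", "a"))     # 11 voters out of 21 prefer c to a
```

The plurality winner a is thus ranked below both b and c by 11 of the 21 voters.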

10 CHAPTER 2. CHOOSING ON THE BASIS OF SEVERAL OPINIONS

Let us see, using the same example, if such a problem would be avoided by the

two-stage French system. After the first stage, as no candidate has an absolute

majority, a second stage is run between candidates a and b. We suppose that the

voters keep the same preferences on {a, b, c}. Thus a obtains 10 votes and b, 11

votes so that candidate b is elected. This time, none of the beaten candidates (a

and c) are preferred to b by a majority of voters. Nonetheless we cannot conclude

that the two-stage French system is superior to the British system from this point

of view, as shown by the following example.

Let {a, b, c, d} be the set of candidates for a 21-voter election. Suppose that

6 voters have preferences cP aP dP b

and 5 voters have preferences aP dP bP c.

After the first stage, as no candidate has an absolute majority, a second stage is

run between candidates b and c. Candidate b easily wins with 15 out of 21 votes

though an absolute majority (11/21) of voters prefer a and d to b. Because it

is not necessary to be a mathematician to figure out such problems, some voters

might be tempted not to sincerely report their preferences as shown in the next

example.

Let us continue with the example used above. Suppose that the six voters having

preferences cP aP dP b decide not to be sincere and vote for a instead of c. Then

candidate a wins after the first stage because there is an absolute majority for

him (11/21). If they had been sincere (as in the previous example), b would have

been elected. Thus, casting a non sincere vote is useful for those 6 voters as they

prefer a to b. Such a system, that may encourage voters to falsely report their

preferences, is called manipulable. This is not the only weakness of the French

system as attested by the three following examples.
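The sincere and the manipulated outcome of the two-stage system can be reproduced with the following sketch. It is a minimal model under stated assumptions: ballots are (count, ranking) pairs, a voter's ballot counts for his highest-ranked candidate still in the race, ties are not handled, and the ten voters whose ranking is not shown above are assumed to hold b P a P c P d (the text only implies that they rank b first). The function name `two_stage_winner` is hypothetical.

```python
def two_stage_winner(ballots):
    """Two-stage majority runoff on (count, ranking) ballots.

    Returns (winner, stage). Ties are not handled; none occur below.
    """
    total = sum(n for n, _ in ballots)

    def tally(allowed):
        votes = {c: 0 for c in sorted(allowed)}
        for n, ranking in ballots:
            # Each ballot counts for its highest-ranked candidate still in the race.
            votes[next(c for c in ranking if c in allowed)] += n
        return sorted(votes.items(), key=lambda item: -item[1])

    first = tally(set(ballots[0][1]))
    if 2 * first[0][1] > total:           # absolute majority in the first stage
        return first[0][0], 1
    finalists = {first[0][0], first[1][0]}
    return tally(finalists)[0][0], 2

# Sincere ballots; the ranking of the 10 voters who put b first is assumed.
sincere = [(10, "bacd"), (6, "cadb"), (5, "adbc")]
# The six c P a P d P b voters report a in first position instead of c.
strategic = [(10, "bacd"), (6, "acdb"), (5, "adbc")]

print(two_stage_winner(sincere))     # ('b', 2)
print(two_stage_winner(strategic))   # ('a', 1)
```

With sincere ballots b wins the second stage; when the six voters misreport, a obtains 11 of 21 votes in the first stage, an outcome those six voters prefer.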

Let {a, b, c} be the set of candidates for a 17-voter election. A few days before

the election, the results of a survey are as follows:

5 voters have preferences cP aP b,

4 voters have preferences bP cP a

and 2 voters have preferences bP aP c.

With the French system, a second stage would be run, between a and b and

a would be chosen obtaining 11 out of 17 votes. Suppose that candidate a, in

order to increase his lead over b and to lessen the likelihood of a defeat, decides to

strengthen his electoral campaign against b. Suppose that the survey did exactly

reveal the preferences of the voters and that the campaign has the right effect on

the last two voters. Hence we observe the following preferences.

5 voters have preferences cP aP b

and 4 voters have preferences bP cP a.

After the first stage, b is eliminated, due to the campaign of a. The second

stage opposes a to c and c wins, obtaining 9 votes. Candidate a thought that

his campaign would be beneficial. He was wrong. Such a method is called

non-monotonic because an improvement of a candidate's position in some of the voters'

preferences can lead to a deterioration of his position after the aggregation. It is

clear with such a system that it is not always interesting or efficient to sincerely

report one's preferences. You will note in the next example that some manipulations

can be very simple.
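The failure of monotonicity in the survey example can be verified numerically. The sketch below assumes that the six voters whose ranking is not shown hold a P b P c (any ranking with a first gives the same outcomes), models the campaign as moving the two b P a P c voters to a P b P c, and uses a hypothetical helper `runoff` that does not handle ties.

```python
def runoff(profile):
    """Winner of a two-stage majority runoff; profile is a list of (count, ranking)."""
    total = sum(n for n, _ in profile)

    def ranked_tally(allowed):
        votes = {c: 0 for c in sorted(allowed)}
        for n, ranking in profile:
            votes[next(c for c in ranking if c in allowed)] += n
        return sorted(votes.items(), key=lambda item: -item[1])

    first = ranked_tally(set(profile[0][1]))
    if 2 * first[0][1] > total:
        return first[0][0]
    # No absolute majority: the two leading candidates meet in a second stage.
    return ranked_tally({first[0][0], first[1][0]})[0][0]

# Survey; the first group (6 voters ranking a first) is an assumption.
survey = [(6, "abc"), (5, "cab"), (4, "bca"), (2, "bac")]
# After the campaign, the two b P a P c voters rank a first.
after_campaign = [(8, "abc"), (5, "cab"), (4, "bca")]

print(runoff(survey))          # a
print(runoff(after_campaign))  # c
```

Candidate a gains two first places and, as a result, loses the election.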

Let {a, b, c} be the set of candidates for an 11-voter election. Suppose that

4 voters have preferences cP bP a

and 3 voters have preferences bP cP a.

Using the French system, a second stage should oppose a to c and c should win

the election obtaining 7 out of 11 votes. Suppose that 2 of the 4 first voters (with

preferences aP bP c) decide not to vote because c, the worst candidate according

to them, is going to win anyway. What will happen ? There will be only 9 voters.

4 voters have preferences cP bP a

and 3 voters have preferences bP cP a.

Contrary to all expectations, candidate c will lose while b will win, obtaining

5 out of 9 votes. Our two lazy voters can be proud of their abstention since they

prefer b to c. Clearly such a method does not encourage participation.
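The effect of the two abstentions can be checked in the same way. In this sketch the first group of four voters holds a P b P c, as stated in the discussion of the example; the helper `runoff` is a hypothetical two-stage implementation that ignores ties.

```python
def runoff(profile):
    """Winner of a two-stage majority runoff; profile is a list of (count, ranking)."""
    total = sum(n for n, _ in profile)

    def ranked_tally(allowed):
        votes = {c: 0 for c in sorted(allowed)}
        for n, ranking in profile:
            votes[next(c for c in ranking if c in allowed)] += n
        return sorted(votes.items(), key=lambda item: -item[1])

    first = ranked_tally(set(profile[0][1]))
    if 2 * first[0][1] > total:
        return first[0][0]
    return ranked_tally({first[0][0], first[1][0]})[0][0]

everyone_votes = [(4, "abc"), (4, "cba"), (3, "bca")]
two_abstain = [(2, "abc"), (4, "cba"), (3, "bca")]   # two a P b P c voters stay home

print(runoff(everyone_votes))  # c
print(runoff(two_abstain))     # b
```

By abstaining, the two voters replace their worst candidate c by b, which they prefer.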

Let {a, b, c} be the set of candidates for a 26-voter election. The voters are located

in two different areas: countryside and town. Suppose that the 13 voters located

in the town have the following preferences.

3 voters have preferences bP aP c,

3 voters have preferences cP aP b

and 3 voters have preferences cP bP a.

Suppose that the 13 voters located in the countryside have the following pref-

erences.

3 voters have preferences cP aP b,

3 voters have preferences bP cP a

and 3 voters have preferences bP aP c.

Suppose now that an election is organised in the town, with 13 voters. Candi-

dates a and c will go to the second stage and a will be chosen, obtaining 7 votes.

If an election is organised in the countryside, a will defeat b in the second stage,

obtaining 7 votes. Thus a is the winner in both areas. Naturally we expect a to

be the winner in a global election. But it is easy to observe that in the global

election (26 voters) a is defeated during the first stage. Such a method is called

non separable.
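The town and countryside results can be recomputed and compared with the global election. In this sketch the four voters missing from each area are assumed to hold a P b P c; the outcomes stated in the text only require that they rank a first. The helper `runoff` is hypothetical and does not handle ties.

```python
def runoff(profile):
    """Winner of a two-stage majority runoff; profile is a list of (count, ranking)."""
    total = sum(n for n, _ in profile)

    def ranked_tally(allowed):
        votes = {c: 0 for c in sorted(allowed)}
        for n, ranking in profile:
            votes[next(c for c in ranking if c in allowed)] += n
        return sorted(votes.items(), key=lambda item: -item[1])

    first = ranked_tally(set(profile[0][1]))
    if 2 * first[0][1] > total:
        return first[0][0]
    return ranked_tally({first[0][0], first[1][0]})[0][0]

# The first group of each area (4 voters ranking a first) is an assumption.
town = [(4, "abc"), (3, "bac"), (3, "cab"), (3, "cba")]
countryside = [(4, "abc"), (3, "cab"), (3, "bca"), (3, "bac")]
whole = town + countryside   # the global election with 26 voters

print(runoff(town), runoff(countryside), runoff(whole))  # a a b
```

In the global election a collects 8 first places against 9 for each of b and c, so a does not even reach the second stage.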

The previous examples showed that, when there are more than 2 candidates, it

is not an easy task to imagine a system that would behave as expected. Note that,

in the presence of 2 candidates, the British system (uninominal and one-stage) is

equivalent to all other systems and it suffers none of the above mentioned problems

(May 1952). Thus we might be tempted by a generalisation of the British system

(restricted to 2 candidates). If there are two candidates, we use the British system;

if there are more than two candidates, we arbitrarily choose two of them and we use

the British system to select one. The winner is opposed (using the British system)

to a new arbitrarily chosen candidate. And so on until no more candidates remain.

This would require n − 1 votes between 2 candidates. Unfortunately, this method

suffers severe drawbacks.

Let {a, b, c} be the set of candidates for a 3-voter election. Suppose that

1 voter has preferences bP cP a

and 1 voter has preferences cP aP b.

The 3 candidates will be considered two by two in the following order or agenda:

a and b first, then c. During the first vote, a is opposed to b and a wins with

absolute majority (2 votes against 1). Then a is opposed to c and c defeats a with

absolute majority. Thus c is elected.

If the agenda is a and c first, it is easy to see that c defeats a and is then

opposed to b. Hence, b wins against c and is elected.

If the agenda is b and c first, it is easy to see that, finally, a is elected. Conse-

quently, in this example, any candidate can be elected and the outcome depends

completely on the agenda, i.e. on an arbitrary decision. Let us note that sequential

voting is very common in different parliaments. The different amendments to a

bill are considered one by one in a predefined sequence. The first one is opposed to

the status quo, using the British system; the second one is opposed to the winner,

and so on. Clearly, such a method lacks neutrality. It does not treat all candidates

in a symmetric way. Candidates (or amendments) appearing at the end of the

agenda are more likely to be elected than those at the beginning.

Let {a, b, c, d} be the set of candidates for a 3-voter election. Suppose that

1 voter has preferences cP bP aP d

and 1 voter has preferences aP dP cP b.

Consider the following agenda: a and b first, then c and finally d. Candidate a

is defeated by b during the first vote. Candidate c wins the second vote and d is

finally elected though all voters unanimously prefer a to d. Let us remark that

this cannot happen with the French and British systems.
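Both sequential-voting examples can be replayed with a few lines of Python. This is a sketch under assumptions: the voter whose ranking is not shown is taken to hold a P b P c in the three-candidate example and b P a P d P c in the four-candidate example, because these rankings reproduce every pairwise result stated in the text. The function names are hypothetical.

```python
def majority_winner(profile, x, y):
    """Winner of a simple-majority vote between x and y (no ties occur below)."""
    x_votes = sum(n for n, r in profile if r.index(x) < r.index(y))
    y_votes = sum(n for n, _ in profile) - x_votes
    return x if x_votes > y_votes else y

def sequential_winner(profile, agenda):
    """Oppose the current winner to the next candidate of the agenda, in turn."""
    champion = agenda[0]
    for challenger in agenda[1:]:
        champion = majority_winner(profile, champion, challenger)
    return champion

# Three candidates: the outcome depends entirely on the agenda.
cycle = [(1, "abc"), (1, "bca"), (1, "cab")]        # first ranking assumed
for agenda in ("abc", "acb", "bca"):
    print(agenda, "->", sequential_winner(cycle, agenda))
# abc -> c, acb -> b, bca -> a

# Four candidates: d is elected although every voter prefers a to d.
profile = [(1, "badc"), (1, "cbad"), (1, "adcb")]   # first ranking assumed
print(sequential_winner(profile, "abcd"))           # d
print(all(r.index("a") < r.index("d") for _, r in profile))  # True
```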

Up to now, we have assumed that the voters are able to rank all candidates

from best to worst without ties but the only information that we collected was the

best candidate. Why not try to palliate the many encountered problems by asking

voters to explicitly rank the candidates ? This idea, though interesting, will lead

us to many other pitfalls that we discuss just below.

In this kind of election, each voter provides a ranking without ties of the candidates.

Hence the task of the aggregation method is to extract from all these rankings the

best candidate or a ranking of the candidates reflecting the preferences of the

voters as much as possible.

At the end of the 18th century, two aggregation methods for election by rank-

ings appeared in France. One was proposed by Borda, the other by Condorcet.

Although other methods have been proposed, their methods are still at the heart

of many scientists' concerns. In fact, many methods are variants of the Borda and

Condorcet methods.

Condorcet (1785) suggested comparing all candidates pairwise in the following way.

A candidate a is preferred to b if and only if the number of voters ranking a before

b is larger than the number of voters ranking b before a. In case of tie, candidates

a and b are indifferent. A candidate that is preferred to all other candidates is

called a (Condorcet) winner. In other words, a winner is a candidate that, opposed

to each of the n − 1 other candidates, wins by a majority. It can be shown that

there is never more than one Condorcet winner.

Note that both the British as well as the two-stage French methods are different

from the Condorcet method. In example 2, candidate a is elected by the British

method but b is the Condorcet winner. In example 3, a is the Condorcet winner

although b is chosen by the French method.
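The definition of a Condorcet winner translates directly into code. The sketch below is illustrative only: the function name is hypothetical, the profile of example 2 uses the assumed ranking a P b P c for the ten voters who put a first (the ranking under which b is indeed the Condorcet winner, as stated above), and the second profile is a three-voter cycle in which no candidate beats all the others.

```python
def condorcet_winner(profile):
    """Return the candidate preferred to every other one by a majority, or None."""
    candidates = profile[0][1]

    def beats(x, y):
        x_votes = sum(n for n, r in profile if r.index(x) < r.index(y))
        y_votes = sum(n for n, r in profile if r.index(y) < r.index(x))
        return x_votes > y_votes

    for x in candidates:
        if all(beats(x, y) for y in candidates if y != x):
            return x
    return None

example_2 = [(10, "abc"), (6, "bca"), (5, "cba")]   # first ranking assumed
cycle = [(1, "abc"), (1, "bca"), (1, "cab")]

print(condorcet_winner(example_2))  # b
print(condorcet_winner(cycle))      # None
```

The function returns at most one candidate: two distinct candidates cannot each beat the other.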

Although the principle underlying the Condorcet method (the candidate that

beats all other candidates in a pairwise contest is the winner) seems very natural,

close to the concept of democracy and hence very appealing, it is worth noting

that, in some instances, this principle might be questioned: in example 1, a is the

Condorcet winner, although almost half of the voters consider him to be the worst

candidate. Consider also example 10 taken from Fishburn (1977).

Let {a, b, c, d, e, f, g, x, y} be a set of 9 candidates for a 101-voter election. Suppose

that

21 voters have preferences eP f P gP xP yP aP bP cP d,

10 voters have preferences eP xP yP aP bP cP dP f P g,

10 voters have preferences f P xP yP aP bP cP dP eP g,

10 voters have preferences gP xP yP aP bP cP dP eP f

and 31 voters have preferences yP aP bP cP dP xP eP f P g.

Candidate x wins against every other candidate with a majority of 51 votes.

Thus x is the Condorcet winner. But let us focus on the candidates x and y.

Let us summarise their results in Table 2.1. In view of Table 2.1, it seems that y

should be elected.

k    1   2   3   4   5   6   7   8   9

x    0  30   0  21   0  31   0   0  19

y   50   0  30   0  21   0   0   0   0

Table 2.1: Number of voters who rank the candidate in k-th place in their preferences
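The rows of Table 2.1 and the 51-vote majorities of x can be recomputed from the profile. Only 82 of the 101 voters appear in the list above; the sketch assumes that the remaining 19 voters rank y first and x last, which is what the table implies, and fills the positions in between with an arbitrary order (a hypothetical choice that affects neither the table nor the majorities of x).

```python
profile = [
    (19, "yabcdefgx"),   # assumed group: y first, x last, middle order arbitrary
    (21, "efgxyabcd"),
    (10, "exyabcdfg"),
    (10, "fxyabcdeg"),
    (10, "gxyabcdef"),
    (31, "yabcdxefg"),
]

def rank_counts(profile, candidate):
    """Entry k-1 is the number of voters ranking the candidate in k-th place."""
    counts = [0] * len(profile[0][1])
    for n, ranking in profile:
        counts[ranking.index(candidate)] += n
    return counts

def votes_for(profile, x, y):
    """Number of voters ranking x before y."""
    return sum(n for n, r in profile if r.index(x) < r.index(y))

print(rank_counts(profile, "x"))  # [0, 30, 0, 21, 0, 31, 0, 0, 19]
print(rank_counts(profile, "y"))  # [50, 0, 30, 0, 21, 0, 0, 0, 0]
print({c: votes_for(profile, "x", c) for c in "abcdefgy"})  # 51 against each
```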

Condorcet winner. Consider example 8: a is preferred to b, b is preferred to c

and c is preferred to a. No candidate is preferred to all others. In such a case,

the Condorcet method fails to elect a candidate. One might think that example

8 is very bizarre and very unlikely to happen. Unfortunately it is not. If you

consider an election with 25 voters and 11 candidates, the probability of such a

paradox is significantly high as it is approximately 1/2 (Gehrlein 1983) and the

more candidates or voters, the higher the probability of such a paradox. Note

that, in order to obtain this result, all rankings are supposed to have the same

probability. Such a hypothesis is clearly questionable (Gehrlein 1983).

Many methods have been designed that elect the Condorcet winner, if he exists,

and choose a candidate in any case (Fishburn 1977, Nurmi 1987).

Borda (1781) proposed to use the following aggregation method. In each voter's

preference, each candidate has a rank: 1 for the first candidate in the ranking, 2

for the second, . . . and n for the last. Compute the Borda score of each candidate,

i.e. the sum for all voters of that candidate's rank. Then choose the candidate

with lowest Borda score.

Note that there can be several such candidates. In these cases, the Borda

method does not tell us which one to choose. They are considered as equivalent.

But the likelihood of indifference is rather small and decreases as the number of

candidates or voters increases. For example, for 3 candidates and 2 voters, the

probability of all candidates being tied is 1/3; for 3 candidates and 50 voters, it is

less than 1 %. Note that once again, we supposed that all rankings have the same

probability.

Note that the Borda method allows us not only to choose one candidate but also to

rank them (by increasing Borda scores). If two candidates have the same Borda

score, then they are indifferent.

Let {a, b, c, d} be the set of candidates for a 3-voter election. Suppose that

and 1 voter has preferences aP cP dP b.

The Borda score of a is 5 = 2 × 2 + 1 × 1. For b, it is 6 = 2 × 1 + 1 × 4. Candidates

c and d receive 8 and 11. Thus a is the winner. Using the Condorcet method, the

conclusion is different: b is the Condorcet winner. Thus, when a Condorcet winner

exists, it is not always chosen by the Borda method. Nevertheless, it can be shown

that the Borda method never chooses a Condorcet loser, i.e. a candidate that is

beaten by all other candidates by an absolute majority (contrary to the British

system, see Example 2).

Suppose now that candidates c and d decide not to compete because they

are almost sure to lose. With the Borda method, the new winner is b. Thus b

now defeats a just because c and d dropped out. Thus the fact that a defeats

or is defeated by b depends upon the presence of other candidates. This can be

a problem as the set of the candidates is not always fixed. It can vary because

candidates withdraw, because feasible solutions become infeasible or the converse,

because new solutions emerge during discussions, . . .

With the Condorcet method, b remains the winner and it can be shown that

this is always the case: if a candidate is a Condorcet winner, then he is still a

Condorcet winner after the elimination of some candidates.
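The Borda scores of this example, and the effect of the withdrawal of c and d, can be recomputed as follows. The sketch assumes that the two voters whose ranking is not shown hold b P a P c P d, the only ranking consistent with the scores 5, 6, 8 and 11 given above. The function name and its `candidates` parameter are hypothetical.

```python
def borda_scores(profile, candidates=None):
    """Sum of ranks (1 = best) over all voters, restricted to `candidates`.

    The candidate with the lowest score is the Borda winner.
    """
    candidates = candidates or profile[0][1]
    scores = {c: 0 for c in candidates}
    for n, ranking in profile:
        remaining = [c for c in ranking if c in candidates]
        for rank, c in enumerate(remaining, start=1):
            scores[c] += n * rank
    return scores

profile = [(2, "bacd"), (1, "acdb")]   # first ranking inferred from the scores

print(borda_scores(profile))        # {'b': 6, 'a': 5, 'c': 8, 'd': 11}
print(borda_scores(profile, "ab"))  # {'a': 5, 'b': 4}
```

With all four candidates a has the lowest score; once c and d withdraw, b does, although no voter changed his mind about a and b.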

Let {a, b, c} be the set of candidates for a 2-voter election. Suppose that

and 1 voter has preferences bP aP c.

The alternative with the lowest Borda score is a. Now consider a new election

where the alternatives and voters are identical but they changed their preferences

about c. Suppose that

and 1 voter has preferences bP cP a.

It turns out that b has the lowest Borda score. However, neither of the two

voters changed their opinion about the pair {a, b}. The first (resp. second) voter

prefers a (resp. b) in both cases. Only the relative position of c changed and

this was enough to turn b into a winner and a into a loser. This can be seen

as a shortcoming of the Borda method. One says that the Borda method does

not satisfy the independence of irrelevant alternatives. It can be shown that the

Condorcet method satisfies this property.
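The violation of independence can be checked on the two-voter example. In this sketch the first voter is assumed to hold a P c P b in the first election and a P b P c in the second; these are the only rankings with a before b that give a, and then b, the strictly lowest Borda score. The helper names are hypothetical.

```python
def borda_winner(profile):
    """Candidate with the lowest sum of ranks (1 = best); ties are not handled."""
    scores = {c: 0 for c in profile[0]}
    for ranking in profile:
        for rank, c in enumerate(ranking, start=1):
            scores[c] += rank
    return min(scores, key=scores.get)

def pair_order(profile, x, y):
    """For each voter, which of x and y is ranked first."""
    return [x if r.index(x) < r.index(y) else y for r in profile]

before = ["acb", "bac"]   # first ranking assumed
after = ["abc", "bca"]    # first ranking assumed

print(borda_winner(before), borda_winner(after))                  # a b
print(pair_order(before, "a", "b"), pair_order(after, "a", "b"))  # ['a', 'b'] ['a', 'b']
```

The individual preferences on the pair {a, b} are identical in both elections, yet the Borda winner changes.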

We could go on and on with examples showing that any method you can think of

suffers severe problems. But we think it is time to stop for at least two reasons.

First, it is not very constructive and, second, each example is related to a particular

method; hence this approach lacks generality. A more general (and thus theoretical)

approach is needed. We should find a way to answer questions like

...

the present volume, we try to present various problems arising in evaluation and

decision models in an informal way and to show the need for formal methods.

Nevertheless, we cannot resist the desire to present now, in an informal way,

some of the most famous results of social choice theory.

Arrow's theorem

Arrow (1963) was interested in the aggregation of rankings with ties into a ranking,

possibly with ties. We will call this ranking the overall ranking. He examined the

methods verifying the following properties.

Universal domain. This property implies that the aggregation method must be

applicable to all cases. Whatever the rankings provided by the voters, the

method must yield an overall ranking of the candidates. This property rules

out methods that would impose some restrictions on the preferences of the

voters.

with ties. This implies that, if aP b and bP c in the overall ranking, then

aP c in the overall ranking. Example 8 showed that the Condorcet method

does not verify transitivity: a is preferred to b, b is preferred to c and c is

preferred to a.

Unanimity. If all voters are unanimous about a pair of candidates, e.g. if all voters

rank a before b, then a must be ranked before b in the overall preference.

This seems quite reasonable but example 9 showed that some commonly used

Pareto condition.

Independence. The relative position of two candidates in the overall ranking

depends only on their relative positions in the individuals' preferences.

Therefore other alternatives are considered as irrelevant with respect to that pair.

Note that we observed in example 12 that the Borda method violates the

independence property. This property is often called Independence of irrel-

evant alternatives.

on the other ones. This rules out aggregation methods such that the overall

ranking is always identical to the preference ranking of a given voter. This

may be seen as a minimal requirement for a democratic method.

Theorem 2.1 (Arrow) When the number of candidates is at least 3, there ex-

ists no aggregation method satisfying simultaneously the properties of universal

domain, transitivity, unanimity, independence and non-dictatorship.

culties when trying to find a satisfying aggregation method. For example, let us

observe that the Borda method satisfies the universal domain, transitivity, una-

nimity and non-dictatorship properties. Therefore, as a consequence of theorem

2.1, we can deduce that it cannot satisfy the independence condition. What about

the Condorcet method ? It satisfies the universal domain, unanimity, independence

and non-dictatorship properties. Hence it cannot verify transitivity (see example

8). Note that Arrow's theorem uses only five conditions that, in addition, are

quite weak (at least at first glance). Yet, the result is powerful. If, in addition to

these five conditions, we wish to find a method satisfying neutrality, separability,

monotonicity, non-manipulability, . . . we face an even more puzzling problem.

Gibbard-Satterthwaite's theorem

Gibbard (1973) and Satterthwaite (1975) were very

interested in the (non-)manipulability of aggregation methods, especially those leading

to the election of a unique candidate. Informally, a method is non-manipulable if,

in no case, a voter can improve the result of the election by not reporting his true

preferences. They proved the following result.

than two, there exists no aggregation method satisfying simultaneously the proper-

ties of universal domain, non-manipulability and non-dictatorship.

in mind theorem 2.2. The French system satisfies universal domain and non-

dictatorship. Therefore, it is not surprising that it is manipulable.

Many other impossibility results can be found in the literature. But this is not

the place to review them. Besides impossibility results, many characterisations are

available. A characterisation of a given aggregation method is a set of properties

simultaneously satisfied by only that method. These results help to understand

the fundamental principles of a method and to compare different methods.

At the beginning of this chapter, we decided to focus on elections of a unique

candidate. Some voting systems lead to the election of several candidates and

aim towards achieving a kind of proportional representation. One might think

that those systems are the solution to our problems. In fact, they are not. Those

systems raise as many questions (perhaps more) as the ones we considered (Balinski

and Young 1982). Furthermore, suppose that a parliament has been elected, using

proportional representation. This parliament will have to vote on many different

issues and, very often, only one candidate or law or project will have to be chosen.

Let us consider the assumption that we made in Section 1: the preferences of each

voter can accurately be represented by a ranking of all candidates from best to

worst, without ties. We all know that this is not always realistic. For example, in

some instances, there are several candidates that a voter cannot rank, just because

he considers them as equivalent. Those candidates are tied. There are many other

reasons to question our assumption. In some cases, a voter is not able to rank

the candidates; in others, he is able to rank them but another kind of modeling of

his preferences would be more accurate. In this section, we list different cases in

which our initial assumption is not valid.

2.2.1 Rankings

To model the preferences of a voter, we can use a ranking without ties. This model

corresponds to the assumption of Section 1. This implies that when you present a

pair of candidates (a, b) to a voter, he is always able to tell if he prefers a to b or

the converse. Furthermore, if he prefers a to b and b to c, he necessarily prefers a

to c (transitivity of preference).

In some cases, a voter is unable to state if he prefers a to b or the converse because

he thinks that both candidates are of equal value. He is indifferent between a and

b. Thus, we need to model his preferences by a ranking with ties. For each pair

of candidates (a, b), we have "a is preferred to b", the converse or "a is indifferent

to b" (which is equivalent to "b is indifferent to a"). Preference is still transitive.

Suppose that a voter prefers a to b, c and d, he is indifferent between b and c and,

finally, he prefers a, b and c to d. We can model his preferences by a ranking with

ties. A graphic representation of this model is given in Fig. 2.1 where an arrow

between two candidates (e.g. a and b) means that a is preferred to b and a line

between them means that a is indifferent to b. Note that, in a ranking with ties,

2.2. MODELLING THE PREFERENCES OF A VOTER 19

[Figure 2.1: A complete pre-order (nodes a, b, c and d). Arrows implied by transitivity are not represented]

b and c, he is also indifferent between a and c.

It can also occur that a voter is unable to rank the candidates, not because he

thinks that some of them are equivalent but because he cannot compare some of

them. There can be several reasons for this.

Poor information Suppose that a voter must compare two candidates a and b

about which he knows almost nothing, except that their names are a and b

and that they are candidates. Such a voter cannot declare that he prefers a

to b nor the converse. If he is forced to express his preferences by means of a

ranking with ties, he will probably rank a and b tied rather than ranking one

above the other. But this would not really reflect his preferences because he

has no reasons to consider that they are equivalent. It is very likely that one

is better than the other but, as he does not know which one, he is better off

not stating any preferences about them.

Conflicting information Suppose that a voter has to compare two candidates a

and b about which he knows a lot. He might be embarrassed when asked to

tell which candidate he prefers because, in some respects, a is far better than

b but, in other respects, b is far better than a. And he does not know how

to balance the pros and cons or he does not want to do so for the moment.

Confidential information Suppose that your mother invited you and your wife

for dinner. At the end of the meal, your mother says "I have never eaten

such a good pie! Does NameOfYourWife prepare it as well as I do ?" No

matter what your preference is, you would probably be very embarrassed to

answer. And your answer is very likely to be "Well, it is difficult to say.

In fact they are different. I like both but I cannot compare them." Such

situations are very common in real life where people do not tell the truth,

all the truth and nothing but the truth about their preferences.

Of course, this list is not exhaustive. We therefore need to introduce a new model

in which voters are allowed to express incomparabilities. Hence, when comparing

two candidates a and b, four situations can arise:

1. a is preferred to b,

2. b is preferred to a,

3. a is indifferent to b or

obtain is called a partial ranking.

If we use a ranking with ties to model his preferences, he is necessarily indifferent

between a and c, because of the transitivity of indifference. Is this what we want ?

We are going to borrow a small example from Luce (1956) to show that transitivity

of indifference should be dropped, at least in some cases. Let us suppose that I

present two cups of coffee to a voter: one cup without sugar, the other one with

one grain of sugar. Let us also suppose that he likes his coffee with sugar. If I ask

him which cup he prefers, he will tell me that he is indifferent (because he is not

able to detect one grain of sugar). He equally dislikes both. I will then present him

a cup with one grain and another with two. He will still be indifferent. Next, two

grains and three grains, and so on until nine hundred ninety nine and one thousand

grains. The voter will always be indifferent between the two cups that I present

to him because they differ by just one grain of sugar. Because of the transitivity

of indifference, he must also be indifferent between a cup without sugar and a cup

with one thousand grains (2 full spoons). But of course, if I ask him which one

he prefers, he will choose the cup with one thousand grains. Thus transitivity of

indifference is violated. A possible objection to this is that the voter will be tired

before he reaches the cup with one thousand grains. Furthermore (and this is more

serious) the coffee will be cold and he hates that.

There is a structure that keeps transitivity of preference and drops it for in-

difference. Consequently, it can model the preferences of our coffee drinker. It is

called a semiorder. For details about semiorders, see Pirlot and Vincke (1997).
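One common way to obtain such a structure is a threshold model: two cups are indifferent when their sugar contents differ by no more than a fixed amount. The sketch below illustrates this idea with the coffee example; the threshold of 50 grains is a hypothetical value chosen for the illustration, and the function names are ours.

```python
THRESHOLD = 50  # hypothetical: smallest difference (in grains) the drinker can taste

def prefers(x, y):
    """The drinker likes sugar: x grains are preferred to y if clearly sweeter."""
    return x - y > THRESHOLD

def indifferent(x, y):
    """Differences up to the threshold go unnoticed."""
    return abs(x - y) <= THRESHOLD

# Every pair of successive cups is indifferent ...
print(all(indifferent(k, k + 1) for k in range(1000)))  # True
# ... but the two extreme cups are not: indifference is not transitive.
print(indifferent(0, 1000), prefers(1000, 0))           # False True

# Strict preference remains transitive (checked here on a grid of values).
grid = range(0, 301, 10)
print(all(prefers(x, z)
          for x in grid for y in grid for z in grid
          if prefers(x, y) and prefers(y, z)))           # True
```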

Do we need semiorders only when a voter cannot distinguish between two very

similar objects ? The following example, adapted from Armstrong (1939), will give

the answer. Suppose that you ask your child to choose between two presents for

his birthday: a pony and a blue bicycle. As he likes both of them equally, he will

say he is indifferent. Suppose now that you present him a third candidate: a red

bicycle with a small bell. He will probably tell you that he prefers the red one to

the blue one. "So, you prefer the red bicycle to the pony, is that right ?" you

would say if you consider a transitive indifference. However, it is obvious that the

child can still be indifferent between the pony and the red bicycle.

[Figure: pony, red bike, blue bike]

Rankings with or without ties, partial rankings and semiorders are all binary

relations. Many other families of binary relations have been considered in the

literature in order to formally model the preferences of individuals as faithfully as

possible (e.g. Roubens and Vincke 1985, Abbas, Pirlot and Vincke 1996). Note

that even the transitivity of strict preference can be questioned due to empirical

observations (e.g. Fishburn 1988, Fishburn 1991, Tversky 1969, Sen 1997). Let us

now focus on another kind of mathematical structure used to model the preferences

of a voter.

Fuzzy relations can be used to model preferences in at least two very different

situations.

When a voter is asked to express his preferences by means of a binary relation, he

has to examine each pair and choose "a is preferred to b", "b is preferred to a",

"a is indifferent to b" or "a and b are incomparable" (if indifference and

incomparability are allowed). In fact, reality is more subtle. When facing a question

like "do you prefer a to b ?", a voter might hesitate. It is easy to imagine situations

where a voter would like to say "perhaps". And it is just a step further to

imagine different situations where a voter would hesitate but with various degrees of

confidence: "almost yes but not completely sure",

"perhaps but more on the side of yes", "perhaps", "perhaps but more on the side of no", . . . There can be many reasons

for his hesitations.

He does not have full knowledge about the candidates. For example, in a

legislative election, a voter does not necessarily know what the position of

all candidates is regarding a particular issue.

He does have full knowledge about the candidates but not about some events

that might occur in the future and affect the way he compares the candi-

dates. For example, again in a legislative election, a voter might ideally know

everything about all candidates. But he does not know if, during the forth-

coming mandate, the representatives will have to vote on a particular issue.

If such a vote is to occur, a voter might prefer candidate a to candidate b.

In the other case, he might prefer b to a because there is just one thing that

he disapproves of the policy of b: his position about that particular issue.

- He does not fully know his preferences. Suppose that the community in which you live has decided to build a new recreational facility. There are two options: a tennis court or a playground. You have to vote. You know the two options perfectly (budget, time to completion, plan, ...). You like tennis and your children would love that playground. You will have access to both facilities under the same conditions. Can you tell which one you will choose? What will you enjoy more? To play tennis or to let your children play in the playground?

These three cases can be seen as three facets of a single problem. The voter is

uncertain about the final consequences of his choice.

Fuzzy relations can be used to model such preferences. The voter must still answer the above-mentioned question ("do you prefer a to b?"), but by numbers, no longer by yes or no. If he feels that "a is preferred to b" is definitely true, he answers 1. If he feels that "a is preferred to b" is definitely false, he answers 0. For intermediate situations, he chooses intermediate numbers. For example, "perhaps" could be 0.5 and "almost yes", 0.9. A typical fuzzy relation on three candidates is illustrated by Fig. 2.3 where a number on the arrow between two candidates (e.g. a and b) is the answer of the voter to the question "is a preferred to b?".

[Figure 2.3: a fuzzy relation on the three candidates a, b and c; the arrows carry the values 0.6, 0.4, 0.3, 0.8, 0.0 and 1.0.]

quences is assumed to exist. In such cases, the problem faced by the voter is no

longer uncertainty but risk. In these cases, probabilities of preference might be

assigned to each pair.

In some cases, when a voter is asked to tell if he prefers a to b, he will tend to

express faint differences in his judgement, not because he is uncertain about his

judgement, but because the concept of preference is vague and not well defined.

For example, a voter might say "I definitely prefer a to b but not as much as I prefer c to d". This is due to the fact that preference is not a clear-cut concept.

We might then model his preferences by a fuzzy relation and choose 0.5 for (a, b)

and 0.8 for (c, d). A value of 0 would correspond to no preference.

2.3. THE VOTING PROCESS 23

Note that in many cases, uncertainty and vagueness are probably simultane-

ously present. For a thorough review of fuzzy preference modelling, see (Perny

and Roubens 1998).
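
To make the idea of a fuzzy preference relation concrete, here is a minimal sketch in Python. The numerical answers are hypothetical (they are not those of Fig. 2.3), and the `cut` function, which turns the fuzzy relation back into an ordinary yes/no relation at a chosen threshold, is our own illustrative addition rather than something prescribed by the text.

```python
# Hypothetical answers of one voter to "is x preferred to y?" on a 0-1 scale:
# 1.0 = definitely true, 0.0 = definitely false, 0.5 = "perhaps".
fuzzy_pref = {
    ("a", "b"): 0.9,  # "almost yes"
    ("b", "a"): 0.1,
    ("a", "c"): 0.5,  # "perhaps"
    ("c", "a"): 0.5,
    ("b", "c"): 0.0,  # definitely false
    ("c", "b"): 1.0,  # definitely true
}


def cut(relation, threshold):
    """Return the ordinary (yes/no) relation made of the pairs whose
    degree of preference is at least `threshold`."""
    return {pair for pair, degree in relation.items() if degree >= threshold}


strict = cut(fuzzy_pref, 0.9)   # only near-certain preferences survive
lenient = cut(fuzzy_pref, 0.5)  # hesitant answers are counted as "yes"

print(sorted(strict))
print(sorted(lenient))
```

The two cuts differ, which illustrates that collapsing hesitant answers into yes or no requires a more or less arbitrary choice of threshold.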

Many other models can be conceived or have been described in the literature. An

important one is the utilitarian one: a voter assigns to each candidate a number

(the utility of the candidate). The position of a candidate with respect to any

other candidate is a function only of the utilities of the two candidates. If the

utilities of a and b are respectively 50 and 40, the implication is that a is

preferred to b. In addition, if the utilities of c and d are respectively 30 and 10,

it implies that the preference between c and d is twice as large as the preference

between a and b.
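
The arithmetic behind this example can be spelled out in a few lines of Python. We assume here that the strength of a preference is measured by the difference between the two utilities, which is what the phrase "twice as large" suggests; the function name is ours.

```python
# Utilities taken from the example in the text.
utility = {"a": 50, "b": 40, "c": 30, "d": 10}


def preference_intensity(x, y):
    """Assumed measure of how strongly x is preferred to y:
    the difference between their utilities (positive means x is preferred)."""
    return utility[x] - utility[y]


ab = preference_intensity("a", "b")  # 50 - 40 = 10, so a is preferred to b
cd = preference_intensity("c", "d")  # 30 - 10 = 20, so c is preferred to d

print(ab, cd, cd / ab)  # the c-d preference is twice the a-b preference
```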

Another important model is used in approval voting (Brams and Fishburn

1982). In this voting system, every voter votes for as many candidates as he wants

or approves. Consequently, the preferences of a voter are modelled by a partition

of the set of candidates into two subsets: a subset of approved candidates and

a subset of disapproved candidates. Approval voting received a lot of attention

during the last twenty years and has been adopted by a number of committees.
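
A small Python sketch may help to see how little information an approval ballot carries. The candidates and ballots below are hypothetical; each ballot is simply the set of approved candidates, the disapproved ones being the rest, and the winner is taken to be the candidate approved by the largest number of voters.

```python
from collections import Counter

candidates = {"a", "b", "c", "d"}

# Hypothetical ballots: each voter lists the candidates he approves.
ballots = [{"a", "b"}, {"b"}, {"b", "c", "d"}, {"a"}, set()]


def partition(approved):
    """Preferences of one voter: (approved subset, disapproved subset)."""
    return approved, candidates - approved


# Count, for each candidate, the number of voters approving him.
tally = Counter(c for ballot in ballots for c in ballot)
best = max(tally.values())
winners = sorted(c for c in candidates if tally[c] == best)

print(partition(ballots[0]))
print(dict(tally), winners)
```

Note that a ballot says nothing about how the voter compares two approved (or two disapproved) candidates.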

We will not continue our list of preference models any further. Our aim was

just to give a small overview of the many problems that can arise when trying

to model the preferences of a voter. But there is an important issue that we still

must address. We encountered many problems in Section 2.1. In that section, we were using complete orders to model the voters' preferences. We then examined alternative models. Is it easier to aggregate individual preferences modelled by means of complete pre-orders, semiorders, fuzzy relations, ...? Unfortunately, the answer is no. Many examples, similar to those in Section 2.1, can be built to demonstrate this (Sen 1986, Salles, Barrett and Pattanaik 1992).

Until now, we considered only modelling the preferences of a voter and aggregating

the preferences of several voters. But voting is much more than that. Here are a

few points that are included in the voting process, even if they are often left aside

in the literature.

Who is going to define the candidates or alternatives that will be submitted to a

vote ? All the voters, some of them or one of them ? In some cases, e.g. presidential

elections, the candidates are voters that become candidates on a voluntary basis.

Nevertheless, there are often some rules: not everyone can be a candidate. Who

should fix these rules and how ? There is an even more fundamental question:

who should decide that voting should occur, on what issue, according to which


rules ? All these questions received different answers in different countries and

committees. This may indicate that they are far from trivial.

Let us now be more pragmatic. The board of directors of a company asks the

executive committee to prepare a report on the future investment strategies. A

vote on the proposed strategies will be held during the next board of directors

meeting. How should the executive committee prepare its report ? Should they

include all strategies, even infeasible ones ? If infeasible ones are to be avoided,

who should decide that they are infeasible? To find all feasible strategies might

be prohibitively resource and time consuming. And one can never be sure that

all feasible strategies have been explored. There is no systematic way, no formal

method to do that. Creativity and imagination are needed during this process.

Finally, suppose that the executive committee decides to explore only some

strategies. A more or less arbitrary selection needs to be made. Even if they do

make this selection in a perfectly honest way, it can have far reaching consequences

on the outcome of the process. Remember example 11 in which we showed that,

for some aggregation methods, the relative ranking of two candidates depends on

the presence (or absence) of some other candidates. Furthermore, some studies

show that an individual can prefer a to b or b to a depending on the presence or

absence of some other candidate (Sen 1997).
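
The dependence of an aggregated ranking on the set of candidates can be checked on a small case. The sketch below uses the Borda count and a hypothetical profile of five voters; it is not a reproduction of example 11, only an illustration of the same phenomenon.

```python
def borda_scores(rankings):
    """Borda count: with m candidates, each ranking gives m-1 points to its
    first candidate, m-2 to the second, ..., 0 to the last."""
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (m - 1 - position)
    return scores


def restrict(rankings, kept):
    """Remove from every ranking the candidates that are not in `kept`."""
    return [[c for c in ranking if c in kept] for ranking in rankings]


# Hypothetical profile: 3 voters rank a > b > c, 2 voters rank b > c > a.
profile = [["a", "b", "c"]] * 3 + [["b", "c", "a"]] * 2

with_c = borda_scores(profile)                          # a: 6, b: 7, c: 2
without_c = borda_scores(restrict(profile, {"a", "b"}))  # a: 3, b: 2

print(with_c["b"] > with_c["a"])        # b is ranked before a when c is present
print(without_c["a"] > without_c["b"])  # a is ranked before b when c is absent
```

No voter changed his mind about a and b; only the list of strategies submitted to the vote changed.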

democracies, past or present. Citizens, rich people, noble people, men, men and

women, everyone, white men, experts who have some knowledge about the dis-

cussed problem, one representative for each faction, a number of representatives

proportional to the size of that faction, . . . There is no universal answer.

Even the choice of the aggregation method can be considered as part of the voting

process for, in some cases, the aggregation method is at least as important as the

result of the vote. Consider two countries, A and B: A is ruled by a dictator,

B is a democracy. Suppose that each time a policy is chosen by voting in B,

the dictator of A applies the same policy in his country, without voting. Hence,

all governmental decisions are the same in A and B. The only difference is that

the people in A do not vote; their benevolent dictator decides alone. In what

country would you prefer to live ? I guess you would choose B, unless you are

the dictator. And you would probably choose B even if the decisions taken in

B were a little bit worse than the decisions taken in A. What we value in B is

freedom of choice. Some references or more details on this topic can be found in

(Sen 1997, Suzumura 1999).

2.4 Social choice and multiple criteria decision support

2.4.1 Analogies

There is an interesting analogy between voting and multiple criteria decision sup-

port. Replace criteria by voters, alternatives by candidates and you get it. Let

us be more explicit. In multiple criteria decision support, most papers consider

an entity, called decision-maker, that wants to choose an alternative from a set of

available alternatives. The decision-maker is often assumed to be an individual,

a person. To make his choice, the decision maker takes several viewpoints called

criteria into account. These criteria are often conflicting, i.e. according to a cri-

terion, a given alternative is the best one while, according to another criterion,

other alternatives are better.

In a large part of the literature on voting, there is an entity called group

or society that has to choose a candidate from a set of candidates. This entity

consists of individuals and, for some reasons, that can vary largely in different

groups, the choice made by this entity must reflect in some way the opinion of

the individuals. And, of course, the individuals often have conflicting views about

the candidates. In other words, the preferences of an individual play the same

role, in social choice, as the preferences along a single viewpoint or criterion in

multiple criteria decision support. The collective or social preferences, in social

choice theory, and the global or multiple criteria preferences, in multiple criteria

decision support, can be compared in the same way.

The main interest of this analogy lies in the fact that voting has been studied

for a long time. The seminal works by Borda (1781), Condorcet (1785), and Arrow

(1963) have led to an important stream of research in the 20th century. Hence we

have a huge amount of results on voting at our disposal for use in multiple criteria

decision support. Besides, this similarity has widely been used (see e.g. Arrow and

Raynaud 1986, Vansnick 1986).

In this chapter, we only discussed elections in which only one candidate must

be chosen (single-seat constituencies, prime ministers or presidents). However, it

is often the case that several candidates must be chosen. For example, in Belgium

and Germany, in each constituency, several representatives are elected so as to

achieve a proportional representation. A committee that must select projects from

a list often selects several ones, according to the available resources. In multiple

criteria decision support, such cases are common. An investor usually invests in a

portfolio of stocks. A human resources manager chooses amongst the candidates

those that will form an efficient team, etc.

In fact, the comparison can be extended to the processes of voting and decision-

making. In multiple criteria decision support, the decision process is much broader

than just the extraction, by some aggregation method, of the best alternative from

a performance tableau.

The very beginning of the process, the problem definition, is a crucial step.

When a decision maker enters a decision process, he has no clearly defined problem.

He just feels unsatisfied with his current situation. He then tries to structure his


view of the situation, to put labels on different entities, to look for relationships

between entities, etc. Finally he obtains a "problem", as one can find in books.

It is a description, in formal language or not, of the current situation. It usually

contains a description of the reasons for which that situation is not satisfying and

it contains an implicit description of the potential solutions to the problem. That

is, the problem statement contains information that allows one to recognise whether a given

action or course of actions is a potential solution or not. The problem statement

must not be too broad, otherwise anything can be a solution and the decision-

maker is not helped. On the contrary, if the statement is too narrow, some actions

are not recognised as potential solutions even if they would be good ones.

Some authors, mainly in the United Kingdom, have developed methods to help

decision-makers to better structure their problem (Rosenhead 1989, Daellenbach

1994).

When the problem has been stated, the decision-maker has a problem, but no

solution. He must construct the set of alternatives, like the candidates set in social

choice. Brainstorming and other techniques promoting and stimulating creativity

have been developed to support this step.

The criteria, like the voters, are not given in a decision process. The decision-

maker needs to identify all the viewpoints that are relevant with respect to his

problem. He then must define a set of criteria that reflect all relevant viewpoints

and that fulfills some conditions. There must not be several criteria reflecting

the same viewpoint. All criteria should be independent except if the aggregation

method to be used thereafter allows dependence between criteria. Depending on

the aggregation method, the scales corresponding to the criteria must have some

properties. And so on. See e.g. Roy (1996) and Keeney and Raiffa (1976).

Last but not least, the aggregation method itself must be chosen by the analyst

and/or the decision-maker. It is hard to imagine how an aggregation procedure

could be scientifically proven to be the best one. The decision-maker must thus

make a choice. He should choose the one that satisfies some properties he judges

important, the one he can understand, the one he trusts.

2.5 Conclusions

In this chapter, we have shown that the operation of voting is far from simple. In

the first section, using small examples, describing very simple situations, we found

that intuition and common sense are not sufficient to avoid the many traps that

await us when using aggregation procedures. In fact, in this domain, common

sense is of very little help. We also presented two theoretical results indicating

that there is no hope of finding a perfect voting procedure. Therefore, if we still

want to use a voting procedure (and this seems hardly avoidable), we must accept using

an imperfect one. But this does not mean that we can use any procedure in any

circumstance and any way. The flaws of a particular procedure are probably less

damaging in some instances than in others. Some features of a voting procedure

may be highly desirable in a given context while not so important in another one.

So, for each voting context, we have to choose the procedure that best matches our


needs. And, when we have made this choice, we must be aware that this match is

not perfect, that we must use the procedure in such a way that the risk of facing

a problematic situation is kept as low as possible.

In Section 2, we found that even the input of voting procedures, the preferences of the voters, is not a simple thing. Many different models for preferences exist

and can be used in aggregation procedures. This shows that what is usually

considered as data is not really data. When we feed our aggregation procedures

with preferences, these are not given. They are constructed in some more or less

arbitrary way. The choice of a particular model (ranking with ties, fuzzy relations,

. . . ) is itself arbitrary. Nothing in the problem tells us what model to use.

Finally, in Section 3, we showed that the voting process itself is highly complex.

Voting procedures are decision models, just like student grades, indicators,

cost-benefit analysis, multiple criteria decision support (this has already been dis-

cussed in Section 4), . . . They are decision models devoted to the special case where

a decision must be taken by a group of voters and are mainly concerned with the

case of a finite and small set of alternatives. This peculiarity does not make voting

procedures very different from other decision and evaluation models. As you will

see in the following chapters, most decision models suffer the same kind of problems

that we have met in this chapter: there is no perfect aggregation procedure; the

data are not data, they are imperfect and arbitrary models; the decision models

are too narrow, they do not take into account the fact that decision support occurs

in a human process (the decision making process) and in a complex environment.

3 BUILDING AND AGGREGATING EVALUATIONS: THE EXAMPLE OF GRADING STUDENTS

3.1 Introduction

3.1.1 Motivation

In chapter 2, we tried to show that voting, although being a familiar activity

to almost everyone, raises many important and difficult questions that are closely

connected to the subject of this book. Our main objective in this chapter is

similar. We all share the more or less pleasant experience of having received

grades in order to evaluate our academic performances. The authors of this

book spend part of their time evaluating the performance of students through

grading several kinds of work, an activity that you may also be familiar with. The

purpose of this chapter is to build upon this shared experience. This will allow us

to discuss, based on simple and familiar situations, what is meant by evaluating

a performance and aggregating evaluations, both activities being central to

most evaluation and decision models. Although the entire chapter is based on the

example of grading students, it should be stressed that grades are often used

in contexts unrelated to the evaluation of the performance of students: employees

are often graded by their employers, products are routinely tested and graded by

consumer organisations, experts are used to rate the feasibility or the riskiness of

projects, etc. The findings of this chapter are therefore not limited to the realm

of a classroom.

As with voting systems, there is much variance across countries in the way

education is organised. Curricula, grading scales, rules for aggregating grades

and granting degrees, are seldom similar from place to place (for information on

the systems used in the European Union see www.eurydice.org).

This diversity is even increased by the fact that each instructor (a word that

we shall use to mean the person in charge of evaluating students) has generally

developed his own policy and habits. The authors of this book have studied in four

different European countries (Belgium, France, Greece and Italy) and obtained

degrees in different disciplines (Maths, Operational Research, Computer Science,

Geology, Management, Physics) and in different Universities. We were not overly

astonished to discover that the rules that governed the way our performances were

assessed were quite different. We were perhaps more surprised to realise that


policies were quite different even after having accounted for the fact that these

policies are partly contingent upon the rules governing our respective institutions.

Such diversity might indicate that evaluating students is an activity that is perhaps

more complex than it appears at first sight.

We shall restrict our attention in this chapter to education programmes with which

we are familiar. Our general framework will be that of a programme at University

level in which students have to take a number of courses or credits. In each

course the performance of students is graded. These grades are then collected and

form the basis of a decision to be taken about each student. Depending on the

programme, this decision may take various forms, e.g. success or failure, success or

failure with possible additional information such as distinctions, ranks or average

grades, success or failure with the possibility of a deferred decision (e.g. the degree is

not granted immediately but there is still a possibility of obtaining it). Quite often

the various grades are "summarised", "amalgamated", we shall say "aggregated",
in some way before a decision is taken.

In what follows, we shall implicitly have in mind the type of programmes in

which we teach (Mathematics, Computer Science, Operational Research, Engi-

neering) that are centred around disciplines which, at least at first sight, seem to

raise fewer evaluation problems than if we were concerned with, say, Philosophy,

Music or Sports.

Dealing only with technically-oriented programmes at University level will

clearly not allow us to cover the immense literature that has been developed in

Education Science on the evaluation of the performance of students. For good

accounts in English, we refer to Airaisian (1991), Davis (1993), Lindheim, Morris

and Fitz-Gibbon (1987), McLean and Lockwood (1996), Moom (1997) and Speck

(1998). Note that in Continental Europe, the Piagetian influence, different institu-

tional constraints and the popularity of the classic book by Pieron (1963) have led

to a somewhat different school of thought, see Bonboir (1972), Cardinet (1986),

de Ketele (1982), de Landsheere (1980), Merle (1996) and Noizet and Caverini

(1978). As we shall see, this will however allow us to raise several important is-

sues concerning the evaluation and the aggregation of performances. Two types

of questions prove to be central for our purposes:

meaning of the resulting grades and how to interpret them?

at an overall evaluation of his academic performance?

3.2 Grading students in a given course

Most of you have probably been in the situation of an instructor having to

attribute grades to students. Although this is clearly a very important task, many

instructors share the view that this is far from being the easiest and most pleasant

part of their jobs. We shall try here to give some hints on the process that leads

to the attribution of a grade as well as on some of its pitfalls and difficulties.

We shall understand a grade as an evaluation of the performance of a student in

a given course, i.e. an indication of the level to which a student has fulfilled the

objectives of the course.

This very general definition calls for some remarks.

course. Although it may appear obvious, this implies a precise statement of

the objectives of the course in the syllabus, a condition that is unfortunately

not always perfectly met.

2. Not all grades have a similar function. Whereas usually the final grade

of a course in Universities mainly has a certification role, intermediate

grades, on which the final grade may be partly based, have a more complex

role that is often both certificative and formative, e.g. the result of a

mid-term exam is included in the final grade but is also meant to be a signal

to a student indicating his strengths and weaknesses.

should be noticed that grades are not only a signal sent by the instructor

to each of his students. They have many other potential important users:

other students using them to evaluate their position in the class, other in-

structors judging your severity and/or performance, parents watching over

their child, administrations evaluating the performance of programmes, em-

ployers looking for all possible information on an applicant for a job.

ple functions (see Chatel 1994, Laska and Juarez 1992, Lysne 1984, McLean and

Lockwood 1996). Interpreting it necessarily calls for a study of the process that

leads to its attribution.

What is graded and how?

The types of work that are graded, the scale used for grading and the way of amalgamating these grades may vary in significant ways for similar types of courses.


1. The scale that is used for grading students is usually imposed by the pro-

gramme. Numerical scales are often used in Continental Europe with varying

bounds and orientations: 0-20 (in France or Belgium), 0-30 (in Italy), 6-1 (in

Germany and parts of Switzerland), 0-100 (in some Universities). American

and Asian institutions often use a letter scale, e.g. E to A or F to A. Obvi-

ously we would not want to conclude from this that Italian instructors have

come to develop much more sensitive instruments for evaluating performance

than German ones or that the evaluation process is in general more precise

in Europe than it is in the USA. Most of us would agree that the choice of

a particular scale is mainly conventional. It should however be noted that

since grades are often aggregated at some point, such choices might not be

totally without consequences. We shall come back to that point in section

3.3.

2. Some courses are evaluated on the basis of a single exam. But there are

many possible types of exams. They may be written or oral; they may be

open-book or closed-book. Their duration may vary (45 minute exams are

not uncommon in some countries whereas they may last up to 8 hours in

some French programmes). Their content for similar courses may vary from

multiple choice questions to exercises, case-studies or essays.

tests. The number and type of work may vary a lot: final exam, mid-term

exam, exercises, case-studies or even class participation. Furthermore the

way these various grades are aggregated is diverse: simple weighted average,

grade only based on exams with group work (e.g. case-studies or exercises)

counting as a bonus, imposition of a minimal grade at the final exam, etc.

(an overview of grading policies and practices in the USA can be found in

Riley, Checca, Singer and Worthington 1994).

4. Some instructors use raw grades. For reasons to be explained later, others

modify the raw grades in some way before aggregating and/or releasing

them, e.g. standardising them.
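
As an illustration of the third point, the following Python sketch computes a final grade as a simple weighted average and checks a minimal-grade requirement on the final exam. The grades, the weights, the 0-20 scale and the threshold of 10 are all hypothetical; they do not describe any particular programme.

```python
def weighted_average(grades, weights):
    """Weighted average of the grades; the weights need not sum to 1."""
    total_weight = sum(weights.values())
    return sum(grades[work] * weights[work] for work in weights) / total_weight


# Hypothetical grades of one student on a 0-20 scale.
grades = {"final_exam": 8, "mid_term": 14, "case_study": 16}
# Hypothetical weights: the final exam counts for half of the grade.
weights = {"final_exam": 5, "mid_term": 3, "case_study": 2}

average = weighted_average(grades, weights)  # (40 + 42 + 32) / 10

# Variant: impose a minimal grade at the final exam (assumed to be 10 here).
MIN_FINAL_EXAM = 10
meets_minimum = grades["final_exam"] >= MIN_FINAL_EXAM

print(average, meets_minimum)
```

With these numbers the weighted average is above the middle of the scale although the final exam is below the assumed minimum, so the two policies would treat this student differently.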

Within a given institution suppose that you have to prepare and grade a written,

closed-book, exam. We shall take the example of an exam for an Introduction to

Operational Research (OR) course, including Linear Programming (LP), Integer

Programming and Network models, with the aim of giving students a basic un-

derstanding of the modelling process in OR and an elementary mastering of some

basic techniques (Simplex Algorithm, Branch and Bound, elementary Network

Algorithms). Many different choices interfere with such a task.

exam is a difficult and time consuming task. Is the subject of adequate diffi-

culty? Does it contain enough questions to cover all parts of the programme?


Do all the questions clearly relate to one or several of the announced objectives of the course? Will it allow us to discriminate between students? Is there

a good balance between modelling and computational skills? What should

the respective parts of closed vs. open questions be?

2. Preparing a marking scale. The preparation of the marking scale for a given

subject is also of utmost importance. A nice-looking subject might be

impractical in view of the associated marking scale. Will the marking scale

include a bonus for work showing good communication skills and/or will

misspellings be penalised? How to deal with computational errors? How

to deal with computational errors that lead to inconsistent results? How to

deal with computational errors influencing the answers to several questions?

How to judge an LP model in which the decision variables are incompletely

defined? How to judge a model that is only partially correct? How to judge a

model which is inconsistent from the point of view of units? Although much

expertise and/or rules of thumb are involved in the preparation of a good

subject and its associated marking scale, we are aware of no instructor not

having had to revise his judgement after correcting some work and realising

his severity and/or to correct work again after discovering some frequently

given half-correct answers that were unanticipated in the marking scale.

the tasks implied by the subject of the exam and, hopefully, will give an

indication of the extent to which a student has met the various objectives

of the course (in general an exam is far from dealing with all the aspects

that have been dealt with during the course). Although this is debatable,

such an evaluation is often thought of as a measure of performance. For

this kind of measure the psychometric literature (see Ebel and Frisbie

1991, Kerlinger 1986, Popham 1981), has traditionally developed at least

two desirable criteria. A measure should be:

- reliable, i.e. give similar results when applied several times in similar conditions,
- valid, i.e. should measure what was intended to be measured and only that.

Extensive research in Education Science has found that the process of giving

grades to students is seldom perfect in these respects (a basic reference re-

mains the classic book of Pieron (1963). Airaisian (1991) and Merle (1996)

are good surveys of recent findings). We briefly recall here some of the

difficulties that were uncovered.

The crudest reliability test that can be envisaged is to give similar works to

correct to several instructors and to record whether or not these works are

graded similarly. Such experiments were conducted extensively in various

disciplines and at various levels. Not overly surprisingly, most experiments

have shown that even in the more technical disciplines (Maths, Physics,

Grammar) in which it is possible to devise rather detailed marking scales


tween the more generous and the more severe correctors on Maths work

can be as high as 2 points on a 0-20 scale. Even more strikingly on some

work in Maths the difference can be as high as 9 points on a 0-20 scale (see

Pieron 1963).

In other experiments the same correctors are asked to correct a work that

they have already corrected earlier. These auto-reliability tests give similar

results since in more than 50% of the cases the second grade is significantly

different from the first one. Although few experiments have been conducted

with oral exams, it seems fair to suppose that they are no more reliable than

written ones.

Other experiments have shown that many extraneous factors may interfere in

the process of grading a paper and therefore question the validity of grades.

Instructors accustomed to grading papers will not be surprised to note that:

- grades usually show much "autocorrelation": similar papers handed in by a usually "good" student and by a usually "uninterested" student are likely not to receive similar grades,
- the order in which papers are corrected greatly influences the grades. Near the end of a correction task, most correctors are less generous and tend to give grades with a higher variance,
- "anchoring effects" are pervasive: it is always better to be corrected after a remarkably poor work than after a perfect one,
- misspellings and poor handwriting prove to have a non-negligible influence on the grades even when the instructor declares not to take these effects into account or is instructed not to.

4. The influence of correction habits. Experience shows that correction habits

tend to vary from one instructor to another. Some of them will tend to give

an equal percentage of all grades and will tend to use the whole range of the

scale. Some will systematically avoid the extremes of the range and the dis-

tribution of their marks will have little variability. Others will tend to give

only extreme marks e.g. arguing that either the basic concepts are under-

stood or they are not. Some are used to giving the lowest possible grade after

having spotted a mistake which, in their minds, implies that nothing has

been understood (e.g. proposing a non linear LP model). The distribu-

tion of grades for similar papers will tend to be highly different according to

the corrector. In order to cope with such effects, some instructors will tend

to standardise the grades before releasing them (the so-called z-scores),

others will tend to equalise average grades from term to term and/or use a

more or less ad hoc procedure.
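
The standardisation mentioned above can be sketched as follows: each grade is replaced by its distance to the mean of the class, expressed in standard deviations. The two series of grades are hypothetical, and we use the population standard deviation; using the sample standard deviation instead would be an equally defensible convention.

```python
from statistics import mean, pstdev


def z_scores(grades):
    """Standardise grades: subtract the mean and divide by the
    (population) standard deviation of the same set of grades."""
    mu = mean(grades)
    sigma = pstdev(grades)
    if sigma == 0:
        # All grades are identical: no student differs from the mean.
        return [0.0 for _ in grades]
    return [(g - mu) / sigma for g in grades]


# Hypothetical grades given to the same five papers by two correctors.
severe = [6, 8, 10, 12, 14]       # uses a wide part of the 0-20 scale
generous = [12, 13, 14, 15, 16]   # higher grades, little variability

print(z_scores(severe))
print(z_scores(generous))
```

After standardisation the two correctors give the same scores to the same papers, because they ranked them identically and spaced them evenly; the information about the level and spread of the raw grades, on the other hand, is lost.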

A syllabus usually contains a section entitled grading policy. Although instruc-

tors do not generally consider it as the most important part of their syllabus, they


are aware that it is probably the part that is read first and most attentively by all

students. Besides useful considerations on ethics, this section usually describes

the process that will lead to the attribution of the grades for the course in detail.

On top of describing the type of work that will be graded, the nature of exams

and the way the various grades will contribute to the determination of the final

grade, it usually also contains many details that may prove important in order

to understand and interpret grades. Among these details, let us mention:

- the type of preparation and correction of the exams: who will prepare the subject of the exam (the instructor or an outside evaluator)? Will the work be corrected once or more than once (in some Universities all exams are corrected twice)? Will the names of the students be kept secret?
- the possibility of revising a grade: are there formal procedures allowing the students to have their grades reconsidered? Do the students have the possibility of asking for an additional correction? Do the students have the possibility of taking the same course at several moments in the academic year? What are the rules for students who cannot take the exam (e.g. because they are sick)?
- the policy towards cheating and other dishonest behaviour (exclusion from the programme, attribution of the lowest possible grade for the course, attribution of the lowest possible grade for the exam).
- the policy towards late assignments (no late assignment will be graded, minus x points per hour or day).

The process of the determination of the final grades for a given course can hardly

be understood without a clear knowledge of the requirements of the programme

in order to obtain the degree. In some programmes students are only required

to obtain a satisfactory grade (which may or may not correspond to the middle of

the grading scale that is used) for all courses. In others, an average grade

is computed and this average grade must be over a given limit to obtain the

degree. Some programmes attribute different kinds of degrees through the use of

distinctions. Some courses (e.g. core courses) are sometimes treated apart; a

dissertation may have to be completed.

The freedom of an instructor in arranging his own grading policy is highly

conditioned by this environment. A grade can hardly be interpreted without a

clear knowledge of these rules (note that this sometimes creates serious problems

in institutions allowing students pertaining to different programmes with different

sets of rules to attend the same courses). Within a well defined set of rules,

however, many degrees of freedom remain. We examine some of them below.

Weights We mentioned that the final grade for a course was often the combina-

tion of several grades obtained throughout the course: mid-term exam, final exam,

case-studies, dissertation, etc. The usual way to proceed is to give a (numerical)

weight to each piece of work entering into the final grade and to compute a weighted

average, more important works receiving higher weights. Although this process is

simple and almost universally used, it raises some difficulties that we shall examine

in section 3.3. Let us simply mention here that the interpretation of weights in

such a formula is not obvious. Most instructors would tend to compensate for a

very difficult mid-term exam (weight 30%) by preparing a comparatively easier final

exam (weight 70%). However, if the final exam is so easy that most students

obtain very good grades, the differences in the final grades will be attributable

almost exclusively to the mid-term exam although it has a much lower weight

than the final exam. The same is true if the final grade combines an exam with

a dissertation. Since the variance of the grades is likely to be much lower for the

dissertation than for the exam, the former may only marginally contribute towards

explaining differences in final grades independently of the weighting scheme. In

order to avoid such difficulties, some instructors standardise grades before averag-

ing them. Although this might be desirable in some situations, it is clear that the

more or less arbitrary choice of a particular measure of dispersion (why use the

standard deviation and not the interquartile range? Should we exclude outliers?)

may have a crucial influence on the final grades. Furthermore, the manipulation

of such distorted grades seriously complicates the positioning of students with

respect to a minimal passing grade since their use amounts to abandoning any

idea of absolute evaluation in the grades.
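
The effect of dispersion on nominal weights can be sketched numerically. The grades below are hypothetical and chosen only to mimic the situation described above (a difficult mid-term with weight 30% and an easy final with weight 70%); the range is used as a crude measure of dispersion, which is itself an arbitrary choice.

```python
WEIGHT_MIDTERM, WEIGHT_FINAL = 0.3, 0.7

# Hypothetical grades on a 0-20 scale for five students.
midterm = [4, 8, 12, 16, 20]    # difficult exam: grades are widely spread
final = [18, 18, 19, 18, 19]    # easy exam: nearly everyone does very well

def spread(values):
    """Range (max - min), used here as a crude measure of dispersion."""
    return max(values) - min(values)

def ranking(values):
    """Student indices ordered from the lowest to the highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])

final_grades = [WEIGHT_MIDTERM * m + WEIGHT_FINAL * f
                for m, f in zip(midterm, final)]

# Spread contributed by each weighted component to the final grade.
midterm_part = spread([WEIGHT_MIDTERM * m for m in midterm])
final_part = spread([WEIGHT_FINAL * f for f in final])

print(midterm_part, final_part)
print(ranking(final_grades) == ranking(midterm))
```

With these numbers the weighted mid-term accounts for several times more of the spread in final grades than the weighted final exam, and the final ranking coincides with the mid-term ranking, despite the 30/70 weighting.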

Passing a course In some institutions, you may either pass or fail a course

and the grades obtained in several courses are not averaged. An essential problem

for the instructor is then to determine which students are above the minimal

passing grade. When the final grade is based on a single exam we have seen

that it is not easy to build a marking scale. It is even more difficult to conceive a

marking scale in connection to what is usually the minimal passing grade according

to the culture of the institution. The question boils down to deciding what amount

of the programme a student should master in order to obtain a passing grade, given

that an exam only gives partial information about the amount of knowledge of the

student.

The problem is clearly even more difficult when the final grade results from the

aggregation of several grades. The use of weighted averages may give undesirable

results since, for example, an excellent group case-study may compensate for a

very poor exam. Similarly weighted averages do not take the progression of the

student during the course into account.

It should be noted that the problem of positioning students with respect to a

minimal passing grade is more or less identical to positioning them with respect

to any other special grades, e.g. the minimal grade for being able to obtain a

distinction, to be cited on the Dean's honour list or the Academic Honour

Roll.

Grades from other institutions

In view of the complexity of the process that leads to the attribution of a grade,

it should not be a surprise that most instructors find it very difficult to interpret

grades obtained in another institution. Consider a student joining your programme

after having obtained a first degree at another University. Arguing that he has

already passed a course in OR with 14 on a 0-20 scale, he wants to have the

opportunity to be exempted from your class. If you are not aware of the grading policy of

the instructor and of the culture and rules of the previous University this student

attended, knowing that he obtained 14 offers you little information. The knowledge

of his rank in the class may be more useful: if he obtained one of the highest grades

this may be a good indication that he has mastered the contents of the course

sufficiently. However, if you were to know that the lowest grade was 13 and that

14 is the highest, you would perhaps be tempted to conclude that the difference

between 13 and 14 may not be very significant and/or that you should not trust

grades that are so generous and exhibit so little variability.

Being able to interpret the grade that a student obtained in your own institution

is quite important at least as soon as some averaging of the grades is performed in

order to decide on the attribution of a degree. This task is clearly easier than the

preceding one: the grades that are to be interpreted here have been obtained in a

similar environment. However, we would like to argue that this task is not an easy

one either. First it should be observed that there is no clear implication in having

obtained a similar grade in two different courses. Is it possible or meaningful to

assert that a student is equally good in Maths and in Literature? Is it possible

to assert that, given the level of the programme, he has satisfied to a greater

extent the objectives of the Maths course than the objectives of the Literature

course? Our experience as instructors would lead us to answer negatively to such

questions even when talking of programmes in which all objectives are very clearly

stated. Secondly, in section 3.2.2 we mentioned that, even within fixed institutional

constraints, each instructor still had many degrees of freedom to choose his grading

policy. Unless there is a lot of co-ordination between colleagues they may apply

quite different rules e.g. in dealing with late assignments or in the nature and

number of exams. This seriously complicates the interpretation of the profile of

grades obtained by a student.

The numerical scales used for grades throughout Europe tend to give the impres-

sion that grades are real measures and that, consequently, these numbers may

be manipulated as any other numbers. There are many possible kinds of mea-

sure and having a numerical scale is no guarantee that the numbers on that scale

may be manipulated in all possible ways. In fact, before manipulating numbers

supposedly resulting from measurements it is always important to try to figure

out on which type of scale they have been measured. Let us notice that this

is true even in Physics. Saying that Mr. X weighs twice as much as Mr. Y makes

sense because this assertion is true whether mass is measured in pounds or in

kilograms. Saying that the average temperature in city A is twice as high as the

average temperature in city B may be true but makes little sense since the truth

value of this assertion clearly depends on whether temperature is measured using

the Celsius or the Fahrenheit scale.
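
A small numerical check makes the contrast concrete. The masses and temperatures below are hypothetical, and the kilogram-to-pound factor is approximate.

```python
def kg_to_lb(kg):
    """Convert kilograms to pounds (approximate factor)."""
    return kg * 2.20462

def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

mass_x, mass_y = 100.0, 50.0   # hypothetical masses in kilograms
temp_a, temp_b = 20.0, 10.0    # hypothetical average temperatures in Celsius

ratio_mass_kg = mass_x / mass_y
ratio_mass_lb = kg_to_lb(mass_x) / kg_to_lb(mass_y)
ratio_temp_c = temp_a / temp_b
ratio_temp_f = celsius_to_fahrenheit(temp_a) / celsius_to_fahrenheit(temp_b)

print(ratio_mass_kg, ratio_mass_lb)   # the ratio of masses survives the change of unit
print(ratio_temp_c, ratio_temp_f)     # the ratio of temperatures does not
```

The mass ratio is unaffected by the unit because the conversion only multiplies by a constant; the temperature conversion also shifts the origin, so "twice as high" in Celsius (20 versus 10) becomes 68 versus 50 in Fahrenheit.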

The highest point on the scale An important feature of all grading scales is

that they are bounded above. It should be clear that the numerical value attributed

to the highest point on the scale is somewhat arbitrary and conventional. No loss

of information would be incurred using a 0-100 or a 0-10 scale instead of a 0-20

one. At best it seems that grades should be considered as expressed on a ratio

scale, i.e. a scale in which the unit of measurement is arbitrary (such scales are

frequent in Physics, e.g. length can be measured in meters or inches without loss

of information).

If grades can be considered as measured on a ratio scale, it should be recognised

that this ratio scale is somewhat awkward because it is bounded above. Unless you

admit that knowledge is bounded or, more realistically, that perfectly fulfilling

the objectives of a course makes clear sense, problems might appear at the upper

bound of the scale. Consider two excellent, but not necessarily equally excellent,

students. They cannot obtain more than the perfect grade 20/20. Equality of

grades at the top of the scale (or near the top, depending on grading habits) does

not necessarily imply equality in performance (after a marking scale is devised it is

not exceptional that we would like to give some students more than the maximal

grade, e.g. because some bonus is added for particularly clever answers, whereas

the computer system of most Universities would definitely reject such grades!).

The lowest point on the scale It should be clear that the numerical value that

is attributed to the lowest point of the scale is no less arbitrary and conventional

than was the case for the highest point. There is nothing easier than to transform

grades expressed on a 0-20 scale to grades expressed on a 100-120 scale and this

involves no loss of information. Hence it would seem that a 0-20 scale might

be better viewed as an interval scale, i.e. a scale in which both the origin and

the unit of measurement are arbitrary (think of temperature scales in Celsius or

Fahrenheit). An interval scale allows comparisons of differences in performance;

it makes sense to assert that the difference between 0 and 10 is similar to the

difference between 10 and 20 or that the difference between 8 and 10 is twice as

large as the difference between 10 and 11, since changing the unit and origin of

measurement clearly preserves such comparisons.
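
The following sketch checks this property on the 0-20 to 100-120 transformation mentioned above. The helper names are illustrative only; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def rescale(grade, unit=1, origin=100):
    """Positive affine transformation: new_grade = unit * grade + origin."""
    return unit * grade + origin

def difference_ratio(g1, g2, g3, g4, f=lambda g: g):
    """Ratio (f(g1) - f(g2)) / (f(g3) - f(g4)) as an exact fraction."""
    return Fraction(f(g1) - f(g2), f(g3) - f(g4))

# Differences: (10 - 8) compared with (11 - 10), before and after rescaling.
before = difference_ratio(10, 8, 11, 10)
after = difference_ratio(10, 8, 11, 10, f=rescale)
print(before, after)

# Ratios of the grades themselves are not preserved.
grade_ratio_before = Fraction(20, 10)
grade_ratio_after = Fraction(rescale(20), rescale(10))
print(grade_ratio_before, grade_ratio_after)
```

The ratio of differences is 2 on both scales, whereas the statement "20 is twice 10" becomes "120 is 12/11 of 110" after the shift of origin.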

Let us notice that using a scale that is bounded below is also problematic. In

some institutions the lowest grade is reserved for students who did not take the

exam. Clearly this does not imply that these students are equally ignorant.

Even when the lowest grade can be obtained by students having taken the exam,

some ambiguity remains. Knowing nothing, i.e. having completely failed to meet

any of the objectives of the course, is difficult to define and is certainly contingent

upon the level of the course (this is all the more true since in many institutions

the lowest grade is also granted to students having cheated during the exam,

with obviously no guarantee that they are equally ignorant). To a large extent

knowing nothing in the context of a course is about as arbitrary as is

knowing everything. Therefore, if grades are expressed on interval scales, care

should be taken when manipulating grades close to the bounds of the scale.

compare differences in grades. The authors of this book (even if their students

should know that they spend a lot of time and energy in grading them!) do

not consider that their own grades always allow for such comparisons. First we

already mentioned that a lot of care should be taken in manipulating grades that

are close to the bounds. Second, in between these bounds, some grades are very

particular in the sense that they play a particular role in the attribution of the

degree. Let us consider a programme in which all grades must be above a minimal

passing grade, say, 10 on a 0-20 scale, in order to obtain the degree. If it is clear

that an exam is well below the passing grade, few instructors will claim that there

is a highly significant difference between 4/20 and 5/20. Although the latter exam

seems slightly better than the former, the essential idea is that they are both

well below the minimal passing grade. On the contrary the gap between 9/20

and 10/20 may be much more important since before putting a grade just below

the passing grade most instructors usually make sure that they will have good

arguments in case of a dispute (some systematically avoid using grades just below

the minimal passing grade). In some programmes, not only the minimal passing

grade has a special role: some grades may correspond to different possible levels

of distinction, others may correspond to a minimal acceptable level below which

there is no possibility of compensation with grades obtained in other courses. In

between these special grades it seems that the reliable information conveyed by

grades is mainly ordinal. Some authors have been quite radical in emphasising this

point, e.g. Cross (1995) stating that: "[...] we contend that the difficulty of nearly

all academic tests is arbitrary and regardless of the scoring method, they provide

nothing more than ranking information" (but see French 1993, Vassiloglou and

French 1982). At first sight this would seem to be a strong argument in favour

of the letter system in use in most American Universities that only distinguishes

between a limited number of classes of grades (usually from F or E to A with, in some

institutions, the possibility of adding + or - to the letters). However, since

these letter grades are usually obtained via the manipulation of a distribution of

numerical grades of some sort, the distinction between letter grades and numerical

grades is not as deep as it appears at first sight. Furthermore the aggregation of

letter grades is often done via a numerical transformation as we shall see in section

3.3.

Finally it should be observed that, in view of the lack of reliability and validity

of some aspects of the grading process, it might well be possible to assert that small

differences in grades that do not cross any special grades may not be significant at

all. A difference of 1 point on a 0-20 scale may well be due only to chance via the

position of the work, the quality of the preceding papers, the time of correction.

Once more grades appear as complex objects. While they seem to mainly

convey ordinal information (with the possibility of the existence of non significant

small differences) that is typical of a relative evaluation model, the existence of

special grades complicates the situation in introducing some absolute elements

of evaluation in the model (on the measurement-theoretic interpretation of grades

see French 1981, Vassiloglou 1984).

Some readers, and most notably instructors, may have the impression that we

have been overly pessimistic on the quality of the grading process. We would

like to mention that the literature in Education Science is even more pessimistic

leading some authors to question the very necessity of using grades (see Sager 1994,

Tchudi 1997). We suggest to sceptical instructors the following simple experiment.

Having prepared an exam, ask some of your colleagues to take it with the following

instructions: prepare what you would think to be an exam that would just be

acceptable for passing, prepare an exam that would clearly deserve distinction,

prepare an exam that is well below the passing grade. Then apply your marking

scale to these papers prepared by your colleagues. It is extremely likely

that the resulting grades will show some surprises!

However, none of us would be prepared to abandon grades, at least for the

type of programmes in which we teach. The difficulties that we mentioned would

be quite problematic if grades were considered as measures of performance that

we would tend to make more and more precise and objective. We tend to

consider grades as an evaluation model trying to capture aspects of something

that is subject to considerable indetermination, the performance of students.

As is the case with most evaluation models, their use greatly contributes to

transforming the reality that we would like to measure. Students cannot

be expected to react passively to a grading policy; they will undoubtedly adapt

their work and learning practice to what they perceive to be its severity and

consequences. Instructors are likely to use a grading policy that will depend

on their perception of the policy of the Faculty (on these points, see Sabot and

Wakeman 1991, Stratton, Myers and King 1994). The resulting scale of measure-

ment is unsurprisingly awkward. Furthermore, as with most evaluation models

of this type, aggregating these evaluations will raise even more problems.

This is not to say that grades cannot be a useful evaluation model. If these lines

have led some students to consider that grades are useless, we suggest they try

to build up an evaluation model that would not use grades without, of course,

relying too much on arbitrary judgements. This might not be an impossible task;

we, however, do not find it very easy.

3.3. AGGREGATING GRADES 41

3.3.1 Rules for aggregating grades

In the previous section, we hope to have convinced the reader that grading a

student in a given course is a difficult task and that the result of this process is a

complex object.

Unfortunately, this is only part of the evaluation process of students enrolled

in a given programme. Once they have received a grade in each course, a decision

still has to be made about each student. Depending on the programme, we already

mentioned that this decision may take different forms: success or failure, success

or failure with possible additional information e.g. distinctions, ranks or average

grades, success or failure with the additional possibility of partial success (the

degree is not granted immediately but there remains a possibility of obtaining it),

etc. Such decisions are usually based on the final grades that have been obtained

in each course but may well use some other information, e.g. verbal comments from

instructors or extra-academic information linked to the situation of each student.

What is required from the students to obtain a degree is generally described

in a lengthy and generally opaque set of rules that few instructors (but generally

all students) know perfectly (as an interesting exercise we might suggest that

you investigate whether you are perfectly aware of the rules that are used in the

programmes in which you teach or, if you do not teach, whether you are aware of

such rules for the programmes in which your children are enrolled). These rules

exhibit such variety that it is obviously impossible to exhaustively examine them

here. However, it appears that they are often based on three kinds of principles

(see French 1981).

Conjunctive rules

In programmes of this type, students must pass all courses, i.e. obtain a grade

above a minimal passing grade in all courses in order to obtain the degree. If

they fail to do so after a given period of time, they do not obtain the degree.

This very simple rule has the immense advantage of avoiding any amalgamation

of grades. It is however seldom used as such because:

- it does not allow discrimination between grades just below the passing grade and grades well below it,

- it offers no incentive to obtain grades well above the minimal passing grade,

- it does not allow discrimination between students obtaining the degree.

Most instructors and students generally violently oppose such simple systems since

they generate high failure rates and do not promote academic excellence.

Weighted averages

In many programmes, the grades of students are aggregated using a simple weighted

average. This average grade (the so-called GPA in American Universities) is then

compared to some standards e.g. the minimal average grade for obtaining the de-

gree, the minimal average grade for obtaining the degree with a distinction, the

minimal average grade for being allowed to stay in the programme, etc. Whereas

conjunctive rules do not allow for any kind of compensation between the grades

obtained for several courses, all sorts of compensation effects are at work with a

weighted average.

Minimal acceptable grades

In order to limit the scope of compensation effects allowed by the use of weighted

averages, some programmes include rules involving minimal acceptable grades

in each course. In such programmes, the final decision is taken on the basis of

an average grade provided that all grades entering this average are above some

minimal level.

The rules that are used in the programmes we are aware of often involve a

mixture of these three principles, e.g. an average grade is computed for each cat-

egory of courses provided that the grade of each course is above a minimal level

and such average grades per category of courses are then used in a conjunctive

fashion. Furthermore, it should be noticed that the final decision concerning a

student is very often taken by a committee that has some degree of freedom with

respect to the rules and may, for instance, grant the degree to someone who does

not meet all the requirements of the programme e.g. because of serious personal

problems.
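
To make the three principles concrete, here is a minimal sketch that applies each of them to the same hypothetical students. The thresholds (a passing grade of 10, a floor of 7), the equal weights and the convention that reaching a threshold is sufficient are all assumptions made for illustration; actual programmes use their own values and combinations.

```python
def conjunctive(grades, passing=10):
    """Pass only if every grade reaches the minimal passing grade."""
    return all(g >= passing for g in grades)

def weighted_average(grades, weights):
    """Weighted average; the weights need not sum to 1."""
    return sum(w * g for w, g in zip(weights, grades)) / sum(weights)

def average_rule(grades, weights, required=10):
    """Pass if the weighted average reaches the required average."""
    return weighted_average(grades, weights) >= required

def average_with_floor(grades, weights, required=10, floor=7):
    """Average rule restricted by a minimal acceptable grade in each course."""
    return min(grades) >= floor and average_rule(grades, weights, required)

# Hypothetical students, three courses of equal weight, 0-20 scale.
weights = [1, 1, 1]
students = {
    "x": [10, 10, 10],   # just passes everywhere
    "y": [18, 16, 5],    # two excellent grades, one very poor
    "z": [14, 13, 8],    # one grade slightly below 10
}

for name, grades in students.items():
    print(name,
          conjunctive(grades),
          average_rule(grades, weights),
          average_with_floor(grades, weights))
```

Under these assumptions only x satisfies the conjunctive rule, all three satisfy the pure average rule, and the floor excludes y while still allowing z to compensate for a grade of 8.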

All these rules are based on grades and we saw in section 3.2 that the very

nature of the grades was highly influenced by these rules. This amounts to aggre-

gating evaluations that are highly influenced by the aggregation rule. This makes

aggregation a difficult task. We study some aspects of the most common aggregation

rule for grades below: the weighted average (more examples and comments

will be found in chapters 4 and 6).

The purpose of rules for aggregating grades is to know whether the overall per-

formance of a student is satisfactory taking his various final grades into account.

Using a weighted average system amounts to assessing the performance of a stu-

dent combining his grades using a simple weighting scheme. We shall suppose that

all final grades are expressed on similar scales and denote by $g_i(a)$ the final grade for course $i$ obtained by student $a$. The average grade obtained by student $a$ is then computed as $g(a) = \sum_{i=1}^{n} w_i g_i(a)$, the (positive) weights $w_i$ reflecting the importance (in academic terms and/or as a function of the length of the course) of the course for the degree. The weights $w_i$ may, without loss of generality, be normalised in such a way that $\sum_{i=1}^{n} w_i = 1$. Using such a convention the average grade $g(a)$ will be expressed on a scale having the same bounds as the scale used for the $g_i(a)$. The simplest decision rule consists in comparing $g(a)$ with

some standards in order to decide on the attribution of the degree and on possible

distinctions. A number of examples will allow us to understand the meaning of

this rule better and to emphasise its strengths and weaknesses (we shall suppose

throughout this section that students have all been evaluated on the same courses;

for the problems that arise when this is not so, see Vassiloglou (1984)).
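
The weighted average with normalised weights can be sketched as follows. The function names, grades and weights are hypothetical; exact fractions are used so that the result is not affected by rounding.

```python
from fractions import Fraction

def normalise(weights):
    """Rescale positive weights so that they sum to 1."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

def average_grade(grades, weights):
    """g(a) = sum_i w_i * g_i(a), with the weights normalised to sum to 1."""
    return sum(w * g for w, g in zip(normalise(weights), grades))

# Hypothetical student: three courses whose weights reflect their length.
grades = [12, 9, 15]
weights = [3, 2, 1]

g = average_grade(grades, weights)
print(g, float(g))
print(min(grades) <= g <= max(grades))
```

Because the normalised weights are positive and sum to 1, the average always lies between the lowest and the highest grade of the student, and hence within the bounds of the grading scale.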

Example 1

Consider four students enrolled in a degree consisting of two courses. For each

course, a final grade between 0 and 20 is allocated. The results are as follows:

g1 g2

a 5 19

b 20 4

c 11 11

d 4 6

Student c has performed reasonably well in all courses whereas d has a consistently

very poor performance; both a and b are excellent in one course while having

a serious problem in the other. Casual introspection suggests that if the students

were to be ranked, c should certainly be ranked first and d should be ranked last.

Students a and b should be ranked in between, their relative position depending

on the relative importance of the two courses. Their very low performance in 50%

of the courses does not make them good candidates for the degree. The use of

a simple weighted average of grades leads to very different results. Considering that

both courses are of equal importance gives the following average grades:

average grades

a 12

b 12

c 11

d 5

which leads to having both a and b ranked before c. As shown in figure 3.1, we can say even more: there is no vector of weights $(w, 1-w)$ that would rank c before both a and b. Ranking c before a implies that $11w + 11(1-w) > 5w + 19(1-w)$, which leads to $w > 8/14 = 4/7$. Ranking c before b implies $11w + 11(1-w) > 20w + 4(1-w)$, i.e. $w < 7/16$. Since $7/16 < 4/7$, no weight $w$ satisfies both conditions (figure 3.1 should make clear that there is no loss of generality in supposing that weights sum to 1). The use of a simple weighted sum is therefore not in line with the idea of promoting students performing reasonably well in all courses.

The exclusive reliance on a weighted average might therefore be an incentive for students to concentrate their efforts on a limited number of courses and benefit from the compensation effects at work with such a rule. This is a consequence of the additivity hypothesis embodied in the use of weighted averages.

[Figure 3.1: the grades of students a, b, c and d plotted with g1 on the horizontal axis and g2 on the vertical axis, both on a 0-20 scale.]

It should finally be noticed that the addition of a minimal acceptable grade

for all courses can decrease but not suppress (unless the minimal acceptable grade

is so high that it turns the system into a nearly conjunctive one) the occurrence of

such effects.
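
The claim made in Example 1, that no pair of weights ranks c before both a and b, can be checked by a simple scan over the weight of the first course. This is only a numerical sketch: the grid of 1001 values is an arbitrary choice and the function names are illustrative.

```python
from fractions import Fraction

# Grades (g1, g2) of the four students of Example 1.
grades = {"a": (5, 19), "b": (20, 4), "c": (11, 11), "d": (4, 6)}

def average(student, w):
    """Weighted average with weight w on course 1 and 1 - w on course 2."""
    g1, g2 = grades[student]
    return w * g1 + (1 - w) * g2

steps = 1000
weights_ranking_c_first = []
for k in range(steps + 1):
    w = Fraction(k, steps)
    if average("c", w) > average("a", w) and average("c", w) > average("b", w):
        weights_ranking_c_first.append(w)

print(weights_ranking_c_first)             # no weight on the grid works
print(average("c", Fraction(4, 7)) == average("a", Fraction(4, 7)))
print(average("c", Fraction(7, 16)) == average("b", Fraction(7, 16)))
```

The last two lines locate the two thresholds: c ties with a at a weight of 4/7 and with b at 7/16, so c beats a only above 4/7 and beats b only below 7/16.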

A related consequence of the additivity hypothesis is that it does not allow us to account

for interaction between grades as shown in the following example.

Example 2

Consider four students enrolled in an undergraduate programme consisting of three

courses: Physics, Maths and Economics. For each course, a final grade between 0

and 20 is allocated. The results are as follows:

   Physics   Maths   Economics

a     18       12        6

b     18        7       11

c      5       17        8

d      5       12       13

On the basis of these evaluations, it is felt that a should be ranked before b. Al-

though a has a low grade in Economics, he has reasonably good grades in both

Maths and Physics which makes him a good candidate for an Engineering pro-

gramme; b is weak in Maths and it seems difficult to recommend him for any

programme with a strong formal component (Engineering or Economics). Using

a similar type of reasoning, d appears to be a fair candidate for a programme in

Economics. Student c has two low grades and it seems difficult to recommend him

for a programme in Engineering or in Economics. Therefore d is ranked before c.

Although these preferences appear reasonable, they are not compatible with

the use of a weighted average in order to aggregate the three grades. It is easy to

observe that:

- ranking a before b implies putting more weight on Maths than on Economics ($18w_1 + 12w_2 + 6w_3 > 18w_1 + 7w_2 + 11w_3 \Rightarrow w_2 > w_3$),

- ranking d before c implies putting more weight on Economics than on Maths ($5w_1 + 12w_2 + 13w_3 > 5w_1 + 17w_2 + 8w_3 \Rightarrow w_3 > w_2$),

which is contradictory.
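
The contradiction can also be observed numerically. The sketch below scans a grid of positive weight vectors summing to 1 (the step of 1/20 is an arbitrary choice) and looks for one that ranks a before b and d before c; the names used are illustrative.

```python
from fractions import Fraction
from itertools import product

# Grades (Physics, Maths, Economics) of the four students of Example 2.
grades = {"a": (18, 12, 6), "b": (18, 7, 11), "c": (5, 17, 8), "d": (5, 12, 13)}

def score(student, weights):
    """Weighted average of the three grades of a student."""
    return sum(w * g for w, g in zip(weights, grades[student]))

steps = 20
compatible = []
for i, j in product(range(1, steps), repeat=2):
    k = steps - i - j
    if k < 1:
        continue  # the third weight must stay positive
    w = (Fraction(i, steps), Fraction(j, steps), Fraction(k, steps))
    if score("a", w) > score("b", w) and score("d", w) > score("c", w):
        compatible.append(w)

print(compatible)   # no weight vector on the grid reproduces both preferences
```

The grid search is only a sanity check; the algebraic argument above already shows that the two preferences require both $w_2 > w_3$ and $w_3 > w_2$.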

In this example it seems that criteria interact. Whereas Maths does not outweigh

any other course (see the ranking of d vis-a-vis c), having good grades in

both Maths and Physics or in both Maths and Economics is better than having

good grades in both Physics and Economics. Such interactions, although not

infrequent, cannot be dealt with using weighted averages; this is another consequence

of the additivity hypothesis. Taking such interactions into account calls

for the use of more complex aggregation models (see Grabisch 1996).

Example 3

Consider two students enrolled in a degree consisting of two courses. For each

course a final grade between 0 and 20 is allocated; both courses have the same

weight and the required minimal average grade for the degree is 10. The results

are as follows:

g1 g2

a 11 10

b 12 9

It is clear that both students will receive an identical average grade of 10.5: the

difference between 11 and 12 on the first course exactly compensates for the oppo-

site difference on the second course. Both students will obtain the degree having

performed equally well.

It is not unreasonable to suppose that since the minimal required average for

the degree is 10, this grade will play the role of a special grade for the instructors,

a grade above 10 indicating that a student has satisfactorily met the objectives

of the course. If 10 is a special grade then, it might be reasonable to consider

that the difference between 10 and 9 which crosses a special grade is much more

significant than the difference between 12 and 11 (it might even be argued that the

small difference between 12 and 11 is not significant at all). If this is the case, we

would have good grounds to question the fact that a and b are equally good. The

linearity hypothesis embodied in the use of weighted averages has the inevitable

consequence that a difference of one point has the same meaning wherever it lies on the

scale and therefore does not allow for such considerations.

Example 4

Consider a programme similar to the one envisaged in the previous example. We

have the following results for three students:

g1 g2

a 14 16

b 15 15

c 16 14

All students have an average grade of 15 and they will all receive the degree.

Furthermore, if the degree comes with the indication of a rank or of an average

grade, these three students will not be distinguished: their equal average grade

makes them indifferent. This appears desirable since these three students have

very similar profiles of grades.

The use of linearity and additivity implies that if a difference of one point on

the first grade compensates for an opposite difference on the other grade, then a

difference of x points on the first grade will compensate for an opposite difference

of x points on the other grade, whatever the value of x. However, if x is chosen

to be large enough this may appear dubious since it could lead, for instance, to

view the following three students as perfectly equivalent with an average grade of

15:

g1 g2

a' 10 20

b 15 15

c' 20 10

whereas we already argued that, in such a case, b could well be judged preferable to

both a' and c' even though b is indifferent to a and c. This is another consequence

of the linearity hypothesis embodied in the use of weighted averages.

Example 5

Consider three students enrolled in a degree consisting of three courses. For each

course a final grade between 0 and 20 is allocated. All courses have identical

importance and the minimal passing grade is 10 on average. The results are as

follows:

g1 g2 g3

a 12 5 13

b 13 12 5

c 5 13 12

It is clear that all students have an average equal to the minimal passing grade

10. They all end up tied and should all be awarded the degree.

As argued in section 3.2 it might not be unreasonable to consider that final

grades are only recorded on an ordinal scale, i.e. only reflect the relative rank of

the students in the class, with the possible exception of a few special grades

such as the minimal passing grade. This means that the following table might as

well reflect the results of these three students:

g1 g2 g3

a 11 4 12

b 13 13 6

c 4 14 11

since the ranking of students within each course has remained unchanged as well
as the position of grades vis-à-vis the minimal passing grade. In this case, only
b (say the Dean's nephew) gets an average above 10 and both a and c fail (with
respective averages of 9 and 9.67). Note that using different transformations, we
could have favoured any of the three students.
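A short Python sketch may help to check this claim. It verifies that the two tables of Example 5 carry the same ordinal information (same ranking in each course, same position with respect to the passing grade) and yet lead to different decisions. The helper names `ranking` and `mean` are ours.

```python
# Grades (g1, g2, g3) from Example 5; PASS is the minimal passing grade.
PASS = 10
original = {"a": (12, 5, 13), "b": (13, 12, 5), "c": (5, 13, 12)}
transformed = {"a": (11, 4, 12), "b": (13, 13, 6), "c": (4, 14, 11)}


def ranking(table, course):
    """Students ordered from best to worst on one course (no ties here)."""
    return sorted(table, key=lambda s: table[s][course], reverse=True)


def mean(grades):
    return sum(grades) / len(grades)


# Ordinal information: rank in each course and position relative to PASS.
same_rankings = all(
    ranking(original, j) == ranking(transformed, j) for j in range(3)
)
same_pass_side = all(
    (original[s][j] >= PASS) == (transformed[s][j] >= PASS)
    for s in original
    for j in range(3)
)

# Decisions taken with the average rule on each table.
passed_before = [s for s in original if mean(original[s]) >= PASS]
passed_after = [s for s in transformed if mean(transformed[s]) >= PASS]

print(same_rankings, same_pass_side)
print(passed_before, passed_after)
```

Both checks succeed, but all three students pass with the first table while only b passes with the second one.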

Not surprisingly, this example shows that a weighted average makes use of the

cardinal properties of the grades. This is hardly compatible with grades that

would only be indicators of ranks even with some added information (a view that

is very compatible with the discussion in section 3.2). As shown by the following

example, it does not seem that the use of letter grades, instead of numerical

ones, helps much in this respect.

Example 6

In many American Universities the Grade Point Average (GPA), which is nothing

more than a weighted average of grades, is crucial for the attribution of degrees and

the selection of students. Since courses are evaluated on letter scales, the GPA

is usually computed by associating a number to each letter grade. A common

conversion scheme is the following:

A 4 (outstanding or excellent)

B 3 (very good)

C 2 (good)

D 1 (satisfactory)

E 0 (failure)

Such a practice raises several difficulties. First, letter grades for a given course

are generally obtained on the basis of numerical grades of some sort. This implies

using a first conversion scheme of numbers into letters. The choice of such a

scheme is not obvious. Note that when there are no holes in the distribution

of numerical grades it is possible that a very small (and possibly non significant)

difference in numerical grades results in a significant difference in letter grades.

Secondly, the conversion scheme of letters into numbers used to compute the
GPA is somewhat arbitrary. Allowing for the possibility of adding + or − to
the letter grades generally results in conversion schemes maintaining an equal
difference between two consecutive letter grades. This can have a significant impact
on the ranking of students on the basis of the GPA.

To show how this might happen suppose that all courses are first evaluated
on a 0–100 scale (e.g. indicating the percentage of correct answers to a multiple

choice questionnaire). These numbers are then converted into letter grades using

a first conversion scheme. These letter grades are further transformed, using a

second conversion scheme, into a numerical scale and the GPA is computed. Now

consider three students evaluated on three courses on a 0-100 scale in the following

way:

g1 g2 g3

a 90 69 70

b 79 79 89

c 100 70 69

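A common conversion scheme of percentages into letter grades (used in many
Universities) is

A    90–100%
B    80–89%
C    70–79%
D    60–69%
E    0–59%

This results in the following letter grades:

     g1   g2   g3
a    A    D    C
b    C    C    B
c    A    C    D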

Supposing the three courses of equal importance and using the conversion scheme

of letter grades into numbers given above, the calculation of the GPA is as follows:

g1 g2 g3 GPA

a 4 1 2 2.33

b 2 2 3 2.33

c 4 2 1 2.33

Now another common (and actually used) scale for converting percentages into

letter grades is as follows:

A+   98–100%
A    94–97%
A−   90–93%
B+   87–89%
B    83–86%
B−   80–82%
C+   77–79%
C    73–76%
C−   70–72%
D    60–69%
F    0–59%

This results in the following letter grades:

     g1   g2   g3
a    A−   D    C−
b    C+   C+   B+
c    A+   C−   D

Maintaining an equal difference between two consecutive letter grades, we obtain
the following conversion scheme:

A+   10
A    9
A−   8
B+   7
B    6
B−   5
C+   4
C    3
C−   2
D    1
F    0

The GPA is then computed as follows:

     g1   g2   g3   GPA
a    8    1    2    3.67
b    4    4    7    5.00
c    10   2    1    4.33

In this case, b (again the Dean's nephew) gets a clear advantage over a and c.

It should be clear that standardisation of the original numerical grades before

conversion offers no clear solution to the problem uncovered.
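The two-stage conversion of Example 6 can be replayed with a few lines of Python. This is only a sketch: the encoding of each scheme as a list of (lower bound, letter, points) triples and the function names `points` and `gpa` are our own illustrative choices.

```python
# Percentages obtained by the three students of Example 6.
scores = {"a": (90, 69, 70), "b": (79, 79, 89), "c": (100, 70, 69)}

# Each scheme lists (lower bound in %, letter, points), from best to worst.
COARSE = [(90, "A", 4), (80, "B", 3), (70, "C", 2), (60, "D", 1), (0, "E", 0)]
FINE = [
    (98, "A+", 10), (94, "A", 9), (90, "A-", 8),
    (87, "B+", 7), (83, "B", 6), (80, "B-", 5),
    (77, "C+", 4), (73, "C", 3), (70, "C-", 2),
    (60, "D", 1), (0, "F", 0),
]


def points(score, scheme):
    """Points of the first (highest) class whose lower bound is reached."""
    for lower, _letter, pts in scheme:
        if score >= lower:
            return pts
    raise ValueError("score must be non-negative")


def gpa(student_scores, scheme):
    """Unweighted mean of the points obtained on all courses."""
    pts = [points(s, scheme) for s in student_scores]
    return sum(pts) / len(pts)


gpa_coarse = {s: gpa(v, COARSE) for s, v in scores.items()}
gpa_fine = {s: gpa(v, FINE) for s, v in scores.items()}

for s in scores:
    print(s, round(gpa_coarse[s], 2), round(gpa_fine[s], 2))
```

With the coarse scheme the three students are tied; with the finer one, b comes first, followed by c and a, although the underlying percentages have not changed.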

Example 7

We argued in section 3.2 that small differences in grades might not be significant

at all provided they do not involve crossing any special grade. The explicit

treatment of such imprecision is problematic using a weighted average; most often,

it is simply ignored. Consider the following example in which three students are

enrolled in a degree consisting of three courses. For each course a final grade

between 0 and 20 is allocated. All courses have the same weight and the minimal

passing grade is 10 on average. The results are as follows:

g1 g2 g3

a 13 12 11

b 11 13 12

c 14 10 12

All students will receive an average grade of 12 and will all be judged indifferent.

If all instructors agree that a difference of one point in their grades (away from 10)

should not be considered as significant, student a has good grounds to complain.

He can argue that he should be ranked before b: he has a significantly higher grade

than b on g1 while there is no significant difference between the other two grades.

The situation is the same vis-à-vis c: a has a significantly higher grade on g2 and

this is the only significant difference.

In a similar vein, using the same hypotheses, the following table appears even

more problematic:

g1 g2 g3

a 13 12 11

b 11 13 12

c 12 11 13

since, while all students clearly obtain a similar average grade, a is significantly
better than b (he has a significantly higher grade on g1 while there are no significant
differences on the other two grades), b is significantly better than c and c is
significantly better than a (the reader will have noticed that this is a variant of
the Condorcet paradox mentioned in chapter 2).
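The cycle can be exhibited mechanically. The sketch below encodes one possible reading of the argument: a student is declared significantly better than another when he is ahead by more than one point on at least one course and never behind by more than one point. The function name `significantly_better` and the constant `THRESHOLD` are ours, and the special role of the grade 10 is ignored since all grades are above it.

```python
from itertools import permutations

THRESHOLD = 1  # a difference of at most one point is not significant

# Second table of Example 7: grades (g1, g2, g3).
table = {"a": (13, 12, 11), "b": (11, 13, 12), "c": (12, 11, 13)}


def significantly_better(g, h):
    """True if g is significantly ahead somewhere and never significantly behind."""
    ahead = any(x - y > THRESHOLD for x, y in zip(g, h))
    behind = any(y - x > THRESHOLD for x, y in zip(g, h))
    return ahead and not behind


better = {
    (s, t)
    for s, t in permutations(table, 2)
    if significantly_better(table[s], table[t])
}

print(sorted(better))
print({s: sum(g) / len(g) for s, g in table.items()})
```

The relation obtained contains exactly the three pairs (a, b), (b, c) and (c, a), while the three averages are all equal to 12.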

Aggregation rules using weighted sums will be dealt with again in chapters 4

and 6. In view of these few examples, we hope to have convinced the reader that

although the weighted sum is a very simple and almost universally accepted rule,

its use may be problematic for aggregating grades. Since grades are a complex

evaluation model, this is not overly surprising. If it is admitted that there is no

easy way to evaluate the performance of a student in a given course, there is no

reason why there should be an obvious one for an entire programme. In particular,

the necessity and feasibility of using rules that completely rank order all students

might well be questioned.

3.4 Conclusions

We all have been accustomed to seeing our academic performances in courses

evaluated through grades and to seeing these grades amalgamated in one way or

another in order to judge our overall performance. Most of us routinely grade

various kinds of work, prepare exams, write syllabi specifying a grading policy,

etc. Although they are very familiar, we have tried to show that these activities

may not be as simple and as unproblematic as they appear to be. In particular,

we discussed the many elements that may obscure the interpretation of grades

and argued that the common weighted sum rule to amalgamate them may not be

without difficulties. We expect such difficulties to be present in the other types of

evaluation models that will be studied in this book.

We would like to emphasise a few simple ideas to be drawn from this example

that we should keep in mind when working on different evaluation models:

• actors are most likely to modify their behaviour in response to the
  implementation of the model;

• evaluation operations are complex and should not be confused with
  measurement operations in Physics. When they result in numbers, the
  properties of these numbers should be examined with care; using numbers may
  be only a matter of convenience and does not imply that any operation can
  be meaningfully performed on these numbers;

• the aggregation of the result of several evaluation models should take the
  nature of these models into account. The information to be aggregated may
  itself be the result of more or less complex aggregation operations (e.g.
  aggregating the grades obtained at the mid-term and the final exams) and may
  be affected by imprecision, uncertainty and/or inaccurate determination;

• aggregation models should be analysed with care. Even the simplest and
  most familiar ones may in some cases lead to surprising and undesirable
  conclusions.

Finally we hope that this brief study of the evaluation procedures of students will

also be the occasion for instructors to reflect on their current grading practices.

This has surely been the case for the authors.

4

CONSTRUCTING MEASURES: THE

EXAMPLE OF INDICATORS

Our daily life is filled with indicators: I.Q., Dow Jones, GNP, air quality, physicians

per capita, poverty index, social position index, consumer price index, rate of

return, . . . If you read a newspaper, you could feel that these magic numbers rule

the world.

allowed to enter the EURO.

children should stay indoors.

succeed in bringing indicator y to level z.

Note that in many cases, the decisions of the World Bank to withdraw help

are not motivated by economic or financial reasons. Violations of human rights

are often presented as the main factor. But it is worth noting that indicators of

human rights also exist (see e.g. Horn (1993)).

Why are these indicators (often called indices) so powerful ? Probably because

it is commonly accepted that they faithfully reflect reality. This forces us to raise

several questions.

1. Is there one reality, several realities or no reality ? Many philosophers nowa-

days consider that reality is not unique. Each person has a particular per-

ception of the world and, hence, a particular reality. One could argue that

these particular realities are just particular views of the same reality but,

as it is impossible to consider reality independently of our perception of it,

it might be meaningless to consider that reality exists per se (Roy 1990).

As a consequence, an indicator might only be relevant for the person who

constructed it.

2. Whatever the answer to the previous question, can we hope that an indicator

faithfully reflects reality (the reality or a reality) ? Reality is so complex that

this is doubtful. Therefore, we must accept that an indicator accounts only

for some aspects of reality. Hence, an indicator must be designed so as

to reflect those aspects that are relevant with respect to our concerns. As

an illustration, the Human Development index (HDI) defined by the United

Nations Development Programme (UNDP) to measure development (United

Nations Development Programme 1997) is used by many different people in

different continents and in different areas of activity (politicians, economists,

businessmen, . . . ). Can we assume that their concerns are similar ?

In the Human development report 1997, UNDP proudly reports that

counties as a guide to identifying those most severely disadvan-

taged in terms of human development. Several countries, such as

the Philippines, have used such analysis as a planning tool. [. . . ]

The HDI has been used especially when a researcher wants a com-

posite measure of development. For such uses, other indicators

have sometimes been added to the HDI.

This clearly shows that many people used the HDI in completely different

ways.

Furthermore, are the concerns of UNDP itself with respect to the HDI clearly

defined ? Why do they need the human development index ? To cut subsidies

to nations evolving in the wrong direction ? To share subsidies among the

poorest countries (according to what key) ? To put some pressure on the

governments performing the worst ? To prove that Western democracies

have the best political systems ?

3. Suppose that the purpose of an indicator is clearly defined. Are we sure that

this indicator indicates what we want it to ? Do the arithmetic operations

performed during the computation of the indicator lead to something that

makes sense ?

Let us now discuss three well known indicators arising in completely different

areas of our lives in detail: the human development index, the air quality index

and the decathlon score.

4.1 The human development index

As stated by the United Nations Development Programme (1997), page 14,

[. . . ] a country in three basic dimensions of human development – longevity,
knowledge and a decent standard of living. A composite index, the HDI
thus contains three variables: life expectancy, educational attainment
(adult literacy and combined primary, secondary and tertiary enrollment)
and real GDP (Gross Domestic Product) per capita expressed
in PPP$ (Purchasing Power Parity $).

The HDI's precise definition is presented on page 122 of the 1997 Human Development
Report. The HDI is a simple average of the life expectancy index, educational
attainment index and adjusted real GDP per capita (PPP$) index. Here is how
each index is computed.

Life Expectancy Index (LEI) This index measures life expectancy at birth. In
order to normalise the scale of this index, a minimum value (25 years) and
a maximum one (85 years) have been defined. The index is defined as

\[
\mathrm{LEI} = \frac{\text{life expectancy at birth} - 25}{85 - 25}.
\]

Hence, it is a value between 0 and 1.

Educational Attainment Index (EAI) It is a combination of two other indicators:
the Adult Literacy Index (ALI) and the combined primary, secondary
and tertiary Enrollment Ratio Index (ERI). The first one is the proportion
of literate adults while the second one is the proportion of children of
primary, secondary or tertiary school age who actually attend school. The EAI is a
weighted average of ALI and ERI; it is equal to

\[
\mathrm{EAI} = \frac{2\,\mathrm{ALI} + \mathrm{ERI}}{3}.
\]

Adjusted real GDP per capita (PPP$) Index (GDPI) This index aims at
measuring the income per capita. As the value of one dollar for someone
earning $100 is much larger than the value of one dollar for someone earning
$100 000, the income is first transformed using Atkinson's formula (Atkinson
1970). The transformed value of y, i.e. W(y), is given by one of the following:

\[
W(y) =
\begin{cases}
y & \text{if } 0 < y < y^*,\\
y^* + 2\,(y - y^*)^{1/2} & \text{if } y^* \le y < 2y^*,\\
y^* + 2\,(y^*)^{1/2} + 3\,(y - 2y^*)^{1/3} & \text{if } 2y^* \le y < 3y^*,\\
\quad\vdots & \\
y^* + 2\,(y^*)^{1/2} + 3\,(y^*)^{1/3} + \dots + n\,\bigl(y - (n-1)y^*\bigr)^{1/n} & \text{if } (n-1)y^* \le y < ny^*.
\end{cases}
\]

In this formula, y represents the income, W(y) the transformed income and
y* is set at $5 835 (PPP$), which was the World average annual income per
capita in 1994.

Thereafter, the income scale is normalised, using the maximum value of
$40 000, the minimum value of $100 and the formula

\[
\mathrm{GDPI} = \frac{\text{transformed income} - W(100)}{W(40\,000) - W(100)}.
\]

Hence, it is a value between 0 and 1. Note that W(40 000) = 6 154 and
W(100) = 100.

Some words about the data and their collection time: the Human Development

Report is a yearly publication (since 1990). Obviously, the 1997 report does not

contain the 1997 data. Indeed, the HDI computed in the 1997 report is considered

by the UNDP as the HDI of 1994. To make things more complicated, the 199i HDI

(in the 199j report) is an aggregate of data from 199i (for some dimensions) and

from earlier years (for other dimensions). In this volume, we use only data from

the 1997 Human Development Report. We refer to them as HDR97, irrespective

of the collection year.

To illustrate how the HDI works, let's compute the HDI for Greece (HDR97).
Life expectancy in Greece is 77.8 years. Hence, LEI = (77.8 − 25)/(85 − 25) = 0.880.
The ALI is 0.967 and the ERI is 0.820. Hence, EAI = (2 × 0.967 + 0.820)/3 =
0.918. Greece's real GDP per capita, at $11 265, is above y* but below twice
y*. Thus the adjusted real GDP per capita for Greece is $5 982 (PPP$) because
5 982 = 5 835 + 2 × (11 265 − 5 835)^(1/2). Hence GDPI = (5 982 − W(100))/(W(40 000) −
W(100)) = (5 982 − 100)/(6 154 − 100) = 0.972. Finally, Greece's HDI is (0.880 +
0.918 + 0.972)/3 = 0.923.
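The whole computation can be sketched in Python. The function names `atkinson` and `hdi` and the constant names are ours; the bounds, the value of y* and the data for Greece are those given in the text. Intermediate values are not rounded here, whereas the text rounds them to three decimals.

```python
Y_STAR = 5835.0                  # world average income per capita in 1994 (PPP$)
LIFE_MIN, LIFE_MAX = 25.0, 85.0  # bounds used to normalise life expectancy
GDP_MIN, GDP_MAX = 100.0, 40000.0


def atkinson(y, y_star=Y_STAR):
    """Transformed income W(y), following the piecewise formula of the text."""
    if y <= 0:
        raise ValueError("income must be positive")
    if y < y_star:
        return y
    n = int(y // y_star) + 1     # segment such that (n-1) y* <= y < n y*
    total = y_star + sum(k * y_star ** (1.0 / k) for k in range(2, n))
    return total + n * (y - (n - 1) * y_star) ** (1.0 / n)


def hdi(life_expectancy, ali, eri, real_gdp):
    """Simple average of LEI, EAI and GDPI as defined above."""
    lei = (life_expectancy - LIFE_MIN) / (LIFE_MAX - LIFE_MIN)
    eai = (2 * ali + eri) / 3
    w_min, w_max = atkinson(GDP_MIN), atkinson(GDP_MAX)
    gdpi = (atkinson(real_gdp) - w_min) / (w_max - w_min)
    return (lei + eai + gdpi) / 3


hdi_greece = hdi(77.8, 0.967, 0.820, 11265)
print(round(atkinson(GDP_MAX)), round(atkinson(11265)), round(hdi_greece, 3))
```

Under these assumptions the sketch gives back W(40 000) = 6 154, an adjusted real GDP of 5 982 for Greece and an HDI of 0.923.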

4.1.1 Scale normalisation

To obtain the LEI and the GDPI, maximum and minimum values have been defined
so that, after normalisation, the range of the index is [0,1]. The choice of
these bounds is quite arbitrary. Why 25 and 85 years? Is 25 years the smallest
observed value? No, the lowest observed value is 22.6 (Rwanda, HDR97). Therefore
the LEI is negative for Rwanda. The value of 25 was chosen for the first

report (1990), when the lowest observed value was above 35. At that time, no one

would ever have thought that life expectancy could be lower than 25. To avoid

this problem, they could have chosen a much lower value: 20 or 10. The likelihood

of observing a value smaller than the minimum would have been much smaller.

But the choice of the bounds is not without consequences. Consider the following

example.

Suppose that the EAI and GDPI have been computed for South Korea and
Costa Rica (HDR97). We also know the life expectancy at birth for South Korea
and Costa Rica (see Table 4.1).

               life expectancy   EAI   GDPI
South Korea         71.5         .93   .97
Costa Rica          76.6         .86   .95

Table 4.1: Bounds: life expectancy, EAI and GDPI for South Korea and Costa
Rica (HDR97)

If the maximum and minimum for life expectancy
are set to 85 and 25, then the HDI is 0.890 for South Korea and 0.889 for Costa
Rica. But if the maximum and minimum for life expectancy are set to 80 and 25,
then the HDI is 0.915 for South Korea and 0.916 for Costa Rica. In the first case,
Costa Rica is less developed than South Korea while in the second one, we obtain
the converse: Costa Rica is more developed than South Korea. Hence, the choice
of the bounds matters.

In fact narrowing the range of life expectancy from [25,85] to [25,80] increases
the difference between any two values of LEI by a factor (85 − 25)/(80 − 25). Hence
it amounts to increasing the weight of LEI by the same factor. In our example,

Costa Rica performed better than South Korea on life expectancy. Therefore, it

is not surprising that its position is improved when life expectancy is given more

weight (by narrowing its range).
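The reversal can be checked with a small Python sketch using the figures of Table 4.1. The function name `hdi_from_indices` is ours. Since the EAI and GDPI in the table are rounded to two decimals, the third decimal of the results may differ slightly from the values quoted in the text; only the ordering of the two countries matters here.

```python
# (life expectancy, EAI, GDPI) from Table 4.1 (HDR97, rounded values).
countries = {
    "South Korea": (71.5, 0.93, 0.97),
    "Costa Rica": (76.6, 0.86, 0.95),
}


def hdi_from_indices(life, eai, gdpi, low=25.0, high=85.0):
    """HDI with life expectancy normalised on the range [low, high]."""
    lei = (life - low) / (high - low)
    return (lei + eai + gdpi) / 3


wide = {c: hdi_from_indices(*v, high=85.0) for c, v in countries.items()}
narrow = {c: hdi_from_indices(*v, high=80.0) for c, v in countries.items()}

print({c: round(v, 3) for c, v in wide.items()})
print({c: round(v, 3) for c, v in narrow.items()})
```

With the range [25,85] South Korea comes first; with [25,80] Costa Rica does.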

Note that, apparently, no bounds were fixed for the ALI and the ERI. In reality,

this is equivalent to choosing 1 for maximum and 0 for minimum. This is also an

arbitrary choice. It is obvious that values 0 and 1 have not been observed and are

not likely to be observed in a foreseeable future. Hence the range of these scales

is narrower than [0,1] and the scale could be normalised, using other values than

0 and 1.

4.1.2 Compensation

Consider Table 4.2 where the data for two countries (Gabon and the Solomon
Islands, HDR97) are presented. The Solomon Islands perform quite well on all
dimensions; Gabon is slightly better than the Solomon Islands on all dimensions
except life expectancy where it is very bad. For us, this very short life expectancy
is clearly a sign of severe underdevelopment, even if other dimensions are good.
Nevertheless, the HDI is equal to 0.56 for both Gabon and Solomon Islands. Hence,
the HDI suggests that Gabon and Solomon Islands are at the same development
level. This problem is due to the fact that we used the usual average to aggregate
our data into one number. Weaknesses on some dimensions are compensated by
strengths on other dimensions. This is probably desirable, to some extent. Yet,
extreme weaknesses should not be compensated, even by very good performances
on other dimensions.

                   life expectancy   ALI   ERI   real GDP
Gabon                   54.1         .63   .60    3 641
Solomon Islands         70.8         .62   .47    2 118

Table 4.2: life expectancy, ALI, ERI and real GDP for Gabon and Solomon
Islands (HDR97)

Let us go further with compensation. As any weakness can be compensated
by a strength, a decrease in life expectancy by one year can be compensated by
some increase in adjusted real GDP (income transformed by Atkinson's formula).
Let us compute this increase. A decrease by one year yields a decrease of LEI by
1/(85 − 25) = 0.016667. To compensate this, the GDPI must increase by the same
amount. Hence, the adjusted real GDP must be increased by 0.016667 × (6 154 −
100) = 100.9$ (recall that W(40 000) = 6 154). Accordingly, a decrease in life
expectancy by 2 years can be compensated by an increase in adjusted real GDP
by 2 times 100.9$; a decrease in life expectancy by n years can be compensated
by an increase in adjusted real GDP by n times 100.9$. The value of one year of
life is thus 100.9$ (adjusted by Atkinson's formula). The value 100.9 is called the
substitution rate between life expectancy and adjusted real GDP.

Other substitution rates are easy to compute: e.g. the substitution rate between
life expectancy and adult literacy is 0.016667 × (1 − 0) × (3/2) = 0.025. To compensate

a decrease of n years of life expectancy, you need an increase of the adult literacy

index of n times 0.025.
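These two substitution rates follow directly from the weights and bounds of the HDI. A minimal Python sketch, with variable names of our own choosing, makes the derivation explicit: one year of life changes the LEI by 1/(85 − 25), and the same change must be produced on the GDPI (whose range in adjusted dollars is W(40 000) − W(100)) or on the EAI (in which the ALI has weight 2/3).

```python
LIFE_MIN, LIFE_MAX = 25, 85   # bounds of the life expectancy scale
W_MIN, W_MAX = 100, 6154      # W(100) and W(40 000), as given in the text

# Change of the LEI produced by one year of life expectancy.
lei_step = 1 / (LIFE_MAX - LIFE_MIN)

# Adjusted real GDP needed to move the GDPI by the same amount.
adjusted_gdp_rate = lei_step * (W_MAX - W_MIN)

# ALI needed to move the EAI = (2 ALI + ERI) / 3 by the same amount.
ali_rate = lei_step * (1 - 0) * 3 / 2

print(round(adjusted_gdp_rate, 1), round(ali_rate, 3))
```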

Let us now think in terms of real GDP (not adjusted). In a country where

real GDP is 13 071$ (Cyprus, HDR97), a decrease in life expectancy of one year

can be compensated by an increase in real GDP of 21 084$. In a country where

real GDP is 700$ (Chad, HDR97), a decrease of life expectancy by one year can

be compensated by an increase in real GDP by 100.9$. Hence, poor people's life
expectancy has much less value than that of rich ones.

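Consider the example of Table 4.3.

     life expectancy   ALI   ERI   real GDP
x         30           .80   .65      500
y         30           .35   .40    3 500

Table 4.3: life expectancy, ALI, ERI and real GDP for countries x and y

Countries x and y perform equally badly on
life expectancy, y is much lower than x on adult literacy but much higher than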

x on income. As life expectancy is very short, one might consider that adult

literacy is not very important (because there are almost no adults) but income is

more important because it improves quality of life in other respects. Furthermore,

health conditions and life expectancy can be expected to improve rapidly due to a

higher income. Hence, one could conclude that y is more developed than x. Our

conclusion is confirmed by the HDI: 0.30 for x and 0.34 for y.

Let us now compare two countries, w and z, similar to x and y except that life
expectancy is equal to 70 for both w and z (see Table 4.4).

     life expectancy   ALI   ERI   real GDP
w         70           .80   .65      500
z         70           .35   .40    3 500

Table 4.4: life expectancy, ALI, ERI and real GDP for countries w and z

In such conditions, the
performance of z on adult literacy is really bad compared to that of w. The adult
population is very important and its illiteracy is a severe problem. Even if the high
income of z is used to foster education, it will take decades before a significant
part of the population is literate. On the contrary, w's low income doesn't seem to
be a problem for the quality of life, as life expectancy is high as well as education.

Hence, it might not be unreasonable to conclude that w is more developed than

z. But if we compute the HDI, we obtain 0.52 for w and 0.56 for z! This should

not be a surprise; there is no difference between x and y on the one hand and w

and z on the other hand, except for life expectancy. But the differences in life

expectancy between x and w and between y and z are equal. Hence, this results

in the same increase of the HDI (compared to x and y) for both w and z.

When a sum (or an average) is used to aggregate different dimensions, identical
performances of two items (countries or whatever) on one or more dimensions

are not relevant for the comparison of these items. The identical performances can

be changed in any direction; as long as they remain identical, they do not affect

the way both items compare to each other. This is called dimension independence;

it is inherent to sums and averages. But we saw that this property is not always

desirable. When we compare countries on the basis of life expectancy, education

and income, dimension independence might not be desirable.
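Dimension independence is easy to observe on the four fictitious countries x, y, w and z. The Python sketch below recomputes their HDI; the function name `hdi` is ours, and the code assumes incomes below y* = 5 835, for which Atkinson's formula leaves the income unchanged.

```python
Y_STAR = 5835       # threshold below which W(y) = y
W_MIN, W_MAX = 100, 6154


def hdi(life, ali, eri, real_gdp):
    """HDI for a country whose real GDP is below y* (so that W(y) = y)."""
    if not 0 < real_gdp < Y_STAR:
        raise ValueError("this sketch only handles incomes below y*")
    lei = (life - 25) / (85 - 25)
    eai = (2 * ali + eri) / 3
    gdpi = (real_gdp - W_MIN) / (W_MAX - W_MIN)
    return (lei + eai + gdpi) / 3


x = hdi(30, 0.80, 0.65, 500)
y = hdi(30, 0.35, 0.40, 3500)
w = hdi(70, 0.80, 0.65, 500)
z = hdi(70, 0.35, 0.40, 3500)

print(round(x, 2), round(y, 2), round(w, 2), round(z, 2))
print(round(y - x, 6), round(z - w, 6))
```

The difference between y and x is the same as the difference between z and w: the common life expectancy, whatever its value, plays no role in the comparison.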

In a way, we already have discussed this topic in Section 4.1.1 (Scale Normalisation).
But there is more to scale construction than scale normalisation. For
example, concerning real GDP, before normalising this scale, the real GDP is
adjusted using Atkinson's formula. The goal of this adjustment is obvious: if you
earn 40 000 dollars, one more dollar is negligible. If you earn 100 dollars, one
more dollar is considerable. Atkinson's formula reflects this. But why choose
y* = $5 835? Why choose Atkinson's formula? Other formulas and other values
for y* would work just as well. Once more, an arbitrary choice has been made
and we could easily build a small example showing that another arbitrary (but
defensible) choice would yield a different ranking of the countries.

Note that the fact that life expectancy, adult literacy and enrollment have

not been adjusted is also an arbitrary choice. One could argue that improving

life expectancy by one year in a country where life expectancy is 30 is a huge

achievement while it is a moderate one in a country where life expectancy is 70.

Some could even argue that increasing life expectancy above a certain threshold is

no longer an improvement. It increases the health budget in such proportions that

no more resources are available for other important areas: education, employment

policy, . . .

Let us consider the four indices of the HDI from a statistical point of view. The

life expectancy index is the average over the population and for a determined time

period of the length of the lives of the individuals in the population. It is well

known that averages, even if they are useful, cannot reflect the variety present in

the population. A country where approximately everyone lives until 50 has a life

expectancy of 50 years. A country where a part of the population (rural or poor

or of some race) dies early and where another part of the population lives until 80

might also have a life expectancy of 50 years.

Note that this kind of average is quite particular. It is very different from

the average that we perform when, for example, we have several measures of the

weight of an object and we consider the average as a good estimate of its actual

weight. The weight of an object really exists (as far as reality exists). On the
contrary, even if reality exists, the average of the length of life doesn't correspond
to something real. It is the length of life of a kind of average or ideal human, as
if we (the real humans) were imperfect, irregular or noisy copies of that average
human. Until the 19th century, the two kinds of averages were called by different
names (moyenne proportionnelle for different measures of one object, and valeur
commune for different objects, each measured once) and considered as completely
different. During the 19th century the Belgian astronomer and statistician Quetelet
(1796-1874) invented the concept of the average human and unified both averages
(Desrosières 1995).

To convince you that the concept of the average human is quite strange (though

possibly useful), consider a country where all inhabitants are right triangles of

different sizes and shapes (example borrowed from Warusfel (1961)). To make it

easy, let us suppose that there are just two kinds of right triangles (see Fig. 4.1),

in the same proportion. A statistician wants to measure the average right triangle.

In order to do so, he computes the average length of each edge. What he gets is

a triangle with edges of length 4, 8 and 9, i.e. a triangle which is not right-angled,
for 4² + 8² ≠ 9². The average right triangle is no longer a right triangle! What

looks like a right angle is in fact approximately a 91 degrees angle. In the same

spirit, Quetelet measured the average size of humans, in all dimensions, including

the liver, heart, spleen and other organs. What he got was an average human in

which it was impossible to fit all its average organs. They were too large!

[Figure 4.1: two right triangles with edges (3, 4, 5) and (5, 12, 13), and the
average triangle with edges (4, 8, 9)]
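The statistician's computation can be reproduced in a few lines of Python. The sketch below (variable names are ours) averages the edges of the two triangles of Fig. 4.1 and uses the law of cosines to measure the angle opposite the longest edge of the result.

```python
import math

# Edges of the two right triangles, each listed as (short leg, long leg, hypotenuse).
triangles = [(3, 4, 5), (5, 12, 13)]

# Edge-by-edge average, as computed by the statistician.
average = tuple(sum(edges) / len(triangles) for edges in zip(*triangles))
a, b, c = average

# A triangle is right-angled if and only if a^2 + b^2 = c^2.
is_right = math.isclose(a * a + b * b, c * c)

# Law of cosines: angle (in degrees) opposite the longest edge c.
angle = math.degrees(math.acos((a * a + b * b - c * c) / (2 * a * b)))

print(average, is_right, round(angle, 1))
```

The average triangle has edges 4, 8 and 9, and its largest angle is slightly larger than a right angle.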

The adult literacy index is quite different: it is just the number of literate

adults, divided by the total adult population to allow comparisons between coun-

tries. Hence one could think it is not an average. In fact it depends on how we

interpret it. If we consider that an ALI of 0.60 means that 60% of the population

is literate, then it is not an average. If we consider that an ALI of 0.60 means that

the average literacy level is 60%, then it is an average. And this last interpretation
is no sillier than computing a life expectancy index. Consider a variable
whose value is 0 for an illiterate adult and 1 for a literate one. Compute the
average of this variable over the population and over some time period. What do
you get? The adult literacy index!

We can analyse the enrolment ratio index and the adjusted real GDP index in

the same way as the ALI. They are quantities that are measured at country level.

The first one being a proportion and the second one being normalised, they can

also be interpreted at individual level, like averages.

What about the HDI itself? According to the United Nations Development

Programme (1997), it is designed to

Furthermore, the HDI contains an index (LEI) which can only be interpreted bearing
in mind Quetelet's average human. Therefore the ALI, GDPI and HDI should
be interpreted in this way as well. The HDI somehow describes how developed the
average human in a country is.

4.2 Air quality index

Due to the alarming increase in air pollution, mainly in urban areas, during the last
decades, several governments and international organisations have issued norms
concerning pollutant concentrations in the air (e.g., the Clean Air Act in the US).
Usually these norms specify, for each pollutant, a concentration that should not be
exceeded. Naturally, these norms are just norms and they are often exceeded.

Therefore, as good air quality is not guaranteed by norms, different monitoring
systems have been developed in order to provide governments as well as citizens
with some information about air pollution. Two examples of such systems are
the Pollutant Standards Index (PSI), developed by the US Environmental Protection
Agency (Ott (1978) or http://www.epa.gov/oar/oaqps/psi.html), and the
ATMO Index, developed by the French Environment Ministry
(http://www-sante.ujf-grenoble.fr/SANTE/paracelse/envirtox/Pollatmo/Surveill/atmo.html).
These two indicators are very similar and we will discuss the French ATMO.

The ATMO index is based on the concentration of 4 major pollutants: sulfur
dioxide (SO2), nitrogen dioxide (NO2), ozone (O3) and particulate matter (soot,
dust, particles). For each pollutant, a sub-index is computed and the final ATMO
index is defined as being equal to the largest sub-index. Here is how each
sub-index is defined. For each pollutant, the concentration is converted into a number
on a scale from 1 to 10. Level 1 corresponds to an air of excellent quality; levels
5 and 6 are just around the EU long term norms, level 8 corresponds to the EU
short term norms and 10 indicates hazardous conditions.

To illustrate, suppose that the sub-indices are as in Table 4.5.

sub-index   3   3   2   8

The resulting ATMO index is the largest value, that is 8. Hence the air quality
is very bad. In the following paragraphs, we discuss some problems arising with
the ATMO index.

4.2.1 Monotonicity

Suppose that, due to heavy traffic, the absence of wind and a very sunny day, the

ozone sub-index increases from 3 to 8 for the air described in Table 4.5. Clearly,

this corresponds to a worse air: no pollutant decreased, one of them increased.

In these conditions, we expect the ATMO index to worsen as well. In fact the

ATMO index does not change. The maximum is still 8. Thus some changes, even

significant ones, are not reflected by the index. In our example, the change is very

significant as the ozone sub-index was almost perfect and became very bad.

Note that if the ozone sub-index decreases from 8 to 3, the ATMO index does

not change either though the air quality improves. This shows that the ATMO

index is not monotonic. Some changes, in both directions, are not reflected by the

index.
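The insensitivity described above can be checked in a few lines. This is only a sketch: the position of ozone in the list is an assumption, and the arithmetic mean is shown purely as a contrast, not as a recommended index. Note that the maximum never moves in the wrong direction; what fails is strict monotonicity, since a worsening of one pollutant can leave the index unchanged.

```python
def atmo_index(sub_indices):
    """Overall index: the largest sub-index."""
    return max(sub_indices)


def average(sub_indices):
    """Arithmetic mean, used here only as a point of comparison."""
    return sum(sub_indices) / len(sub_indices)


OZONE = 2  # assumed position of the ozone sub-index in the list

before = [3, 3, 2, 8]   # sub-indices of Table 4.5
after = list(before)
after[OZONE] = 8        # ozone gets much worse, nothing else changes

print(atmo_index(before), atmo_index(after))  # 8 8
print(average(before), average(after))        # 4.0 5.5
```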

4.2.2 Non compensation

Let us consider the ATMO index for two different airs (x and y), as described by Table 4.6. Air x is perfect for all measurements but one: it scores just above the EU long term norm for ozone. Air y is not good on any dimension. It is of average quality on all dimensions and close to the EU long term norms for three dimensions. The ATMO index is 6 for air x and 5 for air y. Hence, the quality of air x is considered to be lower than that of air y. Contrary to what we observed with the HDI, no compensation at all occurs between the different dimensions. The small weakness of x (6 compared to 5, for ozone) is not compensated by its large strengths (1 compared to 4 or 5, for sulfur dioxide, nitrogen dioxide and dust). In the case of human development, the compensation between dimensions was too strong. Here, we face another extreme: no compensation at all, which is probably not better.

pollutant    SO2   NO2   O3   dust
x            1     1     6    1
y            5     4     5    5

Table 4.6
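The comparison of the two airs can be reproduced with a short sketch. The helper names are illustrative, and the arithmetic mean is used only as an example of a fully compensatory rule, assumed here for contrast; it is not part of the ATMO definition.

```python
def atmo_index(sub_indices):
    """Non-compensatory rule: only the worst sub-index counts."""
    return max(sub_indices)


def average(sub_indices):
    """Fully compensatory rule, shown for contrast only."""
    return sum(sub_indices) / len(sub_indices)


x = [1, 1, 6, 1]  # air x of Table 4.6
y = [5, 4, 5, 5]  # air y of Table 4.6

print(atmo_index(x), atmo_index(y))  # 6 5
print(average(x), average(y))        # 2.25 4.75
```

Under the maximum, x is judged worse than y; under the mean, the ranking is reversed. Neither extreme is obviously right, which is the point made in the text.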

4.2.3 Meaningfulness

Let us forget our criticism of the ATMO index and suppose that it works well.

Consider the statement "Today's ATMO index (6) is twice as high as yesterday's index (3)". What does it mean? We are going to show that it is meaningless, in

a certain sense. Let us come back to the definition of the sub-indices. For a given

pollutant, the concentration is measured in µg/m3. The concentration figures are

then transformed into numbers between 1 and 10. This is done in an arbitrary

way. For example, instead of choosing 5-6 for the EU long term norms and 8 for

the short term ones, 6-7 and 9 could have been chosen. The index would work

as well. The relevant information provided by the index is not the figure itself; it

is some information about the fact that we are above or below some norms that

are related to the effects of the pollutants on health (a somewhat similar situation

has been encountered in Chapter 3). But in such a case, the values of today's and yesterday's index would be different, say 7 and 4, and 7 is not twice as large as 4. To conclude, the statement "Today's ATMO index (6) is twice as high as yesterday's index (3)" can be true or false, depending upon arbitrary choices. Such a statement is said to be meaningless.

On the contrary, the statement "Today's ATMO sub-index for ozone (6) is higher than yesterday's sub-index for ozone (3)" is meaningful. Any reasonable transformation of the concentration figures into numbers between 1 and 10 would lead to the same conclusion: today's sub-index is higher than yesterday's one. By "reasonable transformation" we mean a transformation that preserves the order:

a concentration cannot be transformed into an index value lower than the index

value corresponding to a lower concentration. Concentrations of 110 and 180 µg/m3 can be transformed into 3 and 6, or 4 and 6, or 2 and 4 but not 4 and 2.
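The idea can be made concrete with a small sketch: a statement is treated as meaningful when its truth value is the same under every order-preserving scale. The three scales below are the ones mentioned in the text for concentrations of 110 and 180; the function name `is_meaningful` and the assignment of 110 to yesterday and 180 to today are illustrative assumptions.

```python
# Three order-preserving ways of coding the two concentrations (from the text).
admissible_scales = [
    {110: 3, 180: 6},
    {110: 4, 180: 6},
    {110: 2, 180: 4},
]

yesterday, today = 110, 180  # assumed concentrations on the two days


def is_meaningful(truth_values):
    """A statement is meaningful if all admissible scales agree on it."""
    return len(set(truth_values)) == 1


higher = [s[today] > s[yesterday] for s in admissible_scales]
twice = [s[today] == 2 * s[yesterday] for s in admissible_scales]

print(higher, is_meaningful(higher))  # [True, True, True] True
print(twice, is_meaningful(twice))    # [True, False, True] False
```

The ordinal statement survives every recoding, whereas the ratio statement holds for some scales and fails for others.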

More subtle: "Today's ATMO index (6) is larger than yesterday's ATMO index (3)". Is this sentence meaningful? In the previous paragraph, we saw that the

arbitrariness involved in the construction of the 1 to 10 scale of a sub-index is not

a problem when we want to compare two values of the same sub-index. But if we

want to compare two values of two different sub-indices, it is no longer true. A

value of 3 on a sub-index could be more dangerous for health than a 6 on another

sub-index. Of course, the scales have been constructed with care: 5 corresponds

to the EU long term norms on all sub-indices and 8 to the short term norms.

This is intended to make all sub-indices commensurable. Comparisons should

thus be meaningful. But can we really assume that a 5 (or the corresponding

concentration in µg/m3) is equivalent on two different sub-indices? Equivalent in what terms? Some pollutants might have short term effects and other pollutants,

long term effects. They can have effects on different parts of the organism. Should

we compare the effects in terms of discomfort, mortality after n years, health care

costs, . . . ?

4.3 The decathlon score

The decathlon is a 10-event athletic contest. It consists of 100-meter, 400-meter, and 1500-meter runs, a 110-meter high hurdles race, the javelin and discus throws, shot put, pole vault, high jump, and long jump. It is usually contested over two

or three days. It was introduced as a three-day event at the Olympic Games

of Stockholm in 1912. To determine the winner of the competition, a score is

computed for each athlete and the athlete with the best score is the winner. This

score is the sum of the single-event scores. The single event scores are not just

times and distances. It doesn't make sense to add the time of a 100-meter run to the time of a 1500-meter run. It is even worse to add the time of a run to the

length of a jump. This should be obvious for everyone.

Until 1908, the single-event scores were just the rank of an athlete in that

event. For example, if an athlete performed the third best high jump, his single-

event score for the high jump was 3. The winner was thus the athlete with the

lowest overall score. Note that this amounts to using the Borda method (see p.14)

to elect the best athlete when there are ten voters and the preferences of each

voter are the rankings defined by each event.

The main problem with these single-event scores is that they very poorly reflect the performances of the athletes. Suppose that an athlete arrived 0.1 second before the next athlete in the 100-meter run. They have ranks i and i+1. So the difference in the scores that they receive is 1. Suppose now that the delay between these two athletes is 1 second. Their ranks are unchanged. Thus the difference in the scores that they receive is still 1 though a larger difference would be more appropriate. That is why other tables of single-event scores have been used since 1908 (de Jongh 1992, Zarnowsky 1989). In the tables used after 1908, high scores are associated with good performances (contrary to scores before 1908). Hence, the winner is the athlete that has the highest overall score.

Figure 4.2: Decathlon tables for distances: general shape of convex (left) and concave (right) tables (points plotted against distance)
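The weakness of rank-based scores can be illustrated with a short sketch. The athlete labels and the times are hypothetical, and the helper `rank_scores` is an illustrative name; ties are deliberately not handled.

```python
def rank_scores(times):
    """Give each athlete the rank of his time (1 = fastest); ties not handled."""
    ordered = sorted(times, key=times.get)
    return {athlete: rank for rank, athlete in enumerate(ordered, start=1)}


# Hypothetical 100-meter times in seconds.
close_race = {"a": 10.8, "b": 10.9, "c": 11.5}  # a beats b by 0.1 s
wide_race = {"a": 10.8, "b": 11.8, "c": 12.0}   # a beats b by 1 s

print(rank_scores(close_race))  # {'a': 1, 'b': 2, 'c': 3}
print(rank_scores(wide_race))   # {'a': 1, 'b': 2, 'c': 3}
```

Both races produce exactly the same single-event scores, so the size of the gap between the athletes is lost.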

Some of these tables (different versions, in use between 1934 and 1962) are

based on the idea that improving a performance by some amount (e.g. 5 centime-

tres in a long jump) is more difficult if the performance is close to the world record.

Hence, it deserves more points. The general shape of these tables, for distances,

is given in Figure 4.2 (convex table). For times (in runs), the shape is different as

an improvement is a decrease in time.

A problem raised by convex tables is the following: if an athlete decides to

focus on some events (for example the four kinds of runs) and to do much more

training for them than for the other ones, he will have an advantage. He will come

closer to the world record for runs and earn many points. At the same time, he

will be further away from the world record for the other disciplines but that will

make him lose fewer points as the slope of the curve is more gentle in that direction.

The balance will be positive. Thus these tables encourage athletes to focus on

some disciplines, which is contrary to the spirit of the decathlon.
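The incentive created by the shape of the table can be sketched numerically. Everything below is hypothetical: performances are expressed as a level between 0 and 1 (1 standing for the world record), the convex table is taken to be 1000 times the square of the level and the concave one 1000 times its square root, and the two athletes are assumed to have the same total level over two events. Real tables are different; only the shapes matter here.

```python
import math


def convex_points(level):
    """Hypothetical convex table: gains grow near the record."""
    return 1000 * level ** 2


def concave_points(level):
    """Hypothetical concave table: gains shrink near the record."""
    return 1000 * math.sqrt(level)


def total(table, levels):
    """Overall score: sum of the single-event points."""
    return sum(table(level) for level in levels)


balanced = [0.80, 0.80]    # same level in both events
specialist = [0.95, 0.65]  # same total level, concentrated on one event

print(total(convex_points, specialist) > total(convex_points, balanced))    # True
print(total(concave_points, balanced) > total(concave_points, specialist))  # True
```

With the convex table the specialist comes out ahead; with the concave table the balanced athlete does, which is the behaviour the text attributes to the two families of tables.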

That is why, since 1962, different concave tables (see Figure 4.2) have been used.

These tables strongly encourage the athletes to be excellent in all disciplines. An

example of a real table, in use in 1998, is presented in Figure 4.3. Note that a new

change occurred: this table is no longer concave. It is almost linear but slightly

convex.

There are many interesting points to discuss about the decathlon score.

How are the minimum and maximum values set? They can highly influence the score, as was shown with the HDI (in Section 4.1.1). Obviously, the maximum value must somehow be related to the world record. But as everyone knows, world records are objects that athletes like to break.

Why add single-event scores? Other operations might work as well. For example, multiplication may favour the athletes that perform equally well in all disciplines. To illustrate this point very simply, consider a 3-event contest where single-event scores are between 0 and 10. An athlete, say x, obtains 8 in all three events. Another one, y, obtains 9, 8 and 7. If we add the scores, x and y obtain the same score: 24. If we multiply the scores, x gets 512 while y loses with 504.

...

Figure 4.3: A plot of the 100 meters run score table in 1998 (score, with ticks from 400 to 1200, against 100 meters time)
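The arithmetic of the 3-event illustration can be verified directly. The variable names are illustrative; `math.prod` requires Python 3.8 or later.

```python
import math

x = [8, 8, 8]  # equal scores in the three events
y = [9, 8, 7]  # same sum, less balanced

print(sum(x), sum(y))              # 24 24
print(math.prod(x), math.prod(y))  # 512 504
```

Addition cannot separate the two athletes, while multiplication ranks the balanced athlete x first.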

The point on which we will focus, in this decathlon example, is the role of the

indicator.

Although one might think that the role of the overall score is clearly to designate

the winner, we are going to show that it plays many roles (like student grades, see

Chapter 3) and that this is one of the reasons why it changes so often. Of course,

one of the roles is to designate the winner and it was probably the only purpose

that the first designers of the score had in mind. But we can be quite sure that

immediately after the first contest, another role arose. Many people probably used

the scores to assess the performance of the athletes. One athlete has a score very close to that of the winner and is thus a good athlete. Another one is far from the winner and is consequently not a good one.

Not much later (after the second competition), a third role appeared. How did the athletes evolve? "This athlete has improved his score" or "x has a better score in this contest than the score of y in the previous contest". This kind of comparison

is not meaningful: suppose that an athlete wins a contest with a score of 16. In

the next contest, he performs very poorly: short jumps, slow runs, short throws.

But his main opponents are absent or perform equally poorly. He might still win

the contest and even with a higher score although his performance is worse than

the previous time.

After some time, the organisers of decathlons became aware of the second and

third role. It was probably part of the motivations to abandon the sum of ranks

and to use convex tables. These tables, to some extent, made the comparisons of

scores across athletes and/or competitions meaningful. At the same time, the score

found a new role as a monitoring tool during the training. Before 1908, the scores

could be computed only during competitions as they were sums of ranks. And it

was not long before a wise coach used it as a strategic tool, advising his athlete to

focus on some events. For this reason, since 1962, the organisers conferred a new

role to the score: to foster excellence in all disciplines. This was achieved by the

introduction of concave tables. But it is most likely that the score is still used as

a strategic tool, hopefully in a less perverse way.

It is worth noting that this new role doesn't replace any of the previous ones.

The score aims at rewarding equal performances in all disciplines but it is also

used to assess the performance of an athlete. Even if we consider only these

two roles (the other ones could be seen as side effects), it is amazing to see how

incompatible they are.

4.4 Indicators and multiple criteria decision support

Classically, in a decision aiding process, a decision-maker wants to rank the ele-

ments of the set of alternatives (or to choose the best element). In order to rank,

he selects several dimensions (criteria) that seem relevant with respect to his prob-

lem. Each alternative is characterised by a performance on each criterion (this is

the evaluation matrix or performance tableau). An MCDA method is then used to

rank the alternatives, with respect to the preferences of the decision-maker.

When an indicator is built, several dimensions are also selected. Each item is

characterised by a performance on each dimension. An index that can be used to

rank the items is computed. The analogy between a decision support method and

an index is obvious: both aim at aggregating multi-dimensional information about

a set of objects. But there is a tremendous difference as well: when an indicator is

built, it is often the case that there is no clearly defined decision problem, decision-

maker and, a fortiori, preferences. To avoid the absence of preference, one could

consider that the preferences are those of the potential users of the indicator.

To some extent, this is possible because very often the preferences of the users

go in the same direction for each dimension taken separately. For example, for

each dimension of the ATMO index, everyone prefers a lower concentration. But

it is definitely not reasonable to assume that the global preferences are similar.

Furthermore, even if single-dimensional preferences go in the same direction, it

does not mean that single-dimensional preferences are identical. Those who are

not very sensitive to a pollutant will value a decrease in concentration much more

if it occurs at high concentration than at low concentration. On the contrary,

sensitive people might value concentration decreases at low and high levels equally.

The absence of preferences is crucial. In decision support, many studies and con-

cepts relate to measurement theory. Measurement theory is the theory that studies

how we can measure objects (assign a number to an object) so as to reflect a relation on these objects. E.g., how can we assign numbers to physical objects so as to reflect the relation "heavier than"? That is, how to assign a number (called weight) to each object so that "x's weight > y's weight" implies "x is heavier than y"? Additional properties may be required. For example, in the case of weight

measurement, one wishes that the number assigned to x and y taken together be

the sum of their individual weights.

Another example is that of distance. How to assign numbers to points in

the space so as to reflect the relation "more distant than" with respect to some reference point? Contrary to the previous example, this one has several dimensions (usually two or three: x, y or x, y, z or altitude, longitude, latitude, etc.). Each

object (point) is characterised by a performance (co-ordinate) in each dimension

and one tries to aggregate these performances into one indicator: the distance

to the reference point. This problem is at the core of geometry. Note that the

answer is not unique. Very often the Euclidean distance is chosen (assuming that

the shortest path between two points is the straight line). Sometimes, a geodesic distance is more relevant (when you consider points on the earth's surface, unless

you are a mole, the shortest path is no longer a straight line but a curve). In other

circumstances, the Manhattan distance is more appropriate (between two points

in Manhattan, if you are not flying, the shortest path is not a straight line nor a

curve, it is a succession of perpendicular straight lines). And there are many other

distances.
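The three ways of aggregating co-ordinates into a single distance can be sketched as follows. The function names, the sample points and the spherical model of the earth (radius 6371 km, haversine formula) are assumptions made for the illustration.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points in the plane."""
    return math.dist(p, q)


def manhattan(p, q):
    """Distance along perpendicular streets: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def great_circle(lat1, lon1, lat2, lon2, radius=6371.0):
    """Shortest path on a sphere (haversine formula), angles in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
print(great_circle(0, 0, 0, 180))  # half the circumference of the sphere
```

The same pair of points gets a different number under each rule, which is why the choice of the aggregation is part of the measurement, not a detail.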

As far as physical properties are concerned ("larger than", "warmer than", "faster than", ...), the problem is easy: good measurements were carried out in Antiquity without any theory of measurement. But when we consider other kinds of relations, things are more complex. How to assign numbers to people or alternatives so as to reflect the relations "more loveable than", "preferable to" or "more risky than"?

In such cases, measurement theory can be of great assistance but is insufficient to

solve all problems.

In decision support, measuring objects with respect to the relation "is preferred to" can be of some help because, once the objects have been measured, it is rather easy to handle numbers. It is often assumed that a preference relation over the alternatives exists but is not well known and one tries to measure the alternatives so as to reflect this relation. The preference relation is not necessarily assumed to completely exist a priori. Preferences can emerge and evolve during

the decision aid process, but some characteristics of the preference relation still

exist a priori. Measurement theory can therefore be used to build or to analyse a

decision support method.

Many indices are built without the assumption that a relation over the items a

priori exists or without trying to reflect a pre-existent relation. On the contrary,

it seems that, in many cases, the aim of an index is precisely to build or create

a relation over the items. Therefore, in such a case, measurement theory cannot

tell us much about the index. Measurement theory loses some of its power when

there is no a priori relation to be reflected.

The index does not help to uncover reality, that is a pre-existent relation. It insti-

tutes or settles reality (Desrosieres 1995). This is very obvious with the decathlon

score. Between 1908 and 1962, the scores were designed to assess the performances

and to compare them. As one of the most important things for a professional ath-

lete is to win (contrary to the opinion of de Coubertin), the score is considered as

the true measure of performance. Any athlete that was not convinced of this had

to change his mind and to behave accordingly if he wanted to compete. This is

not particular to the decathlon score. Many governments probably try to exhibit

good HDI for their country in order to keep international subsidies or to legitimise

their authority to the population of the country or foreign governments. Some city

councils, willing to attract high salaried residents, claim, among others, to have

high air quality. The most efficient way for them to make their claim credible is to

exhibit a good ATMO index (or any other index in countries other than France),

even if other policies might be more beneficial to the country.

One might be tempted to reject any indicator that does not reflect reality,

that, in some arbitrary way, institutes reality. Nevertheless, the indicators are

not useless. An indicator can be considered as a kind of language. It is based on

some (more or less necessarily arbitrary) conventions and helps us to efficiently

communicate about different topics or perform different tasks. By "efficiently", we mean "more efficiently than without any language"; not necessarily "in the most efficient way". Like any language, it is not always precise and leaves room

for ambiguities and contradictions. If the people that created the decathlon had

decided to wait until a sound theory shows them how to designate the winner, it

is very likely that no decathlon contest would ever have taken place.

But this does not mean that all indicators are equally good. Ambiguities and

contradictions are certainly adequate for poetry otherwise we could never enjoy

things like this:

Resuenan

en otra calle

donde

pasar en esta calle

donde

Solo es real la niebla 1

or

Wenn ich mich lehn an deine Brust,

kommt's über mich wie Himmelslust;

doch wenn du sprichst: ich liebe dich!

so muss ich weinen bitterlich.2

In indicators, however, ambiguities and contradictions should generally be kept at a minimum. When possible, they should be avoided. When

certain elements of preferences are known for sure, all indicators should reflect

them.

In a decision aiding process, preferences are not perfectly known a priori. Other-

wise, it would be very unlikely that any aid would be required. Therefore, relying

solely on measurement theory is not possible. Most decision aiding processes, like

most indicators, probably cannot avoid some arbitrary elements. They can occur

at different steps of the process: the choice of an analyst, of the criteria, of the

aggregation scheme, to mention a few.

But unlike cases where indicators are built without any decision problem in

mind, most decision aiding processes relate to a more or less precisely defined de-

cision problem. Consequently, at least some elements of preferences are present.

Therefore, if some measurement (associating numbers to alternatives) is performed

during the aiding process, measurement theory can be used to ensure that the

model built during the aiding process does not contradict these elements of pref-

erences, that it reflects them and that all sound conclusions that can be drawn

from the conjunction of these elements are actually drawn.

1 Octavio Paz, Here, translated by Nims (1990)
passing in this street / where / Nothing is real but the fog

2 Heinrich Heine, Ich liebe dich, translated by Louis Untermeyer (van Doren 1928)
And when I lean upon your breast / My soul is soothed with godlike rest; / But when you swear: I love but thee! / Then I must weep, and bitterly.

4.5 Conclusions

Among evaluation and decision models, indicators are probably more widespread than any other model (this is definitely true if you think of cost-benefit analysis or multiple criteria decision support). Student grades are also very popular (almost everyone has faced them at some point of his life) but, besides the fact that most people use and/or encounter them, indicators are pervasive in many domains of human activity, contrary to student grades that are confined to education (note that student grades could be considered as special cases of indicators).

Indicators are not often thought of as decision support models but, actually,

in many circumstances, are. Indicators are usually presented as an efficient way

to synthesise information. But what do we need information for ? For making

decisions !

In this chapter, we analysed three different indicators: the human development

index, the ATMO (an air quality index) and the decathlon score.

On the one hand, all three indicators have been shown to present flaws: they

do not always reflect reality or what we consider as reality. This is due to an excess

or a lack of compensation, to non monotonicity, to an incapability of dealing with

dimension dependence, . . . These problems are not specific to indicators. Some of

them have already been discussed in Chapter 3 and/or will be met in Chapter 6.

On the other hand, we saw that an indicator does not necessarily need to reflect

reality or, at least, it does not need to reflect only reality.

5 ASSESSING COMPETING PROJECTS: THE EXAMPLE OF COST-BENEFIT ANALYSIS

5.1 Introduction

Decision-making inevitably implies, at some stage, the allocation of scarce resources to some alternatives rather than to others (e.g. deciding how to use one's income).

It is therefore not at all surprising that the question of helping a decision-maker

to choose between competing alternatives, projects, courses of action and/or to

evaluate them, has attracted the attention of economists. Cost-Benefit Analysis

(CBA) is a set of techniques that economists have developed for this purpose. It

is based on the following simple and apparently inescapable idea: a project should

only be undertaken when its benefits outweigh its costs.

CBA is particularly oriented towards the evaluation of public sector projects.

Decisions made by governments, public agencies and firms or international organ-

isations are complex and have a huge variety of consequences. Some examples of

areas in which CBA has been applied will give a hint of the type of projects that

are evaluated:

locating budgets among agencies, developing an energy policy for a nation

(Dinwiddy and Teal 1996, Kirkpatrick and Weiss 1996, Little and Mirlees

1968, Little and Mirlees 1974),

Harvey 1998), building a high-speed train, reorganising the bus lines in a

city (Adler 1987, Schofield 1989),

diagnosis tools, choosing standard treatments for certain types of illnesses

(Folland, Goodman and Stano 1997, Johannesson 1996),

proving the human consumption of genetically-modified organisms, or irra-

diated food (Hanley and Spash 1993, International Atomic Energy Agency

1993, Johansson 1993, Toth 1997).

These types of decision are immensely complex. They affect our everyday

life and are likely to affect that of our children. Most economists view CBA as

the standard way of evaluating such projects and of supporting public decision-

making (numerous examples of practical studies using CBA can easily be found in

applied economics journals, e.g. American Journal of Agricultural Economics, En-

ergy Economics, Environment and Planning, Journal of Environmental Economics

and Management, Journal of Health Economics, Journal of Policy Analysis and

Management, Journal of Public Finance and Public Choice, Journal of Transport

Economics and Policy, Land Economics, Pharmaco-Economics, Public Budget-

ing and Finance, Regional Science and Urban Economics, Water Resources Re-

search). Since fairly different approaches to these problems have been advocated,

it is important to have a clear idea of what CBA is; if the claim of economists were perfectly well-founded, there would be hardly any need for other decision/evaluation models.

Although it has distant origins (see Dupuit 1844), the development of CBA

has unsurprisingly coincided with the more active involvement of governments in

economic affairs that started after the great depression and climaxed after World

War II in the 50s and 60s. A good overview of the early history of CBA can

be found in Dasgupta and Pearce (1972). After having started in the USA in

the field of Water Resource Management (see Krutilla and Eckstein (1958) for

an overview of these pioneering developments), the principles of CBA were soon

adopted in other areas and countries, the UK being the first and most active one.

While research on (and applications of) CBA grew at a very fast rate during the

50s and 60s, the principles of CBA were entrenched in a series of very influential

manuals for project evaluation produced by several international organisations

(OECD: Little and Mirlees (1968), Little and Mirlees (1974), UNIDO: Dasgupta,

Marglin and Sen (1972) and, more recently, World Bank: Adler (1987), Asian De-

velopment Bank: Kohli (1993)). In many countries nowadays, the Law makes it

an obligation to evaluate projects using the principles of CBA. Research on CBA

is still active and economists have spent considerable time and energy in investi-

gating its foundations and refining the various tools that it requires in practical

applications (recent references include Boardman 1996, Brent 1996, Nas 1996).

It would be impossible to give a fair account of the immense literature on CBA

in a few pages. Although somewhat old, two excellent introductory references are

Dasgupta and Pearce (1972) and Lesourne (1975). Less ambitiously, we shall try

here to:

These three objectives structure the rest of this chapter into sections. Our

aim, while clearly not being to promote the use of CBA, is not to support the

nowadays-fashionable claim (especially among environmentalists) that CBA is an

outdated useless technique either. In pointing out what we believe to be some

limitations of CBA, we only want to give arguments refuting the claim of some

economists that, under all circumstances, it is the only consistent way to support

decision/evaluation processes (Boiteux 1994).

5.2 The principles of CBA

5.2.1 Choosing between investment projects in private firms

The idea that a project should only be undertaken if its benefits outweigh its

costs is at the heart of CBA. This claim may seem so obvious that it need not

be discussed any further. It is of little practical content however unless we define

more precisely what costs and benefits are and how to evaluate and compare

them. Some discussion will therefore prove useful.

A simple starting point is to be found in the literature on Corporate Finance on

the choice between investment projects in private firms. An investment project

may usefully be seen as an operation in which money is spent today (the costs),

with the hope that this money will produce even more money (the benefits)

tomorrow.

A useful way to evaluate such an investment project is the following. First a

time horizon for its evaluation must be chosen. Although the very nature of the project may command this choice (e.g. because after a certain date the Law will change, equipment will have to be replaced), the general case is that the duration of the

project is more or less conventionally chosen as the period of time for which it

seems reasonable and useful to perform the evaluation.

Although a continuous evaluation is theoretically possible, real-world appli-

cations imply dividing the duration of the project into time periods of equal length.

This involves some arbitrariness (should we choose years or semesters?) as well as

trade-offs between the depth and the complexity of the evaluation model.

Suppose now that a project is to be evaluated on T time periods of equal length.

The next step is to try to evaluate the consequences of the project in each of these

time periods. Such a task may be more or less easy depending on the nature of

the project, the environment of the firm and the duration of the project. We seek

to obtain an evaluation of the amount of cash that is generated by the project

during each time period, this amount being the difference between the benefits

and the expenses generated by the project (including the residual value of the

project in the last period). Note that these evaluations are relative: they aim at

capturing the influence of the project on the firm and not its overall situation.

Let us denote by b(i) (resp. c(i)) the benefits (resp. the expenses) generated by the project during the ith period of time. The net effect of the project in period i is therefore a(i) = b(i) - c(i).

At this stage, the evaluation model of the project has the form of an evaluation

vector with T +1 components (a(0), a(1), . . . , a(T )) where 0 conventionally denotes

the starting time of the project. In general, some of the components of this vector

(most notably a(0)) will be negative (if not, you should enjoy the free lunch and

there is hardly any evaluation problem). Although all components of the evaluation

vector are expressed in identical monetary units (m.u.), the (algebraic) sum a(0)

is to be received today while a(1) will only be received one time period ahead.

Therefore these two numbers, although expressed in the same unit, are not directly

comparable. There is a simple way however to summarise the components of the

evaluation vector using a single number.

Suppose that there is a capital market on which the firm is able to lend or

borrow money at a fixed interest rate of r per time period (this market is assumed

to be perfect: borrowing and lending will not affect r and are not restricted). If

you borrow 1 m.u. for one time period on this market today, you will have to spend

(1 + r) m.u. in period 1 in order to respect your contract. Similarly, if you know that you will receive 1 m.u. in period 1, you can borrow an amount of $\frac{1}{1+r}$ m.u.: your revenue of 1 m.u. in period 1 will allow you to reimburse exactly what you have to, i.e. $\frac{1}{1+r}(1 + r) = 1$ m.u. Hence, being sure of receiving 1 m.u. in period 1 corresponds to receiving, here and now, an amount of $\frac{1}{1+r}$ m.u. Using a similar reasoning and taking into account compound interest, receiving 1 m.u. in period i corresponds to an amount of $\frac{1}{(1+r)^i}$ m.u. now. This is what is called discounting.

This suggests a simple way of summarising the components of the vector

(a(0), a(1), . . . , a(T )) as the sum to be received now that is equivalent to this

cash stream via borrowing and lending operations on the capital market. This

sum, called the Net Present Value (NPV) of the project, is given by:

\[
\mathrm{NPV} = \sum_{i=0}^{T} \frac{a(i)}{(1+r)^i} = \sum_{i=0}^{T} \frac{b(i) - c(i)}{(1+r)^i} \tag{5.1}
\]

If NPV > 0, the project is worth a positive sum now, i.e. taking into account the
costs and the benefits of the project and their dispersion in time, it appears that
the project makes the firm richer and, thus, should be undertaken. The reverse
conclusion obviously holds if NPV < 0. When NPV = 0, the firm is indifferent
between undertaking the project or not.
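As an illustration, the computation of the NPV and the resulting accept/reject decision can be sketched in a few lines of Python. The cash flows and the two interest rates below are hypothetical figures chosen for the example; only the formula itself comes from the text.

```python
from typing import Sequence


def npv(cash_flows: Sequence[float], rate: float) -> float:
    """Net Present Value of the stream (a(0), a(1), ..., a(T)) at interest rate `rate`."""
    return sum(a / (1.0 + rate) ** i for i, a in enumerate(cash_flows))


# Hypothetical project over T = 2 periods (all figures in m.u.)
benefits = [0.0, 60.0, 60.0]   # b(i)
costs = [100.0, 5.0, 5.0]      # c(i)
cash_flows = [b - c for b, c in zip(benefits, costs)]  # a(i) = b(i) - c(i)

npv_at_5_percent = npv(cash_flows, 0.05)
npv_at_10_percent = npv(cash_flows, 0.10)

# Decision rule for an independent project: undertake it if NPV > 0
undertake_at_5_percent = npv_at_5_percent > 0
undertake_at_10_percent = npv_at_10_percent > 0
```

With these figures the same cash stream is accepted at a rate of 5% and rejected at a rate of 10%, which shows how strongly the conclusion may depend on r.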

This simple reasoning underlies the following well-known rule for choosing between
investment projects in Finance: when projects are independent, choose all
projects that have a strictly positive NPV. In deriving this simple rule, we have
made various hypotheses. Most notably:

the duration was divided into conveniently chosen time periods of equal

length,

benefits b(i) and costs c(i) were expressed in m.u. for each time period,

5.2. THE PRINCIPLES OF CBA 75

other possible constraints were ignored (e.g. projects may be exclusive, syn-

ergetic).

The literature in Finance is replete with extensions of this simple model that make
it possible to cope with less simplistic hypotheses.

Although the projects that are usually evaluated using CBA are considerably more

complex than the ones we implicitly envisaged in the previous paragraph, CBA

may usefully be seen as using a direct extension of the rule used in Finance. The

main extensions are the following:

in CBA costs and benefits are evaluated from the point of view of so-

ciety,

in CBA costs and benefits are not necessarily directly expressed in m.u.;

when this happens, conveniently chosen prices are used to convert them

into m.u.,

in CBA the discounting rate has to be chosen from the point of view of

society.

Retaining the spirit of the notations used above, the benefits b(i) and costs c(i)
of a project in period i are seen in CBA as vectors with respectively ℓ and ℓ′
components:

\[
b(i) = (b(1, i), b(2, i), \ldots, b(\ell, i))
\]
\[
c(i) = (c(1, i), c(2, i), \ldots, c(\ell', i))
\]

where b(j, i) (resp. c(k, i)) denotes the social benefits (resp. the social costs)

on the jth dimension (resp. on the kth dimension), evaluated in units that are

specific to that dimension, generated by the project in period i.

In each period, costs and benefits are converted into m.u. using suitably

chosen prices. We denote by p(j) (resp. p′(k)) the price of one unit of social
benefit on the jth dimension (resp. one unit of the social cost on the kth dimension)
expressed in m.u. (for simplicity, and consistently with real-world applications,
prices are assumed to be independent from the time period). These prices are
used to summarise the vectors b(i) and c(i) into single numbers expressed in m.u.
letting:

\[
b(i) = \sum_{j=1}^{\ell} p(j)\, b(j, i)
\]

and

\[
c(i) = \sum_{k=1}^{\ell'} p'(k)\, c(k, i)
\]


where b(i) (resp. c(i)) denotes the social benefits (resp. costs) generated by the

project in period i converted into m.u.

After this conversion and having suitably chosen a social discounting rate r,

it is possible to apply the standard discounting formula for computing the Net

Present Social Value (NPSV) of a project. We have:

\[
\mathrm{NPSV} = \sum_{i=0}^{T} \frac{b(i) - c(i)}{(1+r)^i}
= \sum_{i=0}^{T} \frac{\sum_{j=1}^{\ell} p(j)\, b(j, i) - \sum_{k=1}^{\ell'} p'(k)\, c(k, i)}{(1+r)^i} \tag{5.2}
\]

A project with a positive NPSV is interpreted as increasing the welfare
of society and, thus, should be implemented (in the absence of other constraints).
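The conversion of multi-dimensional effects into m.u. followed by discounting can be sketched as follows. This is only an illustration: the two benefit dimensions (hours saved, injuries avoided), the single cost dimension, the prices and the rate are all hypothetical values chosen for the example.

```python
def npsv(benefits, costs, benefit_prices, cost_prices, rate):
    """Net Present Social Value.

    benefits[i][j] is b(j, i), costs[i][k] is c(k, i),
    benefit_prices[j] is p(j) and cost_prices[k] is p'(k).
    """
    total = 0.0
    for i, (b_i, c_i) in enumerate(zip(benefits, costs)):
        b_money = sum(p * q for p, q in zip(benefit_prices, b_i))  # b(i) in m.u.
        c_money = sum(p * q for p, q in zip(cost_prices, c_i))     # c(i) in m.u.
        total += (b_money - c_money) / (1.0 + rate) ** i
    return total


# Hypothetical project over T = 2 periods
benefits = [(0.0, 0.0), (1000.0, 2.0), (1000.0, 2.0)]  # (hours saved, injuries avoided)
costs = [(50000.0,), (1000.0,), (1000.0,)]             # (expenses already in m.u.,)
benefit_prices = (13.0, 10000.0)                       # hypothetical prices p(j)
cost_prices = (1.0,)                                   # hypothetical price p'(k)

value = npsv(benefits, costs, benefit_prices, cost_prices, rate=0.08)
```

The sketch makes the role of the prices explicit: changing `benefit_prices` changes the NPSV without any change in the physical effects of the project.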

It should be observed that the difficulties that we mentioned concerning the

computation of the NPV are still present here. Extra difficulties are easily seen to

emerge:

how can one evaluate benefits and costs from a social point of view?

how can benefits and costs be converted into monetary units and how should the prices be chosen?

a yardstick. Clearly the foundations of such a method and the way of using it

in practice deserve to be clarified. Section 5.2.3 presents an elementary theoretical
model that helps in understanding the foundations of CBA. It may be skipped

without loss of continuity.

It is obviously impossible to give here a complete account of the vast literature on the
foundations of CBA, which has deep roots in Welfare Economics. We would

however like to give a hint of why CBA consistently insists on trying to price out

every effect of a project. The important point here is that CBA conducts project

evaluation within an environment in which markets are especially important

instruments of social co-ordination.

Consider a one-period economy in which m individuals consume n goods that are

exchanged on markets. Each individual j is supposed to have completely ordered

preferences for consumption bundles. These preferences can be conveniently represented
using a utility function $U_j(q_{j1}, q_{j2}, \ldots, q_{jn})$ where $q_{ji}$ denotes the quantity
of good i consumed by individual j.


Social preferences are taken to depend on the utilities
of the individuals through a social utility function (or social welfare function)
$W(U_1, U_2, \ldots, U_m)$. It is useful to interpret W as representing the preferences of a
planner regarding the various social states.

Starting from an initial situation in the economy, consider a project, inter-

preted as an external shock to the economy, consisting in a modification of the

quantities of goods consumed by each individual. These modifications are sup-

posed to be marginal; they will not affect the prices of the various goods. The

impact of such a shock on social welfare is given by (assuming differentiability):

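\[
dW = \sum_{j=1}^{m} \sum_{i=1}^{n} W_j\, U_{ji}\, dq_{ji} \tag{5.3}
\]

where

\[
W_j = \frac{\partial W}{\partial U_j} \quad \text{and} \quad U_{ji} = \frac{\partial U_j}{\partial q_{ji}}.
\]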

Social welfare will increase following the shock if dW > 0.

The existence of markets for the various goods and the hypothesis that indi-

viduals operate on these markets so as to maximise utility ensure that, before the

shock, we have, for all individuals j and for all goods i and k:

\[
\frac{U_{ji}}{U_{jk}} = \frac{p_i}{p_k} \tag{5.4}
\]

where $p_i$ denotes the price of the ith good. Having chosen a particular good
as numeraire (we shall call that good money), this implies that:

\[
U_{ji} = \lambda_j\, p_i \tag{5.5}
\]

where $\lambda_j$ can be interpreted as the marginal effect on the utility of individual
j of a marginal variation of the consumption of the numeraire good, i.e. as the
marginal utility of income for individual j.

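Using 5.5, 5.3 can be rewritten as:

\[
dW = \sum_{j=1}^{m} \sum_{i=1}^{n} \lambda_j\, W_j\, p_i\, dq_{ji} \tag{5.6}
\]

where the coefficient $\lambda_j W_j$ can be interpreted as
the increase in social welfare following a marginal increase of the income of
individual j.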

Under the hypothesis that, before the shock, the distribution of income is optimal
in the society, the conclusion is that the coefficients $\lambda_j W_j$ are constant over
individuals (otherwise income would have been reallocated in favour of individuals
for which $\lambda_j W_j$ is larger). Under this hypothesis, we may always normalise W
in such a way that $\lambda_j W_j = 1$, for all j. We therefore rewrite equation 5.6 as:


\[
dW = \sum_{j=1}^{m} \sum_{i=1}^{n} p_i\, dq_{ji} \tag{5.7}
\]

which amounts to saying that the social effects of the shock are measured as

the sum over individuals of the variation of their consumption evaluated at market

prices (i.e. the so-called consumer surplus). In this simple model, variations of

social welfare are therefore conveniently measured in money terms using market

prices.
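Relation 5.7 is easy to illustrate numerically. In the Python sketch below, the prices and the marginal variations of consumption are hypothetical; the function simply sums, over individuals and goods, the variations of consumption valued at market prices.

```python
def welfare_change(prices, quantity_changes):
    """dW as in relation 5.7: sum over individuals j and goods i of p_i * dq_ji.

    prices[i] is p_i and quantity_changes[j][i] is dq_ji.
    """
    return sum(
        p * dq
        for dq_j in quantity_changes       # one row per individual j
        for p, dq in zip(prices, dq_j)     # one column per good i
    )


# Hypothetical economy with n = 3 goods and m = 2 individuals
prices = [1.0, 2.5, 4.0]
quantity_changes = [
    [0.2, 0.0, -0.05],  # individual 1 gains some of good 1 and loses some of good 3
    [0.0, 0.1, 0.0],    # individual 2 gains some of good 2
]

dW = welfare_change(prices, quantity_changes)
project_improves_welfare = dW > 0
```

Note that the sketch relies on the same hypotheses as the derivation: marginal variations, unchanged prices and an optimal distribution of income, so that every individual carries the same weight.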

Returning to CBA, the relation 5.7 coincides with the computation of the
NPSV when time is not an issue and the effects (costs or benefits) of a project
can be expressed in terms of consumption of goods exchanged on markets. The
general formula for computing the NPSV may be seen as an extension of 5.7
without these restrictions.

The limitations of the elementary model presented above are obvious. The most

important ones seem to be the following:

(and in particular no taxes),

In spite of all its limitations, our model allows us to understand, through the

simple derivation of equation 5.7, the rationale for trying to price out all effects of

a project in order to assess its contribution to social welfare.

A detailed treatment of the foundations of CBA without our simplifying hy-

potheses can be found in Dreze and Stern (1987). Although we shall not enter

into details, it should be emphasised that the theoretical foundations of CBA are

controversial on some important points. The appropriateness of equation 5.7 and

of related formulas is particularly clear in situations that are fairly different from

the ones in which CBA is currently used as an evaluation tool. These are often

characterised by:

in a city),

the presence of numerous public goods for which no market price is available

(think of health services or education),

a new motorway),

5.3. SOME EXAMPLES IN TRANSPORTATION STUDIES 79

regulations),

effects that are highly complex and may concern a very long period of time

(think of a policy for storing used nuclear fuel),

effects that are very unevenly distributed among individuals and raise im-

portant equity concerns (think of your reaction if a new airport were to be

built close to your second residence in the middle of the countryside),

long term effects of air pollution on health),

aesthetic value of the countryside) and, thus, to price them out

In spite of these difficulties, CBA still mainly rests on the use of the NPSV (or
some of its extensions) to evaluate projects. Economists have indeed developed an
incredible variety of tools in order to use the NPSV even in situations in which
it would a priori seem difficult to do so. It is impossible to review here the immense
literature that these efforts have generated. It includes: the determination of

prices for goods without markets, e.g. contingent valuation techniques or hedonic

prices (see Scotchmer 1985, Loomis, Peterson, Champ, Brown and Lucero 1998),

the determination of an appropriate social discounting rate (useful references on

this controversial topic include Harvey 1992, Harvey 1994, Harvey 1995, Keeler

and Cretin 1983, Weitzman 1994), the inclusion of equity considerations in the

calculation of the NPSV (Brent 1984), the treatment of uncertainty, the consid-

eration of irreversible effects (e.g. through the use of option values). An overview

of this literature may be found in Sugden and Williams (1983) and in Zerbe and

Dively (1994). We will simply illustrate some of these points in section 5.3.

Public investment in transportation facilities amounts to over 80 × 10^9 FRF annually
in France (around 14 × 10^9 USD or 14 × 10^9 €). CBA is presently the standard

evaluation technique for such projects. It is impossible to give a detailed account

of how CBA is currently applied in France for the evaluation of transportation

investment projects; this would take an entire book even for a project of moderate

importance. In order to illustrate the type of work involved in such studies, we

shall only take a few examples (for more details, see Boiteux (1994) and Syndi-

cat des Transports Parisiens (1998); a useful reference in English is Adler (1987))

based on a number of real-world applications. For concreteness, we shall envisage a

project consisting in the extension of an underground line in the suburbs of Paris.

Effects of such a project are clearly very diverse. We will concentrate on some of

them here, leaving direct financial effects aside (construction costs, maintenance

costs, operating costs) although their evaluation may raise problems.


An inevitable step in all studies of this type is to forecast the modification of the

volume and the structure of the traffic that would follow the implementation of the

project. Its main benefits consist in time gains, which are obviously directly

related to traffic forecasts (time gains converted into m.u. frequently account for

more than 50% of the benefits of these types of projects).

Implementing such forecasting models is obviously an enormous task. Local

modifications in the supply of public transportation may have consequences for the
traffic in the whole region. Furthermore, such forecasts are usually made at an
early stage of development of the project, a stage in which all details (concerning
e.g. the fares charged on the new infrastructure or the frequency of the trains) may not

be completely decided yet.

Traffic forecast models usually involve highly complex modal choice modules

coupled with forecasting and/or simulation techniques. Their outputs are clearly

crucial for the rest of the study. Nearly all public transportation firms and gov-

ernmental agencies in France have developed their own tools for generating traffic

forecasts. They differ on many points, e.g. the statistical tools used for modal

choice or the segmentation of the population that is used (Boiteux 1994). Unsur-

prisingly these models lead to very different results.

As far as we know, all these models forecast the traffic for a period of time that

is not too distant from the installation of the new infrastructure. These forecasts

are then more or less mechanically updated (e.g. increased following the observed

rate of growth of the traffic in the past few years) in order to obtain figures for all

the periods of study. None of them seem to integrate the potential modifications

of behaviour of a significant proportion of the population in reaction to the new

infrastructure (e.g. by moving away from the centre of the city) whereas such

effects are well-known and have proved to be overwhelming in the past.

These models are not part of CBA and indicating their limitations should

not be seen as a criticism of CBA. Their results, however, form the basis of the

evaluation model.

Traffic forecasts are used to evaluate the time that inhabitants of the Paris region

would gain with the extension of the metro line. Such evaluations, on top of being

technically rather involved, raise some basic difficulties:

is one minute equal to one minute? Such a question may not be as silly as

it seems. In most models time gains are evaluated on the basis of what is

called generalised time i.e. a measure of time that accounts for elements of

(dis)comfort of the journey (e.g. temperature, stairs to be climbed, a more

or less crowded environment). Although this seems reasonable, much less
effort has been devoted to the study of models for converting time
into generalised time than to the price of time that will be used afterwards,


is one hour worth 60 times one minute? Most models evaluating and pricing

out time gains are strictly linear. This is dubious since some gains (e.g. 10

seconds per user-day) might well be considered insignificant. Furthermore,

the loss of one hour daily for some users may have a much greater impact

than 60 losses of 1 minute,

what is the value of time and how should time gains be converted into mon-

etary units? Should we take the fact that people have different salaries into

account? Should we rather use price based on stated preferences? Should

we take into account the fact that most surveys using stated preferences have

shown that the value of time highly depends on the motive of the journey

(being much lower for journeys not connected to work)?

The present practice in the Paris region is to linearly evaluate all (generalised)

time gains using the average hourly net salary in the Region (74 FRF/hour in 1994

or approximately 13 USD/hour or 13 €/hour). In view of the major uncertainties

surrounding traffic forecasts that are used to compute the time gains and the

arbitrariness of the price of time that is used, it does not seem unfair to consider

that such evaluations give, at best, interesting indications.
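The consequences of a strictly linear valuation of time gains can be illustrated with a short Python sketch. The price of 74 FRF per hour is the figure quoted above; the threshold variant, the threshold of 60 seconds and the two sets of gains are hypothetical and are only meant to show what a non-linear rule could look like.

```python
HOURLY_VALUE_FRF = 74.0  # average hourly net salary quoted in the text (1994)


def linear_value(seconds_gained, price_per_hour=HOURLY_VALUE_FRF):
    """Linear valuation: every second gained is priced identically."""
    return sum(seconds_gained) / 3600.0 * price_per_hour


def thresholded_value(seconds_gained, threshold_seconds, price_per_hour=HOURLY_VALUE_FRF):
    """Hypothetical variant: gains below the threshold are considered insignificant."""
    significant = [s for s in seconds_gained if s >= threshold_seconds]
    return linear_value(significant, price_per_hour)


many_small_gains = [10.0] * 360  # 360 journeys, each 10 seconds shorter
one_large_gain = [3600.0]        # a single journey one hour shorter

linear_small = linear_value(many_small_gains)
linear_large = linear_value(one_large_gain)
thresholded_small = thresholded_value(many_small_gains, threshold_seconds=60.0)
thresholded_large = thresholded_value(one_large_gain, threshold_seconds=60.0)
```

Under the linear rule both situations are worth 74 FRF; under the hypothetical threshold rule the 360 small gains are worth nothing. Which of the two rules is more appropriate is precisely the kind of question that the linear practice leaves unanswered.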

Important benefits of projects in public transportation are security gains (hope-

fully, using the metro is far less risky than driving a car). A first step consists in

evaluating, based on traffic forecasts, the gain of security in terms of the number

of (statistical) deaths and serious injuries that would be avoided annually by the

project. The next step consists in converting these figures into monetary units

through the use of a price for human life. The following figures are presently

used in France (in 1993 FRF; they should be divided by a little less than 6 in order

to obtain 1993 USD):

Serious injury 370 000 FRF

Other injury 79 000 FRF

these figures being based on several stated preference studies (it is not without

interest to note that these figures were quite different before 1993, human life

being, at that time, valued at 1 866 000 FRF). Using these figures and combining

them with statistical information concerning the occurrence of car accidents and

their severity leads to benefits in terms of security which amount to 0.08 FRF per

vehicle-km avoided in the Paris region.

Although this might not appear as a very pleasant subject of study, econo-

mists have developed many different methods for evaluating the value of human

life, including methods based on human capital, the value of life insurance con-

tracts, sums granted by courts following accidents, stated preference approaches,

revealed preference approaches including smoking and driving behaviour, wages


for activities involving risk (Viscusi 1992). Besides raising serious ethical
difficulties (Broome 1985), these studies exhibit incredible variations across
techniques and across seemingly similar countries (this explains why in many medical
studies, in which benefits mainly include lives saved, cost-effectiveness analysis
is often preferred to CBA since it does not require pricing out human life; see
Johannesson 1995, Weinstein and Stason 1977). We reproduce below some significant
figures for the value of life used in several European countries (this table
is adapted from Syndicat des Transports Parisiens 1998; all figures are in 1993
European Currency Unit (ECU), one 1993 ECU being approximately one 1993
USD):

Denmark 628 147 ECU

Finland 1 414 200 ECU

France 600 000 ECU

Germany 406 672 ECU

Portugal 78 230 ECU

Spain 100 529 ECU

Sweden 984 940 ECU

UK 935 149 ECU

The inclusion of other effects in the computation of the NPSV of a project in such

studies raises difficulties similar to the ones mentioned for time gains and security

gains. Their evaluation is subject to much uncertainty and inaccurate determina-

tion. Moreover the prices that are used to convert them into monetary units can

be obtained using many different methods leading to significantly different results.

As is apparent in Syndicat des Transports Parisiens (1998), prices used to

monetarise effects like:

noise,

local air pollution,

contribution to the greenhouse effect,

are mainly conventional.

The social discounting rate used for such projects is determined by the govern-

ment (the Commissariat General du Plan). Presently a rate of 8% is used (note

that this rate is about twice as high as the rate commonly used in Germany). A

period of evaluation of 30 years is recommended for this type of project.
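The practical effect of the choice of the discounting rate can be made concrete with a small Python sketch. The rate of 8% and the horizon of 30 years are the figures given above; the rate of 4% is only our reading of "about half" of 8% and should be taken as an assumption.

```python
def discount_weight(period: int, rate: float) -> float:
    """Weight given today to 1 m.u. received `period` years from now."""
    return 1.0 / (1.0 + rate) ** period


HORIZON = 30  # recommended period of evaluation, in years

weight_at_8_percent = discount_weight(HORIZON, 0.08)
weight_at_4_percent = discount_weight(HORIZON, 0.04)  # assumed lower rate

# Ratio between the weights given to the final year under the two rates
ratio = weight_at_4_percent / weight_at_8_percent
```

At 8%, one m.u. received in the final year of the evaluation period counts for roughly 0.10 m.u. today, against roughly 0.31 m.u. at the assumed rate of 4%: effects occurring late in the life of the project weigh about three times less under the higher rate.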

The conclusions and recommendations of a recent official report (Boiteux 1994)

on the evaluation of public transportation projects stated that:

although CBA has limitations, it remains the best way to evaluate such

projects,

5.4. CONCLUSIONS 83

computation of the NPSV,

all other effects should be described verbally. Monetarised effects and non

monetarised ones should not be included in a common table that would

give the same status and, implicitly, importance to all. A multiple criteria

presentation would furthermore attribute an unwarranted scientific value to

such tables,

order to allow meaningful comparisons,

In view of:

ing in the evaluation model,

the conclusion that CBA remains the best method seems unwarranted. CBA

has often been criticised on purely ideological grounds, which seems ridiculous.

However the insistence on seeing CBA as a scientific, rational and objective

evaluation model, all words that are frequently spotted in texts on CBA (Boiteux

1994), seems no more convincing.

5.4 Conclusions

CBA is an important decision/evaluation method. We would like to note in par-

ticular that:

basis. Contrary to many other decision/evaluation methods that are more or

less ad hoc, the users of CBA can rely on more than 50 years of theoretical

and practical investigations,

CBA emphasises the fact that decision and/or evaluation methods are not

context-free. Having emerged from economics, it is not surprising that mar-

kets and prices are viewed as the essential parts of the environment in CBA.

More generally, any decision/evaluation method that would claim to be

context-free would seem of limited interest to us,


providing simple tools allowing, in a decentralised way, to ensure a minimal

consistency between decisions taken by various public bodies. Any deci-

sion/evaluation model should tackle this problem,

CBA explicitly acknowledges that the effects of a project may be diverse

and that all effects should be taken into account in the model. In view of

the popularity of purely financial analyses for public sector projects, this is

worth recalling (Johannesson 1995),

although the implementation of CBA may involve highly complex models

(e.g. traffic forecasts), the underlying logic of the method is simple and easily

understandable,

CBA is a formal method of decision/evaluation. It is the belief and expe-

rience of the authors of this book that such methods may have a highly

beneficial impact on the treatment of highly complex questions. Although

other means of evaluation and of social co-ordination (e.g. negotiation, elec-

tions, exercise of power) clearly exist, formal methods based on an explicit

logic can provide invaluable contributions allowing sensitivity analyses, pro-

moting constructive dialogue and pointing out crucial issues.

We already mentioned that we disagree with the view held by some economists

that CBA is the only rational scientific and objective method for helping

decision-makers (such views are explicitly or implicitly present in Boiteux (1994)

or Mishan (1982)). We strongly recommend Dorfman (1996) as an antidote to this

radical position.

We shall stress here why we think that decision/evaluation models should not

be confused with CBA:

supporting decision/evaluation processes involves many more activities than

just evaluation. As we shall see in chapter 9, formulation is a basic

activity of any analyst. The determination of the frontiers of the study and

of the various stakeholders, the modelling of their objectives, the invention

of alternatives, form an important (we would tend to say a crucial) part of

any decision/evaluation support study. CBA offers little help at this stage.

Even worse, too radical an interpretation of CBA might lead (Dorfman 1996)

to an excessive attention given to monetarisation, which may be detrimental

to an adequate formulation,

having sound theoretical foundations, as CBA does, is probably a necessary
but insufficient condition to build useful decision/evaluation tools (let alone
the best ones). A recurrent theme in OR is that a successful implementation
of a model is contingent on many other factors than just the quality
of the underlying method. Creativity, flexibility and reactivity are essential
ingredients of the process. They do not always seem to be compatible
with too rigid a view on what a good decision/evaluation model should
be. Furthermore, the foundations of CBA are especially strong in situations
that are at variance with the usual context of public sector projects:


Brekke 1997, Holland 1995, Laslett 1995),

a decision/evaluation tool will be all the more useful if it lends itself
easily to insertion into a decision process. Decision processes involving

public sector projects are usually extremely complex. They last for years

and involve many stakeholders generally having conflicting objectives. CBA

tries to summarise the effects of complex projects into a single number. The

complex calculations leading to the NPSV use a huge amount of data with

varying levels of credibility. Merging rather uncontroversial information (e.g.

the number of deaths per vehicle-km in a given area) with much more sensitive

and debatable information (e.g. the price of human life) from the start might

not give many opportunities to stakeholders for reaching partial agreements

and/or for starting negotiations. This might also result in a model that might

not appear transparent enough to be really convincing (Nyborg 1998),

in simple terms (the NPSV) it might be argued that the efforts that have to

be made in order to monetarise all effects may not always be needed. On the

basis of less ambitious methods, it is not unlikely that some projects may be

easily discarded and/or that some clearly superior project will emerge. Even

when monetarisation is reasonably possible, it may not always be necessary,

in market-like mechanisms) tend to obscure the implicit weighting of the

various effects of a project. This leaves little room for political debate, which

might be an incentive for some stakeholders to simply discard CBA,

the additive linear structure of the implicit aggregation rule used in CBA

can be subjected to the familiar criticisms already mentioned in chapters 3

and 4. Probably all users of CBA would agree that an accident killing 10 000

people might result in a dramatic situation in which the costs incurred

have little relation with the costs of 10 000 accidents each resulting in one

loss of life (think of a serious nuclear accident compared to ordinary car

accidents). Similarly, they might be prepared to accept that there may exist

air pollution levels above which all mammal life on earth could be endangered

and that although these levels are multiples of those currently manipulated

in the evaluation of transportation projects, they may have to be priced out

quite differently. If there are limits to linearity, CBA offers almost no clue

as to where to place these limits. It would seem to be a heroic hypothesis to

suppose that such limits are simply never reached in practice,

zling. Although the possibility of including in the computation of the NPSV

individual weights (capturing a different impact on social welfare of indi-

vidual variations of income) exists (Brent 1984), it is hardly ever used in prac-

tice. Furthermore, this possibility is at much variance with more subtle views


Sarin 1991, Fishburn and Sarin 1994, Fishburn and Straffin 1989, Gafni and

Birch 1997, Schneider, Schieber, Eeckoudt and Gollier 1997, Weymark 1981),

the use of a simple social discounting rate as a surrogate for taking a clear
position on inter-generational equity issues is open to discussion. Even accepting
the rather optimistic view of a continuous increase of welfare and of
technical innovation, taking decisions today that will have important consequences
in 1000 years (think of the storage of used nuclear fuel) while using a
method that gives almost no weight to what will happen 60 years from now
($1/1.08^{60} \approx 1\%$) seems debatable (see Harvey 1992, Harvey 1994, Weitzman
1994),

the very idea that social preferences exist is open to question. We showed

in chapter 2 that elections were not likely to give rise to such a concept.

It seems hard to think of other forms of social co-ordination that could do

much better. We doubt that markets are such particular institutions that

they always make it possible to solve or bypass the problem in an undebatable way. But

if social preferences are ill-defined, the meaning of the NPSV of a project

is far from being obvious. We would argue that it gives, at best, a partial

and highly conventional view of the desirability of the project,

decision/evaluation models can hardly lead to convincing conclusions if el-

ements of uncertainty and inaccurate determination entering the model are

not explicitly dealt with. This is especially true in the context of the eval-

uation of public sector projects. Practical texts on CBA always insist on

the need for sensitivity analysis before coming to conclusions and recom-

mendations. Due to the amount of data of varying quality included in the

computation of the NPSV, sensitivity analysis is often restricted to studying

the impact of the variation of a few parameters on the NPSV, one parameter

varying at a time. This is rather far from what we could expect in such situ-

ations; a true robustness analysis should combine simultaneous variations

of all parameters in a given domain,
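The difference between varying one parameter at a time and varying all parameters simultaneously can be illustrated with a deliberately simplified Python sketch. The project (a single investment followed by a constant annual net benefit), the parameter ranges and the function names are hypothetical; only the rate of 8% and the horizon of 30 years are taken from the text.

```python
from itertools import product


def npsv(annual_net_benefit, investment, rate, horizon):
    """NPSV of a simplified project: one investment now, constant net benefits afterwards."""
    return -investment + sum(
        annual_net_benefit / (1.0 + rate) ** i for i in range(1, horizon + 1)
    )


base = {"annual_net_benefit": 100.0, "investment": 1000.0, "rate": 0.08, "horizon": 30}
ranges = {  # hypothetical domains of variation
    "annual_net_benefit": (90.0, 100.0, 110.0),
    "rate": (0.07, 0.08, 0.09),
}


def one_at_a_time(base, ranges):
    """Vary each parameter alone, all the others being kept at their base value."""
    results = {}
    for name, values in ranges.items():
        for value in values:
            results[(name, value)] = npsv(**dict(base, **{name: value}))
    return results


def simultaneous(base, ranges):
    """Vary all parameters together over the full grid of combinations."""
    names = list(ranges)
    results = {}
    for combo in product(*(ranges[name] for name in names)):
        results[combo] = npsv(**dict(base, **dict(zip(names, combo))))
    return results


worst_one_at_a_time = min(one_at_a_time(base, ranges).values())
worst_simultaneous = min(simultaneous(base, ranges).values())
```

With these figures the NPSV remains positive whenever a single parameter varies, but becomes negative when the lower benefit and the higher rate occur together. A grid is of course only one crude way of exploring a domain; with many parameters the number of combinations grows very quickly.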

These limitations should not be interpreted as implying a condemnation of

CBA. We consider them as arguments showing that, in spite of its many qual-

ities, CBA is far from exhausting the activity of supporting decision/evaluation

processes (Watson 1981). We are afraid to say that if you disagree on this point,

you might find the rest of this book of extremely limited interest. On the other

hand, if you expect to discover in the next chapters formal decision/evaluation

tools and methodologies that would solve all problems and avoid all difficulties

you should also realise that your chances of being disappointed are very high.

6 COMPARING ON THE BASIS OF SEVERAL ATTRIBUTES: THE EXAMPLE OF MULTIPLE CRITERIA DECISION ANALYSIS

How to choose a car is probably the multiple criteria problem example that has

been most frequently used to illustrate the virtues and possible pitfalls of multiple

criteria decision aiding methods. The main advantage of this example is that the

problem is familiar to most of us (except for one of the authors of this book who is

definitely opposed to owning a car) and it is especially appealing for male decision-

makers and analysts for some psychological reason. However, one can object that

in many illustrations, the problem is too roughly stated to be meaningful; the

motivations, needs, desires and/or fantasies of the potential buyer of a new or

second-hand car can be so diversified that it will be very difficult to establish a list

of relevant points of view and build criteria on which everybody would agree; the

price for instance is a very delicate criterion since the amount of money the buyer

is ready to spend clearly depends on his social condition. The relative importance

of the criteria also very much depends on the personal characteristics of the buyer:

there are various ideal types of car buyers, for instance people who like sporty
driving, or large comfortable cars, or reliable cars, or cars that are cheap to run.

One point should be made very clear: it is unlikely that a car could be universally

recognised as the best, even if one restricts oneself to a segment of the market;

this is a consequence of the existence of decision-makers with many different value

systems.

Despite these facts, we have chosen to use the "Choosing a car" example,

in a properly defined context, for illustrating the hypotheses underlying various

elementary methods for modelling and aggregating evaluations in a decision aiding

process. The case is simple enough to allow for a short but complete description;

it also offers sufficient potential for reasoning on quite general problems raised by

the treatment of multi-dimensional data in view of decision and evaluation. We

describe the context of the case below and will invoke it throughout this chapter

for illustrating a sample of decision aiding methods.


88 CHAPTER 6. COMPARING ON SEVERAL ATTRIBUTES

1 Fiat Tipo 2.0 ie 16V
2 Alfa 33 1.7 16V
3 Nissan Sunny 2.0 GTI 16
4 Mazda 323 GRSI
5 Mitsubishi Colt GTI
6 Toyota Corolla GTI 16
7 Honda Civic VTI 16
8 Opel Astra GSI 16
9 Ford Escort RS 2000
10 Renault 19 16S
11 Peugeot 309 GTI 16V
12 Peugeot 309 GTI
13 Mitsubishi Galant GTI 16
14 Renault 21 2.0 turbo

Our example is adapted from an unpublished report by a Belgian engineering

student who describes how he decided which car he would buy. The story dates

back to 1993; our student (call him Thierry), aged 21, is passionate about sports

cars and driving (he has taken lessons in sports car driving and participates in car

races). Being a student, he cannot afford to buy either a new car or a luxury

second-hand sports car; so he decides to explore the middle range segment, 4 year

old cars with powerful engines. Thierry intends to use the car in everyday life and

occasionally in competitions. His strategy is first to select the make and type of

the car on the basis of its characteristics, estimated costs and performances, then

to look for such a car in second hand car sale advertisements. This is what he

actually did, finding the rare pearl about twelve months after he made up his

mind as to which car he wanted.

The initial list of alternatives was selected taking an additional feature into

account. Thierry lives in town and does not have a garage to park the car in at

night. So he does not want a car that would be too attractive to thieves. This

explains why he discards cars like VW Golf GTI or Honda CRX. He thus limits

his selection of alternatives to the 14 cars listed in Table 6.1.

Selecting the relevant points of view and looking for or constructing indices

that reflect the performances of the alternatives for each of the viewpoints often

constitutes a long and delicate task; it is moreover a crucial one since the quality

of the modelling will determine the relevance of the model as a decision aiding

tool. Many authors have advocated a hierarchical approach to criteria building,

each viewpoint being decomposed into sub-points that can be further decomposed

6.1. THIERRYS CHOICE 89

(Keeney and Raiffa (1976), Saaty (1980)). A thorough analysis of the properties

required of the family of criteria selected in any particular context (consistent

family, i.e. exhaustive, non-redundant and monotonic) can be found in Roy and

Bouyssou (1993) (see also Bouyssou (1990), for a survey).

We shall not emphasise the process of selecting viewpoints in this chapter,

although it is a matter of importance. It is sufficient to say that Thierry's concerns

are very particular and that he accordingly selected five viewpoints related to cost

(criterion 1), performance of the engine (criteria 2 and 3) and safety (criteria 4

and 5).

Evaluations of the cars on these viewpoints have been obtained from monthly

journals specialised in the benchmarking of cars. The official quotation of second

hand vehicles of various ages is also published in such journals.

Evaluating the expenses incurred by buying and using a specific car is not as
straightforward as it may seem. Large variations from the estimation may occur
due to several uncertainty and risk factors such as the actual life-length of the
car, the actual selling price (in contrast to the official quotation), the actual
mileage per year, etc. Thierry evaluates the expenses as the sum of an initial
fixed cost and expenses resulting from using the car. The fixed costs are the
amount paid for buying the car, estimated by the official quotation of the 4-year
old vehicle, plus various taxes. The yearly costs involve another tax, insurance
and petrol consumption. Maintenance costs are considered roughly independent of
the car and hence neglected. Petrol consumption is estimated on the basis of three
figures that are highly conventional: the number of litres of petrol burned in
100 km is taken from the magazine benchmarks; Thierry somehow estimates his
mileage at 12 000 km per year and the price of petrol at 0.9 € per litre (1 €, the
European currency unit, is approximately equivalent to 1 USD). Finally he expects
(hopes) to use the car for 4 years. On the basis of these hypotheses he gets the
estimations of his expenses for using the car during 4 years that are reported in
Table 6.2 (Criterion 1 = Cost). The resale value of the car at the end of this
period, when it will be 8 years old, is not taken into account due to the high risk
of accidents resulting from Thierry's offensive driving style. Note that the petrol
consumption cost, which is estimated with a rather high degree of imprecision,
counts for about one third of the total cost. The purchase cost is also highly
uncertain.
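As a rough illustration of how such a cost figure is assembled, the following Python sketch adds the fixed costs to four years of running costs. Only the mileage (12 000 km per year), the petrol price (0.9 € per litre) and the 4-year horizon come from the text; the function name and the purchase price, taxes, insurance and consumption used in the example call are hypothetical placeholders, not Thierry's data.

```python
def four_year_cost(purchase_price, purchase_taxes, yearly_tax, yearly_insurance,
                   litres_per_100km, km_per_year=12_000, petrol_price=0.9, years=4):
    """Estimated expenses (in euros) for buying a car and using it for `years` years."""
    fixed = purchase_price + purchase_taxes
    yearly_petrol = litres_per_100km / 100 * km_per_year * petrol_price
    yearly = yearly_tax + yearly_insurance + yearly_petrol
    return fixed + years * yearly


# Hypothetical inputs, for illustration only (not taken from the case study).
total = four_year_cost(purchase_price=8_000, purchase_taxes=500,
                       yearly_tax=300, yearly_insurance=700, litres_per_100km=10)
petrol_share = (4 * 10 / 100 * 12_000 * 0.9) / total
print(total, petrol_share)
```

With these made-up inputs, an error of one litre per 100 km in the assumed consumption shifts the total by 432 €, which gives an idea of how sensitive the criterion is to its conventional ingredients.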

For building the other criteria Thierry has a large number of performance indices
whose values are to be found in the magazine benchmarks at his disposal.
Thierry's particular interest in sporty cars is reflected in his definition of the
other criteria. Car performances are evaluated by their acceleration; criterion 2
("Accel" in Table 6.2) encodes the time (in seconds) needed to cover a distance of
one kilometre starting from rest. One could alternatively have taken other
indicators such as power of the engine, time needed to reach a speed of 100 km/h
or to cover 400 metres, which are also widely available. Some of these values may
be imprecisely determined: they may be biased when provided by the car
manufacturer (the procedures for evaluating petrol consumption are standardised
but usually underestimate the actual consumption for everyday use); when provided
by specialised journalists in magazines, the procedures for measuring are
generally unspecified and might vary since the cars are not all evaluated by the
same person.

Table 6.2

Nr  Name of car         Cost (€)  Accel (s)  Pick up (s)  Brakes  Road-h
 1  Fiat Tipo             18 342       30.7         37.2    2.33    3
 2  Alfa 33               15 335       30.2         41.6    2       2.5
 3  Nissan Sunny          16 973       29           34.9    2.66    2.5
 4  Mazda 323             15 460       30.4         35.8    1.66    1.5
 5  Mitsubishi Colt       15 131       29.7         35.6    1.66    1.75
 6  Toyota Corolla        13 841       30.8         36.5    1.33    2
 7  Honda Civic           18 971       28           35.6    2.33    2
 8  Opel Astra            18 319       28.9         35.3    1.66    2
 9  Ford Escort           19 800       29.4         34.7    2       1.75
10  Renault 19            16 966       30           37.7    2.33    3.25
11  Peugeot 309 16V       17 537       28.3         34.8    2.33    2.75
12  Peugeot 309           15 980       29.6         35.3    2.33    2.75
13  Mitsubishi Galant     17 219       30.2         36.9    1.66    1.25
14  Renault 21            21 334       28.9         36.7    2       2.25

The third criterion that Thierry took into consideration is linked with the

pick up or suppleness of the engine in urban traffic; this dimension is considered

important since Thierry also intends to use his car in normal traffic. The indicator

selected to measure this dimension ("Pick up" in Table 6.2) is the time (in seconds)

needed for covering one kilometre when starting in fifth gear at 40 km/h. Again

other indicators could have been chosen (e.g. the torque). This dimension is not

independent of the second criterion, since they are generally positively correlated

(powerful engines generally lead to quick response times on both criteria); cars

that are specially prepared for competition may however lack suppleness in low

operation conditions which is quite unpleasant in urban traffic. So, from the point

of view of the user, i.e. in terms of preferences, criteria 2 and 3 reflect different

requirements and are thus both necessary. For a short discussion about the notions

of independence and interaction, the reader is referred to Section 6.2.4.

In the magazine's evaluation report, several other dimensions are investigated
such as comfort, brakes, road-holding behaviour, equipment, body, boot, finish,
maintenance, etc. For each of these, a number of aspects are considered: 10 for
comfort, 3 for brakes, 4 for road-holding, etc. In view of Thierry's particular
motivations, only the qualities of braking and of road-holding are of concern to
him and lead to the building of criteria 4 and 5 (resp. "Brakes" and "Road-h"
in Table 6.2). The 3 or 4 partial aspects of each viewpoint are evaluated on an
ordinal scale the levels of which are labelled "serious deficiency", "below
average", "average", "above average", "exceptional". To get an overall indicator
of braking quality (and also for road-holding), Thierry re-codes the ordinal
levels with integers from 0 to 4 and takes the arithmetic mean of the 3 or 4
numbers; this results in the figures with 2 decimals provided in the last two
columns of Table 6.2. Obviously these numbers are also imprecise, not necessarily
because of imprecision in the evaluations but because of the arbitrary character
of the cardinal re-coding of the ordinal information and its aggregation via an
arithmetic mean (postulating implicitly that, in some sense, the 3 or 4 components
of each viewpoint are equally important and the levels of each of their scales
are equally spaced). We shall however consider that these figures reflect, in some
way, the behaviour of each car from the corresponding viewpoint; it is clear
however that not too much confidence should be placed in the precision of these
evaluations.

Note that the first 3 criteria have to be minimised while the last 2 must be

maximised.

This completes the description of the data which, obviously, are not "given"

but selected and elaborated on the basis of the available information. An

appreciation (more or less explicit) of their degree of precision and their

reliability is intrinsically part of these data.

In the second part of the presentation of this case, Thierry will provide information

about his preferences. In fact, in the relatively simple decision situation he was

facing (no wife, no boss, Thierry decides for himself and the consequences of his

decision should not affect him crucially), he was able to make up his mind without

using any formal aggregation method. Let us follow his reasoning.

First of all he built a graphic representation of the data. Many types of

representations can be thought of; popular spreadsheet software offers a large number

of graphical options for representing multi-dimensional data. Figure 6.1 shows

such a representation. Note that the evaluations for the various criteria have been

re-scaled in view of a better readability of the figure. The values for all criteria

have been mapped (linearly) onto intervals of length 2, the first criterion being

represented in the [0, 2] interval, the second criterion, in the [2, 4] interval and so

on. For each criterion, the lowest evaluation observed for the sample of cars is

mapped on the lower bound of the interval while the highest value is represented

on the upper bound of the interval. Such a transformation of the data is not

always innocent; we briefly discuss this point below.
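The re-scaling described above can be sketched in a few lines of Python. This is only an illustration of the linear mapping onto stacked intervals of length 2; the function name `rescale_to_band` is a hypothetical helper, and the data are the Cost and Accel columns of Table 6.2.

```python
def rescale_to_band(values, band_index, width=2.0):
    """Map values linearly onto [width * band_index, width * (band_index + 1)].

    The lowest observed value goes to the lower bound of the band and the
    highest observed value to the upper bound.
    """
    lo, hi = min(values), max(values)
    return [width * band_index + width * (v - lo) / (hi - lo) for v in values]


# Cost (euros) and acceleration (seconds) of the 14 cars, in the order of Table 6.2.
cost = [18342, 15335, 16973, 15460, 15131, 13841, 18971,
        18319, 19800, 16966, 17537, 15980, 17219, 21334]
accel = [30.7, 30.2, 29.0, 30.4, 29.7, 30.8, 28.0,
         28.9, 29.4, 30.0, 28.3, 29.6, 30.2, 28.9]

cost_band = rescale_to_band(cost, 0)    # values in [0, 2]
accel_band = rescale_to_band(accel, 1)  # values in [2, 4]
print(cost_band[10], accel_band[10])    # car 11 on both re-scaled axes
```

Because the bounds are taken from the observed sample, adding or removing a car with an extreme value changes the picture for all the others, which is one reason why the transformation is not innocent.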

In view of reaching a decision, Thierry first discards the cars whose braking

efficiency and road-holding behaviour are definitely unsatisfactory, i.e. car numbers

4, 5, 6, 8, 9, 13. The reason for such an elimination is that a powerful engine is

needless in competition if the chassis is not good enough and does not guarantee

good road-holding; efficient brakes are also needed to keep the risk inherent to

competition at a reasonable level. The rules for discarding the above mentioned

cars have not been made explicit by Thierry in terms of unattained levels on the

corresponding scales. Rules that would restate the set of remaining cars are for

instance:

    criterion 4 ≥ 2

and

    criterion 5 ≥ 2

with at least one strict inequality.

[Figure 6.1: Performance diagram of all cars along the first three criteria (above; to be minimised) and the last two (below; to be maximised)]

Looking at the performances of the remaining cars, those labelled 1, 2, 10 are

further discarded. The set of remaining cars is restated for instance by the rule:

criterion 2 < 30

Finally, the car labelled 14 is eliminated since it is dominated by car number 11.

"Dominated by car 11" means that car 11 is at least as good on all criteria and

better on at least one criterion (here all of them!). Notice that car number 14

would not have been dominated if other criteria had been taken into consideration

such as comfort or size: this car is indeed bigger and more classy than the other

cars in the sample.
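The elimination steps described above can be replayed on the data of Table 6.2. The sketch below is one possible encoding, not Thierry's own procedure: the function names are hypothetical, the screening rules are the illustrative ones given in the text (criteria 4 and 5 at least 2 with at least one strict inequality, criterion 2 below 30), and the three steps are merged into a single pass followed by a dominance check.

```python
# (cost, accel, pick up, brakes, road-holding) for each car number, from Table 6.2.
CARS = {
    1: (18342, 30.7, 37.2, 2.33, 3.0),    2: (15335, 30.2, 41.6, 2.0, 2.5),
    3: (16973, 29.0, 34.9, 2.66, 2.5),    4: (15460, 30.4, 35.8, 1.66, 1.5),
    5: (15131, 29.7, 35.6, 1.66, 1.75),   6: (13841, 30.8, 36.5, 1.33, 2.0),
    7: (18971, 28.0, 35.6, 2.33, 2.0),    8: (18319, 28.9, 35.3, 1.66, 2.0),
    9: (19800, 29.4, 34.7, 2.0, 1.75),   10: (16966, 30.0, 37.7, 2.33, 3.25),
    11: (17537, 28.3, 34.8, 2.33, 2.75), 12: (15980, 29.6, 35.3, 2.33, 2.75),
    13: (17219, 30.2, 36.9, 1.66, 1.25), 14: (21334, 28.9, 36.7, 2.0, 2.25),
}


def passes_screening(car):
    """Conjunctive rules: safe enough (criteria 4 and 5) and fast enough (criterion 2)."""
    _, accel, _, brakes, road = car
    safe = brakes >= 2 and road >= 2 and (brakes > 2 or road > 2)
    return safe and accel < 30


def dominates(a, b):
    """True if a is at least as good as b on all criteria and better on at least one."""
    # Criteria 1 to 3 are minimised, 4 and 5 maximised: flip signs so larger is better.
    va = (-a[0], -a[1], -a[2], a[3], a[4])
    vb = (-b[0], -b[1], -b[2], b[3], b[4])
    return all(x >= y for x, y in zip(va, vb)) and any(x > y for x, y in zip(va, vb))


screened = {n: c for n, c in CARS.items() if passes_screening(c)}
remaining = sorted(n for n, c in screened.items()
                   if not any(dominates(o, c) for m, o in screened.items() if m != n))
print(remaining)  # [3, 7, 11, 12]
```

Car 14 survives the screening rules and is removed only by the dominance check, as in the text.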

The cars left after the above elimination process are those labelled 3, 7, 11 and 12;

their performances are shown on Figure 6.2. In these star-diagrams each car is

represented by a pentagon; their values on each criterion have all been linearly

re-scaled, being mapped on the [1, 3] interval. The choice of interval [1, 3] instead

of interval [0, 2] is dictated by the mode of representation: the value 0 plays a

special role since it is common to all axes; if an alternative was to receive a 0 value

on several criteria, those evaluations would all be represented by the origin, which

makes the graph less readable. On each axis, the value 1 corresponds to the lowest

value for one of the cars in the initial set of 14 alternatives on each criterion; the

value 3 corresponds to the highest value for one of the 14 cars. In interpreting the

diagrams, remember that criteria 1, 2 and 3 are to be minimised while the others

have to be maximised.

Thierry did not use the latter diagram (Figure 6.2); he drew the same diagram

as in Figure 6.1 instead after reordering the cars; the 4 candidate cars were all

put on the right of the diagram as shown in Figure 6.3; in this way Thierry was

still able to compare the difference in the performances of two candidate cars for a

criterion to typical differences for that criterion in the initial sample. This suggests

that the evaluations of the selected cars should not be transformed independently

of the values of the cars in the initial set; these still constitute reference points in

relation to which the selected cars are evaluated. On Figure 6.4, for the reader's

convenience, we show a close-up of Figure 6.3 that is focused on the 4 selected

cars only.

Thierry first eliminates car number 12 on the basis of its relative weakness

on the second criterion (acceleration). Among the 3 remaining cars the one he

chooses is number 11. Here are the reasons for this decision.

1. Comparing cars 3 and 11, Thierry considers that the price difference (about
500 €) is worth the gain (0.7 second) on the acceleration criterion.

2. Comparing cars 7 and 11, he considers that the cost difference (car 7 about
1 500 € more expensive) is not balanced by the small advantage on acceleration
(0.3 second) coupled with a definite disadvantage (0.8 second) on suppleness.

[Figure 6.2: Star graph of the performances of the 4 cars left after the elimination process. Each panel has five axes: crit 1 (cost), crit 2 (accel), crit 3 (supple), crit 4 (brakes), crit 5 (road-h).]

Nr  Name of car     Cost    Acc   Pick  Brakes  Road
 3  Nissan Sunny    16 973  29    34.9  2.66    2.5
 7  Honda Civic     18 971  28    35.6  2.33    2
11  Peugeot 16V     17 537  28.3  34.8  2.33    2.75
12  Peugeot         15 980  29.6  35.3  2.33    2.75

[Figure 6.3: Performance diagram of all cars; the 4 candidate cars stand on the right]

[Figure 6.4: Detail of Figure 6.3: the 4 cars remaining after initial screening]

Comments

Thierry's reasoning process can be analysed as being composed of two steps. The
first one is a screening process in which a number of alternatives are discarded
on the basis of the fact that they do not reach aspiration levels on some criteria.
Notice that these levels have not been set a priori as minimal levels of
satisfaction; they have been set after having examined the whole set of
alternatives, to a value that could be described as both desirable and accessible.
The rules that have been used for eliminating certain alternatives have exclusively
been combined in conjunctive mode, since an alternative is discarded as soon as it
fails to fulfil one of the rules.

More sophisticated modes of combinations may be envisaged, for instance mixing

up conjunctive and disjunctive modes with aspiration levels defined for subsets

of criteria (see Fishburn (1978) and Roy and Bouyssou (1993), pp. 264-266).

Another elementary method that has been used is the elimination of dominated

alternatives (car 11 dominates car 14).

In the second step of Thierry's reasoning,

1. Criteria 4 and 5 were not invoked; there are several possible reasons for this:

criteria 4 and 5 might be of minor importance or considered satisfactory

once a certain level is reached; they could be insufficiently discriminating for

the considered subset of cars (this is certainly the case for criterion 4): the

values of the differences for the set of candidate cars could be such that they

are not large enough to balance the differences on other criteria.

between pairs of cars on 2 or 3 criteria results in an advantage to one of the

cars in the pair.

6.2. THE WEIGHTED SUM 97

3. The reasoning is not made on the basis of re-coded values like those used

in the graphics; more intuition is needed, which is better supported by the

original scales. Since criteria 4 and 5 are aggregates and, thus, are not

expressed in directly interpretable units, this might also have been a reason

for not exploiting them in the final selection.

is at the heart of the activity of modelling preferences and aggregating them in

order to have an informed decision process. In the simple case we are dealing with

here, the small number of alternatives and criteria has allowed Thierry to make up

his mind without having to build a formal model of his preferences. We have seen,

however, that after the first step consisting in the elimination of unsatisfactory

alternatives, the analysis of the remaining four cars has been much more delicate.

Note also that if Thierry's goal had been to rank the cars in order of

decreasing preference, it is not certain that the kind of reasoning he used for just

choosing the best alternative for him would have fit the bill. In more complex

situations (when more alternatives remain after an initial elimination or more

criteria have to be considered or if a ranking of the alternatives is wanted), it may

appear necessary to use tools for modelling preferences.

There is another rather frequent circumstance in which more formal methods
are mandatory: if the decision-maker is bound to justify his decision to other
persons (shareholders, colleagues, etc.), the evaluation system should be more
systematic, for instance being able to cope with new alternatives that could be
suggested by the other people.

In the rest of this chapter, we discuss a few formal methods commonly used for

aggregating preferences. We report on how Thierry applied some of them to his

case and extrapolate on how he could have used the others. This can be viewed as

an ex post analysis of the problem, since the decision was actually made well before

Thierry became aware of multiple criteria methods. In his ex post justification

study, Thierry has in addition tried to derive a ranking of the alternatives that

would reflect his preferences.

When dealing with multi-dimensional evaluations of alternatives, the basic and

almost natural (or perhaps, cultural?) attitude consists in trying to build a

one-dimensional synthesis, which would reflect the value of the alternatives on a

synthetic "super scale" of evaluation. This attitude is perhaps inherited from school

practice where all other performance evaluations of the pupils have long been (and

often still are) summarised in a single figure, a weighted average of their grades in

the various subjects. The problems raised by such a practice have been discussed

in depth in Chapter 3. We discuss the application of the weighted sum to the car

example below, emphasising the very strong hypotheses underlying the use of this

type of approach.

Starting from the standard situation of a set of alternatives $a \in A$ evaluated
on $n$ points of view by a vector $g(a) = (g_1(a), g_2(a), \ldots, g_n(a))$, we
consider the value $f(a)$ obtained by combining the components of $g(a)$ linearly,
each with its weight $k_i$:

(6.1)  $f(a) = k_1 g_1(a) + k_2 g_2(a) + \cdots + k_n g_n(a)$

Suppose, without loss of generality, that all criteria are to be maximised, i.e. the
larger the value $g_i(a)$, the better the alternative $a$ on criterion $i$ (if, on
the contrary, $g_i$ were to be minimised, substitute $g_i$ by $-g_i$ or use a
negative weight $k_i$). Once the weights $k_i$ have been determined, choosing an
alternative becomes straightforward: the best alternative is the one associated
with the largest value of $f$. Similarly, a ranking of the alternatives is obtained
by ordering them in decreasing order of the value of $f$.

This simple and most commonly used procedure relies however on very strong

hypotheses that can seldom be considered plausibly satisfied. These problems

appear very clearly when trying to use the weighted sum approach on the car

example.

A look at the evaluations of the cars (see Table 6.2) prompts a remark that was
already made when we considered representing the data graphically. The ranges
of variation on the scales are very heterogeneous: from 13841 to 21334 on the cost
criterion; from 1.33 to 2.66 on criterion 4. Clearly, asking for values of the
weights $k_i$ in terms of the "relative importance" of the criteria without
referring to the scales would yield absurd results. The usual way out consists in
normalising the values on the scales but there are several manners of doing this.
One consists in dividing $g_i$ by the largest value on the $i$th scale,
$g_{i,\max}$; alternatively one might subtract the minimal value $g_{i,\min}$ and
divide by the range $g_{i,\max} - g_{i,\min}$. These normalisations of the original
$g_i$ functions are respectively denoted $g_i'$ and $g_i''$ in the following
formulae:

(6.2)  $g_i'(a) = \dfrac{g_i(a)}{g_{i,\max}}$

(6.3)  $g_i''(a) = \dfrac{g_i(a) - g_{i,\min}}{g_{i,\max} - g_{i,\min}}$

For simplicity, we suppose here that the $g_i$ are positive. In the former case the
maximal value of $g_i'$ will be 1 while the value 0 is kept fixed, which means that
the ratio of the evaluations of any pair $a$, $b$ of alternatives remains unaltered:

(6.4)  $\dfrac{g_i'(a)}{g_i'(b)} = \dfrac{g_i(a)}{g_i(b)}$

This transformation can be advocated when using ratio scales, in which the value
0 plays a special role. Statements such as "alternative $a$ is twice as good as $b$
on criterion $i$" remain valid after transformation.

In the case of $g_i''$, the top evaluation will be mapped onto 1 while the bottom
one goes onto 0; ratios are not preserved but ratios of differences of evaluations
are:

(6.5)  $\dfrac{g_i''(a) - g_i''(b)}{g_i''(c) - g_i''(d)} = \dfrac{g_i(a) - g_i(b)}{g_i(c) - g_i(d)}$

Such a transformation is appropriate for interval scales; it does not alter the
validity of statements like "the difference between $a$ and $b$ on criterion $i$ is
twice the difference between $c$ and $d$".

Note that the above are not the only possible options for transforming the data;

note also that these transformations depend on the set of alternatives: considering

the 14 cars of the initial sample or the 4 cars retained after the first elimination

would yield substantially different results since the values $g_{i,\min}$ and $g_{i,\max}$ depend

on the set of alternatives.
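The two normalisations, and their dependence on the reference set, can be checked numerically. The following Python sketch is only an illustration: the function names are hypothetical, and the data are the acceleration times of Table 6.2, once for the 4 candidate cars and once for the whole sample.

```python
def normalise_by_max(values):
    """Formula 6.2: divide by the largest value; ratios of evaluations are preserved."""
    g_max = max(values)
    return [v / g_max for v in values]


def normalise_by_range(values):
    """Formula 6.3: subtract the minimum, divide by the range; ratios of differences are preserved."""
    g_min, g_max = min(values), max(values)
    return [(v - g_min) / (g_max - g_min) for v in values]


# Acceleration times (s): cars 3, 7, 11, 12, then all 14 cars in the order of Table 6.2.
accel_4 = [29.0, 28.0, 28.3, 29.6]
accel_14 = [30.7, 30.2, 29.0, 30.4, 29.7, 30.8, 28.0,
            28.9, 29.4, 30.0, 28.3, 29.6, 30.2, 28.9]

# Car 3 (29 s) receives a different normalised value in each reference set.
print(normalise_by_max(accel_4)[0], normalise_by_max(accel_14)[2])
print(normalise_by_range(accel_4)[0], normalise_by_range(accel_14)[2])
```

The same raw performance of 29 seconds is thus represented by four different numbers, depending on the formula chosen and on the set of cars used as a reference.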

Suppose we consider that 0 plays a special role in all scales and we choose the
first transformation option. The values of the $g_i'$ that are obtained are shown
in Table 6.4. A set of weights has been chosen which is, to some extent, arbitrary
but seems compatible with what is known about Thierry's preferences and
priorities. The first three criteria receive negative weights, namely −1, −2 and
−1 respectively (since they have to be minimised), while the last two are given
the weight 0.5. The alternatives are listed in Table 6.4 in decreasing order of the
values of $f$. As can be seen in the last column of Table 6.4, this rough
assignment of weights yields car number 3 as first choice, followed immediately by
car number 11, which was actually Thierry's choice. Moreover, the difference in
the values of $f$ for those two cars is tiny (less than 0.01) but we have no idea
as to whether such a difference is meaningful; all we can do is be very prudent in
using such a ranking since the weights were chosen in a rather arbitrary manner.
It is likely that by varying the weights slightly from their present value, one
would readily get rank reversals, i.e. permutations of alternatives in the order
of preference; in other words, the ranking is not very stable. Varying the values
that are considered imprecisely determined is what is called sensitivity analysis;
it helps to detect what the stable conclusions in the output of a model are; this
is certainly a crucial activity in a decision aiding process.
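A very small sensitivity analysis can be sketched as follows. The code recomputes the weighted sum of the evaluations normalised by their maxima (formula 6.2) for the data of Table 6.2, first with the weights used in the text and then with a perturbed weight on acceleration. The function name `ranking` and the perturbed value −2.2 are assumptions chosen for illustration.

```python
# (cost, accel, pick up, brakes, road-holding) for each car number, from Table 6.2.
CARS = {
    1: (18342, 30.7, 37.2, 2.33, 3.0),    2: (15335, 30.2, 41.6, 2.0, 2.5),
    3: (16973, 29.0, 34.9, 2.66, 2.5),    4: (15460, 30.4, 35.8, 1.66, 1.5),
    5: (15131, 29.7, 35.6, 1.66, 1.75),   6: (13841, 30.8, 36.5, 1.33, 2.0),
    7: (18971, 28.0, 35.6, 2.33, 2.0),    8: (18319, 28.9, 35.3, 1.66, 2.0),
    9: (19800, 29.4, 34.7, 2.0, 1.75),   10: (16966, 30.0, 37.7, 2.33, 3.25),
    11: (17537, 28.3, 34.8, 2.33, 2.75), 12: (15980, 29.6, 35.3, 2.33, 2.75),
    13: (17219, 30.2, 36.9, 1.66, 1.25), 14: (21334, 28.9, 36.7, 2.0, 2.25),
}


def ranking(weights, cars=CARS):
    """Car numbers in decreasing order of f = sum_i k_i * g_i / g_i,max."""
    maxima = [max(c[i] for c in cars.values()) for i in range(5)]

    def f(car):
        return sum(k * g / m for k, g, m in zip(weights, car, maxima))

    return sorted(cars, key=lambda n: f(cars[n]), reverse=True)


base = ranking((-1, -2, -1, 0.5, 0.5))         # weights used in the text
perturbed = ranking((-1, -2.2, -1, 0.5, 0.5))  # hypothetical small change on acceleration
print(base[:3], perturbed[:3])
```

With the original weights car 3 comes first and car 11 second; making the acceleration weight slightly more negative is enough to swap them, which illustrates how fragile the top of the ranking is.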

Weights depend on scaling

To illustrate the lack of stability of the ranking obtained, let us consider Table
6.5 where the set of alternatives is reduced to the 4 cars remaining after the
elimination procedure; the re-scaling of the criteria yields values of $g_i'$ that
are not the same as in Table 6.4 since $g_{i,\max}$ depends on the set of
alternatives. This perturbation, without any change in the values of the weights,
is sufficient to cause a rank reversal between the leading two alternatives. Of
course, one could prevent such a drawback by using a normalising constant that
would not depend on the set of alternatives, for instance the worst acceptable
value (minimal requirement for a performance to be maximised; maximal level of a
variable to be minimised, a cost, for instance) on each criterion; with such an
option, the source of the lack of stability would be the imprecision in the
determination of the worst acceptable value. Notice that the above problem has
already been discussed in Chapter 4, Section 4.1.1.

Table 6.4

    Weights k_i         -1     -2     -1    0.5    0.5   Value
Nr  Name of car       Cost  Accel   Pick   Brak   Road       f
 3  Nissan Sunny      0.80   0.94   0.84   1.00   0.77   -2.63
11  Peugeot 16V       0.82   0.92   0.84   0.88   0.85   -2.64
12  Peugeot           0.75   0.96   0.85   0.88   0.85   -2.66
10  Renault 19        0.80   0.97   0.91   0.88   1.00   -2.71
 7  Honda Civic       0.89   0.91   0.86   0.88   0.62   -2.82
 1  Fiat Tipo         0.86   1.00   0.89   0.88   0.92   -2.85
 5  Mitsu Colt        0.71   0.96   0.86   0.62   0.54   -2.91
 2  Alfa 33           0.72   0.98   1.00   0.75   0.77   -2.92
 8  Opel Astra        0.86   0.94   0.85   0.62   0.62   -2.96
 6  Toyota            0.65   1.00   0.88   0.50   0.62   -2.97
 4  Mazda 323         0.72   0.99   0.86   0.62   0.46   -3.02
 9  Ford Escort       0.93   0.95   0.83   0.75   0.54   -3.03
14  Renault 21        1.00   0.94   0.88   0.75   0.69   -3.04
13  Mitsu Galant      0.81   0.98   0.89   0.62   0.38   -3.15

Table 6.5

    Weights k_i         -1     -2     -1    0.5    0.5   Value
Nr  Name of car       Cost  Accel   Pick   Brak   Road       f
11  Peugeot 16V       0.92   0.96   0.98   0.88   1.00  -2.876
 3  Nissan Sunny      0.89   0.98   0.98   1.00   0.91  -2.890
12  Peugeot           0.84   1.00   0.99   0.88   1.00  -2.896
 7  Honda Civic       1.00   0.95   1.00   0.88   0.73  -3.090
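The rank reversal between Tables 6.4 and 6.5 can be reproduced with the sketch below, which scores the 4 candidate cars twice with the same weights: once normalising by the maxima of the 14-car sample and once by the maxima of the 4 candidates. The names `rank`, `MAX_14` and `max_4` are hypothetical; the data and the column maxima are those of Table 6.2.

```python
# (cost, accel, pick up, brakes, road-holding) of the 4 candidate cars, from Table 6.2.
CANDIDATES = {
    3: (16973, 29.0, 34.9, 2.66, 2.5),
    7: (18971, 28.0, 35.6, 2.33, 2.0),
    11: (17537, 28.3, 34.8, 2.33, 2.75),
    12: (15980, 29.6, 35.3, 2.33, 2.75),
}
WEIGHTS = (-1, -2, -1, 0.5, 0.5)
MAX_14 = (21334, 30.8, 41.6, 2.66, 3.25)  # column maxima over the 14 cars of Table 6.2


def rank(cars, maxima, weights=WEIGHTS):
    """Car numbers in decreasing order of the weighted sum of g_i / maxima[i]."""
    score = {n: sum(k * g / m for k, g, m in zip(weights, c, maxima))
             for n, c in cars.items()}
    return sorted(score, key=score.get, reverse=True)


max_4 = tuple(max(c[i] for c in CANDIDATES.values()) for i in range(5))
print(rank(CANDIDATES, MAX_14))  # reference set: the 14 cars
print(rank(CANDIDATES, max_4))   # reference set: the 4 candidates
```

Nothing changes between the two calls except the normalising constants, yet cars 3 and 11 exchange their positions at the top.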

Conventional codings

Another comment concerns the figures used for evaluating the performances of the

cars on criteria 4 and 5. Recall that those were obtained by averaging equally

spaced numerical codings of an ordinal scale of evaluation. The obtained figures

presumably convey a less quantitative and more conventional meaning than for

instance acceleration performances measured in seconds in standardisable (if not

standardised) trials. These figures however are treated in the weighted sum just

like the more quantitative ones associated with the first three criteria. In

particular, other codings of the ordinal scale might have been envisaged, for instance

codings with unequal intervals separating the levels on the ordinal scale. Some of

these codings could obviously have changed the ranking.
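The influence of the coding can be made concrete with a small Python sketch. The two cars and their ratings on three braking aspects are hypothetical, as is the second coding; only the five ordinal levels and the equally spaced coding from 0 to 4 come from the text.

```python
LEVELS = ["serious deficiency", "below average", "average",
          "above average", "exceptional"]
EQUAL = dict(zip(LEVELS, [0, 1, 2, 3, 4]))    # equally spaced coding used in the text
UNEQUAL = dict(zip(LEVELS, [0, 1, 2, 3, 6]))  # hypothetical coding favouring "exceptional"


def mean_code(aspects, coding):
    """Arithmetic mean of the numerical codes of the ordinal levels."""
    return sum(coding[a] for a in aspects) / len(aspects)


# Two hypothetical cars rated on three braking aspects.
car_a = ["average", "average", "exceptional"]
car_b = ["above average", "above average", "above average"]

for coding in (EQUAL, UNEQUAL):
    print(mean_code(car_a, coding), mean_code(car_b, coding))
```

Under the equally spaced coding car B obtains the higher average; under the second coding car A does, although the ordinal information is exactly the same in both cases.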

What is the exact significance of the weights in the weighted sum model? The
weights have a very precise and quantitative meaning; they are trade-offs: to
compensate for a disadvantage of $k_i$ units for criterion $j$, you need an
advantage of $k_j$ units for criterion $i$. An important consequence is that the
weights depend on the determination of the unit on each scale. In a weighted sum
model that would directly use the evaluations of the alternatives given in Table
6.2, it is clear that the weight of criterion 2 (acceleration time) has to be
multiplied by 60 if times are expressed in minutes instead of seconds. This was
implicitly a reason for normalising the evaluations as was done through formulae
6.2 and 6.3. After transformation, both $g_i'$ and $g_i''$ are independent of the
choice of a unit; yet they are not identical and, in a consistent model, their
weights should be different. Indeed, we have

(6.6)  $g_i''(a) = \dfrac{g_{i,\max}}{g_{i,\max} - g_{i,\min}}\, g_i'(a) + \delta_i = \kappa_i\, g_i'(a) + \delta_i$

where $\delta_i$ is a constant. Additive constants do not matter since they do not
alter the ranking. So, unless $g_{i,\min} = 0$, $g_i''$ is essentially related to
$g_i'$ by a multiplicative factor $\kappa_i \neq 1$; in order to model the same
preferences through a weighted sum of the $g_i''$ and a weighted sum of the $g_i'$,
the weight $k_i''$ of $g_i''$ should be obtained by dividing the weight $k_i'$ by
$\kappa_i$. Obviously, the weights have to be assessed in relation to a particular
determination of the evaluations on each scale and eliciting them in practice is a
complex task. In any case, they certainly cannot be evaluated in a meaningful
manner through naive questions about the "relative importance" of the criteria;
reference to the underlying scale is essential.
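The relation between the two sets of weights can be verified numerically. In the sketch below, which uses the 4 candidate cars of Table 6.2 and the weights chosen earlier for the first normalisation, the weights for the second normalisation are obtained by dividing by the factor $g_{i,\max} / (g_{i,\max} - g_{i,\min})$; the two weighted sums then differ by the same constant for every car. The variable and function names are hypothetical.

```python
# (cost, accel, pick up, brakes, road-holding) of the 4 candidate cars, from Table 6.2.
CARS = {
    3: (16973, 29.0, 34.9, 2.66, 2.5),
    7: (18971, 28.0, 35.6, 2.33, 2.0),
    11: (17537, 28.3, 34.8, 2.33, 2.75),
    12: (15980, 29.6, 35.3, 2.33, 2.75),
}
K_PRIME = (-1, -2, -1, 0.5, 0.5)  # weights attached to the evaluations divided by their maxima

columns = list(zip(*CARS.values()))
g_min = [min(col) for col in columns]
g_max = [max(col) for col in columns]
kappa = [mx / (mx - mn) for mn, mx in zip(g_min, g_max)]
k_second = [k / kp for k, kp in zip(K_PRIME, kappa)]  # weights for the range normalisation


def f_prime(car):
    """Weighted sum of the evaluations divided by their maxima (formula 6.2)."""
    return sum(k * g / mx for k, g, mx in zip(K_PRIME, car, g_max))


def f_second(car):
    """Weighted sum of the range-normalised evaluations (formula 6.3), converted weights."""
    return sum(k * (g - mn) / (mx - mn)
               for k, g, mn, mx in zip(k_second, car, g_min, g_max))


gaps = [f_second(c) - f_prime(c) for c in CARS.values()]
print(max(gaps) - min(gaps))  # numerically zero: both models rank the cars identically
```

Using the weights `K_PRIME` unchanged with the second normalisation would, by contrast, describe a different set of trade-offs.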

Up to this point we have considered the influence on the weights of multiplying

the evaluations by a positive constant. Note that translating the origin of a scale

has no influence on the ranking of the alternatives provided by the weighted sum

since it results in adding a (positive or negative) constant to f , the same for all

alternatives. There is still a very important observation that has to be made:

all scales used in the model are implicitly considered linear in the sense that

equal differences in values on a criterion result in equal differences in the overall

evaluation function f and this does not depend on the position of the interval

of values corresponding to that difference on the scale. For instance in the car

example, car number 12 is finally eliminated because it accelerates too slowly. The

difference between car 12 and car 3 with respect to acceleration is 0.6 between 29

seconds and 29.6 seconds. Does Thierry perceive this difference as almost equally

important as a difference of 0.7 between cars 11 and 3, the latter difference being

positioned between 28.3 seconds and 29 seconds on the acceleration scale? It seems

rather clear from Thierry's motivations, that coming close to a performance of 28

seconds is what matters to him while cars above 29 seconds are unworthy. This

means that the gain for passing from 29.6 seconds to 29 seconds has definitely less

value than a gain of similar amplitude, say from 29 to 28.3 seconds. As will be

confirmed in the sequel (see Section 6.3 below), it is very unlikely that Thierry's

preferences are correctly modelled by a linear function of the current scales of

performance.

Independence or interaction

The next issue is more subtle. Evaluations of the alternatives for the various points

of view taken into consideration by the decision-maker often show correlations; this

is because the attributes that are used to reflect these viewpoints are often linked

by logical or factual interdependencies. For instance, indicators of cost, comfort

and equipment, which may be used as attributes for assessing the alternatives for

those viewpoints, are likely to be positively correlated. This does not mean that

the corresponding points of view are redundant and that one should eliminate

some of them. One is perfectly entitled to work with attributes that are (even

strongly) correlated. That is the first point.

A second point is about independence. In order to use a weighted sum, the

viewpoints should be independent, but not in the statistical sense implying that

the evaluations of the alternatives should be uncorrelated! They should be in-

dependent with respect to preferences. In other words, if two alternatives that

share the same profile on a subset of criteria compare in a certain way in terms of

overall preferences, their relative position should not be altered when the profile

they share on a subset of criteria is substituted by any other common profile. On

the contrary, a famous example of dependence in the sense of preferences in a

gastronomic context is the following: the preference for white wine or red wine

usually depends on whether you are eating fish or meat. There are relatively sim-

ple tests for independence in the sense of preferences, which consist in asking the

decision-maker about his preferences on pairs of alternatives that share the same

profile for a subset of attributes; varying the common profile should not reverse

the preferences when the points of view are independent. Independence is a nec-

essary condition for the representation of preferences by a weighted sum; it is not

a sufficient one of course.
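The test for independence in the sense of preferences described above can be sketched in code. This is only an illustration under stated assumptions: the decision-maker's answers are simulated by a numerical function `value(own, common)`, and the function names and scores are hypothetical, chosen to reproduce the wine example.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_preference_independent(value, own_levels, common_levels):
    """Check that the comparison of any two levels of one attribute
    does not change when the shared level of the other attribute varies.

    `value(own, common)` stands in for the decision-maker's answers
    (an assumption made for this sketch)."""
    for x, y in combinations(own_levels, 2):
        outcomes = {sign(value(x, c) - value(y, c)) for c in common_levels}
        if len(outcomes) > 1:  # the preference between x and y changed
            return False
    return True


# Hypothetical scores: the preferred wine depends on the dish.
WINE_SCORES = {
    ("white", "fish"): 2, ("red", "fish"): 1,
    ("white", "meat"): 1, ("red", "meat"): 2,
}


def wine_value(wine, dish):
    return WINE_SCORES[(wine, dish)]


def additive_value(wine, dish):
    # Hypothetical additive scores: wine and dish contribute separately.
    return {"white": 1, "red": 2}[wine] + {"fish": 3, "meat": 5}[dish]
```

With these hypothetical scores, `wine_value` fails the test (the preferred wine changes with the dish), whereas `additive_value` passes it, as any additive representation must.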

There is a different concept that has been recently implemented for modelling

preferences. It is the concept of interacting criteria that was already discussed

in example 2 of Chapter 3. Suppose that in the process of modelling the prefer-

ences of the decision-maker, he declares that the influence of positively correlated

aspects should be dimmed and that conjoint good performances for negatively

correlated aspects should be emphasised. In our case for instance, criteria 2 and

3, respectively acceleration and suppleness, may be thought of as being positively

correlated. It may then prove impossible to model some preferences by means of a

weighted sum of the evaluations such as those in Table 6.2 (and even of transfor-

mations thereof such as obtained through formulae like 6.3). This does not mean

that no additive model would be suitable and it does not imply that the prefer-

ences are not independent (in the above-defined sense). In the next section we

shall study an additive model, more general than the weighted average, in which

the evaluations gi may be re-coded using value functions ui. With

appropriate choices of u2 and u3 it may be possible to take the decision-maker's

preferences about positively and negatively correlated aspects into account, pro-

vided they satisfy the independence property. If no re-coding is allowed (like in

the assessment of students, see Chapter 3) there is a non-additive variant of the

weighted average that could help to model interactions among the criteria; in

such a model the weight of a coalition of criteria may be larger or smaller than

the sum of the weights of its components (see Grabisch (1996), for more detail on

non-additive averages).

In the above discussion as well as in the presentation of our example we have

emphasised the many sources of uncertainty (lack of knowledge) and of imprecision

that bear on the figures used as input in the weighted sum. Let us summarise some

of them:

1. Uncertainty in the evaluation of the cost: the buying price as well as the

life-length of a second hand car are not known. This uncertainty can be

considered of stochastic nature; statistical data could help to master, to

some extent, such a source of uncertainty; in practice, it will generally be

very difficult to get sufficient relevant and reliable statistical information

for this kind of problem.

is the measurement of the acceleration? Such an imprecision can be reduced

by making the conditions of the measurement as standard as possible and

can then be estimated on the basis of the precision of the measurement

apparatus.

appreciation of braking and road-holding behaviour. Any re-coding that

respects the order of the categories would in principle be acceptable. To

master such an imprecision one could try to build quantitative indicators

for the criteria or try to get additional information on the comparison be-

tween differences of levels on the ordinal scale: for instance, is the difference

between "below average" and "average" larger than the difference between

"above average" and "exceptional"?

weights kj /ki must be elicited as conversion rates: a unit for criterion j is

worth kj /ki units for criterion i; of course, the scales must first be re-coded in

order that one unit difference on a criterion has the same value everywhere

on the scale (linearisation); these operations are far from obvious and as a

consequence, the imprecision of the linearisation process combines with the

inaccuracy in the determination of weights.

Making a decision

All these sources of imprecision have an effect on the precision of the determination

of the value of f that is almost impossible to quantify; contrary to what can (often)

be done in physics, there is generally little information on the size of the impre-

cisions; quite often, there is not even probabilistic information on the accuracy

of the evaluations. As a consequence, the apparently straightforward decision

(choosing the alternative with the highest value of f or ranking the alternatives in

decreasing order of the values of f) might be ill-considered, as illustrated above.

The usual way out is extensive sensitivity analysis, which could be described as

part of the validation of the model. This part of the job is seldom carried out

with the required exhaustivity because it is a delicate task at least in two respects.

On the one hand there are many possible strategies for varying the values of the

imprecisely determined parameters; usually parameters are varied one at a time

which is not sufficient but is possibly tractable; the range in which the parameters

must be varied is not even clear as suggested above. On the other hand, once

the sensitivity analysis has been performed, one is likely to be faced with several

almost equally valuable alternatives; in the car problem for instance, the simple

remarks made above strongly suggest that it will be very difficult to discriminate

between cars 3 and 11.
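One-at-a-time variation of the weights, as mentioned above, can be sketched as follows. This is only an illustration under stated assumptions: the three alternatives, their evaluations (taken as already linearised and to be maximised), the weights and the 25% perturbation are all hypothetical and are not Thierry's data.

```python
def weighted_sum(evaluations, weights):
    """Overall value f as a weighted sum of the evaluations."""
    return sum(w * g for w, g in zip(weights, evaluations))


def best_alternative(alternatives, weights):
    """Name of the alternative with the highest weighted sum."""
    return max(alternatives, key=lambda name: weighted_sum(alternatives[name], weights))


def one_at_a_time_winners(alternatives, weights, factors=(0.75, 1.25)):
    """Vary each weight in turn by the given factors (renormalising the
    weights to sum to one) and collect every alternative that comes first."""
    winners = {best_alternative(alternatives, weights)}
    for i in range(len(weights)):
        for factor in factors:
            perturbed = list(weights)
            perturbed[i] *= factor
            total = sum(perturbed)
            perturbed = [w / total for w in perturbed]
            winners.add(best_alternative(alternatives, perturbed))
    return winners


# Hypothetical evaluations on three criteria, all to be maximised.
ALTERNATIVES = {
    "car_3": [0.70, 0.60, 0.80],
    "car_11": [0.72, 0.62, 0.73],
    "car_x": [0.40, 0.50, 0.60],
}
WEIGHTS = [0.5, 0.3, 0.2]  # hypothetical weights

BASE_WINNER = best_alternative(ALTERNATIVES, WEIGHTS)
WINNERS = one_at_a_time_winners(ALTERNATIVES, WEIGHTS)
```

When `WINNERS` contains more than one name, the first place is not robust to the tested perturbations. Varying one weight at a time explores only a small part of the parameter space, which is the limitation pointed out in the text.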

In view of the previous discussion, there are two main approaches to solve the

difficulties raised by the weighted sum:

1. Either one tries to prepare the inputs of the model (linearised evaluations and

trade-offs) as carefully as possible, paying permanent attention to reducing

imprecision and finishing with extensive sensitivity analysis;

2. Or one takes imprecision into account from the start, by refraining from exploiting

precise values that are known to be unreliable and working instead

with classes of values and ordered categories. Note that imprecision may well

lie in the link between evaluations and preferences rather than in the eval-

uations themselves; detailed preferential information, even extracted from

perfectly precise evaluations, may prove rather difficult to elicit.

The former option leads to the construction of value or utility functions, while the latter leads to the outranking approach. These two

approaches will be developed in the sequel. There is however a whole family of

methods that we shall not consider here, the so-called interactive methods (Steuer

(1986), Vincke (1992), Teghem (1996)). These implement various strategies for

exploring the efficient boundary, i.e. the set of non-dominated solutions; the ex-

ploration jumps from one solution to another; it is guided by the decision-maker

who is asked to tell, for instance, which characteristics of the current solution he

would like to see improved. Such methods are mainly designed for dealing with

infinite and even continuous sets of alternatives; moreover, they do not lead to an

explicit model of the decision-maker's preferences. By contrast, we have settled

on problems with a (small) finite number of alternatives and we concentrate

on obtaining explicit representations of the decision-maker's preferences.

6.2.5 Conclusion

The weighted sum is useful for obtaining a quick and rough draft of an overall

evaluation of the alternatives. One should however keep in mind that there are

rather restrictive assumptions underlying a proper use of the weighted sum. As a

conclusion to this section we summarise these conditions.

of the alternatives for all criteria are numbers and these values are used as

such even if they result from the re-coding of ordinal data.

whatever the location of the corresponding intervals on the scale (at the

bottom, in the middle or at the top of the scale), produce the same effect on

the overall evaluation f: if alternatives a, b, c, d are such that $g_i(a) - g_i(b) =

g_i(c) - g_i(d)$ for all i, then $f(a) - f(b) = f(c) - f(d)$.

3. The weights are trade-offs. Weights depend on the scaling of the cri-

teria; transforming the (linearised) scales results in a related transformation

of the weights. Weights tell how many units on the scale of criterion i are

needed to compensate one unit of criterion j.

called preference independence, can be formulated as follows. Consider two

alternatives that share the same evaluation on at least one criterion, say

criterion i. Varying the level of that common value on criterion i does not

alter the way the two alternatives compare in the overall ranking.

Our analysis of the weighted sum brought us very close to the requirements for

additive multi-attribute value functions. The most common model in multiple

criteria decision analysis is a formalisation of the idea that the decision-maker,

when making a decision, behaves as if he was trying to maximise a quantity called

"utility" or "value" (the term "utility" nowadays tends to be used preferably in the

context of decision under risk, but we shall sometimes use it for "value").

This postulates that all alternatives may be evaluated on a single super-scale

reflecting the value system of the decision-maker and his preferences. In other

words, the alternatives can be "measured" in terms of "worth" on a synthetic

dimension of value or utility. Accordingly, if we denote by $\succsim$ the overall preference

relation of the decision-maker on the set of alternatives, this relation relates to the

values u(a), u(b) of the alternatives in the following way:

(6.7) $a \succsim b$ iff $u(a) \geq u(b)$

As a consequence, the preference relation $\succsim$ on the set of alternatives is a complete

preorder, i.e. a complete ranking possibly with ties. Of course, the value u(a)

usually is a function of the evaluations {gi (a), i = 1, . . . , n}. If this function is

a linear combination of gi (a), i = 1, . . . , n, we get back to the weighted sum. A

slightly more general case is the following additive model:

(6.8) $u(a) = \sum_{i=1}^{n} u_i(g_i(a))$

where each function ui re-codes the original evaluation gi in order to linearise it in the sense described in the previous

section; the weights ki are incorporated in the ui functions. The additive value

function model can thus be viewed as a clever version of the weighted sum since it

allows us to take some of the objections (mainly the second hypothesis in Section

6.2.5) against a naive use of it into account. Note however that the imprecision

issue is not dealt with inside the model (sensitivity analysis has to be performed

in the validation phase, but is neither part of the model nor straightforward in

practice); the elicitation of the partial value functions ui may also be a difficult

task.

Much effort has been devoted to characterising various systems of conditions

under which the preferences of a decision-maker can be described by means of

an additive value function model. Depending on the context, some systems of

conditions may be interpretable and tested, at least partially, i.e. it may be possible

to ask the decision-maker questions that will determine whether an additive value

model is compatible with what can be perceived of his system of preferences. If the

preferences of the decision-maker are compatible with an additive value model, a

method of elicitation of the ui's may then be used; if not, another model should be

looked for: a multiplicative model or, more generally, a non-additive one, a non-

independent one, a model that takes imprecision more intrinsically into account,

etc. (see Krantz, Luce, Suppes and Tversky (1971), Chapter 7, Luce, Krantz,

Suppes and Tversky (1990), Vol. 3, Chapter 19).

6.3. THE ADDITIVE VALUE MODEL 107

value functions

A large number of methods have been proposed to determine the ui's in an additive

value function model. For an accessible account of such methods, the reader is

referred to von Winterfeldt and Edwards (1986), Chapter 8.

There are essentially two families of methods, one based on direct numerical

estimations and the other on indifference judgements. We briefly describe the

application of a technique of the latter category, relying on what is called "dual

standard sequences" (Krantz et al. (1971), von Winterfeldt and Edwards (1986),

Wakker (1989)) that builds a series of equally spaced intervals on the scale of

values.

Suppose we want to assess the ui's in an additive model for the Cars case. It is

assumed that the suitability of such a model for representing the decision-maker's

preferences has been established. Consider a pair of criteria, say Cost and Acceleration.

We are going to outline a simulated dialogue between an analyst and

a decision-maker that could yield an assessment of u1 and u2 , the corresponding

single-attribute value functions, for ranges of evaluations corresponding to accept-

able cars. Note that we start the construction of the sequence from a central

point instead of taking a worst point (see for instance von Winterfeldt and

Edwards (1986), pp. 267 sq., for an example starting from a worst point).

The range for the cost will be the interval from 21 500 € to 13 500 €, and the range

for acceleration from 28 to 31 seconds. First ask the decision-maker to select a

central point corresponding to medium-range evaluations on both criteria. In

view of the set of alternatives selected by Thierry, let us start with (17 500, 29.5) as

average values for cost and acceleration. Also ask the decision-maker to define a

unit step on the cost criterion; this step will consist, say, of passing from a cost of

17 500 € to 16 500 €. Then the standard sequence is constructed by asking which

value x1 for the acceleration would make a car costing 16 500 € and accelerating

in 29.5 seconds indifferent to a car costing 17 500 € and accelerating in x1 seconds.

Suppose the answer is 29.2, meaning that from the chosen starting point, a gain

of 0.3 second on the acceleration time is worth an increase of 1 000 € in cost. The

answer could be explained by the fact that at the starting level of performance

for the acceleration criterion, the decision-maker is quite interested in a gain in

acceleration time. Relativising the gains as percentages of the half range from

the central to the best values on each scale, this means that the decision-maker

is ready to lose 1000/4000 = 25% of the potential reduction in cost for gaining 0.3/1.5 = 20%

of acceleration time. We will say in the sequel that the parity is equal when the

decision-maker agrees to exchange a percentage of the half range on a criterion

against an equal percentage on another criterion.

The second step in the construction of the standard sequence is asking the

decision-maker which value to assign to x2 to have (16 500, 29.2) ~ (17 500, x2),

where ~ denotes "indifferent to". The answer might be, for instance, 28.9. Continuing

along the same line would for instance yield the following sequence of

indifferences:

(16 500, 29.5) ~ (17 500, 29.2)

(16 500, 29.2) ~ (17 500, 28.9)

(16 500, 28.9) ~ (17 500, 28.7)

(16 500, 28.7) ~ (17 500, 28.5)

(16 500, 28.5) ~ (17 500, 28.3)

(16 500, 28.3) ~ (17 500, 28.1)

[Figure 6.5: Single-attribute value function for acceleration criterion (half range); horizontal axis: acceleration (sec), vertical axis: value]

Such a sequence gives the analyst an approximation of the single-attribute

value function u2 , on the half range from 28 to 29.5 seconds but it is easy to devise

a similar procedure for the other half range, from 29.5 to 31. Figure 6.5 shows the

re-coding u2 of the evaluations g2 on the interval [28, 29.5]; there are two linear

parts in the graph: one ranging from 28 to 28.9 where the slope is proportional to

1/0.2 and the other valid between 28.9 and 29.5 with a slope proportional to 1/0.3.
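The re-coding obtained from the standard sequence can be sketched in code. The acceleration levels below are those given in the text; the convention of assigning value 0 to the starting point and one extra unit of value per step, as well as the function names, are assumptions made for this illustration (the units on the vertical axis of Figure 6.5 may differ).

```python
# Acceleration levels reached by the successive indifference answers,
# starting from the central point 29.5 s (values taken from the text).
SEQUENCE = [29.5, 29.2, 28.9, 28.7, 28.5, 28.3, 28.1]


def value_points(sequence):
    """Assign value 0 to the starting point and one extra unit per step
    (a normalisation assumed for this sketch)."""
    return [(level, float(step)) for step, level in enumerate(sequence)]


def u2(seconds, points):
    """Piecewise-linear interpolation between the points of the sequence."""
    ordered = sorted(points)  # increasing acceleration time
    if not ordered[0][0] <= seconds <= ordered[-1][0]:
        raise ValueError("outside the assessed range")
    for (x0, v0), (x1, v1) in zip(ordered, ordered[1:]):
        if x0 <= seconds <= x1:
            return v0 + (v1 - v0) * (seconds - x0) / (x1 - x0)


POINTS = value_points(SEQUENCE)
```

Under this convention each step of 0.3 second between 28.9 and 29.5 and each step of 0.2 second below 28.9 is worth one unit of value, which gives the two slopes mentioned above. The sequence stops at 28.1, so the sketch does not extrapolate to 28 seconds.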

From there, using the same idea, one is able to re-code the scale of the cost cri-

terion into the single-attribute value function u1 . Then, considering (for instance)

the cost criterion with criteria 3, 4 and 5 in turn, one obtains a re-coding of each

gi into a single-attribute value function ui .

The trade-off between u1 and u2 is easily determined by solving the following

equation, which just expresses the initial indifference in the standard sequence,

(16 500, 29.5) ~ (17 500, 29.2):

$$\frac{k_2}{k_1} = \frac{u_1(16\,500) - u_1(17\,500)}{u_2(29.2) - u_2(29.5)}.$$

If we set k1 to 1, this formula yields k2 and the trade-offs k3 , k4 and k5 are obtained

similarly. Notice that the re-coding process of the original evaluations into value

functions results in a formulation in which all criteria have to be maximised (in

value).
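The ratio of the trade-offs can be derived step by step from the initial indifference. The derivation below is a sketch that assumes the overall value is written as a weighted sum of the single-attribute value functions and that the two cars being compared do not differ on the other criteria.

```latex
% Assumption: overall value = k_1 u_1 + k_2 u_2 + (terms for the other
% criteria), the latter being equal for the two cars compared.
\begin{align*}
k_1 u_1(16\,500) + k_2 u_2(29.5) &= k_1 u_1(17\,500) + k_2 u_2(29.2) \\
k_1 \bigl[u_1(16\,500) - u_1(17\,500)\bigr] &= k_2 \bigl[u_2(29.2) - u_2(29.5)\bigr] \\
\frac{k_2}{k_1} &= \frac{u_1(16\,500) - u_1(17\,500)}{u_2(29.2) - u_2(29.5)}
\end{align*}
```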

The above procedure, although rather intuitive and systematic, is also quite

complex; the questions are far from easy to answer; starting from one reference

point or another (worst point instead of central point) may result in variations in

the assessments. There are however many possibilities for checking for inconsisten-

cies. Assume for instance that a single-attribute value function has been assessed

by means of a standard sequence that links its scale to the cost criterion; one

may validate this assessment by building a standard sequence that links its scale

to another criterion and compare the two assessments of the same value function

obtained in this way; hopefully they will be consistent; otherwise some sort of

feedback and revision is required.

Note finally that such methods may not be used when the scale on which the

assessments are made only has a finite number of degrees instead of being the set

of real numbers; at least numerous and densely spaced degrees are needed.

In another line of methods, simplicity and direct intuition are more praised than

scrupulous satisfaction of theoretical requirements, although the theory is not

ignored. An example is SMART (Simple Multi-Attribute Rating Technique),

developed by W. Edwards, which is more a collection of methods than a single

one. We just outline a variant here, referring the reader to von Winterfeldt and Edwards

(1986), pp. 278 sq., for more details. In order to re-code, say, the evaluations for

the acceleration criterion, one initially fixes two anchor points that may be the

extreme values of the evaluations on the set of acceptable cars, here 28 and 31

seconds. On the value scale, the anchor points are associated to the endpoints of a

conventional interval of values, for instance 31 to 0 and 28 to 100. Since 29 seconds

seems to be the value under which Thierry considers that a car becomes definitely

attractive from the acceleration viewpoint, the interval

[28, 29] should be assigned a range of values larger than 1/3, its size (in relative terms) in the original

scale. Thierry could for instance assign 29 seconds to 50 on the value scale. Then

28.5 and 30 could be located respectively at 70 and 10, yielding the initial sketch

of a value function shown in Figure 6.6(a) (with linear interpolation between the

specified values). This picture can be further improved by asking Thierry to check

whether the relative spacings of the locations correctly reflect the strength of his

preferences. Thierry might say that almost the same gain in value (40) from 30

seconds to 29 as from 29 to 28 (gain of 50) is unfair and he could consequently

propose to lower to 40 the value associated with 29 seconds; he also lowers to 65

the value of 28.5 seconds. Suppose he is then satisfied with all other differences

of values; the final version is drawn in Figure 6.6(b). Similar work has to be

carried out for all criteria, and the weights must be assessed.
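The two versions of the value function for acceleration can be written down as piecewise-linear functions. The anchor points below are those quoted in the text; the function and variable names are hypothetical, and linear interpolation between the specified values is used as in the initial sketch.

```python
def piecewise_linear(points):
    """Return a function interpolating linearly between (x, value) points."""
    ordered = sorted(points)

    def value(x):
        if not ordered[0][0] <= x <= ordered[-1][0]:
            raise ValueError("outside the range of acceptable evaluations")
        for (x0, v0), (x1, v1) in zip(ordered, ordered[1:]):
            if x0 <= x <= x1:
                return v0 + (v1 - v0) * (x - x0) / (x1 - x0)

    return value


# (acceleration in seconds, value on the 0-100 scale), as given in the text.
INITIAL_SKETCH = [(28.0, 100), (28.5, 70), (29.0, 50), (30.0, 10), (31.0, 0)]
FINAL_VERSION = [(28.0, 100), (28.5, 65), (29.0, 40), (30.0, 10), (31.0, 0)]

u2_initial = piecewise_linear(INITIAL_SKETCH)
u2_final = piecewise_linear(FINAL_VERSION)
```

In the final version the gain from 29 to 28 seconds is worth 60 units of value against 30 units for the gain from 30 to 29 seconds, which reflects the extra attractiveness of performances below 29 seconds.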

The weights are usually derived through direct numerical judgements of relative

attribute importance. Thierry would be asked to rank-order the attributes; an

[Figure 6.6: Value function for acceleration criterion: (a) initial sketch; (b) final, with initial sketch in dotted line; horizontal axes: acceleration (sec), vertical axes: value (0 to 100)]

and the importance of each other criterion should be assessed in relation to the least

important one, directly as an estimation of the ratio of weights. This approach

in terms of importance can be and has been criticised. In assessing the relative

weights no reference is made to the underlying scales. This is not appropriate

since weights are trade-offs between units on the various value scales and must

vary with the scaling.

For instance, on the acceleration value scale that is normalised in the 0-100

range, the meaning of one unit varies depending on the range of original evalua-

tions (acceleration measured in seconds) that are represented between value 0 and

value 100 of the value scale. If we had considered that the acceleration evaluations

of admissible cars range from 27 to 32 seconds, instead of from 28 to 31, we would

have constructed a value function $u'_2$ with $u'_2(32) = 0$ and $u'_2(27) = 100$; a difference

of one unit of value on the scale $u_2$ illustrated in Figure 6.6 corresponds to a

(less-than-unit) difference of $\frac{u'_2(28) - u'_2(31)}{100}$ on the scale $u'_2$. The weight attached to

that criterion must vary in inverse proportion to the previous factor when passing

from $u_2$ to $u'_2$. It is unlikely that a decision-maker would take the range of evaluations

into account when asked to assess weights in terms of "relative importance"

of criteria, a formulation that seems independent of the scalings of the criteria.
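The dependence of the weight on the range of the scale can be made explicit. The following is a sketch assuming that, on the interval from 28 to 31 seconds, the new value function is an affine image of the old one; the numerical values in the last comment are purely hypothetical.

```latex
% Assumption: on [28, 31], u'_2(x) = u'_2(31) + \lambda\, u_2(x),
% with u_2(31) = 0 and u_2(28) = 100.
\[
\lambda = \frac{u'_2(28) - u'_2(31)}{100}, \qquad
k'_2 \,\lambda = k_2 \;\Longrightarrow\;
k'_2 = \frac{100\, k_2}{u'_2(28) - u'_2(31)} .
\]
% Hypothetical illustration: u'_2(28) = 80 and u'_2(31) = 20 give
% \lambda = 0.6, hence k'_2 = k_2 / 0.6 \approx 1.67\, k_2.
```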

A way of avoiding these difficulties is to give up the notion of importance that

seems misleading in this context and to use a technique called swing-weighting;

the decision-maker is asked to compare alternatives that swing between the

worst and the best level for each attribute in terms of their contribution to the

overall value. The argument of simplicity in favour of SMART is then lost since

the questions to be answered are similar, both in difficulty and in spirit, to those

raised in the approach based on indifference judgements.

The eigenvalue method for assessing attribute weights and single-attribute value

functions is part of a general methodology called Analytic Hierarchy Process; it

consists in structuring the decision problem in a hierarchical manner (as it is also

advocated for building value functions, for instance in Keeney and Raiffa (1976)),

constructing numerical evaluations associated with all levels of the hierarchy and

aggregating them in a specific fashion, formally a weighted sum of single-attribute

value functions (see Saaty (1980), Harker and Vargas (1987)).

In our case, the top level of the hierarchy is Thierry's goal of finding the best

car according to his particular views. The second level consists in the 5 criteria

into which his global goal can be decomposed. The last level can be described as

the list of potential cars. Thus the hierarchical tree is composed of 1 first level

node, 5 second level nodes and 5 times 14 third level nodes also called leaves.

What we have to determine is the strength or priority of each element of a level

in relation to its importance for an element in the next level.

The assessment of the nodes may start (as is usually done) from the bottom

nodes; all nodes linked to the same parent node are compared pairwise; in our

case this amounts to comparing all cars from the point of view of a criterion and

repeating this for all criteria. The same is then done for all criteria in relation to the

top node; the influences of all criteria on the global goal are also compared pairwise.

At each level, the pairwise comparison of the nodes in relation to the parent node is

done by means of a particular method that makes it possible, to some extent, to detect and

correct inconsistencies. For each pair of nodes a, b, the decision-maker is asked

to assess the priority of a as compared to the priority of b. The questions

are expressed in terms of importance or preference or likelihood according

to the context. It is asked for instance how much alternative a is preferred to

alternative b from a certain point of view. The answers may be formulated either

on a verbal or a numerical scale. The levels of the verbal scale correspond to

numbers and are dealt with as such in the computations. The conversion of verbal

levels into numerical levels is described in Table 6.6. There are five main levels on

the verbal scale, but 4 intermediary levels that correspond to numerical codings

2, 4, 6, 8 can also be used. For instance, the level Moderate corresponds to an

alternative that is preferred 3 times more than another or a criterion that is 3

times more important than another. Such an interpretation of the verbal levels

has very strong implications; it means that preference, importance and likelihood

are considered as perceived on a ratio scale (much like sound intensity). This is

indeed Saaty's basic assumption; what the decision-maker expresses as a level on

the scale is postulated to be the ratio of values associated to the alternatives or

the criteria. In other words, a number f(a) is assumed to be attached to each a;

when comparing a to b, the decision-maker is assumed to give an approximation

of the ratio $f(a)/f(b)$. Since verbal levels are automatically translated into numbers in

Saaty's method, we shall concentrate on assessing directly on the numerical scale.

Let $\alpha(a, b)$ denote the level of preference (or of relative importance) of a over b

expressed by the decision-maker; the results of the pairwise comparisons may thus

be encoded in a square matrix $\alpha$. If Saaty's hypotheses are correct, there should

exist a function f such that, for all a, b,

(6.9) $\alpha(a, b) \approx \frac{f(a)}{f(b)}$

and in particular,

(6.10) $\alpha(a, b) \approx \frac{1}{\alpha(b, a)}$

In view of the latter relation, only one half (roughly) of the matrix has to be

elicited, which amounts to answering $\frac{n(n-1)}{2}$ questions.

Numeric 1 3 5 7 9

Table 6.6: Conversion of verbal levels into numbers in Saaty's pairwise comparison

method; e.g. "Moderate" means 3 times more preferred

Relation (6.9) implies that all columns of the comparison matrix should be approximately

proportional to f. The pairwise comparisons make it possible to

1. detect departures from the basic hypothesis in case the columns of the matrix are too

far from proportional;

2. correct errors made in the estimation of the ratios; some sort of averaging of

the columns is performed yielding an estimation of f .
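The two uses of the pairwise comparisons listed above can be illustrated with a small sketch. The assessments for three items are hypothetical, and averaging the normalised columns is only one possible "sort of averaging"; it is not claimed to be the procedure used in Saaty's method.

```python
def reciprocal_matrix(n, upper):
    """Build an n x n pairwise comparison matrix from the n(n-1)/2
    assessments above the diagonal, filling the rest by reciprocity."""
    matrix = [[1.0] * n for _ in range(n)]
    for (i, j), ratio in upper.items():
        matrix[i][j] = ratio
        matrix[j][i] = 1.0 / ratio
    return matrix


def normalised_columns(matrix):
    """Return the columns, each rescaled so that its entries sum to one."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for i in range(n)] for j in range(n)]


def max_column_spread(matrix):
    """Largest gap between normalised columns on any row; it is zero
    when all columns are exactly proportional."""
    columns = normalised_columns(matrix)
    n = len(matrix)
    return max(
        max(col[i] for col in columns) - min(col[i] for col in columns)
        for i in range(n)
    )


def priorities_by_column_average(matrix):
    """Estimate f (up to scale) by averaging the normalised columns."""
    columns = normalised_columns(matrix)
    n = len(matrix)
    return [sum(col[i] for col in columns) / n for i in range(n)]


# Hypothetical assessments for three items.
CONSISTENT = reciprocal_matrix(3, {(0, 1): 2.0, (1, 2): 3.0, (0, 2): 6.0})
INCONSISTENT = reciprocal_matrix(3, {(0, 1): 2.0, (1, 2): 3.0, (0, 2): 2.0})
```

For `CONSISTENT` the three normalised columns coincide, so the spread is zero and the estimate is exact; for `INCONSISTENT` the spread is clearly positive, which signals a departure from the ratio hypothesis.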

A test based on statistical considerations allows the user to determine whether the

assessments in the pairwise comparison matrix show sufficient agreement with the

hypothesis that they are approximations of $f(a)/f(b)$, for an unknown f. If the test

conclusion is negative, it is recommended either to revise the assessments or to

choose another approach more suitable for the type of data.

If one wants to apply AHP in a multiple criteria decision problem, pairwise

comparisons of the alternatives must be performed for each criterion; criteria must

also be compared in a pairwise manner to model their importance. This process

results in functions ui that evaluate the alternatives on each criterion i and in

coefficients of importance ki . Each alternative a is then assigned an overall value

v(a) computed as

(6.11) $v(a) = \sum_{i=1}^{n} k_i u_i(a)$

Since Thierry did not apply AHP to his analysis of the case, we have answered

the questions on pairwise comparisons on the basis of the information contained in

his report. For instance, when comparing cars on the cost criterion, more weight

will be put on a particular cost difference, say 1 000 €, when located in the range

from 17 500 € to 21 500 € than when lying between 13 500 € and 17 500 €. This

corresponds to the fact that Thierry said he is rather insensitive to cost differences

up to about 17 500 €, which is the amount of money he had budgeted for his car.

For the sake of concision, we have restricted our comparisons to a subset of cars,

namely the top four cars plus the Renault 19, Mazda 323 and Toyota Corolla.

A major issue in the assessment of pairwise comparisons, for instance of alter-

natives in relation to a criterion, is to determine how many times a is preferred to b

on criterion i from looking at the evaluations gi (a) and gi (b). Of course the (ratio)

scale of preference on i is not in general the scale of the evaluations gi. For example,

Car 11 costs approximately 17 500 € and Car 12 costs about 16 000 €. The

ratio of these costs, 17 500/16 000, is equal to 1.09375, but this does not necessarily mean

that Car 12 is preferred 1.09375 times more than Car 11 on the cost criterion; this

is because the cost evaluation does not measure the preferences directly. Indeed, a

transformation (re-scaling) is usually needed to go from evaluations to preferences;

for the cost, according to Thierry himself, the transformation is not linear since

equal ratios corresponding to costs located either below or above 17 500 € do not

correspond to equal ratios of preference. But even in linear parts, the question

is not easily answered. A decision-maker might very well say that Car 12 is 1.5

times more preferred than Car 11 for the cost criterion; or he could say 2 times or

4 times. All depends on what the decision-maker would consider as the minimum

possible cost; for instance (supposing that the transformation of cost into preference

is linear), if Car 12 is declared to be 1.5 times more preferred than Car 11, the

zero of the cost scale x would be such that

$$\frac{17\,500 - x}{16\,000 - x} = 1.5,$$

i.e. x = 13 000 €. The problem is even more crucial for transforming scales such

as those on which braking or road-holding are evaluated. For instance, how many

times is Car 3 preferred to Car 10 with respect to the braking criterion? In other

words, how many times is 2.66 better than (preferred to) 2.33?
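How strongly the implied zero of the cost scale depends on the declared ratio can be checked with a short computation. This sketch solves the equation of the text for x, under the same assumption of a linear transformation of cost into preference; the function name is hypothetical, and the ratios 1.5, 2 and 4 are those mentioned above.

```python
def implied_zero(cost_a, cost_b, ratio):
    """Solve (cost_a - x) / (cost_b - x) = ratio for x."""
    if ratio == 1:
        raise ValueError("a ratio of 1 does not determine the zero of the scale")
    # cost_a - x = ratio * (cost_b - x)  =>  x = (ratio * cost_b - cost_a) / (ratio - 1)
    return (ratio * cost_b - cost_a) / (ratio - 1)


# Costs of Car 11 and Car 12 as quoted in the text.
ZEROS = {ratio: implied_zero(17500, 16000, ratio) for ratio in (1.5, 2, 4)}
```

Each declared ratio corresponds to a different zero of the scale, so the decision-maker's answer cannot be interpreted without knowing where he places that zero.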

Similar questions arise for the comparison of importance of criteria. We discuss

the determination of the weights ki of the criteria in formula 6.11 below. For

computing those weights, the relative importance of each criterion with respect

to all others must be assessed. Our assessments are shown in Table 6.7. We

made them directly in numerical terms taking into account a set of weights that

Thierry considered as reflecting his preferences; those weights have been obtained

using the Prefcalc software and a method that is discussed in the next section.

By default, the blanks on the diagonal should be interpreted as 1's; the blanks

below the diagonal are supposed to be 1 over the corresponding value above the

diagonal, according to equation 6.10.

Once the matrix in Table 6.7 has been filled, several algorithms can be proposed

to compute the priority of each criterion with respect to the goal symbolised by

the top node of the hierarchy (under the hypothesis that the elements of the

assessment matrix are approximations of the ratios of those priorities). The most

famous algorithm, which was initially proposed by Saaty, consists in computing

the eigenvector of the matrix corresponding to the largest eigenvalue (see Harker

114 CHAPTER 6. COMPARING ON SEVERAL ATTRIBUTES

                Cost   Acceleration   Pick-up   Brakes   Road-holding
Cost                   1.5            2         3        3
Acceleration                          1.5       2        2
Pick-up                                         1.5      1.5
Brakes                                                   1
Road-holding

Table 6.7: Assessment of the comparison of importance for all pairs of criteria. For instance, the number 2 at the intersection of the 1st row and 3rd column means that "Cost" is considered twice as important as "Pick-up"

of averaging ratios along paths). Since eigenvectors are determined up to a

multiplicative factor, the vector of priorities is the normalised eigenvector whose

components sum up to unity; the special structure of the matrix (reciprocal matrix)

guarantees that all priorities will be positive. Alternative methods for correcting

inconsistencies have been elaborated; most of them are based on some sort of

a least squares criterion or on computing averages (see e.g. Barzilai, Cook and

Golany (1987) who argue in favour of a geometric mean). Applying the eigenvector

method to the matrix in Table 6.7, one obtains the following values that reflect

the importance of the criteria:

.
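The eigenvector computation can be sketched in a few lines of Python. This is a minimal illustration and not Saaty's or any package's implementation: the principal eigenvector is approximated by power iteration, the function names are ours, and the geometric-mean alternative is included only for comparison. The judgements are those of Table 6.7.

```python
from math import prod


def reciprocal_matrix(upper):
    """Build the full reciprocal matrix from its strict upper triangle."""
    n = len(upper) + 1
    m = [[1.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for k, value in enumerate(row):
            j = i + 1 + k
            m[i][j] = float(value)
            m[j][i] = 1.0 / value  # blanks below the diagonal are reciprocals
    return m


def eigenvector_priorities(m, iterations=200):
    """Approximate the principal eigenvector, normalised so that it sums to 1."""
    n = len(m)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


def geometric_mean_priorities(m):
    """Row geometric means, normalised so that they sum to 1."""
    g = [prod(row) ** (1.0 / len(row)) for row in m]
    total = sum(g)
    return [x / total for x in g]


CRITERIA = ["Cost", "Acceleration", "Pick-up", "Brakes", "Road-holding"]
TABLE_6_7 = [[1.5, 2, 3, 3], [1.5, 2, 2], [1.5, 1.5], [1]]

matrix = reciprocal_matrix(TABLE_6_7)
eigen = eigenvector_priorities(matrix)
geometric = geometric_mean_priorities(matrix)
for name, e, g in zip(CRITERIA, eigen, geometric):
    print(f"{name:13s} eigenvector {e:.3f}   geometric mean {g:.3f}")
```

Because the matrix of Table 6.7 is close to consistent, the two methods give nearly the same priorities in this sketch.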

Note that only the lowest degrees of the 1 to 9 scale have been used in Table 6.7.

This means that the weights are not perceived as very contrasted; in order to get

the sort of gradation of the weights as above (the ratio of the highest to the lowest

value is about 3), some comparisons have been assessed by non-integer degrees,

which normally are not available on the verbal counterpart of the 1 to 9 scale

described in Table 6.6. When the assessments are made through this verbal scale,

approximations should be made, for instance by saying that cost and acceleration

are equally important and substituting 1.5 by 1. Note that the labelling of the degrees on the verbal scale may be misleading; one would quite naturally qualify the degree to which Cost is more important than Acceleration as "Moderate" until it is fully realised that "Moderate" means three times as important; using the intermediary level between "Equal" and "Moderate" would still mean twice as important.

It should be emphasised that the eigenvalue method is not linear. What

would have changed if we had scaled the importance differently, for instance as-

sessing the comparisons of importance by degrees twice as large as those in Table

6.7 (except for 1s that remain constant)? Would the coefficients of importance

have been twice as large? Not at all! The resulting weights would have been much

more contrasted, namely:

6.3. THE ADDITIVE VALUE MODEL 115

Name of car       Nr    7      11     3      12     10     4      6
Honda Civic       7     1.0    1.0    2.0    4.0    4.0    5.0    5.0
Peugeot 309/16V   11    1.0    1.0    2.0    3.0    4.0    4.0    4.0
Nissan Sunny      3     0.50   0.50   1.0    1.50   2.0    3.0    3.0
Peugeot 309       12    0.25   0.33   0.67   1.0    1.0    2.0    2.0
Renault 19        10    0.25   0.25   0.5    1.0    1.0    1.0    1.5
Mazda 323         4     0.2    0.25   0.33   0.5    1.0    1.0    1.0
Toyota Corolla    6     0.2    0.25   0.33   0.5    0.67   1.0    1.0

Table 6.8: Pairwise comparison matrix for the Acceleration criterion

Using the latter set of weights instead of the former would substantially change the

values attached to the alternatives through formula 6.11 and might even alter their

ordering. So, contrary to the determination of the trade-offs in an additive value

model (which may be re-scaled through multiplying them by a positive number,

without altering the way in which alternatives are ordered by the multi-attribute

value function), there is no degree of freedom in the assessment of the ratios in

AHP; in other words, these assessments are made on an absolute scale.
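The effect of doubling the degrees can be checked numerically. The sketch below reuses a power-iteration approximation of the principal eigenvector (our own helper functions, not a reference implementation) on the judgements of Table 6.7 and on the same judgements doubled, the 1s being left unchanged.

```python
def reciprocal_matrix(upper):
    """Build the full reciprocal matrix from its strict upper triangle."""
    n = len(upper) + 1
    m = [[1.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for k, value in enumerate(row):
            m[i][i + 1 + k] = float(value)
            m[i + 1 + k][i] = 1.0 / value
    return m


def eigenvector_priorities(m, iterations=200):
    """Approximate the principal eigenvector, normalised so that it sums to 1."""
    n = len(m)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


TABLE_6_7 = [[1.5, 2, 3, 3], [1.5, 2, 2], [1.5, 1.5], [1]]
# Degrees twice as large, except for the 1s, which remain constant.
DOUBLED = [[v if v == 1 else 2 * v for v in row] for row in TABLE_6_7]

original = eigenvector_priorities(reciprocal_matrix(TABLE_6_7))
doubled = eigenvector_priorities(reciprocal_matrix(DOUBLED))

contrast_original = max(original) / min(original)
contrast_doubled = max(doubled) / min(doubled)
print("original:", [round(w, 3) for w in original], "contrast", round(contrast_original, 2))
print("doubled: ", [round(w, 3) for w in doubled], "contrast", round(contrast_doubled, 2))
```

Since the priorities are normalised to sum to one, they cannot all double; what changes is their contrast, which is why the assessed ratios cannot be freely re-scaled.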

As a further example, we now apply the method to determine the evaluation

of the alternatives in terms of preference on the Acceleration criterion. Suppose

the pairwise comparison matrix has been filled as shown in Table 6.8, in a way

that seems consistent with what we know of Thierry's preferences. Applying the

eigenvalue method yields the following priorities attached to each of the cars in

relation to acceleration:

These priorities are plotted in Figure 6.7; the solid line is a linear interpolation of the priorities in the eigenvector. A

re-scaling of the same criterion had been obtained through the construction of a

standard sequence (see Figure 6.5). Comparing these scales is not straightforward.

Notice that the origin is arbitrary in the single-attribute value model; one may add

any constant number to the values without changing the ranking of the alternatives

(a term equal to the constant number times the trade-off associated to the attribute

would just be added to the multi-attribute value function); since trade-offs depend

on the scaling of their corresponding single-attribute value function, changing the

unit on the vertical axis amounts to multiplying ui by a positive number; the

corresponding trade-off must then be divided by the same number. In the multi-

attribute value model, the scaling of the single-attribute value function is related

to the value of the trade-off; transformation of the former must be compensated

for by transforming the latter. In AHP since the assessments of all nodes are

made independently, no transformation is allowed. In order to compare the two

figures, one may transform the value function of Figure 6.5 so it coincides with

AHP priority on the extreme values of the acceleration half range, i.e. 28 and 29.5.

Figure 6.7 shows the transformed single-attribute value function superimposed (dotted line) on the graph of the priorities. There seems to be a good fit of the two curves but this is only an example from which no general conclusion can be drawn.

[Figure 6.7 (horizontal axis: acceleration in seconds, from 28 to 31; vertical axis: from 0 to 0.3): the priorities obtained through the eigenvalue method are represented by the solid line; the linearly transformed single-attribute values of Figure 6.5 are represented by the dotted line on the range from 28 to 29.5 seconds]

Comments on AHP

Although the models for describing the overall preferences of the decision-maker

are identical in multi-attribute value theory and in AHP, this does not mean that

applying the respective methodologies of these theories normally yields the same

overall evaluation of the alternatives. There are striking differences between the

two approaches from the methodological point of view. The ambition of AHP is

to help construct evaluations of the alternatives for each viewpoint (in terms of

preferences) and of the viewpoints with regard to the overall goal (in terms of

importance); these evaluations are claimed to belong to a ratio scale, i.e. to be

determined up to a positive multiplicative constant. Since the eigenvalue method

yields a particular determination of this constant and this determination is not

taken into account when assessing the relative importance of the various criteria,

the evaluations in terms of preference must be considered as if they were made on

an absolute scale, which has been repeatedly criticised in the literature (see for

instance Belton (1986) and Dyer (1990)). This weakness (that can also be blamed

on direct rating techniques, as mentioned above) could be corrected by asking the

decision-maker about the relative importance of the viewpoints in terms of passing

from the least preferred value to the most preferred value on criterion i compared

to a similar change on criterion j (Dyer 1990). Taking this suggestion into account

would however go against one of the basic principles of Saaty's methodology, i.e.

the assumption that the assessments at all levels of the hierarchy can be made

along the same procedure and independently of the other levels. That is probably

why the original method, although seriously attacked, has remained unchanged.

AHP has been criticised in the literature in several other respects. Besides the

fact already mentioned that it may be difficult to reliably assess comparisons of

preferences or of importance on the standard scale described in Table 6.6, there

is an issue about AHP that has been discussed quite a lot, namely the possibility

of rank reversal. Suppose alternative x is removed from the current set and

nothing is changed to the pairwise assessments of the remaining alternatives; it

may happen that an alternative, say, a among the remaining ones could now be

ranked below an alternative b whilst it was ahead of b in the initial situation. This

phenomenon was discussed in Belton and Gear (1983) and Dyer (1990) (see also

Harker and Vargas (1987) for a defense of AHP).
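Rank reversal is easy to reproduce on a small example. The sketch below uses hypothetical judgements (three criteria of equal weight, alternatives a, b, c and x, where x duplicates b) and makes one simplifying assumption: the pairwise comparison matrices are perfectly consistent, in which case the normalised principal eigenvector coincides with the raw values normalised to sum to one, so no eigenvector routine is needed. The function name is ours.

```python
def ahp_overall_priorities(raw_values, criterion_weights):
    """Weighted sum of per-criterion priorities (raw values normalised to sum 1)."""
    alternatives = list(raw_values[0])
    overall = dict.fromkeys(alternatives, 0.0)
    for weight, values in zip(criterion_weights, raw_values):
        total = sum(values.values())
        for alt in alternatives:
            overall[alt] += weight * values[alt] / total
    return overall


# HYPOTHETICAL ratio-scale judgements, one dictionary per criterion.
WITH_X = [
    {"a": 1, "b": 9, "c": 1, "x": 9},
    {"a": 9, "b": 1, "c": 1, "x": 1},
    {"a": 8, "b": 9, "c": 1, "x": 9},
]
# Remove x without touching the assessments of the remaining alternatives.
WITHOUT_X = [{k: v for k, v in crit.items() if k != "x"} for crit in WITH_X]
WEIGHTS = [1 / 3, 1 / 3, 1 / 3]

before = ahp_overall_priorities(WITH_X, WEIGHTS)
after = ahp_overall_priorities(WITHOUT_X, WEIGHTS)
print("with x:    a ahead of b?", before["a"] > before["b"])
print("without x: a ahead of b?", after["a"] > after["b"])
```

In this example a is ahead of b as long as x is present and falls behind b once x is removed, although no judgement concerning a and b has changed; the reversal comes from the re-normalisation of the priorities on each criterion.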

6.3.3 An indirect method for assessing single-attribute value functions and trade-offs

Various methods have been conceived in order to avoid direct elicitation of a

multi-attribute value function. A class of such methods consists in postulating

an additive value model (as described in formulae 6.7 and 6.8) and inferring all

together the shapes of all single-attribute value functions and the values of all

the trade-offs from declared global preferences on a subset of well-known alterna-

tives. The idea is thus to infer a general preference model from partial holistic

information about the decision-maker's preferences.

Thierry used a method of disaggregation of preferences described in Jacquet-

Lagreze and Siskos (1982); it is implemented in a software called Prefcalc, which

computes piece-wise linear single-attribute value functions and is based on linear

programming (see also Jacquet-Lagreze (1990), Vincke (1992)). More precisely,

the software helps to build a function

$$u(a) = \sum_{i=1}^{n} u_i(g_i(a))$$

such that $a \succsim b \Leftrightarrow u(a) \geq u(b)$. Without loss of generality, the lowest (resp. highest) value of u is conventionally set to 0 (resp. 1); 0 (resp. 1) is the value of a (fictitious) alternative whose assessment on each criterion would be the worst (resp. best) evaluation attained for the criterion on the current set of alternatives. This fictitious alternative is sometimes called the anti-ideal (resp. ideal) point.

In our example, the anti-ideal car costs 21 334 €, needs 30.8 seconds to cover 1 km starting from rest and 41.6 seconds starting in fifth gear at 40 km/h; its performance regarding brakes and road-holding is 1.33 and 1.25 respectively. The ideal car, at the opposite end of the range, costs 13 841 €, needs 28 seconds to cover 1 km starting from rest and 34.7 seconds starting in fifth gear at 40 km/h; its performance regarding brakes and road-holding is 2.66 and 3.25 respectively.

[Figure 6.8 (panels include "Brake" and "Road"): single-attribute value functions for the "Choosing a car" problem; the value of the trade-off is written in the right upper corner of each box]

The shape of the single-attribute value function for the cost criterion for in-

stance is modelled as follows. The user fixes the number of linear pieces; suppose

that you decide to set it to 2 (which is a parsimonious option and the default

value proposed in Prefcalc); the single-attribute value function of the cost could

for instance be represented as in Figure 6.8. Note that the maximal value of the

utility (reached for a cost of 13 841 €) is scaled in such a way that it corresponds

to the value of the trade-off associated with the cost criterion, i.e. .43 in the exam-

ple shown in Figure 6.8. Note also that with two linear pieces, one for each half

of the cost range, the single-attribute value function is completely determined by

two numbers, i.e. the utility value at mid-range and the maximal utility. Those

values, say u1,1 , u1,2 are variables of the linear program that Prefcalc writes and

solves. The pieces of information on which the formulation of the linear program

relies are obtained from the user. The user is asked to select a few alternatives

that he is familiar with and feels able to rank-order according to his overall

preferences. The ordering of these alternatives, which include the fictitious ideal

and anti-ideal ones, induces the corresponding order on their overall value and

hence, generates constraints of the linear program. Prefcalc then tries to find

levels ui,1 , ui,2 for each criterion i, which will make the additive value function

compatible with the declared information. If the program is not contradictory,

i.e. if an additive value function (with 2-piece piece-wise linear single-attribute

value functions) proves compatible with the preferences, the system tries to find a

solution among all feasible solutions, that maximises the discrimination between

the selected alternatives. If no feasible solution can be found, the system proposes

to increase the number of variables of the model, for instance by using a higher

number of linear pieces in the description of the single-attribute value functions.
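The form of the model handled by the linear program can be illustrated by evaluating an additive value function with 2-piece piece-wise linear single-attribute value functions. In the sketch below, the ranges are those of the anti-ideal and ideal cars quoted above and the maximal utilities are the trade-offs (.43, .23, .13, .10, .10) mentioned later in the text; the mid-range utilities are HYPOTHETICAL placeholders, not Prefcalc output, and the function names are ours.

```python
# (worst, best) evaluation on each criterion: the anti-ideal and ideal cars.
RANGES = {
    "cost": (21334.0, 13841.0),
    "acceleration": (30.8, 28.0),
    "pick_up": (41.6, 34.7),
    "brakes": (1.33, 2.66),
    "road_holding": (1.25, 3.25),
}
# Maximal utility of each criterion, i.e. its trade-off.
MAX_UTILITY = {"cost": 0.43, "acceleration": 0.23, "pick_up": 0.13,
               "brakes": 0.10, "road_holding": 0.10}
# HYPOTHETICAL utilities at mid-range (the variables u_{i,1} of the linear program).
MID_UTILITY = {"cost": 0.40, "acceleration": 0.20, "pick_up": 0.10,
               "brakes": 0.01, "road_holding": 0.05}


def marginal_value(criterion, x):
    """2-piece piece-wise linear value: 0 at the worst level, u_max at the best."""
    worst, best = RANGES[criterion]
    t = (x - worst) / (best - worst)  # 0 at worst, 1 at best, whatever the direction
    t = min(1.0, max(0.0, t))
    u_mid, u_max = MID_UTILITY[criterion], MAX_UTILITY[criterion]
    if t <= 0.5:
        return 2 * t * u_mid
    return u_mid + (2 * t - 1) * (u_max - u_mid)


def overall_value(car):
    """Additive multi-attribute value u(a) = sum of u_i(g_i(a))."""
    return sum(marginal_value(c, car[c]) for c in RANGES)


ideal = {c: best for c, (worst, best) in RANGES.items()}
anti_ideal = {c: worst for c, (worst, best) in RANGES.items()}
print("ideal:", round(overall_value(ideal), 2))
print("anti-ideal:", round(overall_value(anti_ideal), 2))
```

With two decimal places the quoted trade-offs sum to .99 rather than 1, so the ideal car obtains a value of .99 in this sketch. The linear program itself would treat the mid-range and maximal utilities as unknowns and constrain them so that the overall values reproduce the ranking declared by the user.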

This method could be described as a learning process; the system fits the

parameters of the model on the basis of partial information about the user's preferences; the set of alternatives on which the user declares his global preferences

may be viewed as a learning set. For more details on the method, the reader is

referred to Vincke (1992), Jacquet-Lagreze and Siskos (1982).

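In his ex post study, Thierry selects five cars besides the ideal and anti-ideal ones and ranks them in the following order (these are the cars marked with a * in Table 6.9):

1. Peugeot 309/16 (Car 11)
2. Nissan Sunny (Car 3)
3. Mitsubishi Galant (Car 13)
4. Ford Escort (Car 9)
5. R 21 (Car 14)

The resulting value function is described in Figure 6.8.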

Thierry examines this result and makes the following comments. He agrees with many features of the fitted single-attribute value functions and in particular with

1. the lack of sensitivity to the price in the range from 13 841 € to 17 576 € (he was a priori estimating his budget at about 17 500 €);

2. the value function fitted for the acceleration criterion (above 29 seconds, the car is useless since a difference of 1 second in acceleration results in the faster car being two car lengths ahead of the slower one at the end of the test); Thierry declares this criterion to be the second most important after cost (weight = .43);

3. the value function fitted for the acceleration test starting from 40 km/h (above 38 seconds he agrees that the car loses all attractiveness; the car is not only used in competition; it must be pleasant in everyday use and hence, the third criterion has a certain importance although it is of less importance than the second one).

However, Thierry disagrees with the modelling of the braking criterion, which

he considers equally important as road-holding. He believes that the relative

importance of the fourth and fifth criteria should be revised. Thierry then looks

at the ranking of the cars according to the computed value function. The ranking

as well as the multi-attribute value assigned to each car are given in Table 6.9.

Rank       Car                          Value
 1    *    Peugeot 309/16 (Car 11)      0.84
 2    *    Nissan Sunny (Car 3)         0.68
 3         Renault 19 (Car 10)          0.66
 4         Peugeot 309 (Car 12)         0.65
 5         Honda Civic (Car 7)          0.61
 6         Fiat Tipo (Car 1)            0.54
 7         Opel Astra (Car 8)           0.54
 8         Mitsubishi Colt (Car 5)      0.53
 9         Mazda 323 (Car 4)            0.52
10         Toyota Corolla (Car 6)       0.50
11         Alfa 33 (Car 2)              0.49
12    *    Mitsubishi Galant (Car 13)   0.48
13    *    Ford Escort (Car 9)          0.32
14    *    R 21 (Car 14)                0.16

Table 6.9: Ranking obtained using Prefcalc. The cars ranked by Thierry are those marked with a *

Thierry feels that Car 10 (Renault 19) is ranked too high while Car 7 (Honda

Civic) should be in a better position.

In view of these observations, Thierry modifies the single-attribute value func-

tions for criteria 4 and 5. For the braking criterion, the utility (0.01) associated

with 2 remains unchanged while the utility of the level 2.7 is raised to 0.1 instead

of 0.01. The road-holding criterion is also modified; the value (0.2) associated with

the level 3.2 is lowered to 0.1 (see Figure 6.9). Note that Prefcalc normalises the

value function so that the ideal alternative is always assigned the value 1; of course, because values are displayed with two decimal places, the sum of the maximal values of the single-attribute value functions may be only approximately equal to 1. Running Prefcalc with the altered value functions returns the ranking in Table 6.10, where the revised multi-attribute value is shown after each car name.

After he sees the modified ranking yielded by Prefcalc, Thierry feels that the

new ranking is fully satisfactory. He observes that if he had used Prefcalc a few

years earlier, he would have made the same choice as he actually did; he considers

this as a good point as far as Prefcalc is concerned. He finally makes the following

comments: "Using Prefcalc has enhanced my understanding of both the data and my own preferences; in particular I am more conscious of the relative importance I give to the various criteria."

First let us emphasise an important psychological aspect of the empirical validation

of a method or a tool, which is common in human practice: the fact that previous

intuition or previous more informal analyses are confirmed by using a tool, here

Prefcalc, contributes to raising the level of confidence the user puts in the tool.

[Figure 6.9: Modified single-attribute value functions for the braking and road-holding criteria; panels "Brake" (trade-off .1) and "Road" (trade-off .1)]

Rank       Car                          Value
 1    *    Peugeot 309/16 (Car 11)      0.85
 2    *    Nissan Sunny (Car 3)         0.75
 3         Honda Civic (Car 7)          0.66
 4         Peugeot 309 (Car 12)         0.65
 5         Renault 19 (Car 10)          0.61
 6         Opel Astra (Car 8)           0.55
 7         Mitsubishi Colt (Car 5)      0.54
 8         Mazda 323 (Car 4)            0.53
 9         Fiat Tipo (Car 1)            0.51
10         Toyota Corolla (Car 6)       0.50
11    *    Mitsubishi Galant (Car 13)   0.48
12         Alfa 33 (Car 2)              0.47
13    *    Ford Escort (Car 9)          0.32
14    *    R 21 (Car 14)                0.16

Table 6.10: Modified ranking using Prefcalc. The cars ranked by Thierry are those marked with a *

Observe that the user may well have a very vague understanding of the method

itself; he simply validates the method by using it to reproduce results that he has

confidence in. After such a successful empirical validation step he will be more

prone to use the method in new situations that he does not master that well.

What are the drawbacks and traps of Prefcalc? Obviously Prefcalc can only be

used in cases where the overall preference of the decision-maker can be represented

by an additive multi-attribute value function (as described by Equation 6.8). In

particular, this is not the case when preferences are not transitive or not complete

(for arguments supporting the possible observation of non-transitive preferences,

see the survey by Fishburn (1991)). There are some additional restrictions due

to the fact that the shapes of the single-attribute value functions that can be

modelled by Prefcalc are limited to piece-wise linear functions. This is hardly a

restriction when dealing with a finite set of alternatives; by adapting the number

of linear pieces one can obtain approximations of any continuous curve that can

be as accurate as desired. When the number of pieces is kept small, however, this may be a more serious restriction.

Stability of ranking

The main problem raised by the use of such a tool is the indetermination of the

estimated single-attribute value functions (including the estimation of the trade-

offs). Usually, if the preferences declared on the set of well-known alternatives are

compatible with an additive value model, there will be several value functions that

can represent these preferences. Prefcalc chooses one such representation according

to the principles outlined above, i.e. the most discriminating (in a sense). Other

choices of a model albeit compatible with the declared preferences on the learning

set, may lead to variations in the rankings of the remaining alternatives. Slight

variations in the trade-off values can yield rank reversals. For instance, with all

trade-offs within .02 of their value in Figure 6.9, changes already occur. Passing

from the set of trade-offs (.43, .23, .13, .10, .10) to (.45, .21, .11, .12, .10) results in

exchanging the positions of Honda Civic and Peugeot 309, which are ranked 4th and 3rd respectively after the change. This rank reversal is obtained by putting

slightly more emphasis on cost and slightly less on performance. Note that such

a slight change in the trade-offs has an effect on the ranking of the top 4 cars,

those on which Thierry focused after his preliminary analysis (see Table 6.3). It

should thus be very clear that in practice, determining the trade-offs with sufficient

accuracy could be both crucial and challenging. It is therefore of prime importance

to carry out a lot of sensitivity analyses in order to identify which parts of the

result remain reasonably stable.
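A sensitivity check of this kind is straightforward to automate. The sketch below applies the two sets of trade-offs quoted above to two cars whose normalised single-attribute values are HYPOTHETICAL (chosen so that car "H" is faster but dearer than car "P"); the overall value is written as a weighted sum of values normalised to [0, 1], which is an assumption of this sketch rather than the display convention of Prefcalc.

```python
def overall_value(trade_offs, marginal_values):
    """Weighted sum of normalised single-attribute values."""
    return sum(k * v for k, v in zip(trade_offs, marginal_values))


# HYPOTHETICAL normalised values on (cost, acceleration, pick-up, brakes, road-holding).
CARS = {
    "H": [0.5, 0.9, 0.8, 0.5, 0.5],  # better performance, higher cost
    "P": [0.8, 0.5, 0.5, 0.5, 0.5],  # cheaper, weaker performance
}
TRADE_OFFS_BEFORE = (0.43, 0.23, 0.13, 0.10, 0.10)
TRADE_OFFS_AFTER = (0.45, 0.21, 0.11, 0.12, 0.10)


def ranking(trade_offs):
    """Cars sorted by decreasing overall value."""
    return sorted(CARS, key=lambda car: overall_value(trade_offs, CARS[car]),
                  reverse=True)


print("before:", ranking(TRADE_OFFS_BEFORE))
print("after: ", ranking(TRADE_OFFS_AFTER))
```

With these hypothetical values, moving each trade-off by at most .02 is enough to exchange the two cars, which is the kind of instability a sensitivity analysis is meant to detect.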

In view of the fact that small variations of the trade-offs may even result in changes

in the ranking of the top alternatives, one may question the influence of the se-

lection of a learning set. In the case under examination, the top two alternatives

were chosen to be in the learning set and hence, are constrained to appear in the

correct order in the output of Prefcalc. What would have happened if the learning

set had been different?

Let us take another subset of 5 cars and declare preferences that agree with

the ranking validated by Thierry (Table 6.10). When substituting the top 2 cars

(Peugeot 309/16V, Nissan Sunny) by Renault 19, Mitsubishi Colt, two cars in the

middle segment of the ranking, the vector of trade-offs is (.53, .06, .08, .08, .25)

and the top four in the new ranking are Renault 19 (1), Peugeot 309 (2), Peugeot

309/16V (3), and Nissan Sunny (4); Honda Civic is relegated to the 12th position.

In the choice of the present learning set, stronger emphasis has been put on cost

and safety (brakes and road-holding) and much less on performance (acceleration

and pick up); three of the former top cars remain in the top four; Honda recedes

due to its higher cost and its weakness on road-holding; Renault 19 is heading the

race mainly due to excellent road-holding.

top cars and removing Renault 19. Clearly, the value of the trade-offs may depend

drastically on the learning set. Some sort of preliminary analysis of the user's

preferences can help to choose the learning set or understand the variations in the

ranking and the trade-offs a posteriori. In the present case, one can be relatively

satisfied with the results since the top 3 cars are usually well-ranked; the ranking

of the Honda Civic is much more unstable and it is not difficult to understand why

(weakness on road-holding and relatively high cost). The Renault 19 appears as

an outsider due to excellent road-holding. Of course for the rest of the cars huge

variations may appear in their ranking, but one is usually more interested in the

top ranked alternatives.

programming model to reduce the indeterminacies (essentially, by choosing to

maximise the contrast between the evaluations of the alternatives in the learning

set) is not aimed at being as insensitive as possible with regard to the selection

of a learning set. Other options could be experimentally investigated in order to

see whether some could consistently yield more stable evaluations. It should be

noted however that stability, which may be a desirable property in the perspective

of uncovering an objective model of preferences measurement, is not necessarily

a relevant requirement when the goal is to exploit partial available information.

One may expect that the decision-maker will naturally choose alternatives that

he considers as clearly distinct from one another as members of the learning set;

the analyst might alternatively instruct the decision-maker to do so. In a learning

process, where, typically, information is incomplete, it must be decided how to

complement the available facts by some arbitrary default assumptions. The infor-

mation should then be collected while taking the assumptions made into account;

one may consider that, in the case of Prefcalc, the analyst's instruction to select alternatives that are as contrasted as possible is in good agreement with the implementation options.

6.3.4 Conclusion

This section has been devoted to the construction of a formal model that represents

preferences on a numerical scale. Such a model can only be expected to exist when

preferences satisfy rather demanding hypotheses; it thus relies on firm theoretical

bases, which is undoubtedly part of the intellectual appeal of the method. There

is at least one additional advantage to theoretically well-founded decision models;

such models can be used to legitimate a decision to persons that have not been

involved in the decision making process. Once the hypotheses of the model have

been accepted or proved valid in a decision context and provided the process of

elicitation of the various parameters of the model has been conducted correctly,

the decision becomes transparent.

The additive multi-attribute value model is rewarding, when established and

accepted by the stake-holders, since it is directly interpretable in terms of decision;

the best decision is the one the model values most (provided the imprecisions in

the establishment of the model and the uncertainties in the evaluation information

allow one to discriminate at least between the top alternatives). The counterpart of

the clear-cut character of the conclusions that can be drawn from the model is

that establishing the model requires a lot of information and of a very precise and

particular type. This means that the model may be inadequate not only because

the hypotheses could not be fulfilled but also because the respondents might feel

unable to answer the questions or because their answers might not be reliable.

Indirect methods based on exploiting partial information and extrapolating it (in

a recursive validation process) may help when the information is not available in

explicit form; it remains that the quality of the information is crucial and that a lot

of it is needed. In conclusion, direct assessment of multi-attribute value functions

is a narrow road between the practical problem of obtaining reliable answers to

difficult questions and the risks involved in building a model on answers to simpler

but ambiguous questions.

In the next section we shall explore a very different formal approach that may

be less demanding with regard to the precision of the information, but also provides

less conclusive outputs.

6.4.1 Condorcet-like procedures in decision analysis

Is there any alternative way of dealing with multiple criteria evaluation in view

of a decision to the one described above for building a one-dimensional synthetic

evaluation on some sort of super-scale? To answer this question (positively), in-

spiration can be gained from the voting procedures discussed in Chapter 2 (see

also Vansnick (1986)). Suppose that each voter expresses his preferences through

a complete ranking of the candidates. With Borda's method, each candidate is assigned a rank for each of the voters (rank 1 if the candidate is ranked first by a voter,

rank 2 if he is ranked second, and so on); the Borda score of a candidate is the

sum of the ranks assigned to him by the voters; the winner is the candidate with

6.4. OUTRANKING METHODS 125

Cars   1  2  3  4  5  6  7  8  9 10 11 12 13 14
   1   5  3  1  2  2  3  3  2  3  2  2  2  2  3
   2   2  5  2  4  2  3  2  3  3  1  1  1  4  3
   3   4  4  5  4  4  4  4  4  4  3  2  3  5  4
   4   3  1  1  5  1  3  1  2  1  2  1  1  4  2
   5   3  3  1  5  5  3  2  2  2  3  1  1  5  2
   6   2  2  1  2  2  5  2  2  2  2  1  1  3  2
   7   3  3  1  4  4  4  5  3  4  3  2  2  4  4
   8   3  2  1  4  4  4  3  5  3  2  0  2  4  3
   9   2  3  1  4  4  3  1  2  5  2  1  2  4  3
  10   4  4  2  3  2  3  2  3  3  5  3  2  4  3
  11   4  4  3  4  4  4  4  5  4  3  5  4  4  5
  12   4  4  2  4  4  4  4  4  3  4  3  5  5  4
  13   3  2  0  2  1  2  1  2  1  1  1  0  5  1
  14   2  3  1  3  3  3  1  3  3  2  0  1  4  5

Table 6.11: Number of criteria in favour of a when compared to b for all pairs of cars a, b in the "Choosing a car" problem

the smallest Borda score. This method can be seen as a method of construction of

a synthetic evaluation of the alternatives in multiple criteria decision analysis, the

points of view corresponding to the voters and the alternatives to the candidates;

all criteria-voters have equal weight and coding by the rank number of the position

of the candidate in a voter's preference looks like a form of evaluation.
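The transposition of Borda's method can be sketched as follows, with each criterion acting as a voter that ranks the alternatives. The three rankings are HYPOTHETICAL and the function name is ours; ties are not handled in this minimal version.

```python
def borda_scores(rankings):
    """Sum, over the criteria-voters, of the rank of each alternative (1 = best)."""
    scores = {}
    for ranking in rankings:
        for position, alternative in enumerate(ranking, start=1):
            scores[alternative] = scores.get(alternative, 0) + position
    return scores


# HYPOTHETICAL rankings of three alternatives by three criteria, best first.
RANKINGS = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["a", "c", "b"],
]

scores = borda_scores(RANKINGS)
winner = min(scores, key=scores.get)  # the smallest Borda score wins
print(scores, "winner:", winner)
```

Here the scores are 4 for a, 6 for b and 8 for c, so a is selected; all criteria count equally, as noted above.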

Condorcet's method consists of a kind of tournament where all candidates compete in pairwise contests. A candidate is declared to be preferred to another

according to a majority rule, i.e. if more voters rank him before the latter than

the converse. The result of such a procedure is a preference relation on the set

of candidates that in general is neither transitive nor acyclic. A further step is

thus needed in order to exploit this relation in view of the selection of one or

several candidates or in view of ranking all the candidates. This idea can of course

be transposed in the multiple criteria decision context. We do this below, using

Thierry's case again for illustrative purposes; we show how the problems raised by

a direct transposition rather naturally lead to elementary outranking methods.

For each pair of cars a and b, we count the number of criteria according to

which a is at least as good as b. This yields the matrix given in Table 6.11; the

elements of the matrix are integers ranging from 0 to 5. Note that we might have

alternatively decided to count the criteria for which a is better than b, not taking

into account criteria for which a and b are tied.

What we could call the Condorcet preference relation is obtained by deter-

mining for each pair of alternatives a, b whether or not there is a (simple) majority

of criteria for which a is at least as good as b. Since there are 5 criteria, the ma-

jority is reached as soon as at least 3 criteria favour alternative a when compared

to b. The preference matrix is thus obtained by substituting 1 to any number

larger or equal to 3 in Table 6.11 and 0 to any number smaller than 3 yielding the

Cars   1  2  3  4  5  6  7  8  9 10 11 12 13 14
   1   1  1  0  0  0  1  1  0  1  0  0  0  0  1
   2   0  1  0  1  0  1  0  1  1  0  0  0  1  1
   3   1  1  1  1  1  1  1  1  1  1  0  1  1  1
   4   1  0  0  1  0  1  0  0  0  0  0  0  1  0
   5   1  1  0  1  1  1  0  0  0  1  0  0  1  0
   6   0  0  0  0  0  1  0  0  0  0  0  0  1  0
   7   1  1  0  1  1  1  1  1  1  1  0  0  1  1
   8   1  0  0  1  1  1  1  1  1  0  0  0  1  1
   9   0  1  0  1  1  1  0  0  1  0  0  0  1  1
  10   1  1  0  1  0  1  1  1  1  1  1  0  1  1
  11   1  1  1  1  1  1  1  1  1  1  1  1  1  1
  12   1  1  0  1  1  1  1  1  1  1  0  1  1  1
  13   1  0  0  0  0  0  0  0  0  0  0  0  1  0
  14   0  1  0  1  1  1  0  1  1  0  0  0  1  1

Table 6.12: Condorcet preference relation for the "Choosing a car" problem. A 1 at the intersection of the a row and the b column means that a is rated not lower than b on at least 3 criteria

relation described by the 0-1 matrix in Table 6.12. Note that a criterion counts

both in favour of a and in favour of b only if a and b are tied on that criterion;

the relation is reflexive since any alternative is at least as good as itself along all

criteria.
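The two steps just described, counting the criteria in favour of a over b and then applying a majority threshold, can be sketched as follows. The evaluations of the three alternatives are HYPOTHETICAL, all five criteria are assumed to be maximised, and the function names are ours.

```python
def concordance_counts(evaluations):
    """counts[a][b] = number of criteria on which a is at least as good as b."""
    names = list(evaluations)
    return {
        a: {b: sum(x >= y for x, y in zip(evaluations[a], evaluations[b]))
            for b in names}
        for a in names
    }


def majority_relation(counts, threshold):
    """0-1 relation: 1 when at least `threshold` criteria favour a over b."""
    return {a: {b: int(n >= threshold) for b, n in row.items()}
            for a, row in counts.items()}


# HYPOTHETICAL evaluations on 5 criteria (higher is better).
EVALUATIONS = {
    "x": [3, 1, 2, 3, 1],
    "y": [2, 3, 1, 2, 3],
    "z": [1, 2, 3, 1, 2],
}

counts = concordance_counts(EVALUATIONS)
simple = majority_relation(counts, threshold=3)
strict = majority_relation(counts, threshold=4)
for a in EVALUATIONS:
    print(a, counts[a], simple[a], strict[a])
```

With a simple majority (3 criteria out of 5) this example already contains the cycle x, y, z, x; with a threshold of 4 only the preference of y over z survives.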

It is not immediately apparent that this relation has cycles, and even cycles that
go through all alternatives; an instance of such a cycle is 1, 7, 10, 11, 3, 12, 5, 2,
14, 8, 9, 4, 6, 13, 1. Obviously it is not straightforward to suggest a good choice on
the basis of such a relation, since one can find 3 criteria (out of 5) saying that 1 is
at least as good as 7, 3 (possibly different) criteria saying that 7 is at least as good
as 10, . . . , and finally 3 criteria saying that 13 is at least as good as 1. How can
we possibly obtain something from this matrix in view of our goal of selecting the
best car? A closer look at the preference relation reveals that some alternatives
are preferred to most others while others are preferred to only a few; among the former are
alternatives 11 (preferred to all), 3 (preferred to all but one), 12 (preferred to all
but 2), 7 and 10 (preferred to all but 3). The same alternatives appear as seldom
beaten: 3 and 11 (only once, not counting themselves), 12 (twice), then come 10
(5 times) and 7 (6 times).

To make things appear more clearly, by avoiding cycles as much as possible,
one might decide to impose more demanding levels of majority in the definition of
a preference relation. We might require that an alternative be at least as good as
another on at least 4 criteria. The new preference relation is shown in Table 6.13.
All cycles of the previous relation have disappeared. When ranking the alternatives


Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 1 0 0 0 0 0 0 0 0 0 0 0 0 0

2 0 1 0 1 0 0 0 0 0 0 0 0 1 0

3 1 1 1 1 1 1 1 1 1 0 0 0 1 1

4 0 0 0 1 0 0 0 0 0 0 0 0 1 0

5 0 0 0 1 1 0 0 0 0 0 0 0 1 0

6 0 0 0 0 0 1 0 0 0 0 0 0 0 0

7 0 0 0 0 0 0 1 0 1 0 0 0 1 1

8 0 0 0 1 1 0 0 1 0 0 0 0 1 0

9 0 0 0 0 0 0 0 0 1 0 0 0 1 0

10 1 1 0 0 0 0 0 0 0 1 0 0 1 0

11 1 1 0 1 1 0 1 1 1 0 1 1 1 1

12 1 1 0 1 1 1 1 1 0 1 0 1 1 1

13 0 0 0 0 0 0 0 0 0 0 0 0 1 0

14 0 0 0 0 0 0 0 0 0 0 0 0 0 1

Table 6.13: Condorcet preference relation for the Choosing a car problem. A

1 at the intersection of the a row and the b column means that a is rated not

lower than b on at least 4 criteria

by the number of those they beat (i.e. those compared to which they are at least as good on 4 criteria or more),
one sees that 3, 11 and 12 come in first position (they are preferred to 10 other
cars); then there is a big gap, after which come 7, 8 and 10, which beat only 3 other
cars. Conversely, there are two non-beaten cars, 3 and 11; then come 10 and 12
(beaten by one car); 7 is beaten by 3 cars.
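The statements made about Table 6.13 can be checked mechanically. The sketch below transcribes the rows of Table 6.13 as strings of 0s and 1s; the helper names (`beats`, `is_acyclic`) are ours, and the acyclicity test (repeatedly removing cars that beat no remaining car) is one simple way of checking for cycles, not a procedure taken from the text.

```python
# Rows of Table 6.13 (cars 1..14); character j of row i is "1" when car i
# is at least as good as car j on at least 4 criteria.
ROWS = [
    "10000000000000",
    "01010000000010",
    "11111111100011",
    "00010000000010",
    "00011000000010",
    "00000100000000",
    "00000010100011",
    "00011001000010",
    "00000000100010",
    "11000000010010",
    "11011011101111",
    "11011111010111",
    "00000000000010",
    "00000000000001",
]

def beats(rows):
    """Map each car to the set of other cars it beats (diagonal ignored)."""
    n = len(rows)
    return {i + 1: {j + 1 for j in range(n) if i != j and rows[i][j] == "1"}
            for i in range(n)}

def is_acyclic(graph):
    """Repeatedly remove the cars that beat no remaining car; the relation
    is acyclic exactly when every car can be removed in this way."""
    remaining = set(graph)
    while remaining:
        sinks = {v for v in remaining if not (graph[v] & remaining)}
        if not sinks:
            return False  # every remaining car beats another one: a cycle
        remaining -= sinks
    return True

graph = beats(ROWS)
out_degree = {car: len(targets) for car, targets in graph.items()}
in_degree = {car: sum(car in t for t in graph.values()) for car in graph}

print(is_acyclic(graph))
print(out_degree)
print(in_degree)
```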

In the present case, we see that the simple approach that was used essentially
makes the same cars emerge as the methods used so far. There are at least two
radical differences between this approach and those based on the weighted sum or on some more
sophisticated way of assessing each alternative by a single number that synthesises
all the criteria values. One is that all criteria have been considered equally important;
it is possible, however, to take information on the relative importance of the
criteria into account, as will be seen in section 6.4.3.

The second difference is more in the nature of the type of approach; the most

striking point is that the size of the differences in the evaluations of a and b for all

criteria does not matter; only the signs of those differences do. In other words, had

the available information been rankings of the cars with respect to each criterion

(instead of numeric evaluations), the result of the Condorcet procedure would

have been exactly the same. More precisely, suppose that all that we know (or

that Thierry considers relevant in terms of preferences) about the cost criterion is

the ordering of the cars according to the estimated cost, i.e.

\[ \text{Car 10} \succ_1 \text{Car 3} \succ_1 \text{Car 13} \succ_1 \text{Car 11} \succ_1 \text{Car 8} \succ_1 \text{Car 1} \succ_1 \text{Car 7} \succ_1 \text{Car 9} \succ_1 \text{Car 14} \dots \]

Suppose that similar hypotheses are made for the other 4 criteria; if this were

the case we would have obtained the same matrices as in Tables 6.12 and 6.13.
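This invariance is easy to illustrate. In the sketch below, the evaluations are hypothetical and all criteria are assumed to be maximised; `to_ranks` (our name) throws away the sizes of the differences and keeps only the order on each criterion, and the Condorcet matrix comes out the same.

```python
# Hypothetical evaluations of 4 alternatives on 3 criteria (all maximised).
evaluations = {
    "w": [17.5, 3.0, 120.0],
    "x": [17.4, 9.0, 80.0],
    "y": [12.0, 9.5, 200.0],
    "z": [30.0, 1.0, 81.0],
}

def condorcet_matrix(evals, min_criteria):
    """0-1 matrix: entry [a][b] is 1 when a >= b on at least min_criteria criteria."""
    names = sorted(evals)
    return [
        [int(sum(x >= y for x, y in zip(evals[a], evals[b])) >= min_criteria)
         for b in names]
        for a in names
    ]

def to_ranks(evals):
    """Keep only the order on each criterion: replace every value by its
    position among the distinct values observed on that criterion."""
    n_criteria = len(next(iter(evals.values())))
    ranked = {name: [] for name in evals}
    for i in range(n_criteria):
        levels = sorted({values[i] for values in evals.values()})
        for name, values in evals.items():
            ranked[name].append(levels.index(values[i]))
    return ranked

from_values = condorcet_matrix(evaluations, min_criteria=2)
from_ranks = condorcet_matrix(to_ranks(evaluations), min_criteria=2)
print(from_values == from_ranks)  # True
```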

Of course, neglecting the size of the differences for a criterion such as cost may
appear to be a misuse of the available information; there are at least two considerations
that could mitigate this commonsense reaction:
• the assessments for the cars on the cost criterion are rather rough estimations of an expected cost (see section 6.1.1); in particular it is presumed that on average the lifetimes of all alternatives are equal; is it reasonable in those circumstances to rely on precise values of differences of these estimations to select the best alternative?
• estimations of cost, even reliable ones, are not necessarily related to preferences on the cost criterion in a simple way.

Such issues were discussed extensively in section 6.2.4. The whole analysis carried

out there was aimed towards the construction of a multiple criteria value function,

which implies making any difference in evaluations on a criterion equivalent to

some uniquely defined difference for any other criterion. The many methods that

can be used to build a value function by questioning a decision-maker about his

preferences may well fail however; let us list a few reasons for the possible failure

of these methods:

• time pressure may be so intense that there is not enough time available to engage in the lengthy elicitation process of a multiple criteria value function;
• it may be that the importance of the decision to be made does not justify such an effort;
• the decision-maker might not know how to answer the questions or might try to answer but prove inconsistent or might feel discomfort in being forced to give precise answers where things are vague to him;
• in case of group decision, the analyst may be unable to make the various decision-makers agree on the answers to be given to some of the questions raised in the elicitation process.

and other approaches may be preferred. This appears perhaps better if we consider

the more artificial scales associated with criteria 4 and 5 (see section 6.1.1 concern-

ing the construction of these scales). Take, for instance, criterion 4 (Brakes). Does

the difference between the levels 2.33 and 2.66 have a quantitative meaning? If it

does, is this difference, in terms of preferences, more than, less than or equal to

the difference between the levels 1.66 and 2? How much would you accept to pay

(in terms of criterion 1) to raise the value for criterion 4 from 2.33 to 2.66 or from

1.33 to 2.33? Of course questions raised for eliciting value functions are more indi-

rect but they still require a precise perception of the meaning of the levels on the

scale of criterion 4 by the decision-maker. Such a perception can only be obtained


by having experienced the braking behaviour of specific cars rated at the various

levels of the scale, but such knowledge cannot be expected from a decision-maker

(otherwise there would be no room on the marketplace for all the magazines that

evaluate goods in order to help consumers spend their money while making the

best choice). Also remember that braking performance has been described by the

average of 3 indices evaluating aspects of the car's braking behaviour; this does

not favour a deep intuitive perception of what the levels on that scale may really

mean. So, one has to admit that in many cases the definition of the levels on scales

is quite far from precise in quantitative terms and it may be hygienic not to use

the fallacious power of numbers. This is definitely the option chosen in the meth-

ods discussed in the present section. Not that these methods are purely ordinal;

but differences between levels on a scale are carefully categorised, yet usually in a

coarse-grained fashion, in order not to take into account differences that are only

due to the irrelevant precision of numbers.

The Condorcet idea for a voting procedure has been transposed into decision analysis
under the name of outranking methods. Such a transposition takes the peculiarities
of the decision analysis context into account, in particular the fact that criteria
may be perceived as unequally important; additional elements such as the notion
of discordance have also been added. The principle of these methods is as follows.
Each pair of alternatives is considered in turn, independently of third-party
alternatives; when looking at alternatives a and b, it is claimed that a outranks
b if there are enough arguments to decide that a is at least as good as b, while
there is no essential reason to refute that statement (Roy (1974), cited by Vincke
(1992), p. 58). Note that taking into account strong arguments against declaring a preference
is typically what is called discordance, and is original with respect
to the simple Condorcet rule. Such an approach has been operationalised through
various procedures, in particular the family of ELECTRE methods associated
with the name of B. Roy. (For an overview of outranking methods, the reader is
referred to the books by Vincke (1992) and Roy and Bouyssou (1993).) Below, we
discuss an application of the simplest of these methods, ELECTRE I, to Thierry's
case; ELECTRE I is a tool designed to be used in the context of a choice decision
problem; it builds up a set of which the best alternative, according to the
decision-maker's preferences, should be a member. Let us emphasise that this set
cannot be described as the set of best alternatives, not even as a set of good alternatives,
but just as a set that contains the best alternatives. We shall then show
how the fundamental ideas of ELECTRE I can be refined, in particular with a
view to helping to rank the alternatives. Our goal is not to make a survey of all
outranking methods; we just want to present the basic ideas of such methods and
illustrate some problems they may raise.


(and more generally methods based on pairwise comparisons) do not generally yield

preferences that are transitive (not even acyclic). This point was already made in

Chapter 2 about Condorcet's method. Since the hypotheses of Arrow's theorem

can be re-formulated to be relevant in the framework of multiple criteria decision

analysis (through the correspondence candidate-alternative, voter-criterion; see

also Bouyssou (1992) and Perny (1992)), it is no wonder that methods based on

comparisons of alternatives by pairs, independently of the other alternatives, will

seldom directly yield a ranking of the alternatives. The pairs of alternatives that

belong to the outranking relation are normally those between which the preference

is established with a high degree of confidence; contradictions are reflected either in

cycles (a outranks b that outranks c that . . . that outranks a) or incomparabilities

(neither a outranks b nor the opposite).

Let us emphasise that the lack of transitivity or of completeness, although rais-

ing operational problems, may be viewed not as a weakness but rather as faithfully

reflecting preferences as they can be perceived at the end of the study. Defenders

of the approach support the idea that forcing preferences to be expressed in the

format of a complete ranking is in general too restrictive; there is experimental

evidence that backs their viewpoint (Tversky (1969), Fishburn (1991)). Explicit

recognition that some alternatives are incomparable may be an important piece of

information for the decision-maker.

In addition, as repeatedly stressed in the writings of B. Roy, the outranking

relation should be interpreted as what is clear-cut in the preferences of the decision-

maker, something like the surest and most stable expression of a complex, vague

and evolving object that is named, for simplicity, the preferences of the decision-maker.
In this approach very few hypotheses are made on preferences (like

rationality hypotheses); one may even doubt that preferences pre-exist the process

from which they emerge.

The analysis of a decision problem is conceived as an informational process,

in which, carefully, prudently and interactively, models are built that reflect, to

some extent, the way of thinking, the feelings and the values of a decision-maker;

in this concept, the concern is not making a decision but helping a decision-maker

to make up his mind, helping him to understand a decision problem while taking

his own values into account in the modelling of the decision situation.

The approach could be called constructive; it has many features in common

with a learning process; however, in contrast with most artificial intelligence prac-

tice, the model of preferences is built explicitly and formally; preferences are not

simply described through rules extracted from partial information obtained on a

learning set. For more about the constructive approach including comparisons with

the classical normative and descriptive approaches (see Bell, Raiffa and Tversky

(1988)), the reader is referred to Roy (1993).

Once the outranking relation has been constructed, the job of suggesting a

decision is thus not straightforward. A phase of exploitation of the outranking

relation is needed in order to provide the decision-maker with information more


advantage of good control on the transformation of the multi-dimensional information
into a model of the decision-maker's preferences, including a certain degree
of inconsistency and incompleteness.

We briefly review the principles of the ELECTRE I method. For each pair of

alternatives a and b, the so-called concordance index is computed; it measures

the strength of the coalition of criteria that support the idea that a is at least as

good as b. The strength of a coalition is just the sum of the weights associated with

the criteria that constitute the coalition. The notion of weights will be discussed

below. If all criteria are equally important, the concordance index is proportional

to the number of criteria in favour of a as compared to b as in the Condorcet-like

method discussed above. The level from which a coalition is judged strong enough

is determined by the so-called concordance threshold; in the Condorcet voting

method, with the simple majority rule, this threshold is just half the number of

criteria and in general one will choose a number above half the sum of the weights

of all criteria. Another feature that contrasts ELECTRE with pure Condorcet but

also with purely ordinal methods, is that some large differences in evaluation, when

in disfavour of a, might be pinpointed as preventing a from outranking b. One

therefore checks whether there is any criterion for which b is so much better than a

that it would make it meaningless for a to be declared preferred overall to b; if this

happens for at least one criterion one says that there is a veto to the preference of a

over b. If the concordance index passes some threshold (concordance threshold)

and there is no veto of b against a, then a outranks b. Note that the outranking

relation is not asymmetric in general; it may happen that a outranks b and that b

outranks a.
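The outranking test just described can be sketched as follows. This is a minimal illustration under stated assumptions: the two alternatives are hypothetical, all criteria are maximised, weights and threshold are expressed in hundredths to keep the arithmetic exact, and the veto is simplified to a constant gap per criterion (`veto_gaps`, our name), whereas the method itself works with ordered pairs of evaluations that lead to a veto.

```python
def concordance(ga, gb, weights):
    """Sum of the weights of the criteria on which a is at least as good as b."""
    return sum(w for x, y, w in zip(ga, gb, weights) if x >= y)

def vetoed(ga, gb, veto_gaps):
    """True when, on some criterion, b exceeds a by at least the veto gap
    (None means that the criterion never raises a veto)."""
    return any(gap is not None and y - x >= gap
               for x, y, gap in zip(ga, gb, veto_gaps))

def outranks(ga, gb, weights, threshold, veto_gaps):
    """a outranks b when the coalition is strong enough and no veto applies."""
    return (concordance(ga, gb, weights) >= threshold
            and not vetoed(ga, gb, veto_gaps))

# Hypothetical data: 5 criteria, weights in hundredths, threshold .60.
WEIGHTS = [27, 22, 17, 17, 17]
a = [8, 7, 6, 2, 5]
b = [7, 6, 5, 9, 6]
no_veto = [None] * 5
veto_on_4 = [None, None, None, 5, None]  # a gap of 5 on criterion 4 is a veto

print(concordance(a, b, WEIGHTS))               # 66
print(outranks(a, b, WEIGHTS, 60, no_veto))     # True
print(outranks(a, b, WEIGHTS, 60, veto_on_4))   # False
```

Here a is supported by criteria 1, 2 and 3, but b is so much better on criterion 4 that the veto removes the outranking of b by a; it does not, by itself, make b outrank a.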

This process yields a binary relation on the set of alternatives, which may
have cycles and be incomplete (neither a outranks b nor the opposite). In order
to propose a set of alternatives of particular interest to the decision-maker, from
which the best compromise alternative should emerge, one extracts the kernel of
the graph of the outranking relation after reducing the cycles; in other words,
all alternatives in a cycle are considered to be equivalent; they are replaced by
a unique representative node; in the resulting relation without cycles, the kernel
is defined as a subset of alternatives that do not outrank one another and such
that each alternative not in the kernel is outranked by at least one alternative in
the kernel; in particular all non-outranked alternatives belong to the kernel. In a
graph without cycles, a unique kernel always exists. It should be emphasised that
not all alternatives in the kernel are necessarily good candidates for selection; an
alternative incomparable to all others is always in the kernel; alternatives in the
kernel may be beaten by alternatives not in the kernel. So, the kernel may be
viewed as a set of alternatives on which the decision-maker's attention should be
focused.

In order to apply the method to Thierry's case, we successively have to determine
• a concordance threshold
• ordered pairs of evaluations that lead to a veto (and this for every criterion)

The concordance index c(a, b), which measures the strength of the coalition of criteria along which
a is at least as good as b, may be computed by the formula

\[ c(a, b) = \sum_{i \,:\, g_i(a) \ge g_i(b)} p_i \tag{6.12} \]

where the $p_i$'s are normalised weights that reflect the relative importance of the
criteria; $g_i(a)$ denotes, as usual, the evaluation of alternative a for criterion i (which
is assumed to be maximised; if it were to be minimised, the weight $p_i$ would be
added when the converse inequality holds, i.e. $g_i(a) \le g_i(b)$). So, whenever the
evaluation of a exceeds or equals that of b on a criterion, the weight of that criterion enters
(additively) into the weight of the coalition in favour of a. A criterion can count both
for a against b and the opposite if and only if $g_i(a) = g_i(b)$.

In the context of outranking, the weights are not trade-offs; they are completely

independent of the scales for the criteria. A practical consequence is that one

may question the decision-maker in terms of relative importance of the criteria

without reference to the scales on which the evaluations for the various viewpoints

are expressed. This does not mean however that they are independent of the

method and that one could use values given spontaneously by the decision-maker

or through questioning in terms of importance without care, without reference

to the evaluations as is done in Saaty's procedure. It is important to bear in mind

how the weights will be used, in this case to measure the strength of coalitions

in pairwise comparisons and decide on the preference only on the basis of the

coalitions.

To be more specific and contrast the meaning of the weights with that of those used
in weighted sums, let us first consider those suggested by Thierry in section 6.2.2,
i.e. (1, 2, 1, 0.5, 0.5). Note that these were not obtained through questioning on the
relative importance of criteria but in the context of the weighted sum, with Thierry
bearing re-scaled evaluations in mind: the evaluations on each criterion had been
divided by the maximal value $g_{i,\max}$ attained for that criterion. Dividing the
weights by their sum (= 5) yields the normalised weights (.2, .4, .2, .1, .1). Using
these weights in outranking methods would lead to an overwhelming predominance
of criteria 2 (Acceleration) and 3 (Pick-up), which are also linked since they are
facets of the car's performance. With such weights and a concordance threshold of
at least .5, it is impossible for a car to be outranked when it is better on criteria 2
and 3, even if all other criteria are in favour of an opponent. It was never Thierry's
intention that once a car is better on criteria 2 and 3, there is no need to look
at the other criteria; the whole initial analysis shows, on the contrary, that a fast
and powerful car is useless, for instance, if it is bad on the braking or road-holding
criterion. Such a feature of the preference structure could indeed be reflected


through the use of vetoes, but only in a negative manner, i.e. by removing the

outranking of a safe car by a powerful one, not by allowing a safe car to outrank

a powerful one. Note that the above weights may nevertheless be appropriate

for a weighted sum because in such a method, the weights are multiplied by the

evaluations (or re-coded evaluations). To make it clearer, consider the following

reformulation of the condition under which a is preferred to b in the weighted sum

model (a similar formulation is straightforward in the additive value model)

\[ a \succsim b \iff \sum_{i=1}^{n} k_i \, (g_i(a) - g_i(b)) \ge 0. \tag{6.13} \]

If a is slightly better than b on a point of view i, the influence of this fact in the
comparison between a and b is reflected by the term $k_i (g_i(a) - g_i(b))$, which
is presumably small. Hence, important criteria count for little in pairwise comparisons
when the differences between the evaluations of the alternatives are small
enough. On the contrary, in outranking methods, weights are not divided; when a
is better than b on some criterion, the full weight of the criterion counts in favour
of a, whether a is slightly or by far better than b.
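A small numerical illustration may help. The figures below are hypothetical (three criteria, all maximised), and the same numbers are used as weights in both models purely to expose the contrast, although the text stresses that the two kinds of weights do not have the same meaning.

```python
# Hypothetical data: three criteria (all maximised) with weights 6, 2, 2.
weights = [6, 2, 2]
a = [51, 10, 10]   # barely better than b on the heaviest criterion
b = [50, 40, 40]   # much better than a on the two light criteria

def weighted_sum_margin(ga, gb, k):
    """Sum of k_i * (g_i(a) - g_i(b)), as in (6.13); a is preferred when >= 0."""
    return sum(ki * (x - y) for ki, x, y in zip(k, ga, gb))

def concordance(ga, gb, p):
    """Weight of the coalition of criteria on which a is at least as good as b."""
    return sum(pi for pi, x, y in zip(p, ga, gb) if x >= y)

margin = weighted_sum_margin(a, b, weights)   # 6*1 - 2*30 - 2*30 = -114
for_a = concordance(a, b, weights)            # 6 out of 10
for_b = concordance(b, a, weights)            # 4 out of 10
print(margin, for_a, for_b)
```

In the weighted sum, the tiny advantage of a on the first criterion is swamped by the large differences on the other two, so b is preferred. In the concordance count, the full weight of the first criterion goes to a, which gathers 60% of the total weight against 40% for b.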

Since the weights in a weighted sum depend on the scaling of each criterion and

there is no acknowledged standard scaling, it makes no sense in principle to use

the weights initially provided by Thierry as coefficients measuring the importance

of the criteria in an outranking method. If we nevertheless try to use them, we

might consider the weights used with the normalised criteria of Table 6.4. We see

that the importance of the safety coalition (Criteria 4 and 5) would be negligible

(weight = .20), while the importance of the performance coalition (Criteria 2

and 3) would be overwhelming (weight = .60). There is another reasonable normalisation
of the criteria that does not fix the zero of the scale but rather maps
the smallest attained value $g_{i,\min}$ onto 0 and the largest $g_{i,\max}$ onto 1. Transforming
the weights accordingly (i.e. multiplying them by the inverse of the range of
the values for the corresponding criterion prior to the transformation) one would
obtain (.28, .14, .13, .20, .25) as a weight vector. With these values as coefficients of
importance, the safety coalition (Criteria 4 and 5; weight = .45) becomes more
important than the performance coalition (Criteria 2 and 3; weight = .27), which
Thierry may consider unfair. As an additional conclusion, one may note that the
values of the weights vary tremendously depending on the type of normalisation
applied.

Now look at the weights (.35, .24, .17, .12, .12) obtained through Saaty's questioning
procedure in terms of importance (see section 6.3.2). Using these weights
for measuring the strength of coalitions does not seem appropriate, since the predominance
of criteria 1 and 2 is too strong (joint weight = .35 + .24 = .59).

Due to the all-or-nothing character of the weights in ELECTRE I, one is

inclined to choose less contrasted weights than those examined above. Although

there are procedures that have been proposed to elicit such weights (see Mousseau

(1993), Roy and Bouyssou (1993)), we will just choose a set of weights in an

intuitive manner; let us take weights proportional to (10, 8, 6, 6, 6) as reflecting the

relative importance of the criteria. At least the ordering of the values seems to be


Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 1 .5 .17 .33 .33 .56 .61 .33 .61 .33 .33 .33 .33 .61

2 .49 1 .44 .83 .33 .56 .44 .61 .61 .28 .28 .28 .83 .61

3 .83 .73 1 .73 .73 .73 .78 .78 .83 .56 .44 .56 1 .78

4 .66 .17 .28 1 .17 .56 .28 .44 .28 .44 .28 .28 .78 .44

5 .66 .66 .28 1 1 .56 .44 .44 .44 .66 .28 .28 1 .44

6 .44 .44 .28 .44 .44 1 .44 .44 .44 .44 .28 .28 .61 .44

7 .56 .56 .22 .73 .73 .73 1 .56 .83 .56 .39 .39 .73 .83

8 .66 .39 .22 .73 .73 .73 .61 1 .66 .39 0 .39 .73 .66

9 .39 .56 .17 .73 .73 .56 .17 .33 1 .39 .17 .39 .73 .61

10 .83 .73 .44 .56 .33 .56 .61 .61 .61 1 .61 .33 .83 .61

11 .83 .73 .56 .73 .73 .73 .78 1 .83 .56 1 .73 .73 1

12 .83 .73 .44 .73 .73 .73 .78 .78 .61 .83 .61 1 1 .78

13 .66 .39 0 .39 .17 .39 .28 .44 .28 .17 .28 0 1 .28

14 .39 .56 .22 .56 .56 .56 .17 .56 .56 .39 0 .22 .73 1

Table 6.14: Concordance index (rounded to two decimals) for the Choosing a

car problem

weight vector yields (.27, .22, .17, .17, .17) after rounding in such a way that the

normalised weights sum up to 1.00. The weights of the three groups of criteria

are rather balanced; .27 for cost, .39 for performance and .34 for safety. The

concordance matrix c(a, b) computed with these weights is shown in Table 6.14.

At this stage we have to build the concordance relation, a binary relation obtained

through deciding which coalitions in Table 6.14 are strong enough; this is done by

selecting a concordance threshold above which we consider that they are. If we set

the concordance threshold at .60, we obtain a concordance relation with a cycle

passing through all alternatives but one, which is Car 3. This tells us something

about coalitions that we did not know. Previous analysis with equal weights (see

Section 6.4.1) showed that the relation in Table 6.12, obtained through looking at

concordant coalitions involving at least three criteria, had a cycle passing through

all alternatives; with the weights we have now chosen, the lightest coalition

of three criteria involves criteria 3, 4 and 5 and weighs .51; then, in increasing

order, we have three different coalitions weighing .56 (two of the criteria 3, 4, 5

with criterion 2), and three coalitions weighing .61 (two of the criteria 3, 4, 5 with

criterion 1); finally there are three coalitions weighing .66 (one of the three criteria

3, 4, 5 together with criteria 1 and 2). Cutting the concordance index at .60 thus

only keeps the 3-coalitions that contain criterion 1 with the coalitions involving at

least 4 criteria.
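The inventory of coalitions given above can be reproduced by brute-force enumeration. The sketch uses the rounded weights (.27, .22, .17, .17, .17) from the text, written in hundredths so that sums are exact; the function and variable names are ours.

```python
from collections import Counter
from itertools import combinations

# Rounded normalised weights of criteria 1..5, in hundredths.
WEIGHTS = {1: 27, 2: 22, 3: 17, 4: 17, 5: 17}

def coalition_weights(size):
    """Count how many coalitions of the given size have each total weight."""
    return Counter(sum(WEIGHTS[c] for c in combo)
                   for combo in combinations(WEIGHTS, size))

three = coalition_weights(3)
four = coalition_weights(4)

# 3-coalitions that survive a cut at .60
kept = [combo for combo in combinations(WEIGHTS, 3)
        if sum(WEIGHTS[c] for c in combo) >= 60]

print(sorted(three.items()))  # [(51, 1), (56, 3), (61, 3), (66, 3)]
print(min(four))              # 73
print(kept)
```

Every 3-coalition kept at the .60 cut contains criterion 1, and every 4-coalition weighs at least .73, in agreement with the discussion.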

The new thing that we can learn is the following: the relation obtained by

looking at coalitions of at least 4 criteria plus coalitions of three that involve

criterion 1 has a big cycle. When we cut above .62 there is no longer a cycle. The

lightest 4-coalition weighs .73 and there is only one value of the concordance

index between .61 and .73, namely .66. So cutting between .66 and .72 will yield

the relation in Table 6.13, which we have already looked at; a poorer relation

(i.e. with fewer arcs) is obtained when cutting above .73. In the sequel we will

concentrate on two values of the concordance threshold, .60 and .65, which lie
on either side of the borderline separating concordance relations with and without
cycles; above these values, concordance relations tend to become increasingly poor;
below, they are less and less discriminating.

In the above presentation the weights sum up to 1. Note that multiplying

all the weights by a positive number would yield the same concordance relations

provided the concordance threshold is multiplied by the same factor; the weights

in ELECTRE I may be considered as being assessed on a ratio scale, i.e. up to a

positive scaling factor.

Before studying discordance and veto we show how a concordance relation, which

is just an outranking relation without veto, can be used for supporting a choice

or a ranking in a decision process. Introducing vetoes will just remove arcs from

the concordance relation but the operations performed on the outranking relation

during the exploitation phase are exactly those that are applied below to the

concordance relation.

In view of supporting a choice process, the exploitation procedure of ELECTRE

I first consists in reducing the cycles, which amounts to considering all alternatives

in a cycle as equivalent. The kernel of the resulting acyclic relation is then searched

for and it is suggested that the kernel contains all alternatives on which the at-

tention of the decision-maker should be focused. Obviously, reducing the cycles

involves some drawbacks. For example, cutting the concordance index of Table
6.14 at .60 yields a concordance relation with cycles involving all alternatives but
Car 3; there is no simple cycle passing once through all alternatives except Car 3;
an example of a (non-simple) cycle is 1, 7, 9, 5, 10, 11, 12, 2, 14, 13, 1 plus, starting
from 12, 12, 8, 4, 1 and again, 12, 6, 13, 1. Reducing the cycles of this concordance

relation results in considering two classes of equivalent alternatives; one class is

composed of the single Car 3 while the other class comprises all other alternatives.

Beside the fact that this partition is not very discriminating it also considers as

equivalent alternatives that are not in the same simple cycle. Moreover, the infor-

mation on how the alternatives compare with respect to all others is completely

lost; for instance Car 12, which beats almost all other alternatives in the cut at

.60 of the concordance relation, would be considered as equivalent to Car 6 which

beats almost no other car.
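Reducing the cycles amounts to grouping the alternatives that can reach one another along arcs of the relation. The sketch below transcribes Table 6.14 in hundredths (the entry printed as .5 is entered as 50), cuts it at a given threshold and computes these classes with a simple transitive closure; the function names are ours and the closure is just one convenient way of finding the classes.

```python
# Concordance index of Table 6.14, in hundredths (rows and columns: cars 1..14).
C = [
    [100, 50, 17, 33, 33, 56, 61, 33, 61, 33, 33, 33, 33, 61],
    [49, 100, 44, 83, 33, 56, 44, 61, 61, 28, 28, 28, 83, 61],
    [83, 73, 100, 73, 73, 73, 78, 78, 83, 56, 44, 56, 100, 78],
    [66, 17, 28, 100, 17, 56, 28, 44, 28, 44, 28, 28, 78, 44],
    [66, 66, 28, 100, 100, 56, 44, 44, 44, 66, 28, 28, 100, 44],
    [44, 44, 28, 44, 44, 100, 44, 44, 44, 44, 28, 28, 61, 44],
    [56, 56, 22, 73, 73, 73, 100, 56, 83, 56, 39, 39, 73, 83],
    [66, 39, 22, 73, 73, 73, 61, 100, 66, 39, 0, 39, 73, 66],
    [39, 56, 17, 73, 73, 56, 17, 33, 100, 39, 17, 39, 73, 61],
    [83, 73, 44, 56, 33, 56, 61, 61, 61, 100, 61, 33, 83, 61],
    [83, 73, 56, 73, 73, 73, 78, 100, 83, 56, 100, 73, 73, 100],
    [83, 73, 44, 73, 73, 73, 78, 78, 61, 83, 61, 100, 100, 78],
    [66, 39, 0, 39, 17, 39, 28, 44, 28, 17, 28, 0, 100, 28],
    [39, 56, 22, 56, 56, 56, 17, 56, 56, 39, 0, 22, 73, 100],
]

def cut(index, threshold):
    """Arcs i -> j (i != j) whose concordance index reaches the threshold."""
    n = len(index)
    return [[i != j and index[i][j] >= threshold for j in range(n)]
            for i in range(n)]

def equivalence_classes(arcs):
    """Group the cars that lie on a common cycle (mutual reachability)."""
    n = len(arcs)
    reach = [row[:] for row in arcs]
    for k in range(n):                 # Warshall's transitive closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    classes, seen = [], set()
    for i in range(n):
        if i not in seen:
            members = {i} | {j for j in range(n) if reach[i][j] and reach[j][i]}
            seen |= members
            classes.append(sorted(m + 1 for m in members))
    return classes

print(equivalence_classes(cut(C, 60)))       # two classes: all cars but 3, and car 3
print(len(equivalence_classes(cut(C, 65))))  # 14: no cycle left at .65
```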

For illustrative purposes, we consider the cut at level .65 of the concordance

index, which is the largest acyclic concordance relation that can be obtained; this

relation is shown in Table 6.15. Its kernel is composed of cars 3, 10 and 11. Cars 3

and 11 are not outranked and car 10 is the only alternative that is not outranked

either by car 3 or by car 11. This seems to be an interesting set in a choice process,

in view of the analysis of the problem carried out so far.
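The kernel of an acyclic relation can be computed by treating the alternatives in an order compatible with the relation: an alternative enters the kernel exactly when none of the alternatives that outrank it is in the kernel. The sketch below applies this to the rows of Table 6.15; the function name `kernel` and the string encoding of the table are ours.

```python
# Rows of Table 6.15 (cars 1..14): concordance relation at threshold .65.
ROWS = [
    "10000000000000",
    "01010000000010",
    "11111111100011",
    "10010000000010",
    "11011000010010",
    "00000100000000",
    "00011110100011",
    "10011101100011",
    "00011000100010",
    "11000000010010",
    "11011111101111",
    "11011111010111",
    "10000000000010",
    "00000000000011",
]

n = len(ROWS)
# outranked_by[j] = set of cars (other than j) that outrank car j
outranked_by = {j + 1: {i + 1 for i in range(n) if i != j and ROWS[i][j] == "1"}
                for j in range(n)}

def kernel(outranked_by):
    """Kernel of an acyclic relation: a car is decided once all the cars
    that outrank it are decided, and it joins the kernel when none of
    them belongs to the kernel."""
    in_kernel, decided = set(), set()
    while len(decided) < len(outranked_by):
        progressed = False
        for car, preds in outranked_by.items():
            if car not in decided and preds <= decided:
                if not (preds & in_kernel):
                    in_kernel.add(car)
                decided.add(car)
                progressed = True
        if not progressed:
            raise ValueError("the relation has a cycle; reduce the cycles first")
    return in_kernel

print(sorted(kernel(outranked_by)))  # [3, 10, 11]
```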

Rankings of the alternatives may also be obtained from Table 6.15 in a rather

simple manner. For instance, consider the alternatives either in decreasing order

of the number of alternatives they beat in the concordance relation or in increasing

order of the number of alternatives by which they are beaten in the concordance


Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 1 0 0 0 0 0 0 0 0 0 0 0 0 0

2 0 1 0 1 0 0 0 0 0 0 0 0 1 0

3 1 1 1 1 1 1 1 1 1 0 0 0 1 1

4 1 0 0 1 0 0 0 0 0 0 0 0 1 0

5 1 1 0 1 1 0 0 0 0 1 0 0 1 0

6 0 0 0 0 0 1 0 0 0 0 0 0 0 0

7 0 0 0 1 1 1 1 0 1 0 0 0 1 1

8 1 0 0 1 1 1 0 1 1 0 0 0 1 1

9 0 0 0 1 1 0 0 0 1 0 0 0 1 0

10 1 1 0 0 0 0 0 0 0 1 0 0 1 0

11 1 1 0 1 1 1 1 1 1 0 1 1 1 1

12 1 1 0 1 1 1 1 1 0 1 0 1 1 1

13 1 0 0 0 0 0 0 0 0 0 0 0 1 0

14 0 0 0 0 0 0 0 0 0 0 0 0 1 1

Table 6.15: Concordance relation for the Choosing a car problem with weights

.28, .22, .17, .17, .17 and concordance threshold .65

Class   1      2      3     4     5     6          7     8       9
A       11     3, 12  8     7     5     9, 10      2, 4  13, 14  1, 6
        (11)   (10)   (7)   (6)   (5)   (3)        (2)   (1)     (0)
B       3, 11  12     10    7, 8  9     2, 6, 14   5     1, 4    13
        (0)    (1)    (2)   (3)   (4)   (5)        (6)   (8)     (11)

Table 6.16: Rankings obtained from counting how many alternatives each alternative beats (ranking A) or is beaten by (ranking B) in the concordance relation (threshold .65); the numbers between parentheses in the second row of ranking A (resp. ranking B) are the numbers of beaten (resp. beating) alternatives for each alternative of the same column in the first row

Table 6.15 and ranking the alternatives accordingly (we do not count the 1's on
the diagonal since the coalition of criteria saying that an alternative is at least
as good as itself always encompasses all criteria); the corresponding rankings are
respectively labelled A and B in Table 6.16. We observe that the usual group
of good alternatives forms the top two classes of these rankings.
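Rankings A and B are obtained by simple counting on the rows and columns of Table 6.15, as the following sketch shows; the string encoding of the table and the helper `classes` are ours.

```python
from collections import defaultdict

# Rows of Table 6.15 (cars 1..14): concordance relation at threshold .65.
ROWS = [
    "10000000000000",
    "01010000000010",
    "11111111100011",
    "10010000000010",
    "11011000010010",
    "00000100000000",
    "00011110100011",
    "10011101100011",
    "00011000100010",
    "11000000010010",
    "11011111101111",
    "11011111010111",
    "10000000000010",
    "00000000000011",
]

n = len(ROWS)
# Number of other cars beaten by each car (row sums without the diagonal)
beats = {i + 1: sum(ROWS[i][j] == "1" for j in range(n) if j != i)
         for i in range(n)}
# Number of other cars beating each car (column sums without the diagonal)
beaten_by = {j + 1: sum(ROWS[i][j] == "1" for i in range(n) if i != j)
             for j in range(n)}

def classes(score, reverse):
    """Group the cars with equal scores and order the groups by score."""
    groups = defaultdict(list)
    for car, value in score.items():
        groups[value].append(car)
    return [(value, sorted(groups[value]))
            for value in sorted(groups, reverse=reverse)]

ranking_a = classes(beats, reverse=True)       # most beaten cars first
ranking_b = classes(beaten_by, reverse=False)  # least beaten cars first
print(ranking_a)
print(ranking_b)
```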

There are more sophisticated ways of obtaining rankings from outranking re-

lations. ELECTRE II, which we do not describe here, was designed for fulfilling

this goal. To some extent, it makes better use of the information contained in the

concordance index, since the ranking is based on two cuts, one linked with a weak

preference threshold, the other, with a strong preference threshold; for instance

in our case, one could consider that the .60 cut corresponds to weak preference

(or weak outranking) while the .65 cut corresponds to strong preference. In the


above method, the information contained in other cutting levels has been totally

ignored although the rankings obtained from them may not be identical. They

may even differ significantly as can be seen when deriving a ranking from the .60

cut by using the method we applied to the .65 cut.

Thresholding

To this point, both in the Condorcet-like method and the basic ELECTRE I

method (without veto), we treated the assessments of the alternatives as if they

were ordinal data, i.e. we could have obtained exactly the same results (kernel

or ranking) by working with the orders induced from the set of alternatives by

their evaluations on the various criteria. Does this mean that outranking methods

are purely ordinal? Not exactly! More sophisticated outranking methods exploit

information that is richer than purely ordinal but not as demanding as cardinal.

This is done through what we shall call thresholding. Thresholding amounts to
identifying intervals on the criteria scales, which represent the minimal difference
in evaluation above which a particular property holds. For instance, consider that
the assessment of b on criterion i, $g_i(b)$, is given and criterion i is to be maximised;
from which value $g_i(b) + t_i(g_i(b))$ onwards will an alternative a be said to be
preferred to b? Implicitly, we have considered previously that b was preferred to a
on criterion i as soon as $g_i(b) \ge g_i(a)$, i.e. we have considered that $t_i(g_i(b)) = 0$. In
view of imprecision in the assessments, and since it is not clear for all criteria that
there is a marked preference when the difference $|g_i(a) - g_i(b)|$ is small, one may be
led to consider a non-null threshold to model preference. In our case, for instance,
it is not likely that Thierry would really mark a preference between cars 3 and 10
on the Cost criterion since their estimated costs are within 10 € (see Table 6.2).

Thresholding is all the more important that, as mentioned at the end of section

6.4.1, the size of the interval between the evaluations is not taken into account

when deciding that a is overall preferred to b. Hence one should be prudent when

deciding that a criterion is or is not an argument for saying that a is at least as

good as b; therefore, it is reasonable to determine a threshold function ti and say

that criterion i is such an argument as soon as gi (a) gi (b) + ti (gi (b)); since we

examine reasons for saying that a is at least as good as b, not for saying that a is

(strictly) better than b, the function ti should be negatively valued.

Determining such a threshold function is not necessarily an easy task. One

could ask the decision-maker to tell, ideally for each evaluation $g_i(a)$ of each al-

ternative on each criterion, from which value onwards an evaluation should be

considered at least as good as $g_i(a)$. Things may become simpler if the threshold

may be considered constant or proportional to $g_i(a)$ (e.g. $t_i(g_i(a)) = .05 \times g_i(a)$).

Note that constant thresholds could be used when a scale is linear in the sense

that equal differences throughout a scale have the same meaning and consequences

(see end of section 6.2.3); however this is not a necessary condition since some dif-

ferences, but not all, need to be equivalent throughout the scale. In any case,

Definition 6.12 of the concordance index is adapted in a straightforward manner

as follows and the method for building an outranking relation remains unchanged:

\[ c(a, b) = \sum_{i \,:\, g_i(a) \ge g_i(b) + t_i(g_i(b))} p_i \tag{6.14} \]
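
As a minimal sketch of formula (6.14), the following Python function sums the weights of the criteria on which a is at least as good as b once a threshold is allowed. The function name, the data layout (one entry per criterion, all criteria to be maximised) and all numerical values are illustrative assumptions; they are not taken from the case study.

```python
from typing import Callable, Sequence

def concordance(ga: Sequence[float], gb: Sequence[float],
                weights: Sequence[float],
                thresholds: Sequence[Callable[[float], float]]) -> float:
    """Sum of the weights p_i over criteria with g_i(a) >= g_i(b) + t_i(g_i(b)).

    Assumes every criterion is to be maximised and that each threshold
    function t_i returns a non-positive number.
    """
    return sum(p for x, y, p, t in zip(ga, gb, weights, thresholds)
               if x >= y + t(y))

# Hypothetical data: three criteria whose weights sum to 1.
weights = [0.5, 0.3, 0.2]

def no_threshold(y: float) -> float:
    return 0.0            # purely ordinal comparison, t_i = 0

def constant(y: float) -> float:
    return -1.0           # constant threshold

def proportional(y: float) -> float:
    return -0.05 * y      # threshold proportional to the evaluation

ga = [10.0, 96.0, 7.0]
gb = [10.5, 100.0, 8.0]

c_strict = concordance(ga, gb, weights, [no_threshold] * 3)
c_thresh = concordance(ga, gb, weights, [constant, proportional, no_threshold])
```

With these made-up evaluations no criterion supports a without thresholds, whereas the first two criteria join the concordant coalition once the thresholds are introduced.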

Note that preference thresholds, that lead to indifference zones, are used in

a variant of the ELECTRE I method called ELECTRE IS (see Roy and Skalka

(1984) or Roy and Bouyssou (1993)).

Thresholding is a key tool in the original outranking methods; it allows one

to bypass the necessity of transforming the original evaluations to obtain linear

scales. There is another occasion for invoking thresholds, which is in the analysis

of discordance.

Remember that the principle of the outranking methods consists in examining the

validity of the proposition "a outranks b"; the concordance index measures the

arguments in favour of saying so, but there may be arguments strongly against that

assertion (discordant criteria). These discordant voices can be viewed as vetoes;

there is a veto against declaring that a outranks b if b is so much better than

a on some criterion that it becomes disputable or even meaningless to claim

that a might be better overall than b. Let us emphasise that the effect of a veto

is quite radical, just like in the voting context. If a veto threshold is passed on

a criterion when comparing two alternatives, then the alternative against which

there is a veto, say a, may not outrank the other one, say b; this may result in

incomparabilities in the outranking relation if in addition b does not outrank a,

either because the coalition of criteria stating that b is at least as good as a is not

strong enough or because there is also a veto of a against b on another criterion.

To be more precise, a veto threshold on criterion i is in general a function $v_i$

encoding a difference in evaluations so big that it would be out of the question to

say that a outranks b if

\[ g_i(a) \ge g_i(b) + v_i(g_i(b)) \]

when criterion i is to be minimised, or

\[ g_i(a) \le g_i(b) - v_i(g_i(b)) \]

when criterion i is to be maximised. Of course it may be the case that the function

$v_i$ is a constant.

In our case, in view of Thierry's particular interest in sporty cars, the criterion

most likely to yield a veto is acceleration. Although there was no precise indica-

tion on setting vetoes in Thierry's preliminary analysis (section 6.1.2), one might

speculate that on the acceleration criterion, pairs such as (28, 29.6), (28.3, 30),

(29, 30.4), (29, 30.7) (all evaluations expressed in seconds) and all intervals wider

than those listed, lead to a veto (against claiming that the alternative with the

higher evaluation could be preferred to the other one, since here, the criterion is

to be minimised). If this seems reasonable, then we are not far from

accepting a constant veto threshold of about 1.5 or 1.6 seconds. If we decide that

there is a veto with a constant threshold on the acceleration criterion for differ-

ences of at least 1.5 seconds, it means that a car that accelerates from 0 to 100

km/h in 29.6 seconds (as is the case of Peugeot 309 GTI) could not conceivably

outrank a car which does it in 28 (as Honda Civic does) whatever the evaluations

on the other criteria might be. Of course, setting the veto threshold to 1.5 implies

that a car needing 30.4 seconds (like Mazda 323) may not outrank a car that

accelerates in 28.9 (like Opel Astra or Renault 21) but might very well outrank

a car that accelerates in 29 (like Nissan Sunny) if the performances on the other

criteria are superior. Using 1.5 as a veto threshold thus implies that differences

of at least 1.5 from 28 to 29.6 or from 28.9 to 30.4 have the same consequences

in terms of preference. Setting the value of the veto threshold obviously involves

some degree of arbitrariness; why not set the threshold at 1.4 seconds, which would

imply that Mazda 323 may not outrank Nissan Sunny? In such cases, it must be

verified whether small variations around the chosen value of a parameter (such as

a veto threshold) do not influence the conclusions in a dramatic manner; if small

variations do have a strong influence, detailed investigation is needed in order to

decide which setting of the parameter's value is most appropriate. A related facet

of using thresholds is that growing differences that are initially not significant

abruptly crystallise into significant ones as soon as a crisp threshold is passed; ob-

viously methods using thresholds may show discontinuities in their consequences

and that is why sensitivity analysis is even more crucial here than with more clas-

sical methods. However, the underlying logic is quite similar to that on which

statistical tests are based; here as well, conventional levels of significance (like the

famous 5% rejection intervals) are widely used to decide whether a hypothesis

must be rejected or not. We will allude in the next section to more gradual

methods that can be designed on the basis of concordance-discordance principles

similar to those outlined above.
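
The sensitivity check discussed above can be sketched in a few lines of Python. The acceleration values are those quoted in the text; the function names and the encoding in tenths of a second (chosen so that comparisons at the threshold are exact) are assumptions made for this illustration only.

```python
# Acceleration evaluations quoted in the text, in tenths of a second
# (integers avoid floating-point rounding exactly at the threshold).
ACCELERATION = {
    "Honda Civic": 280,
    "Opel Astra": 289,
    "Renault 21": 289,
    "Nissan Sunny": 290,
    "Peugeot 309 GTI": 296,
    "Mazda 323": 304,
}

def veto(a: str, b: str, threshold: int) -> bool:
    """True if there is a veto against 'a outranks b' on acceleration.

    The criterion is to be minimised, so the veto applies when a is slower
    than b by at least `threshold` tenths of a second.
    """
    return ACCELERATION[a] - ACCELERATION[b] >= threshold

def vetoed_pairs(threshold: int) -> set:
    """All ordered pairs (a, b) such that a may not outrank b."""
    return {(a, b) for a in ACCELERATION for b in ACCELERATION
            if a != b and veto(a, b, threshold)}

# Which vetoes appear when the threshold is lowered from 1.5 s to 1.4 s?
added = vetoed_pairs(14) - vetoed_pairs(15)
```

On these six cars, lowering the threshold from 1.5 to 1.4 seconds adds a single veto, the one preventing Mazda 323 from outranking Nissan Sunny; whether such a change matters for the final conclusions is what a sensitivity analysis has to establish.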

In order not to be too long we do not develop the consequences of introducing

veto thresholds in our example. It suffices to say that the outranking relation, its

kernel and the derived rankings are not dramatically modified in the present case.

6.4.4 Main features and problems of elementary outranking approaches

The ideas behind the methods analysed above may be summarised as follows. For

each pair of alternatives (a, b) it is determined whether a outranks b by comparing

their evaluations $g_i(a)$ and $g_i(b)$ on each point of view i. The pairs of evaluations

are compared to intervals that can be viewed as typical of classes of ordered pairs of

evaluations on each criterion (for instance the classes "indifference", "preference"

and "veto"). On the basis of the list of classes to which it belongs for each criterion

(its "profile"), the pair (a, b) is declared to be or not to be in the outranking

relation.

Note that

• a credibility index of outranking (for instance "weak" and "strong" outrank-

ing) may be defined; to each value of the index corresponds a set of profiles;

if the profile of the pair (a, b) is one of those associated with a particular

value of credibility of outranking, then the outranking of b by a is assigned

this value of the credibility index; there are of course rationality requirements

for the sets of profiles associated with the various values of the credibility

index; this credibility index is to be interpreted in logical terms; it models

the degree to which it is true that there are enough arguments in favour of

saying that a is better than b while there is no strong reason to refute this

statement (see the definition of outranking in Section 6.4.2);

• thresholds may be used to determine the classes of differences of preference

on each criterion, provided differences $g_i(a) - g_i(b)$ equal to such thresh-

olds have the same meaning independently of their location on the scale of

criterion i (linearity property);

• the rules for determining whether a outranks b (possibly to some degree of

a credibility index) generally involve weights that describe the relative impor-

tance of the criteria; these weights are typically used additively to measure

the importance of coalitions of criteria independently of the evaluations of

the alternatives.
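
The construction summarised above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: all criteria are maximised, thresholds are constant, the class labels ("concordant", "discordant", "veto") and the function names are hypothetical, and weights are given as integer percentages.

```python
def pair_profile(ga, gb, indiff, veto):
    """Classify each criterion for the ordered pair (a, b).

    'veto'        if b beats a by at least veto[i];
    'concordant'  if a is at least as good as b up to the threshold indiff[i];
    'discordant'  otherwise.
    """
    profile = []
    for x, y, q, v in zip(ga, gb, indiff, veto):
        if y - x >= v:
            profile.append("veto")
        elif x >= y - q:
            profile.append("concordant")
        else:
            profile.append("discordant")
    return profile

def outranks(ga, gb, weights, indiff, veto, cut):
    """a outranks b if no criterion vetoes and the concordant coalition
    weighs at least `cut` (weights are used additively)."""
    profile = pair_profile(ga, gb, indiff, veto)
    if "veto" in profile:
        return False
    support = sum(w for w, cls in zip(weights, profile) if cls == "concordant")
    return support >= cut

# Hypothetical evaluations on three criteria.
weights, indiff, veto_thr, cut = [40, 30, 30], [1, 1, 1], [5, 5, 5], 65
a, b, c = [10, 10, 10], [9, 12, 10.5], [10, 16, 10.5]

profile_ab = pair_profile(a, b, indiff, veto_thr)
a_outranks_b = outranks(a, b, weights, indiff, veto_thr, cut)
a_outranks_c = outranks(a, c, weights, indiff, veto_thr, cut)
```

Note that the decision for a pair depends only on the profile of that pair and on the weights of the concordant criteria, never on the size of the differences beyond the thresholds.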

The result of the construction, i.e. the outranking relation (possibly qualified

with a degree of a credibility index), is then exploited with a view to a specific type of

decision problem (choice, ranking, . . . ). It is supposed to include all the relevant

and sure information about preference that could be extracted from the data and

the questions answered by the decision-maker.

Since outranking relations may lack transitivity and acyclicity, procedures are needed to derive

a ranking or a choice set from the outranking relation. In the process of deriving

a complete ranking from the outranking relation, the property of independence

of irrelevant alternatives (see Chapter 2 where this property is evoked) is lost;

this property was satisfied in the construction of the outranking relation since

outranking is decided by looking in turn at the profiles of each pair of alternatives,

independently of the rest. Since this is a hypothesis of Arrow's theorem and it is

violated, the conclusion of the theorem is not necessarily valid and one may hope

that there is no criterion playing the role of dictator.
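
The possible lack of acyclicity can be illustrated with the classical Condorcet profile. In the sketch below, the three criteria, their equal weights and the two-thirds cutting level are hypothetical choices made for the illustration.

```python
from itertools import permutations

# Three criteria of equal weight, each ranking three alternatives (best first).
rankings = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
weights = [1, 1, 1]

def concordance(x: str, y: str) -> int:
    """Total weight of the criteria ranking x at least as high as y."""
    return sum(w for r, w in zip(rankings, weights) if r.index(x) <= r.index(y))

# Cut the concordance index at a two-thirds majority of the total weight.
relation = {(x, y) for x, y in permutations("abc", 2)
            if 3 * concordance(x, y) >= 2 * sum(weights)}
```

Here a outranks b, b outranks c and c outranks a, so no ranking can be read directly from the relation; an exploitation procedure is needed.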

The various procedures that have been proposed for exploiting the outrank-

ing relation (for instance transforming it into a complete ranking) are not above

criticism; it is especially difficult to justify them rigorously since they operate on

an object that has been constructed, the outranking relation. Since the decision-

maker has no direct intuition of this object, one can hardly expect to get reliable

answers when questioning him about the properties of this relation. On the other

hand, a direct characterisation of the ranking produced by the exploitation of an

outranking relation seems out of reach.

Non-compensation

The weights count entirely or not at all in the comparison of two alternatives; the

smaller or larger difference in evaluations between alternatives does not matter

once a certain threshold is passed. This fact, which was discussed in the second

of outranking methods. A large difference in favour, say, of a over b on some

criterion is of no use to compensate for small differences in favour of b on many

criteria since all that counts for deciding that a outranks b is the list of criteria in

favour of a. Vetoes only have a "negative" action, preventing outranking from being

declared. The reader interested in the non-compensation property is referred to

Fishburn (1976), Bouyssou and Vansnick (1986), Bouyssou (1986).

For some pairs (a, b) it may be the case that neither a outranks b nor the opposite;

this can occur not only because of the activation of a veto but also because

the credibility of the outranking of a by b and that of b by a are not sufficiently

high. In such a case a and b are said to be incomparable. This may be interpreted in

two different ways. One may advance that some alternatives are too contrasted to

be compared. It has been argued, for instance, that comparing a Rolls Royce with

a small and cheap car proves impossible because the Rolls Royce is incomparably

better on many criteria but is also incomparably more expensive. Another example

concerns the comparison of projects that involve the risk of loss of human life;

should one prefer a more expensive project with a lower risk or a less expensive one

with higher risk (see Chapter 5, Section 5.3.3, for evaluations of the cost of human

losses in various countries)? Other people support the idea that incomparability

results from insufficient information; the available information sometimes does not

allow one to make up one's mind on whether a is preferred to b or the converse.

In any case, incomparability should not be equated with indifference. Indiffer-

ence occurs when alternatives are considered as almost equivalent; incomparability

is more concerned with very contrasted alternatives. The treatment of the two cat-

egories is quite different in the exploitation phase; indifferent alternatives should

appear in the same class of a ranking or in neighbouring ones, while incomparable

alternatives may be ranked in classes quite far apart.
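
The distinction between indifference and incomparability can be made explicit with a small sketch. It relies on the usual convention in outranking methods that two alternatives outranking each other are treated as indifferent; the relation and the labels below are hypothetical.

```python
def situation(relation: set, a: str, b: str) -> str:
    """Describe the pair (a, b) given a crisp outranking relation."""
    a_over_b = (a, b) in relation
    b_over_a = (b, a) in relation
    if a_over_b and b_over_a:
        return "indifference"
    if a_over_b:
        return "a preferred to b"
    if b_over_a:
        return "b preferred to a"
    return "incomparability"

# Hypothetical outranking relation on three alternatives x, y, z.
relation = {("x", "y"), ("y", "x"), ("x", "z")}

summary = {pair: situation(relation, *pair)
           for pair in [("x", "y"), ("x", "z"), ("y", "z")]}
```

In this example y and z are incomparable although each of them is comparable to x, which is why they may end up far apart in a ranking derived from the relation.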

6.4.5 Advanced outranking methods: from thresholding towards valued relations

Looking at the variants of the ELECTRE method suggests that there is a general

pattern on which they are all built:

• alternatives are considered in pairs and eventually, outranking is determined

on the basis of the profiles of performance of the pair only;

• the differences between the evaluations of a pair of alternatives for each cri-

terion are categorised in discrete classes delimited by thresholds (preference,

veto, . . . );

• rules are invoked to decide which combinations of these classes lead to out-

ranking; more generally, there are several grades of outranking (weak, strong

in ELECTRE II, . . . ) and rules associate specific combinations of classes to

each grade;

in view of supporting the decision process.

Defining the classes through thresholding raises the problem of discontinuity

alluded to in the previous section. It is thus appealing to work with continuous

classes of differences of preference for each criterion, i.e. directly with valued re-

lations. A value $c_j(a, b)$ on arc (a, b) models the degree to which alternative a is

preferred to alternative b on criterion j. These degrees are often interpreted in

logical fashion as a degree of credibility of the preference. Then each combination

of values of the credibility index on the various criteria may be assigned an overall

value of the credibility index for outranking; the outranking relation is also valued

in such a context.

Dealing with valued relations and especially combining values raises a ques-

tion: which operations may be meaningfully (or just reasonably) performed on

them? Our analysis of the weighted sum in section 6.2 has taught us that opera-

tions that may appear natural rely on strong assumptions that suppose very

detailed information on the preferences.

Consider the following formula which is used in ELECTRE III, a method

leading to a valued outranking relation (see Roy and Bouyssou (1993) or Vincke

(1992)), to compute the overall degree of credibility S(a, b) of the outranking of b

by a.

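\[
S(a, b) =
\begin{cases}
c(a, b) & \text{if } D_j(a, b) \le c(a, b) \ \ \forall j, \\[4pt]
c(a, b) \displaystyle\prod_{j \,:\, D_j(a, b) > c(a, b)} \frac{1 - D_j(a, b)}{1 - c(a, b)} & \text{otherwise.}
\end{cases}
\]

We do not enter into the detail of how $c(a, b)$ or $D_j(a, b)$ can be computed; just remember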

that they are valued between 0 and 1.
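
A direct transcription of this formula into Python may help in exploring how it reacts to its inputs. The function name and the sample values are assumptions; the concordance index c and the discordance indices are simply taken as given numbers in [0, 1].

```python
from typing import Iterable

def credibility(c: float, discordances: Iterable[float]) -> float:
    """Degree of credibility S(a, b) computed from the formula above.

    Only discordance indices exceeding the concordance index c reduce it,
    each by the factor (1 - D_j) / (1 - c). When c equals 1 no index can
    exceed it, so the division by 1 - c is never evaluated with c = 1.
    """
    result = c
    for d in discordances:
        if d > c:
            result *= (1 - d) / (1 - c)
    return result

# Hypothetical values illustrating the behaviour of the formula.
unchanged = credibility(0.7, [0.2, 0.5])   # no discordance above c
reduced = credibility(0.7, [0.8])          # one discordance above c
more_reduced = credibility(0.7, [0.9])     # a stronger discordance
cancelled = credibility(0.7, [1.0])        # maximal discordance
```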

The justification of such a formula is mainly heuristic in the sense that the re-

sponse of the formula to the variation of some inputs is not counter-intuitive: when

discordance rises, outranking decreases; the converse holds with concordance; when dis-

cordance is maximal there may not be any degree of outranking at all. This does

not mean that the formula is fully justified. Other formulae might have been

chosen with similarly good heuristic behaviour. The weighted sum also has good

heuristic properties at first glance, but deeper investigation shows that the val-

ues it yields cannot be trusted as a valid representation of the preferences unless

additional information is requested from the decision-maker and used to re-code

the original evaluations $g_j$. The formula above involves operations such as mul-

tiplication and division that suppose that concordance and discordance indices

are plainly cardinal numbers and not simply labels of ordered categories. This is

indeed a strong assumption that does not seem to us to be supported by the rest

of the approach, in particular by the manner in which the indices are elaborated;

in the elementary outranking methods (ELECTRE I and II) much care was taken,

for instance, to avoid performing arithmetical operations on the evaluations $g_i(a)$;

only cuts of the concordance index were considered (which is typically an opera-

tion valid for ordinal data); vetoes were used in a very radical fashion. No special

attention, comparable to what was needed to build value functions from the eval-

uations, was paid to building concordance and discordance indices; in particular,

nothing guarantees that these indices can be combined by means of arithmetic

operations and produce an overall index S representative of a degree of credibility

of an outranking. For instance, consider the following two cases which lead to an

outranking degree of .4:

• the concordance index c(a, b) is equal to .40 and there is no discordance (i.e.

$D_j(a, b) = 0$ for all j);

• the concordant coalition weighs .80 but there is a strong discordance on

criterion 1; $D_1(a, b) = .90$ while $D_j(a, b) = 0$ for all $j \ne 1$.

For both, the formula yields a degree of outranking of .40. Obviously another

formula with similar heuristic behaviour might have resulted in quite different

outputs. Consider for instance the following:

\[ S(a, b) = \min\Bigl\{ c(a, b),\ \min_j \bigl( 1 - D_j(a, b) \bigr) \Bigr\}. \]

In the first case, it yields an outranking degree of .40 as well but in the second

case, the degree falls to .10. It is likely that in some circumstances a decision-maker

might find the latter model more appropriate. Note also that the latter formula

does not involve arithmetic operations on c(a, b) and the $1 - D_j(a, b)$'s but only

ordinal operations, namely taking the minimum. This means that transforming

c(a, b) and the $1 - D_j(a, b)$'s by an increasing transformation of the [0, 1] interval

would just amount to transforming the original value of S(a, b) by the same trans-

formation. This is not the case with the former formula. Hence, if the information

content of the c(a, b) and the $1 - D_j(a, b)$'s just consists in the ordering of their

values in the [0, 1] interval, then the former formula is not suitable. For a survey

of possible ways of aggregating preferences into a valued relation, the reader is

referred to chapters 2 and 3 of the book edited by Slowinski (1998).
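
The comparison of the two formulae on the two cases above, and the remark on increasing transformations, can be checked numerically. In this sketch both formulae take as inputs c(a, b) and the values $1 - D_j(a, b)$ (called `ns` below); this parametrisation, the function names and the choice of squaring as the increasing transformation are assumptions made for the illustration.

```python
def s_product(c, ns):
    """Product formula, written in terms of n_j = 1 - D_j."""
    s = c
    for n in ns:
        if 1 - n > c:            # equivalent to D_j > c
            s *= n / (1 - c)
    return s

def s_min(c, ns):
    """Purely ordinal alternative: minimum of c and the n_j."""
    return min([c, *ns])

def phi(x):
    """An increasing transformation of the [0, 1] interval."""
    return x * x

case1 = (0.40, [1.0, 1.0])       # c = .40, no discordance
case2 = (0.80, [0.10, 1.0])      # c = .80, D_1 = .90

def transformed(formula, case):
    """Apply the formula after recoding all its inputs with phi."""
    c, ns = case
    return formula(phi(c), [phi(n) for n in ns])

# The minimum commutes with phi; the product formula does not.
min_commutes = transformed(s_min, case2) == phi(s_min(*case2))
product_gap = abs(transformed(s_product, case2) - phi(s_product(*case2)))
```

On the second case, recoding the inputs by the increasing transformation and then applying the product formula gives a value clearly different from recoding its output, whereas the two operations coincide for the minimum.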

The fact that the value obtained for the outranking degree may involve some de-

gree of arbitrariness did not escape Roy and Bouyssou (1993) who explain (p.417)

that the value of the degree of outranking obtained by a formula like the above

should be handled with care; they advocate that thresholds be used when com-

paring two such values: the outranking of b by a can be considered to be more

credible than the outranking of d by c only if S(a, b) is significantly larger than

S(c, d). We agree with this statement but unfortunately it seems quite difficult to

assign a value to a threshold above which the difference $S(a, b) - S(c, d)$ could be

claimed as significant.

There are thus two directions that can be followed for taking the objections to

the formula of ELECTRE III into account. In the first option, one considers that

the meaning of the concordance and discordance degrees is ordinal and one tries to

determine a family of aggregation formulae that fulfil basic requirements including

compatibility with the ordinal character of concordance and discordance. The

other option consists in revising the way concordance and discordance indices are

constructed in order to have a quantitative meaning that allows arithmetic

operations to be used for aggregating them. That is, at least tentatively, the option followed

in the PROMETHEE methods (see Brans and Vincke (1985) or Vincke (1992));

these methods may be interpreted as aiming towards building a value function

on the pairs of alternatives; this function would represent the overall difference in

preference between any two alternatives. The way that this function is constructed

in practice however, leaves the door open to remarks analogous to those addressed

to the weighted sum in Section 6.2.

6.5 General conclusion

This long chapter has enabled us to travel through the continent of formal methods

of decision analysis; by formal we mean those methods relying on an explicit

mathematical model of the decision-maker's preferences. We neither looked into

all methods nor did we explore those we looked into completely. There are other

continents that have been almost completely ignored, in particular all the methods

that do not rely on a formal modelling of the preferences (see for instance the

book edited by Rosenhead (1989) in which various approaches are presented for

structuring problems with a view to facilitating decision making).

On the particular topic of multi-attribute decision analysis, we may summarise

our main conclusions as follows:

Numbers do not always mean what they seem to. It makes no sense to ma-

nipulate raw evaluations without taking the context into account. Numbers

may have an ordinal meaning, in which case it cannot be recommended to

perform arithmetic operations on them; they may be evaluations on an inter-

val scale or a ratio scale and there are appropriate transformations that are

allowed for each type of scale. We have also suggested that the significance

of a number may be intermediate between ordinal and cardinal; in that case,

the interval separating two evaluations might be given an interpretation: one

might take into consideration the fact that intervals are e.g. large, medium

or small. Evaluations may also be imprecise and knowing that should influ-

ence the way they will be handled. Preference modelling is specifically the

activity that deals with the meaning of the data in a decision context.

Preference modelling does not only take objective information linked with

the evaluations or with the data, such as the type of scale or the degree

of precision or the degree of certainty into account. It also incorporates

subjective information in relation to the preferences of the decision maker.

Even if numeric evaluations actually mean what they seem to, their signifi-

cance is not immediately in terms of preferences: the interval separating two

evaluations must be reinterpreted in terms of difference in preferences.

The (vague) notion of importance of the criteria and its implementation are

strongly model-dependent. Weights and trade-offs should not be elicited in

the same manner depending on the type of model since e.g. they may or may

not depend on the scaling of the criteria.

There are various types of models that can be used in a decision process.

There is no best model; all have their strong points and their weak points.

result of an evaluation, in a given decision situation, of the chances of being

able to elicit the parameters of the corresponding model in a reliable manner;

these chances obviously depend on several factors including the type and

precision of the available data, the way of thinking of the decision-maker,

his knowledge of the problem. Another factor that should be considered for

choosing a model, is the type of information that is wanted as output: the

decision maker needs different information when he has to rank alternatives

than when he has to choose among alternatives or when he has to assign them

to predefined (ordered) categories (we put the latter problem aside in our

discussion of the car choosing case). So, in our view, the ideal decision

analyst should master several methodologies for building a model. Notice

that additional dimensions make the choice and the construction of a model

in group decision making even more difficult; the dynamics of such decision

processes are far more complex, involving conflicts and negotiation aspects;

constructing complete formal models in such contexts is not always possible,

but it remains that using problem structuring tools (such as cognitive maps)

may prove profitable.

output may be discordant or even contradictory. We have encountered such

a situation several times in the above study; cars may be ranked in different

positions according to the method that is used. This does not puzzle us too

much. First of all, because the observed differences appear more as vari-

ants than as contradictions; the various outputs are remarkably consistent

and the variants can be explained to some extent. Second, the approaches

use different concepts and the questions the decision maker has to answer

are accordingly expressed in different languages; this of course induces vari-

ability. This is no wonder since the information that decision analysis aims

at capturing cannot usually be precisely measured. It is sufficient to recall

that experiments have shown that there is much variability in the answers

of subjects submitted to the same questions at time intervals. Does this

mean that all methods are acceptable? Not at all. There are several criteria

of validity. One is that the method has to be accepted in a particular de-

cision situation; this means that the questions put to the decision-maker

must make sense to him and he should not be asked for information he is

unable to provide in a reliable manner. There are also internal and external

consistency criteria that a method should fulfil. Internal consistency implies

making explicit the hypotheses under which data form an acceptable input

for a method; then the method should perform operations on the input that

are compatible with the supposed properties of the input; this in turn induces

an output which enjoys particular properties. External consistency consists

in checking whether the available information matches the requirements of

acceptable inputs and whether the output may help in the decision process.

The main goal of the above study was to illustrate the issue of internal and

external validity on a few methods in a specific simple problem.
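
The first conclusion above, that arithmetic should not be performed on numbers carrying only ordinal meaning, can be illustrated by a small sketch. The two alternatives, their evaluations, the equal weights and the recoding are all hypothetical; the recoding is strictly increasing, so it preserves the order of the evaluations on each criterion.

```python
def weighted_sum(evaluations, weights):
    """Plain weighted sum of the evaluations of one alternative."""
    return sum(w * g for w, g in zip(weights, evaluations))

weights = [1, 1]
a = [1, 5]          # evaluations of a on two ordinal criteria
b = [3, 4]          # evaluations of b on the same criteria

# A strictly increasing recoding of the ordinal levels.
recode = {1: 1, 3: 2, 4: 3, 5: 10}
a_recoded = [recode[g] for g in a]
b_recoded = [recode[g] for g in b]

b_wins_before = weighted_sum(b, weights) > weighted_sum(a, weights)
a_wins_after = weighted_sum(a_recoded, weights) > weighted_sum(b_recoded, weights)
```

Both codings convey exactly the same ordinal information, yet the weighted sum ranks b first with the original numbers and a first with the recoded ones.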

Besides the above points that are specific to multiple criteria preference models,

more general lessons can also be drawn.

If we consider our trip from the weighted sum to the additive multi-attribute

value model in retrospect, we see that much self-confidence and therefrom

much convincing power can be gained by eliciting conditions under which

an approach such as the weighted sum would be legitimate. The analysis

is worth the effort because precise concepts (like trade-offs and values) are

sculptured through analysis that also results in methods for eliciting the

parameters of the model. Another advantage of theory is to provide us

with limits, i.e. conditions under which a model is valid and a method is

applicable. From this viewpoint and although the outranking methods have

not been fully characterised, it is worth noticing that their study has recently

made theoretical progress (see e.g. Arrow and Raynaud (1986), Bouyssou

and Perny (1992), Vincke (1992), Fodor and Roubens (1994), Tsoukias and

Vincke (1995), Bouyssou (1996), Marchant (1996), Bouyssou and Pirlot

(1997), Pirlot (1997)).

An advantage of formal models that cannot be overemphasised is that

they favour communication. In the course of the decision process, the con-

struction of the model requires that pieces of information, knowledge and

priorities that are usually implicit or hidden, be brought to light and taken

into account; also, the choice of the model reflects the type of available in-

formation (more or less certain, precise, quantitative). The result is often

a synthesis of what is known and what has been learnt about the decision

problem in the process of elaborating the model. The fact that a model is

formal also allows for some sort of calculations; in particular, testing to what

extent the conclusions are stable when the evaluations of imprecise data are

varied is possible within formal models. Once a decision has been made, the

model does not lose its utility. It can provide grounds for arguing in favour

or against a decision. It can be adapted to make subsequent decisions in similar

contexts.

The decisiveness of the output depends on the richness of the infor-

mation available. If the knowledge is uncertain, imprecise or simply non-

quantitative in nature, it may be difficult to build a very strong model;

by "strong", we mean a model that clearly suggests a decision as, for in-

stance, those that produce a ranking of the alternatives. Other models

(and especially those based on pairwise comparisons of alternatives and ver-

ifying the independence of irrelevant alternatives property) are not able,

structurally, to produce a ranking; they may nevertheless be the best possi-

ble synthesis of the relevant information in particular decision situations. In

any case, even if the model leads to a ranking, the decision is to be taken by

the decision-maker and it is not in general an automatic consequence of the

model (due for instance to imprecision in the data, which calls for a relativi-

sation of the model's prescription). As will be illustrated in greater detail in

Chapter 9, the construction of a model is not all of the decision process.

7 DECIDING AUTOMATICALLY: THE EXAMPLE OF RULE BASED CONTROL

7.1 Introduction

The increasing development of automatic systems in most sectors of human ac-

tivities (e.g. manufacturing, management, medicine, etc.) has progressively led

to involving computers in many tasks traditionally reserved for humans, even the

more strategic ones such as control, evaluation and decision-making. The main

function of automatic decision systems is to act as a substitute for humans (deci-

sion makers, experts) in the execution of repetitive decision tasks. Such systems

can be in charge of all or part of the decision process. The main tasks to be per-

formed by automatic decision systems are collecting information (e.g. by sensors),

making a diagnosis of the current situation, selecting relevant actions, executing

and controlling these actions. Automatisation of these tasks requires the elabo-

ration of computational models able to simulate human reasoning. Such models

are, in many respects, comparable to those involved in the scientific preparation

of human decisions. Indeed, deciding automatically is also a matter of representa-

tion, evaluation and comparison. For this reason, we introduce and discuss some

very simple techniques used to design rule-based decision/control systems. This is

one more opportunity for us to address some important issues linked to descrip-

tive, normative and constructive aspects of mathematical modelling for decision

support:

extent, to be able to predict, simulate and extrapolate human reasoning and

decision-making in an autonomous way. This requires different tasks such

as the collection of human expertise, the representation of knowledge, the

extraction of rules and the modelling of preferences. For all these activities,

the choice of appropriate formal models, symbolical as well as numerical, is

crucial in order to describe situations and process information.

fixed and well formalised body of knowledge that could be exploited by the

analyst responsible for the implementation of a decision system. Valuable

information can be obtained from human experts, but this expertise is often

148 CHAPTER 7. DECIDING AUTOMATICALLY

formal model handling the core of human skill in decision-making must be

constructed by the analyst, in close cooperation with experts. They must

decide together what type of input should be used, what type of output is

needed, and what type of consideration should play a role in linking output

to input. One must also decide how to link subjective symbolic information

(close to the language of the expert) and objective numeric data that can be

accessible to the system.

an exhaustive list of situations with their adequate solution. Usually, this

type of information is given only for a sample of typical situations, which

implies that only a partial model can be constructed. To be fully efficient,

this model must be completed with some general principles and rules used

by the expert. In order to extrapolate examples as well as expert decision

rules in a reasonable way, there is a need for normative principles putting

constraints on inference so as to decide what can seriously be inferred by the

system from any new input. Hence, the analysis of the formal properties of

our model is crucial for the validation of the system.

These three points show how the use of formal models and the analysis of the

mathematical properties of the models are crucial in automatic decision-making.

In this respect, the modelling exercise discussed here is comparable to those treated

in the previous chapters, concerning human decision-making, but includes spe-

cial features due to the automatisation (stable pre-existing knowledge and pref-

erences, real-time decision-making, closed system completely autonomous, etc.).

We present a critical introduction to the use of simple formal tools such as fuzzy

sets and rule-based systems to model human knowledge and decision rules. We also

make explicit multiple criteria aggregation problems arising in the implementation

of these rules and discuss some important issues linked to rule aggregation.

For the sake of illustration, we consider two types of automatic decision systems

in this chapter:

decision systems based on explicit decision rules: such systems are used in

practical situations where the decision-maker or the expert is able to make

explicit the principles and rules he uses to make a decision. It is also assumed

that these rules constitute a consistent body of knowledge, sufficiently ex-

haustive to reproduce, predict and explain human decisions. Such systems

are illustrated in section 7.2 where the control of an automatic watering sys-

tem is discussed, and in section 7.4 where a decision problem in the context

of the automatic control of a food process is briefly presented. In the first

case, the decision problem concerns the choice of an appropriate duration for

watering, whereas in the second case, it concerns the determination of oven

settings aimed at preserving the quality of biscuits.

decision systems based on implicit decision rules: such systems are used in

practical applications for which it is not possible to obtain explicit decision

7.2. A SYSTEM WITH EXPLICIT DECISION RULES 149

rules. This is very frequent in practice. The main possible reasons for it are

the following:

information to construct decision rules, or his expertise is too complex

to be simply representable by a consistent set of decision rules,

the decision-maker or the expert is able to provide a set of decision rules,

but these decision rules are not easily expressible using variables that

can be observed by the system. A typical example of such a situation

occurs in the domain of subjective evaluation (see Grabisch, Guely and

Perny 1997) where the quality of a product is defined on the basis of

human perception.

the decision-maker or the expert does not want to reveal his own strat-

egy for making decisions. This can be due to the existence of strategic

or confidential information that cannot be revealed or alternatively be-

cause this expertise represents his only competence making him indis-

pensable to his organisation.

Such systems are illustrated in section 7.3, also in the context of the auto-

matic control of food processes. We will use the problem of controlling the

biscuit quality during baking as an illustrative case where numerical deci-

sion models based on pattern matching procedures can be used to perform

a diagnosis of dysfunction and a regulation of the oven, without any explicit

rule.

Automatising human decision-making is often a difficult task because of the com-

plexity of the information involved in human reasoning. In some cases, however,

the decision-making process is repetitive and well-known so that automatisation

becomes feasible. In this section, we would like to consider an interesting sub-

class of easy problems where human decisions can be explained by a small set

of decision rules of type:

if X is A and Y is B then Z is C

where the X and Y variables are used to describe the current decision context

(input variables) and Z is a variable representing the decision (output variable).

Whenever X and Y can be automatically observed by the decision system (e.g.

using sensors), human skill and experience in problem solving can be approximated

and simulated using the fuzzy control approach (see e.g. Nguyen and Sugeno

1998). Such an approach is based on the use of fuzzy sets and multiple criteria

aggregation functions. Our purpose is to emphasise the interest as well as the

difficulty of resorting to such formal notions on real practical examples.

Let us consider the following case: the owner of a nice estate has the responsibility

of watering the family garden, and this task must be performed several times

per week. Every evening, the man usually estimates the air temperature and the

ground moisture so as to decide the appropriate time required for watering his

garden. This amount of time is determined so as to satisfy a twofold objective:

on the one hand he wants to preserve the nice aspect of his garden (especially the

dahlias put in by his wife at the beginning of the summer) but on the other hand,

he does not want to use too much water for this, preferring to allocate his financial

resources to more essential activities. Because this small decision problem is very

repetitive and also because the occasional gardener does not want to delegate the

responsibility of the garden to somebody else, he decided to purchase an automatic

watering system. The function of this system is first to check every evening,

whether watering is necessary or not, and second to determine automatically the

watering time required. The implicit aim of the occasional gardener is to obtain

a system that implements the same rules as he does; in his mind, this is the best

way to really preserve the current beautiful aspect of the garden.

In this case, we need a system able to periodically measure the air temperature

and the soil moisture and a decision module able to determine the appropriate

duration of watering, as shown in Figure 7.1.

Let t denote the current temperature of the air (in degrees Celsius), and m the

moisture of the ground defined as the water content of the soil. This second

quantity, expressed in centigrams per gram (cg/g), corresponds to the ratio:

m = 100 \times \frac{x_1 - x_2}{x_2}

where x1 is the weight of a soil sample and x2 the weight of the same sample

after drying in a low-temperature oven (75–105 °C). Assuming the quantities t

and m can be observed automatically, they will constitute the input data of the

decision module in charge of the computation of the watering time w (expressed

in minutes), which is the sole output of the module.
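As a minimal sketch of the input side, the water content m can be computed from the two weighings as 100 (x1 − x2)/x2. The function name `soil_moisture` and the sample weights are illustrative assumptions, not part of the watering system described here.

```python
def soil_moisture(x1: float, x2: float) -> float:
    """Water content of a soil sample in centigrams per gram (cg/g).

    x1: weight of the sample before drying; x2: weight of the same sample after drying.
    """
    if x2 <= 0:
        raise ValueError("dry weight x2 must be positive")
    return 100.0 * (x1 - x2) / x2


# Hypothetical sample: 12 g before drying, 10 g after drying.
print(soil_moisture(12.0, 10.0))  # 20.0
```

With these hypothetical weights, the sample holds 20 cg of water per gram of dry soil.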

Clearly, w must be defined as a function of the input parameters. Thus, we are

looking for a function f such that w = f (t, m) that can simulate the usual decisions

of the gardener. Function f must be defined so as to include the subjectivity of

the gardener both in diagnosis steps (evaluation of the current situation) and in

decision-making steps (choice of an appropriate action). A common way to achieve

this task is to elicit decision rules from the gardener using a very simple language,

as close as possible to the natural language used by the gardener to explain his

decision. For instance, we can use propositional logic and define rules of the

following form:

If T is A and M is B then W is C

where T and M are descriptive variables used for temperature and soil moisture, W

is an output variable used to represent the decision and A, B, C are linguistic values

(labels) used to describe temperature, moisture and watering time respectively.

For example, suppose the gardener is able to formulate the following empirical

decision rules:

R1 if air temperature is Hot and soil moisture is Low

then watering time is VeryLong;

R2 if air temperature is Warm and soil moisture is Low

then watering time is Long;

R3 if air temperature is Cool and soil moisture is Low

then watering time is Long;

R4 if air temperature is Hot and soil moisture is Medium

then watering time is Long;

R5 if air temperature is Warm and soil moisture is Medium

then watering time is Medium;

R6 if air temperature is Cool and soil moisture is Medium

then watering time is Medium;

R7 if air temperature is Hot and soil moisture is High

then watering time is Medium;

R8 if air temperature is Warm and soil moisture is High

then watering time is Short;

R9 if air temperature is Cool and soil moisture is High

then watering time is VeryShort;

R10 if air temperature is Cold then watering time is Zero.

Notice that the elicitation of such rules is usually not straightforward, even if it

is the result of a close collaboration with experts in that domain. Indeed, general

rules used by experts may appear to be partially inconsistent and must often

include explicit exceptions to be fully operational. Even without any inconsistency,

the individual acceptance of each rule is not sufficient to validate the whole set

of rules. In some situations, unsuitable conclusions may appear, resulting from

several inferences due to the coexistence of apparently reasonable rules. This

makes the validation of a set of rules particularly difficult. Even in the case of

control rules where there is no need for chaining inferences (we assume here that

the rules directly link inputs (observations) to outputs (decisions)), structuring

the expert knowledge so as to obtain a synthesis of the expert rules in the form

of a decision table (table linking outputs to inputs) requires a significant effort.

We will show alternative approaches that do not require the explicit formulation

of decision rules in Section 7.3.

Now, assuming that the above set of decision rules has been obtained, the

problem is the following: suppose the current air temperature and soil moisture

are known, how can a watering time be computed from these sentences, in other

words how can f be defined so as to properly reflect the strategy underlying these

rules? Some partial answers could be obtained if we could define a formal relation

linking the various labels occurring in the decision rules and the physical quantities

observable by the system. We can observe that the decision rules are expressed

using only three variables, i.e. the air temperature T , the soil moisture M , and

the watering time W . Moreover, they all take the following form:

either if T is Ti then W is Wk

or if T is Ti and M is Mj then W is Wk

The possible labels Ti , Mj and Wk for temperature, moisture and watering

time are given by the sets Tlabels, Mlabels and Wlabels respectively:

Tlabels = {Cold, Cool, Warm, Hot}. These labels can be seen as different

words used to specify different areas on the temperature scale.

Mlabels = {Low, Medium, High}. These labels can be seen as words used to

specify different areas on the moisture scale

Wlabels = {Zero, VeryShort, Short, Medium, Long,VeryLong}. These labels

can be seen as different words used to specify different areas on the time

scale

Using these labels, the rules can be synthesised by the following decision table

(see Table 7.1):

Moisture \ Temperature   Cold         Cool             Warm          Hot
Low                      Zero (R10)   Long (R3)        Long (R2)     VeryLong (R1)
Medium                   Zero (R10)   Medium (R6)      Medium (R5)   Long (R4)
High                     Zero (R10)   VeryShort (R9)   Short (R8)    Medium (R7)
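The decision table can be held as a plain lookup structure. The sketch below is one possible encoding, assuming Python dictionaries keyed by label pairs; the names `F`, `T_LABELS` and `M_LABELS` are chosen here for illustration.

```python
# Symbolic decision table: (temperature label, moisture label) -> watering label.
T_LABELS = ("Cold", "Cool", "Warm", "Hot")
M_LABELS = ("Low", "Medium", "High")

F = {
    ("Hot", "Low"): "VeryLong",      # R1
    ("Warm", "Low"): "Long",         # R2
    ("Cool", "Low"): "Long",         # R3
    ("Hot", "Medium"): "Long",       # R4
    ("Warm", "Medium"): "Medium",    # R5
    ("Cool", "Medium"): "Medium",    # R6
    ("Hot", "High"): "Medium",       # R7
    ("Warm", "High"): "Short",       # R8
    ("Cool", "High"): "VeryShort",   # R9
}
# R10 ignores moisture: Cold leads to Zero whatever the moisture label.
F.update({("Cold", m): "Zero" for m in M_LABELS})

# Every pair of labels has exactly one conclusion.
assert set(F) == {(t, m) for t in T_LABELS for m in M_LABELS}
print(F[("Hot", "Medium")])  # Long
```

Storing R10 as three entries keeps the table complete, so that any pair of input labels can be looked up directly.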

This decision table represents a symbolic function F linking Tlabels and Mla-

bels to Wlabels (Wk = F (Ti , Mj )). Now, we need to produce a numerical trans-

lation of function F in order to construct a numerical function f called transfer

function, whose role is to compute a watering time w from any input (t, m). To

build such a function, the standard process consists in the following stages:

this state,

2. activate the relevant decision rules for the current state (inference),

3. synthesise the recommendations induced from the rules and derive a numer-

ical output (decision)

The diagnosis stage consists in identifying the current state of the system using

numerical measures and describing this state in the language used by the expert

to express his decision rules. The inference stage consists of an activation of the

rules whose premises match the description of the current state. The decision

stage consists of a synthesis of the various conclusions derived from the rules and

the selection of the most appropriate action (at this stage, the selected action is

precisely defined by numerical output values). Thus, the definition of the decision

function f relies on a symbolic translation of the initial numerical information in

the diagnosis stage, a purely symbolic inference implementing the usual decision-

making reasoning and then a numerical translation of the conclusions derived

from the rules. The symbolic/numerical translation possibly includes the subjec-

tivity of the decision-maker (perceptions, beliefs, etc), both in the diagnosis and

decision stages. For example, in the gardener example, the subjectivity of the

decision maker is not only expressed in choosing particular decision rules, but also

in linking input labels (Tlabels and Mlabels) to observable values chosen on the

basis of the temperature and moisture scales. In the decision step, the expert or

decision-maker's subjectivity can also be expressed by linking output labels

(Wlabels) with elements of the time scale. There are several ways of establishing the

symbolic/numeric translation first in the diagnosis stage and then in the decision

stage. In both stages, symbols can be linked to scalars, intervals or fuzzy sets,

depending on the level of sophistication of the model. In the following subsections,

we present the main basic possibilities and discuss the associated representation

and aggregation problems.

A first and simple way of building the symbolic/numerical correspondence is by

asking the decision-maker to associate a typical scalar value to each input label

used in the rules. Note that the simplicity of the task is only apparent. An

individual, expert or not, may feel uncomfortable in specifying the scalar transla-

tion precisely. This is particularly true concerning parameters like soil moisture

which are not easily perceived by humans and whose qualification requires an im-

portant cognitive effort. Even for apparently simpler notions such as temperature

and duration, the expert may be reluctant to make a categorical symbolic/scalar

translation. If nevertheless he is constrained to produce scalars, he will have to

sacrifice a large part of his expertise and the resulting model may lose much of its

relevance to the real situation. We will see later how the difficulty can partly be

overcome by the use of non-scalar translations of labels. Let us assume now, for

the sake of illustration, that the following numerical information has been provided

by the expert (see Tables 7.2, 7.3 and 7.4).

A possible way of constructing such tables is to put the expert in various situ-

ations, to ask him to qualify each situation with one of the admissible labels, and

Labels              Cold   Cool   Warm   Hot
Temperatures (°C)   10     20     25     30

Labels                      Low   Medium   High
Soil water content (cg/g)   10    20       30

Labels       VeryShort   Short   Medium   Long   VeryLong
Times (mn)   5           10      20       35     60

dence. Of course, the reliability of the information elicited with such a process is

questionable. The analyst must be aware of the share of arbitrariness attached to

such a symbolic/numerical translation. He must keep it in mind during the whole

construction of the system and also later in interpreting the outputs of the system.

From the above tables of scalars, the rules allow the following reference points

to be constructed:

Rule   R1   R2   R3   R4   R5   R6   R7   R8   R9   R10   R10   R10
t      30   25   20   30   25   20   30   25   20   10    10    10
m      10   10   10   20   20   20   30   30   30   10    20    30
w      60   35   35   35   20   20   20   10   5    0     0     0

Hence, the transfer function f linking watering time w to the pair (t, m) is

known for a finite list of cases and must be extrapolated to the entire range of

possible inputs (t, m). This leads to a well-known mathematical problem since

function f must be defined so as to interpolate points of type (t, m, w) where

w = f (t, m). Of course, the solution is not unique and some additional assumptions

are necessary to define precisely the surface we are looking for. There is no space

in this chapter to discuss the relative interest of the various possible interpolation

methods that could be used to obtain f . The simplest method is to perform a linear

interpolation from the reference points given in Table 7.5. This implies averaging

the outputs associated to the reference points located in the neighbourhood of the

observed parameters (t, m). For instance, if the observation is (t, m) = (29, 16) the

neighbourhood is given by 4 reference points obtained from rules R1 , R2 , R4 , and

R5. This yields points P1 = (30, 10), P2 = (25, 10), P4 = (30, 20), and P5 = (25, 20)

with the respective weights 0.32, 0.08, 0.48, 0.12, the weight ωij of point (xi, yj) being

defined by:

(7.1)  \omega_{ij} = \left(1 - \frac{|29 - x_i|}{30 - 25}\right) \left(1 - \frac{|16 - y_j|}{20 - 10}\right)

The watering times associated to points P1 , P2 , P4 and P5 are 60, 35, 35, 20 and

therefore, the final time obtained by a weighted linear aggregation is 41 minutes

and 12 seconds. Performing the same approach for any possible input (t, m) leads

to the following piecewise linear approximation of function f , see Figure 7.2.
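The weighted averaging described above is a bilinear interpolation over the grid of reference points. The following sketch reproduces it under two assumptions that are ours: the grid values are those of Table 7.5, and inputs outside the grid are clamped to its border (the text does not say how such inputs are handled). The names `T_GRID`, `M_GRID`, `W` and `interpolate` are illustrative.

```python
from bisect import bisect_right

T_GRID = [10, 20, 25, 30]   # scalar values of Cold, Cool, Warm, Hot
M_GRID = [10, 20, 30]       # scalar values of Low, Medium, High
# W[i][j]: watering time (minutes) at the reference point (T_GRID[i], M_GRID[j])
W = [
    [0, 0, 0],      # t = 10 (rule R10)
    [35, 20, 5],    # t = 20 (R3, R6, R9)
    [35, 20, 10],   # t = 25 (R2, R5, R8)
    [60, 35, 20],   # t = 30 (R1, R4, R7)
]


def _cell(grid, x):
    """Return (i, x) with x clamped to the grid and grid[i] <= x <= grid[i + 1]."""
    x = min(max(x, grid[0]), grid[-1])  # assumption: clamp out-of-range inputs
    i = min(bisect_right(grid, x) - 1, len(grid) - 2)
    return i, x


def interpolate(t, m):
    """Bilinear interpolation of the watering time between the four nearest points."""
    i, t = _cell(T_GRID, t)
    j, m = _cell(M_GRID, m)
    a = (t - T_GRID[i]) / (T_GRID[i + 1] - T_GRID[i])   # position along temperature
    b = (m - M_GRID[j]) / (M_GRID[j + 1] - M_GRID[j])   # position along moisture
    return ((1 - a) * (1 - b) * W[i][j] + a * (1 - b) * W[i + 1][j]
            + (1 - a) * b * W[i][j + 1] + a * b * W[i + 1][j + 1])


print(round(interpolate(29, 16), 2))  # 41.2
```

For (t, m) = (29, 16) the four products of a and b are exactly the weights 0.32, 0.08, 0.48 and 0.12, and the result of 41.2 minutes corresponds to 41 minutes and 12 seconds.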

factory. First of all, no information justifies that function f is linear between

points to be interpolated. Many other interpolation methods could be used as

well, making a non-linear f possible. For example, one can use more sophisticated

interpolating methods based on B-spline functions that produce very smooth sur-

faces with good continuity and locality properties (see e.g. Bartels, Beatty and

Barsky 1987). Moreover, as mentioned above, the definition of reference points

from the gardener's rules is far from being easy and other relevant sets of scalar

values could be considered as well. As a consequence, the need to interpolate

the reference points given in Table 7.5 is itself questionable. Instead of performing

an exact interpolation of these points, one may prefer to modify the link between

symbols and numerical scales in order to allow symbols to be represented by sub-

sets of plausible numerical values. Thus, reference points are replaced by reference

areas in the parameters space (t, m, w), and the interpolation problem must be

reformulated. This point is discussed below.

In the gardener's example, substituting labels Ti and Mj by scalar values on the

temperature and moisture scales has the advantage of simplicity. However, it

does not provide a complete solution since function f is only known for a finite

sample of inputs and requires interpolation to be extended to the entire set of

possible inputs. Moreover, in many cases, each label represents a range of values

rather than a single value on a numerical scale. In such cases, representing the

different labels used in the rules by intervals seems preferable. If the intervals are

defined so as to cover all plausible values, any possible input belongs to at least

one interval and therefore, can be translated into at least one label. Basically, we

can distinguish two cases, depending on whether the intervals associated to labels

partially overlap or not.

Suppose that the gardener is able to divide the temperature scale into consecutive

intervals, each corresponding to the most plausible values attached to a label

Ti . Assuming this is also possible for the moisture scale, these intervals form a

partition of the temperature and moisture scales respectively. Hence, each input

(t, m) corresponds to a pair {Ti , Mj } where Ti (resp. Mj ) is the label associated

to the interval containing t (resp. m). In this case, there is a unique active rule

in Table 7.1 and the conclusion is easy to reach. For example, let us consider the

following intervals:

Labels              Cold          Cool           Warm           Hot
Temperatures (°C)   (−∞, 17.5)    [17.5, 22.5)   [22.5, 27.5)   [27.5, +∞)

Labels                      Low       Medium     High
Soil water content (cg/g)   [0, 15)   [15, 25)   [25, 100]

If (t, m) = (29, 16), then the associated labels are {Hot, Medium} and therefore,

the only active rule is R4, whose conclusion is "watering time is Long". Thus,

if we keep the interpretation of Long given in Table 7.4, the numerical output is

35.
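With disjoint intervals, the whole decision module reduces to two interval lookups followed by one table lookup. The sketch below assumes the cut points 17.5, 22.5, 27.5 for temperature and 15, 25 for moisture, the scalar times used earlier, and Zero taken as 0 minutes; the function names are illustrative, and inputs are not checked against the range [0, 100] of the moisture scale.

```python
from bisect import bisect_right

# Cut points between consecutive intervals (each interval is closed on the left).
T_BOUNDS, T_LABELS = [17.5, 22.5, 27.5], ["Cold", "Cool", "Warm", "Hot"]
M_BOUNDS, M_LABELS = [15, 25], ["Low", "Medium", "High"]

# Scalar translation of the output labels, in minutes.
MINUTES = {"Zero": 0, "VeryShort": 5, "Short": 10,
           "Medium": 20, "Long": 35, "VeryLong": 60}

# Rules R1 to R9; the Cold case (R10) is handled separately below.
RULES = {
    ("Hot", "Low"): "VeryLong", ("Warm", "Low"): "Long", ("Cool", "Low"): "Long",
    ("Hot", "Medium"): "Long", ("Warm", "Medium"): "Medium", ("Cool", "Medium"): "Medium",
    ("Hot", "High"): "Medium", ("Warm", "High"): "Short", ("Cool", "High"): "VeryShort",
}


def label(x, bounds, labels):
    """Label of the unique interval containing x."""
    return labels[bisect_right(bounds, x)]


def watering_time(t, m):
    """Watering time in minutes given by the single active rule."""
    t_label = label(t, T_BOUNDS, T_LABELS)
    m_label = label(m, M_BOUNDS, M_LABELS)
    w_label = "Zero" if t_label == "Cold" else RULES[(t_label, m_label)]
    return MINUTES[w_label]


print(watering_time(29, 16))  # 35
```

Because each input falls in exactly one interval per scale, no aggregation step is needed in this variant.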

This process is simple but has serious drawbacks. The granularity of the lan-

guage used to describe the current state of the system is poor and many signif-

icantly different states are seen as equivalent. This is the case, for example, of

the two inputs (17.5, 15) and (22.4, 24.9) that both translate as (Cool, Medium).

On the contrary, for some other pairs of inputs that are very similar, the trans-

lation diverges. This is the case of (17.4, 14.9) and (17.5, 15) that respectively

give (Cold, Low) and (Cool, Medium). In the first case, rule R10 is activated and

a zero watering time is decided. In the second case, rule R6 is activated and a

medium watering time is recommended, 20 minutes according to Table 7.4. Such

discontinuities cannot really be justified and make the output f (t, m) arbitrarily

sensitive to the inputs (t, m). This is not suitable because such decision systems

are often included in a permanent observation/reaction loop. Suppose for example

that several consecutive measurements of temperature and moisture in a stable

situation yield different values for parameters t and m due to the imperfection of gauges

and that these variations occur around a point of discontinuity in the system.

This can produce alternating sequences of outputs such as Short, Zero, Medium,

Zero, leading to repeated starts and stops of the system, and possibly to

dysfunctions.

It is true that narrowing the intervals and multiplying the labels would reduce

these drawbacks and refine the granularity of the description, but the number

of rules necessary to characterise f would grow significantly with the number of

labels. Expressing so many labels and rules requires a very important cognitive

effort that cannot reasonably be expected from the expert. Nevertheless, reducing

discontinuity induced by interval boundaries without multiplying labels is possible.

A first option for this is allowing for overlap between consecutive intervals, as

shown below.

In order to improve on the previous solution, we have to specify the links between

the values of physical variables describing the system and the symbolic labels used

to describe the current state of the system more carefully. Since it is difficult

to separate such intervals with precise boundaries, one can make them partially

overlap. As a consequence, in some intermediary areas of the temperature scale,

two consecutive labels are associated to a given temperature, reflecting the possible

hesitation of the gardener in the choice of a unique label. Typically, if Warm and

Hot are represented by intervals [20, 30] and [25, +∞) respectively, 29 °C becomes

a temperature compatible with the two labels. More precisely, from 20 °C to

25 °C, Warm is a valid label (a possible source of rule activation) but not Hot,

from 25 °C to 30 °C both labels are valid, and above 30 °C, Hot is valid but not

Warm. This progressive transition between the two states Warm and Hot refines

the initial sharp transition from Warm to Hot by introducing an intermediary state

corresponding to a hesitation between the two labels. This is more realistic,

especially because there is no reasonable way of separating Warm and Hot

with a precise boundary. Note however that measuring a temperature of 29 °C

may allow several rules to be active at the same time. This raises a new

problem since these rules may lead to diverging recommendations

from which a synthesis must be derived. Any output label (labels Wk in the

example) must be translated into numbers and these numbers must be aggregated

to obtain the numerical output of the system (the value of w in the example). Thus,

the definition of a numerical output can be seen as an aggregation problem, where

aggregation is used to interpolate between conflicting rules. As an illustration, we

assume now that the labels are represented by the intervals given in Tables 7.8

and 7.9:

Labels              Cold        Cool       Warm       Hot
Temperatures (°C)   (−∞, 20]    [15, 25]   [20, 30]   [25, +∞)

Labels                      Low       Medium     High
Soil water content (cg/g)   [0, 20]   [10, 30]   [20, 100]

For the observation (t, m) = (29, 16), the

relevant labels are {Warm, Hot} for temperature and {Low, Medium} for

moisture. These qualitative labels allow some of the gardener's rules to be activated,

namely R1, R2, R4, R5. This gives several symbolic values for the watering

duration, namely Medium (by R5), Long (by R2, R4) and VeryLong (by R1). Therefore,

we can observe 3 conflicting recommendations and the final decision must be de-

rived from a synthesis of these results. Of course, defining what could be a fair

synthesis of conflicting qualitative outputs is not an easy task. Deriving a numer-

ical duration from this synthesis is not any easier.
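The activation step with overlapping intervals can be sketched as follows. We assume here that all interval bounds are inclusive, as the closed brackets of the tables suggest, and that a rule is active as soon as each label in its premise is valid for the observation; the names `valid_labels` and `active_rules` are illustrative.

```python
import math

# Overlapping intervals attached to the labels (bounds treated as inclusive).
T_INTERVALS = {"Cold": (-math.inf, 20), "Cool": (15, 25),
               "Warm": (20, 30), "Hot": (25, math.inf)}
M_INTERVALS = {"Low": (0, 20), "Medium": (10, 30), "High": (20, 100)}

# Rule name -> (temperature label, moisture label or None, watering label).
RULES = {
    "R1": ("Hot", "Low", "VeryLong"),
    "R2": ("Warm", "Low", "Long"),
    "R3": ("Cool", "Low", "Long"),
    "R4": ("Hot", "Medium", "Long"),
    "R5": ("Warm", "Medium", "Medium"),
    "R6": ("Cool", "Medium", "Medium"),
    "R7": ("Hot", "High", "Medium"),
    "R8": ("Warm", "High", "Short"),
    "R9": ("Cool", "High", "VeryShort"),
    "R10": ("Cold", None, "Zero"),   # no condition on moisture
}


def valid_labels(x, intervals):
    """All labels whose interval contains x (several labels may be valid)."""
    return {name for name, (lo, hi) in intervals.items() if lo <= x <= hi}


def active_rules(t, m):
    """Rules whose premises are all valid for the observation (t, m)."""
    ts = valid_labels(t, T_INTERVALS)
    ms = valid_labels(m, M_INTERVALS)
    return {name: w for name, (tl, ml, w) in RULES.items()
            if tl in ts and (ml is None or ml in ms)}


print(active_rules(29, 16))
# {'R1': 'VeryLong', 'R2': 'Long', 'R4': 'Long', 'R5': 'Medium'}
```

The observation (29, 16) thus activates four rules with three distinct conclusions, which is the conflict that the aggregation step has to resolve.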

A simple idea is to process symbols as numbers. For this, one can link symbolic

and numerical information using Table 7.4. In the example, we obtain three

different durations, i.e. 20, 35 and 60 minutes, that must be aggregated. For example,

one can calculate the arithmetic mean of the 3 outputs. More generally, we can

define a weight ω(R) for each decision rule R in the gardener's database B. This

weight represents the activity of the rule and, by convention, for any state (t, m),

we set ω(R) = 1 when the decision rule R is activated and ω(R) = 0 otherwise.

Let B(τ) denote the subset of rules whose conclusion is a watering time τ. For any

possible value τ of w, a weight ω(τ) measuring the activity or importance of the

set B(τ) can be defined as a continuous and increasing function of the quantities

ω(R), R ∈ B(τ). For example, we can choose:

(7.2)  \omega(\tau) = \sup_{R \in B(\tau)} \omega(R)

Hence, each watering time activated by at least one rule receives the weight 1

and any other time receives the weight 0. For example, with the observation

(t, m) = (29, 16), we have seen that the active rules are R1, R2, R4 and R5 and

therefore ω(R1) = ω(R2) = ω(R4) = ω(R5) = 1 whereas ω(R) = 0 for any

other rule R. Let us now present in detail the calculation of ω(35). Since 35

(minutes) is the scalar translation of Long, we obtain from the gardener's rules

B(35) = {R2, R3, R4}. Hence ω(35) = sup{ω(R2), ω(R3), ω(R4)} = 1. Similarly,

ω(20) = 1 and ω(60) = 1. Since there are no other

active rules left, ω(τ) = 0 for all other τ.

Another option taking account of the number of rules supporting each time

could be:

(7.3)  \omega(\tau) = \sum_{R \in B(\tau)} \omega(R)

Coming back to the example, we now obtain: ω(35) = ω(R2) + ω(R3) + ω(R4) =

2 whereas the other ω(τ) remain unchanged. This second option gives more

importance to a time τ supported by several rules than to a time τ′ supported by

a single rule. Everything works as if each active rule was voting for a time. The

more a given time is supported by the set of active rules, the more it becomes

important in the calculation of the final watering time. The option (7.3) could

be preferred when the activations of the various rules are independent. On the

contrary, when the activation of a subset of rules necessarily implies that another

subset of rules is also active, one could prefer resorting to (7.2) so as to avoid pos-

sible overweighing due to redundancy in the set of rules. In a practical situation,

one can easily imagine that the choice of one of these options is not easy to justify.

Since there is a finite number of rules, there is only a finite number of times

activated by the rules in a given state. In order to synthesise these different times,

the most popular approach is the centre of gravity method which amounts to

performing a weighted sum (see also chapter 6) of all possible times τ. Formally

the final output is defined by:

(7.4)  w = \frac{\sum_{\tau} \omega(\tau) \cdot \tau}{\sum_{\tau} \omega(\tau)}

From the observation (t, m) = (29, 16), equations (7.2) and (7.4) yield a watering

time of (60 + 35 + 20)/3, i.e. 38 minutes and 20 seconds, whereas equation

(7.3) yields w = 0.25 × (60 + 35 + 35 + 20), which amounts to 37 minutes and 30

seconds. Note that the choice of a weighted sum as final aggregator in equation

(7.4) is questionable and one could formulate criticisms similar to those addressed

to the weighted average in the previous chapters (especially in chapter 6).
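The two weighting options and the centre of gravity can be compared with a few lines of code. This is a sketch under the scalar translation of the output labels used above (with Zero taken as 0 minutes); the function name `centre_of_gravity` and the flag `count_votes` are our own.

```python
from collections import Counter

# Scalar translation of the output labels, in minutes.
MINUTES = {"Zero": 0, "VeryShort": 5, "Short": 10,
           "Medium": 20, "Long": 35, "VeryLong": 60}


def centre_of_gravity(conclusions, count_votes):
    """Weighted average of the times concluded by the active rules.

    conclusions: one output label per active rule.
    count_votes=False: each supported time gets weight 1, as in (7.2).
    count_votes=True: each supported time is weighted by the number of
    active rules supporting it, as in (7.3).
    """
    times = [MINUTES[c] for c in conclusions]
    if not times:
        raise ValueError("no active rule")
    weights = Counter(times) if count_votes else dict.fromkeys(times, 1)
    return sum(w * tau for tau, w in weights.items()) / sum(weights.values())


active = ["VeryLong", "Long", "Long", "Medium"]   # conclusions of R1, R2, R4, R5
print(centre_of_gravity(active, count_votes=False))  # 38.33... (38 min 20 s)
print(centre_of_gravity(active, count_votes=True))   # 37.5 (37 min 30 s)
```

The only difference between the two calls is whether the time 35, supported by both R2 and R4, is counted once or twice.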

subsection, the final result has been obtained as a result of the following sequence:

This process is perhaps the most elementary way of using a set of symbolic decision

rules to build a numerical decision function. It gives a simple illustration of the

so-called "computing with words" paradigm advocated by Zadeh (see Zadeh 1999).

The main advantages of such a process are the following:

language used by the expert,

it allows one to define a reasonable decision function allowing numerical

outputs to be computed from any possible numerical input,

if necessary, any decision can be explained very simply. The outputs can

always be presented as a compromise between recommendations derived from

several of the expert's decision rules.

ous transfers from inputs to outputs. In fact, it is not easy to describe a continuum

of states (characterised by all pairs (t, m) in the gardener example) with a finite

number of labels of type (Ti , Mj ). This induces arbitrary choices in the descrip-

tion of the current state which could disrupt the diagnosis stage and make the

automatic decision process discontinuous, as shown by the following example.

Example (1). Consider two very similar states s1 and s2 characterised by the

observations (t, m) = (25.01, 19.99) and (t, m) = (24.99, 20.01). According to

Tables 7.8 and 7.9, state s1 makes valid the labels {Warm, Hot} for temperature,

and {Low, Medium} for soil moisture. This activates rules R1, R2, R4 and R5

whose recommendations are VeryLong, Long, Long, Medium respectively. The

resulting watering time obtained by equations (7.3) and (7.4) is therefore 37 minutes and

30 seconds. Things are really different for s2 however. The valid labels are

{Cool, Warm} for temperature, and {Medium, High} for soil moisture. This

activates rules R5, R6, R8 and R9 whose recommendations are Medium, Medium,

Short, VeryShort respectively. The resulting watering time obtained by equations

(7.3) and (7.4) is therefore 13 minutes and 45 seconds. It is worth noting that, despite the

close similarity between states s1 and s2 , there is a significant difference in the wa-

tering times computed from the two input vectors. This is due to the discontinuity

of the transfer function that defines the watering time from the input (t, m) for

(t, m) = (25, 20). In the right neighbourhood of this entry (t > 25 and m < 20),

the decision rules R1 , R2 and R4 are fully active but this is no longer the case in

the left neighbourhood of the point (t < 25 and m > 20) where they are replaced

by rules R6 , R8 and R9 , thus leading to a much shorter time. The activations

and computations performed for s1 and s2 differ significantly. They lead to very

different outputs, despite the similarity of the states.

This criticism is serious, but the difficulty can partly be overcome. It is true

that, depending on the choice of the numerical encoding of the labels, the numer-

ical outputs resulting from the decision rules may vary significantly. Since the

numerical/symbolic and then symbolic/numerical translations are both sources of

arbitrariness, the following question can be raised: why not use numbers directly?

7.2. A SYSTEM WITH EXPLICIT DECISION RULES 161

There are two partial answers: first, in many decision contexts, the possibility

of justifying decisions is a great advantage. Although this is not crucial in our

illustrative example, the ability of automatic decision systems to simulate human

reasoning and explain decisions by rules is generally seen as an important

advantage. This argument often justifies the use of rule-based systems to automatise

decision-making, even if each decision considered separately is of marginal impor-

tance. Second, there are several ways of improving the process proposed above

and of refining the formal relationship between qualitative labels and numerical

values. It is not our purpose to cover all possibilities in detail. We only present and

discuss some very simple and intuitive ideas used to construct more sophisticated

models and tools in this context.

One step back in the modelling process, we can redefine the relationship between

a given label and the numerical scale associated to the label more precisely. As an

expert, the gardener can easily specify the typical temperatures associated with

each label. He can also define areas that are definitely not concerned with each

label. For example, he could explain that Warm means between 20 and 30 degrees

with 25 as the most plausible value. More precisely, one can define the relative

likelihood of each temperature when the temperature has been qualified as Hot,

Warm, Cool or Cold. In this case, each label Ti is represented by a [0, 1]-valued

function $\mu_{T_i}$ defined on the temperature scale in such a way that $\mu_{T_i}(t)$ represents

the compatibility degree between temperature t and label Ti . As a convention,

we set $\mu_{T_i}(t) = 0$ when temperature t is not connected to the label Ti , and $\mu_{T_i}(t) = 1$

when t is perfectly representative of the label. Thus, each label Ti is defined

with fuzzy boundaries and characterised by the function $\mu_{T_i}$. These fuzzy labels

can partially overlap but they must be defined in such a way that any part of the

temperature scale is covered by at least one label. A simple example of such fuzzy

labels is represented in Figures 7.3 and 7.4.
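As a minimal sketch of such a fuzzy label, the following Python function encodes a fuzzy interval by its support and its core, with linear transitions in between. The function name `trapezoid` and the choice of linear transitions are our own assumptions; only the description of Warm (between 20 and 30 degrees, with 25 as the most plausible value) comes from the text.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a fuzzy interval with support [a, d] and core [b, c].

    Linear transitions are an assumption made for this sketch.
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge, from the support to the core
    return (d - x) / (d - c)       # falling edge, from the core to the support


def warm(t):
    # "Warm": between 20 and 30 degrees, with 25 as the most plausible value.
    return trapezoid(t, 20.0, 25.0, 25.0, 30.0)


for t in (20.0, 22.5, 25.0, 29.0, 31.0):
    print(t, warm(t))
```

With this encoding, a temperature of 29 degrees is Warm to the degree 0.2, and the degree decreases continuously to 0 as the temperature approaches 30 degrees.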

Note that sometimes, the fuzzy labels are defined in such a way that member-

ship adds up to 1 for any possible value of the numerical parameter. This is the

case of the labels defined in Figure 7.4, for which we have, for any moisture m:

$\mu_{Low}(m) + \mu_{Medium}(m) + \mu_{High}(m) = 1$


that the fuzzy labels Low, Medium and High form a partition of the set of possible

moistures. Note however that this property makes sense only when membership

values have a cardinal meaning.

With such fuzzy labels, each decision rule can be activated to a certain degree.

This is the degree to which the numerical inputs match the premises of the rule.

More precisely, for any rule Rij of type:

if T is Ti and M is Mj then W is Wk

where Wk = F (Ti , Mj ), and for any numerical observation (t, m), the weight (or

activation degree) $\omega_{ij}$ of the rule Rij reflects the importance (or relevance) of the

rule in the current situation. This importance depends on the matching of the

input (t, m) and the premise (Ti , Mj ). It is therefore natural to state:

(7.6) $\omega_{ij} = h(\mu_{T_i}(t), \mu_{M_j}(m))$

where h is an aggregation function representing the logical and used in the rule,

e.g. h(x, y) = min(x, y).

(t, m) = (29, 16) leads to $\mu_{Hot}(t) = 0.8$ and $\mu_{Low}(m) = 0.4$. Thus, the temperature

is Hot to the degree 0.8 and the moisture is Low to the degree 0.4 and therefore,

the weight of the rule R1 is min(0.8, 0.4) = 0.4. Using this approach for each rule

with h = min yields the following activation weights (see Table 7.10):

Mj \ Ti         Cold (0)    Cool (0)    Warm (0.2)    Hot (0.8)
Low (0.4)       0 (R10)     0 (R3)      0.2 (R2)      0.4 (R1)
Medium (0.6)    0 (R10)     0 (R6)      0.2 (R5)      0.6 (R4)
High (0)        0 (R10)     0 (R9)      0 (R8)        0 (R7)

Table 7.10: The weights of the rules when (t, m) = (29, 16)


$w = \frac{0.4 \times 60 + 0.2 \times 35 + 0.6 \times 35 + 0.2 \times 20}{0.4 + 0.2 + 0.6 + 0.2} = 40$

and therefore the watering time is 40 minutes.
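The computation above can be sketched in a few lines of Python. The membership degrees, the active rules and the times (60, 35, 35 and 20 minutes) are those of the example; the function and variable names are ours and purely illustrative.

```python
def watering_time(weighted_rules):
    """Weighted average of the times recommended by the active rules.

    weighted_rules is a list of (activation weight, time in minutes) pairs.
    """
    total = sum(weight for weight, _ in weighted_rules)
    if total == 0:
        raise ValueError("no rule is active for this observation")
    return sum(weight * time for weight, time in weighted_rules) / total


# Membership degrees observed for (t, m) = (29, 16).
mu_T = {"Warm": 0.2, "Hot": 0.8}
mu_M = {"Low": 0.4, "Medium": 0.6}

# Premises and recommended times (in minutes) of the active rules R1, R2, R4, R5.
active_rules = [
    ("Hot", "Low", 60),      # R1: VeryLong
    ("Warm", "Low", 35),     # R2: Long
    ("Hot", "Medium", 35),   # R4: Long
    ("Warm", "Medium", 20),  # R5: Medium
]

# Activation weights with h = min.
weighted = [(min(mu_T[t], mu_M[m]), time) for t, m, time in active_rules]
w = watering_time(weighted)
print(round(w, 2))  # approximately 40 minutes
```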

Note that the definition of an aggregation function yields a compromise solution

between the various active decision rules whose outputs are partially conflicting.

In the additive formulation characterised by equation (7.4), everything works as

if each active rule was voting for one candidate chosen in the set Wlabels. The

more the premise of the rule matches the current situation, the more important

the rule is in the voting process. The activation level of each rule is graduated

on the [0, 1] scale and the weights directly reflect the adequacy of the rule in the

current situation. This enables a soft control of the output that can be perfectly

illustrated by the example discussed at the end of subsection 7.2.4. If we consider

the two neighbour states s1 and s2 introduced in this example, and if we choose

h = min in equation (7.6), the resulting activation weights are those given in

Tables 7.11 and 7.12.

Mj \ Ti           Cold (0)    Cool (0)    Warm (0.998)    Hot (0.002)
Low (0.001)       0 (R10)     0 (R3)      0.001 (R2)      0.001 (R1)
Medium (0.999)    0 (R10)     0 (R6)      0.998 (R5)      0.002 (R4)
High (0)          0 (R10)     0 (R9)      0 (R8)          0 (R7)

Table 7.11: The weights of the rules when (t, m) = (25.01, 19.99)

Mj \ Ti           Cold (0)    Cool (0.002)    Warm (0.998)    Hot (0)
Low (0)           0 (R10)     0 (R3)          0 (R2)          0 (R1)
Medium (0.999)    0 (R10)     0.002 (R6)      0.998 (R5)      0 (R4)
High (0.001)      0 (R10)     0.001 (R9)      0.001 (R8)      0 (R7)

Table 7.12: The weights of the rules when (t, m) = (24.99, 20.01)

Hence, using equation (7.4) and Table 7.4, we get w(s1 ) = 20 minutes and

5 seconds as the final output. Similarly, for state s2 , the activations of the rules

obtained from equation (7.6) are only slightly different from those for s1 and

the final output derived from Table 7.12 using equation (7.4) gives w(s2 ) = 19

minutes and 58 seconds. Here, we notice that the activity of each rule does not

vary significantly when passing from state s1 to state s2 . This is due to the

way activation weights are defined and used in the process. These weights depend

continuously on input parameters t and m, and the membership functions defining

the labels have soft variations. As a consequence, since the aggregation function


used to derive the final watering time w is also a continuous function of quantities

$\omega(R)$ (see equation (7.4)), quantity w depends continuously on input parameters t

and m. This explains the observed improvement with respect to the previous model

based on the use of all-or-nothing activation rules. Thus, the use of fuzzy labels

to interpret input labels has a significant advantage: it makes it possible to define

a continuous transformation of numerical input data (temperature, moisture) into

symbolic variables used in decision rules. The resulting decision system is more

realistic and robust to slight variations of inputs. This advantage is due to the

use of fuzzy sets and has greatly contributed to the practical success of the fuzzy

approach in automatic control (fuzzy control; see e.g. Mamdani 1981, Sugeno

1985, Bouchon 1995, Gacogne 1997, Nguyen and Sugeno 1998). However, several

criticisms can be addressed to the small fuzzy decision module presented above.

Among them, let us mention the following:

the choice h = min in equation (7.6) requires that quantities of type $\mu_{T_i}(t)$

and $\mu_{M_j}(m)$ are commensurate. This assumption, which is rarely explicit,

is very strong because it requires much more than comparing the relative

fit of two temperatures (resp. two moistures) to a label Ti (resp. Mj ).

It also requires comparing the fit of any temperature to any label Ti with

the fit of any moisture to any label Mj . A perfectly sound definition of

such membership values would require more information than can easily be

obtained in practice. Moreover, the choice of min is often justified by the fact

that h is used to evaluate a conjunction between several premises of a given

rule (a conjunction of type "temperature is Ti and moisture is Mj "). Note

however that the idea of the conjunction is captured by any other t-norm (see

for instance, Fodor and Roubens (1994)). Thus, the product could perhaps

replace the min and the particular choice of the min is not straightforward.

This is problematic because this choice is not without consequence on the

definition of the watering time.

scalar values is not easy to justify. Why not use a description of these labels

as intervals, in the same way as for input labels?
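To make the first criticism concrete, the following sketch recomputes the watering time of the example (t, m) = (29, 16) with two different t-norms for h, the min and the product. The data are those of the example; the function names are our own and the comparison is only an illustration, not a recommendation of either operator.

```python
def watering_time(mu_T, mu_M, rules, h):
    """Weighted average of the rule outputs, with activation weights h(mu_T, mu_M)."""
    weights = [h(mu_T[t], mu_M[m]) for t, m, _ in rules]
    total = sum(weights)
    return sum(w * time for w, (_, _, time) in zip(weights, rules)) / total


def t_min(x, y):
    return min(x, y)


def t_product(x, y):
    return x * y


# Membership degrees for (t, m) = (29, 16) and active rules R1, R2, R4, R5.
mu_T = {"Warm": 0.2, "Hot": 0.8}
mu_M = {"Low": 0.4, "Medium": 0.6}
rules = [
    ("Hot", "Low", 60),      # R1: VeryLong
    ("Warm", "Low", 35),     # R2: Long
    ("Hot", "Medium", 35),   # R4: Long
    ("Warm", "Medium", 20),  # R5: Medium
]

w_min = watering_time(mu_T, mu_M, rules, t_min)
w_product = watering_time(mu_T, mu_M, rules, t_product)
print(round(w_min, 2), round(w_product, 2))  # about 40 and 41.2 minutes
```

Both operators are legitimate translations of the conjunction, yet they do not lead to the same watering time.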

to sophisticate the previous construction so as to improve the output processing.

Paralleling the treatment of symbolic inputs, we can use intervals or fuzzy inter-

vals later in the process so as to continuously link symbolic outputs of the rules

(Wlabels) to numerical outputs (watering times). This point is discussed in the

next subsection.

Suppose for example that Wlabels are no longer described by scalar values but by

subsets of the time scale. For instance, the labels Wk could be represented by a

set of intervals (overlapping or not) with advantages similar to those mentioned

for input labels Ti and Mj . More generally, we assume here that Wlabels are


represented by fuzzy intervals of the time scale. For the sake of illustration, let

us consider the labels represented in Figure 7.5.

For any state (t, m) of the system, the range of relevant watering times is the

union of all values compatible with labels Wk derived from active rules. In the

example, the active rules are R1 , R2 , R4 , R5 , and therefore the Wlabels concerned

are Medium, Long and VeryLong. Hence the set of relevant watering times

is [10, 70]. However, all times are not equivalent inside this set. Each of them

represents a possible numerical translation of a label Wk obtained by the acti-

vation of one or several rules. To be fully considered, a time must be perfectly

representative of a label Wk that has been obtained by a fully active rule. In more

nuanced situations, the weight attached to a possible time is a function of the fitness

of the times activated to a certain degree by the rules. For example, by analogy

with Mamdani's approach to fuzzy control (Mamdani 1981), the weight of any

watering time $\tau$ can be defined by:

(7.7) $\omega_{t,m}(\tau) = \max_{R_{ij} \in B} h(\mu_{T_i}(t), \mu_{M_j}(m), \mu_{W_k}(\tau))$

where B represents the set of rules (here the gardener's rules) and Rij represents

the rule:

If T = Ti and M = Mj then W = Wk

and h is a non-decreasing function of its arguments (in Mamdani's approach,

h = min). The idea in equation (7.7) is that a watering time $\tau$ must receive

an important weight when there is at least one rule Rij whose premises (Ti , Mj )

are valid for the observation (t, m) and whose conclusion Wk is compatible with

$\tau$. This explains why $\omega_{t,m}(\tau)$ is defined as an increasing function of quantities

$\mu_{T_i}(t)$, $\mu_{M_j}(m)$ and $\mu_{W_k}(\tau)$. Notice that equation (7.7) is a natural extension

of equation (7.2). In our example, the observation (t, m) = (29, 16) leads to a

function $\omega_{29,16}(w)$ represented in Figure 7.6.

In order to obtain a precise watering time, we can use an equation similar to

(7.4). However, this equation must be generalised because there may be an infinity

of times activated by the rules (e.g. a whole interval). The usual extension of the

weighted average to an infinite set of values is given by the following integral:

(7.8) $w = \frac{\int \omega_{t,m}(\tau)\, \tau \, d\tau}{\int \omega_{t,m}(\tau)\, d\tau}$


(7.9) $w = \frac{\sum_i \omega_{t,m}(\tau_i)\, \tau_i}{\sum_i \omega_{t,m}(\tau_i)}$

sation of the time scale. In our example, a discretisation with step 0.1 gives a final

time of 37 minutes and 32 seconds.
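The discretised weighted average can be sketched as follows, with h = min and the activation weights of Table 7.10. Only the label Long (support [20, 55], core [30, 40]) is taken from the description of Figure 7.5 given in the text; the shapes chosen here for Medium and VeryLong are hypothetical, so this sketch is not expected to reproduce the value of 37 minutes and 32 seconds. All function names are ours.

```python
def trapezoid(x, a, b, c, d):
    """Membership in a fuzzy interval with support [a, d] and core [b, c]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


W_LABELS = {
    "Medium": (10.0, 15.0, 25.0, 30.0),    # hypothetical shape
    "Long": (20.0, 30.0, 40.0, 55.0),      # support and core quoted in the text
    "VeryLong": (40.0, 55.0, 70.0, 70.0),  # hypothetical shape
}

# (activation weight, output label) of R1, R2, R4, R5 for (t, m) = (29, 16).
ACTIVE_RULES = [(0.4, "VeryLong"), (0.2, "Long"), (0.6, "Long"), (0.2, "Medium")]


def weight(tau):
    """Weight of time tau: best rule, each rule clipped at its activation (h = min)."""
    return max(min(a, trapezoid(tau, *W_LABELS[label])) for a, label in ACTIVE_RULES)


def defuzzify(step=0.1, low=0.0, high=80.0):
    """Discretised weighted average of the times on a regular grid."""
    n = int(round((high - low) / step))
    taus = [low + i * step for i in range(n + 1)]
    numerator = sum(weight(tau) * tau for tau in taus)
    denominator = sum(weight(tau) for tau in taus)
    return numerator / denominator


print(round(defuzzify(), 2))
```

Since each weight is a continuous function of the activation degrees, the resulting time varies continuously with the inputs, whatever the shapes retained for the output labels.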

This last sophistication meets our objective because it provides a transfer

function f with good continuity properties. However, the use of equations (7.7) to (7.9)

can be seriously criticised:

gregation function h is not very natural. Indeed, bearing in mind the form

of rule Rij , the quantity $h(\mu_{T_i}(t), \mu_{M_j}(m), \mu_{W_k}(\tau))$ stands for the numerical

translation of the proposition:

In the fields of multi-valued logic and fuzzy sets theory, admissible functions

used to translate implications are required to be non-increasing with re-

spect to the value of the left hand-side of the implication and non-decreasing

with respect to the value of the right hand-side (Fodor and Roubens 1994,

Bouchon 1995, Perny and Pomerol 1999). As an example the value attached

to the sentence "A implies B" can be defined by the Lukasiewicz implication

min(1 − v(A) + v(B), 1) where v(A) and v(B) are the values of A and B

respectively. In our case, the joint use of the min operator to interpret

the conjunction on the left hand-side and that of the Lukasiewicz implication

would lead to the following h function:

h(x, y, z) = min(1 − min(x, y) + z, 1)

Note that this function is not increasing in its arguments, as required above in

the text. However, resorting to implication operators instead of conjunctions

in order to implement an inference via rule Rij also seems legitimate. This


formula like (7.7) is used to generalise the so-called modus ponens inference

rule (Zadeh 1979), (Baldwin 1979), (Dubois and Prade 1988), (Bouchon

1995). To go further in this direction, one could also discuss the use of min

to interpret a conjunction whereas the Lukasiewicz implication is used to

interpret implications. A reasonable alternative to min(x, y) could be the

Lukasiewicz t-norm: max(x + y − 1, 0). In conclusion, the definition of h is

not straightforward and must be justified in the context of the application.

Some general guidelines for choosing a suitable h are given in (Bouchon

1995).

Now, inequalities of type $\mu_{T_i}(t) > \mu_{W_k}(\tau)$ play a role in the process. Thus,

we should be able to determine whether any temperature t is a better

representative of a label Ti than time $\tau$ is representative of label Wk . This

is a very strong assumption, especially if we consider the way these labels

are represented in the model. Usually, a label thought as a fuzzy interval is

assessed on the basis of 3 elements:

the support, i.e. the interval of all numerical values compatible with

the label, their membership must be strictly positive,

the core, i.e. the interval of all numerical values perfectly representative

of the label (the core is a subset of the support), their membership is

equal to 1,

the membership function making a continuous transition from the border

of the support to the border of the core.

For example, the label Long in Figure 7.5 is defined by support [20, 55], core

[30, 40] and two linear transitions (membership to non-membership) in the

range [20, 30] ∪ [40, 55]. One could expect that the decision-maker is able to

specify the support and core of each fuzzy label, as well as the trend of the

membership function (increasing from the border of the support to the border

of the core). Even with this information, however, the choice of a precise

membership function often remains arbitrary. The above information leaves

room for an infinity of functions. In practice, the shape of the membership

function in the transition area is often chosen as linear or Gaussian (for

differentiability) but rarely justified by questioning the decision-maker. Thus,

in many cases, the only reliable information contained in the membership

function is the relative adequacy of each temperature, moisture or time to

each label. For example, $\mu_{Long}(21) = 0.1$ and $\mu_{Long}(25) = 0.5$ only means

that 25 minutes is a better numerical translation of the qualifier Long than

21 minutes. This does not necessarily mean that 25 minutes is more Long

than 30 minutes is Medium, even if $\mu_{Medium}(30) = 0.4$, nor that 25 minutes

is more Long than 26°C is Hot, even if $\mu_{Hot}(26) = 0.2$. However, without

such assumptions, the definition of weights $\omega_{t,m}(\tau)$ in equation (7.8) with

h = min is difficult to justify.


Bearing in mind that the weights $\omega_{t,m}$ are used as cardinal weights in (7.4)

while they are defined from membership values $\mu_{T_i}(t)$, $\mu_{M_j}(m)$, and $\mu_{W_k}(\tau)$,

the membership values should have a cardinal interpretation. This is one

more very strong hypothesis. For example, we need to consider that 25

minutes is 5 times better than 21 minutes to represent long, because the

membership value is 5 times larger. Even when the commensurability as-

sumption of membership scales is realistic, the weights cannot necessarily

be interpreted as cardinal values and the weighted aggregation proposed in

equation (7.8) is questionable.

As an illustration of the latter, consider the following example showing the im-

pact of an increasing transformation of membership values on the output watering

time:

Example (2). Consider the two following input vectors i1 = (29, 29) and i2 =

(18, 16). These two inputs lead to activation weights given in Tables 7.13 and

7.14. Then, for the sake of simplicity, we use the non-fuzzy labels given in Table

7.4 for interpretation of labels Wk . Then, assuming we use equations (7.2) and

(7.4) to define the watering time w, we obtain the following result: w(i1 ) = 19

minutes and 33 seconds and w(i2 ) = 21 minutes and 40 seconds. Notice that

the times are not so different, despite the important difference between inputs i1

and i2 . This can be easily explained by observing that, in the second case, the

temperature is lower, but the soil water content is also lower, and the two aspects

compensate each other. Now, we transform all membership functions of the labels

by the cube-root function $x \mapsto \sqrt[3]{x}$. This preserves the support and the core of each label,

as well as the slope (increasing or decreasing) of membership functions. In fact,

it represents the same ordinal information about membership degrees. However,

the activation tables are altered as shown in Tables 7.15 and 7.16. This gives the

following watering times: w(i1 ) = 20 minutes and 34 seconds, w(i2 ) = 19 minutes

and 42 seconds. Note that we now have w(i1 ) > w(i2 ) whereas it was just the

opposite before the transformation of membership values.

Mj \ Ti         Cold (0)    Cool (0)    Warm (0.2)    Hot (0.8)
Low (0)         0 (R10)     0 (R3)      0 (R2)        0 (R1)
Medium (0.1)    0 (R10)     0 (R6)      0.1 (R5)      0.1 (R4)
High (0.9)      0 (R10)     0 (R9)      0.2 (R8)      0.8 (R7)

Table 7.13: The weights of the rules when (t, m) = (29, 29)

This example shows that comparison of output values is not invariant to mono-

tonic transformations of membership values and this explains the more than ordi-

nal interpretation of membership values in the computation of w. Although this

inversion of duration is not a crucial problem in the case of the watering system,

it could be more problematic in other contexts. For instance, if we use a similar

system (based on fuzzy rules) to rank candidates in a competition, the choice of


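Mj \ Ti         Cold (0.2)    Cool (0.6)    Warm (0)    Hot (0)
Low (0.4)       0.2 (R10)     0.4 (R3)      0 (R2)      0 (R1)
Medium (0.6)    0.2 (R10)     0.6 (R6)      0 (R5)      0 (R4)
High (0)        0 (R10)       0 (R9)        0 (R8)      0 (R7)

Table 7.14: The weights of the rules when (t, m) = (18, 16)

Mj \ Ti           Cold (0)    Cool (0)    Warm (0.585)    Hot (0.928)
Low (0)           0 (R10)     0 (R3)      0 (R2)          0 (R1)
Medium (0.464)    0 (R10)     0 (R6)      0.464 (R5)      0.464 (R4)
High (0.965)      0 (R10)     0 (R9)      0.585 (R8)      0.928 (R7)

Table 7.15: The weights of the rules when (t, m) = (29, 29), after transformation of the membership values

Mj \ Ti           Cold (0.585)    Cool (0.843)    Warm (0)    Hot (0)
Low (0.737)       0.585 (R10)     0.737 (R3)      0 (R2)      0 (R1)
Medium (0.843)    0.585 (R10)     0.843 (R6)      0 (R5)      0 (R4)
High (0)          0 (R10)         0 (R9)          0 (R8)      0 (R7)

Table 7.16: The weights of the rules when (t, m) = (18, 16), after transformation of the membership values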

a particular shape for membership must be well justified because it may really

change the winner.
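The effect of the transformation on the activation weights can be checked with a short sketch. It applies the cube root to the membership degrees of the input (29, 29) and recomputes the weights of the four active rules with h = min. The variable names are ours; the numerical data are those of the example.

```python
def cube_root(x):
    return x ** (1.0 / 3.0)


def identity(x):
    return x


# Membership degrees for the input (29, 29).
mu_T = {"Warm": 0.2, "Hot": 0.8}
mu_M = {"Medium": 0.1, "High": 0.9}

# Premises of the four rules that are active for this input.
premises = {
    "R4": ("Hot", "Medium"),
    "R5": ("Warm", "Medium"),
    "R7": ("Hot", "High"),
    "R8": ("Warm", "High"),
}


def activations(transform):
    """Activation weights with h = min, after transforming the membership degrees."""
    return {
        rule: min(transform(mu_T[t]), transform(mu_M[m]))
        for rule, (t, m) in premises.items()
    }


def shares(weights):
    """Relative importance of each rule in the weighted average."""
    total = sum(weights.values())
    return {rule: w / total for rule, w in weights.items()}


before = activations(identity)
after = activations(cube_root)
print({rule: round(w, 3) for rule, w in after.items()})
print(round(shares(before)["R7"], 3), round(shares(after)["R7"], 3))
```

The ranking of the rules by activation weight is unchanged, but the share of rule R7 in the weighted average drops from about two thirds to less than 40%. A weighted average is sensitive to such changes, even though the ordinal information is the same.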

Another possibility is resorting to other aggregation methods that do not re-

quire the same level of information. Several alternatives to the weighted sum are

compatible with ordinal weights, e.g. Sugeno integrals (see Sugeno 1977, Dubois

and Prade 1987), and could be used advantageously to process ordinal weights.

However, they also have some limitations. They are not as discriminating as the

weighted sum and they cannot completely avoid commensurability problems (see

Dubois, Prade and Sabbadin 1998, Fargier and Perny 1999).

based automatic decision systems further.

To go further with rule-based systems using fuzzy sets, the reader should con-

sult the literature about fuzzy inference and fuzzy control, which has received

much attention in the past decades. As a first set of references for theory and

applications, one can consult (Mamdani 1981), (Sugeno 1985), (Bouchon 1995),

(Gacogne 1997) and (Nguyen and Sugeno 1998) for a recent synthesis on the sub-

ject. These works present formal models but also empirical principles derived from

practical applications and thus provide a variety of techniques that have proved


sentations and operators are now available, bringing justifications to some methods

used by engineers in practical applications and also suggesting multiple

improvements (see Dubois, Prade and Ughetto 1999).

7.3.1 Controlling the quality of biscuits during baking

The control of food processes is a typical example where humans traditionally play

an important role to preserve the standard quality of the product. The overall

efficiency of production lines and the quality of the final product highly depend

on the ability of human supervisors to identify a degradation of the quality of

the final product and on their aptitude to best fit the control parameters to the

current situation.

As an example, let us report some elements of an application concerning the

control of the quality of biscuits through oven regulation during baking (for more

details see Trystram, Perrot and Guely 1995, Perrot, Trystram, Le Guennec and

Guely 1996, Perrot 1997, Grabisch et al. 1997).

In the field of biscuit manufacturing, human operators controlling biscuit bak-

ing lines have the possibility of regulating the ovens during the baking process.

This implies periodic evaluation, diagnosis and decision tasks that could perhaps

be automatised. However, such automatisation is not obvious because human ex-

pertise in oven control during the baking of biscuits mainly relies on a subjective

evaluation, e.g. a visual inspection of the general aspect, the colour of the

biscuits and on the operator's skill in reacting to possible perturbations of the baking

process.

For instance, when an overcooked biscuit is detected, the operator adjusts

the oven settings accordingly after checking its current temperature. In the case

of an automatic system, the only information accessible to the system consists of

physical objective parameters obtained from measures and sensors, which are not

easily linked to human perception.

In the example of automatic diagnosis during baking, the only available mea-

sures are the following:

a sensor located in the oven measures the air moisture, within the oven, near

the biscuit line. The evaluation m is given in cg/g (centigrams per one gram

of dry matter) in the range [0, 10] with the desired values being around 4

cg/g.

the mean of 6 consecutive measures performed on biscuits and expressed in

mm and the desired values are about 33 or 34 mm.

measures colours with 3 parameters, which are the luminance L, a level a on

7.3. A SYSTEM WITH IMPLICIT DECISION RULES 171

the red-green axis and a level b on the yellow-blue axis. The desired colour is

not easy to specify.

from the expert to construct a satisfactory rule database (in section 7.4 we will

see an approach integrating expert rules in the control of baking). Sometimes, the

only information accessible must be directly inferred from the expert's observation

during his control activity. Hence, following the approach adopted in section 7.2

seems problematic, especially concerning the aspect of the biscuit that cannot

be easily linked by the expert to the physical parameters (L, a, b) measured by

an automatic system. The following subsection presents an alternative way of

establishing this link using similarity from known examples.

amples

In performing oven control, the decision-making process consists of two consecutive

stages: a diagnosis stage, which consists in evaluating the state of the last biscuits,

and a decision stage, which must determine a regulation action on the oven, if

necessary. Like in many other domains, the diagnosis task performed by the

expert controlling baking can be seen as a pattern recognition task. It is not

unrealistic to assume that usual dysfunctions have been identified and categorised

by the expert and that for each of them, a standard regulation action is known.

Thus, assuming that a finite list of categories is implicitly used by the expert

(each of them being associated to a pattern, i.e. a characteristic set of irregular

biscuits) the diagnosis stage consists in identifying the relevant pattern for any

irregular biscuit and the decision stage consists in performing the regulation action

appropriate to the pattern.

In this context, the patterns are implicit and subjective. They can be approx-

imated by observing the action of a human controller on the oven in a variety of

cases. However, we can construct an explicit representation of patterns in a more

objective space formed by the observable variables. In this space, subjective

evaluation of biscuits can be partially explained by their objective description.

Assuming a representative sample of biscuits is available, using sensors, we can

represent each biscuit i of the sample by a vector xi = (mi , ti , Li , ai , bi ) in the

multiple attribute space of physical variables used to describe biscuits. Then, each

biscuit can be evaluated by the expert and a diagnosis of dysfunction d(xi ) can

be obtained for each description xi , explaining the bad quality of biscuit i (e.g.

"oven too hot", "oven not hot enough"). Hence, a pattern associated to each

dysfunction z is defined by the set of points xi such that d(xi ) = z. Determining

the right pattern for any new input vector x is a classification problem where the

categories C1 , . . . , Cq are the q possible dysfunctions and the objects to be assigned

are vectors x = (m, t, L, a, b).

(e.g. a biscuit), a classification procedure can be seen as a function assigning to


each vector $x \in X$ the vector $(\mu_{C_1}(x), \ldots, \mu_{C_q}(x))$ giving the membership of x

to each category (e.g. possible dysfunction of the oven). One of the most popular

classification methods is the so-called Bayes rule, which is known to minimise the

expected error rate. However, the rule requires knowing the prior and conditional

probability densities of all categories, which are rarely available in practice. When this

information is not available (this is the case in our example) the nearest neighbour

algorithm is very useful. The basic principle of the k-Nearest Neighbour assignment

rule (kNN) introduced in (Fix and Hodges 1951) is to assign an object to

the class to which the majority of its k nearest neighbours belong.

More precisely, for any sample $S \subset X$ of vectors whose correct assignment is

known, if $N_k(x)$ represents the subset of S formed by the k nearest neighbours of

x within S, the kNN rule is defined for any $k \in \{1, \ldots, n\}$ by:

(7.10) $\mu_{C_j}(x) = \begin{cases} 1 & \text{if } j = \mathrm{Arg\,max}_i \left\{ \sum_{y \in N_k(x)} \mu_{C_i}(y) \right\} \\ 0 & \text{otherwise} \end{cases}$

where $\mathrm{Arg\,max}_i\, g(i)$ represents the value i for which g(i) is maximal. This

supposes that the maximum is reached for a unique i. When this is not the

case, one can use a second criterion for discriminating between all g-maximal

solutions or, alternatively, choose all of them. In equation (7.10), function g(i)

equals $\sum_{y \in N_k(x)} \mu_{C_i}(y)$ and represents the total number of vectors, among the

k nearest neighbours of x, that have been assigned to category i.
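A minimal sketch of this assignment rule is given below. It uses the Euclidean distance and, when several categories obtain the same maximal number of votes, returns all of them (one of the two options mentioned above). The two-dimensional sample and the function name are hypothetical and only serve the illustration.

```python
import math
from collections import Counter


def knn_classify(x, sample, k):
    """Return the categories that are most frequent among the k nearest neighbours of x.

    sample is a list of (vector, category) pairs whose assignment is known.
    """
    neighbours = sorted(sample, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(category for _, category in neighbours)
    best = max(votes.values())
    return sorted(category for category, n in votes.items() if n == best)


# Hypothetical sample described by two attributes only.
sample = [
    ((0.0, 0.0), "too hot"),
    ((0.0, 1.0), "too hot"),
    ((1.0, 0.0), "not hot enough"),
    ((5.0, 5.0), "not hot enough"),
    ((6.0, 5.0), "not hot enough"),
]

print(knn_classify((0.1, 0.1), sample, k=3))
print(knn_classify((5.0, 5.0), sample, k=3))
```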

It has been proved that the error rate of the kNN rule tends towards the

optimal Bayes error rate when both k and n tend to infinity while k/n tends to 0

(see Cover and Hart 1967). The main drawback of the kNN procedure is that

all elements of Nk (x) are equally weighted. Indeed, in most cases, the neighbours

are not equally distant from x and one may prefer to give less importance to

neighbours very distant from x. For this reason, several weighted extensions of

the kNN algorithm have been proposed (see Keller, Gray and Givens 1985, Bezdek,

Chuah and Leep 1986, Bereau and Dubuisson 1991). For example, the fuzzy kNN

rule proposed by Keller et al. (1985) is defined by:

(7.11) $\mu_{C_j}(x) = \frac{\sum_{y \in N_k(x)} \mu_{C_j}(y) \,/\, \|x - y\|^{2/(m-1)}}{\sum_{y \in N_k(x)} 1 \,/\, \|x - y\|^{2/(m-1)}}$
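The weighted rule (7.11) can be sketched as follows, with the Euclidean norm and a parameter m > 1. Returning the memberships of a neighbour that coincides exactly with x is our own convention, introduced to avoid a division by zero; the one-dimensional sample is hypothetical.

```python
import math


def fuzzy_knn(x, sample, k, m=2.0):
    """Membership of x in each category, from its k nearest neighbours.

    sample is a list of (vector, {category: membership}) pairs; m must exceed 1.
    """
    neighbours = sorted(sample, key=lambda item: math.dist(x, item[0]))[:k]
    categories = sorted({c for _, mu in sample for c in mu})
    for y, mu in neighbours:
        if math.dist(x, y) == 0.0:
            # Convention of this sketch: a coinciding neighbour decides alone.
            return {c: mu.get(c, 0.0) for c in categories}
    # Weights inversely proportional to a power of the distance.
    weights = [1.0 / math.dist(x, y) ** (2.0 / (m - 1.0)) for y, _ in neighbours]
    total = sum(weights)
    return {
        c: sum(w * mu.get(c, 0.0) for w, (_, mu) in zip(weights, neighbours)) / total
        for c in categories
    }


# Hypothetical sample with a single attribute.
sample = [
    ((0.0,), {"too hot": 1.0, "not hot enough": 0.0}),
    ((3.0,), {"too hot": 0.0, "not hot enough": 1.0}),
]

result = fuzzy_knn((1.0,), sample, k=2, m=2.0)
print({c: round(v, 3) for c, v in result.items()})
```

With m = 2, the neighbour at distance 1 weighs four times as much as the neighbour at distance 2, so that x is "too hot" to the degree 0.8.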

new input x is also a matter of aggregation. Indeed, the membership value $\mu_{C_j}(x)$

is defined as the weighted average of quantities $\mu_{C_j}(y)$, $y \in N_k(x)$, weighted by

coefficients inversely proportional to a power of the Euclidean distance between x

and y. This formula seems natural but several points are questionable. Firstly,

the choice of the weighted sum as an aggregator of membership values $\mu_{C_j}(y)$

for all y in the neighbourhood Nk is not straightforward. It includes several im-

plicit assumptions that are not necessarily valid (see chapter 6) and alternative

compromise aggregators could possibly be used advantageously. The choice of a

compromise operator itself can be criticised and one can readily imagine cases

where a disjunctive or a conjunctive operator should be preferred. Moreover, even


when the weighted arithmetic mean seems convenient, the use of weights linked to

distances of type $\|x - y\|$ and to parameter m is not obvious. Indeed, the norm

of x − y is not necessarily a good measure of the relative dissimilarity between the

two biscuits represented by x and y. This is the case, for instance, when units

are different and non-commensurate on the various axes. In order to separate,

on each dimension, differences that are significant for the expert from those that

are negligible, one may include discrimination thresholds (see chapter 6) in the

comparison. This is particularly suitable in the field of subjective evaluation in which

preferences and perceptions of the expert (or decision-maker) are not usually

linearly related to the observable parameters. For instance, one could define a fuzzy

similarity relation $\sim(x, y)$ as a function of quantities of type $|x_i - y_i|$ for any

attribute i, representing the relative closeness of x and y for the expert. Then, we

can use a general aggregation rule of type:

where Nk (x) = {y1 , . . . , yk } and is an aggregation function.

This is the proposition made in (Henriet 1995), (Henriet and Perny 1996) and

(Perny and Zucker 1999) where the membership of Cj (x) is defined by:

(7.13) $\mu_{C_j}(x) = 1 - \prod_{i=1}^{k} \left(1 - \sim(x, y_i) \cdot \mu_{C_j}(y_i)\right)$

(x, y), one per attribute i) defined as follows:

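(7.14) $\sim_i(x, y) = \begin{cases} 1 & \text{if } |x_i - y_i| \le q_i \\ \frac{p_i - |x_i - y_i|}{p_i - q_i} & \text{if } q_i < |x_i - y_i| < p_i \\ 0 & \text{if } |x_i - y_i| \ge p_i \end{cases}$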

In the above formula, $q_i$ and $p_i$ are thresholds (possibly varying with the level $x_i$ or $y_i$) used to define a continuous transition from full similarity to dissimilarity, as shown in the example given in Figure 7.7. It should be noted however that the definition of the similarity indices $\sigma_i(x, y)$ is very demanding. It requires assessing two thresholds for each attribute level $x_i$. Moreover, the linear transition from similarity to non-similarity is not easy to justify, and a full justification of the shape of the similarity function $\sigma_i$ would require a lot of information about differences of type $x_i - y_i$. Usually, the construction of such similarity functions is only based on empirical evidence and common-sense principles.
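The two formulas (7.13) and (7.14) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample data and the threshold values are hypothetical, and the overall similarity is obtained here by taking the minimum of the per-attribute indices, which is one possible choice among others and not necessarily the one used in the cited works.

```python
from math import prod


def partial_similarity(xi, yi, q, p):
    """Similarity on one attribute, as in (7.14); requires 0 <= q < p."""
    gap = abs(xi - yi)
    if gap <= q:
        return 1.0
    if gap >= p:
        return 0.0
    return (p - gap) / (p - q)  # linear transition between the two thresholds


def similarity(x, y, q, p):
    """Overall similarity; aggregating by the minimum is an assumption."""
    return min(partial_similarity(a, b, qi, pi)
               for a, b, qi, pi in zip(x, y, q, p))


def membership(x, neighbours, q, p):
    """Membership of x in a class, as in (7.13).

    neighbours: list of (y, membership of y in the class) for the k neighbours.
    """
    return 1.0 - prod(1.0 - similarity(x, y, q, p) * mu for y, mu in neighbours)


# Hypothetical data: two attributes, thresholds q = 1 and p = 3 on both.
q, p = (1.0, 1.0), (3.0, 3.0)
neighbours = [((52.0, 10.0), 1.0), ((50.0, 14.0), 1.0)]
print(membership((50.0, 10.0), neighbours, q, p))
```

With these invented values the first neighbour is half similar to the input and the second is not similar at all, so only the first one contributes to the membership degree.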

[Figure 7.7: similarity index $\sigma_i(x, y)$ plotted against $y_i$, with values between 0 and 1 and ticks at $x_i - p_i$, $x_i - q_i$, $x_i + q_i$ and $x_i + p_i$]

Coming back to the example, the kNN algorithm can be used for periodically computing two coefficients $\mu_{\text{too hot}}(x)$ and $\mu_{\text{not hot enough}}(x)$. These coefficients evaluate the necessity for a regulation action by analysing the measure $x$ of the last biscuit. For instance, $\mu_{\text{too hot}}(x) = 1$ and $\mu_{\text{not hot enough}}(x) = 0$ means that decreasing the oven temperature is necessary. The decision process is improved if we use the fuzzy version of the kNN algorithm in the diagnosis stage. In this case, the values $\mu_{\text{too hot}}(x)$ and $\mu_{\text{not hot enough}}(x)$ may take any value within the unit interval, and these values can be interpreted as indicators of the amplitude of the regulation and help the system in choosing a soft regulation action. The main drawback of this automatic decision process is the absence of explicit decision rules explaining the regulation actions. This is not a real drawback in this context because the quality of biscuits is a sufficient argument for validation. However, in many other decision problems involving an automatic system, e.g. the automatic pre-filtering of loan files in a bank, the need for explanations is more crucial, first to validate the system a priori, and secondly to explain decisions a posteriori to the clients. The use of rules in the context of baking control is discussed in the next section.

7.4 A hybrid approach for automatic decision-making

In the case reported in (Perrot 1997) about the control of biscuits during baking, the diagnosis stage was not based solely on the kNN algorithm. Indeed, in this application, it was possible to elicit decision rules for the diagnosis stage. The quality of the biscuit is evaluated by the expert on the basis of 3 subjectively evaluated attributes: the moisture (m), the thickness (t) and the aspect of the biscuit (colour). The qualifiers used for labelling these attributes are:

moisture: dry, normal, humid
thickness: too thin, good, too thick
aspect: burned, overdone, done, underdone, not done

Then, the human expertise in the diagnosis stage is expressed using these labels by rules of type:

then the oven is too hot


then the oven is not hot enough

It remains to relate the parameters (m, t, L, a, b) to the labels used in the rules, in order to implement a hybrid approach based on the kNN algorithm, to get a fuzzy symbolic description of the biscuit, and on the fuzzy rule-based approach presented in section 7.2, to infer a regulation action. The numeric-symbolic translation is natural for moisture and thickness. The labels used for these two parameters are represented by the following fuzzy sets (see Figures 7.8 and 7.9).

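[Figure 7.8: fuzzy sets representing the moisture labels; horizontal axis $m$ (cg/g) with ticks at 3, 3.8, 4.7 and 5.8; membership between 0 and 1]

[Figure 7.9: fuzzy sets representing the thickness labels; horizontal axis $t$ (mm) with ticks at 28, 32, 35 and 38; membership between 0 and 1]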

The translation is more difficult for the labels used for the biscuit aspect, because the aspect is represented by a fuzzy subset of the 3-dimensional space characterised by the components (L, a, b). This problem has been solved by the fuzzy kNN algorithm. It is indeed sufficient to ask an expert in baking control to qualify, with a label $y_i$, each element $i$ of a representative sample of biscuits, using only the 5 labels introduced to describe aspect. At the same time, the sensors assess the vector $x_i = (L_i, a_i, b_i)$ describing the biscuit $i$ in the physical space. Then the fuzzy kNN algorithm is applied with reference points $(x_i, y_i)$ for all biscuits $i$ in the sample. For any input $x = (L, a, b)$ it gives the membership values $\mu_{y_j}(x)$ for any label $y_j$, $j \in \{1, \ldots, 5\}$, used to describe the biscuit's aspect. The fuzzy nearest neighbour algorithm thus provides a representation of the labels $y_j$, $j = 1, \ldots, 5$, by fuzzy subsets of the (L, a, b) space. This makes it possible to resort to the fuzzy control approach presented in section 7.2.

Combining the fuzzy kNN algorithm with the fuzzy rule-based system provides a soft automatic decision system whose action can be explained by the expert's rules. This control system can be integrated within a continuous regulation loop, alternating action and feedback steps, as illustrated in Figure 7.10.

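[Figure 7.10: regulation loop. The measures $x = (m, t, L, a, b)$ taken on the biscuits feed the Diagnosis Module, whose outputs $\mu_{\text{too hot}}(x)$ and $\mu_{\text{not hot enough}}(x)$ are passed to the Decision Module, which acts on the settings.]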

7.5 Conclusion

We have presented simple examples illustrating some basic techniques used to simulate human diagnosis, reasoning and decision-making, in the context of repeated decision problems that lend themselves to automation. We have shown the importance of constructing suitable mathematical representations of knowledge and decision rules. The task is difficult because human diagnosis is mainly based on human perception whereas sensors naturally give numerical measures, and because human reasoning is mainly based on words and propositions drawn from natural language, whereas computers are basically suited to perform numerical computations. As shown in this chapter, some simple and intuitive formal models have been proposed that establish a formal correspondence between symbolic and numeric information. They are based on the definition of fuzzy sets linking labels to observable numerical measures through membership functions. However, a proper use of these fuzzy sets requires a very careful analysis. Indeed, we have shown that many apparently natural choices in the modelling process may hide strong assumptions that can turn out to be false in practice. For instance, small numerical examples given in the chapter show that, in the context of rule-based control systems, the output of the system depends heavily on the choice of numbers used to represent symbolic knowledge. In particular, one must be aware that multiplying arbitrary choices in the construction of membership functions can make the output of the system completely meaningless.


of weighting propositions and aggregating numerical information. This shows the

great importance of mastering the variety of aggregation operations, their proper-

ties and the constraints to be satisfied in order to preserve the meaningfulness of

conclusions. It must be clear that, when these constraints are not thoroughly respected, the outputs of any automatic decision system are more the consequences of arbitrary choices in the modelling process than those of a sound deduction justified by the observations and the decision rules. Designing an automatic decision process in which the arbitrary choice of numbers used to represent knowledge is more decisive than the knowledge itself is certainly the main pitfall of the modelling exercise.

Since one cannot reasonably expect to avoid all arbitrary choices in the modelling process, both theoretical and empirical validations of the decision system are necessary. The theoretical validation consists in investigating the mathematical properties of the transfer function that forms the core of the decision module. This is the opportunity to examine the continuity and the derivatives of the function, but also to check whether the computation of the outputs is meaningful with respect to the nature of the information given to the system as input. The empirical or practical validation consists in testing the decisional behaviour of the system in various typical states of the system. It takes the form of trial-and-error sequences enabling a progressive tuning of the fuzzy rule-based model to better approximate the expected decisional behaviour. This can be used to determine suitable membership functions characterising the rules. This can even be used to learn the rules themselves. Indeed, when a sufficiently rich basis of examples is available, the rules and the membership values can be learned automatically (see e.g. Bouchon-Meunier and Marsala 1999, or Nauck and Kruse 1999 for neuro-fuzzy methods in fuzzy rule generation). The neuro-fuzzy approach is very interesting for designing an automatic decision system, because it takes advantage of the efficiency of neural networks while preserving the easy-to-interpret nature of a rule-based system. Notice however that, since learning requires examples showing the system what the right decisions are in a great number of situations, the learning-oriented approach is only possible when the decision task is completely understood and mastered by a human. This is usually the case when the automation of a decision task is expected, but one should be aware that this approach is not easily transposable to more complex decision situations where preferences as well as decision rules are still to be constructed.

8

DEALING WITH UNCERTAINTY:

AN EXAMPLE IN ELECTRICITY

PRODUCTION PLANNING

8.1 Introduction

In this chapter, we describe an application that was the theme of a research col-

laboration between an academic institution and a large company in charge of the

production and distribution of electricity. We do not give an exhaustive descrip-

tion of the work that was done and of the decision-aiding tool that was developed.

A detailed presentation of the first discussions, of the progressive formulation of

the problem, of the assumptions chosen, of the hesitations and backtrackings,

of the difficulties encountered, of the methodology adopted and of the resulting

software would require nearly a whole book. Our purpose is to point out some

characteristics of the problem, especially concerning the modelling of uncertainties. The description was thus deliberately simplified and some aspects, of minor interest in the framework of this book, were neglected. The main purpose of this presentation is to show how difficult it is to build (or to improvise) a pragmatic decision model that is consistent and sound. It illustrates the interest and the importance of having well-studied formal models at our disposal when we are confronted with a decision problem. Sections 8.2 and 8.3 present the context of the application and the model that was established. Section 8.4 is based on a didactic example: it first illustrates and comments on some traditional approaches that could have been used in the application; then it gives a detailed description of the approach that was applied in the concrete case. Section 8.5 provides some general comments on the advantages and drawbacks of this approach.

The company must periodically make some choices for the construction or closure


of coal, gas and nuclear power stations, in order to ensure the production of elec-

tricity and satisfy demand. Due to the diversity of points of view to be taken into

account, the managers of the production department wanted to develop a multiple

criteria approach for evaluating and comparing potential actions. They considered

that aggregating financial, technical and environmental points of view into a type

of generalised cost (see Chapter 5) was neither possible nor very serious. A collaboration was established between the company and an academic department (we will call it the analyst), which rapidly discovered that, besides the multiple criteria aspect, an enormous set of potential actions, a significant temporal dimension and a very high level of uncertainty on the data needed to be managed. The next section points out these aspects through the description of the model as it was formulated in collaboration with the company's engineers.

8.3 The model

8.3.1 The set of actions

In this chapter, we call decision a choice made at a specific point in time: it

consists in choosing the number of production units of the different types of fuel

(Nuclear, Coal, Gas) to be planned and in specifying whether the downgrade plan

(previously defined by another department of the company) has to be followed,

or partially anticipated (A) or delayed (D). In terms of electricity production and

delay, each unit and modification of the downgrade plan has different specificities

(see Table 8.1).

Type    Power (MW)    Construction delay
N           900               9
C           400               6
G           350               3
A          −300               0
D          +300               0

Table 8.1: Power and construction delay for the different types of production unit

For simplicity, the decisions are only taken at chosen milestones, separated by a time period of about 3 years (this period between two decisions is called a block). At most one unit of each type per year may be ordered, and the choice concerning the downgrade plan (follow, anticipate or delay) is of course exclusive. A decision for a block of 3 years could thus be, for example,

{1N, 1C, 2G, A}

meaning that one nuclear, one coal and two gas production units are planned and that the downgrade plan has to be anticipated.


Each decision is irrevocable and naturally has consequences for the future, not

only on the production of electricity, as seen in Table 8.1, but also in terms of

investment, exploitation cost, safety, environmental effects, ... (see Section 8.3.2).

An action is a succession of decisions over the whole time period concerned by

the simulation (the horizon), i.e. a period of about 20-25 years or 7 blocks. An

action is thus for example

{1N, 1C, 2G, A}, {1C}, {2G}, {}, {3G}, {1G, 1C}, {1N, 2G} .

The number of possible actions is of course enormous. Even after adding some simple rules (at most one nuclear unit is allowed, exclusively on the first and last block; anticipation and delay are only allowed on the first and second blocks; an anticipation followed by a delay, or the inverse, is forbidden), the number of actions is still around $10^8$. Many of these actions are completely unrealistic, for example no new unit for 20 years, or 3G and 3C in every block: they can be eliminated by fixing reasonable limits on the power production of the park. In this problem, the decision-maker only kept the actions such that, for each block, the surplus is less than 1 000 MW and the deficit is less than 200 MW. These limitations led to a set of approximately 100 000 potential actions. The temporal dimension of the problem naturally leads to a tree structure for these actions, built on decision nodes (represented by squares in Figure 8.1). Depending on the block considered, there are typically between 3 and 30 branches leaving each decision node.

The list of criteria was defined by the industrial partner in order to avoid unbear-

able difficulties in data collection and to work on a sufficiently realistic situation.

Remember that the purpose of the study was to build a decision-aiding method-

ology and was not to make a decision. It was important to test the methodology

with a realistic set of criteria but it was also clear that the methodology should be

independent of the criteria chosen. In the application described here, the following

eight criteria were taken into account, for the time period of the simulation:

marginal cost, i.e. the amount of total cost for a variation of 1 GWh, in BEF,

to minimise;


A: {}, {2G}, {3G}, {2G}, {3G}, {}, {}
B: {1N, 2G, 2C}, {2C, 1G}, {3C}, {2C}, {1N}, {}, {}

Criterion                A          B          Unit
Fuel cost                33 500     31 000     MBEF
Exploitation cost        45 000     49 000     MBEF
Investment cost          360 000    770 000    MBEF
Marginal cost            730        620        KBEF/GWh
Deficient power          16.7       10.3       TWh
CO2 emissions            22 000     16 000     Ktons
SO2 + NOx emissions      70         48         Ktons
Sales balance            23 000     30 000     MBEF

Table 8.2: Evaluation of two particular actions

The evaluations of the actions on these criteria are of course not known with certainty, because they depend on many factors that are unknown or poorly known to the decision-maker. The uncertainties have an impact on the evaluations, which

can be direct (the prices of the raw materials influence their total costs) or indirect

(if the gas price increases more than the coal price, the coal power stations will be

more intensively exploited than the gas ones; this will have an impact on the fuel

costs and the environmental impacts of the production park). Table 8.2 presents

an example of evaluations for two particular actions in a scenario where the fuel

price is low and the demand for electricity is relatively weak. Other scenarios must

be envisaged in order to improve the realism and usefulness of the model.

Generally speaking, the determination of the value of a parameter at a given

moment can lead to the following situations:

- the value is not known: the value is relative to the past and was not measured, the value is relative to the present but is technically impossible or very expensive to obtain, the value is relative to the future for a parameter with a completely erratic evolution;
- the value can be approximated by an interval: the bounds result from the properties of the system considered, the interval is due to the imprecision of the measure or to the use of a forecasting method; sometimes, a probability, a possibility or a confidence index can be associated with each value of the interval;
- the value is not unique: several measures did not yield the same value, several scenarios are possible; again a probability, a possibility, a confidence index or the result of a voting process can be associated with each value;
- the value is unique but not reliable, with some information on the degree of reliability.

In the particular situation described here, the industrial partner was already using stochastic programming for the management of the production park. The partner wanted another methodology in order to take better account of the number of potential actions and the multiple criteria aspects. For the uncertainties, however, the partner was used to working with probabilities and the framework of the study did not allow anything else to be suggested. So, scenarios were defined and subjective probabilities were assigned to them by the company's experts. More precisely, two types of uncertainties were distinguished and respectively called aleas and major uncertainties: the difference between them is based on the more or less strong dependence between the past and the future. The industrial partner considered that nuclear availability in the future was completely independent of the knowledge of the past and called this type of uncertainty alea: this means that the level of nuclear availability was completely open for each period of three years (a breakdown at a given time does not imply that there will be no breakdown in the near future). The selling price of electricity was also considered as an alea in order to be able to capture the deregulation phenomena due to forthcoming new legislation.

The major uncertainties (for which some dependence can exist between the

values at different moments) were the fuel price (the market presents global ten-

dencies and a high price for the first two blocks reinforces the probability of having

a high price for the third one), the demand for electricity (same reasoning) and

the legislation concerning pollution (in this example, the law may change for the

third block, and the uncertain parameters after this block are thus strongly re-

lated: either the same as for the first blocks, or more severe, but in both cases,

constant over all blocks after block 2).

The major uncertainties allow for a learning process that must be taken into

account in the analysis: each decision, at a given time, may use the previous values

of the uncertain parameters and deduce information from them about the future.

This information may modify the choices of the decision-maker. Suppose for in-

stance that a variable x may be equal to 0 or 1 in the future. The corresponding

probabilities are assessed as follows:


$P(x = 0) > 0.5$ after past scenario A,

$P(x = 0) < 0.5$ after past scenario B,

where the past scenario is known at the time of decision. The decision-maker

has to choose between two decisions: a and b. If he prefers a when x = 0 and b

when x = 1, a reasonable decision will be to choose a after scenario A and b after

scenario B.
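The reasoning above can be illustrated with a small sketch. The probabilities 0.7 and 0.3 and the payoffs are hypothetical values chosen only to satisfy the stated conditions (a is preferred when x = 0, b when x = 1); the function name is likewise an assumption.

```python
def best_decision(p_x0, payoff):
    """Return the decision with the highest expected payoff.

    payoff[d] = (value if x = 0, value if x = 1); p_x0 is P(x = 0).
    """
    expected = {d: p_x0 * v0 + (1 - p_x0) * v1 for d, (v0, v1) in payoff.items()}
    return max(expected, key=expected.get)


# Hypothetical payoffs: a is better when x = 0, b is better when x = 1.
payoff = {"a": (1.0, 0.0), "b": (0.0, 1.0)}

# Hypothetical conditional probabilities P(x = 0 | past scenario).
p_x0_given_past = {"A": 0.7, "B": 0.3}

choice = {past: best_decision(p, payoff) for past, p in p_x0_given_past.items()}
print(choice)
```

The observed past changes the probability of x = 0 and therefore the decision that is retained, which is exactly the learning effect described for major uncertainties.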

The previous explanation is not valid for aleas, because their independence

does not allow for direct inference from the past.

Because of the statistical dependence and of the possible learning process in the major uncertainty case, a complete treatment and a tree structure for these scenarios (a scenario is a succession of observed uncertainties) are necessary. If there are 3 levels for the fuel price, 3 levels for the demand, 2 levels for the legislation, and if the horizon is divided into 7 blocks, there are, a priori, $(3 \times 3 \times 2)^7 \approx 6 \times 10^8$ possible scenarios. Fortunately, most of these scenarios are negligible because the probability of a very fluctuating scenario is very small: the major uncertainty scenarios are rather strongly correlated, and a sequence of levels for the fuel price such as HHLMHLH (H for high, M for medium and L for low) is much less probable than a sequence HHHMMMM. In practice, two sequences were retained for legislation (MMMMMMM and MMHHHHH), it was imposed that scenarios could only change after two blocks, and each modification was penalised so that very fluctuating scenarios were hardly possible. The analyst finally retained around 200 representative scenarios that were gathered in a tree structure of major uncertainty nodes (represented by circles in Figure 8.1).

Of course, the complete scenario at a decision node at time t is not known, but a probability is associated with each complete scenario, which makes it possible to compute the conditional probability of each complete scenario given the partial scenario already observed at time t.
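A minimal sketch of this conditioning step is given below. The four fuel-price scenarios and their probabilities are invented for the illustration, and scenarios are encoded as strings of levels (one letter per block), which is an assumption of this sketch rather than the representation used in the study.

```python
from fractions import Fraction


def conditional_probabilities(scenarios, observed):
    """Return P(complete scenario | observed partial scenario).

    scenarios: {complete scenario (string of levels): probability}
    observed:  string of the levels already observed
    """
    compatible = {s: p for s, p in scenarios.items() if s.startswith(observed)}
    total = sum(compatible.values())
    if total == 0:
        raise ValueError("the observed partial scenario has probability zero")
    return {s: p / total for s, p in compatible.items()}


# Hypothetical scenarios over 4 blocks with their prior probabilities.
scenarios = {
    "HHHH": Fraction(3, 10),
    "HHMM": Fraction(2, 10),
    "MMMM": Fraction(3, 10),
    "MMLL": Fraction(2, 10),
}

posterior = conditional_probabilities(scenarios, "HH")
print(posterior)
```

Only the scenarios compatible with the observed prefix keep a positive probability, and their probabilities are rescaled so that they sum to one.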

On the contrary, the aleas are by essence uncorrelated and there is no reason to neglect any scenario. If there are 3 levels for the selling price and 2 levels for the availability of nuclear units, then the number of scenarios is $(3 \times 2)^7 = 279\,936$. Fortunately, the tree structure of the aleas is obvious: each node gives rise to the same possibilities, with the same probability distribution. For these reasons, the aleas act much more simply than the major uncertainties, and it is possible to take the whole set of scenarios into account.
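The two scenario counts quoted above can be checked directly; the variable names are chosen for this sketch only.

```python
BLOCKS = 7

# Major uncertainties: fuel price (3 levels), demand (3 levels), legislation (2 levels).
major_per_block = 3 * 3 * 2
major_scenarios = major_per_block ** BLOCKS

# Aleas: selling price (3 levels), availability of nuclear units (2 levels).
alea_per_block = 3 * 2
alea_scenarios = alea_per_block ** BLOCKS

print(major_scenarios, alea_scenarios)
```

The first count is of the order of $6 \times 10^8$, which is why only about 200 representative scenarios were retained for the major uncertainties, whereas the regular structure of the aleas makes the second count manageable.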

Independently of the dependence between the past and the future in the modelling

of the uncertainties, the temporal dimension plays an important role in this kind

of problem.

First, the time period between the decision to build a certain type of power station and the beginning of the exploitation of that station is far from being negligible. Second, some consequences of the decisions appear after a very long time (the environmental consequences, for example). Third, the consequences themselves can be dispersed over rather long periods and vary within these periods. Fourth, the consequences of a decision can be different according to the moment that decision is taken. It is rather usual, in planning models, to introduce a discounting rate that decreases the weight of the evaluations for distant consequences (see Chapter 5) and the industrial partner did this here. However, for a long-term decision problem with important consequences for future generations, such an approach may not be the best one and the decision-maker could be more confident in the flexible approach and the richness of the scenarios. That is why the analyst kept the possibility to introduce discounting or not.

[Figure 8.1: tree alternating decisions and periods, with decision nodes drawn as squares and uncertainty nodes as circles]

The complete model can be described by a tree structure including decision nodes

(squares) and uncertainty nodes (circles), as illustrated in Figure 8.1. At t = 0

(square node at the beginning of block 1), a first decision is made (a branch is

chosen) without any information on the scenario, leading to a circle node. During

block 1, one may observe the actual values of the uncertain parameters (nuclear availability, electricity selling price, fuel price, electricity demand and environmental legislation), determining one branch leaving the considered circle node and

leading to one of the decision nodes at time t = 1. A new decision is then made,

taking the previous information into account, and so on until the last decision

(square) node and the last scenario (circle) node that determine the whole action

and the whole observed scenario. In the resulting tree (Figure 8.1), the decision

nodes (squares) correspond to active parts of the analysis where the decision-maker

has to establish his strategy, while the uncertainty nodes (circles) correspond to

passive parts of the analysis where the decision-maker undergoes the modifications

of the parameters.

8.4 A didactic example

Consider Figure 8.2 describing two successive time periods. At time t = 0, two

decisions A and B are eligible; during the first period, two events S and T are

possible, each with probability 1/2. At the beginning of the second period, two

decisions C and D are eligible if the first decision was A and three decisions E,

F, G are eligible if the first decision was B. During the second period, two events

U and V are possible after S (with respective probabilities 1/4 and 3/4) and two

events Y and Z are possible after T (with respective probabilities 3/4 and 1/4).

Figure 8.2 presents the tree and the evaluation of each action (set of decisions) for each complete scenario. Note that this didactic example contains only one evaluation for each action (problem with one criterion). We do not dwell on the multiple criteria aspect of the problem here (this was treated in Chapter 6) and focus on the treatment of uncertainty.

In the traditional approach, the nodes of the tree are considered from the leaves to the root (folding back) and the decisions are taken at each node in order to maximise their expected values, i.e. the mean of the corresponding probability distributions for the evaluations. Of course, this is only possible when the evaluations are elements of a numerical scale. At node N2 (beginning of the second period), the expected value of decision C is $\frac{1}{4} \times 7 + \frac{3}{4} \times 4.5 = 41/8$, while the expected value of decision D is $\frac{1}{4} \times 4.5 + \frac{3}{4} \times 5.5 = 42/8$. So, the best decision at node N2 is D and the expected value associated with N2 is 42/8. Making similar calculations for N3, N4 and N5, one obtains the tree represented in Figure 8.3.

At node N1, the expected values of decisions A and B are respectively 39/8 and 5, so the best decision is B.

In conclusion, the optimal action obtained by the traditional approach consists in applying decision B at the beginning of the first period and decision E or G at the beginning of the second period, depending on whether the event that occurred in the first period was S or T.
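The folding-back computation can be sketched as follows. The leaf values are those of Figure 8.2, except for three leaves (decision C after T under event Y, and decision E after S under both events) whose values are assumptions of this sketch, chosen so that the node values match those reported in the text (4.5 at N3 and 5 at N4). The data layout and function name are also hypothetical.

```python
from fractions import Fraction as F

P1 = {"S": F(1, 2), "T": F(1, 2)}                      # first-period events
P2 = {"S": {"U": F(1, 4), "V": F(3, 4)},               # second-period events
      "T": {"Y": F(3, 4), "Z": F(1, 4)}}

# LEAVES[first decision][first event][second decision][second event] = value
LEAVES = {
    "A": {"S": {"C": {"U": F(7), "V": F("4.5")},
                "D": {"U": F("4.5"), "V": F("5.5")}},
          "T": {"C": {"Y": F("4.5"), "Z": F("4.5")},   # value under Y is assumed
                "D": {"Y": F(1), "Z": F(5)}}},
    "B": {"S": {"E": {"U": F("3.5"), "V": F("5.5")},   # both values are assumed
                "F": {"U": F(3), "V": F(1)},
                "G": {"U": F(1), "V": F(1)}},
          "T": {"E": {"Y": F(6), "Z": F(1)},
                "F": {"Y": F(2), "Z": F(2)},
                "G": {"Y": F(5), "Z": F(5)}}},
}


def fold_back(leaves):
    """Fold the two-period tree back from the leaves to the root.

    Returns ({first decision: expected value},
             {(first decision, first event): (best second decision, value)}).
    """
    first_values, plan = {}, {}
    for first, by_event in leaves.items():
        total = F(0)
        for event, options in by_event.items():
            scores = {d: sum(P2[event][e] * v for e, v in outcomes.items())
                      for d, outcomes in options.items()}
            best = max(scores, key=scores.get)
            plan[(first, event)] = (best, scores[best])
            total += P1[event] * scores[best]
        first_values[first] = total
    return first_values, plan


first_values, plan = fold_back(LEAVES)
print(first_values)
print(plan[("B", "S")], plan[("B", "T")])
```

Exact fractions are used so that the results can be compared directly with the values 41/8, 42/8 and 39/8 quoted in the text.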

Just like the weighted sum (already discussed in the other chapters of this book), the expected value presents some characteristics that the user must be aware of. For

example, probabilities intervene as tradeoffs between the values for different events:

the difference of one unit in favour of C over D for event V, whose probability is 3/4,

would be completely compensated by a difference of three units in favour of D over

C for event U because its probability is 1/4. A consequence is that a big difference

in favour of a specific decision in some scenario could be sufficient to overcome a

systematic advantage for another decision in all the other scenarios, as illustrated

in the example presented in Figure 8.4. In this example, if the probabilities of S, T

and U are all equal to 1/3, the expected value will give preference to A, although

B is better than A in two scenarios out of three.
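The example of Figure 8.4 can be checked with a few lines. The evaluations below are those read from the figure (A: 10, 15, 20 and B: 15, 20, 9 under S, T, U respectively); the variable names are chosen for this sketch.

```python
from fractions import Fraction

outcomes = {
    "A": {"S": 10, "T": 15, "U": 20},
    "B": {"S": 15, "T": 20, "U": 9},
}
p = Fraction(1, 3)  # the three events are equally likely

expected = {d: sum(p * v for v in values.values()) for d, values in outcomes.items()}
scenarios_won_by_B = sum(outcomes["B"][e] > outcomes["A"][e] for e in "STU")

print(expected, scenarios_won_by_B)
```

The large advantage of A under event U outweighs the advantage of B under both S and T, which is the compensation effect described above.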

Remember the famous St. Petersburg game (see for example Sinn 1983), which shows that the expected value approach does not always represent the attitude of the decision-maker towards risk very well. The game consists of tossing a coin repeatedly until the first time it lands on heads; if this happens on the $k$th toss, the player wins $2^k$ €. The question is to find out how much a player would be ready to bet in such a game. Of course, the answer depends on the player but, in any case, the amount would not be very big. However, applying the expected value approach, we see that the expected gain is

$$\sum_{k=1}^{\infty} \frac{1}{2^k} \cdot 2^k = +\infty.$$
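The divergence of this sum can be made concrete by truncating the game. In the sketch below (the function name and the truncation rule are assumptions made for the illustration), the game stops after at most n tosses and pays nothing if no head has appeared by then.

```python
from fractions import Fraction


def truncated_expected_gain(n):
    """Expected gain when the game is stopped after at most n tosses."""
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, n + 1))


for n in (1, 10, 100):
    print(n, truncated_expected_gain(n))
```

Each possible toss contributes exactly one unit to the expected gain, so the truncated expectation equals n and grows without bound as n increases.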

[Figure 8.2: decision tree of the didactic example, with the evaluation of each action for each complete scenario. First-period events: S (1/2) and T (1/2). Second-period events: U (1/4) and V (3/4) after S, Y (3/4) and Z (1/4) after T.]

First decision   First event   Second decision   Second event   Leaf   Value
A                S             C                 U              N6     7
A                S             C                 V              N7     4.5
A                S             D                 U              N8     4.5
A                S             D                 V              N9     5.5
A                T             C                 Z              N11    4.5
A                T             D                 Y              N12    1
A                T             D                 Z              N13    5
B                S             F                 U              N16    3
B                S             F                 V              N17    1
B                S             G                 U              N18    1
B                S             G                 V              N19    1
B                T             E                 Y              N20    6
B                T             E                 Z              N21    1
B                T             F                 Y              N22    2
B                T             F                 Z              N23    2
B                T             G                 Y              N24    5
B                T             G                 Z              N25    5

The expected utility model, which is the subject of the next section, makes it possible to resolve this paradox and, more generally, to take different possible attitudes towards risk into account.

As the preferences of the decision-maker are not necessarily linearly linked to the

evaluations of the actions, it may be useful to replace these evaluations by the

psychological values they have for the decision-maker through so-called utility

functions (Fishburn 1970).

Denoting by $u(x_i)$ the utility of the evaluation $x_i$, the expected utility value of a decision leading to the evaluation $x_i$ with probability $p_i$ ($i = 1, 2, \ldots, n$) is given by

$$\sum_{i} p_i \, u(x_i).$$

This model dates back at least to Bernoulli (1954) but the basic axioms, in terms of preferences, were only studied in the present century (see for instance von Neumann and Morgenstern 1944).

In the case of the St. Petersburg game, if we denote by $u(x)$ the utility of winning $x$ €, the expected utility of refusing the game is $u(0)$, while the expected utility of betting an amount of $s$ € in the game is

$$\sum_{k=1}^{\infty} \frac{1}{2^k} \, u(2^k - s).$$

As an exercise, the reader can verify that for a utility function defined by

$$u(x) = \begin{cases} x / 2^{20} & \text{iff } x \le 2^{20}, \\ 1 & \text{iff } x > 2^{20}, \end{cases}$$

the expected utility of betting $s$ € in the game is positive (hence superior to the expected utility of refusing the game) as long as $s$ is less than $21 / (1 - 1/2^{20})$ €, and is negative for larger values. The expected utility can also be finite with an unbounded utility function such as, for example, the logarithmic function.
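The exercise can be checked numerically with exact arithmetic. The sketch below assumes the utility function $u(x) = x / 2^{20}$ for $x \le 2^{20}$ and $u(x) = 1$ above that level, and bets $s$ smaller than $2^{20}$; the function names and the way the infinite tail is summed in closed form are choices made for this illustration.

```python
from fractions import Fraction

CAP = 2**20


def u(x):
    """Assumed utility: proportional to the gain up to 2**20, constant above."""
    return Fraction(x) / CAP if x <= CAP else Fraction(1)


def expected_utility_of_bet(s, n_terms=64):
    """Exact expected utility of betting s, for 0 <= s < 2**20 and n_terms >= 21."""
    s = Fraction(s)
    head = sum(Fraction(1, 2**k) * u(2**k - s) for k in range(1, n_terms + 1))
    tail = Fraction(1, 2**n_terms)  # sum of 1/2**k for k > n_terms, where u = 1
    return head + tail


threshold = Fraction(21) / (1 - Fraction(1, 2**20))
print(float(threshold))
print(expected_utility_of_bet(21) > 0, expected_utility_of_bet(22) < 0)
```

Under these assumptions the expected utility is a decreasing linear function of the bet, which vanishes exactly at the threshold computed above.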

In the example in Figure 8.2 and with a utility function defined by

u(1) = u(2) = 1,
u(3) = u(3.5) = 2,
u(4.5) = u(5) = u(5.5) = 3,
u(6) = u(7) = 4,

the folding-back procedure yields the tree of Figure 8.5. The optimal action is then to apply decision A at the beginning of the first period and decision C at the beginning of the second period, contrary to what was obtained with the expected value approach.
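The same folding-back procedure can be run on utilities instead of raw evaluations. As before, this is a sketch under assumptions: the leaf values flagged in the comments are not taken from the figure but chosen to be consistent with the node values reported in the text, and the compact data layout is hypothetical.

```python
from fractions import Fraction as F

# Utility function given in the text.
U = {1: 1, 2: 1, 3: 2, 3.5: 2, 4.5: 3, 5: 3, 5.5: 3, 6: 4, 7: 4}

PROB1 = {"S": F(1, 2), "T": F(1, 2)}
# Probabilities of the second-period events, ordered (U, V) after S and (Y, Z) after T.
PROB2 = {"S": (F(1, 4), F(3, 4)), "T": (F(3, 4), F(1, 4))}

# TREE[first decision][first event][second decision] = evaluations, in the same order.
TREE = {
    "A": {"S": {"C": (7, 4.5), "D": (4.5, 5.5)},
          "T": {"C": (4.5, 4.5), "D": (1, 5)}},                # first value of C is assumed
    "B": {"S": {"E": (3.5, 5.5), "F": (3, 1), "G": (1, 1)},    # values of E are assumed
          "T": {"E": (6, 1), "F": (2, 2), "G": (5, 5)}},
}


def expected_utility(first):
    """Return (expected utility of a first decision, best second decision per event)."""
    total, choices = F(0), {}
    for event, options in TREE[first].items():
        scores = {d: sum(p * U[v] for p, v in zip(PROB2[event], values))
                  for d, values in options.items()}
        best = max(scores, key=scores.get)
        choices[event] = best
        total += PROB1[event] * scores[best]
    return total, choices


for first in TREE:
    print(first, expected_utility(first))
```

With this utility function the ranking of the first-period decisions is reversed with respect to the expected value approach, and decision C is retained after both S and T.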

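[Figure 8.3: tree obtained after folding back with expected values. N2: decision D, value 5.25; N3: decision C, value 4.5; N4: decision E, value 5; N5: decision G, value 5.]

[Figure 8.4: two decisions evaluated under three events. A yields 10 under S, 15 under T and 20 under U; B yields 15 under S, 20 under T and 9 under U.]

[Figure 8.5: tree obtained after folding back with expected utilities. N2: decision C, value 13/4; N3: decision C; N4: decision E, value 11/4; N5: decision E.]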

Much literature is devoted to this approach, the probabilities being objective or subjective: see for example Savage (1954), Luce and Raiffa (1957), Ellsberg (1961), Fishburn (1970), Fishburn (1982), Allais and Hagen (1979), McCord and de Neufville (1983), Loomes (1988), Bell et al. (1988), Barbera, Hammond and Seidl (1998).

We simply recall one or two characteristics here that every user should be aware of. As in every model, the expected utility approach implicitly assumes that the preferences of the decision-maker satisfy some properties that can be violated in practice. The following example illustrates the well-known Allais paradox (see Allais 1953). It is not unusual to prefer a guaranteed gain of 500 000 € to an alternative providing 500 000 € with probability 0.89, 2 500 000 € with probability 0.1 and 0 € with probability 0.01. Applying the expected utility model leads to the following inequality

$$u(500\,000) > 0.89\,u(500\,000) + 0.1\,u(2\,500\,000) + 0.01\,u(0),$$

hence, grouping terms,

$$0.11\,u(500\,000) > 0.1\,u(2\,500\,000) + 0.01\,u(0).$$

At the same time, it is reasonable to prefer an alternative providing 2 500 000 € with probability 0.1 and 0 € with probability 0.9 to an alternative providing 500 000 € with probability 0.11 and 0 € with probability 0.89. In this case, the expected utility model yields

$$0.1\,u(2\,500\,000) + 0.9\,u(0) > 0.11\,u(500\,000) + 0.89\,u(0),$$

that is, $0.1\,u(2\,500\,000) + 0.01\,u(0) > 0.11\,u(500\,000)$, which is in contradiction with the inequality obtained above. So, the expected utility model cannot explain the two previous preference situations simultaneously.

       W      R      G
A     100     0      0
B      0     100     0

       W      R      G
C     100     0     100
D      0     100    100

Table 8.3

A possible attitude in this case is to consider that the decision-maker should

revise his judgment in order to be more rational, that is, in order to satisfy the

axioms of the model. Another interpretation is that the expected utility approach

sometimes implies unreasonable constraints on the preferences of the decision-

maker (in the previous example, the violated property is the so-called independence

axiom of Von Neumann and Morgenstern). This last interpretation led scientists

to propose many variants of the expected utility model, as in Kahneman and

Tversky (1979), Machina (1982, 1987), Bell et al. (1988), Barbera et al. (1998).
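The incompatibility of the two Allais preferences with expected utility can be illustrated numerically. The sketch below tests a few candidate utility functions, all of them assumptions chosen for illustration, and reports for each whether the expected utility model reproduces the first and the second stated preference. The names `allais_preferences` and `candidates` are hypothetical.

```python
import math


def allais_preferences(u):
    """Return (first, second): does expected utility reproduce each stated preference?"""
    u0, u5, u25 = u(0), u(500_000), u(2_500_000)
    # Sure 500 000 preferred to the three-outcome lottery.
    first = u5 > 0.89 * u5 + 0.10 * u25 + 0.01 * u0
    # Lottery (2 500 000 with 0.1) preferred to lottery (500 000 with 0.11).
    second = 0.10 * u25 + 0.90 * u0 > 0.11 * u5 + 0.89 * u0
    return first, second


# Illustrative utility functions (assumptions, not taken from the text).
candidates = {
    "linear": lambda x: x,
    "square root": math.sqrt,
    "logarithmic": lambda x: math.log(1 + x),
    "strongly risk-averse": lambda x: 1 - math.exp(-x / 100_000),
}

results = {name: allais_preferences(u) for name, u in candidates.items()}
for name, outcome in results.items():
    print(name, outcome)
```

Each candidate reproduces at most one of the two preferences, as the two inequalities require. A finite list of functions is of course only an illustration; the general argument is the algebraic one given in the text.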

Before explaining why the expected utility model (or one of its variants) was

not applied by the analyst in the electricity production planning problem, let us

mention why using probabilities may cause some trouble in modelling uncertainties

or risk. The following example illustrates the so-called Ellsberg paradox and is

extracted from Fishburn (1970, p.172). An urn contains one white ball (W) and

two other balls. You only know that the two other balls are either both red (R),

or both green (G), or one is red and one is green. Consider the two situations in

Table 8.3 where W, R, and G represent the three states according to whether one

ball drawn at random is white, red or green. The figures are what you will be paid

(in Euros) after you make your choice and a ball is drawn.

Intuition leads many people to prefer A to B and D to C, while the expected utility approach leads to indifference between A and B as well as between C and D.
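A short sketch can make the Ellsberg argument concrete. It assumes that the white ball has probability 1/3 and lets the probability of red vary between 0 and 2/3. Expected payoffs stand in for expected utilities, which is harmless here because only two payoff levels (0 and 100) occur. The names `payoffs`, `expected_values` and `matches_intuition` are hypothetical.

```python
from fractions import Fraction

# Payoffs of each option in the states (W, R, G), as in Table 8.3.
payoffs = {
    "A": (100, 0, 0),
    "B": (0, 100, 0),
    "C": (100, 0, 100),
    "D": (0, 100, 100),
}


def expected_values(p_red):
    """Expected payoff of each option when P(W) = 1/3 and P(R) = p_red."""
    probs = (Fraction(1, 3), p_red, Fraction(2, 3) - p_red)
    return {name: sum(p * x for p, x in zip(probs, xs)) for name, xs in payoffs.items()}


def matches_intuition(p_red):
    """True if A is strictly better than B and D strictly better than C."""
    ev = expected_values(p_red)
    return ev["A"] > ev["B"] and ev["D"] > ev["C"]


uniform = expected_values(Fraction(1, 3))
grid = [Fraction(k, 30) for k in range(21)]  # p_red from 0 to 2/3
any_match = any(matches_intuition(p) for p in grid)
print(uniform, any_match)
```

With equal probabilities the model is indifferent in both pairs, and no probability of red on the grid makes both intuitive preferences hold at once: the first requires P(R) < 1/3 and the second P(R) > 1/3.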

This type of situation shows that the use of the probability concept may be

debatable for representing attitude towards risk or uncertainty; other tools (pos-

sibility theory, belief functions or fuzzy integrals) can also be envisaged.

Events   Probab.   C      D
U        1/4       7      4.5
V        3/4       4.5    5.5

Table 8.4

We will now present the approach that was applied in the electricity production planning problem. This approach is certainly not ideal (some drawbacks will be pointed out in the presentation). However, it does not aggregate the multiple criteria consequences of the decisions into a single dimension, thus avoiding some of the pitfalls mentioned in Chapter 6 on multi-attribute value functions. Moreover, it does not introduce a discounting rate for the dynamic aspect (see Chapter 5) and it allows the particular preferences of the decision-maker to be modelled along each evaluation scale.

In the electricity production planning problem described in Section 8.3, the analyst did not know whether the probabilities given by the company were really probabilities (and not plausibility coefficients) and it was not certain that the consequences of one scenario were really comparable to the consequences of another. On the one hand, transforming all the consequences into money and aggregating them with a discounting rate (as in Chapter 5) was definitely excluded. On the other hand, the company was not prepared to devote much time to the clarification of the probabilities and to long discussions about the multiple criteria and dynamic aspects of the problem, so that it was impossible to envisage an enriched variant of the expected utility model. The analyst decided to propose a paired comparison of the actions, scenario by scenario, as illustrated below for the didactic example presented in Figure 8.2.

At node N2, we have to consider Table 8.4.

The comparison between C and D was made on the basis of the differences in preference between them for each of the considered events, similarly to what is done in the Promethee method (Brans and Vincke 1985). Let us consider a preference function defined by

$$f(x) = \begin{cases} 1 & \text{if } x > 1, \\ 0 & \text{elsewhere,} \end{cases}$$

where x is the difference in the evaluations of two decisions. Other functions can be defined, as in the Promethee method. This function expresses the fact that a difference which is smaller than or equal to 1 is considered to be non-significant. As we see, an advantage of this approach is that it enables the introduction of indifference thresholds.

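The analyst proposed the following index to measure the preference of C over D, on the basis of the data contained in Table 8.4:

$$\tfrac{1}{4} f(7 - 4.5) + \tfrac{3}{4} f(4.5 - 5.5) = \tfrac{1}{4}.$$

Similarly, the preference index of D over C is

$$\tfrac{1}{4} f(4.5 - 7) + \tfrac{3}{4} f(5.5 - 4.5) = 0.$$

      C      D
C     0      1/4
D     0      0

Table 8.5

Events   Probab.   C      D
Y        3/4       4.5    1
Z        1/4       4.5    5

Table 8.6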

These preference indices are summarised in Table 8.5. The score of each decision is then the sum of the preferences of this decision over the others minus the sum of the preferences of the others over it. In the case of Table 8.5, this trivially gives 1/4 and -1/4 as respective scores for C and D. The maximum score determines the chosen decision. So, the chosen decision at node N2 is C. Note that, despite the analyst's doubts about the real nature of the probabilities, he used them to calculate a sort of expected index of preference for each decision over each other decision. This is certainly a weak point of the method and other tools, which will be described in a volume in preparation, could have been used here. Note also that, in the multiple criteria case, a (possibly weighted) sum is computed over all the criteria in order to obtain the global score of a decision.
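The index and the scores can be sketched in a few lines. The code below is an illustration under the stated definitions (threshold function f, probability-weighted index, score equal to outgoing minus incoming preferences); the function names are hypothetical. It is applied to the data of Table 8.4 and, to show a case with three decisions, to the data of Table 8.8.

```python
from fractions import Fraction


def f(x):
    """Preference function of the text: a difference counts only if it exceeds 1."""
    return 1 if x > 1 else 0


def preference_index(probs, a, b):
    """Probability-weighted preference of evaluations a over evaluations b."""
    return sum(p * f(x - y) for p, x, y in zip(probs, a, b))


def net_scores(probs, evaluations):
    """Score of each decision: preferences over the others minus preferences of the others over it."""
    scores = {}
    for d, xs in evaluations.items():
        others = [o for o in evaluations if o != d]
        outgoing = sum(preference_index(probs, xs, evaluations[o]) for o in others)
        incoming = sum(preference_index(probs, evaluations[o], xs) for o in others)
        scores[d] = outgoing - incoming
    return scores


# Table 8.4 (node N2): events U, V.
probs_n2 = [Fraction(1, 4), Fraction(3, 4)]
table_8_4 = {"C": [Fraction(7), Fraction("4.5")], "D": [Fraction("4.5"), Fraction("5.5")]}
scores_n2 = net_scores(probs_n2, table_8_4)

# Table 8.8 (node N5): events Y, Z.
probs_n5 = [Fraction(3, 4), Fraction(1, 4)]
table_8_8 = {"E": [6, 1], "F": [2, 2], "G": [5, 5]}
scores_n5 = net_scores(probs_n5, table_8_8)

print(scores_n2, scores_n5)
```

Using `Fraction` keeps differences such as 5.5 - 4.5 exactly equal to 1, so the strict threshold behaves as in the text.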

At node N3, we have to consider Table 8.6, leading to the preference indices presented in Table 8.7. For example, the preference index of C over D is

$$\tfrac{3}{4} f(4.5 - 1) + \tfrac{1}{4} f(4.5 - 5) = \tfrac{3}{4}.$$

      C      D
C     0      3/4
D     0      0

Table 8.7

The scores of C and D are respectively 3/4 and -3/4, so that the chosen decision at node N3 is also C.

At node N4, decision E dominates F and G and is thus chosen (where "dominates" means "is better in each scenario").

At node N5, we must consider Table 8.8.

Events   Probab.   E      F      G
Y        3/4       6      2      5
Z        1/4       1      2      5

Table 8.8

The preference index of G over E (for example) is

$$\tfrac{3}{4} f(5 - 6) + \tfrac{1}{4} f(5 - 1) = \tfrac{1}{4}.$$

      E      F      G
E     0      3/4    0
F     0      0      0
G     1/4    1      0

Table 8.9

The other preference indices are presented in Table 8.9; they yield 1/2, -7/4 and 5/4 as respective scores for E, F and G, so that G is the chosen decision at node N5.

We can now consider Table 8.10, associated with N1. The values in this table are those that correspond to the chosen decisions at nodes N2 to N5 (they are indicated in parentheses).

Scenarios   Probab.   A          B
S-U         1/8       7 (C)      3.5 (E)
S-V         3/8       4.5 (C)    5.5 (E)
T-Y         3/8       4.5 (C)    5 (G)
T-Z         1/8       4.5 (C)    5 (G)

Table 8.10

On the basis of this table, the preference of A over B is

$$\tfrac{1}{8} f(7 - 3.5) + \tfrac{3}{8} f(4.5 - 5.5) + \tfrac{3}{8} f(4.5 - 5) + \tfrac{1}{8} f(4.5 - 5) = \tfrac{1}{8},$$

while the preference of B over A is 0. In conclusion, the optimal action obtained through this first step consists in choosing A at the beginning of the first period and C at the beginning of the second period.

This approach makes it possible to take into account the comparisons of the decisions separately for each scenario. Let us illustrate this point for the example of Figure 8.4, where 9 has been replaced by 10 in the evaluation of B for event U. If the probabilities of S, T and U are equal to 1/3, the expected utility approach gives the same value

$$\tfrac{1}{3}\,[u(10) + u(15) + u(20)]$$

to A and B, which are thus considered as indifferent. However, if we compare A and B separately for each event, we see that B is better than A for events S and T, with a probability equal to 2/3. The approach described in this section will give a preference index of A over B equal to

$$\tfrac{1}{3} f(10 - 15) + \tfrac{1}{3} f(15 - 20) + \tfrac{1}{3} f(20 - 10)$$

and a preference index of B over A equal to

$$\tfrac{1}{3} f(15 - 10) + \tfrac{1}{3} f(20 - 15) + \tfrac{1}{3} f(10 - 20).$$

With the same function f as before, this will lead to the choice of B. Making the (natural) assumption that f(x) = 0 when x is negative, we see that this approach will lead to indifference between A and B only with a function f such that f(20 - 10) = f(15 - 10) + f(20 - 15).
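The role of the preference function in this example can be illustrated as follows. The sketch compares the threshold function used so far with a linear function f(x) = max(x, 0), which is an assumption introduced here only because it satisfies f(10) = f(5) + f(5). The evaluations are those of Figure 8.4 with 9 replaced by 10; the names `index`, `threshold` and `linear` are hypothetical.

```python
from fractions import Fraction

probs = [Fraction(1, 3)] * 3  # events S, T, U
A = [10, 15, 20]
B = [15, 20, 10]              # 9 replaced by 10 for event U


def index(f, a, b):
    """Probability-weighted preference of a over b for a given preference function f."""
    return sum(p * f(x - y) for p, x, y in zip(probs, a, b))


def threshold(x):
    """Preference function of the text."""
    return 1 if x > 1 else 0


def linear(x):
    """Illustrative alternative: preference proportional to the positive difference."""
    return max(x, 0)


for f in (threshold, linear):
    print(f.__name__, index(f, A, B), index(f, B, A))
```

With the threshold function B is preferred (2/3 against 1/3), whereas the linear function gives equal indices and hence indifference.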

As this approach is based on successive pairwise comparisons, it also presents some pitfalls which must be mentioned. The example presented in Figure 8.6 illustrates a first drawback. In this example, three periods of time are considered, but there are no uncertainties during the first two periods. Two decisions A and B are possible at the beginning of the first period. At the beginning of the second period, two decisions C and D are possible after A and only one decision is possible after B. At the beginning of the third period, two decisions E and F are possible after C while only one decision is possible in each of the other cases. During the last period, three events S, T and U can occur, each with a probability of 1/3.

[Figure 8.6: decision tree with three periods and decision nodes N1 to N6. Evaluations under events S, T and U: 10, 15 and 20 after (A, C, E); 15, 20 and 0 after (A, C, F); 20, 0 and 5 after (A, D); 0, 5 and 10 after B.]

Let us apply the approach described in Section 4.5 with the same function f. At node N4, the preference index of E over F will be

$$\tfrac{1}{3} f(10 - 15) + \tfrac{1}{3} f(15 - 20) + \tfrac{1}{3} f(20 - 0) = \tfrac{1}{3},$$

while that of F over E is 2/3, so that F is chosen at node N4.

At node N2, we must consider Table 8.11, where the values of C are those of F (decision chosen at node N4).

Events   Probab.   C      D
S        1/3       15     20
T        1/3       20     0
U        1/3       0      5

Table 8.11

On the basis of Table 8.11, we compute the preference index of C over D by

$$\tfrac{1}{3} f(15 - 20) + \tfrac{1}{3} f(20 - 0) + \tfrac{1}{3} f(0 - 5) = \tfrac{1}{3},$$

while that of D over C is 2/3, so that D is chosen at node N2.

At node N1, we must consider Table 8.12, where the values of A are those of D (decision chosen at node N2).

Events   Probab.   A      B
S        1/3       20     0
T        1/3       0      5
U        1/3       5      10

Table 8.12

On the basis of Table 8.12, the preference index of A over B is given by

$$\tfrac{1}{3} f(20 - 0) + \tfrac{1}{3} f(0 - 5) + \tfrac{1}{3} f(5 - 10) = \tfrac{1}{3},$$

while that of B over A is 2/3. In conclusion, the methodology leads to the choice of action B despite the fact that it is dominated by the action (A,C,E), as is shown in Table 8.13.

Events   Probab.   B      (A,C,E)
S        1/3       0      10
T        1/3       5      15
U        1/3       10     20

Table 8.13
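The first step on the tree of Figure 8.6 can be replayed programmatically. The following sketch is an illustration with hypothetical names (`leaves`, `choose`); it folds the tree back from N4 to N1 using only local pairwise comparisons and then checks the dominance shown in Table 8.13.

```python
from fractions import Fraction

P = [Fraction(1, 3)] * 3  # events S, T, U

# Evaluations at the end of each branch, under events S, T, U.
leaves = {"E": [10, 15, 20], "F": [15, 20, 0], "D": [20, 0, 5], "B": [0, 5, 10]}


def f(x):
    return 1 if x > 1 else 0


def index(a, b):
    return sum(p * f(x - y) for p, x, y in zip(P, a, b))


def choose(options):
    """First step: keep the local option with the highest net score."""
    def score(name):
        return sum(index(options[name], options[o]) - index(options[o], options[name])
                   for o in options if o != name)
    best = max(options, key=score)
    return best, options[best]


n4 = choose({"E": leaves["E"], "F": leaves["F"]})   # third period, after C
n2 = choose({"C": n4[1], "D": leaves["D"]})         # second period, after A
n1 = choose({"A": n2[1], "B": leaves["B"]})         # first period

# Dominance check of Table 8.13: (A, C, E) against B in every scenario.
b_is_dominated = all(x > y for x, y in zip(leaves["E"], leaves["B"]))
print(n4[0], n2[0], n1[0], b_is_dominated)
```

The fold-back keeps F, then D, then B, although B is worse than (A,C,E) under every event.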

This is due to the fact that the comparisons are too local in the tree; in the concrete application described in this chapter another drawback was the fact that, for decisions at nodes relative to the last periods, the evaluations were not very different, due to the large common part of the actions and scenarios preceding these decisions. The result was many indifferences between the decisions at each decision node.

To improve the methodology, the analyst proposed to introduce a second step

that is the subject of the next section.

In order to introduce more information into the comparisons of local decisions and to take the tree as a whole into account, a second step was added by the analyst. At each decision node, the local decisions are also compared to the best actions in the same scenarios in each of the branches of the tree.

In Figure 8.2, at node N2, C and D are also compared to the best decision in N4, i.e. to E (after event S). This leads to the consideration of Table 8.14.

Events   Probab.   C      D      E
U        1/4       7      4.5    3.5
V        3/4       4.5    5.5    5.5

Table 8.14

Using the same preference function as before, the preference of C over D is still 1/4 (see section 4.4), the preference of D over C is still 0, the preference of C over E is [1/4 f(3.5) + 3/4 f(-1)] = 1/4, the preference of E over C is [1/4 f(-3.5) + 3/4 f(1)] = 0, the preference of D over E is [1/4 f(1) + 3/4 f(0)] = 0 and the preference of E over D is [1/4 f(-1) + 3/4 f(0)] = 0. Table 8.15 summarises these values.

      C      D      E
C     0      1/4    1/4
D     0      0      0
E     0      0      0

Table 8.15

The scores for C and D are respectively 1/2 and -1/4; C is therefore chosen at node N2.

At node N3, we compare C and D with the best decision in N5, i.e. with G (after event T), on the basis of Table 8.16.

Events   Probab.   C      D      G
Y        3/4       4.5    1      5
Z        1/4       4.5    5      5

Table 8.16

Table 8.17 gives the preference indices.

      C      D      G
C     0      3/4    0
D     0      0      0
G     0      3/4    0

Table 8.17

The scores of C and D are respectively 3/4 and -3/2, so that C is also chosen in N3.

The analysis of N4 (comparison of E, F, G and C (N2)) and of N5 (comparison of E, F, G and C (N3)) leads to the same conclusions as in the first step, so that, in this example, the second step does not change anything.

However, the interest of this second step is to choose, at each decision node, a decision leading to a final result that is strong, not only locally, but also in comparison with the strongest results obtained during the first step in the other branches of the tree (always in the same scenarios). This is illustrated by the example in Figure 8.6 where the second step works as follows. At node N4, we compare E and F with D and B (the best actions in the other branches as they are unique), through Table 8.18.

Prob.    E      F      D      B
1/3      10     15     20     0
1/3      15     20     0      5
1/3      20     0      5      10

Table 8.18

Table 8.19 presents the preference indices. The scores of E and F respectively become 1 and 1/3, so that the best decision at N4 is now E.

At N2, we have to compare C (followed by E) with D and B (best action in the other branch): the scores of C and D are respectively 4/3 and -2/3, so that the best decision in N2 is now C.

At N1, we have to compare A (followed by C and E) with B and we choose A (which dominates B). So we see that this second step helps to avoid choosing dominated actions, although this property is not guaranteed in all cases.
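The second step on the tree of Figure 8.6 can be sketched in the same spirit. The code below is an illustrative reading of the procedure, not the company's implementation: at each node the local decisions are scored within a pool that also contains the reference actions of the other branches, and the best local decision is retained. The names `second_step`, `net_scores` and `leaves` are hypothetical.

```python
from fractions import Fraction

P = [Fraction(1, 3)] * 3  # events S, T, U
leaves = {"E": [10, 15, 20], "F": [15, 20, 0], "D": [20, 0, 5], "B": [0, 5, 10]}


def f(x):
    return 1 if x > 1 else 0


def index(a, b):
    return sum(p * f(x - y) for p, x, y in zip(P, a, b))


def net_scores(pool):
    """Net score of every action in the pool (outgoing minus incoming preferences)."""
    return {d: sum(index(pool[d], pool[o]) - index(pool[o], pool[d])
                   for o in pool if o != d)
            for d in pool}


def second_step(local, references):
    """Score local decisions together with reference actions; keep the best local one."""
    scores = net_scores({**local, **references})
    return max(local, key=scores.get), scores


n4_choice, n4_scores = second_step(
    {"E": leaves["E"], "F": leaves["F"]}, {"D": leaves["D"], "B": leaves["B"]})

c_values = leaves[n4_choice]  # C is evaluated through the decision kept at N4
n2_choice, n2_scores = second_step(
    {"C": c_values, "D": leaves["D"]}, {"B": leaves["B"]})

a_values = {"C": c_values, "D": leaves["D"]}[n2_choice]
n1_choice, n1_scores = second_step({"A": a_values, "B": leaves["B"]}, {})

print(n4_choice, n2_choice, n1_choice)
```

Under this reading the scores match those quoted in the text (1 and 1/3 at N4, 4/3 and -2/3 at N2) and the retained action is (A,C,E).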

      E      F      D      B
E     0      1/3    2/3    1
F     2/3    0      1/3    2/3
D     1/3    2/3    0      1/3
B     0      1/3    2/3    0

Table 8.19

8.5 Conclusions

This approach (first and second steps) was successfully implemented and applied by the company (after many difficulties due to the combinatorial aspects of the problem) and some visual tools were developed in order to facilitate the decision-[...]

[...] following advantages:

- [...] consequences of another decision in the same scenario;
- [...] the preferences of the decision-maker for each evaluation scale.

However, this approach also presents some mysterious aspects that should be more thoroughly investigated:

- it computes a sort of expected index of preference of each action over each other action, although the role of the so-called probabilities is not that clear in the modelling of uncertainty;
- it is a rather bizarre mixture of local (first step) and global (second step) comparisons of the actions, but it does not guarantee that the chosen action is non-dominated.

[...] abundant in decision analysis. Beside the expected utility model (traditional approach), many other approaches have been studied by authors such as Dekel (1986), Jaffray (1989), Munier (1989), Quiggin (1993), Gilboa and Schmeidler (1993), ... They pointed out more or less desirable properties: linearity, replacement separability, mixture separability, different kinds of independence, stochastic dominance, ... Moreover, as mentioned by Machina (1989), it is important to make the distinction between what he calls static and dynamic choice situations. A dynamic choice problem is characterised by the fact that at least one uncertainty node is followed by a decision node (this is typically the case of the application described in this chapter). In such a context, an interesting property is the so-called dynamic consistency: a decision-maker is said to be dynamically inconsistent if his actual choice when arriving at a decision node differs from his previously planned choice for that node.

Let us illustrate this concept by a short example. Assume that a decision-maker prefers a game where he wins 50 € with probability 0.1 (and nothing with probability 0.9) to a game where he wins 10 € with probability 0.2 (and nothing with probability 0.8). At the same time, he prefers to receive 10 € with certainty to a game where he wins 50 € with probability 0.5 (and nothing with probability 0.5). Note that these preferences violate the independence axiom of von Neumann and Morgenstern. Now consider the tree of Figure 8.7.

[Figure 8.7: a dynamic choice tree. With probability 0.8 the decision-maker receives 0; with probability 0.2 he reaches decision node N1, where A yields 50 with probability 0.5 (and 0 with probability 0.5) and B yields 10 with certainty.]

According to the previous information, the actual choice of the decision-maker, at node N1, will be B. However, if he has to plan the choice between A and B before knowing the first choice of nature, he can easily calculate that if he chooses A, he wins 50 € with probability 0.1 (and nothing with probability 0.9), while if he chooses B, he wins 10 € with probability 0.2 (and nothing with probability 0.8), so that the best choice for him (before knowing the first choice of nature) is A.
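The reduction of each plan to a simple lottery can be written out explicitly. The sketch below assumes, as the probabilities quoted in the text suggest, that node N1 is reached with probability 0.2 and that nothing is won otherwise; the names `reduce_plan`, `at_N1` and `planned` are hypothetical.

```python
from fractions import Fraction


def reduce_plan(p_reach, lottery):
    """Lottery over final wins when `lottery` is played only if the decision node is reached."""
    reduced = {0: 1 - p_reach}  # node not reached: nothing is won
    for win, p in lottery.items():
        reduced[win] = reduced.get(win, 0) + p_reach * p
    return reduced


p_reach = Fraction(1, 5)  # assumed probability of reaching N1
at_N1 = {
    "A": {50: Fraction(1, 2), 0: Fraction(1, 2)},
    "B": {10: Fraction(1)},
}
planned = {name: reduce_plan(p_reach, lottery) for name, lottery in at_N1.items()}

# Preferences stated in the text, not computed: A when planning, B once N1 is reached.
planned_choice, actual_choice = "A", "B"
print(planned, planned_choice != actual_choice)
```

Seen from the start, plan A is the lottery (50 with 0.1) and plan B the lottery (10 with 0.2), which the decision-maker ranks in the opposite order to his choice at N1.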

So, the actual choice at N1 differs from the planned choice for that node, illustrating the so-called dynamic inconsistency. It can be shown that any departure from the traditional approach can lead to dynamic inconsistency. However, Machina (1989) showed that this argument relies on a hidden assumption concerning behaviour in dynamic choice situations (the so-called consequentialism) and argued that this assumption is inappropriate when the decision-maker is a non-expected utility maximiser.

This example shows that no approach can be considered as ideal in the context of decision under uncertainty. As for the other situations studied in this book, each model, each procedure, can present some pitfalls that have to be known by the analyst. Knowing the underlying assumptions of the decision-aid model which will be used is probably the only way for the analyst to guarantee an approach to the decision problem that is as scientific as possible. It is a fact that, due to lack of time and other priorities, many decision tools are developed in real applications without taking enough precautions (this is also the case in the example presented in this chapter, due to the tight deadlines and to the necessity of overcoming the combinatorial aspects of the problem). This is why we consider it important to provide analysts with some guidelines for modelling a decision problem: this will be the subject of a volume in preparation.

9 SUPPORTING DECISIONS: A REAL-WORLD CASE STUDY

Introduction

In this chapter¹ we report on a real-world decision aiding process which took place in a large Italian firm, in late 1996 and early 1997, concerning the evaluation of offers following a call for tenders for a very important software acquisition. We will try to present extensively the decision process for which the decision support was requested, the actors involved, the decision aiding process, including the problem structuring and formulation, the evaluation model created and the multiple criteria method adopted. The reader should be aware of the fact that very few real-world cases of decision support are reported in the literature although many more occur in reality (for noteworthy exceptions see Belton, Ackermann and Shepherd 1997, Bana e Costa, Ensslin, Correa and Vansnick 1999, Vincke 1992, Roy and Bouyssou 1993).

We introduce such a real case description for two reasons.

1. The first reason is our wish to give an account of what providing decision support in a real context means and to show the importance of elements such as the participating actors, the problem formulation, the construction of the criteria etc., often neglected in many conventional decision aiding methodologies and in operational research. From this point of view the reader may find questions already introduced in previous chapters of the book, but here they are discussed from a decision aiding process perspective.

2. The second reason is our wish to introduce the reader to some concepts and problems that will be extensively discussed in a forthcoming volume by the authors. Our objective is to stimulate the reader to reflect on how decision support tools and concepts are used in real-life situations and how theoretical research may contribute to aiding real decision-makers in real decision situations.

More precisely, the chapter is organised as follows. Section 1 introduces and defines some preliminary concepts that will be used in the rest of the chapter such as decision process, actors, decision aiding process, problem formulation, evaluation model etc. Section 2 presents the decision process for which the decision support was requested, the actors involved and their concerns (stakes), the resources involved and the timing. Section 3 describes the decision aiding process, mainly through the different products of such a process that are specifically analysed (the problem formulation, the evaluation model and the final recommendation) and discusses the experience conducted. The client's comments on the experience are also included in this section. Section 4 summarises the lessons learned in such an experience. All technical details are included in Appendix A (an ELECTRE-TRI type procedure is used), while the complete list of the evaluation attributes is provided in Appendix B.

¹ A large part of this chapter uses material already published in Paschetta and Tsoukias (1999).

9.1 Preliminaries

We will make extensive use of some terms (like actor, decision process etc.) in this chapter that, although present in the literature (see Simon 1957, Mintzberg, Raisinghani and Theoret 1976, Jacquet-Lagreze, Moscarola, Roy and Hirsch 1978, Checkland 1981, Heurgon 1982, Masser 1983, Humphreys, Svenson and Vari 1993, Moscarola 1984, Nutt 1984, Rosenhead 1989, Ostanello 1990, Ostanello 1997, Ostanello and Tsoukias 1993), can have different interpretations. In order to help the reader understand how such terms are used in this presentation we introduce some informal definitions.

[...] organisations characterising one or more objects or concerns (the problems).

[...] define his behaviour in the process. The term decision-maker is also used in the literature and in other chapters of this book, but in this context we prefer to use the term client.

[...] demand.

Decision Aiding Process: part of the decision process and more precisely the interactions occurring at least between the client and the analyst.

[...] process when the decision support is requested and what the client is expecting to obtain from the decision support (this is one of the products of the decision aiding process).

[...] client asked the analyst to support him (this is one of the products of the decision aiding process).

[...] formulation for which a specific decision support method can be used (this is one of the products of the decision aiding process).

9.2 The decision process

In early 1996 a very large Italian company operating a network-based service decided, as part of a strategic development policy, to equip itself with a Geographical Information System (GIS) to which all information concerning the structure of the network and the services provided all over the country was to be transferred. However, since (at that time) this was quite a new technology, the company's Information Systems Department (ISD) asked the affiliated research and development agency (RDA), and more specifically the department concerned with this type of information technology (GISD), to perform a pilot study of the market in order to orient the company towards an acquisition. The GISD of the RDA noticed that:

- the market offered a very large variety of software which could be used as a GIS for the company's purposes;
- the company required a very particular version of GIS that did not exist as a ready-made product on the market, but had to be created by customising and combining different modules of existing software, with the addition of software written ad hoc for the company's purposes;
- the question asked by the ISD was very general, but also very committing, because it included an evaluation prior to an acquisition and not just a simple description of the different products;
- the GISD felt able to describe and evaluate different GIS products based on a set of attributes (in the end several hundred), but was not able to provide a synthetic evaluation, the purpose of which was just as obscure (the use of a weighted sum was immediately set aside because it was perceived as meaningless).

At this point of the process the GISD found out that a unit concerned with the use of the MCDA (Multiple Criteria Decision Analysis) methodology in software evaluation (MCDA/SE) was operating within the RDA and presented this problem as a case study opening a specific commitment. The person responsible for the MCDA/SE unit then decided to activate the unit's links with an academic institution in order to get more insight and advice on a problem that soon appeared to exceed the knowledge level of the unit at that time. At this point we can make the following remarks.

- The decision process for which the decision aid was provided concerned the acquisition of a GIS for X (the company). The actors involved at this level are the company's IS manager, the acquisition (AQ) manager, the RDA, different suppliers of GIS software, and some of the company's external consultants concerned with software engineering.
- A first decision aiding process was established where the client was the IS manager and the analyst was the GIS department of the RDA.
- A second decision aiding process was established where the client was the GIS department of the RDA and the analyst was the MCDA/SE unit. A third actor involved in this process was the supervisor of the analyst, in the sense of someone supporting the analyst in different tasks, providing him with expert methodological knowledge and framing his activity.

We will focus our attention on this second decision aiding process where four actors are involved: the IS manager, the GISD (or team of analysts) as the client (bear in mind their particular position of clients and analysts at the same time), the MCDA/SE unit as the analyst and the supervisor.

The first advice given by the analyst to the GISD was to negotiate a more specific commitment with their client, so that their task could be more precise and better defined. After such a negotiation the GISD's activity was defined as technical assistance to the IS manager in a bid concerning the acquisition of a GIS for the company, and its specific task was to provide a technical evaluation of the offers that were expected to be submitted. For this purpose the GISD drafted a decision aiding process outline where the principal activities to be performed were specified, as well as the timing, and submitted this draft to its client (see Figure 9.1). At this point it is important to note the following.

1. The call for tenders concerned the acquisition of hundreds of software licenses, plus the hardware platforms on which such software was expected to run, the whole budget being several million €. From a financial point of view it represented a large stake for the company and a high level of responsibility for the decision-makers.

2. From a procedural point of view the administration of a bid of this type is delegated to a committee which in this case included the IS manager, the AQ manager, a delegate of the CEO and a lawyer from the legal staff. From such a perspective the task of the GISD (and of the decision aiding process) was to provide the IS manager with a global technical evaluation of the offers that could be used in the negotiations with the AQ manager (inside the committee) and the suppliers (outside the committee).

3. As already noted, the bid concerned software that was not ready-made, but a collection of existing modules of GIS software which was expected to be used in order to create ad hoc software for the specific needs of the company. Two difficulties arose from this:

   - [...] without being able to test it on specific company-related cases;
   - the timing of the evaluation (including testing the offers) could be extremely long compared with the rapidity of the technological evolution of this type of software.

[Figure 9.1: outline of the decision aiding process, showing the activities of the technical advisor, the client and the suppliers. Recoverable labels: start of call for tenders study; bid environment study; call for tenders preparation; definition of client requirements, points of view and decision problem; first set of answers from suppliers; first selection; problem formulation; invitation letter; tender preparation; completion of decision model for second selection; definition of prototype requirements; lab preparation for prototype evaluation; second set of answers from suppliers; second selection model for ranking (definition of criteria and aggregation procedure); prototype development; prototype analysis, sorting and final ranking; final choice.]

Once the call for tenders had been prepared (including the software requirements section, the tenderers' requirements section, the timing and the evaluation procedure), a set of offers was presented to the company and the technical evaluation activity was established. It is interesting to notice that the GISD staff charged with this evaluation was supported by external consultants, software engineering experts in the company's sector, who practically acted as the IS manager's delegates in the group. It is this extended group that signed the final recommendation presented to the IS manager and that we will hereafter call team of analysts (for the IS manager) or client (for the MCDA/SE unit and for us).

A second step in the decision aiding process was the generation of a problem formulation and of an evaluation model. Although we formally consider the two as distinct products of the process, in reality, and in this case specifically, they were generated at the same time. We will discuss the problem formulation and the evaluation model in detail in the next section, but we can anticipate that the final formulation consisted in an absolute evaluation of the offers under a set of points of view that could be divided into two parts: the quality evaluation and the performance evaluation. Although the set of alternatives was relatively small (only six alternatives were considered), the set of attributes was extremely complex (as often happens in software evaluation). Actually there were seven basic evaluation dimensions, expanded in a hierarchy with 134 leaves resulting in 183 evaluation nodes (see Appendix B).

A third and final step in the decision aiding process was the elaboration of the final recommendation after all the necessary information for the evaluation had been obtained and the evaluation performed. We will discuss such constructions in detail in the next sections, but we can anticipate that such an elaboration highlighted some questions (substantial and methodological) that had not been considered before.

Some months after the end of the process and the delivery of the final report

we asked our client (the team of analysts) to discuss their experience with us and

to answer some questions concerning the methodology used, how they perceived it,

what they learned and what their appreciation was. The discussion was conducted

in a very informal way, but the client provided us with some written remarks that

were also reported during a conference presentation (see Fiammengo, Buosi, Iob,

Maffioli, Panarotto and Turino 1997). Such remarks are introduced in the following

section.

We present the three products of the decision aiding process here: the problem

formulation, the evaluation model and the final recommendation. We should

remember that the problem formulation and a first outline of the evaluation model

were established while the call for tenders was under elaboration for two reasons:

for legal reasons, an outline of the evaluation model has to be included in

9.3. DECISION SUPPORT 211

offers which in turn defines the information to be provided by the tenderers.

For instance, the call for tenders specified that a prototype was requested in

order to test some performances. The tenderers therefore knew that they had

to produce a prototype within a certain time frame. The choice to introduce

some tests was made during the definition of the evaluation model.

From the presentation of the process we can make the following observations:

1. It was extremely important for the client (the team of analysts) to understand

his role in the process, what his client expected and what they were

able to provide. In fact, at the beginning of the process, the problem situation

was absolutely unclear. Moreover, the client considered that being able to

understand the expectations of the other actors involved in the process

was extremely relevant, both for strategic reasons (having to do with organisational

problems of the company) and for operational reasons (recommending

something reliable in a clear and sound way for all the actors involved in the

bid).

Reporting the client's remarks: "... MCDA (Multi Criteria Decision Analysis)

was very useful in organising the overall process and structure of the bid:

what were the important steps to do, how to define the call for tenders, ...",

"... MCDA was used as a background for the whole decision process. With

such a perspective it turned out to be very useful because every activity had a

justification ...", "... as a formal process MCDA guaranteed greater control

and transparency to the process ...", "A complex process, such as a bid, could

be greatly eased by the use of any process centred methodology".

It is this last sentence which clearly highlights the necessity for the client to

have a support along the whole process and for all its aspects, which could

be able to take what was happening in the decision process into account. We

actually agree with their comment that any process modelled methodology

could be useful and we consider that their positive perception of MCDA is

based on the fact that it was the first decision support approach they

came to know.

"... greater control and transparency ...". Complex decision processes are based

on human interactions and these are based on the intrinsic ambiguity of

human communication (thanks to ambiguity human communication is also

very efficient). However, such an ambiguity might result in an impossibility

to understand and ultimately to propose viable solutions. Moreover, when

significant stakes are considered (as in our case), decisionmakers may con-

sider it dangerous to make a decision without having a clear idea of the

212 CHAPTER 9. SUPPORTING DECISIONS

consequences of their acts. The use of a formal approach enables the reduc-

tion of ambiguity (without completely eliminating it) and thus appears to

be an important support to the decision process.

It is clear that defining a precise problem formulation became a key issue for

the client because it clarified his role in the decision process (the bid management),

his relation with the IS manager (his client) and gave him a precise activity to

perform.

We define (Morisio and Tsoukias 1997) a problem formulation as the collection

of: a set of actions, a set of points of view and a problem statement. The only point

that caused discussion within the team of analysts concerning the problem formulation

was the problem statement. The set of alternatives was considered to be the set of

offers submitted after the call for tenders. A first idea to evaluate the tenderers,

as well as the offers, was eliminated due to the particular technology where no

consolidated producers exist. The set of points of view was defined using the

team of analysts' technical knowledge and can be divided into two basic sets: one

concerning quality, including specific technical features required for the software

plus some ISO/IEC 9126 (1991) based dimensions, and the second concerning the

performance of the offered software, to be tested on prototypes. Such points of

view formed a huge hierarchy (see further on for details). No cost estimates were

required by the client and so they were not considered in this set.

After some discussion the problem statement adopted was the one of an ab-

solute evaluation of the offers both on a disaggregated level and on a global one.

Actually, the team of analysts interpreted the client's demand as a question of

whether the offers could be considered as intrinsically "good", "bad" etc. and not

as a request to compare bids amongst themselves. There were two reasons for this choice.

1. A simple ranking of the offers could conceal the fact that all of them could

be of very poor quality or satisfy the software requirements to a very low

level. In other words it could happen that the best bid could be bad and

this was incompatible with the importance and cost of the acquisition.

2. The team of analysts felt uncomfortable with the idea of comparing the

merits (or de-merits) of an offer with merits (or de-merits) of another offer.

A first informal discussion of the problem of compensation convinced them

to overcome this question by comparing the offers to profiles about which

they had sufficient knowledge.

offers to pre-established profiles can be viewed as a measurement procedure) the

result that the team of analysts was looking for appeared to be the conclusion

of repeated aggregations of measures. Using the terminology introduced by Roy

(1996), the problem statement appeared to be a hierarchically organised sorting

of the offers, the sorting being repeated at all levels of the hierarchy.

As far as the problem formulation is concerned, an ex-post remark made by the

team of analysts concerned the length of the evaluation process. They considered

that such a process was so long that the information available at the beginning and


the formulation itself could no longer be valid at the end of the process. This was

partly due to the very rapid evolution of GIS technology that could completely

innovate the state of the art in six months. Another observation made by part of

the team of analysts was that towards the end of the process, due to the knowledge

acquired in this period (mainly due to the process itself), they could revise some

of their judgements. Actually, the length of the evaluation was considered as a

negative critical issue in the client's remarks.

The final report did not consider any revision of the formulation and the eval-

uations since in the context of a call for tenders, it could be considered unfair to

modify the evaluations just before the final recommendation.

We consider that this is a critical issue for decision support and decision aiding

processes. Information is valid only for a limited period of time and consequently

the same is true for all evaluations based on such information. Moreover the

client himself may revise the problem formulation or update his perception of the

information and modify his judgements. This is rarely considered in decision aiding

methodologies. While for relatively short decision aiding processes the problem

may be irrelevant, it is certain that in long processes such a problem cannot be

neglected and requires specific consideration.

The different components of the evaluation model were specified in an iterative

fashion. In the following we present their definition as they occurred in the decision

aiding process. We may notice that despite the fact that we had a large amount

of information to handle in our model, the case did not present any exogenous

uncertainty since the client considered the basic data and its judgements reliable

and felt confident with them.

The set of alternatives was identified as the set of offers legally accepted by the

company in reply to the call for tenders. No preliminary screening of the offers

was expected to be made. Although each offer was composed of different modules

and software components, they have been considered as wholes.

The set of evaluation dimensions was a complex hierarchy with seven root

nodes, 134 leaves and 183 nodes in total (the complete list is available in Appendix

B). This is a typical situation in software evaluation (see Morisio and Tsoukias

1997, Blin and Tsoukias 1998, Stamelos and Tsoukias 1998). The key idea was that

each node of the hierarchy was an evaluation model itself for which the evaluation

dimensions to aggregate and the aggregation procedure had to be defined. Each

node was subject to extensive discussion before arriving at a final version. Basically

two issues have been considered in such discussions:

- the choice of the attributes to use;

- the semantics of each attribute.

Regarding the first issue, a frequent attitude of technical committees charged

with evaluating complex objects (as in our case) is to define an "excellence list"

where every possible aspect of the object is considered. Such a list is generally

provided by the literature, the experience, international standards etc. The result

is that such a list is an abstract collection of attributes, independent from the spe-


which can invalidate the evaluation. Our client was aware of the problem, but had

no knowledge and no tools to enable him to simplify and reduce the first version

of the list they had defined. The repeated use of a coherence test (in the sense

of Roy and Bouyssou 1993) for each intermediate node of the hierarchy made it

possible to eliminate a significant number of redundant and dependent attributes

(more than 30%) and to better understand the semantics of each attribute used.

Verifying the separability of each subdimension with respect to the parent node

was very helpful, in the sense that each subnode should be able to discriminate

alone the offers with respect to the evaluation considered at the parent level.

Despite this work, the client wrote, in his ex-post considerations: "... it was

not necessary to be so detailed in the evaluation; the whole process could be faster

because we needed the software for a due date; it could be preferable to use a limited

number of criteria ...". On the other hand it is also true that it is only after the

process that the client was able to determine which were the really significant

criteria that discriminated among the alternatives.

With respect to the second issue we pushed the client to provide us with a short

description of each attribute and when a preference model was associated to it, a

short description of the model (why a certain value was considered as better than

another). Such an approach helped the client both to eliminate redundancies (before

using the coherence test, which is time consuming) and to better understand

the contents of the evaluation model.

For instance, at a certain point in the hierarchy definition process, there was

a discussion about some attributes that could also be considered as leaves at the

top level of the hierarchy. These were the so-called "process attributes", i.e. they

were intended to evaluate special functionality inside different processes (in this

context "process" means a chunk of functionality aiming towards supporting a

stream of activities of a software). In fact, one can consider a process attribute (at

the final level) and then subdivide it in quality aspects, or alternatively consider

single independent quality aspects whose evaluation depends on how the process

attribute is considered. The final choice was to put process attributes at the top

level because they emanate directly from the evaluation scope.

Such an activity also helped the client to realise that they needed an absolute

evaluation of the alternatives for almost all the intermediate nodes of the hierarchy

thus implicitly defining the problem statement of the model.

The basic information available was of the "subjective ordinal measurement"

type. With this term we want to indicate that each alternative could be described

by a vector of the 134 elementary pieces of information that were in the large

majority either subjective evaluations by experts (mostly part of the team of analysts,

the client) of the "good", "acceptable" etc. type or descriptions of the

"operating system X", "compatible with graphic engine Y" etc. type. The latter

were expressed on nominal scales, while the former were expressed on ordinal

scales. It was almost impossible for the experts to give more information

than such an order and it was exactly this type of information that

pushed the client to look for another evaluation model than the usual weighted

sum widely diffused in software evaluation manuals and standards (see ISO/IEC


Obtaining the information was not a difficult task, but a time consuming pro-

cess that required the establishment of an ad-hoc procedure during the process

(see figure 9.1). We consider that this is also a critical issue in a decision aiding

process. Gathering and obtaining the relevant information for an evaluation model

is often considered as a second level activity and therefore neglected from further

specific considerations. But such a problem can invalidate the problem formulation

adopted. Moreover, the information used in an evaluation model results from the

manipulation of the rough information available at the beginning of the process.

We can consider that the information is constructed during the decision aiding

process and cannot be viewed as a simple input.

Before continuing the definition of the model associated to each node the prob-

lem of the aggregation procedure was faced since it could influence the construction

of such models. An important discussion with the client concerned the distinction

between measures and preferences.

As already reported, the basic information consisted either in observations concerning

the offers (expressed on nominal scales) or in expert judgements (expressed

on ordinal scales of value of the "good", "acceptable" etc. type). All the intermediate

nodes were expected to provide information of the second type. Clearly all

nominal scales had to be transformed into ordinal ones, associating a preference

model on the elements of the nominal scale of the attribute. Under such a perspective

it was important for the client to understand what they were expressing

their preferences on.

Actually, the client did not compare the alternatives amongst themselves, but

to a-priori defined (by the client) standards of "good", "acceptable" etc. When

asked to formulate preferences, the client expressed them on the elements of the nominal scales

and not on the alternatives themselves. The preference among the alternatives was expected

to be induced once the alternatives could be "measured" by the attributes.

From a certain point of view we can claim that, except for the final aggregation

level, the client needed to aggregate ordinal measures and not preferences (in the

sense that they had to aggregate the ordinal measures obtained when comparing

the alternatives to the standards and not to compare the alternatives amongst

themselves). Such an observation greatly helped the client to understand the

nature and scope of the evaluation model and ultimately to define the problem

statement of the model. Moreover, the discussion on the different typologies of

measurement scales helped the client to understand the problem of choosing an

appropriate aggregation procedure.

In our case, the presence of ordinal information for almost all leaves and the

problem statement that required a repeated sorting of the offers, oriented the

team of analysts to choose an aggregation procedure based on the ELECTRE-TRI

method (see Yu 1992). See also appendix A for a presentation of the procedure.

At this point the team was ready to define their specific evaluation models for all

nodes. In particular we had the following cases.

1. For all leaf nodes an ordinal scale was established. The available technical

knowledge consisted in different possible states in which an offer could find


itself. For instance, consider the leaf nodes 1.1.1 (type of presentation on

the user interface in the land-base management), 1.1.2 (graphic engine of

the user interface in the land-base management), 1.1.3 (customisation of the

user interface in the land-base management). The possible states on these

characteristics were:

1.1.1: standard graphics (SG), non standard graphics (NSG);

1.1.2: station M (M; graphic engine already adopted in other software used

in the company), other acceptable graphic engine (OA), other non accept-

able graphic engine (ON);

1.1.3: availability of a graphic tool (T), availability of an advanced graphic

language (E), availability of a standard programming language (S), no cus-

tomisation available (N). In this case different possible combinations were

possible (for instance a software could provide both an advanced graphic

language and a standard programming language: value E,S). The three ordinal

scales associated to the three nodes were (≻ representing the scale

order):

1.1.1: SG ≻ NSG;

1.1.2: M ≻ OA ≻ ON;

1.1.3: T,E,S ≻ T,E ≻ T,S ≻ T ≻ E,S ≻ E ≻ S ≻ N.

2. For all parent nodes, a brief descriptive text of what the node was expected

to evaluate was provided. All parent nodes were equipped with the same

number of classes: unacceptable (U), acceptable (A), good (G), very good

(VG), excellent (E). Then, two possibilities for defining the relationship between

the values on the subnodes and the values on the parent nodes were

established.

nodes was provided. For instance consider node 1.1 (user interface

of the land-base management) which has the three evaluation models

introduced in the previous example as subnodes. In this case we have

the following evaluation model:

- E: T,E,S;M;SG or T,E;M;SG or T,S;M;SG;

- VG: T;M;SG or T,E,S;OA;SG or T,E;OA;SG or T,S;OA;SG;

- G: T;OA;SG or E,S;M;SG or E;M;SG;

- A: all remaining cases except the unacceptable;

- U: all cases where 1.1.1 is NSG or 1.1.2 is ON or 1.1.3 is N.
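The exhaustive rule list for node 1.1 amounts to a lookup table. The following is a minimal sketch of that table in Python, not the spreadsheet actually used in the case study; the function name and the encoding of a combined 1.1.3 value as a set of single states are our own assumptions.

```python
# Class of node 1.1 (user interface) from the states of subnodes
# 1.1.1 (presentation), 1.1.2 (graphic engine) and 1.1.3 (customisation).
# Every rule for E, VG and G requires 1.1.1 = SG, so only the pair
# (customisation, engine) is stored. Names are illustrative assumptions.

EXCELLENT = [({"T", "E", "S"}, "M"), ({"T", "E"}, "M"), ({"T", "S"}, "M")]
VERY_GOOD = [({"T"}, "M"), ({"T", "E", "S"}, "OA"),
             ({"T", "E"}, "OA"), ({"T", "S"}, "OA")]
GOOD = [({"T"}, "OA"), ({"E", "S"}, "M"), ({"E"}, "M")]


def classify_node_1_1(presentation, engine, customisation):
    """Return U, A, G, VG or E for one offer."""
    custom = set(customisation)
    # Unacceptable cases are checked first: NSG, ON or no customisation.
    if presentation == "NSG" or engine == "ON" or custom == {"N"}:
        return "U"
    for label, rules in (("E", EXCELLENT), ("VG", VERY_GOOD), ("G", GOOD)):
        if (custom, engine) in rules:
            return label
    # All remaining acceptable combinations.
    return "A"
```

For example, an offer with standard graphics, the station M engine and only a standard programming language falls into none of the listed combinations and is therefore classed as acceptable.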

2.2 When an exhaustive combination of the values was impossible, an ELECTRE-

TRI procedure was used. For this purpose, the following information

was requested:

- the relative importance of the different sub nodes;

- the concordance threshold for the establishment of the outranking re-

lation among the offers and the profiles;

- a veto condition on the sub node such that the value on the parent

node could be limited (possibly unacceptable).


have been established using a reasoning on coalitions (for details see Chapter

6). In other words the team of analysts established the characteristics of the

subnodes for which an offer could be considered very good (therefore should

outrank the very good profile) and consequently compared the values of the

parameters of relative importance and of the concordance threshold. The

veto condition was established as the presence of the value "unacceptable"

at a subnode. The presence of a veto also produced an "unacceptable"

value at the level of the parent node. In other words, the team of analysts

considered any unacceptable value to be a severe technical limitation of the

offer. The reader may notice that this is a very strong interpretation of a veto

condition among the ones used in the outranking based sorting procedures,

but it was the one with which the team of analysts felt comfortable at the

time of construction of the evaluation model. The team of analysts also

established very high concordance thresholds (never less than 80%, very

often around 90%) that result in very severe evaluations. Such a choice

reflected the conviction, of at least a part of the team of analysts, that very

strong reasons were required to qualify an offer as very good. Since the whole

model was calibrated starting from the very good value, this conviction had

wider effects than the team of analysts could imagine. For example we can

take node 1 (land-base management) which has eight sub nodes:

1.1: User interface;

1.2: Functionality;

1.3: Development environment;

1.4: Administration tools;

1.5: Work flow connection;

1.6: Interoperability;

1.7: Integration between land-base products and the Spatial Data manager;

1.8: Integration among land-base products;

The relative importance parameters were established as follows: w(1.1) =

4, w(1.2) = 8, w(1.3) = 5, w(1.4) = 4, w(1.5) = 1, w(1.6) = 4, w(1.7) =

8, w(1.8) = 2 and the concordance threshold was fixed as 29/36 (around

0.8). Such choices imply that no coalition that excluded nodes 1.2 or 1.7

was acceptable and that the smallest acceptable coalition should necessarily

include the nodes 1.2, 1.7, 1.3 and any two of the nodes 1.1, 1.4 and 1.6.

The analyst and the supervisor explained this aspect to the client who on

this basis, revised the importance parameters several times.
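The reasoning on coalitions for node 1 can be checked by simple enumeration. The sketch below is only an illustration under one assumption of ours: it takes w(1.6) = 4, the value for which the weights sum to 36, the denominator of the stated threshold, and for which nodes 1.1, 1.4 and 1.6 are interchangeable as the text describes. The function names are hypothetical.

```python
from itertools import combinations

# Relative importance of the eight subnodes of node 1.
# ASSUMPTION: w(1.6) = 4, so that the weights sum to 36.
weights = {"1.1": 4, "1.2": 8, "1.3": 5, "1.4": 4,
           "1.5": 1, "1.6": 4, "1.7": 8, "1.8": 2}
total = sum(weights.values())
threshold = 29  # concordance threshold 29/36, expressed as a weight sum


def is_winning(coalition):
    """A coalition is acceptable if its weight reaches the threshold."""
    return sum(weights[node] for node in coalition) >= threshold


def winning_coalitions_of_size(k):
    """All acceptable coalitions made of exactly k subnodes."""
    return [set(c) for c in combinations(sorted(weights), k) if is_winning(c)]


# Smallest number of subnodes that can form an acceptable coalition.
smallest_size = next(k for k in range(1, len(weights) + 1)
                     if winning_coalitions_of_size(k))
smallest = winning_coalitions_of_size(smallest_size)
```

Under this assumption every coalition that leaves out 1.2 or 1.7 weighs at most 28 and is rejected, and each of the smallest acceptable coalitions contains 1.2, 1.7 and 1.3 together with two of the nodes 1.1, 1.4 and 1.6.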

3. As already mentioned, the set of dimensions was built around two basic

points of view: the "quality" and the "performances". The first generated

six evaluation dimensions, which will be called the "quality attributes" or

"quality criteria" or "quality part of the hierarchy" hereafter, corresponding

to six (among seven) of the root nodes of the model. The seventh root

node (node 7, subnodes 7.1, 7.2, 7.3, 7.4) concerned the evaluation of the

performances of the prototypes submitted to tests by the team of analysts.

Such performances are basically measured in the time necessary to execute

a set of specific tasks under certain conditions and with some external fixed


parameters. For instance, consider node 7.3 (performance under load). The

dimension is expected to evaluate the performance of the prototype while

the quantity of data that have to be elaborated increases. The value $v(x)$

($x$ being an offer) combines an observed measure $W_x(t)$ and an interpolated

one $T_x(t)$ ($t$ representing the data load; the interpolation is not necessarily

linear). The combination is obtained, in this case, through the following

formula:

$$v(x) = \int W_x(t)\, T_x(t)\, dt$$

In this case there are no external profiles with which to compare the perfor-

mances because the prototypes are created ad-hoc, the technology is quite

new and there are no standards of what a very good performance could be.

An ordinal scale was created considering the best performances as "first",

all performances presenting a difference of more than 5% and less than 20%

"second", all performances presenting a difference of more than 20% and less

than 25% "third", all performances presenting a difference of more than 25%

and less than 50% "fourth" and all performances presenting a difference of

more than 50% "fifth". The same model was applied to all subnodes of

node 7. A sorting procedure could then be established to obtain the final

evaluation.
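The ordinal scale used for the performance nodes can be sketched as a threshold lookup on the percentage difference. This is an illustration only. The text does not say how a difference of exactly 5%, 20%, 25% or 50% is treated, nor does it state explicitly that differences below 5% count as "first"; the sketch assumes both (boundary values go to the better class). The names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds (in percent) of the first four classes.
# ASSUMPTION: a difference equal to a bound stays in the better class,
# and any difference up to 5% is ranked "first".
BOUNDS = [5, 20, 25, 50]
LABELS = ["first", "second", "third", "fourth", "fifth"]


def performance_class(difference_percent):
    """Map a percentage difference to its ordinal class."""
    return LABELS[bisect_left(BOUNDS, difference_percent)]
```

With these conventions a prototype showing a 22% difference is ranked third and one showing a 75% difference is ranked fifth.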

This process was repeated for all the intermediate nodes up to the seven root

nodes representing the seven basic evaluation dimensions. It took four to five

months for all the nodes to be equipped with their evaluation model and the

process generated several discussions inside the team of analysts, mainly of a

technical nature (concerning the specific contents of the values for each node).

The most discussed concept of the model was the concordance threshold and the

veto condition since part of the team considered that the required levels were

extremely severe. However, since such an approach corresponded to a cautious

attitude, it prevailed in the team and finally was accepted. The length of the

process is justified, not only by the quantity of nodes to define, but also because

the team of analysts was obliged to define a new measurement scale and a precise

measurement aggregation procedure for each node. Although this process can

often be qualified as "subjective measurement", it was the only way to obtain

meaningful values for the offers. The set of criteria to be used, if a preference

aggregation comparing the alternatives amongst themselves was requested, was

defined as the seven root nodes equipped with a simple preference model: the

weak order induced by the ordinal scale associated to each of these nodes.

No exogenous uncertainty was considered in the evaluation model. The in-

formation provided by the tenderers concerning their offers was considered to be

reliable and the use of ordinal scales made it possible to avoid the problems of im-

precision or of measurement errors. This reasoning however, is less true for node 7

and its subnodes, but the team of analysts felt sufficiently confident with the tests

and did not analyse the problem further. Some endogenous uncertainty appeared

as soon as the model was put into practice (the offers being available). We shall


discuss this problem in more detail in the next section (concerning the elaboration

of the final recommendation), but we can anticipate that the problem was created

by the double evaluation provided by the chosen ELECTRE-TRI type aggregation,

consisting in an "optimistic" and a "pessimistic" evaluation which may not

necessarily coincide.

The evaluation model was coded in a formal document that was submitted

(and explained) to the final client receiving his consensus. It is worthwhile to

note that the final client was not able to participate in the elaboration of the

model (technical details, establishment of the parameters etc.). Part of the team

of analysts (some of the external consultants) were acting as his delegates. The

establishment of the evaluation model and its acceptance by the client opened the

way for its application on the set of offers received and for the elaboration of the

final recommendation.

The client greatly appreciated their involvement in the establishment of the evaluation

model, which turned out to be a product they considered to be their own (from

their ex-post remarks: "... this (the involvement) turned out to be important ... for

the acceptability of the evaluation results"). The fact that each node of the hierarchy

was discussed, analysed and finally defined by the team of analysts allowed

them to understand the consequences for the global level, to be able to explain the

contents of the model to their client and justify the final result on the grounds of

their own knowledge and experience, not of the procedure adopted.

In other words we can claim that the model was validated during its construc-

tion. Such an approach helped both the acceptability of the model and the final

result, eased the discussion when the question of the final aggregation was settled

and definitely legitimated the model in the eyes of the client.

The evaluation of the six offers that were actually submitted in response to the call

for tenders was carried out in two main steps: the first consisted

in evaluating the six quality attributes and the second in testing the

prototypes provided by the tenderers.

The method adopted to aggregate the information and construct the final eval-

uations was a variant of the ELECTRE TRI procedure (see Yu 1992). The reader

can also see Appendix A and refer to Chapter 6 for more details. We have the

following remarks on the use of such a method.

1. The key parameters used in the method are the profiles (to which the al-

ternatives are compared in order to be classified in a specific class), the

importance of each criterion for each parent criterion classification and the

concepts of concordance thresholds and veto conditions.

For each intermediate node such parameters were extensively discussed be-

fore reaching a precise numerical representation. As already mentioned in

section 3.2 the relative importance of each criterion and the concordance

threshold were established using a reasoning based on the identification of

the winning coalitions enabling the outranking relation to hold. The veto


use, then, as an eliminatory threshold, but the client soon realised its impor-

tance mainly when it was necessary to have an incomparability instead of an

indifference that was a counterintuitive situation when very different objects

were compared. Further on and as soon as the veto conditions were understood

by the client, they decided to introduce a similar concept each time

they wanted to distinguish between positive reasons (for the establishment

of the outranking relation) and negative reasons (against the establishment

of the outranking relation), since they are not necessarily complementary

and must be evaluated in a separate and independent way.

The profiles were established using the knowledge of the team of analysts

(experts in their domain) who were able to identify the minimal requirements

to qualify an object in a certain class. It is interesting to notice that for the

client, the intuitive idea of a profile was that of a typical object of a class

and not of the lower bound of the class. The shift from the intuitive idea to

the one used in the case study was immediate and presented no problems.

The fact remains, that the distinction between the two concepts of profile

is crucial, while the lower bound approach appears to be less intuitive than

the typical element one.

2. The whole method (and the model) was implemented on a spreadsheet. This

was of great importance because spreadsheets are a basic tool for communi-

cation and work in all companies and enable an immediate understanding of

the results. Moreover, they enable on-line what-if operations when specific

problems, concerning precise information and/or evaluation, appeared dur-

ing the discussions inside the team of analysts. The experimental validation

of the model was greatly eased by the use of the spreadsheet.

Further on it helped the acceptability and legitimation of the model through

the idea that "if it can be implemented on a spreadsheet it is sufficiently

simple and easy to be used by our company". In fact some of the critiques by

the client about the approach adopted in this case were that "... MCDA is

not yet a universally known method ...", "... seems less intuitive than other

well known techniques such as the weighted sum ...", "... it is time consuming

to apply a new methodology ...", all these problems limiting the acceptability

of the methodology towards the client's client (the IS manager) and the

company more generally. Being able to implement the method and the model

on a spreadsheet was, for them, a proof that, although new, complex and

apparently less intuitive, the method was simple and easy and therefore

legitimately used in the decision process.

A specific problem which was raised in the first step was the generation of un-

certainty due to the aggregation procedure. The ELECTRE-TRI type procedure

adopted produces an interval evaluation consisting in a lower value (the pessimistic

evaluation) and an upper value (the optimistic evaluation). When an alternative

has a profile on the subnodes that is very different from the profiles of the classes

on the parent node then, due to the incomparabilities that occur when comparing


      O1      O2      O3      O4      O5      O6
C1    A-A     G-G     A-VG    A-G     G-VG    A-A
C2    A-A     G-VG    A-VG    A-VG    G-G     A-G
C3    A-A     G-G     A-VG    G-G     A-A     A-A
C4    A-G     G-VG    A-VG    G-VG    A-VG    A-G
C5    U-U     G-VG    G-G     A-G     G-VG    U-U
C6    A-A     VG-VG   E-E     VG-VG   G-G     VG-VG

Table 9.1: the values of the alternatives on the six quality criteria (U: unacceptable, A: acceptable, G: good, VG: very good, E: excellent)

the alternative to the profiles, it may happen that the two values do not coincide
(see more details in Appendix A). When the user of the model is not able to
choose one of the two evaluations, this can be a problem in a hierarchical
aggregation, since at the next aggregation the subnodes may have evaluations
expressed on an interval. This is a typical case of endogenous uncertainty created
by the method itself and not by the available information. The client was keen to consider the

pessimistic and optimistic evaluation as bounds of the real value, but there was

no uncertainty distribution on the interval. For this purpose, the following pro-

cedure was adopted. Two distinct aggregations were made, one where the lower

values were used and the other where the upper values were used. Each of these,

in turn, may produce a lower value and an upper value. At the next aggregation

step, the lowest of the two lower values and the highest of the two upper values are

used. This is a cautious attitude and has the drawback of widening the intervals

as the aggregation goes up the hierarchy. However, this effect did not occur here

and the final result for the six dimensions is represented in table 9.1 (from here

on we will represent the criteria by Ci and the alternatives by Oi).
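The cautious interval procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `propagate` and `toy_aggregate` are hypothetical, and `toy_aggregate` is a stand-in for the actual ELECTRE-TRI type procedure, which is not reproduced here.

```python
from typing import Callable, Sequence, Tuple

# Ordinal scale used in the tables, from worst to best.
SCALE = ["U", "A", "G", "VG", "E"]
RANK = {grade: i for i, grade in enumerate(SCALE)}

Interval = Tuple[str, str]  # (pessimistic grade, optimistic grade)


def propagate(children: Sequence[Interval],
              aggregate: Callable[[Sequence[str]], Interval]) -> Interval:
    """Aggregate interval-valued subnodes into one interval on the parent node."""
    # First aggregation: every subnode contributes its lower value.
    low_run = aggregate([lo for lo, _ in children])
    # Second aggregation: every subnode contributes its upper value.
    high_run = aggregate([hi for _, hi in children])
    # Cautious combination: lowest of the two lower values,
    # highest of the two upper values.
    lower = min(low_run[0], high_run[0], key=RANK.__getitem__)
    upper = max(low_run[1], high_run[1], key=RANK.__getitem__)
    return lower, upper


def toy_aggregate(grades: Sequence[str]) -> Interval:
    """Hypothetical aggregator (assumption): returns (worst grade, median grade)."""
    ordered = sorted(grades, key=RANK.__getitem__)
    return ordered[0], ordered[len(ordered) // 2]


result = propagate([("A", "G"), ("G", "G"), ("G", "VG")], toy_aggregate)
print(result)
```

Because the combined interval always contains both intermediate intervals, it can only stay the same or widen as the aggregation moves up the hierarchy, which is the drawback mentioned in the text.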

theoretical problem that deserves future consideration (to our knowledge, very little
literature on the subject is available; see Roubens and Vincke 1985, Vincke 1988,
Pirlot and Vincke 1997, Tsoukias and Vincke 1999).

Another modification introduced in the aggregation procedure concerned the

use of the veto concept. As already mentioned, a strong veto concept was used

in the evaluation model such that the presence of an unacceptable value on any

node (among the ones endowed with such veto power) could result in a global

unacceptable value. However, during the evaluation of the offers, weaker concepts
of veto appeared necessary. The idea was that certain values could have a
limiting effect of the type: "if an offer has the value x on a subnode then it
cannot be more than y on the parent node".
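Such a weak veto acts as a ceiling on the parent value. A minimal sketch follows; the function name `apply_limitation` and its parameters are hypothetical, and the rule is read literally (the ceiling applies only when the subnode value is exactly x).

```python
# Ordinal scale used in the tables, from worst to best.
SCALE = ["U", "A", "G", "VG", "E"]
RANK = {grade: i for i, grade in enumerate(SCALE)}


def apply_limitation(parent: str, child: str, trigger: str, ceiling: str) -> str:
    """If the subnode has the value `trigger`, the parent cannot exceed `ceiling`."""
    if child == trigger and RANK[parent] > RANK[ceiling]:
        return ceiling
    return parent


# A subnode valued A caps the parent at G; other subnode values leave it untouched.
capped = apply_limitation("VG", "A", trigger="A", ceiling="G")
untouched = apply_limitation("VG", "G", trigger="A", ceiling="G")
```

Unlike the strong veto, which forces the global value to unacceptable, this rule only lowers values that exceed the ceiling.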

The results on node 7 concerning the performances of the prototypes are presented
in table 9.2. Remember that such a result is an ordinal scale obtained by
aggregating the four scales defined as explained in the previous section. Therefore,
it could be considered more as a ranking than as an absolute evaluation. For this
reason the team of analysts decided to use such an attribute only to rank the
different offers after their sorting obtained by using the six quality attributes. For
this purpose the team of analysts tested three different aggregation scenarios
corresponding to three different hypotheses about the importance of the performance
attribute.

      O1     O2     O3     O4     O5     O6
C7    A-A    G-G    G-G    A-A    E-E    A-A

Table 9.2: the values of the alternatives on the performance criterion (U: unacceptable,
A: acceptable, G: good, VG: very good, E: excellent)

set of six quality attributes. This scenario represents the idea that the tests

on the software performances correspond to the only real or objective

measurement of the offers and it should therefore be viewed as a validation of

the result obtained through the subjective measurement carried out on the

six quality attributes. The aggregation procedure consisted in using the six

quality attributes as criteria equipped with a weak order from which to obtain

a final ranking. Since the evaluations for some of the six attributes were in

the form of an interval, an extended ordinal scale was defined in order to
induce the weak order: E ≻ VG ≻ G-VG ≻ G ≻ A-VG ≻ A-G ≻ A ≻ U.
The importance parameters are w(1.) = 2, w(2.) = 2, w(3.) = 4, w(4.) =
1, w(5.) = 4, w(6.) = 2 and the concordance threshold is 12/15 (0.8). The six
orders are the following (x,y standing for indifference between x and y):
- O5 ≻ O2 ≻ O3 ≻ O4 ≻ O1, O6;
- O2 ≻ O5 ≻ O3 ≻ O4 ≻ O6 ≻ O1;
- O2 ≻ O4 ≻ O3 ≻ O5, O1, O6;
- O2, O4 ≻ O3, O5 ≻ O1, O6;
- O2, O5 ≻ O3, O4 ≻ O1, O6;
- O3 ≻ O2 ≻ O6, O4 ≻ O5 ≻ O1.

The final result is presented in table 9.3. In order to rank the alternatives
a score is computed for each of them. It is the difference between the number
of alternatives to which this specific alternative is preferred and the number
of alternatives preferred to it. Then, the alternatives
are ranked by decreasing value of this score. The final ranking thus

obtained is given in figure 9.2 2a (it is worthwhile noting that the indiffer-

ence obtained in the final ranking corresponds to incomparabilities obtained

in the aggregation step). An intersection was therefore operated with the
ranking induced by the performance criterion; the result is given in figure 9.2 2b.

[Figure 9.2. 2a: O2 → O3, O4, O5 → O6 → O1. 2b: O2 and O5 (incomparable) → O3 → O4 → O6 → O1.]

Figure 9.2: 2a: the final ranking using the six quality criteria. 2b: the final ranking
as intersection of the six quality criteria and the performance criterion
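The intersection of the two rankings can be sketched as follows. The level dictionaries are transcribed from ranking 2a and from table 9.2; the helper `at_least_as_good` and the variable names are hypothetical, introduced only for this illustration.

```python
from itertools import permutations

ALTS = ["O1", "O2", "O3", "O4", "O5", "O6"]

# Level of each alternative in ranking 2a (0 = best); ties share a level.
QUALITY_LEVEL = {"O2": 0, "O3": 1, "O4": 1, "O5": 1, "O6": 2, "O1": 3}
# Levels induced by table 9.2 (E is best, then G, then A).
PERFORMANCE_LEVEL = {"O5": 0, "O2": 1, "O3": 1, "O1": 2, "O4": 2, "O6": 2}


def at_least_as_good(level):
    """Weak order as a set of pairs (x, y) meaning 'x is at least as good as y'."""
    return {(x, y) for x in ALTS for y in ALTS if level[x] <= level[y]}


# A pair survives the intersection only if both rankings agree on it.
both = at_least_as_good(QUALITY_LEVEL) & at_least_as_good(PERFORMANCE_LEVEL)

strictly_better = {(x, y) for (x, y) in both if (y, x) not in both}
incomparable = {frozenset((x, y)) for x, y in permutations(ALTS, 2)
                if (x, y) not in both and (y, x) not in both}

print(sorted(tuple(sorted(pair)) for pair in incomparable))
```

With these data the only incomparable pair is O2 and O5, since each is ranked above the other in exactly one of the two rankings; every other pair is strictly ordered.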

2. The performance attribute is considered to be of secondary importance, to
be used in order to distinguish among the alternatives assigned to the same
class using the six quality attributes. In other words, the principal evaluation
was to be considered as the one using the six quality attributes and the
performance evaluation was only a supplement enabling a possible further
distinction. Such an approach resulted in a low confidence evaluation being
awarded to the performance and the undesirability of assigning it high
importance. A lexicographic aggregation has therefore been applied, using the
six quality criteria as in the previous scenario and applying the performance
criterion to the equivalence classes of the global ranking. The final ranking
is O2 ≻ O5 ≻ O3 ≻ O4 ≻ O6 ≻ O1.
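The lexicographic step of the second scenario can be sketched as follows, taking the equivalence classes from ranking 2a and the grades from table 9.2. Variable names are hypothetical.

```python
GRADE_RANK = {"U": 0, "A": 1, "G": 2, "VG": 3, "E": 4}

# Equivalence classes of the six-criteria ranking, best class first (ranking 2a).
quality_classes = [["O2"], ["O3", "O4", "O5"], ["O6"], ["O1"]]
# Performance grades from table 9.2 (lower and upper values coincide there).
performance = {"O1": "A", "O2": "G", "O3": "G", "O4": "A", "O5": "E", "O6": "A"}

final_ranking = []
for equivalence_class in quality_classes:
    # Inside a class, the better performance grade comes first.
    final_ranking.extend(
        sorted(equivalence_class, key=lambda o: -GRADE_RANK[performance[o]])
    )

print(final_ranking)
```

The performance criterion never moves an alternative out of its class; it only orders O5, O3 and O4 inside the single class that contains more than one alternative.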

3. A third approach consisted in considering the seven attributes as seven
criteria to be aggregated to obtain a final ranking, assigning them a reasoned
importance parameter. The idea was that while the client could be interested
in having the absolute evaluation of the offers (result obtainable only
using the six quality attributes) he could also be interested in a ranking of
the alternatives that could help him in the final choice. From this point of
view the absolute evaluations of the six quality attributes were transformed
into rankings as in the first scenario, adding the seventh attribute as
a seventh criterion. The seven weak orders are the following:
- O5 ≻ O2 ≻ O3 ≻ O4 ≻ O1, O6;
- O2 ≻ O5 ≻ O3 ≻ O4 ≻ O6 ≻ O1;
- O2 ≻ O4 ≻ O3 ≻ O5, O1, O6;
- O2, O4 ≻ O3, O5 ≻ O1, O6;
- O2, O5 ≻ O3, O4 ≻ O1, O6;
- O3 ≻ O2 ≻ O6, O4 ≻ O5 ≻ O1;
- O5 ≻ O2, O3 ≻ O4, O6, O1.
The importance parameters are w(1.) = 2, w(2.) = 2, w(3.) = 4, w(4.) =
1, w(5.) = 4, w(6.) = 2, w(7.) = 4 and the concordance threshold is 16/19
(more than 0.8). The final result is reported in table 9.4.
The final ranking is O2 ≻ O5 ≻ O3, O4 ≻ O6 ≻ O1.

      O1   O2   O3   O4   O5   O6
O1    1    0    0    0    0    0
O2    1    1    1    1    1    1
O3    1    0    1    0    0    1
O4    1    0    0    1    0    1
O5    1    0    0    0    1    1
O6    1    0    0    0    0    1

Table 9.3: the outranking relation aggregating the six quality criteria

      O1   O2   O3   O4   O5   O6
O1    1    0    0    0    0    0
O2    1    1    1    1    0    1
O3    1    0    1    0    0    1
O4    1    0    0    1    0    1
O5    1    0    0    0    1    1
O6    1    0    0    0    0    1

Table 9.4: the outranking relation aggregating the seven criteria
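The outranking tables of the first and third scenarios can be recomputed with a short sketch. It makes a simplifying assumption: x outranks y as soon as the importance parameters of the criteria on which x is at least as good as y reach the concordance threshold, ignoring the additional concordance condition and the discordance test of Appendix A. The weak orders are transcribed from the text; the function names are hypothetical.

```python
ALTS = ["O1", "O2", "O3", "O4", "O5", "O6"]

# Weak orders as lists of indifference classes, best class first.
ORDERS = [
    [["O5"], ["O2"], ["O3"], ["O4"], ["O1", "O6"]],
    [["O2"], ["O5"], ["O3"], ["O4"], ["O6"], ["O1"]],
    [["O2"], ["O4"], ["O3"], ["O5", "O1", "O6"]],
    [["O2", "O4"], ["O3", "O5"], ["O1", "O6"]],
    [["O2", "O5"], ["O3", "O4"], ["O1", "O6"]],
    [["O3"], ["O2"], ["O6", "O4"], ["O5"], ["O1"]],
    [["O5"], ["O2", "O3"], ["O4", "O6", "O1"]],  # performance criterion
]
WEIGHTS = [2, 2, 4, 1, 4, 2, 4]


def outranking(orders, weights, threshold):
    """Pairs (x, y) whose concordance (sum of supporting weights) reaches the threshold."""
    levels = [{o: i for i, cls in enumerate(order) for o in cls} for order in orders]
    relation = set()
    for x in ALTS:
        for y in ALTS:
            support = sum(w for lvl, w in zip(levels, weights) if lvl[x] <= lvl[y])
            if support >= threshold:
                relation.add((x, y))
    return relation


def ranking_by_score(relation):
    """Score = number outranked minus number outranking; classes listed best first."""
    groups = {}
    for x in ALTS:
        wins = sum((x, y) in relation for y in ALTS if y != x)
        losses = sum((y, x) in relation for y in ALTS if y != x)
        groups.setdefault(wins - losses, []).append(x)
    return [groups[s] for s in sorted(groups, reverse=True)]


six = outranking(ORDERS[:6], WEIGHTS[:6], 12)   # threshold 12/15
seven = outranking(ORDERS, WEIGHTS, 16)         # threshold 16/19

print(ranking_by_score(six))
print(ranking_by_score(seven))
```

Thresholds are kept as integer numerators (12 out of 15, 16 out of 19) so that no floating-point comparison is involved. With these data the simplified test yields the relations printed in tables 9.3 and 9.4.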

Finally and after some discussions with the client, the third scenario was

adopted and used as the final result. The two basic reasons were:

- while it was meaningful to interpret the ordinal measures for the six quality
attributes as weak orders representing the client's preferences, it was not meaningful
to translate the weak order obtained for the performance attribute as an ordinal
measurement of the offers;
- the first and second scenarios implicitly adopted two extreme positions concerning
the importance of the performance attribute that correspond to two different
philosophies present in the team of analysts, but not to the client's perception of
the problem. The importance parameters and the concordance threshold adopted
in the final version made it possible to define a compromise between these two extreme
positions expressed during the decision aiding process.

In fact the performance criterion is associated with an importance parameter

of 4 which combined with the concordance threshold of 16/19 implies that it is

impossible for an alternative to outrank another if its value on the performance

criterion is worse (and this satisfied the part of the team of analysts that considered

the performance criterion as a critical evaluation of the offers). Giving a regular

importance parameter to the performance criterion avoided the extreme situation

in which all other evaluations could become irrelevant. The final ranking obtained

respects this idea and the outranking table could be understood by all the members

of the team of analysts. As already reported, the client considered the approach

to be useful because every activity was justified. A major concern for people

involved in complex decision processes is to be able to justify their behaviour,

recommendations and decisions towards a director, a superior in the hierarchy of

the company, an inspector, a committee etc. Such a justification applies both to

how a specific result was obtained and to how the whole evaluation was conducted.

In this case, for instance, the choice of the final aggregation was justified by

a specific attitude towards the two basic evaluation points of view: the quality

information and the performance of the prototypes. It was extremely important

for the client to be able to summarise the correspondence between an aggregation

procedure and an operational attitude because it enabled them to better argue

against the possible objections of their client.
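The veto-like effect of the performance criterion mentioned above can be checked by simple arithmetic, assuming that concordance is the sum of the importance parameters of the criteria on which one alternative is at least as good as the other. The symbol $g_7$ for the performance criterion is a notational assumption introduced here.

```latex
\[
\sum_{j=1}^{7} w_j = 2 + 2 + 4 + 1 + 4 + 2 + 4 = 19,
\qquad
\underbrace{19 - 4}_{\text{all criteria except } g_7} = 15 < 16 .
\]
```

An alternative that is worse on the performance criterion can therefore collect at most 15 of the 16 points required by the threshold 16/19, whatever its values on the six quality criteria. Conversely, the performance criterion alone (4 points) is far from sufficient to establish an outranking.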

A final question that arose while the final recommendation was being
elaborated was whether it would be possible to provide a numerical representation
of the values obtained by the offers and of the final ranking. It was soon

clear that the question originated from the will of the final client to be able to

negotiate with the AQ manager on a monetary basis since it was expected that he

would introduce the cost dimension into the final decision.

For this purpose an appendix was included in the final recommendation where

the following was emphasised:

- it is possible to give a numerical representation to both the ordinal measurement
obtained using the six quality attributes and to the final ranking obtained using
the seven criteria, but it is meaningless to use such a numerical representation
in order to establish implicit or explicit trade-offs with a cost criterion;
- it is possible to compare the result with a cost criterion following two possible
approaches:
1.) either induce an ordinal scale from the cost criterion and then, using an
ordinal aggregation procedure, construct a final choice (then the negotiation should
concentrate on defining the importance parameters, the thresholds etc.);
2.) or establish a value function of the client using one of the usual protocols
available in the literature (see also Chapter 6) to obtain the trade-offs between the
quality evaluations, the performance evaluations and the cost criterion (then the
negotiations should concentrate on a value function);
- the team of analysts was also available to conduct this part of the decision aiding
process if the client desired it.

The final client was very satisfied with the final recommendation and was also

able to understand the reply about the numerical representation. He nevertheless

decided to conduct the negotiations with the AQ manager personally and so the

team of analysts terminated its task with the delivery of the final recommendation.

A final consideration is that there certainly was room (but
no time) to experiment with more variants and methods for the aggregation
procedure and the construction of the final recommendation. Valued relations, valued
similarity relations, interval comparisons using extended preference structures,
dynamic assignment of alternatives to classes and other innovative techniques were
considered too new by the client, who already considered the use of an approach
different from the usual grid and weighted sum a revolution (compared with the
company's standards). In their view, the fact of being able to aggregate the ordinal
information available in a correct and meaningful way was more than satisfactory,
as they report in their ex-post remarks: "...pointed out that it was not necessary
to always use ratio scales and weighted sums, as we thought before, but that it was
possible to use judgements and aggregate them...".

9.4 Conclusions

Concluding this chapter we may try to summarise the lessons learned in this real

experience of decision support.

The most important lesson perhaps concerns the process dimension of decision

support. What the client needed was continuous assistance and support during

the decision process (the management of the call for tenders) enabling them to

understand their role, the expected results, and the way to provide a useful
contribution. If the support had been limited to answering the client's demand on how to
define a global evaluation (based on the weighted sum of their notes on the
products) we might have provided them with an excellent multi-attribute value model
that would have been of no interest for their problem. This is not an argument against
multi-attribute value based methods, which in other decision aiding processes can be
extremely useful, but an emphasis on a process based decision aiding activity.

A careful analysis of the problem situation, a consensual problem formulation, a

correct definition of the evaluation model and an understandable and legitimated

final recommendation are the products that we have to provide in a decision aiding

process.

A second lesson learned concerns the ownership of the final recommendation.

By this we want to indicate the fact that the client will be much more confident in

the result and much more ready to apply it if he feels that he owns the result in the

sense that it is a product of his own convictions, values, computations, experience,

simulations and whatever else. Such ownership can be achieved if the client not
only participates in elaborating the parameters of the evaluation model, but
actually builds the model with the help of the analyst (which has been the case in our
experience). Although the specific case may be considered exceptional (due to the
specific dimension of the evaluation model and the double role of the client being
analyst for another client at the same time) we claim that it is always possible to
include the client in the construction of the evaluation model in a way that allows
him to feel responsible and to own the final recommendation. Such ownership
greatly eases the legitimisation of the recommendation since it is not just "the
advice recommended by the experts who do not understand anything". It might be
interesting to notice that a customised implementation of the model on the tools
to which the client is accustomed (as in our case the company spreadsheet) greatly
improves the acceptance and legitimisation of the evaluation model.

A third lesson concerns the key issue of meaningfulness. The construction of

the evaluation model must obey two dimensions of meaningfulness. The first is

a theoretical and conceptual one and refers to the necessity to manipulate the

information in a sound and correct way. The second is a practical one and refers

to the necessity to manipulate the information in a way understandable by the

client and corresponding to his intuitions and concerns. It is possible that these

two dimensions may conflict. However, the evaluation model has to satisfy both

requirements, thus implying a process of adaptation guided by reciprocal learning

for the client and the analyst. The existence of clear and sound theoretical re-

sults for the use of specific preference modelling tools, preference and/or measure

aggregation procedures and other modelling tools definitely helps such a process.

A fourth lesson concerns the importance of the distinction between measures

and preferences. The former refer to observations made on the set of alternatives
either through objective or through subjective measures. The latter refer
to the client's values, are always subjective and depend on the problem situation.
Moving from one to the other might be possible, but is not obvious and has to be
carefully studied. Knowing that a piece of software has n function points, while another

has m function points does not imply any particular preference between them. We

hope that the case study offered an introduction to this problem.

A fifth lesson concerns the definition of the aggregation procedure in the evalu-

ation model. The previous chapters of this book provide enough evidence that uni-

versal methods for aggregating preferences and/or measures do not exist. There-

fore, the aggregation procedures included in an evaluation model are choices that

have to be carefully studied and justified.

A sixth lesson is about uncertainty. Even when the available information is

considered reliable, uncertainty may appear (as in our case). Moreover, uncer-

tainty can appear in a very qualitative way and not necessarily in the form of an

uncertainty distribution. It is necessary to have a large variety of uncertainty rep-

resentation tools in order to include the relevant one in the evaluation model. Last,

but not least, we emphasise the significant number of open theoretical problems

the case study highlights (interval evaluation, ordinal measurement, hesitation

modelling, hierarchical measurement, ordinal value theory etc.).


Appendix A

The basic concepts adopted in the procedure used (based on ELECTRE TRI) are

the following.

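A set $A$ of alternatives $a_i$, $i = 1, \dots, m$.

[...] normalised in the interval $[0, 1]$) is attributed to each criterion $g_j$.

[...] $1, \dots, k$.

[...] $\langle e^h_1 \cdots e^h_n \rangle$, such that if $e^h_j$ belongs to profile $p_h$, $e^{h+1}_j$ cannot belong to profile $p_{h-1}$.

[...] bound of category $c_h$ and the lower bound of category $c_{h+1}$.

[...] as "$x$ is at least as good as $y$".

- $\forall x \in A:\ P_j(x, e^h_j) \Leftrightarrow g_j(x) \succ e^h_j$
- $\forall x \in A:\ P_j(e^h_j, x) \Leftrightarrow g_j(x) \prec e^h_j$
- $\forall x \in A:\ I_j(x, e^h_j) \Leftrightarrow g_j(x) \sim e^h_j$

$\succ$, $\prec$, $\sim$ being induced by the ordinal scale associated to criterion $g_j$.

[...]

where

\[
\forall x \in A,\ y \in P:\quad C(x, y) \Leftrightarrow
\sum_{j \in G^{\pm}} w_j \ge c \ \text{ and } \
\Bigl( \sum_{j \in G^{+}} w_j \ge \sum_{j \in G^{-}} w_j \Bigr)
\]

\[
\forall y \in A,\ x \in P:\quad C(x, y) \Leftrightarrow
\Bigl( \sum_{j \in G^{\pm}} w_j \ge c \ \text{ and } \
\sum_{j \in G^{+}} w_j \ge \sum_{j \in G^{-}} w_j \Bigr)
\ \text{ or } \
\Bigl( \sum_{j \in G^{+}} w_j > \sum_{j \in G^{-}} w_j \Bigr)
\]

[...] $\sum_{j \in G^{-}} w_j \le d$ and $\forall g_j$: not $v_j(x, y)$

where

- $G^{+} = \{ g_j \in G : P_j(x, y) \}$
- $G^{-} = \{ g_j \in G : P_j(y, x) \}$
- $G^{=} = \{ g_j \in G : I_j(x, y) \}$
- $G^{\pm} = G^{+} \cup G^{=}$
- $c$: the concordance threshold, $c \in [0.5, 1]$
- $d$: the discordance threshold, $d \in [0, 1]$
- $v_j(x, y)$: veto, expressed on criterion $g_j$, of $y$ on $x$

2. When the relation $S$ is established, assign any element $a_i$ on the basis of the
following rules.

2.1 pessimistic assignment
- $a_i$ is iteratively compared with $p_t, \dots, p_1$,
- as soon as $s(a_i, p_h)$ is established, assign $a_i$ to category $c_h$.

2.2 optimistic assignment
- $a_i$ is iteratively compared with $p_1, \dots, p_t$,
- as soon as $s(p_h, a_i) \wedge \neg s(a_i, p_h)$ is established, assign $a_i$ to category
$c_{h-1}$.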

The pessimistic procedure finds the profile for which the element is not the

worst. The optimistic procedure finds the profile against which the element

is surely worse. If the optimistic and pessimistic assignments coincide,

then no uncertainty exists for the assignment. Otherwise, an uncertainty

exists and should be considered by the user.

In order to better understand how the procedure works consider the following

example.

[...] equipped with an ordinal scale $A \succ B \succ C \succ D$.

[...] categories: unacceptable (U), acceptable (A) and good (G) ($p_2$ being the
minimum profile for category G, $p_1$ being the minimum profile for category A).

Three alternatives:
$a_1 = \langle D, B, B, B \rangle$, $a_2 = \langle B, C, A, A \rangle$, $a_3 = \langle A, B, B, C \rangle$.

$S = \{(p_2, a_1), (p_2, a_2), (p_2, a_3), (a_2, p_1), (a_3, p_1)\}$. The reader can easily check that
the pessimistic assignment puts alternative $a_1$ in category U and alternatives $a_2$
and $a_3$ in category A, while the optimistic assignment puts all three alternatives
in category A.
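The two assignment rules can be checked on this example with a short sketch. The categories are indexed $c_0$ = U, $c_1$ = A, $c_2$ = G so that $p_h$ is the minimum profile of $c_h$, as in the example. The fallback categories used when no profile triggers an assignment are assumptions of this sketch, as are the function names.

```python
CATEGORIES = ["U", "A", "G"]   # c_0, c_1, c_2
PROFILES = ["p1", "p2"]        # p_h is the minimum profile of category c_h

# Outranking relation S of the example: (x, y) reads "x is at least as good as y".
S = {("p2", "a1"), ("p2", "a2"), ("p2", "a3"), ("a2", "p1"), ("a3", "p1")}


def pessimistic(a):
    """Compare a with p_t, ..., p_1; stop at the first profile that a outranks."""
    for h in range(len(PROFILES), 0, -1):
        if (a, PROFILES[h - 1]) in S:
            return CATEGORIES[h]
    return CATEGORIES[0]       # assumption: no profile outranked -> lowest category


def optimistic(a):
    """Compare a with p_1, ..., p_t; stop at the first profile strictly better than a."""
    for h in range(1, len(PROFILES) + 1):
        p = PROFILES[h - 1]
        if (p, a) in S and (a, p) not in S:
            return CATEGORIES[h - 1]
    return CATEGORIES[-1]      # assumption: no such profile -> highest category


for a in ("a1", "a2", "a3"):
    print(a, pessimistic(a), optimistic(a))
```

Only $a_1$ receives two different assignments (U and A), so it is the only alternative for which the user has to deal with an interval evaluation.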


Appendix B

The complete list of the attributes used in the evaluation model

1 LAND-BASE MANAGEMENT

1.1.1 Graphics type

1.1.2 Graphics engine adequacy

1.1.3 Interface personalisation

1.2 Functionality

1.2.1 Availability

1.2.2 Adequacy

1.2.2.1 Planes analysis functions

1.2.2.2 Topological connectivity functions

1.2.2.3 Graphical rendering functions

1.3 Development environment

1.3.1 Libraries personalisation

1.3.2 Development support tools

1.3.3 Debugging support tools

1.3.4 Code documentation

1.3.4.1 Documentation support tools

1.3.4.2 Code browsing

1.3.5 Documentation Quality

1.3.5.1 Completeness

1.3.5.2 Documentation support type

1.3.5.3 Information retrieval ease

1.3.5.4 Contextual help

1.4 Administration tools

1.4.1 User administration functions

1.4.2 Software configuration management

1.4.3 Performance data collection

1.5 Work flow connection

1.6 Interoperability

1.7 Integration between Land-base products and the Spatial Data Manager

1.7.1 Vectorial data products integration

1.7.2 Descriptive data products integration

1.7.3 Raster data products integration

1.7.4 Digital Terrain Model products integration

1.8 Integration among Land-base products

1.8.1 Interfaces integration

1.8.2 Data sharing


2 GEOMARKETING

2.1.1 Graphics type

2.1.2 Graphics engine adequacy

2.1.3 Interface personalisation

2.2 Functionality

2.2.1 Availability

2.2.2 Adequacy

2.2.2.1 Planes analysis functions

2.2.2.2 Graphical rendering functions

2.3 Development environment

2.3.1 Libraries personalisation

2.3.2 Development support tools

2.3.3 Debugging support tools

2.3.4 Code documentation

2.3.4.1 Documentation support tools

2.3.4.2 Code browsing

2.3.5 Documentation Quality

2.3.5.1 Completeness

2.3.5.2 Documentation support type

2.3.5.3 Information retrieval ease

2.3.5.4 Contextual help

2.4 Administration tools

2.4.1 Software configuration management

2.5 Interoperability

2.6 Integration between Geomarketing products and the Spatial Data Manager

2.6.1 Vectorial data products integration

2.6.2 Descriptive data products integration

2.6.3 Raster data products integration

2.7 Integration among Geomarketing products

2.7.1 Interfaces integration

2.7.2 Data sharing

3.1.1 Graphics type

3.1.2 Graphics engine adequacy

3.1.3 Interface personalisation

3.2 Functionality


3.2.1 Availability

3.2.2 Adequacy

3.2.2.1 Planes analysis functions

3.2.2.2 Topological connectivity functions

3.2.2.3 Graphical rendering functions

3.2.2.4 Network schema creation

3.3 Development environment

3.3.1 Libraries personalisation

3.3.2 Development support tools

3.3.3 Debugging support

3.3.4 Code documentation

3.3.4.1 Documentation support tools

3.3.4.2 Code browsing

3.3.5 Documentation Quality

3.3.5.1 Completeness

3.3.5.2 Documentation support type

3.3.5.3 Information retrieval ease

3.3.5.4 Contextual help

3.4 Administration tools

3.4.1 User administration functions

3.4.2 Software configuration management

3.4.3 Performance data collection

3.5 Work flow connection

3.6 Interoperability

3.7 Integration between this process products and the Spatial Data Manager

3.7.1 Vectorial data products integration

3.7.2 Descriptive data products integration

3.7.3 Raster data products integration

3.7.4 Digital Terrain Model products integration

3.8 Integration among this process products

3.8.1 Interfaces integration

3.8.2 Data sharing

4.1.1 Graphics type

4.1.2 Graphics engine adequacy

4.1.3 Interface personalisation

4.2 Functionality

4.2.1 Availability


4.2.2 Adequacy

4.2.2.1 Planes analysis functions

4.2.2.2 Topological connectivity functions

4.2.2.3 Graphical rendering functions

4.2.2.4 Network schema creation

4.3 Development environment

4.3.1 Libraries personalisation

4.3.2 Development support tools

4.3.3 Debugging support

4.3.4 Code documentation

4.3.4.1 Documentation support tools

4.3.4.2 Code browsing

4.3.5 Documentation Quality

4.3.5.1 Completeness

4.3.5.2 Documentation support type

4.3.5.3 Information retrieval ease

4.3.5.4 Contextual help

4.4 Administration tools

4.4.1 Software configuration management

4.4.2 Performance data collection

4.5 Interoperability

4.6 Integration between this process products and the Spatial Data Manager

4.6.1 Vectorial data products integration

4.6.2 Descriptive data products integration

4.6.3 Raster data products integration

4.7 Integration among this process products

4.7.1 Interfaces integration

4.7.2 Data sharing

5.1.1 Fundamental properties

5.1.2 Transaction typology support

5.1.3 Data / Function association

5.1.4 Client data access libraries

5.2 Basic properties of the Spatial Data Manager

5.2.1 Data model

5.2.2 Data management

5.2.3 Data integration

5.2.4 Spatial operators


5.2.6 Vectorial data continuous management

5.3 Special properties of the Spatial Data Manager

5.3.1 Data sharing constraints

5.3.2 Feature versioning

5.3.3 Feature life-cycle management

5.3.4 Data distribution

5.4 Integration between the Spatial Data Manager and the Data Layer

5.4.1 Server data access libraries

5.4.1.1 Public libraries for feature manipulation

5.4.1.2 Structured Query Language to access descriptive data

5.4.2 Independence from features structure

5.4.3 Integration with Oracle

5.4.4 Integration with Unix and MVS relational databases

5.4.5 Integration with Oracle Designer 2000

5.4.6 Logical scheme import capability

5.4.7 Spatial Data Manager platform

5.5 Data administration tools

5.5.1 Database distribution

5.5.2 Database access control

5.5.3 Backup

6 SOFTWARE QUALITY

6.1 Robustness

6.2 Maturity

6.3 Easiness of installation and maintenance

7 PERFORMANCES

7.2 Data Manager under different operation typology

7.3 Data Manager under different concurrent transactions

7.4 Graphical interfaces performances

10

CONCLUSION

The aim of this book was to provide a critical introduction to a number of for-

mal decision and evaluation methods. By this, we mean a set of explicit and

well-defined rules to collect, assess and process information in order to make rec-

ommendations in decision and/or evaluation processes. Although these methods

may not be entirely formalised, their underlying logic should be explicit contrary

to, say, astrology or graphology. Such methods emanate from many different
disciplines (Political Science, Education Science, Statistics, Economics, Operational
Research, Computer Science, Decision Theory, Engineering, etc.) and are used to

support numerous kinds of decision or evaluation processes. It is not an overstate-

ment to say that nowadays nearly everyone is, implicitly or explicitly, confronted

with such methods.

We briefly summarise below the main methods presented in this book and the

difficulties that have been encountered.

Following a democratic election Mr. X has been elected

As citizens, we hopefully have to cast several kinds of votes. As mentioned in

chapter 2, elections are governed by rules that are very far from being innocuous.

Similar votes may well lead to very different results depending on the rules used

to process them. Such electoral rules contribute towards shaping the entire

political debate in a country and, thus, influence the type of democracy we live in.

Therefore, under a slightly different electoral system, Mr. X might not have been

elected.

Your child has a GPA of 9.54. Therefore we cannot allow him to continue

with this programme

Our early life at school was governed to a large extent by the grades we ob-

tained, the exams we passed or not. It is likely that the present professional life

of many readers is still governed by some type of formal evaluation method that

somehow uses grades (this is clearly the case for most academics). In chapter 3

we saw that a grade, although a very familiar concept, is in fact a complex
evaluation model. Not surprisingly, the aggregation of such evaluations is not

an obvious task. Therefore, the decision made concerning your child might well


have been significantly different depending on the grading policy and/or correction

habits of some teachers, the fact that his exams were corrected late at night or on

the way his various grades were aggregated.

Things are going well since the well-being index in our country rose by

more than 10% over the last three years

Statisticians have elaborated an incredible number of indicators or indices aiming
at capturing many aspects of reality (including the quality of the air we breathe,
the richness of a country, its state of development, etc.) by using numbers. Not

only are our newspapers full of these kinds of figures but they are also routinely

used to make important political or economic decisions. In chapter 4, we saw that

such measures should not be confused with the familiar measurement operations
in Physics. The resulting numbers do not appear to be measured on some

well-defined type of scale. Their properties are sometimes intriguing and they

surely should be manipulated with care. Therefore, claiming that the well-being

index has increased by 10% gives, at best, a very crude indication.

Calculations show that it is not profitable to equip this hospital with a mater-

nity department

The quality of the roads on which we drive, the tariffing of public transporta-

tion, the way our electricity is produced, the safety regulations applied to factories

near our homes, the quality of our social security system, etc., depend on partic-

ular ways of assessing and summarising the costs and the benefits of alternative

projects. Cost-benefit analysis evaluates such projects using money as a yardstick.

This raises many difficulties outside simple cases: how to convert the various

consequences of a complex project into monetary units, how to cope with equity

considerations in the distribution of costs and benefits, how to take the distribution

in time of these consequences into account? In chapter 5 we saw that cost-benefit

analysis can hardly claim to always solve all these difficulties in a satisfactory

manner. Therefore, the apparently objective calculations invoked to refuse the

creation of a maternity department in our hospital, are highly dependent on nu-

merous debatable hypotheses (e.g. the pricing of a number of statistical delivery

incidents due to a longer transportation time for some mothers). It is not unlikely

that other reasonable hypotheses may have led to an opposite decision.

Based on numerous tests it appears that the best buy is car Z

How to take several, generally conflicting, criteria into account when making

a decision? This area, known as Multiple Criteria Decision Making (MCDM)

is the subject of chapter 6. We showed that, in most cases, the analyst has the

choice between several aggregation strategies that could lead to different results.

Furthermore, apparently familiar concepts, like the importance of criteria, are

shown to have little (if any) clear meaning outside a well-defined aggregation

strategy. Each of these strategies requires the assessment of more or less rich

and precise inter-criteria information. Since such assessments shape preference

information as much as they collect it, the comparison of these strategies raises

many problems. Therefore, because each potential buyer has his own preferences

and interests and there are many different and yet reasonable ways to aggregate

them, the very notion of a best buy is highly debatable.
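
To make this point concrete, here is a minimal sketch in Python. The three cars, their scores and the two aggregation rules are invented for illustration and are not taken from chapter 6; the only purpose is to show two reasonable rules disagreeing on the same data.

```python
# Hypothetical scores (0-10, higher is better) of three cars on two criteria.
cars = {"X": (9.0, 3.0), "Y": (2.0, 9.0), "Z": (5.5, 5.5)}


def weighted_sum(scores, w):
    """Weighted average of two scores, with weight w on the first criterion."""
    return w * scores[0] + (1.0 - w) * scores[1]


def best(value):
    """Name of the car with the highest value (first one listed if tied)."""
    return max(cars, key=lambda name: value(cars[name]))


# Strategy 1: weighted average with equal weights.
best_by_average = best(lambda s: weighted_sum(s, 0.5))

# Strategy 2: maximin, i.e. judge each car by its weakest criterion.
best_by_maximin = best(min)

# Scan the weights from 0 to 1 and record which car the average ranks first.
winners_over_all_weights = {
    best(lambda s, w=i / 100: weighted_sum(s, w)) for i in range(101)
}

print(best_by_average, best_by_maximin, sorted(winners_over_all_weights))
```

With these invented figures the equal-weight average designates car X, maximin designates the balanced car Z, and no choice of weights on the grid makes the weighted average rank Z first.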

10.2. WHAT HAVE WE LEARNED? 239

Relax, our new camera will choose the optimal focus for you

Our washing machines, our cameras, our TV sets often take decisions on their

own, e.g. concerning the amount of water or energy to use, the right focus, the

supposedly optimal tuning of channels, the clarity of an image. The decision

modules underlying such automatic decisions were studied in chapter 7. We saw

that they are based on concepts and techniques that are very similar to the ones

examined in chapter 6 and, thus, raise similar problems and questions. Contrary

to the situation in chapter 6 however, they are used in real time without human

intervention after the implementation stage. This raises new difficulties and issues.

Therefore, relying on the automatic decisions taken by the new camera might not

always be your best option.

Given what you told me about your preferences and beliefs, you should not

invest in this project in view of its expected utility

Standard decision analysis techniques (see e.g. Raiffa 1970) are often seen as

synonymous with decision support methods in risky and/or uncertain situations.

Using a real example in electricity production planning, in chapter 8, we showed

why the implementation of these standard techniques may not be as straightfor-

ward as is often believed. Besides possible computational problems, the assessment

and revision of (subjective) probability distributions in highly ambiguous

environments and in situations involving a long period of time is an enormous task.

Alternative tools, such as possibilities, belief functions, fuzzy sets and other

kinds of non-additive uncertainty measures may appear as good contenders al-

though their theoretical basis may be seen as less firm than the one underlying

standard Bayesian analysis. Furthermore, important considerations, like the dy-

namic consistency of choices and the aggregation of consequences over time were

shown to be largely open questions. Therefore, there might be more than one

way to assess preferences and beliefs and to combine them in order to make a

recommendation.
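
The last remark can be illustrated by a small Python sketch. The project, its payoffs, the risk-neutral utility and the set of candidate probabilities are all assumptions made for the example. The second rule judges the project by its worst expected utility over a set of priors, in the spirit of the maxmin model of Gilboa and Schmeidler (1989); it is not a reproduction of the case study of chapter 8.

```python
# Hypothetical project: net payoff (arbitrary money units) in each state.
payoffs = {"success": 100.0, "failure": -80.0}
status_quo = 0.0  # payoff of not investing


def utility(x):
    """Risk-neutral utility, assumed only to keep the sketch short."""
    return x


def expected_utility(p_success):
    """Expected utility of investing when success has probability p_success."""
    return (p_success * utility(payoffs["success"])
            + (1.0 - p_success) * utility(payoffs["failure"]))


# Rule 1: beliefs are summarised by a single subjective probability.
invest_single_prior = expected_utility(0.5) > utility(status_quo)

# Rule 2: the probability is only known to lie in a set of candidates;
# the project is judged by its worst expected utility over that set.
candidate_priors = [0.4, 0.5, 0.6]
worst_case = min(expected_utility(p) for p in candidate_priors)
invest_maxmin = worst_case > utility(status_quo)

print("single prior recommends investing:", invest_single_prior)
print("maxmin over priors recommends investing:", invest_maxmin)
```

Both rules use the same payoffs and broadly the same beliefs, yet only the first one recommends investing.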

Whether we like it or not, it seems difficult nowadays to escape from formal

decision and evaluation methods. We may ignore them. The authors of this book

believe that it may be interesting and profitable to give them a closer look. The

real case-study presented in chapter 9 has shown that their proper use can have a

significant impact on real complex decision or evaluation processes.

Although the methods examined in this book are apparently very different and

emanate from various disciplines, they appear to have a lot in common. This

should not be much of a surprise since these methods have the common objective of

providing recommendations in complex decision and evaluation processes. What

might be slightly more surprising is that most of these methods and tools are

plagued with many difficulties. Let us try to summarise the main findings and

problems encountered in the preceding chapters here.

240 CHAPTER 10. CONCLUSION

decision/evaluation processes. Using them rarely amounts to solving a

well-defined formal problem. Their usefulness not only depends on their

intrinsic formal qualities but also on the quality of their implementation

(structuration of the problem, communication with actors involved in

the process, transparency of the model, etc.). Having a sound theoretical

basis is therefore a necessary but insufficient condition for their

usefulness (see chapter 9).

The objective of these models may be different from recommending the

choice of a best course of action. More complex recommendations,

e.g. ranking the possible courses of action or comparing them to stan-

dards, are also frequently needed (see chapters 3, 4, 6 and 7). Moreover,

the usefulness of such models is not limited to the elaboration of sev-

eral types of recommendations. When properly used, they may provide

support at all steps of a decision process (see chapter 9).

Collecting data

All models imply collecting and assessing data of various types and

qualities and manipulating these data in order to derive conclusions that

will hopefully be useful in a decision or evaluation process. This more or

less inevitably implies building evaluation models trying to capture

aspects of reality that are difficult to define with great precision (see

chapters 3, 4, 6 and 9).

The numbers resulting from such evaluation models often appear as

constructs that are the result of multiple options. The choice between

these various possible options is only partly guided by scientific considerations.

These numbers should not be confused with numbers

resulting from classical measurement operations in Physics. They are

measured on scales that are difficult to characterise properly. Further-

more, they are often plagued with imprecision, ambiguity and/or un-

certainty. Therefore, more often than not, these numbers seem, at best,

to give an order of magnitude of what is intended to be captured (see

chapters 3, 4, 6 and 8).

The properties of the numbers manipulated in such models should be

examined with care; using numbers may only be a matter of con-

venience and does not imply that any operation can be meaningfully

performed on them (see chapters 3, 4, 6 and 7).

The use of evaluation models greatly contributes to shaping and trans-

forming the reality that we would like to measure. Implementing a

decision/evaluation model only rarely implies capturing aspects of re-

ality that can be considered as independent of the model (see chapters

6 and 9).
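
The remark that using numbers does not make every operation meaningful can be sketched as follows. The five-level scale, the two numerical codings and the assessments of the alternatives p and q are hypothetical. Both codings respect the order of the levels, which is all that an ordinal scale conveys, yet the comparison of arithmetic means depends on which coding is chosen.

```python
from statistics import mean

levels = ["poor", "fair", "good", "very good", "excellent"]

# Two numerical codings of the same ordinal scale; both preserve the order.
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
coding_b = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 10}

# Hypothetical assessments of two alternatives on two criteria.
assessments = {
    "p": ["poor", "excellent"],
    "q": ["good", "very good"],
}


def best_by_mean(coding):
    """Alternative with the highest arithmetic mean under a given coding."""
    return max(
        assessments,
        key=lambda alt: mean([coding[level] for level in assessments[alt]]),
    )


print("coding a favours", best_by_mean(coding_a))
print("coding b favours", best_by_mean(coding_b))
```

Under the first coding q has the higher mean (3.5 against 3), under the second p has (5.5 against 3.5), although nothing in the underlying judgements has changed.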

Aggregating evaluations


ing an easy task. Although many aggregation models amount to sum-

marising these numbers into a single one, this is not the only possible

aggregation strategy (see chapters 3, 4, 5 and 6).

The pervasive use of simple tools such as weighted averages may lead to

disappointing and/or unwanted results. The use of weighted averages

should in fact be restricted to rather specific situations that are seldom

met in practice.

Devising an aggregation technique is not an easy task. Apparently

reasonable principles can lead to a model with poor properties. A formal

analysis of such models may therefore prove of utmost importance (see

chapters 2, 4 and 6).

Aggregation techniques often call for the introduction of preference

information. The type of aggregation model that is used greatly con-

tributes to shaping this information. Assessment techniques, therefore,

not only collect but shape and/or create preference information (see

chapter 6).

Many different tools can be envisaged to model the preferences of an

actor in a decision/evaluation process (see chapters 2 and 6).

Intuitive preference information, e.g. concerning the relative importance

of several points of view, may be difficult to interpret within a well-

defined aggregation model (see chapter 6).

the model should explicitly deal with imprecision, uncertainty and in-

accurate determination. Modelling all these elements into the classical

framework of Decision Theory using probabilities may not always lead

to an adequate model. It is not easy to create an alternative framework

in which problems such as dynamic consistency or respect of (first or-

der) stochastic dominance are dealt with in a satisfactory manner (see

chapters 6 and 8).

Deriving robust conclusions on the basis of such aggregation models

requires a lot of work and care. The search for robust conclusions may

imply analyses much more complex than simple sensitivity analyses

varying one parameter at a time in order to test the stability of a

solution (see chapters 6 and 8).
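
The last point, on robustness, can be illustrated with a short Python sketch. The two alternatives, their scores, the nominal weights and the imprecision delta are invented for the example, and a weighted sum is used only because it is the simplest aggregation model to write down.

```python
from itertools import product

# Hypothetical scores of two alternatives on three criteria (higher is better).
scores = {"a": (10.0, 5.0, 5.0), "b": (0.0, 10.0, 10.0)}
nominal_weights = (0.4, 0.3, 0.3)
delta = 0.08  # assumed imprecision on each weight


def best(weights):
    """Alternative with the highest weighted sum.

    Weights are not renormalised: dividing all of them by their (positive)
    sum would not change which alternative comes first.
    """
    return max(
        scores,
        key=lambda alt: sum(w * s for w, s in zip(weights, scores[alt])),
    )


def perturbed(shifts):
    """Nominal weights moved by the given shifts."""
    return tuple(w + d for w, d in zip(nominal_weights, shifts))


# One-at-a-time sensitivity analysis: move a single weight by +/- delta.
one_at_a_time = [
    best(perturbed([d if j == i else 0.0 for j in range(3)]))
    for i in range(3)
    for d in (-delta, delta)
]

# Joint analysis: every weight may move by -delta, 0 or +delta at once.
joint = [
    best(perturbed(shifts))
    for shifts in product((-delta, 0.0, delta), repeat=3)
]

stable_one_at_a_time = set(one_at_a_time) == {"a"}
stable_jointly = set(joint) == {"a"}
print(stable_one_at_a_time, stable_jointly)
```

In this example alternative a stays first whenever a single weight is moved, but loses to b when the first weight decreases while the other two increase, so a one-parameter-at-a-time analysis would overstate the stability of the recommendation.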

We saw that the methods reviewed in chapters 2 to 8 are far from being without

problems. Indeed these chapters can be seen as a collection of the defects of these

methods. Some readers may think that, faced with such evidence, this type of

method should be abandoned and that intuition or expertise are not likely to

do much worse, at lower cost and with less effort. In our opinion, this would be a

totally unwarranted conclusion. It is the firm belief and conviction of the authors


that the use of formal decision and evaluation tools is both inevitable and useful.

Three main arguments can be proposed to support this claim.

First, it should not be forgotten that formal tools lend themselves more easily

to criticism and close examination than other kinds of tools. However, whenever

intuition or expertise has been subjected to close scrutiny, it has almost

always been shown that such types of judgements are based on heuristics that are

likely to neglect important aspects of the situation and/or are affected by many

biases (see the syntheses of Kahneman, Slovic and Tversky 1981, Bazerman 1990,

Russo and Schoemaker 1989, Hogarth 1987, Poulton 1994, Thaler 1991).

Second, formal methods have a number of advantages that often prove crucial

in complex organisational and/or social processes:

process by offering them a common language;

concentrating efforts on crucial matters. Thus, formal methods are often

indispensable structuration instruments.

ration capabilities are crucial in order to devise robust recommendations.

Although these advantages may have little weight compared to the obvious drawbacks

of formal methods in terms of effort involved, money and time consumed

in some situations (e.g. a very simple decision/evaluation process involving a single

actor), they appear fundamental to us in most social or organisational

processes (see chapter 9).

Third, casual observation suggests that there is an increasing demand for such

tools in various domains (going from executive information systems, decision sup-

port systems and expert systems to standardised evaluation tests and impact stud-

ies). It is our belief that the introduction of such tools may have quite a beneficial

impact in many areas in which they are not commonly used. Although many companies

use tools such as graphology and/or astrology in order to select among

applicants for a given position, we are more than inclined to say that the use of

more formal methods could improve such selection processes (not least on issues

such as fairness and equity) in a significant way. Similarly, the introduction of

more formal evaluation tools in the evaluation of public policies, laws and regu-

lations (e.g. policy against crime and drugs, policy towards the carrying of guns,

fiscal policy, the establishment of environmental standards, etc.), an area in which

they are strikingly absent in many countries, would surely contribute to a more

transparent and effective government.

We would thus answer a clear and definite yes to the question of whether

formal decision and evaluation tools are useful.

10.3. WHAT CAN BE EXPECTED? 243

Our plea for the introduction of more formal decision and evaluation tools may

appear paradoxical in view of the content of this book. Have we been overly

critical then? Certainly not. Our willingness to keep mathematics and formalism

to the lowest possible level has not allowed us to explore many technical details

and difficulties. Indeed, a thorough critical examination of each of the methods

covered in chapters 2 to 8 could be the subject of an entire book.

The paradox between our conviction in the usefulness of formal methods and

the content of this book is only apparent and results from a misunderstanding. The

fact that many decision and evaluation tools are plagued with serious difficulties is

troublesome. It should not be unexpected however, unless one believes that there

is a single best way to provide support in each type of decision or evaluation

process. We doubt that this is a reasonable belief. Indeed, the very way in which

a good formal decision/evaluation method is defined is anything but clear. Two

main, non-exclusive, paths have often been suggested for this purpose. Neither of

them appears totally convincing to us.

the engineering route that amounts to saying that a method is good because

it works, i.e. has been applied several times in real-world problems and

has been well accepted by the actors in the process. Although we would

definitely not favour a method that would be unable to pass such a test, we

doubt that the engineering argument is sufficient to define what would dis-

tinguish good formal decision or evaluation methods. First, it is important

to remember that the quality of the support provided by a formal tool is

very difficult to separate from considerations linked to the implementation of

the method. As should be apparent from chapter 9, the formal tools used

by an analyst are implemented in decision or evaluation processes that may

be highly complex (involving many different actors, lasting a long time and

being governed by complex rules and/or regulations). The resulting deci-

sion/evaluation aid process is therefore conditioned by many factors outside

the realm of a formal method: the quality of the structuration of the prob-

lem, of communication with stakeholders, the availability of user-friendly

software, the timing and costs of the study, etc. are elements of utmost

importance in the quality of a decision/evaluation aid process. Supporting

a decision or an evaluation process should not be confused with solving

a well-defined formal problem. Although it may make sense to associate with

such a problem a good method for solving it, supporting real decision

and evaluation processes should not be confused with this formal exercise.

Second, in practice, it is often difficult to know whether the proposed

model worked or not. Even when the final decision is at variance with

the recommendations derived from the model, the very presence of analysts,

the questions they raised, the type of reasoning they have promoted could

have had a significant impact on the decision process. Should we say then

that the method has worked or not?

A close variant of the engineering route could be called the naive route.


to good decisions. The literature on decision (see Raiffa 1970, Russo

and Schoemaker 1989, Keeney, Hammond and Raiffa 1999), however, has

always insisted on the fact that good decisions do not necessarily lead to

good outcomes. This literature shows that it is very difficult to define what

would constitute a good decision a priori (good in which state of nature?

good for whom? good according to what criteria? at what moment in time?

etc.) and that the essential idea is to promote a good decision process.

is backed by a sound theory of rational choice. Although we find theories

most useful, the criteria for separating sound from unsound theories of ratio-

nal choice do not appear obvious to us. A striking example of this difficulty

can be found in the area of decision under risk and uncertainty. While, until

the beginning of the eighties, expected utility theory was considered almost

unanimously as the rational theory of choice under risk, the proliferation

of alternative theories since then (see e.g. Dubois, Fargier and Prade 1997,

Fishburn 1988, Gilboa and Schmeidler 1989, Jaffray 1988, Jaffray 1989, Kah-

neman and Tversky 1979, Loomes and Sugden 1982, Machina 1982, Quiggin

1982, Schmeidler 1989, Wakker 1989, Yaari 1987) fostered by the results

of numerous empirical experiments (see e.g. Allais 1953, Hershey, Kun-

reuther and Schoemaker 1982, Johnson and Schkade 1989, Kahneman and

Tversky 1979, McCord and de Neufville 1982, McCrimmon and Larsson

1979) presently results in a very complex situation in which it is not easy

to discriminate between theories either from an empirical (see e.g. Abdellaoui

and Munier 1994, Carbone and Hey 1995, Harless and Camerer 1994, Hey

and Orme 1994, Sopher and Gigliotti 1993) or a normative point of view

(see e.g. Hammond 1988, Machina 1989, McClennen 1990, Nau 1995, Nau

and McCardle 1991). This is true even though most, if not all, of these

theories have been axiomatically characterised (i.e. a set of conditions is

known that completely characterises the proposed choice or evaluation mod-

els). Having axioms is certainly useful in order to compare theories but the

rational content of the axioms and their interpretation remain much de-

bated. Furthermore, the relation between the formal axiomatic theory and

the assessment technologies derived from it is far from being obvious (see

e.g. Bouyssou 1984).
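
To see why experiments such as the one of Allais (1953) put expected utility theory under pressure, consider the following sketch in Python. The probabilities are those of the classical experiment; the prizes (0, 1 and 5 million) follow the usual textbook presentation, and the four utility functions are arbitrary choices made for illustration. Many subjects prefer A to B and D to C, whereas the code checks that, whatever the utility function, expected utility gives the two comparisons the same sign.

```python
import math

# Lotteries written as {prize in millions: probability}.
A = {1: 1.00}
B = {5: 0.10, 1: 0.89, 0: 0.01}
C = {1: 0.11, 0: 0.89}
D = {5: 0.10, 0: 0.90}


def expected_utility(lottery, u):
    """Expected utility of a lottery for the utility function u."""
    return sum(p * u(x) for x, p in lottery.items())


def gaps(u):
    """Return (EU(A) - EU(B), EU(C) - EU(D)) for the utility function u."""
    return (
        expected_utility(A, u) - expected_utility(B, u),
        expected_utility(C, u) - expected_utility(D, u),
    )


# A few increasing utility functions, chosen only for illustration.
utilities = {
    "linear": lambda x: x,
    "square root": math.sqrt,
    "logarithmic": lambda x: math.log(1.0 + x),
    "exponential": lambda x: 1.0 - math.exp(-x),
}

for name, u in utilities.items():
    gap_ab, gap_cd = gaps(u)
    # Both gaps equal 0.11 u(1) - 0.10 u(5) - 0.01 u(0).
    print(f"{name:12s} EU(A)-EU(B) = {gap_ab:+.4f}   EU(C)-EU(D) = {gap_cd:+.4f}")
```

Since the two differences coincide algebraically, no utility function can make expected utility prefer A to B and, at the same time, D to C.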

Analysts implementing formal decision and evaluation tools are in a position sim-

ilar to that of an engineer. Contrary to most engineers, however, these decision

engineers often lack clear criteria for appreciating the success or failure of

their models.

At this point it should be apparent that research on formal decision and evalu-

ation methods should not be guided by the hope of discovering models that would

be ideal under certain types of circumstances. Can something be done then? In

view of the many difficulties encountered with the models envisaged in this book

and the many fields in which no formal decision and evaluation tools are used, we

do think that this area will be rich and fertile for future research.


Freed from the idea that we will discover the method, we can, more modestly

and more realistically, expect to move towards:

and evaluation models in complex and conflictual decision processes;

flexible preference models able to cope with data of poor or unknown quality,

conflicting or lacking information;

assessment protocols and technologies able to cope with complex and unsta-

ble preferences, uncertain trade-offs, hesitation and learning;

tools for comparing aggregation models in order to know what they have

in common and whether one is likely to be more appropriate in view of the

quality of the data;

explicit involvement and participation of all stakeholders, flexible preference mod-

els tolerating hesitations and contradictions, flexible tools for modelling impreci-

sion and uncertainty, evaluation models fully taking incommensurable dimensions

into account in a meaningful way, assessment technologies incorporating framing

effects and learning processes, exploration techniques allowing robust

recommendations to be built (see Bouyssou et al. 1993). Thus, thanks to rigorous

concepts, well-formulated models, precise calculations and axiomatic considerations,

we should be able to clarify decisions by separating what is objective from what

is less objective, by separating strong conclusions from weaker ones, by dissipat-

ing certain forms of misunderstanding in communication, by avoiding the trap

of illusory reasoning, by bringing out certain counter-intuitive results (Roy and

Bouyssou 1991).

This utopia calls for a vast research programme requiring many different

types of research (axiomatic analyses of models, experimental studies of models,

clinical analyses of decision/evaluation processes, conceptual reflections on the

notions of rationality and performance, production of new pieces of software,

etc.).

The authors are preparing another book that will hopefully contribute to this

research programme. It will cover the main topics that we believe to be useful in

order to successfully implement formal decision/evaluation models in real-world

processes:

aggregation models,


If we managed to convince you that formal decision and evaluation models are an

important topic and that the hope of discovering ideal methods is somewhat

chimerical, it is not unlikely that you will find the next book valuable.

Bibliography

[1] Abbas, M., Pirlot, M. and Vincke, Ph. (1996). Preference structures and

co-comparability graphs, Journal of Multicriteria Decision Analysis 5: 81–98.

[2] Abdellaoui, M. and Munier, B. (1994). The closing in method: An ex-

perimental tool to investigate individual choice patterns under risk, in

B. Munier and M.J. Machina (eds), Models and experiments in risk and

rationality, Kluwer, Dordrecht, pp. 141–155.

[3] Adler, H.A. (1987). Economic appraisal of transport projects: A manual with

case studies, Johns Hopkins University Press for the World Bank, Balti-

more.

[4] Airasian, P.W. (1991). Classroom assessment, McGraw-Hill, New York.

[5] Allais, M. and Hagen, O. (eds) (1979). Expected utility hypotheses and the

Allais paradox, D. Reidel, Dordrecht.

[6] Allais, M. (1953). Le comportement de l'homme rationnel devant le risque :

Critique des postulats et axiomes de l'école américaine, Econometrica

21: 503–546.

[7] Armstrong, W.E. (1939). The determinateness of the utility function, The

Economic Journal 49: 453–467.

[8] Arrow, K.J. and Raynaud, H. (1986). Social choice and multicriterion

decision-making, MIT Press, Cambridge.

[9] Arrow, K.J. (1963). Social choice and individual values, 2nd edn, Wiley, New

York.

[10] Atkinson, A.B. (1970). On the measurement of inequality, Journal of

Economic Theory 2: 244–263.

[11] Baldwin, J.F. (1979). A new approach to approximate reasoning using a fuzzy

logic, Fuzzy Sets and Systems 2: 309–325.

[12] Balinski, M.L. and Young, H.P. (1982). Fair representation, Yale University

Press, New Haven.

[13] Bana e Costa, C.A., Ensslin, L., Correa, E.C. and Vansnick, J.-C. (1999).

Decision support systems in action: Integrated application in a multi-

criteria decision aid process, European Journal of Operational Research

113: 315–335.

[14] Barbera, S., Hammond, P. and Seidl, C. (eds) (1998). Handbook of utility

theory, Vol. 1: Principles, Kluwer, Dordrecht.


[15] Bartels, R.H., Beatty, J.C. and Barsky, B.H. (1987). An introduction

to splines for use in computer graphics and geometric modeling, Morgan

Kaufmann, Los Altos.

[16] Barzilai, J., Cook, W.D. and Golany, B. (1987). Consistent weights for

judgments matrices of the relative importance of alternatives, Operations Research

Letters 6: 131–134.

[17] Bazerman, M.H. (1990). Judgment in managerial decision making, Wiley,

New York.

[18] Bell, D., Raiffa, H. and Tversky, A. (eds) (1988). Decision making: Descrip-

tive, normative and prescriptive interactions, Cambridge University Press,

Cambridge.

[19] Belton, V., Ackermann, F. and Shepherd, I. (1997). Integrated support

from problem structuring through alternative evaluation using COPE and

VISA, Journal of Multi-Criteria Decision Analysis 6: 115–130.

[20] Belton, V. and Gear, A.E. (1983). On a shortcoming of Saaty's analytic

hierarchies, Omega 11: 228–230.

[21] Belton, V. (1986). A comparison of the analytic hierarchy process and a simple

multi-attribute value function, European Journal of Operational Research

26: 7–21.

[22] Bereau, M. and Dubuisson, B. (1991). A fuzzy extended k-nearest neighbor

rule, Fuzzy Sets and Systems 44: 17–32.

[23] Bernoulli, D. (1954). Specimen theoriae novae de mensura sortis,

Commentarii Academiae Scientiarum Imperialis Petropolitanae (5, 175–192, 1738),

Econometrica 22: 23–36. Translated by L. Sommer.

[24] Bezdek, J., Chuah, S.K. and Leep, D. (1986). Generalised k-nearest neighbor

rules, Fuzzy Sets and Systems 18: 237–256.

[25] Blin, M.-J. and Tsoukias, A. (1998). Multicriteria methodology contribution

to the software quality evaluation, Technical report, Cahier du LAMSADE

No 155, Universite Paris-Dauphine, Paris.

[26] Boardman, A. (1996). Cost benefit analysis: Concepts and practices, Prentice-

Hall, New-York.

[27] Boiteux, M. (1994). Transports : Pour un meilleur choix des investissements,

La Documentation Française, Paris.

[28] Bonboir, A. (1972). La docimologie, PUF, Paris.

[29] Borda, J.-Ch. (1781). Mémoire sur les élections au scrutin, Comptes Rendus

de l'Académie des Sciences. Translated by Alfred de Grazia as Mathematical

derivation of an election system, Isis, Vol. 44, pp. 42–51.

[30] Bouchon, B. (1995). La logique floue et ses applications, Addison Wesley, New

York.


in D. D. J. Bezdek and H. Prade (eds), Fuzzy sets in approximate reason-

ing and information systems, Vol. 3 of Handbook of Fuzzy Sets, Kluwer,

Dordrecht, chapter 4, pp. 279–304.

[32] Bouyssou, D., Perny, P., Pirlot, M., Tsoukias, A. and Vincke, Ph. (1993).

A manifesto for the new MCDM era, Journal of Multi-Criteria Decision

Analysis 2: 125–127.

[33] Bouyssou, D. and Perny, P. (1992). Ranking methods for valued preference

relations: A characterization of a method based on entering and leaving

flows, European Journal of Operational Research 61: 186–194.

[34] Bouyssou, D. and Pirlot, M. (1997). Choosing and ranking on the basis of

fuzzy preference relations with the Min in Favor, in G. Fandel and T. Gal

(eds), Multiple criteria decision making: Proceedings of the twelfth international

conference, Hagen, Germany, Springer Verlag, Berlin, pp. 115–127.

[35] Bouyssou, D. and Vansnick, J.-C. (1986). Noncompensatory and generalized

noncompensatory preference structures, Theory and Decision 21: 251–266.

[36] Bouyssou, D. (1984). Decision-aid and expected utility theory: A critical

survey, in O. Hagen and F. Wenstøp (eds), Progress in utility and risk

theory, Kluwer, Dordrecht, pp. 181–216.

[37] Bouyssou, D. (1986). Some remarks on the notion of compensation in MCDM,

European Journal of Operational Research 26: 150–160.

[38] Bouyssou, D. (1990). Building criteria: A prerequisite for MCDA, in

C.A. Bana e Costa (ed.), Readings in multiple criteria decision aid,

Springer Verlag, Berlin, pp. 58–80.

[39] Bouyssou, D. (1992). On some properties of outranking relations based on

a concordance-discordance principle, in A. Goicoechea, L. Duckstein and

S. Zionts (eds), Multiple criteria decision making, Springer-Verlag, Berlin,

pp. 93–106.

[40] Bouyssou, D. (1996). Outranking relations: Do they have special properties?,

Journal of Multi-Criteria Decision Analysis 5: 99–111.

[41] Brams, S.J. and Fishburn, P.C. (1982). Approval voting, Birkhauser, Basel.

[42] Brans, J.-P. and Vincke, Ph. (1985). A preference ranking organization

method, Management Science 31: 647–656.

[43] Brekke, K.A. (1997). The numeraire matters in cost-benefit analysis, Journal

of Public Economics 64: 117–123.

[44] Brent, R.J. (1984). Use of distributional weights in cost-benefit analysis: A

survey of schools, Public Finance Quarterly 12: 213–230.

[45] Brent, R.J. (1996). Applied cost-benefit analysis, Elgar, Aldershot, Hants.

[46] Broome, J. (1985). The economic value of life, Economica 52: 281–294.


[47] Carbone, E. and Hey, J.D. (1995). A comparison of the estimates of expected

utility and non-expected utility preference functionals, Geneva Papers on

Risk and Insurance Theory 20: 111–133.

[48] Cardinet, J. (1986). Evaluation scolaire et mesure, De Boeck, Brussels.

[49] Chatel, E. (1994). Qu'est-ce qu'une note : recherche sur la pluralité des

modes d'éducation et d'évaluation, Les Dossiers d'Éducation et

Formations 47: 183–203.

[50] Checkland, P. (1981). Systems thinking, systems practice, Wiley, New York.

[51] Condorcet, M.J.A.N.C., marquis de. (1785). Essai sur l'application de

l'analyse à la probabilité des décisions rendues à la pluralité des voix,

Imprimerie Royale, Paris.

[52] Cover, T. M. and Hart, P. E. (1967). Nearest neighbor pattern classification,

IEEE Transactions on Information Theory, IT-13 1: 21–27.

[53] Cross, L.H. (1995). Grading students, Technical Report Series EDO-TM-95-5,

ERIC/AE Digest.

[54] Daellenbach, H.G. (1994). Systems and decision making. A management sci-

ence approach, Wiley, New York.

[55] Dasgupta, P.S., Marglin, S. and Sen, A.K. (1972). Guidelines for project eval-

uation, UNIDO, New York.

[56] Dasgupta, P.S. and Pearce, D.W. (1972). Cost-benefit analysis: Theory and

practice, Macmillan, Basingstoke.

[57] Davis, B.G. (1993). Tools for teaching, Jossey-Bass, San Francisco.

[58] de Jongh, A. (1992). Théorie du mesurage, agrégation des critères et

application au décathlon, Master's thesis, SMG, Université Libre de Bruxelles,

Brussels.

[59] Dekel, E. (1986). An axiomatic characterization of preference under uncer-

tainty: Weakening the independence axiom, Journal of Economic Theory

40: 304–318.

[60] Desrosières, A. (1995). Refléter ou instituer : L'invention des indicateurs

statistiques, Technical Report 129/J310, INSEE, Paris.

[61] de Ketele, J.-M. (1982). La docimologie, Cabay, Louvain-La-Neuve.

[62] de Landsheere, G. (1980). Evaluation continue et examens. Precis de doci-

mologie, Labor-Nathan, Paris.

[63] Dinwiddy, C. and Teal, F. (1996). Principles of cost-benefit analysis for de-

veloping countries, Cambridge University Press, Cambridge.

[64] Dorfman, R. (1996). Why benefit-cost analysis is widely disregarded and what

to do about it?, Interfaces 26: 1–6.

[65] Drèze, J. and Stern, N. (1987). The theory of cost-benefit analysis, in

A.J. Auerbach and M. Feldstein (eds), Handbook of public economics,

Elsevier, Amsterdam, pp. 909–989.


[66] Dubois, D., Fargier, H. and Prade, H. (1997). Decision-making under ordinal

preferences and uncertainty, in D. Geiger and P.P. Shenoy (eds), Proceed-

ings of the 13th conference on uncertainty in artificial intelligence, Morgan

Kaufmann, Los Altos, pp. 157–164.

[67] Dubois, D., Prade, H. and Sabbadin, R. (1998). Qualitative decision theory

with Sugeno integrals, Proceedings of the 14th conference on uncertainty

in artificial intelligence, Morgan Kaufmann, Los Altos, pp. 121–128.

[68] Dubois, D., Prade, H. and Ughetto, L. (1999). Fuzzy logic, control engi-

neering and artificial intelligence, in H.B. Verbruggen, H.J. Zimmermann

and R. Babuska (eds), Fuzzy algorithms for control, Kluwer, Dordrecht,

pp. 17–58.

[69] Dubois, D. and Prade, H. (1987). The mean value of a fuzzy number, Fuzzy

Sets and Systems 24: 279–300.

[70] Dubois, D. and Prade, H. (1988). Possibility theory, Plenum Press, New-York.

[71] Dupuit, J. (1844). De la mesure de l'utilité des travaux publics, Annales des

Ponts et Chaussées (8).

[72] Dyer, J.S. (1990). Remarks on the analytic hierarchy process, Management

Science 36: 249–258.

[73] Ebel, R.L. and Frisbie, D.A. (1991). Essentials of educational measurement,

Prentice-Hall, New-York.

[74] Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms, Quarterly

Journal of Economics 75: 643–669.

[75] Fargier, H. and Perny, P. (1999). Qualitative decision models under

uncertainty without the commensurability hypothesis, in K.B. Laskey and

H. Prade (eds), Proceedings of the 15th conference on uncertainty in

artificial intelligence, Morgan Kaufmann, Los Altos, pp. 188–195.

[76] Farrell, D.M. (1997). Comparing electoral systems, Contemporary Political

Studies, Prentice-Hall, New-York.

[77] Fiammengo, A., Buosi, D., Iob, I., Maffioli, P., Panarotto, G. and Turino, M.

(1997). Bid management of software acquisition for cartography applica-

tions. Presented at AIRO 97 Conference, Aosta.

[78] Fishburn, P.C. and Sarin, R.K. (1991). Dispersive equity and social risk,

Management Science 37: 751–769.

[79] Fishburn, P.C. and Sarin, R.K. (1994). Fairness and social risk I:

Unaggregated analyses, Management Science 40: 1174–1188.

[80] Fishburn, P.C. and Straffin, P.D. (1989). Equity considerations in public risks

evaluation, Operations Research 37: 229–239.

[81] Fishburn, P.C. (1970). Utility theory for decision-making, Wiley, New York.

[82] Fishburn, P.C. (1976). Noncompensatory preferences, Synthese 33: 393–403.

[83] Fishburn, P.C. (1977). Condorcet social choice functions, SIAM Journal on

Applied Mathematics 33: 469–489.


theories, in S. Zionts (ed.), Multicriteria problem solving, Springer Verlag,

Berlin, pp. 181–224.

[85] Fishburn, P.C. (1982). The foundations of expected utility, D. Reidel,

Dordrecht.

[86] Fishburn, P.C. (1984). Equity axioms for public risks, Operations Research

32: 901–908.

[87] Fishburn, P.C. (1988a). Nonlinear preference and utility theory, Johns Hop-

kins University Press, Baltimore.

[88] Fishburn, P.C. (1988b). Normative theories of decision making under risk

and under uncertainty, in M. Kacprzyk and M. Roubens (eds), Non-

conventional preference relations in decision making, Springer Verlag,

Berlin, pp. 469489.

[89] Fishburn, P.C. (1991). Nontransitive preferences in decision theory, Journal

of Risk and Uncertainty 4: 113134.

[90] Fix, E. and Hodges, J.L. (1951). Discriminatory analysis, non-parametric

discrimination: consistency properties, Technical report, USAF Scholl of

aviation and medicine, Randolph Field. 4.

[91] Fodor, J.C. and Roubens, M. (1994). Fuzzy preference modelling and multi-

criteria decision support, Kluwer, Dordrecht.

[92] Folland, S., Goodman, A.C. and Stano, M. (1997). The economics of health

and health care, Prentice-Hall, New-York.

[93] French, S. (1981). Measurement theory and examinations, British Journal of

Mathematical and Statistical Psychology 34: 3849.

[94] French, S. (1993). Decision theory An introduction to the mathematics of

rationality, Ellis Horwood, London.

[95] Gacogne, L. (1997). Elements de logique floue, Hermes, Paris.

[96] Gafni, A. and Birch, S. (1997). Equity considerations in utility-based mea-

sures of health outcomes in economic appraisals: An adjustment algo-

rithm, Journal of Health Economics 10: 329342.

[97] Gehrlein, W.V. (1983). Condorcets paradox, Theory and Decision 15: 161

197.

[98] Gibbard, A. (1973). Manipulation of voting schemes: A general result, Econo-

metrica 41: 587601.

[99] Gilboa, I. and Schmeidler, D. (1989). Maxmin expected utility with a non-

unique prior, Journal of Mathematical Economics 18: 141153.

[100] Gilboa, I. and Schmeidler, D. (1993). Updating ambigous beliefs, Journal of

Economic Theory 59: 3349.

[101] Grabisch, M., Guely, F. and Perny, P. (1997). Evaluation subjective, Les

cahiers du Club CRIN - Association ECRIN, Paris.

BIBLIOGRAPHY 253

cision making, European Journal of Operational Research 89: 445456.

[103] Hammond, P.J. (1988). Consequentialist foundations for expected utility,

Theory and Decision 25: 2578.

[104] Hanley, N. and Spash, C.L. (1993). Cost-benefit analysis and the environ-

ment, Elgar, Adelshot Hants.

[105] Harker, P.T. and Vargas, L.G. (1987). The theory of ratio scale estimation:

Saatys analytic hierarchy process, Management Science 33: 13831403.

[106] Harless, D. and Camerer, C.F. (1994). The utility of generalized expected

utility theories, Econometrica 62: 12511289.

[107] Harvey, C.M. (1992). A slow-discounting model for energy conservation, In-

terfaces 22: 4760.

[108] Harvey, C.M. (1994). The reasonableness of non-constant discounting, Jour-

nal of Public Economics 53: 3151.

[109] Harvey, C.M. (1995). Proportional discounting of future costs and benefits,

Mathematics of Operations Research 20: 381399.

[110] Henriet, L. and Perny, P. (1996). Methodes multicrit res non-compensatoires

pour la classification floue dobjets, Proceedings of LFA96, pp. 915.

[111] Henriet, L. (1995). Probl mes daffectation et methodes de classification,

Memoire du dea 103, Universite Paris Dauphine.

[112] Hershey, J.C., Kunreuther, H.C. and Schoemaker, P.J.H. (1982). Sources of

bias in assessment procedures for utility functions, Management Science

28: 936953.

[113] Heurgon, E. (1982). Relationships between decision making process and

study process in OR interventions, European Journal of Operational Re-

search 10: 230236.

[114] Hey, J.D. and Orme, C. (1994). Investigating generalizations of expected

utility theory using experimental data, Econometrica 62: 12511289.

[115] Hogarth, R. (1987). Judgement and choice: The psychology of decision, Wi-

ley, New York.

[116] Holland, A. (1995). The assumptions of cost-benefit analysis: A philoso-

phers view, in K.G. Willis and J.T. Corkindale (eds), Environmental

valuation: New perspectives, CAB International, Oxford, pp. 2138.

[117] Horn, R.V. (1993). Statistical indicators, Cambridge University Press, Cam-

bridge.

[118] Humphreys, P.C., Svenson, O. and Vari, A. (1993). Analysis and aiding

decision processes, North-Holland, Amsterdam.

[119] IEEE 92 (1992). Standard for a software quality metrics methodology, Tech-

nical report, The Institute of Electrical and Electronics Engineers.

[120] International Atomic Energy Agency (1993). Cost-benefit aspects of food ir-

radiation processing, Bernan Associates, Washington D.C.

254 BIBLIOGRAPHY

tion, quality characteristics and guidelines for their use, Technical report,

ISO, Geneve.

[122] Jacquet-Lagreze, E., Moscarola, J., Roy, B. and Hirsch, G. (1978). Descrip-

tion dun processus de decision, Technical report, Cahier du LAMSADE

No 13, Universite Paris-Dauphine, Paris.

[123] Jacquet-Lagreze, E. and Siskos, J. (1982). Assessing a set of additive utility

functions for multicriteria decision making: The UTA method, European

Journal of Operational Research 10: 151164.

[124] Jacquet-Lagreze, E. (1990). Interactive assessment of preferences using holis-

tic judgments. The PREFCALC system, in C.A. Bana e Costa (ed.), Read-

ings in multiple criteria decision aid, Springer Verlag, Berlin, pp. 335350.

[125] Jaffray, J.-Y. (1988). Choice under risk and the security factor: An axiomatic

model, Theory and Decision 24: 169200.

[126] Jaffray, J.-Y. (1989a). Some experimental findings on decision making un-

der risk and their implications, European Journal of Operational Research

38: 301306.

[127] Jaffray, J.-Y. (1989b). Utility theory for belief functions, Operations Research

Letters 8: 107112.

[128] Johannesson, M. (1995a). A note on the depreciation of the societal perspec-

tive in economic evaluation in health care, Health Policy 33: 5966.

[129] Johannesson, M. (1995b). The relationship between cost-effectiveness anal-

ysis and cost-benefit analysis, Social Science and Medicine 41: 483489.

[130] Johannesson, M. (1996). Theory and methods of economic evaluation of

health care, Kluwer, Dordrecht.

[131] Johansson, P.O. (1993). Cost-benefit analysis of environmental change, Cam-

bridge University Press, Cambridge.

[132] Johnson, E.J. and Schkade, D.A. (1989). Bias in utility assesments: Further

evidence and explanations, Management Science 35: 406424.

[133] Kahneman, D., Slovic, P. and Tversky, A. (1981). Judgement under uncer-

tainty Heuristics and biases, Cambridge University Press, Cambridge.

[134] Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of

decision under risk, Econometrica 47: 263291.

[135] Keeler, E.B. and Cretin, S. (1983). Discounting of life-saving and other non-

monetary effects, Management Science 29: 300306.

[136] Keeney, R.L., Hammond, J.S. and Raiffa, H. (1999). Smart choices: A guide

to making better decisions, Harvard University Press, Boston.

[137] Keeney, R.L. and Raiffa, H. (1976). Decisions with multiple objectives: Pref-

erences and value tradeoffs, Wiley, New York.

BIBLIOGRAPHY 255

[138] Keller, J., Gray, M. and Givens, J. (1985). A fuzzy knearest neighbor

algorithm, IEEE Transactions on Systems Man and Cybernetics. 15: 580

585.

[139] Kelly, J.S. (1991). Social choice bibliography, Social Choice and Welfare

8: 97169.

[140] Kerlinger, F.N. (1986). Foundations of behavioral research, 3rd edn, Holt,

Rinehart and Winston, New York.

[141] Kirkpatrick, C. and Weiss, J. (1996). Cost-benefit analysis and project ap-

praisal in developing countries, Elgar, Adelshot Hants.

[142] Kohli, K.N. (1993). Economic analysis of investment projects: A practical

approach, Oxford University Press for the Asian Development Bank, Ox-

ford.

[143] Krantz, D.H., Luce, R.D., Suppes, P. and Tversky, A. (1971). Foundations of

measurement, Vol. 1: Additive and polynomial representations, Academic

Press, New York.

[144] Krutilla, J.V. and Eckstein, O. (1958). Multiple purpose river development,

Johns Hopkins University Press, Baltimore.

[145] Laska, J.A. and Juarez, T. (1992). Grading and marking in American

schools: Two centuries of debate, Charles C. Thomas, Springfield.

[146] Laslett, R. (1995). The assumptions of cost-benefit analysis, in K.G. Willis

and J.T. Corkindale (eds), Environmental valuation: New perspectives,

CAB International, Oxford, pp. 520.

[147] Lesourne, J. (1975). Cost-benefit analysis and economic theory, North-

Holland, Amsterdam.

[148] Lindheim, E., Morris, L.L. and Fitz-Gibbon, C.T. (1987). How to measure

performance and use tests, Sage Publications, Thousand Oaks.

[149] Little, I.M.D. and Mirlees, J.A. (1968). Manual of industrial project analysis

in developing countries, O.E.C.D, Paris.

[150] Little, I.M.D. and Mirlees, J.A. (1974). Project appraisal and planning for

developing countries, Basic books, New York.

[151] Loomes, G. and Sugden, R. (1982). Regret theory: An alternative theory of

rational choice under uncertainty, Economic Journal 92: 805824.

[152] Loomes, G. (1988). Different experimental procedures for obtaining valua-

tions of risky actions: Implications for utility theory, Theory and Decision

25: 123.

[153] Loomis, J., Peterson, G., Champ, P., Brown, T. and Lucero, B. (1998).

Paired comparisons estimates of willingness to accept and contingent val-

uation estimates of willingness to pay, Journal of Economic Behavior and

Organisation 35: 501515.

[154] Luce, R.D., Krantz, D.H., Suppes, P. and Tversky, A. (1990). Foundations

of measurement, Vol. 3: Representation, axiomatisation and invariance,

Academic Press, New York.

256 BIBLIOGRAPHY

[155] Luce, R.D. and Raiffa, H. (1957). Games and Decisions, Wiley, New York.

[156] Luce, R.D. (1956). Semiorders and a theory of utility discrimination, Econo-

metrica 24: 178191.

[157] Lysne, A. (1984). Grading of students attainement: Purposes and functions,

Scandinavian Journal of Educational Research 28: 149165.

[158] Machina, M.J. (1982). Expected utility without the independence axiom,

Econometrica 50: 277323.

[159] Machina, M.J. (1989). Dynamic consistency and non-expected utility models

of choice under uncertainty, Journal of Economic Literature 27: 1622

1688.

[160] Mamdani, E. H.. (1981). Gaines fuzzy reasonning and its applications, Aca-

demic Press, New York.

[161] Marchant, Th. (1996). Valued relations aggregation with the Borda method,

Journal of Multi-Criteria Decision Analysis 5: 127132.

[162] Masser, I. (1983). The representation of urban planning-processes: An ex-

ploratory review, Environment and Planning B 10: 4762.

[163] May, K.O. (1952). A set of independent necessary and sufficient conditions

for simple majority decisions, Econometrica 20: 680684.

[164] McClennen, E.F. (1990). Rationality and dynamic choice: Foundational ex-

plorations, Cambridge University Press, Cambridge.

[165] McCord, M. and de Neufville, R. (1983). Fundamental deficiency of expected

utility analysis, in S. French, R. Hartley, L.C. Thomas and D.J. White

(eds), Multiobjective decision making, Academic Press, London, pp. 279

305.

[166] McCord, M. and de Neufville, R. (1982). Empirical demonstration that

expected utility decision analysis is not operational, in B. Stigum and

F. Wenstp (eds), Foundations of utility and risk theory, D. Reidel, Dor-

drecht, pp. 181199.

[167] McCrimmon, K.R. and Larsson, S. (1979). Utility theory: Axioms versus

paradoxes, in M. Allais and O. Hagen (eds), Expected utility hypotheses

and the Allais paradox, D. Reidel, pp. 27145.

[168] McLean, J.E. and Lockwood, R.E. (1996). Why and how should we assess

students? The competing measures of student performance, Sage Publica-

tions, Thousand Oaks.

[169] Merle, P. (1996). Levaluation des eleves. Enquete sur le jugement professo-

ral, PUF, Paris.

[170] Mintzberg, H., Raisinghani, D. and Theoret, A. (1976). The structure of un-

structured decision processes, Administrative Science Quarterly 21: 246

272.

[171] Mishan, E. (1982). Cost-benefit analysis, Allen and Unwin, London.

BIBLIOGRAPHY 257

[172] Moom, T.M. (1997). How do you know they know what they know? A hand-

book of helps for grading and evaluating student progress, Grove Publish-

ing, Westminster.

[173] Morisio, M. and Tsoukias, A. (1997). IUSWARE: A formal methodology for

software evaluation and selection, IEE Proceedings on Software Engineer-

ing 144: 162174.

[174] Moscarola, J. (1984). Organizational decision processes and ORASA inter-

vention, in R. Tomlinson and I. Kiss (eds), Rethinking the process of oper-

ational research and systems analysis, Pergamon Press, Oxford, pp. 169

186.

[175] Mousseau, V. (1993). Problemes lies a levaluation de limportance en aide

multicritere a la decision : Reflexions theoriques et experimentations,

PhD thesis, LAMSADE, Universite Paris-Dauphine, Paris.

[176] Munier, B. (1989). New models of decisions under uncertainty, European

Journal of Operational Research 38: 307317.

[177] Nas, T.F. (1996). Cost-benefit analysis: Theory and application, Sage Pub-

lications, Thousand Oaks.

[178] Nauck, D. and Kruse, R. (1999). Neuro-fuzzy methods in fuzzy rule gener-

ation, in D. D. J. Bezdek and H. Prade (eds), Fuzzy sets in approximate

reasoning and information systems, Vol. 3 of Handbook of Fuzzy Sets,

Kluwer, Dordrecht, chapter 5, pp. 305333.

[179] Nau, R.F. and McCardle, K.F. (1991). Arbitrage, rationality and equilib-

rium, Theory and Decision 31: 199240.

[180] Nau, R.F. (1995). Coherent decision analysis with inseparable probabilities

and utilities, Journal of Risk and Uncertainty 10: 7191.

[181] Nguyen, H.T. and Sugeno, M. (1998). Modelling and control, Kluwer, Dor-

drecht.

[182] Nims, J.F. (1990). Poems in translation: Sappho to Valery, The University

of Arkansas Press, Arkansas.

[183] Noizet, G. and Caverini, J.-P. (1978). La psychologie de levaluation scolaire,

PUF, Paris.

[184] Nurmi, H. (1987). Comparing voting systems, D. Reidel, Dordrecht.

[185] Nutt, P.C. (1984). Types of organizational decision processes, Administrative

Science Quarterly 19: 414450.

[186] Nyborg, K. (1998). Some Norwegian politicians use of cost-benefit analysis,

Public Choice 95: 381401.

[187] Ostanello, A. and Tsoukias, A. (1993). An explicative model of public in-

terorganizational interactions, European Journal of Operational Research

70: 6782.

[188] Ostanello, A. (1990). Action evaluation and action structuring Different

decision aid situations reviewed through two actual cases, in C.A. Bana

258 BIBLIOGRAPHY

Berlin, pp. 3657.

[189] Ostanello, A. (1997). Validation aspects of a prototype solution implemen-

tation to solve a complex MC problem, in J. Clmaco (ed.), Multi-criteria

analysis, Springer Verlag, Berlin, pp. 6174.

[190] Ott, W.R. (1978). Environmental indices: Theory and practice, Ann Arbor

Science, Ann Arbor.

[191] Paschetta, E. and Tsoukias, A. (1999). A real world MCDA application:

Evaluating software, Technical report, Document du LAMSADE No 113,

Universite Paris-Dauphine, Paris.

[192] Perny, P. and Pomerol, J.-Ch.. (1999). Use of artificial intelligence multi-

criteria decision making, in T. Gal, Th.J. Stewart and Th. Hanne (eds),

Advances in MCDM models, algorithms, theory, and applications, Kluwer,

Dordrecht, pp. 15.115.43.

[193] Perny, P. and Roubens, M. (1998). Fuzzy preference modelling, in

R. Slowinski (ed.), Fuzzy sets in decision analysis, operations research

and statistics, Kluwer, Dordrecht, pp. 330.

[194] Perny, P. and Zucker, J.D. (1999). Collaborative filtering methods based on

fuzzy preference relations, Proceedings of EUROFUSE-SIC99, pp. 279

285.

[195] Perny, P. (1992). Sur le non-respect de laxiome dindependance dans les

methodes de type ELECTRE, Cahiers du CERO 34: 211232.

[196] Perrot, N., Trystram, G., Le Guennec, D. and Guely, F. (1996). Sensor

fusion for real time quality evaluation of biscuit during baking. compari-

son between bayesian and fuzzy approaches, Journal of Food Engineering

29: 301315.

[197] Perrot, N. (1997). Matrise des procedes alimentaires et theorie des ensem-

bles flous, PhD thesis, Ecole Nationale Superieure des Industries Agricoles

Alimentaires.

[198] Pieron, H. (1963). Examens et docimologie, PUF, Paris.

[199] Pirlot, M. and Vincke, Ph. (1997). Semiorders. Properties, representations,

applications, Kluwer, Dordrecht.

[200] Pirlot, M. (1997). A common framework for describing some outranking

procedures, Journal of Multi-Criteria Decision Analysis 6: 8693.

[201] Popham, W.J. (1981). Modern educational measurement, Prentice-Hall,

New-York.

[202] Poulton, E.C. (1994). Behavioral decision theory: A new approach, Cam-

bridge University Press, Cambridge.

[203] Quiggin, J. (1982). A theory of anticipated utility, Journal of Economic

Behaviour and Organization 3: 323343.

BIBLIOGRAPHY 259

model, Kluwer, Dordrecht.

[205] Raiffa, H. (1970). Decision analysis Introductory lectures on choices under

uncertainty, Addison-Wesley, New York.

[206] Riley, H.J., Checca, R.C., Singer, T.S. and Worthington, D.F.. (1994).

Grades and grading practices: The results of the 1992 AACRAO survey,

American Association of Collegiate Registrars and Admissions Officers,

Washington D.C.

[207] Rosenhead, J. (1989). Rational analysis of a problematic world, Wiley, New

York.

[208] Roubens, M. and Vincke, Ph. (1985). Preference modelling, Springer Verlag,

Berlin.

[209] Roy, B. and Bouyssou, D. (1991). Decision-aid: an elementary introduction

with emphasis on multiple criteria, Investigacion Operativa 2: 95110.

[210] Roy, B. and Bouyssou, D. (1993). Aide multicritere a la decision : Methodes

et cas, Economica, Paris.

[211] Roy, B. and Skalka, J.-M. (1984). ELECTRE IS : Aspects methodologiques

et guide dutilisation, Technical report, Document du LAMSADE No 30,

Universite Paris-Dauphine, Paris.

[212] Roy, B. (1974). Criteres multiples et modelisation des preferences : lapport

des relations de surclassement, Revue dEconomie Politique 1: 144.

[213] Roy, B. (1990). Science de la decision ou science de laide a la decision ?,

Technical report, Cahier du LAMSADE No 97, Universite Paris-Dauphine,

Paris.

[214] Roy, B. (1993). Decision science or decision-aid science?, European Journal

of Operational Research 66: 184204.

[215] Roy, B. (1996). Multicriteria methodology for decision aiding, Kluwer, Dor-

drecht. Original version in French Methodologie multicritere daide a la

decision, Economica, Paris, 1985.

[216] Russo, J.E. and Schoemaker, P.J.H. (1989). Confident decision making, Pi-

atkus, London.

[217] Saaty, T.L. (1980). The analytic hierarchy process, McGraw-Hill, New York.

[218] Sabot, R. and Wakeman, L.J. (1991). Grade inflation and course choice,

Journal of Economic Perspectives 5: 159170.

[219] Sager, C. (1994). Eliminating grades in schools: An allegory for change, A

S Q Quality Press, Milwaukee.

[220] Salles, M., Barrett, C.R. and Pattanaik, P.K. (1992). Rationality and ag-

gregation of preferences in an ordinally fuzzy framework, Fuzzy Sets and

Systems 49: 913.

260 BIBLIOGRAPHY

[221] Satterthwaite, M.A. (1975). Strategy proofness and Arrows conditions: Ex-

istence and correspondence theorems for voting procedures and social wel-

fare functions, Journal of Economic Theory 10: 187217.

[222] Savage, L. (1954). The foundations of statistics, 1972, 2nd revised edn, Wiley,

New York.

[223] Schmeidler, D. (1989). Subjective probability and expected utility without

additivity, Econometrica 57: 571587.

[224] Schneider, Th., Schieber, C., Eeckoudt, L. and Gollier, C. (1997). Eco-

nomics of radiation protection: Equity considerations, Theory and De-

cision 43: 24151.

[225] Schofield, J. (1989). Cost-benefit analysis in urban and regional planning,

Unwin and Hyman, London.

[226] Scotchmer, S. (1985). Hedonic prices and cost-benefit analysis, Journal of

Economic Theory 37: 5575.

[227] Sen, A.K. (1986). Social choice theory, in K.J. Arrow and M.D. Intriliga-

tor (eds), Handbook of mathematical economics, Vol. 3, North-Holland,

Amsterdam, pp. 10731181.

[228] Sen, A.K. (1997). Maximization and the act of choice, Econometrica 65: 745

779.

[229] Simon, H.A. (1957). A behavioural model of rational choice in Models of

man, Wiley, New York, pp. 241260.

[230] Sinn, H.W. (1983). Economic decisions under uncertainty, North-Holland,

Amsterdam.

[231] Slowinski, R. (ed.) (1998). Fuzzy sets in decision analysis, operations research

and statistics, Kluwer, Dordrecht.

[232] Sopher, B. and Gigliotti, G. (1993). A test of generalized expected utility

theory, Theory and Decision 35: 75106.

[233] Speck, B.W. (1998). Grading student writing: An annotated bibliography,

Greenwood Publishing Group, Westport.

[234] Stamelos, I. and Tsoukias, A. (1998). Software evaluation problem situa-

tions, Technical report, Cahier du LAMSADE No 156, Universite Paris-

Dauphine, Paris.

[235] Steuer, R.E. (1986). Multiple criteria optimisation: Theory, computation,

and application, Wiley, New York.

[236] Stratton, R.W., Myers, S.C. and King, R.H. (1994). Faculty behavior, grades

and student evaluations, Journal of Economic Education 25: 515.

[237] Sugden, R. and Wiliams, A. (1983). The principles of practical cost-benefit

analysis, Oxford University Press, Oxford.

[238] Sugeno, M. (1977). Fuzzy measures and fuzzy integrals: a survey, in

M.M. Gupta, G.N. Saridis and B.R. Gains (eds), Fuzzy automata and

decision processes, North Holland, Amsterdam, pp. 89102.

BIBLIOGRAPHY 261

Sciences 36: 5983.

[240] Suzumura, K. (1999). Consequences, opportunities and procedures, Social

Choice and Welfare 16: 1740.

[241] Syndicat des Transports Parisiens (1998). Methodes devaluation des projets

dinfrastructures de transports collectifs en region Ile-de-France, Technical

report, Syndicat des Transports Parisiens, Paris.

[242] Tchudi, S. (1997). Alternatives to grading student writing, National Council

of Teachers of English, Urbana.

[243] Teghem, J. (1996). Programmation lineaire, Editions de lUniversite de

Bruxelles-Editions Ellipses, Brussels.

[244] Thaler, R.H. (1991). Quasi rational economics, Russell Sage Foundation,

New York.

[245] Toth, F.L. (1997). Cost-benefit analysis of climate change: The broader per-

spectives, Birkhauser, Basel.

[246] Trystram, G., Perrot, N. and Guely, F. (1995). Application of fuzzy logic for

the control of food processes, Processing Automation 4: 504512.

[247] Tsoukias, A. and Vincke, Ph. (1995). A new axiomatic foundation of partial

comparability, Theory and Decision 39: 79114.

[248] Tsoukias, A. and Vincke, Ph. (1999). A characterization of PQI interval

orders, Proceedings OSDA 98, Electronic Notes on Discrete Mathematics,

pp. (http://www.elsevier.nl/locate/endm), to appear also in Discrete

Applied Mathematics.

[249] Tversky, A. (1969). Intransitivity of preferences, Psychological Review

76: 3148.

[250] United Nations Development Programme (1997). Human Development Re-

port 1997, Oxford University Press, Oxford.

[251] van Doren, M. (1928). An anthology of world poetry, Albert and Charles

Boni, New York.

[252] Vansnick, J.-C. (1986). De Borda et Condorcet a lagregation multicritere,

Ricerca Operativa (40): 744.

[253] Vassiloglou, M. and French, S. (1982). Arrows theorem and examination

assessment, British Journal of Mathematical and Statistical Psychology

35: 183192.

[254] Vassiloglou, M. (1984). Some multi-attribute models in examination assess-

ment, British Journal of Mathematical and Statistical Psychology 37: 216

233.

[255] Vincke, Ph. (1988). P, Q, I preference structures, in J. Kacprzyk and

M. Roubens (eds), Non conventional preference relations in decision mak-

ing, Springer Verlag, Berlin, pp. 7281.

262 BIBLIOGRAPHY

problem, Theory and Decision 32: 221241.

[257] Vincke, Ph. (1992b). Multi-criteria decision aid, Wiley, New York. Origi-

nal version in French LAide Multicritere a la Decision, Editions de

lUniversite de Bruxelles-Editions Ellipses, Brussels, 1989.

[258] Viscusi, W.K. (1992). Fatal tradeoffs: Public and private responsibilities for

risk, Oxford University Press, Oxford.

[259] von Neumann, J. and Morgenstern, O. (1944). Theory of games and eco-

nomic behavior, Princeton University Press, Princeton.

[260] von Winterfeldt, D. and Edwards, W. (1986). Decision analysis and behav-

ioral research, Cambridge University Press, Cambridge.

[261] Wakker, P.P. (1989). Additive representations of preferences A new foun-

dation of decision analysis, Kluwer, Dordrecht.

[262] Warusfel, A. (1961). Les nombres et leurs mysteres, Points Sciences, Seuil,

Paris.

[263] Watson, S.R. (1981). Decision analysis as a replacement for cost-benefit anal-

ysis, European Journal of Operational Research 7: 242248.

[264] Weinstein, M.C. and Stason, W.B. (1977). Foundations of cost-effective-

ness analysis for health and medical practices, New England Journal of

Medicine 296: 716721.

[265] Weitzman, M.L. (1994). On the environmental discount rate, Journal of

Environmental Economics and Management 26: 200209.

[266] Weymark, J.A. (1981). Generalized Gini inequality indices, Mathematical

Social Sciences 1: 409430.

[267] Willis, K.G., Garrod, G.D. and Harvey, D.R. (1998). A review of cost-benefit

analysis as applied to the evaluation of new road proposals in the U.K.,

Transportation Research D 3: 141156.

[268] Yaari, M.E. (1987). The dual theory of choice under risk, Econometrica

55: 95115.

[269] Yu, W. (1992). Aide multicritere a la decision dans le cadre de la

problematique du tri : Methodes et applications, PhD thesis, LAMSADE,

Universite Paris-Dauphine, Paris.

[270] Zadeh, L.A. (1979). A theory of approximate reasoning, in J.E. Hayes,

D. Michie and L.I. Mikulich (eds), Machine intelligence, Elsevier, Am-

sterdam, pp. 149194.

[271] Zadeh, L.A. (1999). From computing with numbers to computing with words.

from manipulation of measurement to manipulation of perceptions, Pro-

ceedings of EUROFUSE-SIC99, pp. 12.

[272] Zarnowsky, F. (1989). The decathlon A colorful history of track and fields

most challenging event, Leisure Press, Champaign.

BIBLIOGRAPHY 263

[273] Zerbe, R.O. and Dively, D.D. (1994). Benefit-cost analysis in theory and

practice, Harper Collins, New York.

