
Probabilities, Possibilities, and Fuzzy Sets

John Drakopoulos
Stanford University, Department of Computer Science
Knowledge Systems Laboratory
701 Welch Road, Palo Alto, CA 94304-0106
January 24, 1994

Abstract

A formal analysis of probabilities, possibilities, and fuzzy sets is presented in this paper. A number of theorems are proved which show that probabilities carry more information per bit than both possibilities and fuzzy sets. The cost of this higher capacity is increased computational complexity and reduced computational efficiency. The resulting tradeoff of high complexity and information capacity versus computational efficiency is discussed under the spectrum of experimental systems and applications.

1 Introduction
Probabilities, possibilities, and fuzzy sets are all measures used to formalize and quantify uncertainty. There is an ongoing debate regarding the appropriateness of each measure in formalizing uncertainty. Arguments in favour of probabilities can be found in [17, 2], while arguments more in favour of possibilities and fuzzy sets are presented in [16, 13]. A brief presentation and qualitative comparison of the above measures, as well as of MYCIN's certainty factors [26] and Dempster-Shafer theory [5, 25], appears in [11]. However, a quantitative comparison of those measures in terms of both efficiency and expressiveness is not given. Such a comparison is necessary in order to evaluate and characterize those measures.

Some attempts in this direction concern a notion of consistency between probabilities and possibilities [6, 4] and transformations from probabilities and possibilities to Dempster-Shafer theory [14]. However, those transformations cannot tell us the exact relationship between probabilities and possibilities. Furthermore, the transformations from possibilities to Dempster-Shafer theory given in [14] require the elements of the universal set to be ordered in descending possibility values. Such an ordering does not always exist when the universal set is infinite, which limits the applicability of the transformation to finite universal sets. Finally, there has been no study so far of the extensions of the universal sets that are necessary in order to create maps between probabilities, possibilities, and fuzzy sets.

In this paper, we intend to resolve the above issues by presenting a number of theorems that completely specify the relationship between probabilities, possibilities, and fuzzy sets. The paper is organized along the general scheme of some preliminary definitions followed by a number of theorems and a discussion of their implications for systems, architectures, and applications. In the next section, basic definitions are introduced. These set up the context in which our theorems are presented and constitute the foundations of probability, possibility, and fuzzy set theory. In section 3, we present a number of theorems that relate these theories and compare them from the viewpoint of relative expressiveness, i.e. the ability of one to simulate the others. Probabilities are proved to be more expressive than both possibilities and fuzzy sets, though they are harder to compute. Possibilities and fuzzy sets can actually simulate probabilities but, in that case, their space requirements are exponential when compared to those of probabilities.
The increased complexity of probabilities gives rise to a trade-off which is the subject of section 4. This is a trade-off of complexity and capacity versus efficiency. Experimental systems and approaches as well as different applications are discussed. Finally, in our concluding section, it is indicated that both probabilistic and possibilistic approaches are worthwhile. A choice should be made individually for each application depending on its demands in terms of accuracy and efficiency.

2 Mathematical Measures, Systems, and System Classes


A formal definition of probability, possibility, and fuzzy set theory is given below. We begin with the definition of a Borel field (also called an event class, sigma algebra, or sigma field):

Definition 1 A Borel field $B$ on a set $\Omega$ is a set of subsets of $\Omega$ satisfying the following properties:
(a) $\Omega \in B$
(b) $\forall A \; (\text{if } A \in B \text{ then } \bar{A} \in B)$; $\bar{A}$ stands for the complement of $A$
(c) if $A_1, A_2, \ldots$ is any sequence of sets in $B$ then $\bigcup_i A_i \in B$

Now, we define a probability system and a probability measure (originally defined by A.N. Kolmogorov [15]) as follows:

Definition 2 A probability system is a triple, $(\Omega, B, P)$, where $\Omega$ is an arbitrary set (the set of all possible outcomes), $B$ is a Borel field on $\Omega$ (the set of the events of interest), and $P$ is a real valued function defined for each $A \in B$ such that:
(a) $\forall A \in B \quad 0 \le P(A) \le 1$
(b) $P(\Omega) = 1$
(c) if $A_1, A_2, \ldots$ is any sequence of pairwise disjoint sets in $B$ then $P\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} P(A_i)$

A function $P$ that satisfies the three conditions above is called a probability measure. The third property is called countable additivity. For a discussion of Borel fields and probability measures see [22]. Similarly, we define a possibility system and a possibility measure [7]:

Definition 3 A possibility system is a triple, $(\Omega, B, \Pi)$, where $\Omega$ is an arbitrary set (the set of all possible outcomes), $B$ is a Borel field on $\Omega$ (the set of the events of interest), and $\Pi$ is a real valued function defined for each $A \in B$ such that:
(a) $\Pi(\emptyset) = 0$
(b) $\Pi(\Omega) = 1$
(c) if $A_1, A_2, \ldots$ is any sequence of sets in $B$ then $\Pi\big(\bigcup_{i=1}^{\infty} A_i\big) = \sup_{i=1}^{\infty} \Pi(A_i)$
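Definitions 2 and 3 can be illustrated on a small finite outcome set. The following is a minimal sketch, assuming the Borel field is the full power set and that each measure is given by its values on single outcomes; the function names and the example values are hypothetical and do not come from the paper.

```python
def probability(point_mass, event):
    """P(A) as the sum of point masses, i.e. additivity on a finite set."""
    return sum(point_mass[w] for w in event)


def possibility(point_poss, event):
    """Pi(A) as the supremum of point possibilities; the empty event gets 0."""
    return max((point_poss[w] for w in event), default=0.0)


# Hypothetical example values (dyadic, so float arithmetic is exact).
omega = frozenset({"a", "b", "c"})
p = {"a": 0.5, "b": 0.25, "c": 0.25}    # point masses sum to 1
pi = {"a": 1.0, "b": 0.5, "c": 0.25}    # the largest point possibility is 1

A = frozenset({"a"})
B = frozenset({"b", "c"})

# Property (c) of Definition 2: disjoint events add up.
prob_union = probability(p, A | B)
# Property (c) of Definition 3: the union takes the supremum.
poss_union = possibility(pi, A | B)
```

Here `prob_union` equals `probability(p, A) + probability(p, B)`, while `poss_union` equals the larger of the two possibility values, which is the only structural difference between the two definitions.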

Now the function $\Pi$ is a possibility measure. Furthermore, we define system classes to be classes of different systems. Hence, we define:

Definition 4 A probability system class $\mathcal{P}_\Omega$ is the class of all probability systems $(\Omega, B, P)$. A possibility system class $\Pi_\Omega$ is the class of all possibility systems $(\Omega, B, \Pi)$.

A measure $F$ is called extensional iff
\[ F\Big(\bigcup_{i=1}^{\infty} A_i\Big) = G_{i=1}^{\infty}\big(F(A_i)\big) \]
where $G$ is an arbitrary function that does not depend on any of the $A_i$, $i = 1, 2, \ldots$. Otherwise, $F$ is called intensional. Obviously, probability is an intensional measure whereas possibility is an extensional one. The definition can be generalized to extensional and intensional systems [24]. The actual trade-off is between computational efficiency and semantic clarity or content. In the extreme case, one can define an extensional measure that is trivial to compute but carries little or no information. On the other hand, one can define an intensional measure that carries a lot of information but is difficult to compute.

Another important difference between probability and possibility measures is that the former implies the formula
\[ P(\bar{A}) = 1 - P(A) \qquad (1) \]
where $\bar{A}$ stands for the complement of $A$. A similar formula is not implied by the definition of possibilities. Furthermore, if their definition is extended to include the above formula then our measure reduces to binary possibility, $\forall A \; (\Pi(A) = 1 \lor \Pi(A) = 0)$, since
\[ 1 = \Pi(\Omega) = \Pi(A \cup \bar{A}) = \sup\big(\Pi(A),\, 1 - \Pi(A)\big) \qquad (2) \]

In 1965, L.A. Zadeh overcame this difficulty by abandoning traditional set theory and finding shelter in what he called fuzzy set theory [29]. In that theory, an element $x$ may partially belong to a fuzzy set $A$. Each fuzzy set is characterized by its membership function $f_A(x)$ that states, for any given $x$, "how much of $x$ belongs to $A$". Furthermore, membership functions are required to be monotonic and fuzzy set subset-hood is based on them:
\[ \forall A, B \; (A \subseteq B \Leftrightarrow f_A \le f_B) \]
where $f_A \le f_B$ means that $\forall x \; (f_A(x) \le f_B(x))$. Before we give a formal definition of fuzzy sets, let us first introduce the concept of the base of a Borel field:

Definition 5 A set $\Phi$ is a base of a Borel field $B$ iff the closure of $\Phi$ under set union and set complementation (with respect to the universal set of $B$) is equal to $B$ (written as $c(\Phi) = B$) and no proper subset of $\Phi$ has this property.

Observe that we put no restriction on the elements of $\Phi$. They can be fuzzy or ordinary (crisp) sets. Since set union and complementation are well-defined for both of those kinds of sets, so are their Borel fields and their bases. Now, the complete definition of a fuzzy set system is as follows:

Definition 6 A fuzzy set system is a triple, $(\Phi, D, f)$, where $\Phi$ is a base of a Borel field that represents all the fuzzy sets of interest, $D$ is the domain of the fuzzy sets in $c(\Phi)$, and $f$ is a real valued function defined for each $A \in c(\Phi)$ and each $x \in D$ such that:
(a) $\forall x \in D \quad f_{\emptyset}(x) = 0$; $\emptyset$ is the empty set in $c(\Phi)$
(b) $\forall x \in D \quad f_{\Omega}(x) = 1$; $\Omega$ is the universal set in $c(\Phi)$
(c) if $A_1, A_2, \ldots$ is any sequence of fuzzy sets in $c(\Phi)$ then
    $\forall x \in D \quad f_{\bigcup_{i=1}^{\infty} A_i}(x) = \sup_{i=1}^{\infty} f_{A_i}(x)$
    $\forall x \in D \quad f_{\bigcap_{i=1}^{\infty} A_i}(x) = \inf_{i=1}^{\infty} f_{A_i}(x)$
(d) $\forall A \in c(\Phi) \; \forall x \in D \quad f_{\bar{A}}(x) = 1 - f_A(x)$

Furthermore, a fuzzy set system class $\mathcal{F}_{\Phi,D}$ is the class of all fuzzy set systems $(\Phi, D, f)$. Fuzzy membership functions, seen as measures on fuzzy sets, are extensional fuzzy possibility measures augmented so that they satisfy the intersection and complement rules as defined in (c) and (d) of Definition 6. Furthermore, since $A \cup \bar{A} \ne \Omega$ for all purely fuzzy sets $A$, derivation (2) above does not hold, i.e. membership functions are not necessarily binary (as they are for ordinary sets).
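The rules (c) and (d) of Definition 6 can be made concrete with a small sketch. It assumes a finite domain and represents each membership function as a dictionary from domain elements to grades; the set name `tall` and its grades are hypothetical illustrations, not taken from the paper.

```python
def fuzzy_union(f_a, f_b):
    """Pointwise supremum of two membership functions (rule (c))."""
    return {x: max(f_a[x], f_b[x]) for x in f_a}


def fuzzy_intersection(f_a, f_b):
    """Pointwise infimum of two membership functions (rule (c))."""
    return {x: min(f_a[x], f_b[x]) for x in f_a}


def fuzzy_complement(f_a):
    """Pointwise complement 1 - f_A(x) (rule (d))."""
    return {x: 1.0 - f_a[x] for x in f_a}


# Hypothetical purely fuzzy set over a three-element domain.
tall = {"x1": 0.25, "x2": 0.5, "x3": 1.0}
not_tall = fuzzy_complement(tall)

# For a purely fuzzy set, A union complement(A) is not the universal set,
# so derivation (2) does not force the grades to be 0 or 1.
tall_or_not = fuzzy_union(tall, not_tall)
universal = {x: 1.0 for x in tall}
```

In this example `tall_or_not` has grade 0.75 at `x1` and 0.5 at `x2`, so it differs from `universal`, which is why the complement rule can coexist with non-binary membership grades.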

3 System Relationships
An important question is whether probability and possibility measures or systems (fuzzy or not) are actually different. A number of theorems presented below prove that they have some strong similarities as well as differences. To study them formally, we need to define the notion of relative expressiveness of one system with respect to another. In the following, we write $(\Omega, B, M)$ to indicate a system where $\Omega$ is an arbitrary set, $B$ is a Borel field on $\Omega$, and $M$ is a real valued function defined for each $A \in B$. Furthermore, we write $2^A$ to indicate the power set of $A$, for any set $A$. Now, we define:

Definition 7 A system $S = (\Omega, B, M)$ is less expressive or equal to a system $S' = (\Omega', B', M')$ (written as $S \preceq S'$) iff
\[ \forall A \in B \;\; \exists A' \in B' \quad M(A) = M'(A'). \]

The intuition behind this definition is that if $S \preceq S'$ then the system $S'$ can simulate $S$ by using $M'(A')$ whenever $S$ uses $M(A)$. In the same vein, we define:

Definition 8 A system class $C_\Omega$ is less expressive or equal to the system class $C'_{\Omega'}$ (written as $C_\Omega \preceq C'_{\Omega'}$) iff
\[ \forall B, M \; \Big( (\Omega, B, M) \in C_\Omega \Rightarrow \exists B', M' \; \big( (\Omega', B', M') \in C'_{\Omega'} \text{ and } (\Omega, B, M) \preceq (\Omega', B', M') \big) \Big) \]

Therefore, $C_\Omega \preceq C'_{\Omega'}$ iff each system in $C_\Omega$ can be simulated by a system in $C'_{\Omega'}$. Similarly, for fuzzy set systems we define:

Definition 9 A system $S = (\Omega, B, M)$ is less expressive or equal to a fuzzy set system $S' = (\Phi, D, f)$ (written as $S \preceq S'$) iff
\[ \forall A \in B \;\; \exists A' \in c(\Phi) \;\; \exists x \in D \quad M(A) = f_{A'}(x). \]
Furthermore, $S' \preceq S$ iff
\[ \forall A' \in c(\Phi) \;\; \forall x \in D \;\; \exists A \in B \quad M(A) = f_{A'}(x). \]

In the case $S \preceq S'$, the fuzzy set system $S'$ can simulate the system $S$, while in the case $S' \preceq S$ the system $S$ can simulate the fuzzy set system $S'$. Now, we define:

Definition 10 A system class $C_\Omega$ is less expressive or equal to the system class $\mathcal{F}_{\Phi,D}$ (written as $C_\Omega \preceq \mathcal{F}_{\Phi,D}$) iff
\[ \forall B, M \; \Big( (\Omega, B, M) \in C_\Omega \Rightarrow \exists f \; \big( (\Phi, D, f) \in \mathcal{F}_{\Phi,D} \text{ and } (\Omega, B, M) \preceq (\Phi, D, f) \big) \Big) \]
Similarly, $\mathcal{F}_{\Phi,D} \preceq C_\Omega$ iff
\[ \forall f \; \Big( (\Phi, D, f) \in \mathcal{F}_{\Phi,D} \Rightarrow \exists B, M \; \big( (\Omega, B, M) \in C_\Omega \text{ and } (\Phi, D, f) \preceq (\Omega, B, M) \big) \Big) \]

In the following, we denote by $B$ a Borel field on $\Omega$ and by $B'$ a Borel field on $\Omega'$. Furthermore, we write $(\Omega, B, \Pi)$ to denote a possibility system (i.e. a member of $\Pi_\Omega$), $(\Omega, B, P)$ to denote a probability system (i.e. a member of $\mathcal{P}_\Omega$), and $(\Phi, D, f)$ to denote a fuzzy set system (i.e. a member of $\mathcal{F}_{\Phi,D}$). Finally, we denote the cardinal of a set $A$ as $\mathrm{card}(A)$ and the cardinal of the set $\mathbb{N}$ of natural numbers as $\aleph_0$ (see [27] for more details on cardinal numbers). We can now present and prove a few theorems, stated in second order logic, that relate probability, possibility, and fuzzy set systems.
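For finite systems, the relation of Definition 7 reduces to an inclusion between the sets of values the two measures can take. The sketch below checks that inclusion, assuming the Borel field is the full power set; all function names and example values are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations


def events(omega):
    """All subsets of a finite outcome set (the power set as Borel field)."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def probability_range(point_mass):
    """Set of values P(A) over all events A."""
    return {sum((point_mass[w] for w in A), Fraction(0))
            for A in events(point_mass)}


def possibility_range(point_poss):
    """Set of values Pi(A) over all events A."""
    return {max((point_poss[w] for w in A), default=Fraction(0))
            for A in events(point_poss)}


def less_expressive_or_equal(range_s, range_s_prime):
    """S <= S' in the sense of Definition 7: every M(A) equals some M'(A')."""
    return range_s <= range_s_prime


# Hypothetical two-outcome systems.
poss = possibility_range({"a": Fraction(1), "b": Fraction(1, 2)})
even = probability_range({"a": Fraction(1, 2), "b": Fraction(1, 2)})
skew = probability_range({"a": Fraction(1, 4), "b": Fraction(3, 4)})
```

With these values the possibility system is simulated by the evenly weighted probability system but not by the skewed one, because the value 1/2 is missing from the range of the latter.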

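Theorem 1 If $\Omega$ is finite or countably infinite then
\[ \exists \Omega' \quad \mathrm{card}(\Omega) = \mathrm{card}(\Omega') \;\land\; \Pi_\Omega \preceq \mathcal{P}_{\Omega'} \]
In particular, if $\Omega$ is finite then $\Omega'$ can be chosen so that $\Omega' = \Omega$.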

Proof To prove the above theorem we must prove that, given a possibility system $(\Omega, B, \Pi)$, we can construct a probability system $(\Omega', B', P)$ that can simulate it. Therefore, we need to define $\Omega'$, $B'$, and $P$ so that
\[ \forall A \in B \;\; \exists A' \in B' \quad \Pi(A) = P(A') \]
Furthermore, $\Omega'$ must be chosen so that $\mathrm{card}(\Omega) = \mathrm{card}(\Omega')$. To this purpose, first observe that $\Omega$ can be written as $\Omega = \{\omega_1, \omega_2, \ldots\}$ since $\Omega$ is either finite or countably infinite. Then, writing $\pi_i$ for $\Pi(\{\omega_i\})$, for each $i$ define
\[ LL_i = \{\omega_j \,/\, \pi_j < \pi_i \lor (\pi_j = \pi_i \land j < i)\} \]
\[ s_i = \sup\{\pi_j \,/\, \omega_j \in LL_i\} \]
Now, consider the open intervals $(s_i, \pi_i)$, $i = 1, 2, \ldots$ and observe that, for each $k$ such that $\pi_k < \pi_i$, it is $\omega_k \in LL_i$ and so it must be $\pi_k \le s_i$. Therefore
\[ \forall k \quad \pi_k \notin (s_i, \pi_i) \]
i.e. the above intervals are pairwise disjoint. Hence the corresponding closed intervals can intersect in at most a single point. Since the length of at most countably many points is always zero, it must be that the length of the union of any number of the above intervals equals the sum of their lengths:
\[ \mathrm{Length}\Big(\bigcup_i [s_i, \pi_i]\Big) = \sum_i \mathrm{Length}([s_i, \pi_i]) = \sum_i (\pi_i - s_i) \qquad (3) \]
Now, for each $A \in B$, we define
\[ \mathrm{SPAN}(A) = \bigcup_{\pi_j \le \Pi(A)} [s_j, \pi_j] \]
and so, by equation (3), it is
\[ \mathrm{Length}(\mathrm{SPAN}(A)) = \sum_{\pi_j \le \Pi(A)} (\pi_j - s_j) \]
Since $\Omega$ is at most countably infinite, $\mathrm{SPAN}(\Omega)$ should consist of countably many intervals separated by countably many holes. Therefore, it should be
\[ \mathrm{SPAN}(\Omega) = (l_1, u_1) \cup (l_2, u_2) \cup \ldots \]
where $l_1 < u_1 < l_2 < u_2 < \ldots$ and $(a, b)$ is used to denote the interval from $a$ to $b$ without specifying whether it is closed or not. In addition, we define $u_0 = 0$. Now, observe that if $\Omega$ is finite then
(a) $\forall \omega_i \in \Omega \;\; (\exists \omega_j \in \Omega \;\; s_i = \pi_j) \lor s_i = 0$
(b) $\exists \omega_i \in \Omega \quad \pi_i = 1$

Condition (a) above holds because if $\Omega$ is finite then the sets $LL_i$ are finite and so their maxima are well defined and equal to their suprema. Condition (b) holds since
\[ 1 = \Pi(\Omega) = \sup\{\pi_j \,/\, \omega_j \in \Omega\} = \pi_i \quad \text{for some } \omega_i \in \Omega \]
Therefore, if $\Omega$ is finite then the intervals $[s_i, \pi_i]$, $i = 1, 2, \ldots$ can be arranged so that they cover the whole interval $[0, 1]$ without leaving any holes. In that case we could simply define $\Omega' = \Omega$ and so it would be $\mathrm{card}(\Omega') = \mathrm{card}(\Omega)$. On the other hand, if $\Omega$ is countably infinite then we define $\Omega' = \Omega \cup \{\omega'_1, \omega'_2, \ldots\}$, where none of the $\omega'_i$, $i = 1, 2, \ldots$, belongs to $\Omega$. Obviously now it is
\[ \mathrm{card}(\Omega') = \aleph_0 = \mathrm{card}(\Omega) \]
Therefore, in either case, it is $\mathrm{card}(\Omega') = \mathrm{card}(\Omega)$. Now, we can define $P$ as follows:
\[ P(\{\omega_i\}) = \pi_i - s_i, \quad i = 1, 2, \ldots \]
\[ P(\{\omega'_i\}) = l_i - u_{i-1}, \quad i = 1, 2, \ldots \]
The definition of $P$ is extended to the rest of $2^{\Omega'}$ by requiring $P$ to satisfy the countable additivity property. Furthermore, for each $A \in B$, we define
\[ A' = \{\omega_j \,/\, \pi_j \le \Pi(A)\} \cup \{\omega'_j \,/\, l_j \le \Pi(A)\} \]
Now, we can define $B'$ to be the closure of all the sets $A'$, $A \in B$, under set union and complementation with respect to $\Omega'$. Then obviously $B'$ is a Borel field. Now, for each $A \in B$, it is
\[ \begin{aligned}
P(A') &= \sum_{\pi_j \le \Pi(A)} P(\omega_j) + \sum_{l_j \le \Pi(A)} P(\omega'_j) \\
      &= \sum_{\pi_j \le \Pi(A)} (\pi_j - s_j) + \sum_{l_j \le \Pi(A)} (l_j - u_{j-1}) \\
      &= \mathrm{Length}(\mathrm{SPAN}(A)) + \sum_{l_j \le \Pi(A)} (l_j - u_{j-1}) \\
      &= \Pi(A)
\end{aligned} \]
For the last step, simply observe that the intervals in $\mathrm{SPAN}(A)$ and the intervals $[u_{j-1}, l_j]$, $l_j \le \Pi(A)$, cover the whole interval $[u_0, \Pi(A)]$ and they intersect in at most countably many points. Therefore their total length is equal to $\Pi(A) - u_0$ or equivalently
\[ \mathrm{Length}(\mathrm{SPAN}(A)) + \sum_{l_j \le \Pi(A)} (l_j - u_{j-1}) = \Pi(A) \]
since $u_0 = 0$. The same derivation applies when $\Omega$ is finite, except that the second sum on the right hand side of the equations above does not exist, for $\mathrm{SPAN}(\Omega)$ covers the whole interval $[0, 1]$. Therefore, in either case, it is
\[ \forall A \in B \quad \Pi(A) = P(A') \]
Obviously for $A = \Omega$ it is $A' = \Omega'$ and so $P(\Omega') = \Pi(\Omega) = 1$. Therefore, $(\Omega', B', P)$ would be the desired probability system if $\forall A \in B' \;\; 0 \le P(A) \le 1$. However,
\[ \forall A \in B' \quad 0 = 1 - P(\Omega') = P(\emptyset) \le P(A) \le P(\Omega') = 1 \]
Q.E.D.

The above theorem shows that probabilities are more expressive or equal to possibilities, if $\Omega$ is finite or countably infinite. Of course, if the cardinal of $\Omega$ is greater or equal to the cardinal of the interval $[0, 1]$ (i.e. if $\Omega$ is uncountably infinite) then both $\Pi$ and $P$ can be chosen so that their range covers the whole $[0, 1]$ interval. In that case, for each set $A$ of probability $p \in [0, 1]$ we can find an element $\omega \in \Omega$ of possibility $\Pi(\{\omega\}) = p$. Similarly, we can map possibilities to probabilities. Therefore, the equivalence of the two systems is established:
\[ \Pi_\Omega = \mathcal{P}_\Omega \]
where $\Pi_\Omega = \mathcal{P}_\Omega$ means that $\Pi_\Omega \preceq \mathcal{P}_\Omega$ and $\mathcal{P}_\Omega \preceq \Pi_\Omega$. In the same vein, if $\Phi$ or $D$ is uncountable and $\Omega$ is uncountable then
\[ \Pi_\Omega = \mathcal{P}_\Omega = \mathcal{F}_{\Phi,D} \]
Therefore, when the domain sets $\Omega$ and $\Phi$ or $D$ are uncountable, all the above measures are equivalent in terms of expressiveness.

In this paper, we mainly study the case where $\Omega$ is finite or countably infinite (i.e. $\mathrm{card}(\Omega) \le \aleph_0$). This case is of particular interest in Computer Science, where only a finite subset of the rational numbers is representable in our finite machines. Furthermore, in the case where the size of the domain is not exceedingly larger than the range of the above measures, interesting relationships are formed among probabilities, possibilities, and fuzzy sets. The previous and the next few theorems exhibit those relationships.
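The finite case of the construction used in the proof of Theorem 1 can be sketched in a few lines. This is an illustration only, assuming a finite outcome set indexed 0, 1, 2, ..., the full power set as Borel field, and a possibility measure given by its point values; the function names and the example values are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations


def point_probabilities(pi):
    """P({w_i}) = pi_i - s_i, where s_i is the supremum of pi_j over LL_i.

    LL_i holds the outcomes ranked below w_i: strictly smaller possibility,
    or equal possibility and smaller index. An empty LL_i gives s_i = 0.
    """
    probs = []
    for i, p_i in enumerate(pi):
        lower = [p_j for j, p_j in enumerate(pi)
                 if p_j < p_i or (p_j == p_i and j < i)]
        s_i = max(lower, default=Fraction(0))
        probs.append(p_i - s_i)
    return probs


def possibility(pi, event):
    """Pi(A) as the supremum of the point possibilities in A."""
    return max((pi[j] for j in event), default=Fraction(0))


def image_event(pi, event):
    """A' = {w_j : pi_j <= Pi(A)}, the event that simulates A."""
    bound = possibility(pi, event)
    return {j for j, p_j in enumerate(pi) if p_j <= bound}


# Hypothetical point possibilities; the largest one must be 1.
pi = [Fraction(1, 2), Fraction(1), Fraction(1, 2), Fraction(1, 4)]
P = point_probabilities(pi)

# Check Pi(A) = P(A') for every event A of the power set.
all_events = chain.from_iterable(
    combinations(range(len(pi)), r) for r in range(len(pi) + 1))
simulated = all(
    sum((P[j] for j in image_event(pi, A)), Fraction(0)) == possibility(pi, A)
    for A in all_events)
```

For this example the point probabilities are 1/4, 1/2, 0, and 1/4. Outcomes that tie in possibility share one interval, so all but the first of them receive probability zero, and the same outcome set is reused without any extra elements.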

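Theorem 2
\[ \forall \Omega \quad \mathcal{P}_\Omega \preceq \Pi_{2^\Omega} \]

Proof In the following we assume that $B'$ is a Borel field on $\Omega$ and $B$ is a Borel field on $2^\Omega$. We would like to prove that
\[ \forall B', P \;\; \exists B, \Pi \quad (\Omega, B', P) \preceq (2^\Omega, B, \Pi) \]
Therefore, we have to construct $B, \Pi$, given $B', P$. This is a straightforward construction. Simply let $B$ be $2^{2^\Omega}$ and define $\Pi$ as follows:
\[ \begin{aligned}
\Pi(\{A'\}) &= P(A'), && \forall A' \in B' \\
\Pi(\emptyset) &= 0 \\
\Pi(2^\Omega) &= 1 \\
\Pi\Big(\bigcup_{i=1}^{\infty} A_i\Big) &= \sup_{i=1}^{\infty} \Pi(A_i), && \forall A_1, A_2, \ldots \in B
\end{aligned} \]
The possibility system $(2^\Omega, B, \Pi)$ is the desired one. Q.E.D.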
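The construction in the proof of Theorem 2 can be sketched for a finite outcome set. The sketch assumes the probability system uses the full power set as its Borel field; the function names and the point masses are hypothetical. Each event of the probability system becomes a single element of the new universe, which is where the exponential growth comes from.

```python
from fractions import Fraction
from itertools import chain, combinations


def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def probability(point_mass, event):
    """P(A') as the sum of the point masses in A'."""
    return sum((point_mass[w] for w in event), Fraction(0))


def lifted_possibility(point_mass, family):
    """Pi on a family of events: the supremum of P over its members.

    A singleton family {A'} therefore gets Pi({A'}) = P(A').
    """
    return max((probability(point_mass, e) for e in family),
               default=Fraction(0))


# Hypothetical probability system on three outcomes.
point_mass = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
universe = powerset(point_mass)   # the elements of 2^Omega

matches = all(
    lifted_possibility(point_mass, [e]) == probability(point_mass, e)
    for e in universe)
```

Three outcomes already require a universe of eight elements on the possibility side, in line with the space cost discussed for this simulation.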

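Theorem 3 If $\mathrm{card}(\Omega') \le \aleph_0$ and $\mathrm{card}(\Omega) + 1 < \mathrm{card}(2^{\Omega'})$ (see [27] for the semantics of cardinal arithmetic) then
\[ \mathrm{not}(\mathcal{P}_{\Omega'} \preceq \Pi_\Omega) \]

Proof We would like to prove that
\[ \exists B', P \;\; \forall B, \Pi \quad \mathrm{not}\big((\Omega', B', P) \preceq (\Omega, B, \Pi)\big) \]
However, if it were $(\Omega', B', P) \preceq (\Omega, B, \Pi)$ then it should be that
\[ \forall A' \in B' \;\; \exists A \in B \quad P(A') = \Pi(A) \]
and so
\[ \mathrm{card}(P_{B'}) \le \mathrm{card}(\Pi_B) \qquad (4) \]
where $P_{B'} = \{P(A') \,/\, A' \in B'\}$, for any $B'$. $\Pi_B$ is defined similarly. Now, to prove the theorem, it is enough to construct $B', P$ so that the above inequality cannot hold for any $B, \Pi$. Since $\Omega'$ is finite or countably infinite we can write it as $\Omega' = \{\omega'_1, \omega'_2, \ldots\}$. Now we can define
\[ B' = 2^{\Omega'} \]
\[ P(\{\omega'_i\}) = m \, \frac{1}{2^i}, \quad i = 1, 2, \ldots \]
where $m$ is a constant chosen so that the probabilities above sum up to one. Therefore,
\[ m = \begin{cases} 1 & \text{if } \mathrm{card}(\Omega') = \aleph_0 \\[4pt] \dfrac{2^n}{2^n - 1} & \text{if } \mathrm{card}(\Omega') = n \text{ is finite} \end{cases} \]
This choice of basic probability values makes $P$ a one-to-one function on $B'$, since the probability values of two sets are equal if and only if they contain exactly the same $\omega'_i$. Therefore
\[ \mathrm{card}(P_{B'}) = \mathrm{card}(B') = \mathrm{card}(2^{\Omega'}) = 2^{\mathrm{card}(\Omega')} \qquad (5) \]
Furthermore, the measure $\Pi$, for each $A \in B$, computes a supremum over the elements of $\Omega$ contained in $A$. Therefore, its values are taken from the set of point values $\{\Pi(\{\omega\}) \,/\, \omega \in \Omega\}$. The only exception is when $A = \emptyset$. In that case, $\Pi(A) = 0$, a value which may not belong to that set. Therefore,
\[ \mathrm{card}(\Pi_B) \le \mathrm{card}\big(\{\Pi(\{\omega\}) \,/\, \omega \in \Omega\}\big) + 1 \le \mathrm{card}(\Omega) + 1 \qquad (6) \]
The theorem's hypothesis and equations (5), (6) result in
\[ \mathrm{card}(P_{B'}) > \mathrm{card}(\Pi_B) \qquad (7) \]
Equations (4), (7) result in the desired contradiction. Q.E.D.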
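The counting argument in the proof of Theorem 3 can be checked numerically for a small finite case. The sketch below builds the point masses $m / 2^i$ and counts the distinct event probabilities; the function names and the choice $n = 4$ are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def dyadic_point_masses(n):
    """Point masses m / 2**i for i = 1..n, with m = 2**n / (2**n - 1)."""
    m = Fraction(2 ** n, 2 ** n - 1)
    return [m / 2 ** i for i in range(1, n + 1)]


def distinct_event_probabilities(masses):
    """Set of probabilities of all events of the power set."""
    values = set()
    for r in range(len(masses) + 1):
        for subset in combinations(masses, r):
            values.add(sum(subset, Fraction(0)))
    return values


n = 4
masses = dyadic_point_masses(n)
prob_values = distinct_event_probabilities(masses)

# A possibility measure on n outcomes takes at most n point values plus 0.
max_possibility_values = n + 1
```

With four outcomes the probability measure takes 16 distinct values, one per event, whereas a possibility measure on four outcomes can take at most 5, which is the gap expressed by equations (5) and (6).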

Theorem 4
\[ \forall \Omega, \Phi \quad \mathcal{P}_\Omega \preceq \mathcal{F}_{\Phi, 2^\Omega} \quad (\text{assuming } \mathrm{card}(\Phi) \ge 2) \]
\[ \forall \Omega, D \quad \Pi_\Omega \preceq \mathcal{F}_{\Phi, D} \quad (\text{assuming } D \text{ is non-empty and } \mathrm{card}(\Phi) \ge \mathrm{card}(\Omega)) \]

Proof For the first part of the proof, we would like to prove that if $\mathrm{card}(\Phi) \ge 2$ then
\[ \forall \Omega, B, P \;\; \exists f \quad (\Omega, B, P) \preceq (\Phi, 2^\Omega, f) \]
i.e. we would like to construct $f$, given $\Phi, \Omega, B, P$, so that
\[ \forall A \in B \;\; \exists A' \in c(\Phi) \;\; \exists x \in 2^\Omega \quad P(A) = f_{A'}(x) \]
We choose $A'$ to be a fuzzy set with domain $2^\Omega$ so that
\[ \forall x \in 2^\Omega \quad f_{A'}(x) = P(x) \]
This is possible, since $\mathrm{card}(\Phi) \ge 2$ and so $\Phi$ contains at least one fuzzy set other than the empty or universal fuzzy set. Now extend the definition of $f$ to the rest of $c(\Phi)$ according to the properties of fuzzy membership functions ((a), (b), (c), (d) in Definition 6). This extension is possible since $f$ is completely defined for $A'$, while the values of $f$ are irrelevant to our proof for all other fuzzy sets in $\Phi$ (except of course the universal fuzzy set, where $f$ should be equal to 1 for each $x \in D$). Therefore $(\Phi, 2^\Omega, f)$ is the desired fuzzy set system.

For the second part, we have to construct $\Phi, f$, given $D, \Omega, B, \Pi$, so that if $D$ is non-empty then
\[ \forall A \in B \;\; \exists A' \in c(\Phi) \;\; \exists x \in D \quad \Pi(A) = f_{A'}(x) \]
To this purpose, find a base $\Phi_B$ of $B$ and make $\Phi$ contain a purely fuzzy set for each element of $\Phi_B$. This is possible since
\[ \mathrm{card}(\Phi_B) \le \mathrm{card}(\Omega) \le \mathrm{card}(\Phi) \]
Furthermore, since $D$ is non-empty (i.e. there exists at least one $x \in D$) and for each $A \in \Phi_B$ there exists $A' \in \Phi$, we can define
\[ \forall A \in \Phi_B \quad f_{A'}(x) = \Pi(A) \]
Since $f_{A'}(x)$, when $x$ is kept constant, is itself a possibility measure, the above equality holds for each $A \in B$ and its corresponding $A' \in c(\Phi)$. Therefore the resulting fuzzy set system $(\Phi, D, f)$ is the desired one. (The definition of $f$ in the rest of $D$ is irrelevant to this proof. For example, $f$ may be zero there.) Q.E.D.

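Theorem 5 If $\mathrm{card}(\Omega) \ge \mathrm{card}(\Phi)\,\mathrm{card}(D)$ and $\mathrm{card}(\Phi) \le \aleph_0$ then
\[ \mathcal{F}_{\Phi,D} \preceq \mathcal{P}_\Omega \]

Proof Let $(\Phi, D, f) \in \mathcal{F}_{\Phi,D}$. Then, for each $x \in D$, $(\Phi, \{x\}, f)$ is equivalent to a possibility system $(\Omega_x, B_x, \Pi_x)$, i.e.
\[ (\Phi, \{x\}, f) \preceq (\Omega_x, B_x, \Pi_x) \quad \text{and} \quad (\Omega_x, B_x, \Pi_x) \preceq (\Phi, \{x\}, f) \qquad (8) \]
where the elements of $c(\Phi)$ are in a bijective relation with the elements of $B_x$. Hence, it must be $\mathrm{card}(\Phi) \le \mathrm{card}(\Omega_x)$. Therefore, the smallest cardinality of $\Omega_x$ that is guaranteed not to violate the above relation is $\aleph_0$ (since $\mathrm{card}(\Phi) \le \aleph_0$). In other words, $\Omega_x$ can be chosen to be countably infinite. Then by Theorem 1 we get
\[ \Pi_{\Omega_x} \preceq \mathcal{P}_{\Omega'_x} \]
and so
\[ (\Omega_x, B_x, \Pi_x) \preceq (\Omega'_x, B'_x, P_x) \qquad (9) \]
where $\mathrm{card}(\Omega_x) = \mathrm{card}(\Omega'_x)$. Then by (8), (9) and the transitivity of "$\preceq$", we get:
\[ (\Phi, \{x\}, f) \preceq (\Omega'_x, B'_x, P_x) \]
Now, by choosing the sets $\Omega'_x$ to be disjoint for different $x$'s and taking the union over all $x \in D$ in the last formula above we get:
\[ (\Phi, D, f) \preceq (\Omega, B', P) \]
where
\[ \Omega = \bigcup_{x \in D} \Omega'_x, \qquad B' = c\Big(\bigcup_{x \in D} B'_x\Big), \qquad P = \text{extension of all } P_x \text{ over } B' \]
Obviously,
\[ \mathrm{card}(\Omega) = \mathrm{card}\Big(\bigcup_{x \in D} \Omega'_x\Big) \ge \mathrm{card}(\Phi)\,\mathrm{card}(D) \]
Q.E.D.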

Theorem 6 If both $\mathrm{card}(\Phi)$ and $\mathrm{card}(D)$ are polynomially related to $\mathrm{card}(\Omega)$ then
\[ \mathrm{not}(\mathcal{P}_\Omega \preceq \mathcal{F}_{\Phi,D}) \]

Proof We would like to prove that
\[ \exists B, P \;\; \forall f \quad \mathrm{not}\big((\Omega, B, P) \preceq (\Phi, D, f)\big) \]
We assume to the contrary that $(\Omega, B, P) \preceq (\Phi, D, f)$. Then it must be
\[ \mathrm{card}(P_B) \le 2\,\mathrm{card}(\Phi)\,\mathrm{card}(D) \]
(the factor of 2 in the formula above is because each fuzzy set in $\Phi$ contributes at most $\mathrm{card}(D)$ values by itself and at most $\mathrm{card}(D)$ values through its complement, which cannot belong to $\Phi$). However, $\mathrm{card}(P_B)$ can be as large as $2^{\mathrm{card}(\Omega)}$ (see the proof of Theorem 3). Therefore it must be
\[ 2^{\mathrm{card}(\Omega)} \le 2\,\mathrm{card}(\Phi)\,\mathrm{card}(D) \]
This and the hypothesis that both $\mathrm{card}(\Phi)$ and $\mathrm{card}(D)$ are polynomially related to $\mathrm{card}(\Omega)$ result in the desired contradiction. Q.E.D.

Figure 1 represents our results graphically. The nodes in the graph represent system classes and the links represent "less expressive or equal to" relations. The labels on some of the arcs indicate special conditions under which the corresponding "$\preceq$" relations hold. The meanings of the labels are as follows:

Label                  Meaning
Φ >= 2                 $\mathrm{card}(\Phi) \ge 2$
D >= 1, Φ >= Ω         $\mathrm{card}(D) \ge 1$, $\mathrm{card}(\Phi) \ge \mathrm{card}(\Omega)$
Φ.D <= Ω               $\mathrm{card}(\Phi)\,\mathrm{card}(D) \le \mathrm{card}(\Omega)$

We assume that the sets $\Omega$, $\Phi$, $D$ are finite or countably infinite. The dashed line is a link that has not been proved explicitly in any of the previous theorems but can be derived easily from the following:
\[ \mathrm{card}(\Omega) \le \mathrm{card}(\Omega') \Rightarrow C_\Omega \preceq C_{\Omega'} \]


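[Figure 1 graph not reproduced. Its nodes are the system classes $\Pi_\Omega$, $\mathcal{P}_\Omega$, $\mathcal{F}_{\Phi,D}$, and $\mathcal{F}_{\Phi,2^\Omega}$; its arcs carry the labels "Φ.D <= Ω", "D >= 1, Φ >= Ω", and "Φ >= 2".]

Figure 1: Expressiveness relations among probabilities, possibilities, and fuzzy sets.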
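The contradiction at the end of the proof of Theorem 6 rests on an exponential eventually outgrowing any polynomial. The sketch below makes this concrete under a simplifying assumption that is not in the paper: card(Φ) and card(D) are taken to be exact powers of n = card(Ω), with hypothetical exponents `deg_phi` and `deg_d`.

```python
def first_exponential_gap(deg_phi, deg_d, limit=10_000):
    """Smallest n >= 1 with 2**n > 2 * n**deg_phi * n**deg_d.

    Here n plays the role of card(Omega), n**deg_phi that of card(Phi),
    and n**deg_d that of card(D). Returns None if no such n is found
    below the search limit.
    """
    for n in range(1, limit):
        if 2 ** n > 2 * n ** deg_phi * n ** deg_d:
            return n
    return None


# With card(Phi) = card(D) = card(Omega), the bound fails at n = 7.
linear_case = first_exponential_gap(1, 1)
# Larger exponents only postpone the point where the bound fails.
quadratic_case = first_exponential_gap(2, 2)
```

In the linear case the inequality $2^n \le 2 n^2$ holds up to n = 6 and fails at n = 7, so a fuzzy set system of that size can no longer supply enough distinct values.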

4 Probabilities vs Possibilities
Theorems 1 and 3 of the previous section showed that possibilities are less expressive than probabilities. Another way to express this is that probabilities can simulate possibilities without any extra space requirements, while the opposite is not true in general. Theorem 2 reveals that when large extra space requirements can be tolerated, probabilities can be simulated by possibilities, too. However, as Theorems 1 and 3 showed, probabilities carry strictly more information per bit than possibilities. Unfortunately, the space requirements to simulate probabilities by possibilities are excessive, since the size of $2^\Omega$ is exponentially larger than the size of $\Omega$. (When $\Omega$ is countably infinite, the cardinality of $2^\Omega$ is equal to the cardinality of the set $\mathbb{R}$ of the real numbers; see [27] for details.) Therefore, in practice, we may regard probabilities as not being simulable by possibilities.

Theorems 4, 5, and 6 reveal that a similar relation exists between probabilities and fuzzy sets. Probabilities can simulate fuzzy sets without any extra space requirements: they both require $O(\mathrm{card}(\Phi)\,\mathrm{card}(D))$ space. However, fuzzy sets cannot simulate probabilities without exponentially greater requirements (Theorem 6). The second part of Theorem 4 shows that fuzzy sets can simulate possibilities without extra space requirements. This comes as no surprise, for fuzzy sets have been defined in a possibilistic way.

The flip side of the information capacity advantage of probabilities is their efficiency disadvantage. Possibilistic measures like possibilities and fuzzy sets are easier to compute than probabilities, since the former are extensional while the latter are intensional measures. This theoretical advantage of possibilities is very noticeable in practice, where one can easily compute the possibility value of a compound logical expression or relation among events or entities by simply applying the definition. On the other hand, using probabilities, such a task is either very difficult or impossible. For example, evaluating a Bayesian network [1, 23] is computationally an NP-hard problem [3]. Although there are simple and efficient algorithms for the restricted class of singly connected networks [19], in general, exact algorithms are very complicated and often very inefficient. This led to approximate evaluations of the conditional probabilities in a Bayesian network [1, 24], a solution which introduces an error which in turn reduces the information probabilities carry. Furthermore, the structure of the network implies independence assumptions which do not always capture all the possible situations and as such act as a new source of error and information loss. Therefore, it appears that probabilities often carry too much information, imposing approximate or inefficient solutions.

Probabilistic approaches are not limited to Bayesian networks. Probabilistic logic [20] is another example. In general, Bayesian statistics have been used as a statistical theory of evidence, uncertainty, and inference [10, 18]. Furthermore, a number of probabilistic techniques have been applied to pattern recognition and classification.
These include (but are not limited to) the Bayes classifier and learning, Parzen windows, hidden Markov models, k-nearest neighbor, and stochastic approximation methods. A good overview of them appears in [9]. Despite the wide range of applications and techniques of probabilities, problems like the ones mentioned above, as well as some empirical evidence that people are very poor probability estimators (see [12]), led researchers to find alternative solutions and explore new measures.

Approaches that do not exactly fit in the probability landscape include MYCIN's certainty factors [26] and Dempster-Shafer theory [5, 25]. Certainty factors require strong independence conditions to hold among their primitive units (rules) in order to give accurate results. Therefore they have the same problems that probabilities have, though to a lesser extent. On the other hand, the information they carry is not as precise and semantically clear as that of probabilities. In short, certainty factors stand between probabilities and possibilities both in terms of information content and computational efficiency.

Dempster-Shafer theory has been shown to be a broad theory that subsumes both probabilities and possibilities. When the focal elements (i.e. elements of non-zero basic probability density or assignment $m$) are singletons or nested, Dempster-Shafer theory collapses to probability or possibility theory, respectively [13, 14]. However, the case of possibilities, as it appears in [13, 14], applies only to finite universal sets, for their elements must be ordered in descending possibility values. Such an ordering is not always possible for infinite sets. Of course, a transformation similar to the one used in Theorem 1 can be used to reduce possibilities to Dempster-Shafer theory in the general case. However, the opposite transform, in general, requires exponentially larger spaces. This is due to Theorem 3 and the fact that Dempster-Shafer theory subsumes probability theory. In short, the structure of the focal elements (singletons or nested) will determine whether the defined measures will be probability or possibility measures (respectively). However, they cannot be both at the same time. On the other hand, if the focal elements are neither singletons nor nested then Dempster-Shafer theory is neither probability nor possibility theory. In that case, similar to certainty factors, Dempster-Shafer theory stands in between probability and possibility theory. Computational efficiency is sacrificed if the entity primitive units (sets of propositions) are large. Accuracy is reduced unless strong independence conditions hold among the functional primitive units (belief functions).

Possibilistic measures, which are the most efficient of all to compute, for they require no special applicability conditions, carry the least information of all. Therefore their clear computational advantage should not come as a surprise. Applications of possibilities are mainly within the spectrum of fuzzy set applications. Fuzzy sets and logic have been used in representation and approximate reasoning [28], pattern recognition [8, 21], operations research [8], and modeling uncertainty and control [8]. Applications include decision support systems, expert systems, natural language processing [28], database management, linear programming, robotics, vision [8], clustering, classification, image analysis [8, 21], and speech recognition [21].
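The efficiency contrast discussed in this section can be illustrated with two events. The sketch below is a toy example with hypothetical names and values: the possibility of a disjunction follows from the two possibility values alone, whereas the probability of a disjunction needs the joint distribution, because two joint distributions with identical marginals can disagree.

```python
from fractions import Fraction


def possibility_of_union(poss_a, poss_b):
    """Extensional rule: Pi(A or B) = sup(Pi(A), Pi(B))."""
    return max(poss_a, poss_b)


def probability_of_union(joint):
    """P(A or B) read from a joint table keyed by (a_holds, b_holds)."""
    return sum(p for (a, b), p in joint.items() if a or b)


def marginals(joint):
    """The pair (P(A), P(B)) of a joint table."""
    p_a = sum(p for (a, _), p in joint.items() if a)
    p_b = sum(p for (_, b), p in joint.items() if b)
    return p_a, p_b


half, quarter, zero = Fraction(1, 2), Fraction(1, 4), Fraction(0)

# Two hypothetical joint distributions with the same marginals (1/2, 1/2).
independent = {(True, True): quarter, (True, False): quarter,
               (False, True): quarter, (False, False): quarter}
exclusive = {(True, True): zero, (True, False): half,
             (False, True): half, (False, False): zero}

union_independent = probability_of_union(independent)   # 3/4
union_exclusive = probability_of_union(exclusive)       # 1
```

Both tables give P(A) = P(B) = 1/2, yet the probability of the disjunction is 3/4 in one and 1 in the other, so no function of the two marginal values alone can compute it. The possibility rule, in contrast, needs nothing beyond the two input values.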

5 Conclusion
The choice between probabilities and possibilities or fuzzy sets is not easy. In general, probabilities should give more accurate estimations, though they would usually carry too much information for the actual computations to be numerically accurate or computationally tractable. Over-simplifying a probabilistic model so that its computations can be easily performed may reduce the information conveyed to such a level that possibilities would be preferable not only for their simplicity and efficiency advantage but also for their information content. Depending on the nature of each individual application or situation under consideration and its computational demands in terms of accuracy, simplicity, and efficiency, one can choose the best measure to use in a given model or system. The existence of possibilistic and probabilistic measures and their already proved irreducible differences should not be seen as a limiting constraint but rather as a resourceful option.

Acknowledgements
This research was partially supported by NASA grant NAG2-581 (ARPA Order #8607) and Teknowledge Federal Systems, Inc. contract 71715-1 (ARPA contract DAAA21-92-C-0028).

References
[1] E. Charniak. Bayesian networks without tears. AI Magazine, 12(4):50-63, 1991.
[2] P. Cheeseman. Probabilistic versus fuzzy reasoning. In L.N. Kanal and J.F. Lemmer, editors, Uncertainty in Artificial Intelligence, pages 85-102. North Holland, Amsterdam and New York, 1986.
[3] G.F. Cooper. Probabilistic inference using belief networks is NP-hard. Technical Report KSL-87-27, Medical Computer Science Group, Stanford University, 1987.
[4] M. Delgado and S. Moral. On the concept of possibility-probability consistency. Fuzzy Sets and Systems, 21:311-318, 1987.
[5] A.P. Dempster. A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B, 30:205-247, 1968.
[6] D. Dubois and H. Prade. Unfair coins and necessity measures: towards a possibilistic interpretation of histograms. Fuzzy Sets and Systems, 10:15-20, 1983.
[7] D. Dubois and H. Prade. Fuzzy numbers: An overview. In J.C. Bezdek, editor, Analysis of Fuzzy Information, pages 3-39. CRC Press, Boca Raton, Fla., 1987.
[8] D. Dubois, H. Prade, and R. Yager. Readings in Fuzzy Sets for Intelligent Systems. Morgan Kaufmann, San Mateo, CA, 1993.
[9] R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
[10] M. Genesereth and N. Nilsson. Logical Foundations of Artificial Intelligence. Morgan Kaufmann, Los Altos, CA, 1987.
[11] S.J. Henkind and M.C. Harrison. Analysis of four uncertainty calculi. IEEE Trans. on Systems, Man, and Cybernetics, 18(5):700-714, 1988.
[12] D. Kahneman, P. Slovic, and A. Tversky, editors. Judgement under Uncertainty: Heuristics and Biases. Cambridge University Press, New York, 1982.
[13] G. Klir. Is there more to uncertainty than some probability theorists would have us believe? Int. Journal of General Systems, 15(4):347-378, 1989.
[14] G. Klir and B. Parviz. Probability-possibility transformations: A comparison. Int. Journal of General Systems, 21(1):291-310, 1992.
[15] A.N. Kolmogorov. Foundations of the Theory of Probability. Chelsea, New York, 1950.
[16] B. Kosko. Fuzziness vs. probability. Int. Journal of General Systems, 17(2-3):211-240, 1990.
[17] D.V. Lindley. The probability approach to the treatment of uncertainty in artificial intelligence and expert systems. Statistical Science, 2(1):17-24, 1987.
[18] J. Lukasiewicz. Logical foundations of probability theory. In L. Borkowski, editor, Jan Lukasiewicz, Selected Works, pages 16-43. North-Holland, Amsterdam, 1970.
[19] E. Neapolitan. Probabilistic Reasoning in Expert Systems: Theory and Algorithms. Wiley, New York, 1990.
[20] N.J. Nilsson. Probabilistic logic. Artificial Intelligence, 28(1):71-87, 1986.
[21] S.K. Pal and D.K. Dutta Majumder. Fuzzy Mathematical Approach in Pattern Recognition Problems. Wiley, New York, 1986.
[22] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New York, 1991.
[23] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, Palo Alto, 1988.
[24] J. Pearl. Reasoning under uncertainty. Annual Review of Computer Science, 4:37-72, 1990.
[25] G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ, 1976.
[26] E.H. Shortliffe. Computer-Based Medical Consultations: MYCIN. Elsevier, New York, 1976.
[27] R.R. Stoll. Set Theory and Logic. W.H. Freeman and Company, San Francisco and London, 1963.
[28] R.R. Yager, S. Ovchinnikov, R.M. Tong, and H.T. Nguyen, editors. Fuzzy Sets and Applications: Selected Papers by L.A. Zadeh. Wiley, New York, 1987.
[29] L.A. Zadeh. Fuzzy sets. Inf. Control, 8:338-353, 1965.