
This article was downloaded by: [Columbia University]

On: 15 October 2014, At: 17:40


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK

International Journal of General Systems


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/ggen20

A MATHEMATICAL ANALYSIS OF INFORMATION-PRESERVING TRANSFORMATIONS BETWEEN PROBABILISTIC AND POSSIBILISTIC FORMULATIONS OF UNCERTAINTY

JAMES F. GEER & GEORGE J. KLIR

Department of Systems Science, Thomas J. Watson School of Engineering and Applied Science, State University of New York, Binghamton, New York 13902-6000, U.S.A.
Published online: 27 Apr 2007.

To cite this article: JAMES F. GEER & GEORGE J. KLIR (1992) A MATHEMATICAL ANALYSIS OF INFORMATION-PRESERVING
TRANSFORMATIONS BETWEEN PROBABILISTIC AND POSSIBILISTIC FORMULATIONS OF UNCERTAINTY, International Journal of
General Systems, 20:2, 143-176, DOI: 10.1080/03081079208945024

To link to this article: http://dx.doi.org/10.1080/03081079208945024

Int. J. General Systems, Vol. 20, pp. 143-176. © 1992 Gordon and Breach Science Publishers S.A. Reprints available directly from the publisher. Printed in the United Kingdom. Photocopying permitted by license only.

A MATHEMATICAL ANALYSIS OF
INFORMATION-PRESERVING
TRANSFORMATIONS BETWEEN PROBABILISTIC
AND POSSIBILISTIC FORMULATIONS OF
UNCERTAINTY

JAMES F. GEER and GEORGE J. KLIR

Department of Systems Science, Thomas J. Watson School of Engineering and Applied Science, State University of New York, Binghamton, New York 13902-6000, U.S.A.

(Received June 7, 1991; in final form August 1, 1991)

It is now generally recognized that uncertainty can be formalized in different mathematical theories. Two of these theories, on which we focus in this paper, are probability theory and possibility theory. The paper deals with transformations from probabilistic formalizations of uncertainty into their possibilistic counterparts that contain the same amount of uncertainty and, consequently, the same amount of information (expressed as a reduction of uncertainty) as well; it also deals with the inverse uncertainty and information preserving transformations. Since well-justified and unique measures of uncertainty (and information) are now well established in both probability theory and possibility theory, the transformations are well defined. Mathematical properties of the transformations are analyzed in the paper under the assumption that probabilities and possibilities are connected via interval or log-interval scales. The primary results are: (i) the interval scale transformation that preserves information exists and is unique only from probability theory to possibility theory, but the inverse transformation does not always exist; (ii) the log-interval scale transformation exists and is unique in both directions; and (iii) the log-interval scale transformation satisfies the probability-possibility consistency requirement.

INDEX TERMS: Uncertainty, information, probability theory, possibility theory, information-preserving transformations, Shannon entropy, nonspecificity, discord, interval scales, log-interval scales.

1. INTRODUCTION

Consider a mathematical system that was constructed to model some aspect of real-
ity. Assume that the purpose for which the system was constructed (such as predic-
tion, retrodiction, or prescription) involves some uncertainty. This uncertainty (pre-
dictive, retrodictive, or prescriptive) is formalized within some mathematical theory.
This means, in essence, that the uncertainty is expressed in relevant numerical values
(degrees of belief, weights of evidence, etc.). These values conform to the axiomatic
constraints of the theory and are derived by certain rules (including, possibly, sub-
jective judgements) from some inconclusive evidence regarding the phenomenon of
concern. When the model is utilized for answering pertinent questions (e.g. giving

The work on this paper was partially supported by the National Science Foundation under Grant No. IRI-9015675.

predictions of a requested kind), these numerical values are manipulated according


to the calculus of the theory.
For three centuries (from the mid-seventeenth century until the mid-twentieth century), uncertainty was conceived solely in terms of probability theory. This seemingly unique connection between uncertainty and probability has recently been challenged by several alternative theories of uncertainty. The most visible of these theories, which began to emerge in the 1950s, are the theories of Choquet's capacities, fuzzy sets, fuzzy measures, random sets, imprecise probabilities, and rough sets, as well as possibility theory and the Dempster-Shafer theory of evidence.
By investigating uncertainty within these theories, it has increasingly been recognized that probability theory is not capable, in fact, of capturing the full scope of uncertainty.
An important aspect of every theory of uncertainty is the capability to quantify
the uncertainty involved. This means the capability of measuring, in an adequately

justified way, the amount of uncertainty associated with each possible characteriza-
tion of uncertainty within the theory. Moreover, the measured amount must be unique
when we choose a particular measurement unit.
The concept of uncertainty is intuitively connected with the concept of informa-
tion. For example, when uncertainty in predicting an experimental outcome is re-
solved by observing the actual outcome, the amount of information obtained by the
observation may be measured by the amount of uncertainty prior to the observation.
Similarly, when uncertainty regarding some historical event is reduced by finding a
relevant historical document, the amount of information in the document (with re-
spect to the event of concern) may be measured by the amount of uncertainty re-
duced. In general, the amount of information obtained by an action may be measured
by the reduction of uncertainty that results from the action.
Consider now that the action by which we reduce uncertainty is the construction
of a system that models some aspect of the real world. Such a system must be
formulated within a particular experimental frame. A question posed to the system
may be answered, in general, with some uncertainty. If no model were available
within the experimental frame, the question would be answered with the highest
possible uncertainty allowed by the experimental frame, expressing thus our total
ignorance regarding the phenomenon involved. Information obtained by constructing
the model (or information contained in the model) with respect to the question of
concern is measured by the difference between the highest uncertainty associated
with the experimental frame and the actual uncertainty expressed by the model.
Some of the theories of uncertainty are more general than others, while some are
not comparable in this respect. The theories also differ from one another in their
meaningful interpretations, computational complexity, robustness, and other aspects.
It has recently been argued on several occasions that none of the theories is
superior in all situations. Each theory seems to have some advantages and some
disadvantages when compared with the other theories. Furthermore, this comparison
is context-dependent; each theory is suitable for some types of evidence and unfit
for others. Which theory to use in each application context should be decided by
appropriate metarules on the basis of the type of evidence involved, estimated com-
putational complexity, and other criteria.
In order to utilize opportunistically advantages of the various theories of uncer-
tainty in knowledge-based systems, we need the capability of moving from one the-
ory to another as appropriate. These moves, or transformations, from one theory to
another should satisfy some justifiable requirements. As argued previously, we should

require that the numbers expressing uncertainty in one theory (probabilities, possi-
bilities, weights of evidence, etc.) be transformed into the corresponding numbers
in another theory by an appropriate scale. In addition, we should also require that
the amount of uncertainty (and information) be preserved under the transformation.
Transformations that satisfy these two requirements (scaling and uncertainty in-
variance) are called information preserving transformations. These transformations
guarantee that no information is unwittingly added or eliminated solely by moving
from one mathematical theory of uncertainty into another within the same experi-
mental frame.
Two theories of uncertainty, on which we focus in this paper, are probability theory and possibility theory. The paper deals with information-preserving transformations between the two theories, which were first proposed at the 1989 IFSA Congress in Seattle and discussed more thoroughly in a subsequent paper. It was shown that ratio and difference scales are too rigid and, consequently, not applicable to these transformations. Ordinal scales, on the other hand, are too loose and, hence,
lead to ambiguous transformations. Interval and log-interval scales were shown to
be appropriate candidates for unique information-preserving transformations, but their
mathematical properties were not adequately scrutinized. A mathematical analysis
of these transformations, especially their existence and uniqueness, is the subject of
this paper. The primary results are: (i) the interval scale transformation that preserves
information exists and is unique only from probability theory to possibility theory,
but the inverse transformation does not always exist; (ii) the log-interval scale trans-
formation exists and is unique in both directions; and (iii) the log-interval scale trans-
formation satisfies the probability-possibility consistency requirement.

2. RELEVANT BACKGROUND

In this paper, we deal with probability and possibility theories defined only on finite
sets. We assume that the reader is familiar with these theories at least at the level
at which they are covered in the text by Klir and Folger. However, to introduce
the relevant terminology, notation, and conventions, we briefly overview in this sec-
tion basic notions of the two theories. For this purpose, it seems appropriate to over-
view the theories from a broader perspective of the Dempster-Shafer theory, under
which they appear as special cases.
Let X denote a universal set under consideration, assumed here to be finite, and
let P(X) denote the power set of X. One way of formulating the Dempster-Shafer
theory is to define a function

m: P(X) → [0, 1]

such that

m(∅) = 0 and Σ_{A⊆X} m(A) = 1.

This function is called a basic probability assignment; the value m(A) expresses the degree of belief (based on relevant evidence) in a proposition that is represented by the set A, but it does not include possible degrees of belief in additional propositions represented by the various subsets of A. That is, m(A) expresses the degree of belief that is committed exactly to set A, not the degree of total belief committed to A. To obtain the latter, we have to add to m(A) the values m(B) for all proper subsets B of A. The total belief committed to set A, Bel(A), is thus an aggregate calculated from the basic evidential claims by the formula

Bel(A) = Σ_{B⊆A} m(B). (1)

The basic assignment is also employed for calculating a degree of total plausibility, Pl(A), of a proposition represented by a set A. This degree is obtained by adding to m(A) the values of m(B) for all sets B that overlap with A. That is,

Pl(A) = Σ_{B∩A≠∅} m(B). (2)

Functions Bel and Pl define, respectively, superadditive and subadditive measures on X. They are connected by the equation

Pl(A) = 1 − Bel(Ā), (3)

where Ā denotes the complement of A. By Eqs. (1) and (2), clearly,

Bel(A) ≤ Pl(A)

for all A ∈ P(X).
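The belief and plausibility functions defined above are easy to compute directly from a basic assignment. A minimal Python sketch; the body of evidence here is a hypothetical example, not one taken from the paper:

```python
def bel(A, m):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(v for B, v in m.items() if B <= A)

def pl(A, m):
    """Pl(A): total mass of the focal elements that overlap A."""
    return sum(v for B, v in m.items() if B & A)

# Hypothetical basic assignment on X = {a, b, c}: it vanishes on the
# empty set and its values sum to 1 over the focal elements.
m = {
    frozenset({"a"}): 0.3,
    frozenset({"a", "b"}): 0.5,
    frozenset({"a", "b", "c"}): 0.2,
}

X = frozenset({"a", "b", "c"})
A = frozenset({"a", "b"})
# Pl(A) = 1 - Bel(complement of A), and Bel(A) <= Pl(A)
assert abs(pl(A, m) - (1.0 - bel(X - A, m))) < 1e-9
assert bel(A, m) <= pl(A, m)
```

The superadditive/subadditive pair emerges automatically: Bel counts only evidence fully committed inside A, while Pl counts everything compatible with A.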
Every set A ∈ P(X) for which m(A) ≠ 0 is called a focal element. The pair (F, m), where F is the set of all focal elements associated with m, is called a body of evidence.
When all focal elements are singletons, belief and plausibility measures collapse into a single measure, a classical additive probability measure. Any probability measure, Pro, on a finite set X is uniquely determined by a probability distribution function

p: X → [0, 1]

via the formula

Pro(A) = Σ_{x∈A} p(x).

From the standpoint of the Dempster-Shafer theory, m({x}) = p(x) for all x ∈ X; hence, it is required that

Σ_{x∈X} p(x) = 1.


When all of the focal elements are nested (ordered by set inclusion), the body of evidence is called possibilistic. In this case, we obtain special belief and plausibility measures, which are called necessity and possibility measures, respectively. A possibility measure, Pos, is conveniently (and uniquely) determined by a possibility distribution function

r: X → [0, 1]

via the formula

Pos(A) = max_{x∈A} r(x)

for all A ∈ P(X); it is required that

max_{x∈X} r(x) = 1.

The corresponding necessity measure, Nec, is determined for all A ∈ P(X) by a formula equivalent to Eq. (3),

Nec(A) = 1 − Pos(Ā).

Furthermore, possibility and necessity measures satisfy the equations

Pos(A ∪ B) = max[Pos(A), Pos(B)],

Nec(A ∩ B) = min[Nec(A), Nec(B)].

Assume that X = {x_1, x_2, …, x_n} and let

A_1 ⊂ A_2 ⊂ … ⊂ A_n,

where A_i = {x_1, x_2, …, x_i}, i = 1, 2, …, n, be a complete sequence of nested subsets that contains all focal elements of a possibility measure Pos. That is, m(A) = 0 for each A ∉ {A_1, A_2, …, A_n}. Let m_i = m(A_i) and r_i = r(x_i) for all i = 1, 2, …, n. Then the n-tuples

m = (m_1, m_2, …, m_n) and r = (r_1, r_2, …, r_n)

fully characterize the basic assignment and the possibility distribution, respectively, by which the possibility measure Pos is defined. The possibility distribution r is ordered in the sense that r_i ≥ r_{i+1} for all i = 1, 2, …, n − 1. It is well known that

r_i = Σ_{j=i}^{n} m_j and, conversely, m_i = r_i − r_{i+1} (9)

for all i = 1, 2, …, n, where r_{n+1} = 0 by convention.
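The correspondence between an ordered possibility distribution and its nested basic assignment can be computed in both directions. A small sketch with a hypothetical distribution, assuming the indexing conventions above:

```python
def m_from_r(r):
    """m_i = r_i - r_{i+1}, with r_{n+1} = 0 by convention."""
    return [r[i] - (r[i + 1] if i + 1 < len(r) else 0.0) for i in range(len(r))]

def r_from_m(m):
    """r_i = m_i + m_{i+1} + ... + m_n: the possibility of x_i is the total
    mass of the nested focal elements A_i, ..., A_n, all of which contain x_i."""
    return [sum(m[i:]) for i in range(len(m))]

r = [1.0, 0.7, 0.7, 0.2]        # ordered, r_1 = 1 (possibilistic normalization)
m = m_from_r(r)                 # masses of the nested sets A_1, A_2, A_3, A_4
assert abs(sum(m) - 1.0) < 1e-9           # a basic assignment sums to 1
assert all(abs(a - b) < 1e-9 for a, b in zip(r_from_m(m), r))
```

Note that repeated values in r (here r_2 = r_3) simply give zero mass to the intermediate nested set, so the round trip is exact.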



Possibility theory can be formulated not only in terms of the Dempster-Shafer theory, but also in terms of fuzzy sets. Given a normal fuzzy set A defined by its membership grade function

μ_A: X → [0, 1],

a possibility distribution function induced by A is defined as numerically equal to μ_A, i.e.,

r(x) = μ_A(x)

for all x ∈ X. In this interpretation of possibility theory, focal elements correspond to distinct α-cuts, A_α, of the fuzzy set A. These are the sets

A_α = {x | μ_A(x) ≥ α}.

Since, clearly, A_α ⊆ A_β when α > β, the α-cuts of every fuzzy set are nested.
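The nestedness of α-cuts is easy to verify computationally. A brief sketch; the membership function is a hypothetical example:

```python
def alpha_cut(mu, alpha):
    """A_alpha = {x : mu(x) >= alpha} for a membership function given as a dict."""
    return frozenset(x for x, grade in mu.items() if grade >= alpha)

# Hypothetical normal fuzzy set on X = {a, b, c, d} (maximum grade is 1)
mu = {"a": 1.0, "b": 0.7, "c": 0.7, "d": 0.2}

# Cuts taken at decreasing levels are nested: A_alpha <= A_beta when alpha > beta
cuts = [alpha_cut(mu, level) for level in (1.0, 0.7, 0.2)]
assert all(smaller <= bigger for smaller, bigger in zip(cuts, cuts[1:]))
```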

3. MEASURES OF UNCERTAINTY

It is well known that the Dempster-Shafer theory is capable of capturing two types of uncertainty. One type is connected with evidential claims focusing on propositions that are not specific, the other with propositions that conflict with one another. Measures of both types of uncertainty are now well established.
Given a body of evidence (F, m), its nonspecificity, N(m), is expressed by the formula

N(m) = Σ_{A∈F} m(A) log_2 |A|,

where |A| denotes the cardinality of A. Function N, which was proven unique under appropriate requirements, measures nonspecificity in units that are called bits: one bit of uncertainty (in this case nonspecificity) is equivalent to the total ignorance regarding the truth or falsity of one proposition. The range of the function is

0 ≤ N(m) ≤ log_2 |X|;

N(m) = 0 iff all focal elements are singletons; N(m) = log_2 |X| iff m(X) = 1.


An adequately justified measure of the second type of uncertainty, connected with evidential claims focusing on conflicting propositions, emerged only recently. This measure is expressed by a function D given by the formula

D(m) = − Σ_{A∈F} m(A) log_2 [ Σ_{B∈F} m(B) |A ∩ B| / |B| ]. (11)

This function, which is referred to as discord, measures the conflict among evidential claims in bits. Its range is

0 ≤ D(m) ≤ log_2 |X|;

D(m) = 0 iff m(A) = 1 for one particular set A; D(m) = log_2 |X| iff m is the uniform probability distribution on X.
We may also define the amount of total uncertainty, T(m), as the sum of the amounts of the two types of uncertainty, N(m) and D(m), that coexist in the theory. That is,

T(m) = N(m) + D(m) = − Σ_{A∈F} m(A) log_2 [ Σ_{B∈F} m(B) |A ∩ B| / (|A| |B|) ]. (12)

Function T again measures uncertainty in bits, and it has already been proven that

0 ≤ T(m) ≤ log_2 |X|;

T(m) = 0 iff m({x}) = 1 for a particular x ∈ X. The maximum, T(m) = log_2 |X|, is not unique; it is obtained not only for m(X) = 1 and for the uniform probability distribution on X, but also for other bodies of evidence that seem to possess certain kinds of symmetries.
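The measures N, D, and T can be evaluated directly from a basic assignment. The sketch below checks the two extreme cases quoted above (total ignorance and the uniform probability distribution); the shape of the dictionaries is our own choice of representation:

```python
import math

def nonspecificity(m):
    """N(m): sum over focal elements A of m(A) * log2|A|."""
    return sum(v * math.log2(len(A)) for A, v in m.items() if v)

def discord(m):
    """D(m) = -sum_A m(A) * log2( sum_B m(B) |A n B| / |B| )."""
    return -sum(
        v * math.log2(sum(w * len(A & B) / len(B) for B, w in m.items() if w))
        for A, v in m.items() if v)

def total_uncertainty(m):
    """T(m) = N(m) + D(m)."""
    return nonspecificity(m) + discord(m)

X = frozenset({1, 2, 3})
# Total ignorance, m(X) = 1: N attains its maximum log2|X| and D vanishes
vacuous = {X: 1.0}
assert abs(nonspecificity(vacuous) - math.log2(3)) < 1e-9
assert abs(discord(vacuous)) < 1e-9
# Uniform probability distribution: N vanishes and D attains log2|X|
uniform = {frozenset({x}): 1.0 / 3.0 for x in X}
assert abs(nonspecificity(uniform)) < 1e-9
assert abs(discord(uniform) - math.log2(3)) < 1e-9
```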
When we specialize to probability theory, we can easily see that N(m) = 0 and the function D assumes the form

D(m) = − Σ_{x∈X} m({x}) log_2 m({x}),

which is equivalent to the well-known Shannon entropy

H(p) = − Σ_{x∈X} p(x) log_2 p(x).

Furthermore, T(m) = H(p) for any probabilistic body of evidence (F, m).


When we specialize to possibility theory, both types of uncertainty are applicable, but the nonspecificity is substantially more significant. Using the notation introduced in Sec. 2, which is based upon the notion of a complete sequence of nested subsets that contains all focal elements, it is easy to derive the following possibilistic forms of N and D:

N(m) = Σ_{i=1}^{n} m_i log_2 i,

D(m) = − Σ_{i=1}^{n−1} m_i log_2 [1 − y_i], where y_i = i Σ_{j=i+1}^{n} r_j/(j(j − 1)).

Furthermore, using Eq. (9), we obtain

N(r) = Σ_{i=2}^{n} r_i log_2 [i/(i − 1)],

D(r) = − Σ_{i=1}^{n−1} (r_i − r_{i+1}) log_2 [1 − y_i].


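The possibilistic forms of N and D can be cross-checked against the general set-based formulas by building the nested focal elements explicitly. A sketch with a hypothetical ordered distribution, using the indexing of Sec. 2:

```python
import math

def n_poss(r):
    """N(r) = sum_{i=2}^n r_i log2[i/(i-1)] (possibilistic nonspecificity)."""
    return sum(r[i - 1] * math.log2(i / (i - 1)) for i in range(2, len(r) + 1))

def d_poss(r):
    """D(r) = -sum_{i=1}^{n-1} (r_i - r_{i+1}) log2[1 - y_i],
    with y_i = i * sum_{j=i+1}^n r_j / (j(j-1))."""
    n, total = len(r), 0.0
    for i in range(1, n):
        y = i * sum(r[j - 1] / (j * (j - 1)) for j in range(i + 1, n + 1))
        total -= (r[i - 1] - r[i]) * math.log2(1.0 - y)
    return total

# Cross-check against the general formulas on the nested focal elements
# A_i = {x_1, ..., x_i} with masses m_i = r_i - r_{i+1}
r = [1.0, 0.6, 0.2]
m = {frozenset(range(1, i + 1)): r[i - 1] - (r[i] if i < len(r) else 0.0)
     for i in range(1, len(r) + 1)}
N_general = sum(v * math.log2(len(A)) for A, v in m.items() if v)
D_general = -sum(v * math.log2(sum(w * len(A & B) / len(B)
                                   for B, w in m.items() if w))
                 for A, v in m.items() if v)
assert abs(n_poss(r) - N_general) < 1e-9
assert abs(d_poss(r) - D_general) < 1e-9
```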

4. INFORMATION-PRESERVING TRANSFORMATIONS

Let the n-tuples p = (p_1, p_2, …, p_n) and r = (r_1, r_2, …, r_n) denote, respectively, ordered probability and possibility distributions (defined on a finite set X with n or more elements) that do not contain zero components. That is,
(a) p_i ∈ (0, 1] and r_i ∈ (0, 1] for all i = 1, 2, …, n;
(b) p_i ≥ p_{i+1} and r_i ≥ r_{i+1} for all i = 1, 2, …, n − 1;
(c) p_1 + p_2 + … + p_n = 1 (probabilistic normalization);
(d) r_1 = 1 (possibilistic normalization);
(e) n ≤ |X|.
It is assumed that only one of the distributions, p or r, is initially given. The other distribution is to be determined from the given one by an information-preserving transformation. That is, values p_i are assumed to correspond to values r_i for all i = 1, 2, …, n by some scale (at least ordinal) and both p and r are required to contain the same amount of uncertainty (and information). Furthermore, if n < |X|, we also assume that p_i = r_i = 0 for all i = n + 1, n + 2, …, |X|.
The requirement that p and r contain the same amount of uncertainty is formally expressed by the equation

− Σ_{i=1}^{n} p_i log_2 p_i = Σ_{i=2}^{n} r_i log_2 [i/(i − 1)] − Σ_{i=1}^{n−1} (r_i − r_{i+1}) log_2 [1 − i Σ_{j=i+1}^{n} r_j/(j(j − 1))]. (16)

The expression on the left-hand side of this equation represents the Shannon entropy (probabilistic uncertainty); the expression on the right-hand side represents the total possibilistic uncertainty (the sum of nonspecificity and discord). When a given p is to be transformed into r, the left-hand side of the equation becomes a constant (the value of the Shannon entropy for the given p) and, similarly, when a given r is to be transformed into p, the right-hand side of the equation becomes a constant.
Basic ideas regarding information-preserving transformations between probabilities and possibilities are discussed for different scales in a previous paper. It is shown that ratio and difference scales are too rigid and, consequently, not applicable. Either of them possesses only one free coefficient, which is uniquely determined by the probabilistic or possibilistic normalization requirement. No freedom is then left for imposing the requirement of uncertainty equivalence expressed by Eq. (16). Ordinal scales, on the other hand, are too loose. Since they preserve only the order of the given numbers, but not any additional structure involved, ordinal scale transformations are not unique.


Interval and log-interval scales, each of which involves two free coefficients, are potentially unique under the requirements of normalization and uncertainty equivalence. Although conversion formulas based on these scales are developed in the mentioned paper, the questions of existence and uniqueness of solutions are not addressed. A mathematical analysis pertaining to these questions is the primary purpose of this paper.
Before engaging in our analysis, we should mention that the previous investigations are based on the assumption that nonspecificity is the only uncertainty involved in possibilistic bodies of evidence. According to this assumption, which we now consider incorrect, the right-hand side of Eq. (16) contains only the first term. Our reformulation of the equation is based upon a recent critical re-examination of uncertainty in the Dempster-Shafer theory, which resulted in the novel concept of discord, expressed by Eq. (11), and the notion of the total uncertainty, expressed by Eq. (12).
Let us add a remark regarding the two formulations of the equation expressing the uncertainty equivalence. According to recent results concerning the concept of discord, nonspecificity is substantially more significant than discord in all possibilistic bodies of evidence and increases considerably more rapidly with n. Furthermore, discord is bounded for n → ∞ by a rather small number (0.892), while nonspecificity is not bounded. The implication of these facts is that the effect of discord in Eq. (16) is often negligible, especially for large bodies of evidence. Excluding discord from Eq. (16) would somewhat simplify the computation involved. Nevertheless, our analysis in this paper is based on the full form of Eq. (16). In the next two sections, we analyze transformations based on interval and log-interval scales.

5. INTERVAL SCALE TRANSFORMATIONS

5.1 From Probabilities to Possibilities

Given an ordered probability distribution (p_i), with 1 ≥ p_1 ≥ p_2 ≥ … ≥ p_n ≥ 0, the interval scale transformations are of the form r_i = αp_i + β for all i = 1, 2, …, n, where α and β are constants (α > 0). From the possibilistic normalization, we obtain

r_i = 1 − α(p_1 − p_i), for all i = 1, 2, …, n. (17)

To preserve the amount of uncertainty, α must satisfy the equation g(α) = 0, where

g(α) = Σ_{i=2}^{n} r_i(α) log_2 [i/(i − 1)] − Σ_{i=1}^{n−1} [r_i(α) − r_{i+1}(α)] log_2 [1 − y_i(α)] − H,

with r_i(α) given by Eq. (17), y_i(α) = i Σ_{j=i+1}^{n} r_j(α)/(j(j − 1)), and H = − Σ_{i=1}^{n} p_i log_2 p_i.
In order to satisfy the requirement that each r_i must lie in the interval [0, 1] and must satisfy the inequality r_i ≥ p_i, which expresses the general possibility-probability consistency condition, we must require that

0 ≤ α ≤ α_max = min_{2≤i≤n} [(1 − p_i)/(p_1 − p_i)] = (1 − p_n)/(p_1 − p_n). (18)

In Fig. 1 we have plotted g(α) for the special case when the p_i are given by the formula

p_i = 1/n + a − 2a(i − 1)/(n − 1), for i = 1, 2, …, n,

where 0 < a < 1/n and n = 10. This corresponds to probabilities p_i which are equally spaced between 1/n − a and 1/n + a. The figure illustrates that, for this particular probability distribution, the function g(α) has a unique positive root, which lies in the interval (0, α_max).
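The root of g can be located numerically by bisection, relying on the facts established in the lemmas below (g(0⁺) > 0, g strictly decreasing, g → −∞). A sketch using the equally spaced distribution above with n = 10 and a = 0.05 (our illustrative choice), with the possibilistic forms of N and D taken from Sec. 3:

```python
import math

def shannon(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p)

def t_poss(r):
    """Total possibilistic uncertainty N(r) + D(r) (forms given in Sec. 3)."""
    n = len(r)
    N = sum(r[i - 1] * math.log2(i / (i - 1)) for i in range(2, n + 1))
    D = 0.0
    for i in range(1, n):
        y = i * sum(r[j - 1] / (j * (j - 1)) for j in range(i + 1, n + 1))
        D -= (r[i - 1] - r[i]) * math.log2(1.0 - y)
    return N + D

def g(alpha, p):
    """g(alpha) = T(r(alpha)) - H(p), with r_i = 1 - alpha*(p_1 - p_i) (Eq. (17))."""
    return t_poss([1.0 - alpha * (p[0] - pi) for pi in p]) - shannon(p)

def solve_p_to_r(p, iters=200):
    """Bisection: g(0+) > 0 and g decreases strictly to -infinity."""
    lo, hi = 0.0, 1.0
    while g(hi, p) > 0.0:        # grow the bracket until g changes sign
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid, p) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

n, a = 10, 0.05
p = [1.0 / n + a - 2.0 * a * (i - 1) / (n - 1) for i in range(1, n + 1)]
alpha = solve_p_to_r(p)
r = [1.0 - alpha * (p[0] - pi) for pi in p]
assert abs(t_poss(r) - shannon(p)) < 1e-8   # uncertainty is preserved
```

The bracket-growing loop is safe because y_i(α) ≤ 1 − i/n for every α, so the logarithm arguments stay positive even when some r_i become negative during the search.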
We now prove some facts about the function g(α), which we state in the form of lemmas, that will eventually allow us to prove (in general) a theorem concerning the existence and uniqueness of a solution to the equation g(α) = 0.

LEMMA 5.1 If p_i = 1/n for all 1 ≤ i ≤ n, then g(α) ≡ 0 for any choice of α in the interval [0, ∞).

Proof From the definition of g(α) for this case we find that r_i(α) = 1 − α(p_1 − p_i) = 1 for all i, so that the first summation equals Σ_{i=2}^{n} log_2 [i/(i − 1)] = log_2 n, the second summation vanishes, and H = log_2 n; hence g(α) = 0 for all α ≥ 0. Q.E.D.

Figure 1 Plots of function g(α) for several special probability distributions and interval scale transformation p → r (Sec. 5.1).

In view of Lemma 5.1, we shall assume from now on that p_1 > 1/n and p_n < 1/n. An immediate consequence of this assumption is that

H < log_2 n, (19)

since it is easy to show that H, regarded as a function of the n "variables" p_i (with the constraints that the p_i's are positive and sum to one), has its unique maximum when p_1 = p_2 = … = p_n = 1/n.

LEMMA 5.2 As α → 0⁺, g(α) → g(0⁺) = log_2 n − H > 0.

Proof Letting α → 0⁺ in the definition of g(α) and using (19), we find g(0⁺) = log_2 n − H > 0. Q.E.D.

LEMMA 5.3 g′(α) = (d/dα)g(α) < 0 for all α > 0 and g′(0) = 0.



Proof Using the definition of g(α), we find an explicit expression for g′(α). Setting α = 0 in this expression and using the fact that y_i(0) = 1 − i/n, we find that g′(0) = 0. Then, we can write g′(α) as the sum of two summations. The second term in this expression is obviously negative for any α > 0, since each term in the summation is non-negative and at least one term is strictly positive. Also, since

y_i(α) = i Σ_{j=i+1}^{n} r_j(α)/(j(j − 1)) ≤ i Σ_{j=i+1}^{n} 1/(j(j − 1)) = 1 − i/n, for α > 0,

we see that 1 − y_i(α) ≥ i/n and hence each of the logarithm terms in the first summation is well defined and negative. Thus, g′(α) < 0 for any α > 0.
Q.E.D.
Q.E.D.
LEMMA 5.4 g(α) → −∞ as α → +∞ and hence g(α) < 0 for all sufficiently large values of α.

Proof Using the definition of g(α), for large (positive) values of α we can write

g(α) = −Pα log_2 α + O(α)

as α → ∞, where P is a positive constant determined by the given distribution. (Here the expression O(α) indicates terms in g(α) which are bounded by a constant times α, as α → ∞.) Since P > 0, we see that g(α) → −∞ as α → +∞.
Q.E.D.
THEOREM 1 The function g(α) has exactly one positive real root.

Proof From Lemmas 5.2 and 5.4 we see that g(α) attains both positive and negative values for positive values of α. Thus g(α) has at least one real root in the interval (0, ∞). By Lemma 5.3, g(α) is strictly monotonically decreasing on the interval (0, ∞) and hence it has at most one real root on this interval. Q.E.D.
Theorem 1 establishes that g(α) has exactly one positive real root, say α = α̂, but it does not establish that α̂ lies in the interval (0, α_max) (see condition (18)). Thus we have not shown that the unique possibility distribution (r_i), defined by (17) with α = α̂, satisfies the possibility-probability consistency condition (r_i ≥ p_i, for i = 1, 2, …, n). In fact, we discovered an example in which the possibility-probability consistency condition is not satisfied. However, we have not pursued this point since, as we shall show in the next subsection, it is not always possible to transform from possibilities to probabilities using an interval scale.

5.2 From Possibilities to Probabilities

Given an ordered possibility distribution (r_i), we obtain

p_i = (r_i − r̄)/α + 1/n, where r̄ = (1/n) Σ_{j=1}^{n} r_j,

for all i = 1, 2, …, n, as a result of applying the probabilistic normalization to the form r_i = αp_i + β. Here α must satisfy the equation f(α) = 0, where

f(α) = Σ_{i=1}^{n} [(r_i − r̄)/α + 1/n] log_2 [(r_i − r̄)/α + 1/n]
     + Σ_{i=2}^{n} r_i log_2 [i/(i − 1)]
     − Σ_{i=1}^{n−1} (r_i − r_{i+1}) log_2 [1 − i Σ_{j=i+1}^{n} r_j/(j(j − 1))].

We note that this expression makes sense (i.e. is real) only when α > n(r̄ − r_n), since the argument of each logarithm function must be strictly positive. In Fig. 2 we have plotted f(α) for the special case when the r_i are given by the formula

r_i = 1 − (1 − a)(i − 1)/(n − 1), for i = 1, 2, …, n,

with n = 10 and for several values of a in the range 0 < a < 1. This corresponds

Figure 2 Plots of function f(α) for several special possibility distributions and interval scale transformation r → p (Sec. 5.2).

to a distribution of (r_i) which are equally spaced between 1 and a. In particular, the figure illustrates that there is no (real) solution for α when a is less than about 0.4.
From this example we conclude that some possibilistic bodies of evidence cannot be converted by interval scale transformations to their probabilistic counterparts that contain the same amount of uncertainty and information. Hence, interval scale transformations are not acceptable for our purpose.
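This failure can be reproduced numerically: for possibilities equally spaced between 1 and a = 0.2 (below the threshold of about 0.4 visible in Fig. 2), f stays negative over the admissible domain α > n(r̄ − r_n) and therefore has no root. A sketch that scans a grid of α values; the form of f is as given above:

```python
import math

def f(alpha, r):
    """f(alpha) = T(r) - H(p(alpha)), with p_i = (r_i - rbar)/alpha + 1/n."""
    n = len(r)
    rbar = sum(r) / n
    p = [(ri - rbar) / alpha + 1.0 / n for ri in r]
    H = -sum(pi * math.log2(pi) for pi in p)
    N = sum(r[i - 1] * math.log2(i / (i - 1)) for i in range(2, n + 1))
    D = 0.0
    for i in range(1, n):
        y = i * sum(r[j - 1] / (j * (j - 1)) for j in range(i + 1, n + 1))
        D -= (r[i - 1] - r[i]) * math.log2(1.0 - y)
    return N + D - H

n, a = 10, 0.2
r = [1.0 - (1.0 - a) * (i - 1) / (n - 1) for i in range(1, n + 1)]
alpha_min = n * (sum(r) / n - r[-1])     # admissible domain is alpha > alpha_min
samples = [f(alpha_min * (1.0 + 10.0 ** k), r) for k in range(-3, 4)]
# f stays negative over the sampled domain: no uncertainty-preserving solution
assert max(samples) < 0.0
```

Intuitively, as α grows the probabilities p_i(α) contract toward the uniform distribution, so H(p(α)) only increases; since T(r) for this r is already below log_2 n, f can never reach zero.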

6. LOG-INTERVAL SCALE TRANSFORMATIONS

6.1 From Probabilities to Possibilities

Log-interval scale transformations have the form

r_i = β(p_i)^α, for all i = 1, 2, …, n,

where α and β are positive constants. From the possibilistic normalization requirement, we obtain 1 = β(p_1)^α and, hence, β = 1/(p_1)^α and

r_i = (p_i/p_1)^α, for all i = 1, 2, …, n. (20)

The constant α in this formula is determined by solving (numerically) Eq. (16), in which the left-hand side is the constant

H = − Σ_{i=1}^{n} p_i log_2 p_i

and each r_i on the right-hand side is replaced, according to Eq. (20), with (p_i/p_1)^α. Thus, for any real number α > 0, we now define the function g(α) by

g(α) = Σ_{i=2}^{n} (p_i/p_1)^α log_2 [i/(i − 1)] − Σ_{i=1}^{n−1} [(p_i/p_1)^α − (p_{i+1}/p_1)^α] log_2 [1 − y_i(α)] − H,

where y_i(α) = i Σ_{j=i+1}^{n} (p_j/p_1)^α/(j(j − 1)). Any α for which g(α) = 0 is clearly a solution of Eq. (16). Questions regarding the existence and uniqueness of the solution of Eq. (16) can thus be studied by examining the behavior of this function.
LEMMA 6.1 If p_i = 1/n for 1 ≤ i ≤ n, then g(α) = 0 for all α > 0.

Proof To see this, we note first that, in this case, each ratio p_i/p_1 = 1, so that the first sum equals Σ_{i=2}^{n} log_2 [i/(i − 1)] = log_2 n and H = log_2 n, and, since (p_i/p_1)^α − (p_{i+1}/p_1)^α = 0 for all 1 ≤ i ≤ n − 1, we have

g(α) = log_2 n − log_2 n = 0

for all α. Q.E.D.


In view of Lemma 6.1, we shall assume from now on that p_1 > 1/n and p_n < 1/n. An immediate consequence of this assumption is that H < log_2 n, since it is easy to show that H, regarded as a function of the n "variables" p_i (with the constraints that the p_i's are positive and sum to one), has its unique maximum when p_1 = p_2 = … = p_n = 1/n.
In Fig. 3 we have plotted g(α) as α varies between 0 and 1 when the probabilities p_i are defined by

p_i = [2/(n(1 + a))] [1 − (1 − a)(i − 1)/(n − 1)], for i = 1, 2, …, n,

where a is a parameter which satisfies 0 < a < 1. Here the p_i are uniformly distributed between 2a/(n(1 + a)) and 2/(n(1 + a)). In particular, when a = 1, each p_i = 1/n. This figure illustrates for this particular probability distribution that g(α) has a unique simple root which lies in the interval (0, 1).

Figure 3 Plots of function g(α) for several special probability distributions and log-interval scale transformation p → r (Sec. 6.1).
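Since g(0⁺) > 0 and g(1) < 0 (Lemmas 6.2 and 6.3 below), the root can be isolated on (0, 1) by bisection. A sketch using an equally spaced distribution as a hypothetical input; note that the computed r also satisfies the consistency condition r_i ≥ p_i, in line with result (iii):

```python
import math

def g_log(alpha, p):
    """g(alpha) = T(r(alpha)) - H(p), with r_i = (p_i/p_1)^alpha (Eq. (20))."""
    n = len(p)
    r = [(pi / p[0]) ** alpha for pi in p]
    N = sum(r[i - 1] * math.log2(i / (i - 1)) for i in range(2, n + 1))
    D = 0.0
    for i in range(1, n):
        y = i * sum(r[j - 1] / (j * (j - 1)) for j in range(i + 1, n + 1))
        D -= (r[i - 1] - r[i]) * math.log2(1.0 - y)
    return N + D + sum(pi * math.log2(pi) for pi in p)

def solve_log_interval(p, iters=100):
    """Bisection on (0, 1): g(0+) > 0 and g(1) < 0 for a non-uniform ordered p."""
    lo, hi = 1e-9, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g_log(mid, p) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

n, a = 10, 0.05
p = [1.0 / n + a - 2.0 * a * (i - 1) / (n - 1) for i in range(1, n + 1)]
alpha = solve_log_interval(p)
r = [(pi / p[0]) ** alpha for pi in p]
assert 0.0 < alpha < 1.0
assert abs(g_log(alpha, p)) < 1e-8
# Probability-possibility consistency, in line with result (iii): r_i >= p_i
assert all(ri >= pi for ri, pi in zip(r, p))
```

The consistency check is automatic here: for 0 < α < 1 and p_i ≤ p_1 ≤ 1, (p_i/p_1)^α ≥ p_i/p_1 ≥ p_i.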

We shall now prove three lemmas which will establish the existence and uniqueness of a root of g(α) in the interval 0 < α < 1 for a general probability distribution p (assumed to be ordered).

LEMMA 6.2 As α → 0⁺, g(α) → g(0⁺) = log_2 n − H > 0.

Proof Since each ratio p_i/p_1 > 0, it follows that each term (p_i/p_1)^α → 1 as α → 0⁺. Hence

g(α) → Σ_{i=2}^{n} log_2 [i/(i − 1)] − H = log_2 n − H > 0, as α → 0⁺. Q.E.D.


LEMMA 6.3 The function g is strictly negative when α = 1, i.e. g(1) < 0.
Proof To see this, we first use Eq. (20) with α = 1 to express g(1) in the form

g(1) = h_n / log_e(2),

where

h_n = (1/s) Σ_{i=1}^{n} r_i log(r_i) − log(s) + Σ_{i=1}^{n−1} (r_i − r_{i+1}) log(i)
      + r_n log(n) − Σ_{i=1}^{n−1} (r_i − r_{i+1}) log(1 − y_i),   (21)

with r_i = p_i/p_1,

s ≡ Σ_{i=1}^{n} r_i = (1/p_1) Σ_{i=1}^{n} p_i = 1/p_1,   and   y_i = i Σ_{j=i+1}^{n} r_j/(j(j − 1)),

for 0 ≤ r_n ≤ r_{n−1} ≤ ... ≤ r_1 = 1. Here the base of the logarithm function is now
understood to be e, i.e. the base of the natural logarithms. Thus, in order to show
that g(1) is strictly negative, it is sufficient to show that

h_n < 0, for 0 < r_n ≤ r_{n−1} ≤ ... ≤ r_1 = 1.   (22)

In our proof which follows, we will use the following inequalities:



(a) log(z) ≤ z − 1, for z > 0;
(b) log(1 + z) ≥ z − (1/2)z², for z > 0;
(c) log(1 + z) ≤ z, for z > −1.

(These inequalities are easily established by elementary calculus methods. For ex-
ample, to establish (c), we define the function f(z) = z − log(1 + z) and then note
that f(0) = 0 and df/dz = z/(1 + z). Thus, df/dz > 0 for all z > 0 and df/dz <
0 for −1 < z < 0. Hence f(z) ≥ 0, i.e. inequality (c) holds, for all z > −1, with
strict equality holding only when z = 0.)
To show that (22) is indeed true, we consider first the case when n = 2. Using
(21) with n = 2, we find

h_2(x) = (x/(1 + x)) log(x) − log[1 + x(1 − x)/2] + x log(2 − x),

where x = r_2. We now use inequality (a) above with z = x in the first term in h_2,
then use inequality (b) with z = x(1 − x)/2 in the second term, and finally inequality
(c) with z = 1 − x in the third term, to write

for all 0 < x < 1. Thus, h_2(x) < 0 for 0 < x < 1 and h_2(x) = 0 only when x = 0
or x = 1. This establishes (22) for the special case when n = 2.
To show (22) holds for n ≥ 3, we shall demonstrate first that

h_n ≤ h_n(r_n),   (23)

where h_n(r_n) denotes the value of h_n when r_1 = r_2 = ... = r_{n−1} = 1 and only
r_n is left free, and then show that

h_n(r_n) < 0, for all 0 < r_n < 1.   (24)



To see that (23) holds, we first use (21) to express the difference h_n(r_n) − h_n (after
a little manipulation) in the form

h_n(r_n) − h_n = −(1/s) Σ_{i=2}^{n−1} r_i log(r_i) + ⋯ + (1/S − 1/s) r_n log(r_n),   (25)

for 0 ≤ r_n ≤ r_{n−1} ≤ ... ≤ r_1 = 1. Here we have defined S = n − 1 + r_n.


Now, to show that (23) holds, it is sufficient to show that h_n(r_n) − h_n ≥ 0 for all
0 ≤ r_n ≤ r_{n−1} ≤ ... ≤ r_1 = 1. We shall do this by showing that each term on the
right side of Eq. (25) is non-negative. To begin, we note that the last term in (25)
is non-negative since log(r_n) ≤ 0 and (1/S − 1/s) ≤ 0, since S ≥ s. The coefficient
of r_{n−1} in the next to last term in (25) can be written as

Using our inequality (a) above with z = r_k in each term in the first summation above
and then using inequality (c) in the second term above, we can write

Thus, the next to last term in (25) is also non-negative. The first term in (25) will
be non-negative if we can show that (n − 1)(1 − y_1)s/(S(1 − y_{n−1})) > 1, or equiv-
alently, that
Using the definitions of y_i, s, and S, we can write D as


Now D will be non-negative if we can show that each B_k ≥ 0. To see that this is
the case, we note that, for 2 ≤ k ≤ n − 1,

and

Thus, the first term in (25) is non-negative.


The remaining terms in (25) will all be non-negative if we can demonstrate that

for 2 ≤ i ≤ n − 2. Using inequality (a) above with z = r_k and inequality (c) with
z = i(1 − y_{n−1})S/(s(n − 1)(1 − y_i)) − 1, we can write

Thus, E_i will be non-negative if we can show that

where the quantities C_k, for 2 ≤ k ≤ n − 1, are defined by


Now, to show that E_i ≥ 0, we only need to demonstrate that each C_k ≥ 0. But, from
their definitions, we have, for 2 ≤ k ≤ n − 1,

and

Thus, each term on the right side of (25) is non-negative and hence (23) is established.
Finally, to show that (24) holds, we use equation (21) with r_n = x and r_{n−1} =
r_{n−2} = ... = r_1 = 1 to write

We now use inequality (a) above with z = x in the first term in (26), then use
inequality (b) with z = x(1 − x)/(k(k − 1)) in the second term, and finally inequality
(c) with z = (1 − x)/(k − 1) in the third term, to write

h_n(x) < x(x − 1)/(k − 1 + x) − x(1 − x)/(k(k − 1))
         + (1/2)x²(1 − x)²/(k(k − 1))² + x(1 − x)/(k − 1)
       = −(2k²(k − 1)²(k − 1 + x))⁻¹ x(1 − x)² [2k(k − 1)² + (1 − k)x − x²]
       < 0,

for 0 < x < 1. Q.E.D.
LEMMA 6.4 dg/dα ≡ g′(α) < 0 for all α > 0.
Proof Using the definition of g(α) and defining q_i = p_i/p_1, we can write g′(α)
(after a little manipulation) in the form

g′(α) = Σ_{i=2}^{n−1} q_i^α log2(q_i) {log2[(1 − y_{i−1}(α))/(i − 1)] − log2[(1 − y_i(α))/i]}
      + q_n^α log2(q_n) log2[n(1 − y_{n−1}(α))/(n − 1)]
      + (1/log_e(2)) Σ_{i=1}^{n−1} [(q_i^α − q_{i+1}^α)/(1 − y_i(α))] y*_i(α),   (27)

where

y_i(α) = i Σ_{j=i+1}^{n} q_j^α/(j(j − 1))   and   y*_i(α) = i Σ_{j=i+1}^{n} q_j^α log_e(q_j)/(j(j − 1)).

We shall now show that g′(α) < 0 by showing that each term in each of the
summations in (27) above is nonpositive and that at least one of these terms is strictly
negative. To do this, we examine first the last summation in (27) and note first that

y_i(α) = i Σ_{j=i+1}^{n} q_j^α/(j(j − 1)) < i Σ_{j=i+1}^{n} 1/(j(j − 1)) = 1 − i/n,

since each q_j ≤ 1 and q_n < 1. Then 1 − y_i(α) > 1 − (1 − i/n) = i/n > 0. Also,
y*_i(α) < 0, since log_e(q_j) ≤ 0 (1 ≤ j ≤ n − 1) and log_e(q_n) < 0. Thus each term in
the last summation is nonpositive (since each term (q_i^α − q_{i+1}^α) is nonnegative) and
the last term is strictly negative.
The single term just before the last summation is strictly negative since

log2(q_n) < 0 and n(1 − y_{n−1}(α))/(n − 1) > n[1 − (1 − (n − 1)/n)]/(n − 1) = 1.

Finally, each term in the first summation will be nonpositive if we can show that

log2[(1 − y_{i−1}(α))/(i − 1)] − log2[(1 − y_i(α))/i] ≥ 0, for 2 ≤ i ≤ n − 1.   (28)

To see that (28) does indeed hold, we observe that

(1 − y_{i−1}(α))/(i − 1) − (1 − y_i(α))/i = [1 − q_i^α]/(i(i − 1)) ≥ 0, for 2 ≤ i ≤ n − 1,

from which (28) follows immediately. Q.E.D.


The preceding lemmas allow us to state the following theorem.
THEOREM 2 If p_1 > 1/n, then the function g(α) has exactly one positive real root,
and this root lies in the open interval 0 < α < 1.
Proof Since g(α) is a continuous function of α and assumes both positive and
negative values in the interval (0, 1) (Lemmas 6.2 and 6.3), it must have at least
one real root in this interval. Also, since g′(α) < 0 for all positive α (Lemma 6.4),
g is monotonically decreasing and hence has at most one real root.
Q.E.D.
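Theorem 2 justifies finding the root of g(α) by simple bisection. The sketch below is our own illustration, not code from the paper: it implements the p → r direction, taking N(r) and D(r) in the forms used in Lemma 6.3 (nonspecificity and discord as in reference 5); the function names are ours.

```python
import math

def shannon(p):
    # H(p) = -sum p_i log2 p_i
    return -sum(pi * math.log2(pi) for pi in p)

def N_plus_D(r):
    # N(r) + D(r) for an ordered possibility distribution r_1 = 1 >= ... >= r_n > 0
    n = len(r)
    N = sum(r[i] * math.log2((i + 1) / i) for i in range(1, n))
    D = 0.0
    for i in range(1, n):                       # i = 1 .. n-1 (1-based)
        y = i * sum(r[j - 1] / (j * (j - 1)) for j in range(i + 1, n + 1))
        D -= (r[i - 1] - r[i]) * math.log2(1.0 - y)
    return N + D

def g(alpha, p):
    # g(alpha) = N(r(alpha)) + D(r(alpha)) - H(p), with r_i = (p_i/p_1)^alpha (Eq. 20)
    r = [(pi / p[0]) ** alpha for pi in p]
    return N_plus_D(r) - shannon(p)

def prob_to_poss(p, tol=1e-12):
    # Theorem 2: g is continuous, g(0+) > 0, g(1) < 0, g' < 0, so bisect on (0, 1)
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid, p) > 0 else (lo, mid)
    alpha = 0.5 * (lo + hi)
    return alpha, [(pi / p[0]) ** alpha for pi in p]

alpha, r = prob_to_poss([0.7, 0.3])   # the urn example with p(B) = 0.7
```

For the urn distribution (0.7, 0.3) this reproduces α ≈ 0.4331, in agreement with the corresponding entry of Table I.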

6.2 From Possibilities to Probabilities


From Eq. (20), we have p_i = p_1 (r_i)^{1/α} and, using the probabilistic normalization
requirement, we obtain 1/p_1 = Σ_{i=1}^{n} (r_i)^{1/α}. Hence

p_i = (r_i)^{1/α} / Σ_{j=1}^{n} (r_j)^{1/α}.   (29)

The constant α in this formula for p_i is determined by solving (numerically) Eq.
(16), in which the right-hand side is the constant T = N(r) + D(r)
and each p_i on the left-hand side is replaced with the expression on the right side of
Eq. (29).
For any real number α > 0, we define the function f(α) by

f(α) = Σ_{i=1}^{n} P_i(α) log2(P_i(α)) + T,

where

P_i(α) = (r_i)^{1/α}/P(α)   and   P(α) = Σ_{j=1}^{n} (r_j)^{1/α}.
In Figure 4 we have shown some typical plots of the function f(α), as α varies
between 0 and 1, using the same distributions {r_i} as in Sec. 5.2, with several choices
of the parameter a.
Before we examine the roots of f(α) = 0, we make some observations about the
quantity T. We note first that

if r_1 = r_2 = ... = r_n = 1, then T = T_M = log2(n).

Also, using the identity

Σ_{i=1}^{n−1} {(1 − r_i) log2[i/(i + 1)] − (r_i − r_{i+1}) log2[(i + 1)/n]} = 0,

we can write

T = log2(n) − Σ_{i=1}^{n−1} (r_i − r_{i+1}) log2[(n/i)(1 − y_i)].   (30)

Here we have used the fact that

y_i = i Σ_{j=i+1}^{n} r_j/(j(j − 1)) ≤ i Σ_{j=i+1}^{n} 1/(j(j − 1)) = 1 − i/n,

Figure 4 Plots of function f(α) for several special possibility distributions and log-interval scale transformation r → p (Sec. 6.2).

and hence log2[(n/i)(1 − y_i)] ≥ log2[(n/i)(1 − (1 − i/n))] = log2(1) = 0. Thus,
T_M represents the maximum value of T, i.e.

T ≤ T_M = log2(n), for all 1 = r_1 ≥ r_2 ≥ ... ≥ r_n > 0.

Furthermore, if T = T_M, then each term in the summation in (30) must vanish and
we are led to the condition that each r_i = 1, for 1 ≤ i ≤ n. Thus, we have shown
that

T = T_M = log2(n) if and only if r_1 = r_2 = ... = r_n = 1.

We now summarize these observations in the following lemma.

LEMMA 6.5 For any possibility distribution {r_i} satisfying 1 = r_1 ≥ r_2 ≥ ... ≥ r_n
> 0, it is true that

T ≤ T_M = log2(n)

and, furthermore,

T = T_M = log2(n) if and only if r_1 = r_2 = ... = r_n = 1.
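The rewritten form (30) of T can be checked numerically against the direct definition T = N(r) + D(r). The following throwaway script is ours (with N and D as in Lemma 6.3 and reference 5); it confirms that the two expressions agree:

```python
import math

def T_direct(r):
    # T = N(r) + D(r) for an ordered possibility distribution r_1 = 1 >= ... >= r_n > 0
    n = len(r)
    N = sum(r[i] * math.log2((i + 1) / i) for i in range(1, n))
    D = 0.0
    for i in range(1, n):
        y = i * sum(r[j - 1] / (j * (j - 1)) for j in range(i + 1, n + 1))
        D -= (r[i - 1] - r[i]) * math.log2(1.0 - y)
    return N + D

def T_rewritten(r):
    # Eq. (30): T = log2(n) - sum_{i=1}^{n-1} (r_i - r_{i+1}) log2((n/i)(1 - y_i))
    n = len(r)
    total = math.log2(n)
    for i in range(1, n):
        y = i * sum(r[j - 1] / (j * (j - 1)) for j in range(i + 1, n + 1))
        total -= (r[i - 1] - r[i]) * math.log2((n / i) * (1.0 - y))
    return total

r = [1.0, 0.8, 0.5, 0.5, 0.1]
assert abs(T_direct(r) - T_rewritten(r)) < 1e-9
```

The agreement is exact up to rounding; in particular, for r_1 = ... = r_n = 1 both expressions reduce to log2(n), as in Lemma 6.5.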

We shall now prove the existence and uniqueness of a solution to the equation
f(α) = 0, which lies in the interval 0 < α < 1. To this end, we prove the following
lemmas.

LEMMA 6.6 If r_n = 1, then f(α) = 0 for all α > 0.
Proof To see this, just note that, if r_n = 1, then r_1 = r_2 = ... = r_n = 1. Consequently,
for any α > 0, we have T = log2(n), P = n, P_i = 1/n, and hence

f(α) = Σ_{i=1}^{n} (1/n) log2(1/n) + log2(n) = −log2(n) + log2(n) = 0.

Q.E.D.

Thus, from now on, we shall assume that r_n < 1.



LEMMA 6.7 For all sufficiently small positive α, f(α) > 0, i.e. as α → 0+, f(α)
→ f(0+) > 0.
Proof We let

1 = r_1 = r_2 = ... = r_N > r_{N+1} ≥ r_{N+2} ≥ ... ≥ r_n,

where N < n. Then, as α → 0+, 1/α → +∞, so that

(r_i)^{1/α} → 1 if 1 ≤ i ≤ N and (r_i)^{1/α} → 0 if N < i ≤ n; hence P → N,
and P_i → 1/N if 1 ≤ i ≤ N, P_i → 0 if N < i ≤ n.

Thus, as α → 0+,

since each term in the first summation is strictly positive and each term in the second
summation is non-negative. Q.E.D.
LEMMA 6.8 When α = 1, the function f is strictly negative, i.e. f(1) < 0.
Proof From the definition of f(α), we can write

where p_i = r_i/s and s = Σ_{j=1}^{n} r_j. Inserting the definition of P_i into this expression,
we find that we can express f(1) as

f(1) = g(1),

where g(1) is defined in the beginning of the proof of Lemma 6.3. Thus, since by
Lemma 6.3 we know that g(1) < 0, we see that we must also have f(1) < 0.
Q.E.D.
We now make the following observation. Let α̂ be any root of f in the interval
(0, 1). Suppose we could show that f′(α̂) < 0, i.e. that the derivative of f is strictly
negative at a root. Then it would follow that f has only one root in the interval (0,
1). The reason for this is the following. Suppose that f has k roots at α = α_i, where
0 < α_1 ≤ α_2 ≤ ... ≤ α_k < 1. Since f′(α̂) ≠ 0, it follows that the roots of f are all
simple and hence the derivative of f must alternate in sign at these roots, i.e.
f′(α_i) f′(α_{i+1}) < 0 for i = 1, 2, ..., k − 1. But this would be impossible if f′ is
negative at each root.
We now prove the following lemma.

LEMMA 6.9 Let α̂ be a root of f, i.e. a solution of f(α̂) = 0. Then f′(α̂) < 0.

Proof From the definition of f(α) we find

In deriving this expression we have made use of the fact that

Σ_{i=1}^{n} P_i = 1, and hence (d/dα) Σ_{i=1}^{n} P_i = Σ_{i=1}^{n} P_i′ = 0.

Also, from the definition of P_i we find, after a little manipulation, that

Substituting this expression for P_i′ into our expression for f′(α), we find

So far we have just computed an expression for f′ for a general value of α. Now,
if α̂ is a root of f(α) = 0, then

Σ_{i=1}^{n} P_i log2(P_i) = −T,

from the definition of f(α). Using this expression, our expression for the derivative
of f at a root of f becomes

Since the factor log_e(2)/α̂ is positive, we wish to show that

Σ_{i=1}^{n} P_i [log2(P_i)]² − T² > 0

when Σ_{i=1}^{n} P_i log2(P_i) = −T.
INFORMATION-PRESERVING TRANSFORMATIONS

But we now observe that

0 < Σ_{i=1}^{n} P_i [log2(P_i) + T]²   (since each term is non-negative and, because the
    P_i are not all equal, at least one is strictly positive)
  = Σ_{i=1}^{n} P_i [log2(P_i)]² + 2T Σ_{i=1}^{n} P_i log2(P_i) + T²
  = Σ_{i=1}^{n} P_i [log2(P_i)]² − T²,

which is just what we wished to show. Q.E.D.


As a consequence of Lemmas 6.6-6.9, we can state the following theorem.
THEOREM 3 If 0 < r_n < 1, then the function f(α) has exactly one positive real root,
and this root lies in the interval (0, 1).
Proof Since f(α) is a continuous function of α and assumes both positive and
negative values in the interval (0, 1) (from Lemmas 6.6-6.8), it must have at least
one real root in this interval. From Lemma 6.9 and the remarks immediately pre-
ceding it, this root is the unique root of f. Q.E.D.
Theorems 2 and 3 are significant since they imply that the log-interval information-
preserving transformation between probabilities and possibilities exists and is unique
for all probability and possibility distributions on finite sets. This important trans-
formation, which is based on these theorems, is overviewed in Figure 5.
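Symmetrically, Theorem 3 licenses the same bisection search in the r → p direction. A minimal sketch (ours, with helper names that are not the paper's), using Eq. (29) and the constant T = N(r) + D(r):

```python
import math

def total_uncertainty(r):
    # T = N(r) + D(r) of the given possibility distribution (N, D as in reference 5)
    n = len(r)
    N = sum(r[i] * math.log2((i + 1) / i) for i in range(1, n))
    D = 0.0
    for i in range(1, n):
        y = i * sum(r[j - 1] / (j * (j - 1)) for j in range(i + 1, n + 1))
        D -= (r[i - 1] - r[i]) * math.log2(1.0 - y)
    return N + D

def f(alpha, r, T):
    # f(alpha) = sum_i P_i log2 P_i + T, with P_i = r_i^(1/alpha) / sum_j r_j^(1/alpha)
    w = [ri ** (1.0 / alpha) for ri in r]
    s = sum(w)
    return sum((wi / s) * math.log2(wi / s) for wi in w if wi > 0) + T

def poss_to_prob(r, tol=1e-12):
    # Theorem 3: f(0+) > 0 and f(1) < 0, with a unique root in (0, 1)
    T = total_uncertainty(r)
    lo, hi = 1e-6, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid, r, T) > 0 else (lo, mid)
    alpha = 0.5 * (lo + hi)
    w = [ri ** (1.0 / alpha) for ri in r]
    return alpha, [wi / sum(w) for wi in w]

alpha, p = poss_to_prob([1.0, 0.6, 0.4])
```

At the root, the resulting probability distribution p satisfies H(p) = T by construction, i.e. the amount of information is preserved.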

7. EXAMPLES

To illustrate the log-interval information-preserving transformation between proba-
bilities and possibilities, whose existence and uniqueness is proven in this paper for
all probability and possibility distributions on finite sets (Theorems 2 and 3), let us
describe two simple examples.
First, let us consider the standard textbook example of an urn containing known
numbers of black and white balls. Assume that a ball is drawn at random and then
returned to the urn. Let the probability of drawing a black ball be p(B) and that of
drawing a white ball be p(W) = 1 - p(B). What is the possibilistic representation
of the same situation that preserves the amount of information contained in the prob-
abilistic representation? For example, for each given p(B), what is the corresponding
possibility of drawing a black ball, Pos(B), and what is the corresponding necessity
of drawing a black ball, Nec(B)?
To answer these questions, let us use Fig. 5, which overviews the transformation
established in this paper, as a guide. Solving Eq. II for different values of the proba-
bilities p(B) and p(W) = 1 − p(B), we obtain the values of α and the associated
possibilities r(B) and r(W) given in Table I. Observe that Pos(B) = r(B), Pos(W)

Figure 5 Overview of the log-interval scale transformation between probabilities and possibilities.
From probabilities to possibilities (p_i → r_i): Eq. I: r_i = (p_i/p_1)^α; Eq. II: H(p) = N(r) + D(r).
From possibilities to probabilities (r_i → p_i): Eq. I is inverted as p_i ∝ (r_i)^{1/α}, with α again determined from Eq. II.

= r(W), Nec(B) = 1 − Pos(W), and Nec(W) = 1 − Pos(B). Clearly, p(B) can be
treated as either p_1 or p_2 in Eq. I, depending on whether p(B) > 0.5 or p(B) < 0.5,
respectively, and it does not matter how it is treated when p(B) = 0.5.
The values of Pos(B) and Nec(B) for any given value of p(B) are shown in Figure
6a. Observe that Nec(B) = 0 when p(B) ≤ 0.5 and Pos(B) = 1 when p(B) ≥ 0.5.
Observe also that, as shown in the figure, three types of nested structures are in-
volved: one for p(B) < 0.5, one for p(B) = 0.5, and one for p(B) > 0.5.
The two functions, Pos and Nec, whose range is [0, 1], can also be converted to
a single combined function, C, whose range is [−1, 1]. For each A ∈ P(X), function
C is defined by the equation

C(A) = Pos(A) + Nec(A) − 1.

Conversely, for each A ∈ P(X),

Nec(A) = 0 and Pos(A) = C(A) + 1, when C(A) ≤ 0,
and
Nec(A) = C(A) and Pos(A) = 1, when C(A) ≥ 0.
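The conversion between the pair (Pos, Nec) and the combined function C is easily mechanized. A small sketch (ours), assuming the definition C(A) = Pos(A) + Nec(A) − 1, which is consistent with the two conversion cases above:

```python
def combined(pos, nec):
    # C(A) = Pos(A) + Nec(A) - 1; range [-1, 1]
    return pos + nec - 1.0

def pos_nec(c):
    # inverse conversion, following the two cases in the text
    if c <= 0:
        return c + 1.0, 0.0    # Pos(A) = C(A) + 1, Nec(A) = 0
    return 1.0, c              # Pos(A) = 1, Nec(A) = C(A)

# round trip for a case with Nec(A) = 0 (e.g. A = {B} in the urn example, p(B) < 0.5)
assert pos_nec(combined(0.25, 0.0)) == (0.25, 0.0)
```

Note that in the nested (consonant) case at most one of Pos(A) < 1 and Nec(A) > 0 can hold, which is why the two-case inverse is well defined.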
Observe that function C has a meaning that resembles the concept of certainty factors

Table I Transformations from probabilities to possibilities in the urn example: values
of the transformation constant α as p(B) increases from 0 to 1 in steps of 0.05
('any value' indicates that α is arbitrary; cf. Lemma 6.1).

p(B)   α
0.00   any value
0.05   0.5888859
0.10   0.5425401
0.15   0.5086888
0.20   0.4805572
0.25   0.4557550
0.30   0.4331166
0.35   0.4119632
0.40   0.3918519
0.45   0.3724626
0.50   any value
0.55   0.3724626
0.60   0.3918519
0.65   0.4119632
0.70   0.4331166
0.75   0.4557550
0.80   0.4805572
0.85   0.5086888
0.90   0.5425401
0.95   0.5888859
1.00   any value

introduced in the context of the medical expert system MYCIN.²¹ Positive values of
C(A) indicate the degree of confirmation of A by the evidence available, while its
negative values express the degree of disconfirmation of A by the evidence.
Function C describing the urn example is shown in Figure 6b. In this example,
clearly,

and, similarly,

Figure 6 Illustration of the urn example: (a) Pos(B) and Nec(B) versus p(B); (b) the combined function C.

As a second example, let us consider the problem of estimating the age of a per-
son, say Joe Doe, from two independent sources of incomplete information. From
the first source, we know that "Joe is around 21, but certainly not less than 19 or
more than 23." From the second source, we know that he is a student at a particular
college and we happen to have information on the age distribution of current students

at the college. How can we combine these two sources of information to obtain the
best estimate of the age of Joe Doe?
Assume that the age of students at the college varies from 17 to 26. That is, our
universal set, X, is the set of integers {17, 18, ..., 26}. Information from the first
source is almost perfectly expressed by a fuzzy set F on X whose membership func-
tion μ_F is specified in Figure 7a. As mentioned at the end of Sec. 2, values of this
function can directly be interpreted as possibilities. That is, the information can readily
be expressed by the possibility distribution

A graph of this distribution is identical with the graph of μ_F in Fig. 7a. The cor-
responding nested body of evidence is shown in Fig. 7b.
Information from the second source is of a statistical nature and, hence, it can be
expressed directly in terms of a probability distribution on X. Assume that the prob-
ability distribution derived from the statistical data is

The two pieces of information are now represented in different theories, pos-
sibility theory and probability theory, and, hence, they are not compatible. To com-
bine them, we need either to convert the probability distribution p_S to its meaningful
possibilistic counterpart, r_S, or to convert the possibility distribution r_F to its mean-
ingful probabilistic counterpart, p_F. Since either conversion should preserve infor-
mation contained in the given distribution, we can utilize the main results of this
paper and employ the log-interval information-preserving transformation.
Applying the transformation to the probability distribution p_S, we obtain for α =
0.3302797 the possibility distribution

A graph of this distribution is shown in Fig. 7c, and the corresponding nested
body of evidence is given in Fig. 7d.

Figure 7 Possibility distributions and the corresponding nested bodies of evidence derived from two
incompatible sources of incomplete information (the example of estimated age).

Having now information from both sources represented by possibility distributions,
r_F and r_S, we can apply rules of combination of possibility theory.³ This issue itself
is nontrivial, but it is not our concern in this paper. Combining, for example, the
distributions by the minimum operator and normalizing the resulting distribution, as
suggested by Dubois and Prade,³ we obtain the possibility distribution

This distribution and the associated nested body of evidence are shown in Fig.
8a,b, respectively. From r_FS, we can easily calculate values of the basic assignment,
m_FS, necessity measure, Nec_FS, and possibility measure, Pos_FS, for the four focal ele-
ments. These are given in Table II. For other subsets of X, clearly, the necessity
measure is always zero.
JAMES F. GEER AND GEORGE 1. KLlR

Figure 8 Combined possibility distribution of distributions r_F and r_S given in Fig. 7.

An alternative approach to deal with this problem is to convert the possibility
distribution r_F to its probabilistic counterpart, p_F. Using again the log-interval in-
formation-preserving transformation, we obtain

It is not obvious how to combine p_F and p_S, but, again, this is not our concern in
this paper.

8. DISCUSSION

The major results of this paper, which are expressed by Theorem 2 (supplemented
with Lemma 6.1) and Theorem 3 (supplemented with Lemma 6.6), are: the log-
interval scale allows us to transform any given probability distribution on a finite set
to a unique possibilistic counterpart that contains the same amount of information
and, similarly, the scale facilitates a unique information-preserving inverse trans-
formation for any given possibility distribution; furthermore, the range of the trans-
formation constant α (Fig. 5) is the open interval (0, 1), which implies that the
possibility-probability consistency condition is satisfied by the transformation. It is
interesting to observe that the log-interval transformation for α = 1, which is used

Table II Necessity and possibility values for the focal elements defined by the possibility distribution
r_FS given in Fig. 8.

Focal elements    m_FS     Nec_FS    Pos_FS
{21}              0.042    0.042     1

in the literature most frequently as an ad hoc conversion formula, is actually not
information preserving.
We obtained also an important negative result regarding the interval scale infor-
mation-preserving transformation. We showed that the transformation exists and is
unique for every given probability distribution, but the inverse transformation does
not always exist. That is, the applicability of the interval scale for the information-
preserving transformations between probabilities and possibilities is deficient and,
consequently, the scale must be abandoned for this purpose. It is also known that
the ratio and difference scales are not applicable in this case.¹¹ This leaves the log-
interval scale as the only scale that facilitates a unique transformation between prob-
abilities and possibilities under which the amount of information is preserved. If
some additional conditions are imposed on the transformation, the log-interval scale
is, of course, not applicable either, and we have to resort to the ordinal scale.¹¹ In
general, however, ordinal-scale transformations are not unique.

Since the value of possibilistic discord is severely restricted and often negligible
when compared with nonspecificity,⁵ the term D(r) in Eq. II in Fig. 5 plays a rel-
atively minor role and may often be neglected, especially for large bodies of evi-
dence. This saves computing time. Let us mention in this context that the existence
and uniqueness of the transformation is not affected when it is simplified by ex-
cluding the term D(r) from Eq. II. We fully analyzed the simplified transformation
and found that the existence and uniqueness hold in both directions.
After the mathematical analysis presented in this paper, which establishes the log-
interval scale as the only meaningful scale for the desired transformation, the next
stage of research in this area should focus on investigations of the pragmatic
value of this mathematically sound transformation. Covering as many application
areas as possible, the purpose of this investigation in each application area is to
compare, by relevant performance criteria, the use of the log-interval scale infor-
mation-preserving transformation with other transformations.

REFERENCES

1. G. Choquet, "Theory of capacities." Annales de L'Institut Fourier, 5, 1953-54, pp. 131-295.
2. M. Delgado and S. Moral, "On the concept of possibility-probability consistency." Fuzzy Sets and Systems, 21, No. 3, 1987, pp. 311-318.
3. D. Dubois and H. Prade, Possibility Theory. Plenum Press, New York, 1988.
4. D. Dubois and H. Prade, "Rough fuzzy sets and fuzzy rough sets." Intern. J. of General Systems, 17, Nos. 2-3, 1990, pp. 191-209.
5. J. F. Geer and G. J. Klir, "Discord in possibility theory." Intern. J. of General Systems, 19, No. 2, 1991, pp. 119-132.
6. I. R. Goodman and H. T. Nguyen, Uncertainty Models for Knowledge-Based Systems. North-Holland, New York, 1985.
7. S. J. Henkind and M. C. Harrison, "An analysis of four uncertainty calculi." IEEE Trans. on Systems, Man, and Cybernetics, 18, No. 5, 1988, pp. 700-714.
8. E. J. Horvitz, D. E. Heckerman, and C. P. Langlotz, "A framework for comparing alternative formalisms for plausible reasoning." Proc. Fifth National Conf. on Artificial Intelligence, Philadelphia, 1986, pp. 210-214.
9. G. J. Klir, "Is there more to uncertainty than some probability theorists might have us believe?" Intern. J. of General Systems, 15, No. 4, 1989, pp. 347-378.
10. G. J. Klir, "Probability-possibility conversion." Proc. 3rd IFSA Congress, Seattle, 1989, pp. 408-411.
11. G. J. Klir, "A principle of uncertainty and information invariance." Intern. J. of General Systems, 17, Nos. 2-3, 1990, pp. 249-275.
12. G. J. Klir and T. A. Folger, Fuzzy Sets, Uncertainty, and Information. Prentice Hall, Englewood Cliffs, N.J., 1988.
13. G. J. Klir and A. Ramer, "Uncertainty in the Dempster-Shafer theory: a critical re-examination." Intern. J. of General Systems, 18, No. 2, 1990, pp. 155-166.
14. N. S. Lee, Y. L. Grize, and K. Dehnad, "Quantitative models for reasoning under uncertainty in knowledge-based expert systems." Intern. J. of Intelligent Systems, 2, 1987, pp. 15-38.
15. G. Matheron, Random Sets and Integral Geometry. John Wiley, New York, 1975.
16. H. T. Nguyen, "On random sets and belief functions." J. of Math. Analysis and Appl., 65, 1978, pp. 531-542.
17. Z. Pawlak, "Rough sets." Intern. J. of Computers and Information Sciences, 11, 1982, pp. 341-356.
18. A. Ramer, "Inequalities in evidence theory based on concordance and conflict." Proc. 4th IFSA Congress, Brussels, 1991.
19. A. Ramer and G. J. Klir, "Measures of conflict and discord." Information Sciences, 1992 (to appear).
20. G. Shafer, A Mathematical Theory of Evidence. Princeton Univ. Press, Princeton, N.J., 1976.
21. E. H. Shortliffe, Computer-Based Medical Consultations: MYCIN. Elsevier, New York, 1976.
22. J. R. Sims and Z. Wang, "Fuzzy measures and fuzzy integrals: an overview." Intern. J. of General Systems, 17, Nos. 2-3, 1990, pp. 157-189.
23. H. E. Stephanou and A. P. Sage, "Perspectives on imperfect information processing." IEEE Trans. on Systems, Man, and Cybernetics, SMC-17, No. 5, 1987, pp. 780-798.
24. M. Sugeno, "Fuzzy measures and fuzzy integrals: a survey." In: Fuzzy Automata and Decision Processes, edited by M. M. Gupta, G. N. Saridis, and B. R. Gaines. North-Holland, Amsterdam and New York, 1977, pp. 89-102.
25. P. Walley, Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London and New York, 1991.
26. P. Walley and T. L. Fine, "Varieties of modal (classificatory) and comparative probability." Synthese, 41, 1979, pp. 321-374.
27. Z. Wang and G. J. Klir, Fuzzy Measure Theory. Plenum Press, New York, 1992.
28. R. R. Yager et al., Fuzzy Sets and Applications: Selected Papers by L. A. Zadeh. Wiley-Interscience, New York, 1987.
29. L. A. Zadeh, "Fuzzy sets." Information and Control, 8, 1965, pp. 338-353.
30. L. A. Zadeh, "Fuzzy sets as a basis for a theory of possibility." Fuzzy Sets and Systems, 1, No. 1, 1978, pp. 3-28.

For biographies and photographs of James F. Geer and George J. Klir, please see Vol. 19, No. 2,
1991, p. 132, and Vol. 17, Nos. 2-3, 1990, p. 275, respectively.
