
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 23, NO. 5, OCTOBER 2015

On the Maximum Entropy Negation of a Probability Distribution


Ronald R. Yager, Fellow, IEEE

Abstract—We suggest a transformation to obtain the negation of a probability distribution. We investigate the properties of this negation. Using the Dempster–Shafer theory of evidence, we show that, of all the possible negations, our proposed negation is the one having a maximal type of entropy.

Index Terms—Aggregation, decision-making, membership grade, nonstandard fuzzy set.

Manuscript received March 21, 2014; revised August 13, 2014; accepted October 23, 2014. Date of publication November 24, 2014; date of current version October 2, 2015. This work was supported by the Office of Naval Research and the Army Research Office MURI Program.
Ronald R. Yager is with the Machine Intelligence Institute, Iona College, New Rochelle, NY 10805 USA (e-mail: yager@panix.com).
Digital Object Identifier 10.1109/TFUZZ.2014.2374211
1063-6706 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

In the construction of artificial and intelligent systems a crucial issue is the problem of knowledge representation. A considerable body of literature has been developed addressing the problem of formally representing the knowledge contained in sources of information. In this note we are concerned with the representation of the knowledge contained in the negation of a probability distribution. A simple context in which this need can arise is the following. Consider a rule-based system consisting of rules of the form "If V is tall then U is b" and "If V is not tall then U is d." If we represent tall as a fuzzy set, then the process of obtaining not tall is well known. If, however, we represent the concept of tall using a probability distribution, then the determination of not tall becomes one of determining the negation of a probability distribution. The issue of determining the negation of a probability distribution was formally raised by Zadeh in his BISC blog.

The process of taking the negation of a probability distribution does not generally lead to a unique probability distribution but generates a whole set of possible probability distributions that can be seen as being consistent with the idea of the negation. This family of possible negations can be expressed as a Dempster–Shafer belief structure [1]–[3]. In this note we suggest a unique implementation for the negation, which can be shown to be the possible negation based on a maximal entropy allocation of the weights associated with each focal element.

II. NEGATION OF A PROBABILITY DISTRIBUTION

We would like to consider here the problem of finding the negation of a probability distribution. Assume our frame of reference is the set X = {x1, ..., xn}. Let P = (p1, ..., pn) be a probability distribution on X. We of course have 1) Σi pi = 1 and 2) pi ∈ [0, 1]. We shall let P̄ = (p̄1, ..., p̄n) be the negation of the probability distribution P. By the negation we mean to represent the knowledge we use if we have the statement "not P." We shall define this negation by the following:

    p̄i = (1 − pi)/(n − 1).

We first note that P̄ is a probability distribution since
1) p̄i ∈ [0, 1];
2) Σ_{i=1}^n p̄i = (1/(n − 1)) Σ_{i=1}^n (1 − pi) = 1.

Example: Assume X = {x1, x2, x3, x4, x5}.
i) Let P be such that p1 = 1 and pi = 0 for i ≠ 1. In this case not P, that is P̄, is obtained as p̄1 = 0 and p̄i = 1/4 for i ≠ 1.
ii) Let P be such that p1 = 1/2, p2 = 1/2 and pi = 0 for i = 3, 4, 5. In this case we get p̄1 = p̄2 = 1/8 and p̄3 = p̄4 = p̄5 = 1/4.

In order to see the motivation behind the proposed definition for the negation, we consider the following scheme for generating the negation of a probability distribution P on X. We shall construct P̄ from P in the following manner. For each element xi in X, in constructing the negation of P we allocate its probability, pi, equally among the n − 1 other elements. Thus

    p̄1 = p2/(n − 1) + p3/(n − 1) + p4/(n − 1) + ··· + pn/(n − 1)

    p̄1 = (1/(n − 1)) Σ_{i=2}^n pi = (1 − p1)/(n − 1).

Similarly, for any xj we have p̄j = (1/(n − 1)) Σ_{i≠j} pi = (1 − pj)/(n − 1).

We note that for the special case when n = 2 we get

    p̄i = 1 − pi;

furthermore, since p1 + p2 = 1, we get p̄1 = p2 and p̄2 = p1.

The procedure suggested for obtaining the negation of a probability distribution can easily be shown to have the property of order reversal. This follows since if pi ≥ pj then it is the case that p̄j ≥ p̄i.

We note that a different perspective on the proposed form for p̄i can be had if we observe that

    p̄i = (1 − pi)/Σ_{j=1}^n (1 − pj) = (1 − pi)/(n − Σ_{j=1}^n pj) = (1 − pi)/(n − 1).
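The transform above is a one-liner in code. The following minimal Python sketch (the function name `negation` is ours, not the paper's) reproduces Examples i and ii, as well as the n = 2 special case.

```python
def negation(p):
    """Negation of a probability distribution: the complement 1 - p_i
    of each probability is renormalized by n - 1 so the result sums to 1."""
    n = len(p)
    if n < 2:
        raise ValueError("need at least two outcomes")
    return [(1 - pi) / (n - 1) for pi in p]

# Example i: all mass on x1 becomes uniform over the other four elements.
print(negation([1, 0, 0, 0, 0]))      # [0.0, 0.25, 0.25, 0.25, 0.25]

# Example ii: p1 = p2 = 1/2.
print(negation([0.5, 0.5, 0, 0, 0]))  # [0.125, 0.125, 0.25, 0.25, 0.25]

# For n = 2 the transform reduces to the logical negation 1 - p.
print(negation([0.25, 0.75]))         # [0.75, 0.25]
```

Note the order reversal property is visible in Example ii: the most probable elements of P receive the smallest probabilities under P̄.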

Thus we see that p̄i is obtained by taking the complement of pi and then normalizing these complements so that their sum is one.

This procedure for obtaining the negation of a probability distribution is not involutionary: p̿i ≠ pi, where p̿i denotes the negation of p̄i. It should be noted that the negation in the Heyting intuitionistic logic [4] is also not involutionary. The general form of the double negation obtained from our proposed transformation is

    p̿i = (1 − (1 − pi)/(n − 1))/(n − 1) = ((n − 1) − (1 − pi))/(n − 1)² = (pi + n − 2)/(n − 1)².

While in general p̿i ≠ pi, we see that in the special case when n = 2, then p̿i = pi.

A reason for this lack of involution is the fact that this negation acts in a manner that increases the entropy. In the following we shall show this characteristic. While there are many different measures of entropy, we shall measure the entropy of a probability distribution as [5], [6]

    H(P) = Σi pi(1 − pi) = 1 − Σi pi².

We have chosen this form of entropy measure rather than the classic Shannon measure [7] because of the simplicity of calculation it brings by avoiding the log. Since we have no need for the additivity of entropies of independent probability distributions, a unique property of the Shannon entropy, we lose nothing by using this measure.

We further note that for P̄ we get

    H(P̄) = Σi p̄i(1 − p̄i) = 1 − Σi p̄i².

If we take the difference between the two entropies, we can show the increase in entropy obtained by the negation process:

    H(P̄) − H(P) = Σi pi² − Σi p̄i².

Thus we get

    H(P̄) − H(P) = Σi pi² − (1/(n − 1)²) Σi (1 − 2pi + pi²)

    H(P̄) − H(P) = ((n − 2)/(n − 1)²)(n Σi pi² − 1).

Since for pi's coming from a probability distribution the minimum of Σi pi², which is attained for the uniform distribution, is n(1/n)² = 1/n, we see that (n Σi pi² − 1) ≥ 0 and thus H(P̄) − H(P) ≥ 0. Thus we can state the following theorem.

Theorem: Assume P is a probability distribution and P̄ its negation; then H(P̄) ≥ H(P).

A reason for this increase in entropy, as we shall subsequently see, is that our proposed negation is essentially selected from a class of possible negations as the one that has maximal entropy. The reason that one selects maximum entropy alternatives is that doing so picks the allowable alternative that brings with it the least unsupported information.

The following theorem shows that the uniform distribution is a fixed point of the negation transformation; that is, the negation of the uniform distribution is uniform.

Theorem: Assume P is such that pi = 1/n for all i; then p̄i = 1/n for all i.

Proof: Using our definition for the negation, p̄i = (1 − pi)/(n − 1), and the assumption that pi = 1/n for all i, we get for all i that

    p̄i = (1 − 1/n)/(n − 1) = (1/n)(n − 1)/(n − 1) = 1/n.

It can be shown that the process of repeated negation can be modeled as a difference equation. Let pi(k) equal the value of pi after the kth iteration. Since pi(k + 1) = (1 − pi(k))/(n − 1), this is the difference equation

    (n − 1) pi(k + 1) + pi(k) = 1.

The solution of this difference equation for n > 2 approaches 1/n as k increases. Thus the uniform distribution is an attractor of this equation. We note also that this corresponds to a maximal entropy allocation of the probabilities.
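Both results, the entropy increase and the attraction toward the uniform distribution, are easy to observe numerically. Here is a minimal sketch (function names and the sample distribution are ours) using the quadratic entropy H(P) = 1 − Σi pi² and the iteration pi(k + 1) = (1 − pi(k))/(n − 1).

```python
def negation(p):
    # Proposed negation: each p_i maps to (1 - p_i) / (n - 1).
    n = len(p)
    return [(1 - pi) / (n - 1) for pi in p]

def H(p):
    # Quadratic entropy used in the paper: H(P) = 1 - sum of p_i^2.
    return 1 - sum(pi * pi for pi in p)

p = [0.7, 0.2, 0.1, 0.0]

# The negation never decreases this entropy: H(P-bar) >= H(P).
assert H(negation(p)) >= H(p)

# Repeated negation is a contraction toward 1/n for n > 2 (the error
# shrinks by a factor 1/(n - 1) per step), so the iterates approach
# the uniform attractor.
for _ in range(50):
    p = negation(p)
print([round(pi, 6) for pi in p])  # -> [0.25, 0.25, 0.25, 0.25]
```

The contraction factor 1/(n − 1) explains why n = 2 is special: there the factor is 1 in magnitude, and iteration merely oscillates between P and its involution 1 − P rather than converging.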

From this we can calculate the upper and lower probabilities [1] of each element xi:

    P̄*({xi}) = Σ_{j: xi ∈ Bj} m̄(Bj) = 1 − pi

    P̄∗({xi}) = Σ_{j: Bj ⊆ {xi}} m̄(Bj) = 0 (for n > 2).

In this case, when n > 2, we have

    0 ≤ p̄i ≤ 1 − pi.

Thus we see that the probabilities of the negation, p̄i, are not uniquely determined but are constrained to lie in some interval. Thus, there are many possible negations associated with a probability distribution.

In the special case when n = 2, B1 and B2 are singletons, B1 = {x2} and B2 = {x1}. This fact leads to a situation in which P̄∗({xi}) is not zero but rather P̄∗({xi}) = 1 − pi. Since this is the same as the upper bound, we get a unique representation for the negation when n = 2:

    p̄i = 1 − pi.

This is exactly the value we suggested in the earlier section for n = 2. We note that in the special case when n = 2 the negation of a probability distribution is essentially the same operation as the logical negation.

Let us now return to the more general case where n > 2 and see if we can reasonably select a probability distribution from the multiple possible negations implicit in our result. A first step in addressing this problem is to introduce the idea of consistency among belief structures. We note that the idea introduced here is closely related to Yager's [9] concept of entailment of belief structures. We say belief structure m1 entails belief structure m2 if for every subset A the interval bounded by the upper and lower probabilities of A generated by m2 contains the interval generated by m1. As noted in [9], the process of entailment is similar to that of inference in logic; that is, if belief structure m1 entails belief structure m2, then if we know m1 to be true we can infer that m2 is true.

Let P̂ be a probability distribution and let m̂ be the Dempster–Shafer belief structure associated with P̂. We recall that in this case the focal elements of m̂ are Ej = {xj} with m̂(Ej) = p̂j. Assume we have another Dempster–Shafer belief structure m with focal elements F1, F2, ..., Fq. We shall say that the probability distribution P̂ is consistent with m if the belief structure m̂ associated with P̂ can be obtained from m by the following process [9]–[12]. For every focal element Fi of m, decompose it into singleton subsets Bij, one for each xj ∈ Fi, where Bij = {xj}. Allocate the weight aij to Bij respecting the following conditions:
1) aij ∈ [0, 1];
2) m(Fi) = Σj aij;
3) m̂({xj}) = Σi aij.

Essentially, the weight associated with any focal element of m is divided among its constituents. Thus we see that a probability distribution m̂ is consistent with the belief structure m if we can construct m̂ from m by dividing the weights of the focal elements among their constituent elements.

It has been shown [2] that for a Bayesian type belief structure m̂, for any subset A of X, P̂(A) is uniquely defined. If m is a belief structure such that for any subset A, P*(A) and P∗(A) are its upper and lower probabilities, then it can be shown that if m̂ is consistent with m it follows that

    P∗(A) ≤ P̂(A) ≤ P*(A).

Furthermore, if m̂ is not consistent there exists at least one A for which P̂(A) is outside these bounds.

From this we can say that for any belief structure m, the set M of all consistent Bayesian belief structures is the set of possible probability distributions given m. We note that Smets and Kennes [10], Smets [11], and Han et al. [12] concerned themselves with the problem of selecting a probability distribution from a belief structure. Smets and Kennes [10] and Smets [11] suggested one approach for obtaining a probability distribution from a belief structure, which they called the pignistic probability distribution. Assume m is a belief structure on a space X consisting of a finite collection of focal sets, Fi for i = 1 to q, and an associated collection of weights, m(Fi) = αi, such that αi ≥ 0 and Σ_{i=1}^q αi = 1. Under the pignistic allocation the weights in each focal set are distributed equally among the elements in the set. Thus, if xj is contained in the focal set Fi, then xj is allocated, from this focal element, a probability weight αi/Card(Fi). Here Card(Fi) is the number of elements in Fi. We see this as a kind of maximal entropy allocation of the weight associated with each focal element to its constituents. Performing this type of maximal entropy allocation with respect to all the focal sets, we get that the total probability allocated to xj is

    pj = Σ_{i=1}^q (αi/Card(Fi)) fi(xj)

where fi(xj) = 1 if xj ∈ Fi and fi(xj) = 0 if xj ∉ Fi.

Let us now return to the situation of interest to us. We start with a probability distribution P on X, which we represent as a Bayesian belief structure. We then take the negation of this belief structure, giving us the belief structure m̄. We then find the set M̄ of all Bayesian belief structures that are consistent with m̄. The set of probability distributions (Bayesian belief structures) making up the set M̄ are all the possible probability distributions that can be the negation of our original probability distribution P. M̄ can be seen as generating a possibility distribution over the set of probability distributions on X [13].

In general, as is well established in the theory of possibility [14], one cannot arbitrarily select one element from M̄ as the negation. However, if one adds some additional criteria it is possible to select one element. In our case we shall select the probability distribution in M̄ that has a maximal entropy type allocation of the weights associated with the focal sets of m̄ to their constituent elements. As we noted above, this is what we called the pignistic probability distribution.

We recall that M̄ consists of all the Bayesian structures obtained by some distribution of the weights of the focal elements of m̄ among the constituents of those focal elements. Furthermore, we recall that each focal element of m̄ is of the form Bi = {xj : j ≠ i}. Thus Bi has n − 1 elements. Furthermore, m̄(Bi) = pi. In this case we get for the pignistic distribution the special Bayesian structure m̂ where

    m̂({xi}) = Σ_{j≠i} m̄(Bj)/(n − 1) = Σ_{j≠i} pj/(n − 1) = (1 − pi)/(n − 1) = p̄i.

Thus our proposed negation of a probability distribution can indeed be seen as the one that has the maximum entropy allocation among all possible negations.

IV. CONCLUSION

In the preceding we have suggested a transformation to obtain the negation of a probability distribution. Using the Dempster–Shafer theory of evidence, we have shown that of all the possible negations our proposed negation is the unique one that is based on maximum entropy.

REFERENCES

[1] A. P. Dempster, "Upper and lower probabilities induced by a multi-valued mapping," Ann. Math. Statist., vol. 38, pp. 325–339, 1967.
[2] G. Shafer, A Mathematical Theory of Evidence. Princeton, NJ, USA: Princeton Univ. Press, 1976.
[3] R. R. Yager and L. Liu, Classic Works of the Dempster-Shafer Theory of Belief Functions, A. P. Dempster and G. Shafer, Eds. Heidelberg, Germany: Springer, 2008.
[4] A. Heyting, Intuitionism, an Introduction. Amsterdam, The Netherlands: North Holland, 1956.
[5] J. Aczel and Z. Daroczy, On Measures of Information and Their Characterizations. New York, NY, USA: Academic, 1975.
[6] R. R. Yager, "On the completion of qualitative possibility measures," IEEE Trans. Fuzzy Syst., vol. 1, no. 3, pp. 184–194, Aug. 1993.
[7] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana, IL, USA: Univ. Illinois Press, 1964.
[8] R. R. Yager, "Arithmetic and other operations on Dempster-Shafer structures," Int. J. Man-Mach. Stud., vol. 25, pp. 357–366, 1986.
[9] R. R. Yager, "The entailment principle for Dempster-Shafer granules," Int. J. Intell. Syst., vol. 1, pp. 247–262, 1986.
[10] P. Smets and R. Kennes, "The transferable belief model," Artif. Intell., vol. 66, pp. 191–234, 1994.
[11] P. Smets, "Decision making in the TBM: The necessity of the pignistic transformation," Int. J. Approx. Reason., vol. 38, pp. 133–147, 2005.
[12] D. Han, J. Dezert, C. Han, and Y. Yang, "Is entropy enough to evaluate the probability transformation approach of belief function?" presented at the 13th Conf. Inf. Fusion, Edinburgh, U.K., 2010.
[13] L. A. Zadeh, "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets Syst., vol. 1, pp. 3–28, 1978.
[14] W. Pedrycz and F. Gomide, Fuzzy Systems Engineering: Toward Human-Centric Computing. New York, NY, USA: Wiley, 2007.
