
www.elsevier.com/locate/fss

Andrea Sgarro ∗

Department of Mathematical Sciences (DSM), University of Trieste, 34100 Trieste, Italy

Received 20 April 2001; accepted 19 November 2001

Abstract

We define information measures which pertain to possibility theory and which have a coding-theoretic meaning. We put forward a model for information sources and transmission channels which is possibilistic rather than probabilistic. In the case of source coding without distortion we define a notion of possibilistic entropy, which is connected to the so-called Hartley measure; we also tackle the case of source coding with distortion. In the case of channel coding we define a notion of possibilistic capacity, which is connected to a combinatorial notion called graph capacity. In the probabilistic case Hartley's measure and graph capacity are relevant quantities only when the allowed decoding error probability is strictly equal to zero, while in the possibilistic case they are relevant quantities for whatever value of the allowed decoding error possibility; as the allowed error possibility becomes larger the possibilistic entropy decreases (one can reliably compress data to smaller sizes), while the possibilistic capacity increases (one can reliably transmit data at a higher rate). We put forward an interpretation of possibilistic coding, which is based on distortion measures. We discuss an application, where possibilities are used to cope with uncertainty as induced by a "vague" linguistic description of the transmission channel.

© 2001 Elsevier Science B.V. All rights reserved.

Keywords: Measures of information; Possibility theory; Possibilistic sources; Possibilistic entropy; Possibilistic channels; Possibilistic capacity; Zero-error information theory; Graph capacity; Distortion measures

1. Introduction

When one speaks of possibilistic information theory, usually one thinks of possibilistic information measures, like U-uncertainty, say, and of their use in uncertainty management; the approach which one takes is axiomatic, in the spirit of the validation of Shannon's entropy which is obtained by using Khinchin's axioms; cf. e.g. [8,12–14]. In this paper we take a different approach: we define information measures which have a coding-theoretic meaning. This kind of operational approach to information measures was first taken by Shannon when he laid down the foundations of information theory in his seminal paper of 1948 [18], and has proved to be quite successful; it has led to such important probabilistic functionals as source entropy or channel capacity. Below we shall adopt a model for information sources and transmission channels which is possibilistic rather than probabilistic (it is based on logic rather than statistics); this will lead us to define a notion of possibilistic entropy and a notion of possibilistic capacity in much the same way as one arrives at the corresponding probabilistic notions. An interpretation of possibilistic coding is discussed, which is based on distortion measures, a notion which is currently used in probabilistic coding.

Partially supported by MURST and GNIM-CNR. Part of this paper, based mainly on Section 5, has been submitted for presentation at Ecsqaru-2001, to be held in September 2001 in Toulouse, France.
∗ Corresponding author. Tel.: +40-6762623; fax: +40-6762636.
E-mail address: sgarro@univ.trieste.it (A. Sgarro).
0165-0114/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0165-0114(01)00245-7

12 A. Sgarro / Fuzzy Sets and Systems 132 (2002) 11 – 32

We are confident that our operational approach may be a contribution to enlighten, if not to disentangle, the vexed question of defining adequate information measures in possibility theory.

We recall that both the entropy of a probabilistic source and the capacity of a probabilistic channel are asymptotic parameters; more precisely, they are limit values for the rates of optimal codes, compression codes in the case of sources, and error-correction codes in the case of channels; the codes one considers are constrained to satisfy a reliability criterion of the type: the decoding-error probability of the code should be at most equal to a tolerated value ε, 0 ≤ ε < 1. A streamlined description of source codes and channel codes will be given below in Sections 4 and 5; even from our fleeting hints it is however apparent that, at least a priori, both the entropy of a source and the capacity of a channel depend on the value ε which has been chosen to specify the reliability criterion. If in the probabilistic models the mention of ε is usually omitted, the reason is that the asymptotic values for the optimal rates are the same whatever the value of ε, provided however that ε is strictly positive.¹ Zero-error reliability criteria lead instead to quite different quantities, zero-error entropy and zero-error capacity. Now, the problem of compressing information sources at zero error is so trivial that the term zero-error entropy is seldom used, if ever.² Instead, the zero-error problem of data protection in noisy channels is devilishly difficult, and has led to a new and fascinating branch of coding theory, and more generally of information theory and combinatorics, called zero-error information theory, which has been pretty recently overviewed and extensively referenced in [15]. In particular, the zero-error capacity of a probabilistic channel is expressed in terms of a remarkable combinatorial notion called Shannon's graph capacity (graph-theoretic preliminaries are described in Appendix A).

So, to be fastidious, even in the case of probabilistic entropy and probabilistic capacity one deals with two step-functions of ε, which can assume only two distinct values, one for ε = 0 and the other for ε > 0. We shall adopt a model of the source and a model of the channel which are possibilistic rather than probabilistic, and shall choose a reliability criterion of the type: the decoding-error possibility should be at most equal to ε, 0 ≤ ε < 1. As shown below, the possibilistic analogues of entropy and capacity exhibit quite a perspicuous step-wise behaviour as functions of ε, and so the mention of ε cannot be disposed of. As for the "form" of the functionals one obtains, it is of the same type as in the case of the zero-error probabilistic measures, even if the tolerated error possibility is strictly positive. In particular, the capacities of possibilistic channels are always expressed in terms of graph capacities; in the possibilistic case, however, as one loosens the reliability criterion by allowing a larger error possibility, the relevant graph changes and the capacity of the possibilistic channel increases.

We describe the contents of the paper. In Section 2, after some preliminaries on possibility theory, possibilistic sources and possibilistic channels are introduced. Section 3 contains two simple lemmas, Lemmas 3.1 and 3.2, which are handy tools apt to "translate" probabilistic zero-error results into the framework of possibility theory. Section 4 is devoted to possibilistic entropy and source coding; we have decided to deal in Section 4 only with the problem of source coding without distortion, and to relegate the more taxing case of source coding with distortion to an appendix (Appendix B); this way we are able to make many of our points in an extremely simple way. In Section 5, after giving a streamlined description of channel coding, possibilistic capacity is defined and a coding theorem is provided. Section 6 explores

¹ The entropy and the capacity relative to a positive error probability ε allow one to construct sequences of codes whose probability of a decoding error is actually infinitesimal; it will be argued below that this point of view does not make much sense for possibilistic coding; cf. Remark 4.3.

² No error-free data compression is feasible for probabilistic codes whose codewords all have the same length; this is why one has to resort to variable-length codes, e.g., to Huffman codes. As for variable-length coding, the possibilistic theory appears to lack a counterpart for the notion of average length; one would have to choose one of the various aggregation operators which have been proposed in the literature (for the very broad notion of aggregation operators, and of "averaging" aggregations in particular, cf., e.g., [12] or [16]). Even if one insists on using block-codes, the problem of data compression at zero error is far from trivial when a distortion measure is introduced; cf. Appendix B. In this paper we deal only with the basics of Shannon's theory, but extensions are feasible to more involved notions, compound channels, say, or multi-user communication (as for these information-theoretic notions cf., e.g., [3] or [4]).


the consequences of changing the reliability criterion used in Section 5; one requires that the average error possibility should be small, rather than the maximal error possibility.³ Up to Section 6, our point of view is rather abstract: the goal is simply to understand what happens when one replaces probabilities by possibilities in the standard models for data transmission. A discussion of the practical meaning of our proposal is instead deferred to Section 7: we put forward an interpretation of the possibilistic model which is based on distortion measures. We discuss an application to the design of error-correcting telephone keyboards; in the spirit of "soft" mathematics possibilities are seen as numeric counterparts for linguistic labels, and are used to cope with uncertainty as induced by "vague" linguistic information.

Section 7 points also to future work, which does not simply aim at a possibilistic translation and generalization of the probabilistic approach. Open problems are mentioned, which might prove to be stimulating also from a strictly mathematical viewpoint. In this paper we take the asymptotic point of view which is typical of Shannon theory, but one might prefer to take the constructive point of view of algebraic coding, and try to provide finite-length code constructions, such as those hinted at in Section 7. We deem that the need for a solid theoretical foundation of "soft" coding, as possibilistic coding basically is, is proved by the fact that several ad hoc coding algorithms are already successfully used in practice, e.g., those for compressing images, which are not based on probabilistic descriptions of the source or of the channel (an exhaustive list of source coding algorithms is to be found in [21]). Probabilistic descriptions, which are derived from statistical estimates, are often too costly to obtain, or even unfeasible, and at the same time they are uselessly detailed.

The paper aims at a minimum level of self-containment, and so we have shortly re-described certain notions of information theory which are quite standard; for more details we refer the reader, e.g., to [3] or [4]. As for possibility theory, and in particular for a clarification of the elusive notion of non-interactivity, which is often seen as the natural possibilistic analogue of probabilistic independence (cf. Section 2), we mention [5,6,9,11,12,16,23].

2. Possibilistic sources and possibilistic channels

We recall that a possibility distribution Π over a finite set A = {a_1, …, a_k}, called the alphabet, is defined by giving a possibility vector π = (π_1, π_2, …, π_k) whose components π_i are the possibilities Π(a_i) of the k singletons a_i (1 ≤ i ≤ k, k ≥ 2):

Π(a_i) = π_i,  0 ≤ π_i ≤ 1,  max_{1≤i≤k} π_i = 1.

The possibility⁴ of each subset A ⊆ A is the maximum of the possibilities of its elements:

Π(A) = max_{a_i ∈ A} π_i.   (2.1)

In particular Π(∅) = 0, Π(A) = 1. In logical terms taking a maximum means that event A is σ-possible when at least one of its elements is so, in the sense of a logical disjunction.

Instead, probability distributions are defined through a probability vector P = (p_1, p_2, …, p_k), P(a_i) = p_i, 0 ≤ p_i ≤ 1, Σ_{1≤i≤k} p_i = 1, and have an additive nature:

P(A) = Σ_{a_i ∈ A} p_i.

With respect to probabilities, an empirical interpretation of possibilities is less clear. The debate on the meaning and the use of possibilities is an ample and long-standing one; the reader is referred to standard texts on possibility theory, e.g., those quoted at the

³ […] probabilistic frame, as argued in Section 3: it is enough to take possibilities which are equal to zero when the probability is zero, and equal to one when the probability is positive, whatever its value. However, the consideration of possibility values which are intermediate between zero and one does enlarge the frame; cf. Theorem 6.1 in Section 6, and the short comment made there just before giving its proof.

⁴ The fact that the same symbols π and Π are used both for vectors and for distributions will cause no confusion; below the symbol Π will also be used to denote a stationary and non-interactive source, since the behaviour of the latter is entirely specified by the vector π. Similar conventions will be tacitly adopted also in the case of probabilistic sources, and of probabilistic and possibilistic channels.


…cability of our model to real-world data transmission is discussed.

The probability distribution P over A can be extended in a stationary and memoryless way to a probability distribution P^n over the Cartesian power A^n by setting for each sequence x = x_1 x_2 … x_n ∈ A^n:

P^n(x) = ∏_{1≤i≤n} P(x_i).

We recall that the elements of A^n are the k^n sequences of length n built over the alphabet A. Each such sequence can be interpreted as the information which is output in n time instants by a stationary and memoryless source, or SML source. The memoryless nature of the source is expressed by the fact that the n probabilities P(x_i) are multiplied. Similarly, we shall extend the possibility distribution Π in a stationary and non-interactive way to a possibility distribution Π^[n] over the Cartesian power A^n:

Definition 2.1. A stationary and non-interactive information source over the alphabet A is defined by setting for each sequence x ∈ A^n:

Π^[n](x) = min_{1≤i≤n} π(x_i).

In logical terms, this means that the occurrence of sequence x = x_1 x_2 … x_n is declared σ-possible when this is so for all of the letters x_i, in the sense of a logical conjunction. An interpretation of non-interactivity in our models of sources and channels is discussed in Section 7.

Let A = {a_1, …, a_k} and B = {b_1, …, b_h} be two alphabets, called in this context the input alphabet and the output alphabet, respectively. Probabilistic channels are usually described by giving a stochastic matrix W whose rows are headed to the input alphabet A and whose columns are headed to the output alphabet B. We recall that the k rows of such a stochastic matrix are probability vectors over the output alphabet B; each entry W(b|a) is interpreted as the transition probability from the input letter a ∈ A to the output letter b ∈ B. A stationary and memoryless channel W^n, or SML channel, extends W to n-tuples, and is defined by setting for each x ∈ A^n and y ∈ B^n:

W^n(y|x) = W^n(y_1 y_2 … y_n | x_1 x_2 … x_n) = ∏_{i=1}^{n} W(y_i|x_i).   (2.2)

Note that W^n is itself a stochastic matrix whose rows are headed to the sequences in A^n, and whose columns are headed to the sequences in B^n. The memoryless nature of the channel is expressed by the fact that the n transition probabilities W(y_i|x_i) are multiplied.

We now define the possibilistic analogue of stochastic (probabilistic) matrices. The k rows of a possibilistic matrix Π with h columns are possibility vectors over the output alphabet B. Each entry π(b|a) will be interpreted as the transition possibility⁵ from the input letter a ∈ A to the output letter b ∈ B; cf. the example given below. In Definition 2.2 Π is such a possibilistic matrix.

Definition 2.2. A stationary and non-interactive channel, or SNI channel, Π^[n], extends Π to n-tuples and is defined as follows:

Π^[n](y|x) = Π^[n](y_1 y_2 … y_n | x_1 x_2 … x_n) = min_{1≤i≤n} π(y_i|x_i).   (2.3)

Products as in (2.2) are replaced in (2.3) by a minimum operation; this expresses the non-interactive nature of the extension. Note that Π^[n] is itself a possibilistic matrix whose rows are headed to the sequences in A^n, and whose columns are headed to the sequences in B^n. Taking the minimum of the n transition possibilities π(y_i|x_i) can be interpreted as a

⁵ Of course transition probabilities and transition possibilities are conditional probabilities and conditional possibilities, respectively, as made clear by our notation which uses a conditioning bar. We have avoided mentioning explicitly the notion of conditional possibilities because they are the object of a debate which is far from being closed (cf. e.g., Part II of [5]); actually, the worst problems are met when one starts by assigning a joint distribution and wants to compute the marginal and conditional ones. In our case it is instead conditional possibilities that are the starting point: as argued in [2], "prior" conditional possibilities are not problematic, or rather they are no more problematic than possibilities in themselves.


logical conjunction: only when all the transitions are σ-possible, it is σ-possible to obtain output y from input x; cf. also Section 7. If B is a subset of B^n, one has in accordance with (2.1):

Π^[n](B|x) = max_{y ∈ B} Π^[n](y|x).

Example 2.1. For A = B = {a, b} we show a possibilistic matrix Π and its "square" Π^[2], which specifies the transition possibilities from input couples to output couples. The possibility that a is received when b is sent is σ; this is also the possibility that aa is received when ab is sent, say; 0 ≤ σ ≤ 1. Take B = {aa, bb}; then Π^[2](B|ab) = max[σ, 0] = σ. In Section 6 this example will be used assuming ε = 0, σ = 1.

Π | a  b        Π^[2] | aa  ab  ba  bb
a | 1  0        aa    | 1   0   0   0
b | σ  1        ab    | σ   1   0   0
                ba    | σ   0   1   0
                bb    | σ   σ   σ   1

3. A few lemmas

…does not matter; what matters is only whether that probability is zero or positive, i.e., whether the corresponding event E is "impossible" or "possible". The canonical transformation maps probabilities to binary (zero-one) possibilities by setting Poss{E} = 0 if and only if Prob{E} = 0, else Poss{E} = 1; this transformation can be applied to the components of a probability vector P or to the components of a stochastic matrix W to obtain a possibility vector π or a possibilistic matrix Π, respectively. Below we shall introduce an equivalence relation called ε-equivalence which in a way extends the notion of a canonical transformation; here and in the sequel ε is a real number such that 0 ≤ ε < 1. It will appear that a vector π or a matrix Π obtained canonically from P or from W are ε-equivalent to P or to W, respectively, for whatever value of ε.

Definition 3.1. A probability vector P and a possibility vector π over alphabet A are said to be ε-equivalent when the following double implication holds ∀a ∈ A:

P(a) = 0 ⇔ π(a) ≤ ε.

The following lemma shows that ε-equivalence, rather than a relation between letters, is a relation pertaining to the extended distributions P^n and Π^[n], seen as set-functions over A^n:

Lemma 3.1. Fix n ≥ 1. The probability vector P and the possibility vector π are ε-equivalent if and only if the following double implication holds ∀A ⊆ A^n:

P^n(A) = 0 ⇔ Π^[n](A) ≤ ε.

Proof. To prove that the double implication implies ε-equivalence, just take A = {aa…a} for each letter a ∈ A. Now we prove that if P and π are ε-equivalent then the double implication in Lemma 3.1 holds true. First assume A is a singleton, and contains only sequence x. The following chain of double implications holds:

P^n(x) = 0 ⇔ ∃i: P(x_i) = 0 ⇔ ∃i: π(x_i) ≤ ε ⇔ Π^[n](x) ≤ ε.

This means that, if the two vectors P and π are ε-equivalent, so are also P^n and Π^[n], seen as vectors with k^n components. Then the following chain holds too, whatever the size of A:

P^n(A) = 0 ⇔ ∀x ∈ A: P^n(x) = 0 ⇔ ∀x ∈ A: Π^[n](x) ≤ ε ⇔ max_{x ∈ A} Π^[n](x) ≤ ε ⇔ Π^[n](A) ≤ ε.

However simple, Lemma 3.1 and its straightforward generalization to channels, Lemma 3.2 below, are the basic tools used to convert probabilistic zero-error coding theorems into possibilistic ones.
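The canonical transformation, ε-equivalence and the "if" direction of Lemma 3.1 can be checked by brute force on a toy example; the vectors P and π below, and the choice ε = 0.3, are illustrative assumptions:

```python
from itertools import combinations, product

# Sketch of the canonical transformation and of epsilon-equivalence
# (Definition 3.1); P, pi and eps = 0.3 are illustrative assumptions.
P  = {"a": 0.5, "b": 0.5, "c": 0.0}
pi = {"a": 1.0, "b": 0.7, "c": 0.2}

def canonical(P):
    """Binary possibilities: Poss = 0 iff Prob = 0, else Poss = 1."""
    return {a: 0.0 if p == 0 else 1.0 for a, p in P.items()}

def equivalent(P, pi, eps):
    """Definition 3.1: P(a) = 0  <=>  pi(a) <= eps, for every letter a."""
    return all((P[a] == 0) == (pi[a] <= eps) for a in P)

assert equivalent(P, pi, 0.3) and not equivalent(P, pi, 0.1)
# A canonically obtained vector is eps-equivalent for every eps in [0, 1):
assert all(equivalent(P, canonical(P), e) for e in (0.0, 0.5, 0.99))

# Brute-force check of the double implication of Lemma 3.1 for n = 2:
eps, seqs = 0.3, list(product(P, repeat=2))
P2  = {x: P[x[0]] * P[x[1]] for x in seqs}        # memoryless extension
Pi2 = {x: min(pi[x[0]], pi[x[1]]) for x in seqs}  # non-interactive extension
for r in range(len(seqs) + 1):
    for A in combinations(seqs, r):
        assert (sum(P2[x] for x in A) == 0) == \
               (max((Pi2[x] for x in A), default=0.0) <= eps)
print("Lemma 3.1 verified on the toy example")
```

The exhaustive loop over all subsets of A² is feasible only because the toy alphabet is tiny; it is meant to make the maxitivity argument in the proof concrete, not to be efficient.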


Definition 3.2. A stochastic matrix W and a possibilistic matrix Π are said to be ε-equivalent when the following double implication holds ∀a ∈ A, ∀b ∈ B:

W(b|a) = 0 ⇔ π(b|a) ≤ ε.

Lemma 3.2. Fix n ≥ 1. The stochastic matrix W and the possibilistic matrix Π are ε-equivalent if and only if the following double implication holds ∀x ∈ A^n, ∀B ⊆ B^n:

W^n(B|x) = 0 ⇔ Π^[n](B|x) ≤ ε.

In Sections 5 and 6 on channel coding we shall need the following notion of confoundability between letters: two input letters a and a′ are confoundable for the probabilistic matrix W if and only if there exists at least one output letter b such that the transition probabilities W(b|a) and W(b|a′) are both strictly positive. Given matrix W, one can construct a confoundability graph G(W) whose vertices are the letters of A by joining two letters by an edge if and only if they are confoundable (graph-theoretic notions are described in Appendix A).

We now define a similar notion for possibilistic matrices; to this end we introduce a proximity index p between possibility vectors π = (π_1, π_2, …) and ρ = (ρ_1, ρ_2, …), which in our case will be possibility vectors over the output set B:

p(π, ρ) = max_{1≤i≤h} [π_i ∧ ρ_i].

Above the wedge symbol ∧ stands for a minimum and is used only to improve readability. The index is symmetric: p(π, ρ) = p(ρ, π). One has 0 ≤ p(π, ρ) ≤ 1, with p(π, ρ) = 0 if and only if π and ρ have disjoint supports, and p(π, ρ) = 1 if and only if there is at least one letter a for which π(a) = ρ(a) = 1; in particular, this happens when π = ρ (we recall that the support of a possibility vector is made up by those letters whose possibility is strictly positive).

The proximity index will be extended to input letters a and a′, by taking the corresponding rows in Π:

p(a, a′) = p(π(·|a), π(·|a′)) = max_{b ∈ B} [π(b|a) ∧ π(b|a′)].

Example 3.1. We re-take Example 2.1 above. One has: p(a, a) = p(b, b) = 1, p(a, b) = σ. With respect to Π^[2], the proximity of two letter couples x and x′ is either 1 or σ, according to whether x = x′ or x ≠ x′ (recall that Π^[2] can be viewed as a possibilistic matrix over the "alphabet" of letter couples). Cf. also Examples 5.1, 5.2 and the example worked out in Section 7.

Definition 3.3. Once a possibilistic matrix Π and a number ε are given (0 ≤ ε < 1), two input letters a and a′ are said to be ε-confoundable if and only if their proximity exceeds ε:

p(a, a′) > ε.

Given Π and ε, one constructs the ε-confoundability graph G_ε(Π), whose vertices are the letters of A, by joining two letters by an edge if and only if they are ε-confoundable for Π.

Lemma 3.3. If the stochastic matrix W and the possibilistic matrix Π are ε-equivalent the two confoundability graphs G(W) and G_ε(Π) coincide.

Proof. We have to prove that, under the assumption of ε-equivalence, any two input letters a and a′ are confoundable for the stochastic matrix W if and only if they are ε-confoundable for the possibilistic matrix Π. The following chain of double implications holds:

a and a′ are confoundable for W ⇔ ∃b: W(b|a) > 0, W(b|a′) > 0 ⇔ ∃b: π(b|a) > ε, π(b|a′) > ε ⇔ max_b [π(b|a) ∧ π(b|a′)] > ε ⇔ p(a, a′) > ε ⇔ a and a′ are ε-confoundable for Π.
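The max-min proximity index and the coincidence of the two confoundability graphs stated in Lemma 3.3 can be sketched on the matrix of Examples 2.1 and 3.1; fixing the transition possibility at σ = 0.5 and the particular 0-equivalent stochastic matrix W below are our own illustrative assumptions:

```python
# Proximity index (max-min over rows) and the two confoundability
# graphs of Lemma 3.3; sigma = 0.5 and the stochastic matrix W are
# illustrative assumptions.
A, B = ("a", "b"), ("a", "b")
sigma = 0.5
Pi = {"a": {"a": 1.0, "b": 0.0},    # row pi(.|a)
      "b": {"a": sigma, "b": 1.0}}  # row pi(.|b)
W  = {"a": {"a": 1.0, "b": 0.0},    # any stochastic matrix with the same
      "b": {"a": 0.5, "b": 0.5}}    # zero pattern is 0-equivalent to Pi

def proximity(row1, row2):
    """p(pi, rho) = max_b [pi(b) ^ rho(b)], with ^ = minimum."""
    return max(min(row1[b], row2[b]) for b in B)

def confoundable_W(a1, a2):
    """Some output b has W(b|a1) > 0 and W(b|a2) > 0."""
    return any(W[a1][b] > 0 and W[a2][b] > 0 for b in B)

def confoundable_Pi(a1, a2, eps):
    """Definition 3.3: the proximity of the two rows exceeds eps."""
    return proximity(Pi[a1], Pi[a2]) > eps

assert proximity(Pi["a"], Pi["b"]) == sigma
# Lemma 3.3 for eps = 0: the two graphs have the same edges.
assert all(confoundable_W(x, y) == confoundable_Pi(x, y, 0.0)
           for x in A for y in A if x != y)
# For eps >= sigma the edge {a, b} leaves the possibilistic graph:
assert not confoundable_Pi("a", "b", sigma)
```

The last assertion previews the behaviour announced in the introduction: raising the tolerated level ε deletes edges from G_ε(Π), so the relevant graph, and hence the capacity, changes with ε.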


Remark 3.1. The index p(a, a′) establishes a fuzzy relation between input letters, which may be represented by means of a fuzzy graph G(Π) with vertex set equal to the input alphabet A: each edge (a, a′) belongs to the edge set of G(Π) with a degree of membership equal to p(a, a′). Then the crisp graphs G_ε(Π) are obtained as (strong) ε-cuts of the fuzzy graph G(Π). By the way, we observe that p(a, a′) is a proximity relation in the technical sense of fuzzy set theory [17]. For basic notions in fuzzy set theory cf., e.g., [7,12] or [16].

4. The entropy of a possibilistic source

We start by the following general observation, which applies both to source and channel coding. The elements which define a code f, i.e., the encoder f+ and the decoder f− (cf. below), do not require a probabilistic or a possibilistic description of the source, or of the channel, respectively. One must simply choose the alphabets (or at least the alphabet sizes): a primary alphabet A, which is the source alphabet in the case of sources and the input alphabet in the case of channels, and the secondary alphabet B, which is the reproduction alphabet in the case of sources and the output alphabet in the case of channels.⁶ One must also specify a length n, which is the length of the messages which are encoded in the case of sources, and the length of the codewords which are sent through the channel in the case of channels, respectively. Once these elements, A, B and n, have been chosen, one can construct a code f, i.e., a couple encoder/decoder. Then one can study the performance of f by varying the "behaviour" of the source (of the channel, respectively): for example one can first assume that this behaviour has a probabilistic nature, while later one changes to a less committal possibilistic description.

The first coding problem which we tackle is data compression without distortion. The results of this section, or at least their asymptotic counterparts, inclusive of the notion of ε-entropy, might have been obtained as a very special case of data compression with distortion, as explained in Appendix B. The reason for confining the general case with distortion to an appendix is just ease of readability: actually, data compression without distortion as covered in this section offers no real problem from a mathematical point of view, but at the same time is very typical of the novelties which our possibilistic approach to coding presents with respect to the standard probabilistic approach.

We give a streamlined description of what a source code f is; for more details we refer to [3] or [4]. A code f is made up of two elements, an encoder f+ and a decoder f−; the encoder maps the n-sequences of A^n, i.e., the messages output by the information source, to binary strings of a fixed length l called codewords; the decoder maps back codewords to messages in a way which should be "reliable", as specified below. In practice (and without loss of generality), the basic element of a code is a subset C ⊆ A^n of messages, called the codebook. The idea is that, out of the k^n messages output by the information source, only those belonging to the codebook C are given separate binary codewords and are properly recognized by the decoder; should the source output a message which does not belong to C, then the encoder will use any of the binary codewords meant for the messages in C, and so a decoding error will be committed. Thinking of a source which is modelled probabilistically, as is standard in information theory, a good code should trade off two conflicting demands: the binary codewords should be short, so as to ensure compression of data, while the error probability should be small, so as to ensure reliability. In practice, one chooses a tolerated error probability ε, 0 ≤ ε < 1, and then constructs a set C as small as possible with the constraint that its probability be at least as great as 1 − ε. The number n⁻¹ log |C| is called the code rate; log |C| is interpreted as the (not necessarily integer) length of the binary sequences which encode source sequences⁷ and so the rate is measured as a number of

⁶ Actually, in source coding without distortion the primary alphabet and the reproduction alphabet coincide, and so the latter will not be explicitly mentioned in Section 4.

⁷ In Shannon theory one often incurs the slight but convenient inaccuracy of allowing non-integer "lengths". By the way, the logarithms here and below are all to the base 2, and so the unit we choose for information measures is the bit. Bars as in |C| denote size, i.e., number of elements. Notice that, not to overcharge our notation, the mention of the length n is not made explicit in the symbols which denote coding functions f and codebooks C.


bits per source letter. Consequently, the fundamental optimization problem of probabilistic source coding boils down to finding a suitable codebook C:

Minimize the code rate (1/n) log |C| with the constraint Prob{¬C} ≤ ε   (4.1)

(the symbol ¬ denotes negation, or set complementation). As is usual, we shall consider only Bernoullian (i.e., stationary and memoryless, or SML) sources, which are completely described by the probability vector P over the alphabet letters of A; then in (4.1) the generic indication of probability can be replaced by the more specific symbol P^n.

Given the SML source P, its ε-entropy H_ε(P) is defined as the limit of the rates R_n(P, ε) of optimal codes which solve the optimization problem (4.1), obtained as the length n of the encoded messages goes to infinity:

H_ε(P) = lim_{n→+∞} R_n(P, ε).

In the probabilistic theory there is a dramatic difference between the case ε = 0 and the case ε ≠ 0. Usually one tackles the case ε ≠ 0, the only one which allows actual data compression, as it can be proved. It is well-known that the rates of optimal codes tend to the Shannon entropy ℋ(P) as n goes to infinity:

H_ε(P) = ℋ(P) = −Σ_{1≤i≤k} p_i log p_i,   0 < ε < 1

(we use the script symbol ℋ to distinguish Shannon entropy from the operational entropy H_ε). So Shannon entropy is the asymptotic value of optimal rates. Note that this asymptotic value does not depend on the tolerated error probability ε > 0; only the speed of convergence is affected; this is why the mention of ε is in most cases altogether omitted. Instead, we find it convenient to explicitly mention ε, and say that the (probabilistic) ε-entropy H_ε(P) of the SML source ruled by the probability vector P is equal to the Shannon entropy ℋ(P) for whatever ε > 0.

Let us go to the case ε = 0. In this case the structure of optimal codebooks is extremely simple: each sequence of positive probability must be given its own codeword, and so the optimal codebook is

C = {a: P(a) > 0}^n   (4.2)

whose rate log |{a: P(a) > 0}| is the same whatever the given length n. Consequently, this is also the value of the zero-error entropy H_0(P) of the SML source:

H_0(P) = log |{a: P(a) > 0}|.

For ε strictly positive, one has, as is well known, H_ε(P) = ℋ(P) ≤ H_0(P), the inequality being strict unless P is uniform over its support. Note that the ε-entropy is a step-function of ε: however, the function's step is obtained only if one keeps the value ε = 0, which is rather uninteresting because it corresponds to a situation where no data-compression is feasible, but only data transcription into binary; this happens, say, when one uses ASCII. The zero-error entropy H_0(P) is sometimes called Hartley's measure (cf. [12]); in the present context it might be rather called Hartley's entropy (ε = 0), to be set against Shannon's entropy (ε > 0).

Example 4.1. Take P = (1/2, 1/4, 1/4, 0) over an alphabet A of four letters. For ε = 0 one has H_0(P) = log 3 ≈ 1.585, while H_ε(P) = ℋ(P) = 1.5 whenever 0 < ε < 1.

We now go to a stationary and non-interactive source, or SNI source, over alphabet A, which is entirely described by the possibilistic vector π over alphabet letters. The source coding optimization problem (4.1) will be replaced by (4.3), where one bounds the decoding error possibility rather than the decoding error probability:

Minimize the code rate (1/n) log |C| with the constraint Π^[n](¬C) ≤ ε.   (4.3)

Now we shall define the possibilistic entropy; as in the probabilistic case, the definition is operational, i.e., is given in terms of a coding problem.

Definition 4.1. Given the stationary and non-interactive source Π, its possibilistic ε-entropy H_ε(Π) is defined as the limit of the rates R_n(Π, ε) of optimal codes which solve the optimization problem (4.3), obtained as the length n goes to infinity:

H_ε(Π) = lim_{n→+∞} R_n(Π, ε).

A. Sgarro / Fuzzy Sets and Systems 132 (2002) 11 – 32 19
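The constraint in (4.3) is easy to evaluate numerically. Below is a minimal Python sketch (function names and the base-2 logarithm are our choices, not the paper's), assuming the non-interactive calculus in which the possibility of a sequence is the minimum of its letter possibilities and the possibility of a set of sequences is the maximum over its elements; on the possibility vector of Example 4.2 it checks that the codebook of all length-n sequences over the letters of possibility exceeding ε satisfies Π^[n](¬C) ≤ ε.

```python
from itertools import product
from math import log2

def seq_possibility(pi, x):
    """Possibility of a sequence under a stationary non-interactive source:
    the minimum of the letter possibilities."""
    return min(pi[a] for a in x)

def error_possibility(pi, codebook, n):
    """Pi^[n](not C): the possibility of the most possible sequence
    left outside the codebook (maximum over the complement)."""
    outside = (x for x in product(range(len(pi)), repeat=n)
               if x not in codebook)
    return max((seq_possibility(pi, x) for x in outside), default=0.0)

pi = (1.0, 1.0, 0.5, 0.5, 0.25, 0.0)   # possibility vector of Example 4.2
eps, n = 0.25, 2
support = [a for a in range(len(pi)) if pi[a] > eps]
C = set(product(support, repeat=n))    # all sequences over the sub-alphabet

print(error_possibility(pi, C, n))     # 0.25, i.e. <= eps
print(log2(len(C)) / n)                # rate log 4 = 2 bits per letter
```

The rate found here, log 4 = 2, is exactly the value H_{1/4}(Π) of Example 4.2.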

Because of Lemma 3.1, the constraint in (4.3) can be re-written as P^n{¬C} = 0 for whatever P which is ε-equivalent with Π. This means that solving the minimization problem (4.3) is the same as solving the minimization problem (4.1) at zero error for whatever P such as to be ε-equivalent with Π. So, the following lemma holds:

Lemma 4.1. If P and Π are ε-equivalent, the very same code f = (f+, f−) which is optimal for criterion (4.1) at zero error, with P^n(¬C) = 0, is optimal also for criterion (4.3) at ε-error, with Π^[n](¬C) ≤ ε, and conversely.

A comparison with (4.2) shows that an optimal codebook C for (4.3) is formed by all the sequences of length n which are built over the sub-alphabet of those letters whose possibility exceeds ε:

C = {a: Π(a) > ε}^n.

Consequently:

Theorem 4.1. The possibilistic ε-entropy H_ε(Π) is given by:

H_ε(Π) = log |{a: Π(a) > ε}|,   0 ≤ ε < 1.

The fact that the possibilistic entropy is obtained as the limit of a constant sequence of optimal rates is certainly disappointing; however, asymptotic optimal rates are not always so trivially found, as will appear when we discuss channel coding (Section 5) or source coding with distortion (Appendix B); we shall comment there that reaching an optimal asymptotic value "too soon" (for n = 1) corresponds to a situation where one is obliged to use trivial code constructions. In a way, we have simply proved that in possibilistic source coding without distortion trivial code constructions are unavoidable.

Below we stress explicitly the obvious fact that the possibilistic entropy H_ε(Π) is a stepwise non-increasing function of ε, 0 ≤ ε < 1. The steps of the function H_ε(Π) begin in correspondence to the distinct possibility components π_i < 1 which appear in vector Π, inclusive of the value 0 even if 0 is not a component of Π; below the term "consecutive" refers to an ordering of the numbers π_i.

Proposition 4.1. If 0 ≤ ε < η < 1, then H_ε(Π) ≥ H_η(Π). If π_i < π_{i+1} are two consecutive entries in Π, then H_ε(Π) is constant for π_i ≤ ε < π_{i+1}.

Example 4.2. Take Π = (1, 1, 1/2, 1/2, 1/4, 0) over an alphabet A of six letters. Then H_ε(Π) = log 5 ≈ 2.322 when 0 ≤ ε < 1/4, H_ε(Π) = log 4 = 2 when 1/4 ≤ ε < 1/2, H_ε(Π) = log 2 = 1 when 1/2 ≤ ε < 1.

Remark 4.1. In the probabilistic case the constraint Prob{¬C} ≤ ε can be re-written in terms of the probability of correct decoding as Prob{C} ≥ 1 − ε, because Prob{C} + Prob{¬C} = 1. Instead, the sum Poss{C} + Poss{¬C} can be strictly larger than 1, and so Poss{C} ≥ 1 − ε is a different constraint. This constraint, however, would be quite loose and quite uninteresting, since the possibility Poss{C} of correct decoding and the error possibility Poss{¬C} can be both equal to 1 at the same time.

Remark 4.2. Unlike in the probabilistic case, in the possibilistic case replacing the "weak" reliability constraint Π^[n]{¬C} ≤ ε by a strict inequality, Π^[n]{¬C} < ε, does make a difference even asymptotically. In this case the Definition 3.1 of ε-equivalence should be modified by requiring P(a) = 0 if and only if Π(a) < ε, 0 < ε ≤ 1. The "strict" possibilistic entropy one would obtain is however the same step-function as H_ε(Π) above, only that the "steps" of the new function would be closed on the right rather than being closed on the left.

Remark 4.3. In the probabilistic case one can produce a sequence of source codes whose rate tends to Shannon entropy and whose error probability goes to zero; in other terms Shannon entropy allows one to code not only with a decoding error probability bounded by any ε > 0, but even with an "infinitesimal" (however, positive) error probability. In our possibilistic case, however, requiring that the error possibility goes to zero is the same as requiring that it is zero for n high enough, as easily perceived by considering that the possibilistic entropy is constant in a right neighbourhood of ε = 0.

Remarks 4.1, 4.2 and 4.3, suitably reformulated, would apply also in the case of source coding with distortion as in Appendix B and channel coding as in Section 5, but we shall no further insist on them.
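Theorem 4.1 makes the possibilistic ε-entropy a one-line computation. The following sketch (function name ours, base-2 logarithms) reproduces the step function of Example 4.2 and the non-increasing behaviour of Proposition 4.1.

```python
from math import log2

def possibilistic_entropy(pi, eps):
    """H_eps(Pi) = log |{a : Pi(a) > eps}|, as in Theorem 4.1."""
    return log2(sum(1 for p in pi if p > eps))

pi = (1.0, 1.0, 0.5, 0.5, 0.25, 0.0)    # Example 4.2
print(possibilistic_entropy(pi, 0.0))    # log 5, about 2.322
print(possibilistic_entropy(pi, 0.25))   # log 4 = 2.0
print(possibilistic_entropy(pi, 0.5))    # log 2 = 1.0

# The entropy is a non-increasing step function of eps (Proposition 4.1):
levels = [possibilistic_entropy(pi, e / 100) for e in range(0, 100)]
assert all(h1 >= h2 for h1, h2 in zip(levels, levels[1:]))
```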


5. The capacity of a possibilistic channel

Let A = {a_1, …, a_k} and B = {b_1, …, b_h} be two alphabets, called in this context the input alphabet and the output alphabet, respectively. We give a streamlined description of what a channel code is; for more details we refer to [3,4], and also to [15], which is specifically devoted to zero-error information theory. The basic elements of a code f are the encoder f+ and the decoder f−. The encoder f+ is an injective (invertible) mapping which takes uncoded messages onto a set of codewords C ⊆ A^n; the set M of uncoded messages is left unspecified, since its "structure" is irrelevant. Codewords are sent as input sequences through a noisy medium, or noisy channel. They are received at the other end of the channel as output sequences which belong to B^n. The decoder f− takes back output sequences to the codewords of C, and so to the corresponding uncoded messages. This gives rise to a partition of B^n into decoding sets, one for each codeword c ∈ C. Namely, the decoding set D_c for codeword c is D_c = {y: f−(y) = c} ⊆ B^n. The most important feature of a code f = (f+, f−) is its codebook C ⊆ A^n of size |C|. The decoder f−, and so the decoding sets D_c, are often chosen by use of some statistical principle, e.g., maximum likelihood, but we shall not need any special assumption (possibilistic decoding strategies are described in [1,10]). The encoder f+ will never be used in the sequel, and so its specification is irrelevant. The rate R_n of a code f with codebook C is defined as

R_n = (1/n) log |C|.

The number log |C| can be seen as the (not necessarily integer) binary length of the uncoded messages, the ones which carry information; then the rate R_n is interpreted as a transmission speed, which is measured in information bits (bit fractions, rather) per transmitted bit. The idea is to design codes which are fast and reliable at the same time. Once a reliability criterion has been chosen, one tries to find the optimal code for each pre-assigned codeword length n, i.e., a code with highest rate among those which meet the criterion.

Let us consider a stationary and memoryless channel W^n, or SML channel, as defined in (2.2). To declare a code f reliable, one requires that the probability that the output sequence does not belong to the correct decoding set is acceptably low, i.e., below a pre-assigned threshold ε, 0 ≤ ε < 1. If one wants to play safe, one has to insist that the decoding error should be low for each codeword c ∈ C which might have been transmitted (a looser criterion will be examined in Section 6). The reliability criterion which a code f must meet is so:

max_{c∈C} W^n(¬D_c|c) ≤ ε.   (5.1)

We recall that the symbol ¬ denotes negation, or set-complementation; of course the inequality sign in (5.1) can be replaced by an equality sign whenever ε = 0. Once the length n and the threshold ε are chosen, one can try to determine the rate R_n = R_n(W, ε) of an optimal code which solves the optimization problem:

Maximize the code rate R_n so as to satisfy the constraint (5.1).

The job can be quite tough, however, and so one has often to be contented with the asymptotic value of the optimal rates R_n, which is obtained when the codeword length n goes to infinity. This asymptotic value is called the ε-capacity of channel W. For 0 < ε < 1 the ε-capacity C_ε is always the same, only the speed of convergence of the optimal rates to C_ε is affected by the choice of ε. When one says "capacity" one refers by default to this positive ε-capacity; cf. [3] or [4].

Instead, when ε = 0 there is a dramatic change. In this case one uses the confoundability graph G(W) associated with channel W; we recall that in G(W) two vertices, i.e., two input letters a and a′, are adjacent if and only if they are confoundable; cf. Section 3. If W^n is seen as a stochastic matrix with k^n rows headed to A^n and h^n columns headed to B^n, one can consider also the confoundability graph G(W^n) for the k^n input sequences of length n; two input sequences are confoundable, and so adjacent in the graph, when there is an output sequence which can be reached from any of the two with positive probability. If C is a maximal independent set in G(W^n) the limit of n^{−1} log |C| when n goes to infinity is by definition the graph


capacity C(G(W)) of the confoundability graph G(W).⁸

⁸ We recall that an independent set in a graph, called also a stable set, is a set of vertices no two of which are adjacent, and so in our case the vertices of an independent set are never confoundable; all these graph-theoretic notions, inclusive of graph capacity, are explained more diffusely in Appendix A.

As easily checked, the codebook C ⊆ A^n of an optimal code is precisely a maximal independent set of G(W^n). Consequently, the zero-error capacity C_0(W) of channel W is equal to the capacity of the corresponding confoundability graph G(W):

C_0(W) = C(G(W)).

The paper [19] which Shannon published in 1956 and which contains these results inaugurated zero-error information theory. Observe however that the last equality gives no real solution to the problem of assessing the zero-error capacity of the channel, but simply re-phrases it in a neat combinatorial language; actually, a single-letter expression of the zero-error capacity is so far unknown, at least in general ("single-letter" means that one is able to calculate the limit so as to get rid of the codeword length n). This unpleasant observation applies also to Theorem 5.1 below.

We now pass to a stationary and non-interactive channel Π^[n], or SNI channel, as defined in (2.3). The reliability criterion (5.1) is correspondingly replaced by:

max_{c∈C} Π^[n](¬D_c|c) ≤ ε.   (5.2)

The optimization problem is now:

Maximize the code rate R_n so as to satisfy the constraint (5.2).

The number ε is now the error possibility which we are ready to accept. Again the inequality sign in (5.2) is to be replaced by the equality sign when ε = 0. A looser criterion based on average error possibility rather than maximal error possibility will be examined in Section 6.

Definition 5.1. The ε-capacity of channel Π is the limit of optimal code rates R_n(Π, ε), obtained as the codeword length n goes to infinity.

The following lemma is soon obtained from Lemmas 3.2 and 3.3, and in its turn soon implies Theorem 5.1; it states that possibilistic coding and zero-error probabilistic coding are different formulations of the same mathematical problem.

Lemma 5.1. Let the SML channel W and the SNI channel Π be ε-equivalent. Then a code f = (f+, f−) satisfies the reliability criterion (5.1) at zero error for the probabilistic channel W if and only if it satisfies the reliability criterion (5.2) at ε-error for the possibilistic channel Π.

Theorem 5.1. The codebook C ⊆ A^n of an optimal code for criterion (5.2) is a maximal independent set of G_ε(Π^[n]). Consequently, the ε-capacity of the possibilistic channel Π is equal to the capacity of the corresponding ε-confoundability graph G_ε(Π):

C_ε(Π) = C(G_ε(Π)).

Observe that the specification of the decoding sets D_c of an optimal code (and so of the decoding strategy) is obvious: one decodes y to the unique codeword c for which Π^[n](y|c) > ε; there cannot be two codewords with this property, because they would be ε-confoundable, and this would violate independence. If Π^[n](y|c) ≤ ε for all c ∈ C, then y can be assigned to any decoding set, this choice being irrelevant from the point of view of criterion (5.2).

Below we stress explicitly the obvious fact that the graph capacity C_ε(Π) = C(G_ε(Π)) is a stepwise non-decreasing function of ε, 0 ≤ ε < 1; the term "consecutive" refers to an ordering of the distinct components π_i which appear in Π (π_i can be zero even if zero does not appear as an entry in Π):

Proposition 5.1. If 0 ≤ ε < η < 1, then C_ε(Π) ≤ C_η(Π). If π_i < π_{i+1} are two consecutive entries in Π, then C_ε(Π) is constant for π_i ≤ ε < π_{i+1}.

Example 5.1. Binary possibilistic channels. The input alphabet is binary, A = {0, 1}; the output alphabet is either the same ("doubly" binary channel), or is augmented by an erasure symbol 2 (binary erasure channel); the corresponding possibilistic matrices Π₁ and


Π₂ are as follows:

 Π₁ | 0  1        Π₂ | 0  2  1
 −−−+−−−−−       −−−−+−−−−−−−−
  0 | 1  β          0 | 1  β  0
  1 | δ  1          1 | 0  β  1

with 0 < β ≤ δ < 1. As soon checked, one has for the proximities between input letters: γ₁(0, 1) = δ in the case of the doubly binary channel and γ₂(0, 1) = β in the case of the erasure channel. The relevant confoundability graphs are G_0(Π₁) = G_0(Π₂), where the input letters 0 and 1 are adjacent, and G_δ(Π₁) = G_δ(Π₂), where they are not. One has C_0(Π₁) = C_0(Π₂) = 0, C_δ(Π₁) = C_δ(Π₂) = 1, and so C_ε(Π₁) = C_ε(Π₂) = 0 for 0 ≤ ε < β, C_ε(Π₁) = 0 < C_ε(Π₂) = 1 for β ≤ ε < δ, else C_ε(Π₁) = C_ε(Π₂) = 1. Some of these intervals may vanish when the transition possibilities β and δ are allowed to be equal and to take on also the values 0 and 1. Data transmission is feasible when the corresponding capacity is positive. In this case, however, the capacity is "too high" to be interesting, since a capacity equal to 1 in the binary case means that the reliability criterion is so loose that no data protection is required: for fixed codeword length n, the optimal codebook is simply C = A^n. Actually, whenever the input alphabet is binary, one is necessarily confronted with two limit situations which are both uninteresting: either the confoundability graph is complete and the capacity is zero (i.e., the reliability criterion is so demanding that reliable transmission of data is hopeless), or the graph is edge-free and the capacity is maximal (the reliability criterion is so undemanding that data protection is not needed). In Section 7 we shall hint at interactive models for possibilistic channels which might prove to be interesting also in the binary case; cf. Remark 7.1.

Example 5.2. A "rotating" channel. Take k = 5; the quinary input and output alphabet is the same; the possibilistic matrix Π "rotates" the row-vector (1, δ, β, 0, 0):

    | a1  a2  a3  a4  a5
 −−−+−−−−−−−−−−−−−−−−−−−
 a1 | 1   δ   β   0   0
 a2 | 0   1   δ   β   0
 a3 | 0   0   1   δ   β
 a4 | β   0   0   1   δ
 a5 | δ   β   0   0   1

After setting by circularity a6 = a1, a7 = a2, one has: γ(a_i, a_i) = 1 > γ(a_i, a_{i+1}) = δ > γ(a_i, a_{i+2}) = β, 1 ≤ i ≤ 5. Capacities can be computed as explained in Appendix A: C_0(Π) = 0 (the corresponding graph is complete), C_β(Π) = log √5 (the pentagon graph pops up), C_δ(Π) = log 5 (the corresponding graph is edge-free). So C_ε(Π) = 0 for 0 ≤ ε < β, C_ε(Π) = log √5 for β ≤ ε < δ, else C_ε(Π) = log 5.

6. Average-error capacity versus maximal-error capacity

Before discussing an interpretation and an application of the possibilistic approach, we indulge in one more "technical" section. In the standard theory of probabilistic coding the reliability criterion (5.1) is often replaced by the looser criterion:

(1/|C|) Σ_{c∈C} W^n(¬D_c|c) ≤ ε,

which requires that the average probability of error, rather than the maximal probability, be smaller than ε so as to be declared acceptable. Roughly speaking, one no longer requires that all codewords perform well, but is contented whenever "most" codewords do so, and so resorts to an arithmetic mean rather than to a maximum operator. The new criterion being looser for ε > 0, higher rates can be achieved; however one proves that the gain evaporates asymptotically (cf., e.g., [4]). So, the average-error ε-capacity and the maximal-error ε-capacity (the only one we have considered so far) are in fact identical. We shall pursue a similar approach also in the case of possibilistic


channels, and adopt the reliability criterion:

(1/|C|) Σ_{c∈C} Π^[n](¬D_c|c) ≤ ε   (6.1)

rather than (5.2). The corresponding optimization problem is:

Maximize the code rate R_n so as to satisfy the constraint (6.1).

For ε > 0 one can achieve better rates than in the case of maximal error, as the following example shows.

Example 6.1. We re-take the 2 × 2 matrix of Example 2.1, which is basically the matrix Π₁ of Example 5.1 when β = 0. We choose n > 1 and adopt the reliability criterion (5.2) which involves the maximal decoding error. For 0 ≤ ε < δ the graph G_ε(Π) is complete and so is also G_ε(Π^[n]). Maximal independent sets are made up by just one sequence: the optimal rate is as low as 0; in practice this means that no information is transmittable at that level of reliability. Let us pass instead to the reliability criterion (6.1) which involves the average decoding error. Let us take a codebook whose codewords are all the 2^n sequences in A^n; each output sequence is decoded to itself. The rate of this code is as high as 1. It is easy to check that the decoding error possibility for each transmitted sequence c is always equal to δ, except when c = aa…a is sent, in which case the error possibility is zero. This means that with an error possibility ε such that

((2^n − 1)/2^n) δ ≤ ε < δ

the optimal rate is 0 for criterion (5.2) while it is 1 for criterion (6.1). Observe that the interval where the two optimal rates differ evaporates as n increases.

In analogy to the maximal-error ε-capacity C_ε(Π), the average-error ε-capacity is defined as follows:

Definition 6.1. The average-error ε-capacity C̄_ε(Π) of channel Π is the limit of code rates R̄_n(Π, ε) optimal with respect to criterion (6.1), obtained as the codeword length n goes to infinity.

(Our result below will make it clear that such a limit does exist.) We shall prove below that also in the possibilistic case the maximal-error capacity and the average-error capacity coincide for all ε. We stress that, unlike Theorem 5.1, Theorem 6.1 is not solved by simply re-cycling a result already available in the probabilistic framework (even if the "expurgation" technique used below is a standard tool of Shannon theory). This shows that the possibilistic framework is strictly larger than the zero-error probabilistic framework, as soon as one allows possibility values which are intermediate between zero and one. From now on we shall assume ε ≠ 0, else (5.2) and (6.1) become one and the same criterion, and there is nothing new to say. Clearly, (6.1) being a looser criterion, the average-error possibility of any pre-assigned code cannot be larger than the maximal-error possibility, and so the average-error ε-capacity of the channel cannot be smaller than the maximal-error ε-capacity: C̄_ε(Π) ≥ C_ε(Π). The theorem below will be proven by showing that also the inverse inequality holds true.

Theorem 6.1. The average-error ε-capacity C̄_ε(Π) and the maximal-error ε-capacity C_ε(Π) of the SNI possibilistic channel Π coincide for whatever admissible error possibility ε, 0 ≤ ε < 1:

C̄_ε(Π) = C_ε(Π).

Proof. Let us consider an optimal code which satisfies the reliability criterion (6.1) for fixed codeword length n and fixed tolerated error possibility ε > 0; since the code is optimal, its codebook C has maximal size |C|. Let

π_1 = 0 < π_2 < ··· < π_r = 1   (6.2)

be the distinct components which appear as entries in the possibilistic matrix Π which specifies the transition possibilities, and so specifies the possibilistic behaviour of the channel we are using; r > 1. Fix codeword c: we observe that the error possibility for c, i.e., the possibility that c is incorrectly decoded, is necessarily one of the values which appear in (6.2), as it is derived from those values by using maximum and minimum operators (we add π_1 = 0 even if 0 is not to be found in Π). This allows us to partition the codebook C into r classes C_i, 1 ≤ i ≤ r, by putting into the same class C_i those codewords c whose error possibility is equal precisely to π_i (some of the classes C_i


may be void). The reliability criterion (6.1) satisfied by our code can be re-written as:

Σ_{1≤i≤r} (|C_i|/|C|) π_i ≤ ε.   (6.3)

We can now think of a non-negative random variable X which takes on the values π_i, each with probability |C_i|/|C|; to this random variable X we shall apply the well-known Markov inequality (cf., e.g., [4]), which is written as:

Prob{X ≥ θ X̄} ≤ 1/θ,

where X̄ is the expectation of X, i.e., the first side of (6.3), while θ is any positive number. Because of (6.3), which can be written also as X̄ ≤ ε, one has a fortiori:

Prob{X ≥ θε} ≤ 1/θ

or, equivalently:

Σ_{i: π_i ≥ θε} |C_i| ≤ |C|/θ.

Now we choose θ and set it equal to:

θ = (ε + π_j)/(2ε),

where π_j is the smallest value in (6.2) such as to be strictly greater than ε. With this choice one has θ > 1; we stress that θ is a constant once Π and ε are chosen; in particular θ does not depend on n. The last summation can be now taken over those values of π_i for which:

π_i ≥ (ε + π_j)/2,

i.e., since there is no π_i left between ε and π_j, the inequality can be re-written as:

Σ_{i: π_i > ε} |C_i| ≤ |C|/θ.

The r classes C_i are disjoint and give a partition of C; so, if one considers those classes C_i for which the error possibility π_i is at most ε, one can equivalently write:

Σ_{i: π_i ≤ ε} |C_i| ≥ ((θ − 1)/θ) |C|.   (6.4)

Now, the union of the classes on the left side of (6.4) can be used as the codebook of a new code with maximal error possibility ≤ ε. It will be enough to modify the decoder by enlarging in whatever way the decoding sets D_c with error possibility Π^[n](¬D_c|c) = π_i ≤ ε, so as to cover B^n; by doing so the error possibility cannot become larger. Of course the new code need not be optimal in the class of all codes which satisfy criterion (5.2) for fixed n and ε; so for its rate R*_n one has

R*_n = (1/n) log Σ_{i: π_i ≤ ε} |C_i| ≤ R_n,   (6.5)

where R_n is the optimal rate with respect to criterion (5.2) relative to maximal error. On the other hand, in terms of the rate R̄_n = n^{−1} log |C| optimal with respect to criterion (6.1) relative to average error, (6.4) can be re-written as:

2^{n R*_n} ≥ ((θ − 1)/θ) 2^{n R̄_n}.   (6.6)

In (6.6) the term (θ − 1)/θ is a positive constant which belongs to the open interval ]0, 1[, and so its logarithm is negative. Comparing (6.5) and (6.6), and recalling that R_n ≤ R̄_n:

R̄_n + (1/n) log((θ − 1)/θ) ≤ R*_n ≤ R_n ≤ R̄_n.

One obtains the theorem by going to the limit.

7. An interpretation of the possibilistic model based on distortion measures

We have examined a possibilistic model of data transmission and coding which is inspired by the standard probabilistic model: what we did is simply replacing probabilities by possibilities and independence by non-interactivity, a notion which is often seen as the "right" analogue of probabilistic independence in possibility theory. In this section we shall try to give an interpretation of our possibilistic approach. The example of an application to the design of telephone keyboards will be given.

We concentrate on noisy channels and codes for correcting transmission errors; we shall consider sources at the end of the section. The idea is that in some cases statistical likelihood may be effectively


replaced by what one might call "structural resemblance". Suppose that a "grapheme" is sent through a noisy channel which we are unable to describe in all statistical details. A distorted grapheme will be received at the other end of the channel; the repertoires of input graphemes and of output graphemes are supposed to be both finite. We assume that it is plausible⁹ that the grapheme which has been received has a small distortion, or even no distortion at all, from the grapheme which has been sent over the channel; large distortions are instead unplausible.

⁹ The term plausibility is a technical term of evidence theory; actually, possibilities can be seen as very special plausibilities; cf., e.g., [12]. So, the adoption of a term which is akin to "possibility" is more committal than it may seem at first sight.

Without real loss of generality we shall "norm" the distortions to the interval [0, 1], so that the occurrence of distortion one is seen as quite unplausible. Correspondingly, the one-complement of the distortion can be seen as an index of "structural resemblance" between the input symbol and the output symbol; with high plausibility this index will have a high value. We shall assign a numeric value to the plausibility by setting it equal precisely to the value of the resemblance index; in other words, we assume the "equality":

plausibility = structural resemblance.   (7.1)

Long sequences of graphemes will be sent through the channel. The distortion between the input sequence x and the output sequence y will depend on the distortions between the single graphemes x_i and y_i which make up the sequences; to specify how this happens, we shall take inspiration from rate-distortion theory, which is shortly reviewed in Appendix B; cf. also [3] or [4]. We recall here how distortion measures are defined. One is given two alphabets, the primary alphabet A and the secondary alphabet B, which in our case will be the alphabet of possible input graphemes and the alphabet of possible output graphemes, respectively. A distortion measure d is given which specifies the distortions d(a, b) between each primary letter a ∈ A and each secondary letter b ∈ B; for each primary letter a there is at least one secondary letter b such that d(a, b) = 0, which perfectly reproduces a. Distortions d(a, b) are always non-negative, but in our case they are also constrained not to exceed 1. The distortion between letters a and b can be extended to a distortion between sequences x ∈ A^n and y ∈ B^n in several ways; one resorts, e.g., to peak distortion:

d*_n(x, y) = max_{1≤i≤n} d(x_i, y_i)   (7.2)

or, more commonly and less demandingly, to average distortion:

d_n(x, y) = (1/n) Σ_{1≤i≤n} d(x_i, y_i).   (7.3)

Let us be very demanding and adopt peak distortion: structurally two sequences resemble each other only when they do so in each position. Following the philosophy of the equality (7.1) above, where the term "plausibility" has been replaced by the more specific term "transition possibility" and where the resemblance is interpreted as the one-complement of the distortion, we set:

Π(b|a) = 1 − d(a, b),

Π^[n](y|x) = 1 − d*_n(x, y) = min_{1≤i≤n} Π(y_i|x_i).   (7.4)

This corresponds precisely to a stationary and non-interactive channel Π.

To make our point we now examine a small-scale example. We assume that sequences of circled graphemes out of the alphabet A = {⊕, ⊗, ⊖, ⊘, ◦} are sent through a channel. Because of noise, some of the bars inside the circle can be erased during transmission; instead, in our model the channel cannot add any bars, and so the repertoire of the output graphemes is a superset of A: B = A ∪ {⦶, ⦸}. We do not have any statistical information about the behaviour of the channel; we shall be contented with the following "linguistic judgements":

It is quite plausible that a grapheme is received as it has been sent.
It is pretty plausible that a single bar has been erased.
It is pretty unplausible that two bars have been erased.
Everything else is quite unplausible.

We shall "numerize" our judgements by assigning the numeric values 1, 2/3, 1/3, 0 to the corresponding possibilities. This is the same as setting the distortions d(a, b) proportional to the number of bars which have been deleted during transmission. Our choice is


enough to specify a possibilistic channel Π, whose matrix is given below; zeroes have not been written to help readability. Underneath we have written the matrix which specifies the proximities between the input graphemes; since γ is symmetric, i.e., γ(a, a′) = γ(a′, a), we have written only the upper triangle; cf. the definition of γ in Section 3.

  Π | ⊕    ⊗    ⊖    ⊘    ⦶    ⦸    ◦
 −−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
  ⊕ | 1         2/3        2/3        1/3
  ⊗ |      1         2/3        2/3   1/3
  ⊖ |           1                     2/3
  ⊘ |                1                2/3
  ◦ |                                 1

  γ | ⊕    ⊗    ⊖    ⊘    ◦
 −−−+−−−−−−−−−−−−−−−−−−−−−−
  ⊕ | 1    1/3  2/3  1/3  1/3
  ⊗ |      1    1/3  2/3  1/3
  ⊖ |           1    2/3  2/3
  ⊘ |                1    2/3
  ◦ |                     1

Both the components of Π and those of γ are in their own way "resemblance indices". However, those in Π specify the structural resemblance between an input grapheme a and an output grapheme b; this resemblance is 1 when b equals a, is 2/3 when b can be obtained from a by deletion of a single bar, is 1/3 when b can be obtained from a by deleting two bars, and is 0 when b cannot be obtained from a in any of these ways. Instead, the components of γ specify how easy it is to confound input graphemes at the other end of the channel: γ(a, a′) = 1 means a = a′; γ(a, a′) = 2/3 means that a and a′ are different, but there is at least an output grapheme which can be obtained by deletion of a single bar in a, or in a′, or in both; γ(a, a′) = 1/3 means that one has to delete at least two bars from one of the input graphemes, or from both, to reach a common output grapheme. Assuming that the channel is stationary and non-interactive means that we are adopting a very strict criterion to evaluate the "structural resemblance" between input sequence and output sequence; this criterion corresponds to peak distortion, as explained above.

Let us consider coding. We use the proximity matrix γ to construct the ε-confoundability graph G_ε(Π), which was defined in Section 3. If 0 ≤ ε < 1/3 the ε-confoundability graph is complete and the ε-capacity of the channel is 0: this means that the reliability criterion (5.2) is so strict that no data transmission is feasible. For ε ≥ 2/3 the graph is edge-free and so the ε-capacity is log 5: this means that the reliability criterion (5.2) is so loose that the channel behaves essentially as noise-free. Let us proceed to the more interesting case 1/3 ≤ ε < 2/3; actually, one can take ε = 1/3 (cf. Proposition 5.1). A maximal independent set I in G_{1/3}(Π) is made up by the three "vertices" ⊕, ⊗ and ◦, as soon checked. Using the notions explained in the appendix, and in particular the inequalities (A.1), one soon shows that the 3^n sequences of I^n give a maximal independent set in G_{1/3}(Π^[n]). Fix codeword length n; as stated by Theorem 5.1, an optimal codebook is C = I^n and so the optimal code rate is log 3, which is also the value of the capacity C_{1/3}(Π). When one uses such a code, a decoding error occurs only when at least one of the n graphemes sent over the channel loses at least two bars, an event which has been judged to be pretty unplausible.

We give the example of an application. Think of the keys in a digital keyboard, as the one of the author's telephone, say, in which digits from 1 to 9 are arranged on a 3 × 3 grid, left to right, top row to bottom row, while digit 0 is positioned below digit 8. It may happen that, when a telephone number is digited, the wrong key is pressed (because of "channel noise"). We assume the following model of the "noisy channel", in which possibilities are seen as numeric labels for vague linguistic judgements:

(i) it is quite plausible that the correct key is pressed (possibility 1);
(ii) it is pretty plausible that one inadvertently presses a "neighbour" of the correct key, i.e., a key which is positioned on the same row or on the same column and is contiguous to the correct key (possibility 2/3);

A. Sgarro / Fuzzy Sets and Systems 132 (2002) 11 – 32 27

(iii) it is pretty implausible that the key one presses is contiguous to the correct key, but is positioned on the same diagonal (possibility 1/3);
(iv) everything else is quite implausible (possibility 0).

When the wrong key is pressed, we shall say that a cross-over of type (ii), of type (iii), or of type (iv) has taken place, according to whether its possibility is 2/3, 1/3, or 0. Using these values (footnote 10) one can construct a possibilistic matrix Π with the input and the output alphabet both equal to the set of the 10 keys. One has, for example: Π(a|1) = 2/3 for a ∈ {2, 4}, Π(a|1) = 1/3 for a = 5, Π(a|1) = 0 for a ∈ {3, 6, 7, 8, 9, 0}. As for the proximity σ(a, b), it is equal to 2/3 whenever either keys a and b are neighbours as in (ii), or there is a third key c which is a common neighbour of both. One has, for example: σ(1, a) = 2/3 for a ∈ {2, 3, 4, 5, 7}; instead, σ(1, a) = 1/3 for a ∈ {6, 8, 9} and σ(1, a) = 0 for a = 0. A codebook is a bunch of admissible telephone numbers of length n; since a phone number is wrong whenever there is a collision with another phone number in a single digit, it is natural to assume that the "noisy channel" is non-interactive.

This example was suggested to us by J. Körner; however, at least in principle, in the standard probabilistic setting one would have to specify three stochastic matrices such as to be 0-, 1/3- and 2/3-equivalent with Π. In these matrices only the opposition zero/non-zero would count; their entries would have no empirical meaning, and no significant relation with the stochastic matrix of the probabilities with which errors are actually committed by the hand of the operator. So, the adoption of a "hard" probabilistic model is in this case pretty unnatural. Instead, in a "soft" possibilistic approach one specifies just one possibilistic matrix Π, which contains precisely the information which is needed and nothing more.

Unfortunately, the author's telephone is not especially promising. Let us adopt criterion (5.2). If the allowed error possibility of the code is 2/3 (or more), the confoundability graph is edge-free and no error protection is required. If we choose the error possibility ε = 1/3, we have C_{1/3}(Π) = log α(G_{1/3}(Π)) = log 3; in other words the 1/3-capacity, which is an asymptotic parameter (footnote 11), is reached already for n = 1. To see this, use the inequalities (A.1) of Appendix A: the independence number of G_{1/3}(Π) is 3, and a maximal independent set of keys, which are far enough from each other so as not to be confoundable, is {0, 1, 6}, as easily checked; moreover, one checks that 3 is also the chromatic number of the complementary graph. In practice, this means that an optimal codebook as in Theorem 5.1 may be constructed by juxtaposition of the input "letters" 0, 1, 6; the code is disappointing, since everything boils down to allowing only phone numbers which use keys 0, 1, 6. As for decoding, the output sequence y is decoded to the single codeword c for which Π^[n](y|c) > 1/3; so, error correction is certainly successful if there have been no cross-overs of type (iii) and (iv). If, for example, one dials number 2244 rather than 1111, a successful error correction takes place; actually, Π^[4](2244|c) > 1/3 only for c = 1111. If instead one is so clumsy as to dial the "pretty implausible" number 2225, this is incorrectly decoded to 1116. Take instead the more demanding threshold ε = 0; the 0-capacity, as easily checked, goes down to log 2; the 0-error code remains as disappointing as the 1/3-error code, being obtained by allowing only phone numbers made up of "far-away" digits as are 0 and 1, say. The design of convenient keyboards such that their possibilistic capacity is not obtained already for n = 1 is a graph-theoretic problem which may be of relevant practical interest in those situations when dialling an incorrect number may cause serious inconveniences. More generally, exhibiting useful finite-length code constructions would stand to the material of this paper in a relation similar to that of coding theory (algebraic coding theory, say) to the asymptotic theory of coding (Shannon theory).

Remark 7.1. Rather than peak distortion, in (7.4) one might use average distortion. This would give rise to a stationary but definitely interactive channel for which:

Π_n(y|x) = 1 − d_n(x, y) = (1/n) Σ_{1≤i≤n} Π(y_i|x_i).

We leave open the problem of studying such a channel and ascertaining its meaning for real-world data transmission. Actually, one might even define new distortions between sequences based on a different way of averaging single-letter distortions, in the general sense of aggregation operators (the very broad notion of aggregation operators and averaging operators is covered, e.g., in [12] or [16]).

Now we pass to source coding and data compression. We shall pursue an interpretation of possibilistic SNI sources and possibilistic source coding which fits in with a meaning of the word "possible" to be found in the Oxford Dictionary of the English Language: possible = tolerable to deal with, i.e., acceptable, because it possesses all the qualities which are required (footnote 12). Assume that certain items are accepted only if they pass n quality controls; each control i is given a numeric mark μ_i from 0 (totally unacceptable) to 1 (faultless); the marks which one can assign are chosen from a finite subset of [0, 1] of numbers which just stand for linguistic judgements. The quality control as a whole is passed only when all the n controls have been passed. As an example, let us take the source alphabet B equal to the alphabet of the seven graphemes output by the possibilistic channel which has been considered above. The "items" will be sequences y of n graphemes, and the ith control will be made on the ith grapheme y_i. The possibility vector π over the seven graphemes of B, in the order in which they are listed, will be:

π = (1, 1, 1, 2/3, 1, 2/3, 1) over B = {⊕, ⊗, , ◦ , , ◦\, ◦}.

A possibility smaller than 1 has been assigned to the two output graphemes which are not also input graphemes; in practice, vector π has been obtained by taking the maximum of the entries in the columns of the possibilistic matrix Π which describes the channel. When π(b) = 1 it is possible that the grapheme b has been received at the end of the channel exactly as it has been transmitted; when π(b) = 2/3 the grapheme b which has been received is necessarily distorted with respect to the input grapheme: at least one bar has been erased during transmission (footnote 13). Let us fix a value ε, 0 ≤ ε < 1, and rule out all the items whose possibility is ≤ ε. Then the accepted items can be encoded by means of a possibilistic source code as in Section 4: each acceptable item is given a binary number whose length is nH_ε(π), or rather its integer ceiling, i.e., the smallest integer which is at least as large as n times the ε-entropy of π. In our case, when ε ≥ 2/3 only the sequences which do not contain the two graphemes of possibility 2/3 are given a codeword, and so H_ε(π) = log 5; instead, when ε < 2/3 all the sequences have their own codeword, and so H_ε(π) = log 7.

Footnote 10. Adopting a different "numerization" for the transition possibilities (or, equivalently, for the distortions) does not make any real difference from the point of view of criterion (5.2), provided the order is preserved and the values 0 and 1 are kept fixed. Instead, arithmetic averages as in criterion (6.1) have no such insensitivity to order-preserving transformations; criterion (6.1) might prove to be appropriate in a situation where one interprets possibilities in some other way (recall that possibilities can be viewed as a special case of plausibilities, which in their turn can be viewed as a special case of upper probabilities; cf., e.g., [22]). In (iv) we might have chosen a "negligible" positive value, rather than 0: again, this would have made no serious difference, save adding a negligible initial interval where the channel capacity would have been zero.

Footnote 11. When the value of an asymptotic functional (channel capacity, say, or source entropy, or the rate-distortion function as in Appendix B) is reached already for n = 1, its computation is easy; but, unfortunately, this is so because the situation is so hopeless that one is obliged to use trivial code constructions. By the way, this is always the case when one tries to compress possibilistic sources without distortion, as in Section 4. The interesting situations correspond instead to cases when the computation of the asymptotic functional is difficult, as for the pentagon, or even unfeasible, as for the heptagon (cf. Appendix A).

Footnote 12. An interpretation which may be worth pursuing is: degree of possibility = level of grammaticality. This may be interesting also in channel coding, in those situations when decoding errors are less serious when the encoded message has a low level of grammatical correctness.

Footnote 13. As a matter of fact, we have been using a formula proposed in the literature in order to compute marginal output possibilities π(b), when the marginal input possibilities π(a) and the conditional possibilities Π(b|a) are given, namely

π(b) = max_{a∈A} [π(a) ∧ Π(b|a)],

the maximum being taken over all letters a ∈ A. In our case we have set all the input possibilities π(a) equal to 1. The possibilistic formula is inspired by the corresponding probabilistic one, just replacing sums and products by maxima and minima, as is usual when one passes from probabilities to possibilities; cf., e.g., [5] or [11].


Acknowledgements

We gladly acknowledge helpful discussions with F. Fabris on the relationship between possibilistic channels and distortion measures as used in probabilistic source coding.

Appendix A. Graph capacity

We consider only simple graphs, i.e., graphs without multiple edges and without loops; we recall that a graph is assigned by giving its vertices and its edges; each edge connects two (distinct) vertices, which are then adjacent. If G is a graph, its complementary graph Ḡ has the same set of vertices, but two vertices are adjacent in Ḡ if and only if they are not adjacent in G. By α(G) and χ(G) we denote the independence number and the chromatic number of G, respectively. We recall that the independence number of a graph is the maximum size of a set of vertices none of which are adjacent (of an independent set, also called a stable set); the chromatic number of a graph is the minimum number of colours which can be assigned to its vertices in such a way that no two adjacent vertices have the same colour. From a graph G with k vertices one may wish to construct a "power graph" G^n whose k^n "vertices" are the vertex sequences of length n. Many such powers are described in the literature; of these we need the following one, sometimes called the strong power: two vertices x = x_1 x_2 … x_n and u = u_1 u_2 … u_n are adjacent in G^n if and only if for each component i either x_i and u_i are adjacent in G or x_i = u_i, 1 ≤ i ≤ n. The reason for choosing this type of power becomes clear when one thinks of confoundability graphs G(W) and of ε-confoundability graphs G_ε(Π) as defined in Section 3. Actually, one has:

G(W^n) = [G(W)]^n,  G_ε(Π^[n]) = [G_ε(Π)]^n.

The first equality is obvious; the second is implied by the first and by Lemma 3.3: just take any stochastic matrix W which is ε-equivalent to Π.

If G is a simple graph, its graph capacity C(G), also called Shannon's graph capacity, is defined as

C(G) = lim_{n→∞} (1/n) log α(G^n).

It is easy to prove that

log α(G) ≤ (1/n) log α(G^n) ≤ log χ(Ḡ),  (A.1)

and so whenever α(G) = χ(Ḡ) the graph capacity is very simply C(G) = log α(G). Giving a single-letter characterization of graph capacity can however be a very tough problem, which is still unsolved in its generality [15]. We observe that the minimum value of the graph capacity is zero, and is reached whenever the graph is complete, i.e., has all the k(k−1)/2 possible edges; the maximum value of the capacity of a graph with k vertices is log k, and is obtained when the graph is edge-free (has no edges at all). We also observe that "pure" combinatorialists prefer to define graph capacity as the limit of α(G^n)^{1/n}, i.e., as 2^{C(G)}.

Example A.1. Let us take the case of a polygon P_k with k vertices. For k = 3, we have a triangle P_3; then α(P_3) = χ(P̄_3) = 1 and the capacity C(P_3) is zero. Let us go to the quadrangle P_4; then α(P_4) = χ(P̄_4) = 2 and so C(P_4) = 1. In the case of the pentagon, however, α(P_5) = 2 < χ(P̄_5) = 3. It was quite an achievement of Lovász to prove in 1979 that C(P_5) = log √5, as long conjectured; the conjecture had resisted proof for more than twenty years. The capacity of the heptagon P_7 is still unknown.
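The pentagon numbers in Example A.1 can be checked by brute force for small n. The sketch below (not from the paper) builds the strong power P_5² and verifies α(P_5) = 2 and α(P_5²) = 5, so that already at n = 2 the lower bound in (A.1) improves from log 2 to (1/2) log 5 = log √5, Lovász's value of C(P_5):

```python
from itertools import combinations, product

def adj_pentagon(i, j):
    """Adjacency in the pentagon P5 (cycle on vertices 0..4)."""
    return abs(i - j) % 5 in (1, 4)

def adj_strong(x, u):
    """Adjacency in the strong power: every coordinate equal or adjacent,
    and the two sequences are not identical."""
    return x != u and all(a == b or adj_pentagon(a, b) for a, b in zip(x, u))

def alpha(vertices, adj):
    """Independence number by brute force over subsets of increasing size."""
    vertices = list(vertices)
    best = 1
    for size in range(2, len(vertices) + 1):
        found = any(all(not adj(x, u) for x, u in combinations(s, 2))
                    for s in combinations(vertices, size))
        if not found:
            break
        best = size
    return best

a1 = alpha(range(5), adj_pentagon)                         # alpha(P5)
a2 = alpha(product(range(5), repeat=2), adj_strong)        # alpha(P5^2)
print(a1, a2)   # -> 2 5
```

A witness independent set of size 5 in P_5² is {(0,0), (1,2), (2,4), (3,1), (4,3)}, the classical construction from Shannon's 1956 paper [19].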


Appendix B. The possibilistic rate-distortion function

This appendix generalizes source coding as dealt with in Section 4 and is rather more technical than the body of the paper. The reader is referred to Section 4 for a description of the problem of source coding. In the case of source coding with distortion, beside the primary source alphabet A one has a secondary alphabet B, also called the reproduction alphabet, which is used to reproduce primary sequences. A distortion matrix d is given which specifies the distortions d(a, b) between each primary letter a ∈ A and each secondary letter b ∈ B; the numbers d(a, b) are non-negative and for each primary letter a there is at least one secondary letter b such that d(a, b) = 0, i.e., such as to perfectly reproduce a. We recall that distortion measures have already been used in Section 7; unlike in Section 7, here we do not require d(a, b) ≤ 1. The distortion between letters a and b is extended to the average distortion between sequences x ∈ A^n and y ∈ B^n as we did in (7.3), or to the peak distortion, also called maximal distortion, as in (7.2). Unlike in the case without distortion, here the decoder f⁻ maps the binary codeword f⁺(x) to a secondary sequence y ∈ B^n which should have an acceptably small distortion from the encoded primary sequence x. Let us denote by g the composition of encoder and decoder, g(x) = f⁻(f⁺(x)); the set of secondary sequences C = g(A^n) ⊆ B^n which are used to reproduce the primary sequences is called the codebook of the code f = (f⁺, f⁻). In practice the secondary sequence y = g(x) is usually misinterpreted as if it were the codeword for the primary sequence x, and correspondingly the mapping g is called the encoder (this is slightly abusive, but the specification of f⁺ and f⁻ turns out to be irrelevant once g is chosen). The rate of the code is the number

R_n = (log |C|) / n.

The numerator can be interpreted as the (not necessarily integer) length of the binary codewords output by the encoder stricto sensu f⁺ and fed to the decoder f⁻, and so the rate is the number of bits per primary letter. From now on we shall forget about f⁺; the term "encoder" will refer solely to the mapping g which outputs secondary sequences y ∈ B^n.

Let us begin with the average distortion d_n, as is common in the probabilistic approach. One fixes a threshold Δ ≥ 0 and a tolerated error probability ε ≥ 0, and requires that the following reliability criterion be satisfied:

P^n{x : d_n(x, g(x)) > Δ} ≤ ε.  (B.1)

The encoder g should be constructed in such a way that the codebook C ⊆ B^n be as small as possible, under the constraint that the reliability criterion which has been chosen is satisfied; for fixed n one can equivalently minimize the code rate:

Minimize the code rate (log |C|)/n so as to satisfy constraint (B.1).

For fixed Δ ≥ 0 and ε > 0, one is interested in the asymptotic value of the optimal rates. For ε > 0 one proves that the solution, i.e., the asymptotic value of optimal code rates, is given by the rate-distortion function

R_ε(P, Δ) = min_{XY : Ed(X,Y) ≤ Δ} I(X ∧ Y),  ε > 0,  (B.2)

in whose right-hand expression ε does not explicitly appear. Above, I(X ∧ Y) is the mutual information (footnote 14) of the random couple XY, X being a random primary letter output by the source according to the probability distribution P. The second random component Y of the random couple XY belongs to the secondary alphabet B, and so is a random secondary letter. The minimum is taken with respect to all random couples XY which are constrained to have an expected distortion Ed(X, Y) which does not exceed the threshold Δ. The rate-distortion function does not look especially friendly; luckily, the problem of its computation has been deeply investigated from a numeric viewpoint [4]. Observe however that, even if the computation of the rate-distortion function involves a minimization, there is no trace of n left and so its expression is single-letter, unlike in the case of graph capacity.

Let us proceed to zero-error coding with distortion. The problem of finding a single-letter expression for the asymptotic value of optimal rates is not at all trivial, but it has been solved; not surprisingly, this value turns out to depend only on the support of P, i.e., on whether the probabilities P(a) of source letters a are zero or non-zero. More precisely, for ε = 0 the asymptotic value is given by the zero-error rate-distortion function:

R_0(P, Δ) = max_{X : P(a)=0 ⇒ P_X(a)=0}  min_{XY : Ed(X,Y) ≤ Δ} I(X ∧ Y).  (B.3)

Here the maximum is taken with respect to all random variables X whose support is (possibly strictly) included in the support of P, i.e., in the subset of letters a whose probability P(a) is strictly positive; P_X is the probability distribution of the random variable X.

Footnote 14. The mutual information can be expressed in terms of Shannon entropies as I(X ∧ Y) = H(X) + H(Y) − H(XY); it is seen as an index of dependence between the random variables X and Y, and assumes its lowest value, i.e., 0, if and only if X and Y are independent.
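The numeric computation of R(P, Δ) mentioned above is classically carried out with the Blahut–Arimoto alternating minimization; the paper does not spell out the algorithm, so the sketch below is only a hedged illustration, and the slope parameter s is an assumption of this particular parametrization. For a uniform binary source with Hamming distortion the curve is known in closed form, R = 1 − h(Δ), which serves as a sanity check:

```python
from math import exp, log, log2

def blahut_arimoto(P, d, s, iters=100):
    """Blahut-Arimoto iteration for one slope-parametrized point (D, R)
    of the rate-distortion curve; s < 0 is the slope parameter."""
    nx, ny = len(P), len(d[0])
    q = [1.0 / ny] * ny                   # output marginal, start uniform
    for _ in range(iters):
        Q = []                            # conditional Q(y|x) prop. to q(y)*exp(s*d(x,y))
        for x in range(nx):
            w = [q[y] * exp(s * d[x][y]) for y in range(ny)]
            z = sum(w)
            Q.append([wy / z for wy in w])
        # re-compute the output marginal of Q under the source P
        q = [sum(P[x] * Q[x][y] for x in range(nx)) for y in range(ny)]
    D = sum(P[x] * Q[x][y] * d[x][y] for x in range(nx) for y in range(ny))
    R = sum(P[x] * Q[x][y] * log2(Q[x][y] / q[y])
            for x in range(nx) for y in range(ny) if Q[x][y] > 0)
    return D, R

# Uniform binary source, Hamming distortion; s = ln(1/9) lands on D = 0.1.
P = [0.5, 0.5]
d = [[0, 1], [1, 0]]
D, R = blahut_arimoto(P, d, s=log(1.0 / 9.0))
print(round(D, 4), round(R, 4))   # -> 0.1 0.531
```

Here 1 − h(0.1) ≈ 0.531 bits per letter, matching the closed-form value of the binary rate-distortion function.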


The minimum in (B.3) is to be compared with R_ε(P, Δ) as in (B.2). In practice, one considers all the rate-distortion functions over the support of P, and then selects the largest value which has been obtained; as for the numeric techniques which are available, cf. [4]. If one chooses the peak distortion d*_n rather than the average distortion d_n, the reliability criterion (B.1) and the definition of the rate-distortion function should be modified accordingly; in particular, the new reliability criterion is

P^n{x : d*_n(x, g(x)) > Δ} ≤ ε.  (B.4)

The asymptotic optimal rate for peak distortion R*_ε(P, Δ) has in general a higher value (footnote 15) than R_ε(P, Δ), since (B.4) is more demanding than (B.1). The expression of R*_ε(P, Δ) turns out to be the same as in (B.2) and (B.3), only replacing the constraint Ed(X, Y) ≤ Δ which defines the minimization set by the more severe constraint d(X, Y) ≤ Δ (cf. [4]; by writing d(X, Y) = 0 we mean that the event d(X, Y) = 0 has probability 1, i.e., that the support of the random couple XY is made up only of couples (a, b) for which d(a, b) = 0).

If the source is an SNI source described by giving the possibility vector π over primary letters, one can consider the same codes as before, but judge their reliability by referring to the new reliability criteria:

π^[n]{x : d_n(x, g(x)) > Δ} ≤ ε  (B.5)

or, in the case of peak distortion:

π^[n]{x : d*_n(x, g(x)) > Δ} ≤ ε,  (B.6)

to be compared with (B.1) and (B.4). The corresponding minimization problems are:

Minimize the code rate (1/n) log |C| so as to satisfy constraint (B.5) or (B.6), respectively.

Definition B.1. The possibilistic rate-distortion function R_ε(π, Δ) for average distortion and the possibilistic rate-distortion function R*_ε(π, Δ) for peak distortion are the limits of the rates R_n of codes which are optimal for the criterion (B.5) or (B.6), respectively, as the length n goes to infinity; 0 ≤ ε < 1.

Lemma 3.1 soon gives the following lemma:

Lemma B.1. If P and π are ε-equivalent, a code f = (f⁺, f⁻) is optimal for criterion (B.1) at zero error if and only if it is optimal for criterion (B.5) at ε-error; it is optimal for criterion (B.4) at zero error if and only if it is optimal for criterion (B.6) at ε-error.

The following theorem is obtained from Lemma B.1 after a comparison with the expressions of R_0(P, Δ) and R*_0(P, Δ):

Theorem B.1. The possibilistic rate-distortion functions R_ε(π, Δ) and R*_ε(π, Δ), 0 ≤ ε < 1, are given by

R_ε(π, Δ) = R_0(P, Δ),  R*_ε(π, Δ) = R*_0(P, Δ)

for whatever P such as to be ε-equivalent to π; more explicitly:

R_ε(π, Δ) = max_{X : π(a) ≤ ε ⇒ P_X(a)=0}  min_{XY : Ed(X,Y) ≤ Δ} I(X ∧ Y),

R*_ε(π, Δ) = max_{X : π(a) ≤ ε ⇒ P_X(a)=0}  min_{XY : d(X,Y) ≤ Δ} I(X ∧ Y).

Footnote 15. We recall that peak distortion can be taken back to coding with average distortion with a threshold equal to zero; this is true no matter whether the source is probabilistic or possibilistic. Actually, if one sets

d′(a, b) = 0 iff d(a, b) ≤ Δ, else d′(a, b) = d(a, b),

the inequality d*_n(x, y) ≤ Δ is clearly equivalent to the equality d′_n(x, y) = 0. So, coding at distortion level Δ with peak distortion is the same as coding at distortion level zero with average distortion, after replacing the old distortion measure d by the new distortion measure d′. The case of average distortion with Δ = 0 and the general case of peak distortion with any Δ ≥ 0 can both be couched into the inspiring mould of graph theory; then the rate-distortion function is rather called the hypergraph entropy, or the graph entropy in the special case when the two alphabets A and B coincide and when the distortion is Hamming distortion, as defined at the end of this appendix; cf. [4,20]. We recall that graph capacity and hypergraph entropy are the two basic functionals of the zero-error theory; both of them originated in a coding-theoretic context, but both have found unexpected and deep applications elsewhere; cf. [15].
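The equivalence-class formula discussed at the end of this appendix, R*_ε(π, Δ) = log |{E : π(E) > ε}|, is easy to evaluate; in the sketch below the toy alphabet and distortion matrix are hypothetical, logs are taken to base 2, and the possibility of a class is taken, as is usual for possibility measures, as the maximum over its members:

```python
from fractions import Fraction
from math import log2

# Toy primary alphabet with a hypothetical distortion matrix whose
# Delta-balls partition it into the classes {a,b} and {c,d} at Delta = 1.
letters = ["a", "b", "c", "d"]
dist = {("a", "b"): 1, ("c", "d"): 1}

def d(x, y):
    if x == y:
        return 0
    return dist.get((x, y), dist.get((y, x), 5))

pi = {"a": Fraction(1), "b": Fraction(2, 3),
      "c": Fraction(2, 3), "d": Fraction(1, 3)}

def classes(delta):
    """Partition induced by the equivalence d(x, y) <= delta."""
    out = []
    for x in letters:
        for cl in out:
            if d(x, cl[0]) <= delta:
                cl.append(x)
                break
        else:
            out.append([x])
    return out

def rate(eps, delta):
    """R*_eps(pi, delta) = log2 |{E : pi(E) > eps}|, pi(E) = max over E."""
    return log2(sum(1 for E in classes(delta) if max(pi[x] for x in E) > eps))

print(rate(Fraction(0), 1), rate(Fraction(2, 3), 1))   # -> 1.0 0.0
```

At ε = 0 both classes survive (1 bit per letter: one representative a_E per class); at ε = 2/3 only the class containing the fully possible letter survives, and the rate drops to zero.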


Observe that the possibilistic rate-distortion functions R_ε(π, Δ) and R*_ε(π, Δ) are both non-increasing step-functions of ε. Actually, if π_i < π_{i+1} are two consecutive entries in π, as in Proposition 4.1, the relation of ε-equivalence is always the same for whatever ε such that π_i ≤ ε < π_{i+1}. Unlike R_ε(π, Δ), R*_ε(π, Δ) is also a step-function of Δ. Actually, if the distinct entries of the matrix d are arranged in increasing order, and if d_i < d_{i+1} are two consecutive entries, the constraint (B.6) is the same for whatever Δ such that d_i ≤ Δ < d_{i+1}.

In some simple cases the minima and the maxima which appear in the expressions of the various rate-distortion functions can be made explicit; the reader is referred once more to [4]; the results given there are soon adapted to the possibilistic case. We shall just mention one such special case: the two alphabets coincide, A = B, the distortion matrix d is Hamming distortion, i.e., d(a, b) is equal to 0 or to 1 according to whether a = b or a ≠ b, respectively; Δ = 0. As a matter of fact, one soon realizes that this is just a different formulation of the problem of coding without distortion as in Section 4. A simple computation gives:

R_ε(π, 0) = R*_ε(π, 0) = log |{a : π(a) > ε}|,

in accordance with the expression of the possibilistic entropy given in Theorem 4.1. A slight generalization of this case is obtained for arbitrary Δ ≥ 0 when the inequality d(a, b) ≤ Δ is an equivalence relation which partitions the primary alphabet A into equivalence classes E. Then

R*_ε(π, Δ) = log |{E : π(E) > ε}|.

In practice, optimal codes are constructed by taking a letter a_E for each class E whose possibility exceeds ε; each primary letter in E is then reproduced by using precisely a_E. This way the asymptotic optimal rate R*_ε(π, Δ) is achieved already for n = 1, as in the case of coding without distortion. This is bad news, since it means that optimal code constructions are bound to be trivial; cf. footnote 11.

References

[1] M. Borelli, A. Sgarro, A possibilistic distance for sequences of equal and unequal length, in: C. Călude, Gh. Păun (Eds.), Finite Versus Infinite, Discrete Mathematics and Theoretical Computer Science, Springer, London, 2000, pp. 27–38.
[2] B. Bouchon-Meunier, G. Coletti, C. Marsala, Possibilistic conditional events, IPMU 2000, Madrid, July 3–7, 2000, Proceedings, pp. 1561–1566.
[3] Th.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[4] I. Csiszár, J. Körner, Information Theory, Academic Press, New York, 1981.
[5] G. De Cooman, Possibility theory, Internat. J. General Systems 25 (4) (1997) 291–371.
[6] D. Dubois, H.T. Nguyen, H. Prade, Possibility theory, probability and fuzzy sets: misunderstandings, bridges and gaps, in: D. Dubois, H. Prade (Eds.), Fundamentals of Fuzzy Sets, Kluwer Academic Publishers, Boston, 2000, pp. 343–438.
[7] D. Dubois, W. Ostasiewicz, H. Prade, Fuzzy sets: history and basic notions, in: D. Dubois, H. Prade (Eds.), Fundamentals of Fuzzy Sets, Kluwer Academic Publishers, Boston, 2000, pp. 21–290.
[8] D. Dubois, H. Prade, Properties of measures of information in evidence and possibility theories, Fuzzy Sets and Systems 24 (1987) 161–182.
[9] D. Dubois, H. Prade, Fuzzy sets in approximate reasoning: inference with possibility distributions, Fuzzy Sets and Systems 40 (1991) 143–202.
[10] F. Fabris, A. Sgarro, Possibilistic data transmission and fuzzy integral decoding, IPMU 2000, Madrid, July 3–7, 2000, Proceedings, pp. 1153–1158.
[11] E. Hisdal, Conditional possibilities, independence and non-interaction, Fuzzy Sets and Systems 1 (1978) 283–297.
[12] G.J. Klir, T.A. Folger, Fuzzy Sets, Uncertainty and Information, Prentice-Hall, London, 1988.
[13] G.J. Klir, M.J. Wierman, Uncertainty-Based Information: Elements of Generalized Information Theory, Physica-Verlag/Springer-Verlag, Heidelberg and New York, 1998.
[14] G.J. Klir, Measures of uncertainty and information, in: D. Dubois, H. Prade (Eds.), Fundamentals of Fuzzy Sets, Kluwer Academic Publishers, Boston, 2000, pp. 439–457.
[15] J. Körner, A. Orlitsky, Zero-error information theory, IEEE Trans. Inform. Theory 44 (6) (1998) 2207–2229.
[16] H.T. Nguyen, E.A. Walker, A First Course in Fuzzy Logic, 2nd Edition, Chapman & Hall, London, 2000.
[17] S. Ovchinnikov, An introduction to fuzzy relations, in: D. Dubois, H. Prade (Eds.), Fundamentals of Fuzzy Sets, Kluwer Academic Publishers, Boston, 2000, pp. 233–259.
[18] C.E. Shannon, A mathematical theory of communication, Bell System Technical J. 27 (3&4) (1948) 379–423, 623–656.
[19] C.E. Shannon, The zero-error capacity of a noisy channel, IRE Trans. Inform. Theory IT-2 (1956) 8–19.
[20] G. Simonyi, Graph entropy: a survey, in: W. Cook, L. Lovász, P. Seymour (Eds.), Combinatorial Optimization, DIMACS Series in Discrete Mathematics and Computer Science, vol. 20, AMS, Providence, RI, 1995, pp. 399–441.
[21] D. Salomon, Data Compression, Springer, New York, 1998.
[22] P. Walley, Statistical Reasoning with Imprecise Probabilities, Chapman & Hall, London, 1991.
[23] L. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1 (1978) 3–28.
