ENTROPY OPTIMIZATION PRINCIPLES AND THEIR APPLICATIONS
He laid down the following postulates for a measure of uncertainty of the distribution:
V. P. Singh and M. Fiorentino (eds.), Entropy and Energy Dissipation in Water Resources, 3-20.
© 1992 Kluwer Academic Publishers.
J. N. KAPUR AND H. K. KESAVAN
(i) The measure of uncertainty should be a continuous function H(p_1, p_2, ..., p_n) of the probabilities, i.e., the uncertainty should change only by a small amount if p_1, p_2, ..., p_n change by small amounts.
(ii) H(p_1, p_2, ..., p_n) should be a permutationally symmetric function of p_1, p_2, ..., p_n, i.e., it should not change when p_1, p_2, ..., p_n are permuted among themselves or when the outcomes are labelled differently.
(iii) H(p_1, p_2, ..., p_n) should be maximum when p_1 = p_2 = ... = p_n = 1/n, and this maximum value should be an increasing function of n.
(iv) H(p_1, p_2, ..., p_n) should follow the branching or recursivity principle, i.e.,

H(p_1, p_2, ..., p_{n-1}, p_n q_1, p_n q_2, ..., p_n q_m) = H(p_1, p_2, ..., p_n) + p_n H(q_1, q_2, ..., q_m)    (2)

where q_j ≥ 0, Σ_{j=1}^{m} q_j = 1. In his epoch-making paper, Shannon [42] proved that the only function which satisfies all these postulates is
function which satisfies all these postulates is
H(p_1, p_2, ..., p_n) = -K Σ_{i=1}^{n} p_i ln p_i    (3)
where K is an arbitrary positive constant. Shannon was undecided whether to call this function a measure of information or a measure of uncertainty, and so he approached his friend, the famous mathematician-physicist John von Neumann, who allegedly advised him to call it entropy for two reasons: "Firstly, you have got the same expression as is used for entropy in thermodynamics, and secondly, and even more importantly, since even after one hundred years nobody understands what entropy is, if you use the word entropy you will always win in an argument!" (Tribus [46]). Shannon took the suggestion, and thus
a measure of uncertainty came to be known as a measure of entropy, solely because this
measure had the same mathematical expression as the thermodynamic entropy. At that
time, no relationship with thermodynamic entropy was yet established. Such a relationship
was discovered later and we shall derive it in section 3. This relationship is established by
making use of Jaynes' [10] principle of maximum entropy, which we proceed to discuss in
the next section.
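For concreteness, Shannon's measure (3) is straightforward to compute directly; the following sketch (the function name and the default K = 1 are ours, not from the paper) also illustrates postulate (iii), that the uniform distribution attains the maximum value ln n:

```python
import math

def shannon_entropy(p, k=1.0):
    """Shannon's measure (3): H = -K * sum(p_i ln p_i), with 0 ln 0 = 0."""
    return -k * sum(pi * math.log(pi) for pi in p if pi > 0)

n = 4
uniform = [1.0 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]

print(shannon_entropy(uniform))   # ln 4 ≈ 1.3863, the maximum for n = 4
print(shannon_entropy(skewed))    # ≈ 0.9405, strictly smaller
```

A degenerate distribution (all mass on one outcome) gives H = 0, the minimum: there is no uncertainty at all.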
Let the only information available about the probability distribution P be given by

Σ_{i=1}^{n} p_i = 1,  p_i ≥ 0    (4)

and

Σ_{i=1}^{n} p_i g_r(x_i) = a_r,  r = 1, 2, ..., m    (5)
We have a system with partial information and there is some missing information.
Consequently, there is some uncertainty due to this missing information.
In order to make a choice among the distributions satisfying (4) and (5), Jaynes made
use of the following principles of ancient wisdom:
• Use all the information you are given, and scrupulously avoid using information not given to you; and
• Use all the given information, and be maximally uncommitted to, or maximally uncertain about, the missing information.
He therefore proposed that to determine p_1, p_2, ..., p_n we should maximize Shannon's measure of uncertainty (3), subject to (4) and (5) being satisfied.
If we use Lagrange's method of undetermined multipliers, we get

p_i = exp[-λ_0 - λ_1 g_1(x_i) - ... - λ_m g_m(x_i)],  i = 1, 2, ..., n    (6)

where λ_0, λ_1, ..., λ_m are determined by using (4) and (5).
Now by a fortunate circumstance, the function (3) happens to be a concave function
and (4) and (5) are linear constraints and so the local maximum of expression (3) will also
be its global maximum.
We call it a fortunate circumstance because Shannon did not design his measure to be a
concave function and, accordingly, this was not one of his postulates. However, his entropy
function turned out to be a concave function.
Another fortunate circumstance was that the probabilities given by (6) are always ≥ 0, so that there is no necessity for imposing the non-negativity constraints in (4), since these are automatically satisfied.
These two fortunate circumstances favoured the great success of the maximum entropy principle, since in all optimization problems great difficulties arise when we have to decide whether a local optimum is also the global optimum, whether non-negativity constraints will be satisfied, and whether a solution exists at all. By using Shannon's measure, these problems are automatically taken care of, at least for linear constraints.
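The concavity just described is easy to check numerically; the sketch below (our own illustration, not from the paper) verifies H(λP + (1-λ)Q) ≥ λH(P) + (1-λ)H(Q) on random distributions:

```python
import math
import random

def H(p):
    """Shannon entropy with K = 1 and the convention 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def random_dist(n, rng):
    """A random probability distribution on n outcomes."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(1)
for _ in range(1000):
    p, q = random_dist(5, rng), random_dist(5, rng)
    lam = rng.random()
    mix = [lam * a + (1 - lam) * b for a, b in zip(p, q)]
    # Concavity: the entropy of a mixture dominates the mixture of entropies.
    assert H(mix) >= lam * H(p) + (1 - lam) * H(q) - 1e-12
```

Because of this concavity, any stationary point found by the Lagrange method under linear constraints is automatically the global maximum.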
Consider a system of particles in which each particle can be in one of the n energy levels with energies ε_1, ε_2, ..., ε_n. Let p_i be the probability of a particle being in the ith energy level, so that the expected energy of the system is given by

Σ_{i=1}^{n} p_i ε_i = ε̄    (7)
Now suppose the only information we have about the system is the value of ε̄, i.e., we believe that the system is characterised by its expected energy ε̄, so that the only information we have about p_1, p_2, ..., p_n is given by (4) and (7). We have only two equations and n (> 2) quantities to be determined. There may be an infinity of probability distributions consistent with (4) and (7), and according to the maximum entropy principle we should choose that distribution which maximizes (3) subject to (4) and (7). This gives
p_i = exp(-μ ε_i) / Σ_{j=1}^{n} exp(-μ ε_j),  i = 1, 2, ..., n    (8)

where μ is determined by using

ε̄ = Σ_{i=1}^{n} ε_i exp(-μ ε_i) / Σ_{i=1}^{n} exp(-μ ε_i)    (9)
The probability distribution given by (8) and (9) is called the Maxwell-Boltzmann dis-
tribution.
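Since the right-hand side of (9) is a strictly decreasing function of μ (its derivative is the negative of the variance of the energies), μ can be found by simple bisection. A minimal sketch (function and variable names are ours):

```python
import math

def maxwell_boltzmann(energies, mean_energy, lo=-50.0, hi=50.0, tol=1e-12):
    """Solve (9) for mu by bisection, then return the p_i of (8).
    Requires min(energies) < mean_energy < max(energies)."""
    def expected(mu):
        w = [math.exp(-mu * e) for e in energies]
        z = sum(w)
        return sum(e * wi for e, wi in zip(energies, w)) / z
    # expected(mu) decreases in mu: larger mu favours the lower energy levels.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected(mid) > mean_energy:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    w = [math.exp(-mu * e) for e in energies]
    z = sum(w)
    return [wi / z for wi in w], mu

p, mu = maxwell_boltzmann([1.0, 2.0, 3.0], 1.5)
```

With a prescribed mean energy below the unweighted average of the levels, μ comes out positive and the probabilities decrease with energy, as expected for the Maxwell-Boltzmann distribution.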
So far μ is just a mathematical concept, a Lagrange multiplier, but (9) makes μ a function of ε̄. This gives

dε̄/dμ ≤ 0    (11)

since

dε̄/dμ = -[Σ_{i=1}^{n} p_i ε_i² - (Σ_{i=1}^{n} p_i ε_i)²]    (12)

is the negative of the variance of the energies; hence, unless all the ε_i are equal,

dε̄/dμ < 0,  dμ/dε̄ < 0    (13)
Since ε̄ = Σ_{i=1}^{n} p_i ε_i, we have

dε̄ = Σ_{i=1}^{n} p_i dε_i + Σ_{i=1}^{n} ε_i dp_i    (15)

This means that a change in ε̄ can be due either to changes in the individual energy levels or to changes in the proportions of particles in the different energy levels. Now the changes dε_1, dε_2, ..., dε_n can be brought about by doing work on the system. As such we write

Σ_{i=1}^{n} p_i dε_i = -ΔW    (16)

and call ΔW the work effect. We also write

Σ_{i=1}^{n} ε_i dp_i = ΔH    (17)

and call it the heat effect. From (15), (16) and (17),

dε̄ = -ΔW + ΔH    (18)

and, over a cycle,

∮(-ΔW + ΔH) = 0    (19)
Now there may be an infinity of possible probability distributions with the same ε̄. All of them will have different information-theoretic entropies, and the maximum entropy, obtained by substituting (8) into (3) with K = 1, will be given by

S_max = ln Σ_{i=1}^{n} exp(-μ ε_i) + μ ε̄    (20)

so that

dS_max = [Σ_{i=1}^{n} (-ε_i) exp(-μ ε_i) / Σ_{i=1}^{n} exp(-μ ε_i)] dμ + ε̄ dμ + μ dε̄ = μ dε̄

When the energy levels are held fixed, dε̄ = ΔH by (17), and on writing μ = 1/(kT),

dS_max = μ ΔH = ΔH / (kT)    (21)
Comparing this with the corresponding expression for thermodynamic entropy, we con-
clude that Smax corresponds to the thermodynamic entropy.
If there is no additional information about P1,P2, ... ,Pn, beyond the natural constraint, the
maximum value of entropy is In n which will be called the primary entropy. However, if
there is some information in the form of our knowing the values of one or more moments,
the maximum value of entropy will be accordingly reduced and this entropy will be called
the secondary entropy corresponding to the given constraints. This can be given a name
according to the context of the application. Thus, the entropy corresponding to the energy
constraint (7) is called the thermodynamic entropy. Similarly, if there is a constraint on
average cost in an economic system, the corresponding maximum entropy may be called
economodynamic entropy [32, 33].
Likewise, if both the expected energy and the expected number of particles are prescribed and the number of particles in any energy level can be any number from 0 to ∞, then the maximum entropy is called Bose-Einstein entropy. If instead the number of particles in any energy level can be only 0 or 1, it will be called Fermi-Dirac entropy.
Again, in the same way, if in an urban transportation problem, the number of persons
living in m residential colonies and the number of persons working in n offices are known
and if the expected travel cost is prescribed, then the maximum value of entropy may be
called transportation entropy or simply interactivity of the transportation system [4].
Let f(x) be the probability density function for a continuous random variate varying over a finite or infinite interval; then, by analogy with (3), its entropy is defined as

-∫ f(x) ln f(x) dx    (22)
This, however, cannot represent uncertainty in a strict sense, since it can be negative and is not invariant under coordinate transformations.
However the definition in (22) does not cause any serious problem when it is used
for implementing Jaynes' maximum entropy principle, since our object is not to find the
maximum value of (22), but to find the density function which makes (22) a maximum.
This can always be done.
Thus, let the range be (-∞, ∞) and let the only information available about f(x) be given by

∫_{-∞}^{∞} f(x) dx = 1,  ∫_{-∞}^{∞} x f(x) dx = m,  ∫_{-∞}^{∞} (x - m)² f(x) dx = σ²    (23)
i.e., we know only the mean and variance of the distribution. There can be an infinity of probability distributions with the same mean m and the same variance σ², and the distribution
which has the maximum entropy out of these is obtained by maximizing (22) subject to
(23) so that
f(x) = (1 / (√(2π) σ)) exp[-(x - m)² / (2σ²)]    (24)
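The maximality expressed by (24) can be sanity-checked against other families with the same variance, using closed forms for differential entropy (our own illustration; the Laplace comparison is not in the original):

```python
import math

def normal_entropy(sigma):
    """Differential entropy of N(m, sigma^2): 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def laplace_entropy(sigma):
    """Laplace with scale b has variance 2*b^2 and entropy 1 + ln(2b);
    choose b = sigma / sqrt(2) so its variance matches sigma^2."""
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)

# The normal wins at every scale, by the same constant margin in nats.
for sigma in (0.5, 1.0, 3.0):
    assert normal_entropy(sigma) > laplace_entropy(sigma)
```

Both entropies grow as ln σ, so the gap (about 0.072 nats) is independent of σ; only the maximizing family, not the scale, matters.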
Suppose that, on the basis of intuition, experience or theory, we have a feeling that the probability distribution should be q_1, q_2, ..., q_n. To confirm our guess, we take some observations or otherwise find some characterizing moments of the distribution.
There may be many probability distributions which may be consistent with the given
constraints. Out of those, we choose that one which is 'nearest' in some sense, to the given
'a priori' distribution.
The principle we use here is similar to the principle of maximum entropy. The first part
is the same viz., use all the information you are given. In the second part instead of saying
that we should be as uncertain about the missing information as possible, we say that we
should be as near to our intuition and experience as possible.
To implement this principle, we need a measure of 'distance' or 'discrepancy' or 'di-
vergence' of a distribution P from a given distribution Q. One such measure was given
by Kullback and Leibler [37], even before the principle of maximum entropy was explicitly
stated. The measure was

D(P : Q) = Σ_{i=1}^{n} p_i ln(p_i / q_i)    (25)
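In code, (25) is a few lines; the sketch below (names are ours) also checks two properties used later: D(P : Q) ≥ 0 with equality only when P = Q, and the lack of symmetry that makes it a directed divergence:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler measure (25): D(P:Q) = sum p_i ln(p_i / q_i).
    Convention: 0 ln 0 = 0; D is infinite if some q_i = 0 while p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.3, 0.2]
q = [1 / 3, 1 / 3, 1 / 3]
assert kl_divergence(p, p) == 0.0          # zero only at P = Q
assert kl_divergence(p, q) > 0             # non-negative
assert kl_divergence(p, q) != kl_divergence(q, p)  # directed, not symmetric
```

When Q is the uniform distribution, D(P : Q) reduces to ln n - H(P), so minimizing cross-entropy with a uniform prior is the same as maximizing Shannon entropy.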
These two principles have had tremendous applications in a large variety of fields. Kapur
[16] has surveyed developments during 1957-1982. Some thoughts on the scientific and
philosophical foundations of maximum-entropy principle were given in [24]. Other early
surveys were given in [17, 19, 21].
In [30], these principles were used to solve the following fourteen problems:
Problem 1: Given a system of particles with possible energy levels ε_1, ε_2, ..., ε_n and with average energy ε̄, obtain estimates for p_1, p_2, ..., p_n, where p_i is the probability of a particle being in the ith energy level.
Problem 2: If in Problem 1, the expected total number of particles in the system is also
known to be N, obtain estimates for the expected number of particles in each energy
level.
Problem 3: Given 'n' residential colonies, with costs c_1, c_2, ..., c_n of travel from these to the central business district, and being also given the expected average cost

C̄ = Σ_{i=1}^{n} p_i c_i,
estimate the proportions p_1, p_2, ..., p_n of the total population living in these colonies.
Problem 4: Given (i) the populations a_1, a_2, ..., a_m of 'm' residential colonies, (ii) the number of jobs b_1, b_2, ..., b_n in 'n' offices, (iii) the cost c_ij of travel from the i-th colony to the j-th office (i = 1, 2, ..., m; j = 1, 2, ..., n), and (iv) the travel budget

C̄ = Σ_{j=1}^{n} Σ_{i=1}^{m} c_ij T_ij,

obtain estimates for T_ij, the number of trips from the i-th colony to the j-th office.
Problem 5: Given a network of 'm' queues and the average sizes of the queues as a_1, a_2, ..., a_m, estimate p(n_1, n_2, ..., n_m), the probability of there being n_1, n_2, ..., n_m persons in the 'm' queues.
Problem 6: Given the average absorption ∫ f(x, y) dy per unit length in a slice of tissue, of a photon beam of length l sent through the slice, for a large number of such beams, estimate the coefficient f(x, y) at every point of the slice.
Problem 8: Given the voting shares of a number of political parties, estimate (i) the number of voters loyal to each party, each pair of parties, each triplet of parties and so on, and (ii) the probability of a voter switching from a given party to another.
Problem 9: Given (i) the number of beds in each ward of a hospital, (ii) the cost of each
bed in each ward, (iii) the total occupancy of the hospital, and (iv) the total revenue,
estimate the number of beds occupied in each ward.
Problem 10: Given that a continuous random variate varies from -∞ to ∞ and given that its mean is 'm' and variance is σ², estimate its probability density function.
Problem 11: Given (i) the ranges of values for each component of a multidimensional random variate, (ii) whether the variate is continuous or discrete, and (iii) some moments
of the distribution of the random variate, estimate the probability density function
for the random variate.
Problem 12: Given Σ_{j=1}^{n} a_ij x_j = c_i, i = 1, 2, ..., m, m ≤ n; Σ_{j=1}^{n} x_j = 1; x_1, x_2, ..., x_n ≥ 0; estimate x_1, x_2, ..., x_n.
Problem 13: Given a contingency table of any dimension, estimate dependence in it.
Problem 14: Given n random variables x_1, x_2, ..., x_n with x_1 + x_2 + ... + x_n = 1, each x_i ≥ 0, estimate the density functions for x_1, x_2, ..., x_n.
The first comprehensive book on Maximum Entropy Models in Science and Engineering
is [26]. It devotes four chapters to describing discrete univariate, continuous univariate, discrete multivariate and continuous multivariate maximum-entropy probability distributions.
In one chapter it obtains Maxwell-Boltzmann, Fermi-Dirac, Bose-Einstein and Intermedi-
ate Statistics distributions of statistical mechanics from MaxEnt. In another chapter, it
gives MaxEnt discussion of thermodynamics of closed and open or diffusive systems. One
chapter is devoted to maximum entropy models in regional and urban planning and this
discusses population distribution and transportation models and Fermi-Dirac and Bose-
Einstein distributions for residential location and trip distributions. Another chapter is
devoted to maximum-entropy models for brand-switching in marketing and vote-switching
in elections.
One chapter is devoted to obtaining information in contingency tables and another is
devoted to applications to statistical inference, non-parametric density estimation and other
applications in statistics.
One chapter is devoted to economics, finance, insurance and accountancy. It discusses
economic inequalities, optimum taxation policies, international trade models, stock market
models, loss of information on account of aggregation in accountancy, etc.
Another chapter deals with maximum-entropy principles in operations research. It
discusses MaxEnt in search theory, reliability theory, queueing theory, theory of games and
optimal portfolio analysis.
Three chapters deal with recent engineering applications of MaxEnt to spectral analysis,
image reconstruction and pattern recognition. These discuss comparison of MESA with
other methods of spectral analysis and multi-dimensional MESA, grey-level thresholding,
computerised tomography, Karhunen-Loève expansion and pattern recognition as a quest for minimum entropy [47].
The final chapter deals with MaxEnt in pharmacokinetics, epidemic models, ecology,
design of experiments, and logistic law of population growth.
The list of topics given above and the 640 references given in [26] give an idea of the all-pervading applications of the MaxEnt and MinxEnt principles.
Another set of about three hundred papers on MaxEnt and Bayesian methods has appeared in the proceedings of the ten MaxEnt conferences [3, 11, 43, 44, 45]. Besides a
discussion of theory of MaxEnt, MinxEnt and Bayesian statistics, these discuss applica-
tions to fields like magnetohydrodynamics, plasma physics, turbulence, condensed matter
physics, energy dissipation, random cellular structures, drug absorption, nuclear magnetic resonance, image reconstruction, cyclotron resonance, mass spectroscopy, underwater studies, magnetic resonance imaging, crystallography, chemical spectroscopy, time series, neural
networks, structural molecular biology, expert systems and information retrieval.
The generalised minimum cross-entropy principle requires that out of all probability distri-
butions satisfying given constraints, we should choose that probability distribution P which
is 'nearest' to a given a priori probability distribution Q.
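For a single moment constraint Σ p_i g_i = a (together with Σ p_i = 1), minimizing the Kullback-Leibler measure yields p_i proportional to q_i exp(-λ g_i), with λ fixed by the constraint. A sketch under those assumptions (names are ours; Q is taken strictly positive, and min g < a < max g):

```python
import math

def minxent(q, g, a, lo=-50.0, hi=50.0, tol=1e-12):
    """Minimize D(P:Q) subject to sum(p_i) = 1 and sum(p_i * g_i) = a.
    Solution form: p_i = q_i * exp(-lam * g_i) / Z; lam found by bisection."""
    def mean(lam):
        w = [qi * math.exp(-lam * gi) for qi, gi in zip(q, g)]
        z = sum(w)
        return sum(gi * wi for gi, wi in zip(g, w)) / z
    # mean(lam) is strictly decreasing in lam, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > a:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [qi * math.exp(-lam * gi) for qi, gi in zip(q, g)]
    z = sum(w)
    return [wi / z for wi in w]

q = [0.5, 0.3, 0.2]
g = [1.0, 2.0, 3.0]
p = minxent(q, g, 1.7)   # here a equals E_q[g], so P collapses back to Q
```

When the prescribed moment already equals the prior's moment, the constraint adds nothing and the nearest distribution is Q itself; moving a away from E_q[g] tilts P exponentially away from Q.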
Some of the generalised measures of cross-entropy or directed divergence which have been proposed are

(1/(α - 1)) [Σ_{i=1}^{n} p_i^α q_i^{1-α} - 1],  (Havrda and Charvát [9])    (31)

(1/(α - 1)) ln Σ_{i=1}^{n} p_i^α q_i^{1-α},  (Renyi [39])    (32)

Σ_{i=1}^{n} p_i ln(p_i/q_i) - (1/a) Σ_{i=1}^{n} (1 + a p_i) ln[(1 + a p_i)/(1 + a q_i)],  (Kapur [12, 18, 24])    (33)
Which of these or other proposed measures we choose depends on our requirements. For Euclidean distances, we have the requirements

(i) D(P : Q) ≥ 0  (non-negativity)    (34)

(ii) D(P : Q) = 0 if and only if P = Q    (35)

(iii) D(P : Q) = D(Q : P)  (symmetry)    (36)

(iv) D(P : R) ≤ D(P : Q) + D(Q : R)  (triangle inequality)    (37)
In our case, we do require (i) and (ii) always. We do not require (iii), since we have
always to measure distances of various distributions from a given a priori distribution Q or
from the uniform distribution. We require only one-way or directed distances or directed
divergences.
Nor do we require (iv), since the triangle inequality arises from considerations of geometrical distance in a Euclidean plane, whereas we are considering distances between probability distributions.
We do not mind if (iii) and (iv) are satisfied, but we do not insist on these, since such
an insistence will restrict the choice of discrepancy measures.
We can now impose some additional requirements
(v) D(P: Q) should be a convex function of P so that when D(P : Q) is minimized subject
to linear constraints, the local minimum will be the global minimum.
(vi) When D(P : Q) is minimized subject to linear constraints, the minimizing probabil-
ities should automatically come out to be non-negative, since otherwise we have to
explicitly impose non-negativity constraints and this causes computational problems.
Requirements (v) and (vi) arise out of the need to simplify mathematical computations, but they are also important for the practical implementation of the maximum entropy and minimum cross-entropy principles.
The Kullback-Leibler measure satisfies conditions (i), (ii), (v) and (vi).
There are other measures which also satisfy these and there should be no hesitation in
using these.
The advantage of using these measures over Kullback-Leibler measure arises because
each of these involves one or more parameters and these parameters can be chosen to give
better fit to given data.
Thus, while use of the Kullback-Leibler measure (or the corresponding Shannon measure) leads to only one model of population growth, viz. the exponential law, use of Kapur's measure leads to the logistic model, with one parameter corresponding to the carrying capacity of the environment [13, 27].
If we are given the moments, the problem of finding the MEPD will be called the direct
problem. The inverse problem is concerned with finding the characterising moments when
a given probability distribution is regarded as a MEPD.
Thus, suppose we find that the observed probability distribution is the normal distribution and we want to know which moments will characterise it as a MEPD. We find that if we are given two independent moments E(a_1 x + b_1 x²), E(a_2 x + b_2 x²), we will get the normal distribution.
The problem of existence and uniqueness of solutions of the inverse maximum entropy and minimum cross-entropy principles has been studied by us in [35].
Thus, if we know that the income distribution in a society is the Pareto distribution, we can show by using the inverse maximum entropy principle that the characterising moment is E(ln x), the logarithm of the geometric mean of income. Thus we know that in this society it is not the amount of income which matters, but rather its logarithm. In other words, the utility function in this society is logarithmic [29].
Similarly, if we find that the probability distribution for the intensity of earthquakes has three parameters, we can say that the earthquakes in that region are determined by three seismological characteristics of the region, and we can try to find a physical interpretation for these parameters.
Another very important result we get by using the inverse principles is that if, in a closed queueing network (for flexible manufacturing systems or computer systems), the probability distribution is of the product form, then the characterising information is the mean lengths of the queues [29, 32, 33].
Thus the inverse principles can enable us to find the probabilistic causes of given prob-
abilistic systems.
There may be an infinity of multivariate probability distributions of x_1, x_2, ..., x_m which have some given joint moments or marginal distributions. Out of these, this principle selects the distribution which gives minimum interdependence among the variates.
For this we need a measure of dependence among the variates. The correlation coefficient is
not useful because there are m( m -1) /2 correlation coefficients measuring linear dependence
and we need only one measure measuring all types of dependence. Such a measure is given
by (Watanabe [48])
D = S1 + S2 + ... + Sm - S, (38)
where S_i is the entropy of the probability distribution of the ith variate and S is the entropy of the joint distribution. The principle of minimum interdependence was first stated by Guiasu et al. [8].
This principle was further discussed in [20] and was used in Kapur [24] to solve an important problem in pattern recognition. The problem is to find the matrix A so that the components of Y = AX are as independent as possible, where X is a given normal variate. It was shown that

A = [U_1, U_2, ..., U_m]^T    (39)

where U_1, U_2, ..., U_m are the eigenvectors corresponding to the m largest eigenvalues of the correlation matrix of X.
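For two variates, (38) reduces to the mutual information of the joint distribution; a small check (our own illustration) that D vanishes exactly when the variates are independent:

```python
import math

def H(p):
    """Shannon entropy of a flat list of probabilities, 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def interdependence(joint):
    """Watanabe's measure (38) for a two-variate joint distribution
    given as a matrix: D = S1 + S2 - S."""
    s1 = H([sum(row) for row in joint])              # entropy of first marginal
    s2 = H([sum(col) for col in zip(*joint)])        # entropy of second marginal
    s = H([x for row in joint for x in row])         # entropy of the joint
    return s1 + s2 - s

independent = [[0.25, 0.25],
               [0.25, 0.25]]       # product of its marginals
dependent = [[0.4, 0.1],
             [0.1, 0.4]]           # same marginals, strong coupling

assert abs(interdependence(independent)) < 1e-12
assert interdependence(dependent) > 0
```

For m variates the same formula sums all m marginal entropies, and D captures all types of dependence at once, not merely the linear dependence measured by correlation coefficients.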
The maximum entropy probability distribution gives the most unbiased distribution satisfying the given constraints. In the same way, the minimum entropy probability distribution gives the most biased distribution satisfying the same constraints, and the true distribution will have an entropy between the maximum and the minimum entropies.
As more and more information becomes available, the maximum entropy will decrease
and the minimum entropy will increase till the two coincide. Any further increase in infor-
mation will not change the entropy and we will have got the maximum information about
the system.
This is the problem of pattern recognition, since knowing all the information, we can construct the pattern. In practice we can stop as soon as the maximum and minimum entropies are very close to each other.
Watanabe [47] described pattern recognition as a quest for minimum entropy.
The principle of minimum entropy is the dual of the principle of maximum entropy, but
it is more difficult to implement since it involves minimizing a concave function.
In the same way, principles of maximum cross-entropy and of maximum interdependence
will be duals of principles of minimum cross-entropy and minimum interdependence.
If we have a number of classes, entropy can be decomposed into entropy within classes and entropy between classes [32]. In general, we want to choose the classes in such a manner that the entropy within classes is maximum and the entropy between classes is minimum, so that each class is as homogeneous as possible and the different classes are as distinct as possible. This is an important requirement in cluster analysis, group technology, flexible manufacturing systems etc., and the principle of minimax entropy can enable us to achieve these goals. It requires maximization of one type of entropy and minimization of entropy of another type.
The dual of this principle will be the maximin entropy principle.
We will get two other principles by replacing entropy by cross-entropy.
For given constraints and for a given measure of entropy or cross-entropy, we can get a
corresponding maximum-entropy or minimum cross entropy probability distribution model.
For different measures of entropy, we should get different models. Conversely given a
mathematical model and given constraints, we can find a corresponding measure of entropy
which will lead to the given model.
This method has been used to generate measures of entropy from mathematical models of population growth, innovation diffusion, epidemics and chemical kinetics. Most of the measures so obtained are the same as those obtained from axiomatic considerations; however, a few new measures are also obtained.
This shows that in every situation, a suitable measure of entropy is being maximized
subject to suitable constraints.
Our object in scientific research is to find these appropriate measures of entropy and
the corresponding appropriate constraints. In many cases, Shannon's measure is the appro-
priate measure of entropy and in this case, the problem reduces to simply that of finding
appropriate constraints.
There are many persons who are satisfied merely with MaxEnt and its rich applications.
They want to avoid the philosophical, mathematical and computational complications that
may arise out of the other principles. But these new principles have tremendous possibilities
of providing great insights and of exploring much wider classes of phenomena. It is hoped
that some of the readers of this paper will come forward to explore the new possibilities
that have been revealed by these entropy optimization principles.
REFERENCES
1 R. Christensen (1981) Entropy Minimax Source Book, 1-4, Entropy Ltd., Lincoln, Mass.
3 G.J. Erickson and C.R. Smith (editors) (1988) Maximum Entropy and Bayesian Methods
in Science and Engineering, Vol. 1 (Foundations), Vol. 2 (Applications). Kluwer
Academic Publishers, New York.
4 S. Erlander (1978) Optimal Interaction and the Gravity Models, Springer-Verlag, New
York.
6 P.F. Fougère (ed.) (1990) Maximum Entropy and Bayesian Methods, Kluwer Academic Publishers, New York.
7 N. Georgescu-Roegen (1971) The Entropy Law and the Economic Process, Harvard University Press, Cambridge, Mass.
8 S. Guiasu, R. Leblanc and L. Reischer (1982) "On the principle of minimum interdependence", J. Inf. Opt. Sci., Vol. 3, pp. 149-172.
10 E.T. Jaynes (1957) "Information theory and statistical mechanics", Physical Review, Vol. 106, pp. 620-630.
11 J.H. Justice (editor) (1986) Maximum Entropy and Bayesian Methods in Applied Statis-
tics, Cambridge University Press, Boston.
13 J.N. Kapur (1983) "Derivation of logistic law of population growth from maximum
entropy principle", Nat. Acad. Sci. Letters, Vol. 6, No. 12, pp. 429-433.
14 J.N. Kapur (1983) "Comparative assessment of various measures of entropy", Jour. Inf.
and Opt. Sci. Vol. 4, No.1, pp. 207-232.
16 J.N. Kapur (1983) "Twenty-five years of maximum entropy", Jour. Math. Phy. Sci.
Vol. 17, No.2, pp. 103-156.
17 J.N. Kapur (1984) "The role of maximum entropy and minimum discrimination infor-
mation principles in statistics", Jour. Ind. Soc. Agri. Stat., Vol. 36, No.3, pp.
12-55.
18 J.N. Kapur (1984) "A comparative assessment of various measures of directed diver-
gences", Advances in Management studies, Vol. 3, pp. 1-16.
19 J.N. Kapur (1984) "On maximum entropy principle and its applications to science and
engineering", Proc. Nat. Symposium on Mathematical Modelling, MRI, Allahabad,
India, pp. 75-78.
20 J.N. Kapur (1984) "On minimum interdependence principle", Ind. Jour. Pure and App.
Math. 15(9), 968-977.
21 J.N. Kapur (1984) "Maximum entropy models in science and engineering", Proc. Nat.
Acad. Sciences, (Presidential Address, Physical Sciences Section), Annual number,
pp. 35-57.
22 J.N. Kapur, P.K. Sahoo and A.K.C. Wong (1985) "A new method of grey level picture
thresholding using entropy of the histogram", Computer vision, Graphics and Image
Processing, Vol. 29, pp. 273-28?
24 J.N. Kapur (1985) "Some thoughts on scientific and philosophical foundation of the
maximum entropy principle", Bull. Math. Ass. Ind., 17, pp. 15-40.
25 J.N. Kapur (1986) "Four families of measures of entropy", Ind. Jour. Pure and App.
Maths., Vol. 17, No.4, pp. 429-449.
26 J.N. Kapur (1990) Maximum-Entropy Models in Science and Engineering, John Wiley,
New York.
29 J.N. Kapur and H.K. Kesavan (1990) "Inverse MaxEnt or MinxEnt principles and their
applications", in Maximum Entropy and Bayesian Methods, edited by P.F. Fougère, pp. 433-450.
30 J.N. Kapur (1989) Maximum entropy principle, large-scale systems and cybernetics, In
Artificial Intelligence, Ed. by A. Ghoshal et al., South Asia Publishers, New Delhi.
31 H.K. Kesavan and J.N. Kapur (1989) "The generalised maximum entropy principle",
IEEE Trans. Syst. Man. Cyb., 19, pp. 1042-1052.
32 J.N. Kapur and H.K. Kesavan (1988) Generalised Maximum Entropy Principle (with
Applications), 225 pp., Sandford Educational Press, University of Waterloo.
33 J.N. Kapur and H.K. Kesavan (1991) Entropy Optimization Principles and their Appli-
cations. (book under publication).
34 J.N. Kapur and H.K. Kesavan (1989) "Generalised maximum entropy principle", Proc.
Int. Conf. Math. Mod., Vol. 2, lIT Madras, pp. 1-11.
35 H.K. Kesavan and J.N. Kapur (1990) "On the families of solutions of generalised maxi-
mum and minimum cross-entropy models", Int. Jour. Systems, Vol. 16, pp. 199-219.
36 J.N. Kapur and H.K. Kesavan (1990) "Maximum entropy and minimum cross entropy
principles: need for new perspectives", in Maximum Entropy and Bayesian Methods, edited by P.F. Fougère, Kluwer Academic Publishers, pp. 419-432.
37 S. Kullback and R.A. Leibler (1951) "On information and sufficiency", Ann. Math.
Stat., Vol. 22, pp. 79-86.
38 S. Kullback (1958) On Information Theory and Statistics, John Wiley, New York.
39 A. Renyi (1961) "On measures of entropy and information", Proc. 4th Berkeley Sym-
posium, Maths. Stat. Prob., Vol. 1, pp. 547-561.
41 A.K. Seth (1989) "Prof. J.N. Kapur's Views on Entropy Optimization Principles", Bull.
Math. Ass. Ind., Vol. 21, pp. 1-38, Vol. 22, 1-42.
42 C.E. Shannon (1948) "A mathematical theory of communication", Bell System, Tech.
J., Vol. 27, pp. 379-423, 623-659.
43 J. Skilling (Editor) (1989) Maximum Entropy and Bayesian Methods, Kluwer Academic
Publishers, New York.
44 C.R. Smith and W.T. Grandy, Jr. (eds) (1985) Maximum-Entropy and Bayesian Methods in Inverse Problems, D. Reidel, Dordrecht, Holland.
45 C.R. Smith and G.J. Erickson (eds) (1989) Maximum-Entropy and Bayesian Spectral Analysis and Estimation Problems, D. Reidel and Kluwer Academic Publishers, New York.