
Information theory

Not to be confused with information science.

Information theory studies the quantification, storage, and communication of information. It was originally proposed by Claude E. Shannon in 1948 to find fundamental limits on signal processing and communication operations such as data compression, in a landmark paper entitled "A Mathematical Theory of Communication". Now this theory has found applications in many other areas, including statistical inference, natural language processing, cryptography, neurobiology,[1] the evolution[2] and function[3] of molecular codes, model selection in ecology,[4] thermal physics,[5] quantum computing, linguistics, plagiarism detection,[6] pattern recognition, and anomaly detection.[7]

A key measure in information theory is "entropy". Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual information, channel capacity, error exponents, and relative entropy.

Applications of fundamental topics of information theory include lossless data compression (e.g. ZIP files), lossy data compression (e.g. MP3s and JPEGs), and channel coding (e.g. for Digital Subscriber Line (DSL)).

The field is at the intersection of mathematics, statistics, computer science, physics, neurobiology, and electrical engineering. Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields. Important sub-fields of information theory include source coding, channel coding, algorithmic complexity theory, algorithmic information theory, information-theoretic security, and measures of information.

1 Overview

Information theory studies the transmission, processing, utilization, and extraction of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was made concrete in 1948 by Claude Shannon in his paper "A Mathematical Theory of Communication", in which information is thought of as a set of possible messages, where the goal is to send these messages over a noisy channel, and then to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the noisy-channel coding theorem, showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent.[1]

Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of rubrics throughout the world over the past half century or more: adaptive systems, anticipatory systems, artificial intelligence, complex systems, complexity science, cybernetics, informatics, machine learning, along with systems sciences of many descriptions. Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of coding theory.

Coding theory is concerned with finding explicit methods, called codes, for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible. A third class of information theory codes are cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis. See the article ban (unit) for a historical application.

Information theory is also used in information retrieval, intelligence gathering, gambling, statistics, and even in musical composition.

2 Historical background

Main article: History of information theory

The landmark event that established the discipline of


information theory, and brought it to immediate worldwide attention, was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948.

Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability. Harry Nyquist's 1924 paper, Certain Factors Affecting Telegraph Speed, contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation W = K log m (recalling Boltzmann's constant), where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant. Ralph Hartley's 1928 paper, Transmission of Information, uses the word information as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as H = log S^n = n log S, where S was the number of possible symbols, and n the number of symbols in a transmission. The unit of information was therefore the decimal digit, much later renamed the hartley in his honour as a unit or scale or measure of information. Alan Turing in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war Enigma ciphers.

Much of the mathematics behind information theory with events of different probabilities were developed for the field of thermodynamics by Ludwig Boltzmann and J. Willard Gibbs. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer in the 1960s, are explored in Entropy in thermodynamics and information theory.

In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion that

    The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point.

With it came the ideas of

    the information entropy and redundancy of a source, and its relevance through the source coding theorem;

    the mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;

    the practical result of the Shannon–Hartley law for the channel capacity of a Gaussian channel; as well as

    the bit, a new way of seeing the most fundamental unit of information.

3 Quantities of information

Main article: Quantities of information

Information theory is based on probability theory and statistics. Information theory often concerns itself with measures of information of the distributions associated with random variables. Important quantities of information are entropy, a measure of information in a single random variable, and mutual information, a measure of information in common between two random variables. The former quantity is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. The latter is a property of the joint distribution of two random variables, and is the maximum rate of reliable communication across a noisy channel in the limit of long block lengths, when the channel statistics are determined by the joint distribution.

The choice of logarithmic base in the following formulae determines the unit of information entropy that is used. A common unit of information is the bit, based on the binary logarithm. Other units include the nat, which is based on the natural logarithm, and the hartley, which is based on the common logarithm.

In what follows, an expression of the form p log p is considered by convention to be equal to zero whenever p = 0. This is justified because lim_{p→0+} p log p = 0 for any logarithmic base.

3.1 Entropy of an information source

Based on the probability mass function of each source symbol to be communicated, the Shannon entropy H, in units of bits (per symbol), is given by

    H = −Σ_i p_i log2(p_i)

where p_i is the probability of occurrence of the i-th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the "shannon" in his honor. Entropy is also commonly computed using the natural logarithm (base e, where e is Euler's number), which produces a measurement of entropy in "nats" per symbol and
sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base 2^8 = 256 will produce a measurement in bytes per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or hartleys) per symbol.

Intuitively, the entropy H(X) of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X when only its distribution is known.

The entropy of a source that emits a sequence of N symbols that are independent and identically distributed (iid) is N·H bits (per message of N symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length N will be less than N·H.

[Figure: The entropy of a Bernoulli trial as a function of success probability Pr(X = 1), often called the binary entropy function, Hb(p). The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.]

Suppose one transmits 1000 bits (0s and 1s). If the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If X is the set of all messages {x1, ..., xn} that X could be, and p(x) is the probability of some x ∈ X, then the entropy, H, of X is defined:[8]

    H(X) = E_X[I(x)] = −Σ_{x∈X} p(x) log p(x).

(Here, I(x) is the self-information, which is the entropy contribution of an individual message, and E_X is the expected value.) A property of entropy is that it is maximized when all the messages in the message space are equiprobable p(x) = 1/n; i.e., most unpredictable, in which case H(X) = log n.

The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the shannon (Sh) as unit:

    Hb(p) = −p log2 p − (1 − p) log2(1 − p).

3.2 Joint entropy

The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing: (X, Y). This implies that if X and Y are independent, then their joint entropy is the sum of their individual entropies.

For example, if (X, Y) represents the position of a chess piece (X the row and Y the column), then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

    H(X, Y) = E_{X,Y}[−log p(x, y)] = −Σ_{x,y} p(x, y) log p(x, y)

Despite similar notation, joint entropy should not be confused with cross entropy.

3.3 Conditional entropy (equivocation)

The conditional entropy or conditional uncertainty of X given random variable Y (also called the equivocation of X about Y) is the average conditional entropy over Y:[9]

    H(X|Y) = E_Y[H(X|y)] = −Σ_{y∈Y} p(y) Σ_{x∈X} p(x|y) log p(x|y) = −Σ_{x,y} p(x, y) log p(x|y).

Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:

    H(X|Y) = H(X, Y) − H(Y).

3.4 Mutual information (transinformation)

Mutual information measures the amount of information that can be obtained about one random variable by observing another. It is important in communication

where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of X relative to Y is given by:

    I(X; Y) = E_{X,Y}[SI(x, y)] = Σ_{x,y} p(x, y) log [p(x, y) / (p(x) p(y))]

where SI (Specific mutual Information) is the pointwise mutual information.

A basic property of the mutual information is that

    I(X; Y) = H(X) − H(X|Y).

That is, knowing Y, we can save an average of I(X; Y) bits in encoding X compared to not knowing Y.

Mutual information is symmetric:

    I(X; Y) = I(Y; X) = H(X) + H(Y) − H(X, Y).

Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the posterior probability distribution of X given the value of Y and the prior distribution on X:

    I(X; Y) = E_{p(y)}[DKL(p(X|Y = y) || p(X))].

In other words, this is a measure of how much, on the average, the probability distribution on X will change if we are given the value of Y. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:

    I(X; Y) = DKL(p(X, Y) || p(X)p(Y)).

Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the multinomial distribution and to Pearson's χ² test: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.

3.5 Kullback–Leibler divergence (information gain)

The Kullback–Leibler divergence (or information divergence, information gain, or relative entropy) is a way of comparing two distributions: a "true" probability distribution p(X), and an arbitrary probability distribution q(X). If we compress data in a manner that assumes q(X) is the distribution underlying some data, when, in reality, p(X) is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined

    DKL(p(X) || q(X)) = Σ_{x∈X} −p(x) log q(x) − Σ_{x∈X} −p(x) log p(x) = Σ_{x∈X} p(x) log [p(x) / q(x)].

Although it is sometimes used as a 'distance metric', KL divergence is not a true metric since it is not symmetric and does not satisfy the triangle inequality (making it a semi-quasimetric).

Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x). If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X. The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.

3.6 Other quantities

Other important information theoretic quantities include Rényi entropy (a generalization of entropy), differential entropy (a generalization of quantities of information to continuous distributions), and the conditional mutual information.

4 Coding theory

Main article: Coding theory

[Figure: Scratches on the readable surface of a CD-R. Music and data CDs are coded using error correcting codes and thus can still be read even if they have minor scratches, using error detection and correction.]

Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.
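The quantities defined in section 3 can be computed directly from a joint distribution. A minimal sketch in Python (the joint probability table and all variable names here are illustrative assumptions, not taken from the article):

```python
import math

# Illustrative joint distribution p(x, y) over X in {0, 1} and Y in {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1,
         (1, 0): 0.1, (1, 1): 0.4}

def entropy(dist):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal distributions p(x) and p(y).
px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
py = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

h_x, h_y, h_xy = entropy(px), entropy(py), entropy(joint)

# Conditional entropy via H(X|Y) = H(X, Y) - H(Y).
h_x_given_y = h_xy - h_y

# Mutual information via I(X; Y) = H(X) - H(X|Y).
i_xy = h_x - h_x_given_y

# The same I(X; Y) as the KL divergence from the product of the
# marginals to the joint: DKL(p(X, Y) || p(X)p(Y)).
kl = sum(p * math.log2(p / (px[x] * py[y]))
         for (x, y), p in joint.items() if p > 0)

print(round(i_xy, 6), round(kl, 6))  # the two routes agree
```

The final comparison illustrates the identity I(X; Y) = DKL(p(X, Y) || p(X)p(Y)) stated above.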
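The claim that the information entropy of a source quantifies the bits needed to describe its data can be probed empirically. A sketch, assuming Python's general-purpose zlib compressor as a (suboptimal) stand-in for a good source code:

```python
import math
import random
import zlib

random.seed(0)

# A memoryless binary source with p(1) = 0.1; its entropy is
# Hb(0.1) = -0.1*log2(0.1) - 0.9*log2(0.9), about 0.469 bits per symbol.
p, n = 0.1, 100_000
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Draw n iid symbols and compress them losslessly.
data = bytes(random.random() < p for _ in range(n))
compressed_bits = 8 * len(zlib.compress(data, 9))

# The source coding theorem puts the average lossless limit at n*h bits;
# a generic compressor lands somewhere above that bound.
print(round(n * h), compressed_bits)
```

No lossless code can, on average, beat the n·H bound; the gap between the two printed numbers is the overhead of a general-purpose compressor relative to the theoretical limit.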

Data compression (source coding): There are two formulations for the compression problem:

1. lossless data compression: the data must be reconstructed exactly;

2. lossy data compression: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of information theory is called rate–distortion theory.

Error-correcting codes (channel coding): While data compression removes as much redundancy as possible, an error-correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.

This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems, which justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel), or more general networks, compression followed by transmission may no longer be optimal. Network information theory refers to these multi-agent communication models.

4.1 Source theory

Any process that generates successive messages can be considered a source of information. A memoryless source is one in which each message is an independent identically distributed random variable, whereas the properties of ergodicity and stationarity impose less restrictive constraints. All such sources are stochastic. These terms are well studied in their own right outside information theory.

4.1.1 Rate

Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is given by:

    r = lim_{n→∞} H(Xn | Xn−1, Xn−2, Xn−3, ...);

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

    r = lim_{n→∞} (1/n) H(X1, X2, ..., Xn);

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.[10]

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of source coding.

4.2 Channel capacity

Main article: Channel capacity

Communications over a channel, such as an ethernet cable, is the primary motivation of information theory. As anyone who's ever used a telephone (mobile or landline) knows, however, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality. How much information can one hope to communicate over a noisy (or otherwise imperfect) channel?

Consider the communications process over a discrete channel. A simple model of the process is shown below:

    Transmitter --x--> (noisy) channel --y--> Receiver

Here X represents the space of messages transmitted, and Y the space of messages received during a unit time over our channel. Let p(y|x) be the conditional probability distribution function of Y given X. We will consider p(y|x) to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by our choice of f(x), the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal, we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the channel capacity and is given by:

    C = max_f I(X; Y).

This capacity has the following property related to communicating at information rate R (where R is usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a code of

length N and rate R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with arbitrarily small block error.

Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.

4.2.1 Capacity of particular channel models

A continuous-time analog communications channel subject to Gaussian noise: see Shannon–Hartley theorem.

A binary symmetric channel (BSC) with crossover probability p is a binary input, binary output channel that flips the input bit with probability p. The BSC has a capacity of 1 − Hb(p) bits per channel use, where Hb is the binary entropy function to the base-2 logarithm:

[Figure: A binary symmetric channel. Input 0 is received as 0 with probability 1 − p and flipped to 1 with probability p; likewise, input 1 is received as 1 with probability 1 − p and flipped to 0 with probability p.]

A binary erasure channel (BEC) with erasure probability p is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is 1 − p bits per channel use.

[Figure: A binary erasure channel. Each input bit is received intact with probability 1 − p and replaced by the erasure symbol 'e' with probability p.]

5 Applications to other fields

5.1 Intelligence uses and secrecy applications

Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the ban, was used in the Ultra project, breaking the German Enigma machine code and hastening the end of World War II in Europe. Shannon himself defined an important concept now called the unicity distance. Based on the redundancy of the plaintext, it attempts to give a minimum amount of ciphertext necessary to ensure unique decipherability.

Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A brute force attack can break systems based on asymmetric key algorithms or on most commonly used methods of symmetric key algorithms (sometimes called secret key algorithms), such as block ciphers. The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.

Information theoretic security refers to methods such as the one-time pad that are not vulnerable to such brute force attacks. In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the key) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications. In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.

5.2 Pseudorandom number generation

Pseudorandom number generators are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed cryptographically secure pseudorandom number generators, but even they require random seeds external to the software to work as intended. These can be obtained via extractors, if done carefully. The measure of sufficient randomness in extractors is min-entropy, a value related to Shannon entropy through Rényi entropy; Rényi entropy is also used in evaluating randomness in cryptographic systems. Although related, the distinctions among these measures mean that a random variable with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.
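The closed-form capacities of the particular channel models in section 4.2.1 are easy to evaluate. A minimal sketch (function names are illustrative):

```python
import math

def binary_entropy(p):
    """Hb(p) = -p log2 p - (1 - p) log2 (1 - p), with 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Binary symmetric channel: C = 1 - Hb(p) bits per channel use."""
    return 1 - binary_entropy(p)

def bec_capacity(p):
    """Binary erasure channel: C = 1 - p bits per channel use."""
    return 1 - p

# A noiseless BSC (p = 0) carries a full bit per use; a BSC that flips
# each bit like a fair coin (p = 0.5) carries nothing; a BEC that erases
# half the bits retains half a bit per use.
print(bsc_capacity(0.0), round(bsc_capacity(0.5), 6), bec_capacity(0.5))
```

Note the symmetry Hb(p) = Hb(1 − p): a BSC with crossover probability 0.9 has the same capacity as one with 0.1, since the receiver can simply invert the output.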

5.3 Seismic exploration

One early commercial application of information theory was in the field of seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and digital signal processing offer a major improvement of resolution and image clarity over previous analog methods.[11]

5.4 Semiotics

Concepts from information theory such as redundancy and code control have been used by semioticians such as Umberto Eco and Ferruccio Rossi-Landi to explain ideology as a form of message transmission whereby a dominant social class emits its message by using signs that exhibit a high degree of redundancy such that only one message is decoded among a selection of competing ones.[12]

5.5 Miscellaneous applications

Information theory also has applications in gambling and investing, black holes, and bioinformatics.

6 See also

Algorithmic probability
Algorithmic information theory
Bayesian inference
Communication theory
Constructor theory - a generalization of information theory that includes quantum information
Inductive probability
Minimum message length
Minimum description length
List of important publications
Philosophy of information

6.1 Applications

Active networking
Entropy in thermodynamics and information theory
Gambling
Intelligence (information gathering)
Seismic exploration

6.2 History

Hartley, R.V.L.
History of information theory
Shannon, C.E.
Timeline of information theory
Yockey, H.P.

6.3 Theory

Coding theory
Detection theory
Estimation theory
Fisher information
Information algebra
Information asymmetry
Information field theory
Information geometry
Information theory and measure theory
Kolmogorov complexity
Logic of information
Network coding
Philosophy of Information
Quantum information science
Semiotic information theory
Source coding
Unsolved Problems


6.4 Concepts

Ban (unit)
Channel capacity
Channel (communications)
Communication source
Conditional entropy
Covert channel
Decoder
Differential entropy
Encoder
Information entropy
Joint entropy
Kullback–Leibler divergence
Mutual information
Pointwise mutual information (PMI)
Receiver (information theory)
Redundancy
Rényi entropy
Self-information
Unicity distance
Variety
Hamming distance

7 References

[1] F. Rieke; D. Warland; R. Ruyter van Steveninck; W. Bialek (1997). Spikes: Exploring the Neural Code. The MIT Press. ISBN 978-0262681087.

[2] cf. Huelsenbeck, J. P., F. Ronquist, R. Nielsen and J. P. Bollback (2001) Bayesian inference of phylogeny and its impact on evolutionary biology, Science 294:2310-2314

[3] Rando Allikmets, Wyeth W. Wasserman, Amy Hutchinson, Philip Smallwood, Jeremy Nathans, Peter K. Rogan, Thomas D. Schneider, Michael Dean (1998) Organization of the ABCR gene: analysis of promoter and splice junction sequences, Gene 215:1, 111-122

[4] Burnham, K. P. and Anderson D. R. (2002) Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Second Edition (Springer Science, New York) ISBN 978-0-387-95364-9.

[5] Jaynes, E. T. (1957) Information Theory and Statistical Mechanics, Phys. Rev. 106:620

[6] Charles H. Bennett, Ming Li, and Bin Ma (2003) Chain Letters and Evolutionary Histories, Scientific American 288:6, 76-81

[7] David R. Anderson (November 1, 2003). Some background on why people in the empirical sciences may want to better understand the information-theoretic methods (pdf). Retrieved 2010-06-23.

[8] Fazlollah M. Reza (1994) [1961]. An Introduction to Information Theory. Dover Publications, Inc., New York. ISBN 0-486-68210-2.

[9] Robert B. Ash (1990) [1965]. Information Theory. Dover Publications, Inc. ISBN 0-486-66521-6.

[10] Jerry D. Gibson (1998). Digital Compression for Multimedia: Principles and Standards. Morgan Kaufmann.

[11] The Corporation and Innovation, Haggerty, Patrick, Strategic Management Journal, Vol. 2, 97-118 (1981)

[12] Semiotics of Ideology, Noth, Winfried, Semiotica, Issue 148, (1981)

7.1 The classic work

Shannon, C.E. (1948), "A Mathematical Theory of Communication", Bell System Technical Journal, 27, pp. 379-423 & 623-656, July & October, 1948. Notes and other formats.

R.V.L. Hartley, "Transmission of Information", Bell System Technical Journal, July 1928

Andrey Kolmogorov (1968), "Three approaches to the quantitative definition of information", in International Journal of Computer Mathematics.

7.2 Other journal articles

J. L. Kelly, Jr., "A New Interpretation of Information Rate", Bell System Technical Journal, Vol. 35, July 1956, pp. 917-26.

R. Landauer, "Information is Physical", Proc. Workshop on Physics and Computation PhysComp'92 (IEEE Comp. Sci. Press, Los Alamitos, 1993) pp. 1-4.

R. Landauer, "Irreversibility and Heat Generation in the Computing Process", IBM J. Res. Develop. Vol. 5, No. 3, 1961

Timme, Nicholas; Alford, Wesley; Flecker, Benjamin; Beggs, John M. (2012). "Multivariate information measures: an experimentalist's perspective". arXiv:1111.6857 [cs.IT].

7.3 Textbooks on information theory

Arndt, C. Information Measures, Information and its Description in Science and Engineering (Springer Series: Signals and Communication Technology), 2004, ISBN 978-3-540-40855-0

Ash, RB. Information Theory. New York: Interscience, 1965. ISBN 0-470-03445-9. New York: Dover 1990. ISBN 0-486-66521-6

Gallager, R. Information Theory and Reliable Communication. New York: John Wiley and Sons, 1968. ISBN 0-471-29048-3

Goldman, S. Information Theory. New York: Prentice Hall, 1953. New York: Dover 1968 ISBN 0-486-62209-6, 2005 ISBN 0-486-44271-3

Cover, Thomas; Thomas, Joy A. (2006). Elements of Information Theory (2nd ed.). New York: Wiley-Interscience. ISBN 0-471-24195-4.

Csiszar, I, Korner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems, Akademiai Kiado: 2nd edition, 1997. ISBN 963-05-7440-3

MacKay, David J. C. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1

Mansuripur, M. Introduction to Information Theory. New York: Prentice Hall, 1987. ISBN 0-13-484668-0

McEliece, R. The Theory of Information and Coding. Cambridge, 2002. ISBN 978-0521831857

Pierce, JR. An Introduction to Information Theory: Symbols, Signals and Noise. Dover (2nd Edition). 1961 (reprinted by Dover 1980).

Reza, F. An Introduction to Information Theory. New York: McGraw-Hill 1961. New York: Dover 1994. ISBN 0-486-68210-2

Shannon, Claude; Weaver, Warren (1949). The Mathematical Theory of Communication (PDF). Urbana, Illinois: University of Illinois Press. ISBN 0-252-72548-4. LCCN 49-11922.

Stone, JV. Chapter 1 of book Information Theory: A Tutorial Introduction, University of Sheffield, England, 2014. ISBN 978-0956372857.

Yeung, RW. A First Course in Information Theory. Kluwer Academic/Plenum Publishers, 2002. ISBN 0-306-46791-7.

Yeung, RW. Information Theory and Network Coding. Springer 2008, 2002. ISBN 978-0-387-79233-0

7.4 Other books

Leon Brillouin, Science and Information Theory, Mineola, N.Y.: Dover, [1956, 1962] 2004. ISBN 0-486-43918-6

James Gleick, The Information: A History, a Theory, a Flood, New York: Pantheon, 2011. ISBN 978-0-375-42372-7

A. I. Khinchin, Mathematical Foundations of Information Theory, New York: Dover, 1957. ISBN 0-486-60434-9

H. S. Leff and A. F. Rex, Editors, Maxwell's Demon: Entropy, Information, Computing, Princeton University Press, Princeton, New Jersey (1990). ISBN 0-691-08727-X

Robert K. Logan. What is Information? - Propagating Organization in the Biosphere, the Symbolosphere, the Technosphere and the Econosphere, Toronto: DEMO Publishing.

Tom Siegfried, The Bit and the Pendulum, Wiley, 2000. ISBN 0-471-32174-5

Charles Seife, Decoding the Universe, Viking, 2006. ISBN 0-670-03441-X

Jeremy Campbell, Grammatical Man, Touchstone/Simon & Schuster, 1982, ISBN 0-671-44062-

Henri Theil, Economics and Information Theory, Rand McNally & Company - Chicago, 1967.

Escolano, Suau, Bonev, Information Theory in Computer Vision and Pattern Recognition, Springer, 2009. ISBN 978-1-84882-296-2

7.5 MOOC on information theory

Raymond W. Yeung, "Information Theory" (The Chinese University of Hong Kong)

8 External links

Erill I. (2012), "A gentle introduction to information content in transcription factor binding sites" (University of Maryland, Baltimore County)

Hazewinkel, Michiel, ed. (2001), "Information", Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4

Lambert F. L. (1999), "Shuffled Cards, Messy Desks, and Disorderly Dorm Rooms - Examples of Entropy Increase? Nonsense!", Journal of Chemical Education

Schneider T. D. (2014), "Information Theory"

Srinivasa, S., "A Review on Multivariate Mutual Information"

IEEE Information Theory Society and ITSoc review

9 Text and image sources, contributors, and licenses

9.3 Content license

Creative Commons Attribution-Share Alike 3.0