
The transmission sense of information

Carl T. Bergstrom∗ and Martin Rosvall†


Department of Biology, University of Washington, Seattle, WA 98195-1800

(Dated: June 15, 2009)

Biologists rely heavily on the language of information, coding, and transmission that is common-
place in the field of information theory developed by Claude Shannon, but there is open debate
about whether such language is anything more than facile metaphor. Philosophers of biology have
argued that when biologists talk about information in genes and in evolution, they are not talking
about the sort of information that Shannon’s theory addresses. First, philosophers have suggested
that Shannon’s theory is only useful for developing a shallow notion of correlation, the so-called
“causal sense” of information. Second, they typically argue that in genetics and evolutionary
biology, information language is used in a “semantic sense,” whereas semantics are deliberately
omitted from Shannon’s theory. Neither critique is well-founded. Here we propose an alternative
to the causal and semantic senses of information: a transmission sense of information, in which an
object X conveys information if the function of X is to reduce, by virtue of its sequence properties,
uncertainty on the part of an agent who observes X. The transmission sense not only captures
much of what biologists intend when they talk about information in genes, but also brings Shan-
non’s theory back to the fore. By taking the viewpoint of a communications engineer and focusing
on the decision problem of how information is to be packaged for transport, this approach resolves
several problems that have plagued the information concept in biology, and highlights a number
of important features of the way that information is encoded, stored, and transmitted as genetic
sequence.

I. INTRODUCTION

Biologists think in terms of information at every level of investigation. Signaling pathways transduce information, cells process information, animal signals convey information. Information flows in ecosystems, information is encoded in the DNA, information is carried by nerve impulses. In some domains the utility of the information concept goes unchallenged: when a brain scientist says that nerves transmit information, nobody balks. But when geneticists or evolutionary biologists use information language in their day-to-day work, a few biologists and many philosophers become anxious about whether this language can be justified as anything more than facile metaphor (Godfrey-Smith, 2000b, 2008; Griesemer, 2005; Griffiths, 2001; Sterelny, 2000; Sterelny and Griffiths, 1999).

Why do the neurobiologists get a free pass while evolutionary geneticists get called on the carpet? When neurobiologists talk about information, they have two things going for them. First, there is a straightforward analogy between electrical impulses in neural systems and the classic communications theory picture of source, channel, and receiver (Shannon, 1948). Second, information theory has obvious “legs” in neurobiology: for decades, neurobiologists have profitably used the theoretical apparatus of information theory to understand their study systems. Geneticists are not so fortunate. For them, the analogy to communication theory is less obvious. Efforts to make this analogy explicit seem forced at best, and the most successful uses of information-theoretic reasoning within the field of genetics rarely make explicit their information-theoretic foundations or make use of information-theoretic language (Crick et al., 1957; Felsenstein, 1971; Freeland and Hurst, 1998; Kimura, 1961).

As a consequence, philosophers have concluded that the mathematical theory of communication pioneered by Claude Shannon in 1948 (hereafter “Shannon theory”) is inadequate to ground the notion of information in genetics and evolutionary biology. First, philosophers have unfairly suggested that Shannon theory is only useful for developing a shallow notion of correlation, the so-called “causal sense” of information. Second, they typically argue that in genetics and evolutionary biology, information language is used in a “semantic sense” — and of course semantics are deliberately omitted from Shannon theory.

Neither critique is well-founded. In this paper we begin by summarizing the causal and semantic views of information. We then propose an alternative — a transmission sense of information — that not only captures much of what biologists intend when they talk about information in genes, but also brings Shannon theory back to the fore.

∗ Electronic address: cbergst@u.washington.edu; Also at Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501; URL: http://octavia.zoology.washington.edu/
† Electronic address: rosvall@u.washington.edu

Causal view of information

Inspired by Dretske (Dretske, 1983), several authors (Godfrey-Smith, 2008; Griffiths, 2001; Griffiths and Gray, 1994; Sterelny and Griffiths, 1999) have explored Shannon theory as a grounding for information language

in biology. They derive roughly the following picture: The key currency in information theory is the entropy H(X) = −Σx p(x) log2 p(x), which quantifies the uncertainty associated with a random variable X, and the key statistic is the mutual information I(X; Y) = H(X) − H(X|Y), which quantifies how much knowing one random variable Y reduces our uncertainty about another, X. Information is conveyed in Grice’s sense of natural meaning (Grice, 1957): whenever Y is correlated with X, we can say that Y carries information about X. There is no deep notion of meaning or coding here. “[W]hen a biologist introduces information in this sense to a description of gene action or other processes, she is not introducing some new and special kind of relation or property”, Godfrey-Smith writes, “She is just adopting a particular quantitative framework for describing ordinary correlations.” (Godfrey-Smith, 2008). In this causal sense of information, genes carry information about phenotypes just as smoke carries information about fire, nothing more. If biologists are using information only as a shorthand for talking about correlations, this is a shallow use of the information concept compared to what we see in engineering and communications theory!

Not only is this sense shallow, it fails to capture the directional flow of information from genotype to phenotype that is the central dogma of molecular biology (Crick, 1970). If, by “G has information about P,” we mean only that I(G; P) > 0, then we are not acknowledging the direction of information from genotype to phenotype (Godfrey-Smith, 2000b, 2008; Griffiths, 2001). The reason is that the mutual information I is by definition a symmetric quantity; I(G; P) = I(P; G). While mutual information is a key component of the deeper applications of Shannon theory that we will discuss later, for now let us consider it simply as a statistical quantity. Mathematically, the amount of information that knowing the genotype G provides about the phenotype P is always exactly equal to the amount of information that knowing the phenotype P provides about the genotype G (see Figure 1).

Figure 1 Information theory restricted to descriptive statistics for correlations. Information flows in both directions between genotype and phenotype, I(P; G) = I(G; P), and, according to the parity thesis, there is nothing that privileges genes over environment. In a semantic notion of genetic information, genes represent phenotypes but phenotypes do not represent genes.

Here it is helpful to consider an example. Suppose that there are nk genotypes in total and k distinct genotypes produce each one of the n possible phenotypes. Now it seems surprising that phenotype would tell us as much about genotype as genotype can tell us about phenotype. After all, knowing genotype predicts phenotype exactly, but knowing phenotype leaves us uncertain about which of k possible genotypes is responsible. The resolution to this puzzle is that mutual information measures how much an observation reduces our uncertainty, NOT how much residual uncertainty we face after making the observation. We can see this clearly from our example. If we observe the phenotype, this reduces the number of possible genotypes from a huge number (nk) to a much smaller number (k). If we observe genotype, this reduces the number of possible phenotypes from n to 1. Mutual information doesn’t measure the fact that there are k possibilities in one case and only 1 in the other; mutual information measures the fact that either way we have reduced our uncertainty n-fold.

Worse still, this notion of mutual information fails to capture the sense of privilege that biologists tend to ascribe to the informational molecule DNA over other contributors to phenotype. So far as causal covariance is concerned, both genes G and environment E influence phenotype P — and in principle we can equally well compute either I(G; P) or I(E; P). So it seems that Shannon theory has no way of singling out DNA as an information-bearing entity.

This criticism is formalized as the parity thesis, and is crafted around an important result in information theory that the roles of source and channel conditions are exchangeable (Griffiths and Gray, 1994). Typically when one sits in front of the television, the football broadcast is the signal. The weather, a crow landing on the television antenna, interference from a neighbor’s microwave — these are sources of noise, the channel conditions for our transmission. But a television repairman has an opposite view. He doesn’t want to watch the game, he wants to learn about what is altering the transmission from the station. So he tunes your set to a station broadcasting a test pattern. For the repairman, this test pattern provides channel conditions to read off the signal of how the transmission is being altered. As Sterelny and Griffiths point out (1999), “The sender/channel distinction is a fact about our interests, not a fact about the physical world.” The parity thesis applies this logic to genes and environment. In the parity view, whether it is genome or environment that carries information must be a fact about our interests, not a fact about the world.

The problem with these arguments is that they adopt a few tools from Shannon theory, but neglect its raison d’être: the underlying decision problem of how to package information for transport. Before delving deeper into Shannon theory, we will take a brief detour to summarize the semantic sense of information in biology.
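These quantities are easy to compute directly. The following sketch (ours, not from the original argument; the distribution and variable names are purely illustrative) builds the nk-genotype toy model above and confirms both the symmetry I(G; P) = I(P; G) and the n-fold uncertainty reduction, i.e., that the common value is log2 n bits.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy H = -sum p log2 p, in bits, for {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return entropy(px) + entropy(py) - entropy(joint)

# Toy model: n phenotypes, k genotypes per phenotype, all nk genotypes
# equally likely; genotype g deterministically yields phenotype g // k.
n, k = 4, 8
joint_gp = {(g, g // k): 1.0 / (n * k) for g in range(n * k)}
joint_pg = {(p, g): q for (g, p), q in joint_gp.items()}

i_gp = mutual_information(joint_gp)   # what genotype says about phenotype
i_pg = mutual_information(joint_pg)   # what phenotype says about genotype

# Both equal log2(n): uncertainty is reduced n-fold either way, even though
# k genotypes remain consistent with an observed phenotype while a genotype
# pins the phenotype down exactly.
```

The symmetry falls out of the identity I(X; Y) = H(X) + H(Y) − H(X, Y), which is manifestly unchanged when X and Y trade places.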

Semantic view of information

In addition to the limitations enumerated above, the causal view fails to highlight the intentional, representational nature of genes and other biological objects (Godfrey-Smith, 2008; Griffiths and Gray, 1994; Shea, 2007; Sterelny and Griffiths, 1999). When biologists talk about genes as informational molecules, this argument goes, it is not because they are correlated with other things (e.g. amino acid sequence or phenotype), but rather because they represent other things. This semantic sense of information in which “genes semantically specify their normal products” (Godfrey-Smith, 2007) cannot be captured using Shannon theory, which is by design silent on semantic matters.

But what is it that genes are supposed to represent? Much of the conventional language of molecular biology suggests phenotypes as an obvious candidate, and this is the approach that Maynard Smith takes in a target article that triggered much of the recent debate over the information concept in biology (Maynard Smith, 2000). At first glance, this view has several things to recommend it. A semantic notion of genetic information captures the directionality discussed above: genes represent phenotypes but phenotypes do not represent genes (see Figure 1). One could also try to argue that the semantic view privileges genes in that we can say that genes have a representational message about phenotype, but environment does not. Finally, it allows for misrepresentation or false representation, whereas causal information does not (Griffiths, 2001)1.

Does this mean that the problem is solved? No — Griffiths (2001) and Godfrey-Smith (2008) argue that semantic information remains vulnerable to the parity thesis. Moreover, Godfrey-Smith (1999, 2008) and Griffiths (2001) note that the reach of the semantic information concept within genetics is very shallow: legitimate talk of semantic representation can go no further than genes coding for amino acids. Beyond this point, the mapping from genotype forward is context-dependent and hopelessly entangled in a mass of causal interactions. Thus, these authors conclude, the relation from genes to phenotype cannot be a representational one. Accordingly, it seems as if the semantic view of information has been pushed as far as it will go, and yet we are left without a fully satisfactory account of the information concept in biology. Let us therefore return to Shannon’s information theory, but move beyond the causal sense.

1 We think that Stegmann handles the misrepresentation issue even more cleanly with a shallow semantic notion of genes as conveying instructional, as opposed to representational, content (Stegmann, 2004).

II. A TRANSMISSION VIEW OF INFORMATION

As we described above, philosophers of biology largely restrict Shannon theory to descriptive statistics for correlations. This misses the point. At the core of Shannon theory is the study of how far mathematical objects such as sequences and functions can be compressed without losing their identity, and if compressed further, how much their structure will be distorted. From this foundation in the limits of compression emerges a richly practical theory of coding: information theory is a decision theory of how to package information for transport, efficiently. It is a theory about the structure of those sequences that efficiently transmit information. And it is a theory about the fundamental limits with which that information can be transmitted (Cover and Thomas, 2006; Pierce, 1980; Shannon and Weaver, 1949; Yeung, 2002).

This decision-theoretic view of Shannon theory is missing from the discussions of information in biology. While Shannon theory is no panacea for geneticists and evolutionary biologists (Shannon, 1956), we will argue that it provides a justification for information language as applied to genes, and that it also resolves the apparent and unappealing symmetries of (1) mutual information between genes and phenotype, and (2) the parity thesis.

In the original formulation of Shannon theory, information is what an agent packages for transmission through a channel to reduce uncertainty on the part of a receiver. This information is physically instantiated and spatiotemporally bounded. Thus, as Lloyd and Penfield (2003) note, information can be sent either from one place to another, or from one time to another2. Usually when we talk about sending information from one place to another, we posit two separate actors, one of whom sends a message that the other receives; when we talk about sending information from one time to another, we posit a single agent who stores information that she herself can later retrieve. But whether the message goes across space or time, whether there are one or two agents involved, whether we use the language of signal transmission or the language of data storage, mathematically these are exactly the same process. Thus in practice, when you package information and then send it either across the space dimension as a signal or across the time dimension via storage and retrieval, you are transmitting information.

2 In practice, it takes time to send information from one place to another, but the conventional Shannon framework suppresses this time dimension.

Think about what happens when you send a message to your friend by burning a compact disc. Your computer encodes a message onto the digital medium. You send the medium through a channel (e.g. the postal service). Your friend, the receiver, puts the CD into her computer, the computer decodes the message, and she hears the sweet

strains of Rick Astley. But it doesn’t matter that you sent the disc through the mail – all of the mathematical operations that underlie the information encoding are the same whether you send the CD to a friend or save it for your own later use. To cross space or time, we can encode the same way. Indeed, we use the same error-correcting codes for storage and retrieval on CDs as we do for sending digital images from deep space back to Earth (Cipra, 1993).

Taking this view of information and transmission, let us return to the proposed schematic of biological information in Figure 1. This picture has neither a space dimension nor a time dimension; information is not being sent anywhere. Here we simply have a correlation (if one takes a causal view) or a translation (if one takes a semantic view). Thus within the actual Shannon framework, Figure 1 simply illustrates a decoder. Likewise, notice that the biological processes underlying the schematic in Figure 1 are not the processes that biologists refer to when they talk about transmission. In biology, transmission genetics is the study of inheritance systems, not the study of transcription and translation, and genetic transmission is the passing of genes from one generation to another, not the passing of information from genotype to phenotype.

In life and in evolution, the transmission of information goes from generation to generation to generation as in Figure 2. Here is the transmission; we know that genes are transmitted from parent to offspring in order to provide the offspring with information about how to make a living (e.g. metabolize sugars, create cell walls, etc.) in the world. This suggests that we can make sense of a large fraction of the use of information language in biology if we adopt a transmission view of information3.

Transmission view of information:
An object X conveys information if the function of X is to reduce, by virtue of its sequence properties, uncertainty on the part of an agent who observes X.

3 A message need not be composed of multiple characters to meet this definition. Even a string of length one is a sequence; thus even a single character conveys information.

Figure 2 In biology, genetic transmission occurs vertically (from parent to offspring to grandoffspring), with genome and environment jointly producing phenotype in ego, daughter, and granddaughter. It is upon this axis that the transmission sense of information focuses.

As with many aspects of science, the tools and language that we use have a strong influence on the questions that we think to ask — and once we shift to the transmission sense of information, our focus changes. When we view biological information as a semantic relationship, we are drawn to think like developmental biologists, about how information goes from an encoded form in the genotype to its expression in the phenotype. But when we talk about the transmission sense of information, we step out in an orthogonal direction (Figure 2) and we can now see information as it flows through the process of intergenerational genetic transmission4. And once we do that, we can start to think about natural selection, the evolutionary process, and how information gets into the genome in the first place (Szathmary and Maynard Smith, 1995).

4 Though see Shea (2007) for how a semantic view of information need not be incompatible with a focus on intergenerational processes such as evolution by natural selection.

Symmetry of mutual information

By viewing Shannon information as a result of a decision process instead of as a correlation measure, we can resolve the concern that Godfrey-Smith raises about the bidirectional flow of information in Shannon theory. Godfrey-Smith dismisses the causal sense of information because “information in the Shannon sense ‘flows’ in both directions, as it involves no more than learning about the state of one variable by attending to another” (Godfrey-Smith, 2008). Here Godfrey-Smith is referring to the symmetry of the mutual information measure: I(X; Y) = I(Y; X) (see Figure 1).

What is the mutual information actually measuring when we apply this equality in a communication context? An example helps. Peter and Paul are concerned about the state of the world W. Suppose that Peter observes a correlate X of the random variable W. We want him to communicate his observation to Paul, and he does so using a signal Y. The mutual information I(X; Y) tells us how effectively he conveys what he sees, on average, given the statistical distribution of possible X values, the properties of the channel across which the signal is sent, etc. Specifically, I(X; Y) = H(X) − H(X|Y) measures how much Paul learns by knowing Y about what Peter saw, X, again on average. Because I(X; Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X, Y) = H(Y) − H(Y|X) = I(Y; X), Peter knows exactly as much about what Paul learns as

Paul learns about what Peter saw. But I(Y; X) is usually irrelevant when we think about the decision problem of communicating. In this context we want Peter to get a message about the world to Paul, and we rarely care how much Peter knows afterwards about what Paul has learned.

This directionality is manifested within Shannon theory by the data processing inequality (Figure 3) (Cover and Thomas, 2006; Yeung, 2002). This theorem states that the act of processing data, deterministically or stochastically, can never generate additional information. A corollary pertains to communication: along a communication chain, information about the original source can be lost but never gained. In the scenario described above, both the observation step W → X and the communication step X → Y are steps in a Markov chain. For any Markov chain W → X → Y, the data processing inequality states that I(W; X) ≥ I(W; Y). In our example, the data processing inequality reveals that communication between Peter and Paul is not symmetric. Paul may know as much about what Peter sent as Peter knows about what Paul received, but Paul does not in general know as much about what matters — the state of the world — as Peter does. Shannon theory is not symmetric with respect to the direction of communication.

Figure 3 Despite the symmetry of the mutual information I(X; Y) = I(Y; X), the data processing inequality I(W; X) ≥ I(W; Y) along the Markov chain W → X → Y reveals the directional flow of information in Shannon’s scheme.

The parity thesis

In the first part of this paper, we described the parity thesis. While there is a good case to be made for parity between genes and environment when we restrict our view to the horizontal development of phenotype from genotype, that parity is shattered when we look along the vertical axis of intergenerational transmission.

Look at Figure 2, which corresponds to a neo-Darwinian view of evolution. In this model of the biological world, the transmission concept cleanly separates genes from environments. The former are transmitted across generations, the latter are not. Moreover, taking a teleofunctional view as Sterelny et al. (1996) do for replicators in general, the hypothesis that genes are for transmission across generations is richly supported by the physical structure of the DNA. Genes are made out of DNA, a molecule that is exquisitely fashioned so as to (1) encode lots of sequence information in a small space, (2) be incredibly easy to replicate, (3) be arbitrarily and infinitely extensible in what it can say, and (4) be structurally very stable and inert (Lewontin, 1992). In fact, DNA is perhaps the most impressive known substance with respect to (1) and (2). No machine can look at a protein and run off a copy; DNA is exquisitely adapted so that a relatively simple machine, the DNA polymerase, does this at great speed and with high fidelity. Think about how amazing it is that PCR works. It is as if you could throw a hard drive in a water bath with a few enzymes and a few raw materials, run the temperature through a few cycles, and pull out millions of identical hard drives. DNA practically screams, “I am for storage and transmission!”

One might object that Figure 2 conveys an oversimplified view of the world. This is true. A more sophisticated view of the evolutionary process allows for additional channels of intergenerational transmission and information flow: environments can be constructed and inherited (Odling-Smee et al., 2003). Non-genetic biological structures such as membranes and centrosomes are inherited (Griffiths and Knight, 1998) and can even be argued to carry some information (Griesemer, 2005). Methylation provides an extensive layer of markup on top of nucleic acid sequence. Developmental switches actively transduce environmental information into epigenetically heritable forms (Griffiths, 2001).

But such an objection misses our point. Our aim with the transmission sense of information is not to single out uniquely the genes as having some special property that we deny to all other biological structures, but rather to identify those components of biological systems that have information storage, transmission, and retrieval as their primary function. Methylation tags are obvious members of this information-bearing class: they carry information across generations in the transmission sense, and this appears to be their primary function. Extrinsic features of the environment such as ambient temperature are obviously not members of this class: they are not transmitted across generations, they carry information only in the causal sense, and information transmission is not their role under any reasonable teleofunctional explanation. Biological structures such as membranes and centrosomes may appear as some sort of middle ground, but we note that (1) their primary function is not an informational one, and (2) their bandwidth is extremely restricted compared to that of DNA sequence. Birds’ nests (Sterelny et al., 1996) could be seen as an environmental analogue to these intracellular structures, whereas libraries start to push toward genes and methylation tags in their informational capacity. Developmental switches (Griffiths, 2001) are another interesting case; these transduce environmental information but, in addition to their bandwidth limitations, they appear to have a more limited intergenerational transmission role. Genes may not be unique in their ability to convey information across generations — but at the same time a transmission view

makes it clear that not all components of the developmental matrix (Griffiths, 2001; Griffiths and Gray, 1994) enjoy parity in their informational capacities.

The parity thesis typically is linked to Shannon’s information theory via the claim that “The source/channel distinction is imposed on a natural causal system by the observer” (Griffiths 2001, p. 398). What is signal, and what is noise — Sterelny and Griffiths (1999) take this to be merely a reflection of our interests. So must we impose our own notions of what makes an appropriate reference frame in order to single out certain components of the developmental matrix as signal and others as noise? If we view life on earth as the outcome of evolution by natural selection, the answer is no. We are not the ones who pick the reference frame; natural selection is. Because natural selection operates on heritable variation, it acts upon some components of the developmental matrix differently than others5. For biologists, therefore, the source-channel distinction is imposed not by the observer but rather by the process of natural selection from which life arose and diversified.

5 Similarly, Shea (2007) uses this fact to derive teleosemantic meaning in his account of biological information.

Figure 4 Transmission and natural selection. With the parent sending variant messages to each offspring and natural selection acting on the phenotype, information can accumulate in the genome.

To better understand the role of natural selection, it helps to expand Figure 2 somewhat. (For simplicity we retain our focus on the genes as transmitted elements, but one could extend this to include other heritable structures.) In Figure 2, we highlight the fact that it is the genes, and not the environment, that are transmitted from generation to generation. In Figure 4, we highlight the fact that not all genes are transmitted to the next generation. It is by means of variation in the genes and selection on the phenotypes with which they are correlated that information can build up in the genome over time (Felsenstein, 1971).

Causal information versus transmitted information

One motivation for replacing causal-sense views of information with semantic-sense views is that the causal sense of information appears to cast too broad a net. Any physical system with correlations among its components carries causal information — but in their use of the information concept, biologists appear to mean something stronger than the notion of natural meaning that has smoke in the sky carrying information about a fire below (Godfrey-Smith, 2008). If we substitute a transmission view of information for a semantic view, will we be driven back to this overly broad notion of information? Not at all. Like naturalized views of semantics, the transmission notion of information rests upon function: to say that X carries information, we require that the function of X be to reduce uncertainty on the part of a receiver.

The failure to consider function when talking about information sometimes generates confusion among practicing biologists. After all, there are correlations everywhere in biological systems; measuring them is what we do as biologists, and we often talk about these correlations as information. This language is understandable; indeed, these correlations provide us with information about biological systems. But this is merely causal-sense information (Godfrey-Smith, 1999). As Godfrey-Smith explains, when a systematist uses gene sequences to make inferences about population history, “there is no more to this kind of information than correlation or ‘natural meaning’; the genes are not trying to tell us about their past” (Godfrey-Smith, 1999). In other words, these correlations do not convey teleosemantic information. Nor can these correlations be considered information in the transmission sense.

To expand upon this distinction, an example from population genetics is helpful. Voight and colleagues (Voight et al., 2006) developed a method for inferring positive selection at polymorphic loci in the human genome. Their key insight is that, with enough sequence data from sufficiently many members of the population, we can pick out regions of the genome that have unusually long haplotypes of low diversity. Such extended haplotype blocks tend to surround an allele that has recently risen in frequency due to strong selection, because there has not been enough time for recombination to break down the association between the favored allele and the genetic background in which it arose. Using this method, Voight and colleagues find strong evidence for recent selection among Europeans in the lactase gene LCT, which is important for metabolism of lactose beyond early childhood. The favored allele results from a single nucleotide change 14 kb upstream of the lactase gene on the nearly 1 Mb haplotype. Positive selection has presumably occurred because the ability to process lactose throughout life became advantageous with the invention of animal agriculture approximately 10,000 years ago.

What does this have to do with information? There
is information about the history of selection in the population-level correlations. Voight et al. found an extended haplotype length surrounding the LCT+ allele relative to that around the LCT- allele. But notice that we have to observe the genotypes of multiple individuals in order to determine that one allele at the LCT locus is surrounded by longer haplotype blocks than is the other. Once we have made observations of multiple genomes, we as external observers can conclude something about the history of selection on the population. But this information is not available at the level of a single individual. A single individual cannot look at its own genome and notice a longer (or shorter) haplotype block around any given focal locus — these haplotype blocks are defined with respect to the genotypes of others in the population. A single individual can only look at its own genome and see a sequence of base pairs. This sequence of base pairs is what is transmitted; it is what has the function of reducing uncertainty on the part of the agent who observes it6. These individual gene sequences are the entities that have an informational function in biology (though bioinformaticians have not always recognized this distinction (Adami, 2002)). The population-level statistics that geneticists use to infer history are informative, but they are not information in the transmission sense. They are merely the smoke that is cast off by the fire of natural selection.

Coding without appeal to semantics

The transmission sense of information allows us to separate claims about how information is transmitted from claims about what information means7. Indeed, we can study how information is transmitted without having any knowledge of the "codebook" for how to interpret the message, or even what the information represents.

In many biological studies, we are in exactly this position. Again an example — this time drawn from neurobiology — is helpful. In a study of the fly visual system, de Ruyter van Steveninck and colleagues presented flies with a moving grating as a visual stimulus, and made single-cell recordings of the spike train from the H1 visual neuron (de Ruyter van Steveninck et al., 1997). This neuron is sensitive to movement, but it is not known how movement information is encoded into the spike train, nor even what aspects of movement are being represented. Nonetheless, de Ruyter van Steveninck and colleagues were able to determine how much information this neuron is able to encode. The investigators exposed a fly to the stimulus, and measured the (average) entropy of the spike train. This is the so-called total entropy for the neuron's output. They then looked at what happens if you play the same stimulus back repeatedly: how much does the resultant spike train vary from previous trials? This is the so-called noise entropy. The information that the spike train carries about the stimulus, i.e., the mutual information between spike train and stimulus, is simply the difference between these two quantities. Using this approach, the authors were able to show that this single insect neuron conveys approximately 80 bits of information per second about a dynamic stimulus. Thus an individual visual neuron achieves a bit rate that is roughly 7 times the bit rate of a skilled touch typist8! More importantly, the researchers were able to compare the response of this neuron to static stimuli with the response of the neuron to natural patterns of motion. They found that, for natural patterns, the neuron is able to attain the high bandwidth that it does by "establishing precise temporal relations between individual action potentials and events in the sensory stimulus." By doing so, the neuron's response to the dynamic stimulus greatly surpasses the bit rate that could be obtained if the stimulus were encoded by a simple matching of spike rate to stimulus intensity. Subsequent investigators have used related methods to show that evolved sensory systems are tuned to natural stimuli, to study the properties of neural adaptation and history dependence, and to examine temporal sensitivity — all without knowing the way in which the signals that they study are actually encoded.

de Ruyter van Steveninck et al. sum up the power of being able to study information without appeal to semantics: "This characterization of ... information transmission is independent of any assumptions about [or knowledge of!] which features of the stimulus are being encoded or about which features of the spike train are most important in the code" (de Ruyter van Steveninck et al., 1997).

This brings us back to Shannon theory. When information theorists think about coding, they are not thinking about semantic properties. All of the semantic properties are stuffed into the codebook, the interface between source structure and channel structure, which to information theorists is as interesting as a phonebook is to

6 Although causal-sense information is transmitted from the population at time t to the population at time t+1 in the population frequencies of haplotypes, this is not transmission-sense information because the function of these population-level haplotype assemblages is not to reduce uncertainty on the part of future populations.
7 The source-channel separation theorem (Cover and Thomas 2006, Chapter 7, p. 218) proves that in any physical communication system for error-free transmission over a noisy channel, one can entirely decouple the process of tuning the code to the nature of the specific channel from not only the semantic reference of the signal but, indeed, from all statistics of the message source. This follows because the theorem states that one can achieve channel capacity with separate source and channel coders — and in this setup, the source coder can always be configured so as to return output that maximizes the entropy given the symbol set.
8 Using Shannon's 1950 upper bound on bits per letter and his estimate of letters per word in the English language (Shannon, 1950), we can estimate the bit rate of a touch typist as (120 words/minute) × (1 minute/60 seconds) × (4.5 letters/word) × (1.3 bits/letter) = 11.7 bits/second.
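The arithmetic of this direct method is simple enough to sketch in a few lines. The code below is our own toy reconstruction, not the authors' analysis: it assumes spike trains have already been discretized into symbolic "words" per time bin, and the function names are invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def mutual_information(trials):
    """Direct-method estimate: total entropy of the spike 'words' pooled over
    all trials and time bins, minus the mean across time bins of the entropy
    over repeated trials (the noise entropy)."""
    pooled = [word for trial in trials for word in trial]
    total_entropy = entropy(pooled)
    n_bins = len(trials[0])
    noise_entropy = sum(entropy([trial[t] for trial in trials])
                        for t in range(n_bins)) / n_bins
    return total_entropy - noise_entropy

# A perfectly repeatable response has zero noise entropy, so the information
# equals the total entropy: four equiprobable words carry log2(4) = 2 bits.
reliable = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
print(mutual_information(reliable))  # 2.0
```

Conversely, a response that varies freely across repeats of the same stimulus has noise entropy equal to its total entropy, and so carries no information about the stimulus.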
sociologists. When an information theorist says "Tell me how data stream A codes for message set B," she is not asking you to read her the codebook. She is asking you to tell her about compression, channel capacity, distortion structure, redundancy, and so forth. That is, she wants to know how the structure of the code reflects the statistical properties of the data source and the channel with respect to the decision problem of effectively packaging information for transport.

With these things in focus, we can now look at the concept of arbitrariness, what it means, and why this concept is critically important in biological coding.

III. INFORMATION THEORY AND ARBITRARINESS

In arguing that DNA is an informational molecule, Maynard Smith (Maynard Smith, 2000) appeals to Jacques Monod's concept of gratuité (Monod, 1971), and a number of additional authors have further explored this thread (Godfrey-Smith, 2000a,b; Stegmann, 2005). For Monod, gratuité was an important component of the logical structure of his theory of gene regulation. Gratuité is the notion that, in principle, regulatory proteins can cause any inducer or repressor to influence the expression of any region of DNA. There need be no direct chemical relation between the structure of an inducer and the nucleic acid sequence on which it operates. Maynard Smith observes that we can see something like gratuité in the structure of the genetic code as well: there is an arbitrary association between codons and the amino acids that they specify.

Yet as they grapple with this idea of an arbitrary code, these authors confront the fact that the genetic code is not a random assignment of codons to amino acids, but rather a one-in-a-million evolved schema for associating these molecules: the genetic code is structured so as to smooth the mutational landscape (Sonneborn, 1965) and ensure that common translational errors generate amino acid replacements between chemically similar amino acids (Freeland and Hurst, 1998; Haig and Hurst, 1991; Woese, 1965). So what can these authors mean when they say that the code is arbitrary? Maynard Smith, Godfrey-Smith, and others are not suggesting that the structure of the code is random or contingent on random historical processes, as in Crick's frozen accident hypothesis. Rather, they are making a semiotic claim. Arbitrariness refers to the fact that "[m]olecular symbols in biology are symbolic," as opposed to indexical or iconic (Maynard Smith, 2000). In the case of the genetic code, this means that the association between a codon and its corresponding amino acid is not driven by the immediate steric interactions between the codon and the amino acid, but instead is mediated by an extensive tRNA structure that in principle could have coupled this codon to any other amino acid instead.

This fact is enormously important to the function of the biological code — not as a matter of the semiotic classifications that fascinated Charles Peirce, but rather to solve the sort of decision problem that motivated Claude Shannon. From the symbolic relation between code and product, there arise the degrees of freedom that a communication engineer requires to tune the code to the statistical properties of source and channel. To see how this works we will visit an example from the early history of telecommunications.

For over one hundred years, Morse code was the standard protocol for telegraph and radio communication. The code transcribes the English alphabet into codewords composed of short pulses called dots and long pulses called dashes. For example, the letter E is represented by a single dot ".", the letter T is represented by a single dash "-", the letter Q by the quartet "--.-", and the letter J by ".---". At first glance, the mapping between letters and Morse codewords appears to be arbitrary. They are certainly symbolic rather than iconic or indexical. But there is an important pattern to the way that letters are assigned codes in Morse code.

Instead of assigning codewords sequentially ("." to A, "-" to B, ".." to C), Samuel Morse exploited the degrees of freedom available for codeword assignments to make an efficient code for fast transmission of English sentences. He could not assign short code words to every single letter — there simply are not enough short code words to go around. Instead, by assigning the shortest code words to the most commonly used letters, Morse created a code in which transmissions would on average be shorter than if he had used sequential codeword assignments.

Morse could not have done this with pictograms. The leeway to associate any message with any code word provides the communications engineer with the degrees of freedom that he needs to tune the semantic and statistical properties of the source messages to the transmission cost and error properties of the channel. In the absence of a full picture of the engineer's decision problem, the code might look arbitrary. But a well-chosen code is not arbitrary at all; it solves a decision problem for packaging.

Morse exploited the available degrees of freedom in his choice of codes; apparently, natural selection has done the same in evolving a one-in-a-million genetic code. Thinking about these degrees of freedom — along with an important thought experiment — helped us to understand the role of coding in the transmission sense of information. Godfrey-Smith (2000b) imagined a hypothetical world composed of "protein genes" as a way to explore the importance of the coding concept in biology. Since the idea of coding presumably refers to the fact that DNA provides an arbitrary combinatorial representation of amino acid sequence, Godfrey-Smith considers the nature of a world in which the hereditary material was neither arbitrary nor representational. He imagines a world of "protein-genes" in which there is no translation, but instead a copying system in which amino acid sequences, assisted by coupling molecules, replicate using previous amino acid sequences as templates. In that world, Godfrey-Smith argues, there is nothing that corresponds to coding, and yet stepping away from the microscope, biology functions much as before. In Godfrey-Smith's protein-sample world, there is no code, no compression, no redundancy, and information theory can be applied merely as descriptive statistics for correlations. One could even go so far as to argue that in a protein-genes world, physical structure and not information is inherited across generations. Thus, Godfrey-Smith concludes that "Removing genetic coding from the world need not change much else, and this gives support to my claim that we should only think of coding as part of an explanation of how cells achieve the specific task of putting amino acids in the right order," rather than something fundamental to the logical structure of biology.

But there is a critical difference between a world with a proper genetic code and a world based upon protein-genes: only the former allows an arbitrary combinatorial mapping between templates and products. From an information-theoretic perspective, this is absolutely critical. The degrees of freedom to construct an arbitrary mapping of this sort turn the problem of code evolution from one of passing physical samples to the next generation into a decision-theoretic problem of how to package information for transport. In the protein-genes world, the fidelity of transmission is at the mercy of the biochemical technology for copying. There are no degrees of freedom for structuring redundancy, minimizing distortion, or conducting the other optimization activities of a communications engineer. In a DNA-based code, the chemically arbitrary assignments of nucleotide triplets to amino acids via tRNAs offer the degrees of freedom to do all of the above. While the precise dynamics of code evolution remain unknown, it appears that natural selection has put these degrees of freedom to good use.

For example, during the early evolution of the genetic code when the translation mechanism was highly inaccurate, there was not only selection to minimize the effect of misreads (Woese, 1965), but also selection pressure to minimize the effect of frame-shift errors. Frame-shift errors waste resources by generating potentially toxic nonsense polypeptides. There is therefore a fitness advantage to codes that quickly terminate after a frame shift by reaching a stop codon (Seligmann and Pollock, 2004). Itzkovitz and Alon (2007) have shown that of the 1152 alternative codes that are equivalent to the real code with respect to translational misreads (the real code with independent permutations of the nucleotides and wobble-base constraint), only 8 codes encounter a stop codon earlier after a frame-shift than does the real code. Because in the real code stop codons overlap with common codons, nonsense peptides are on average 8 codons shorter than in alternative codes.

Even with the genetic code fixed in its present form, natural selection can still make use of the degrees of freedom that are available when selecting which of several synonymous codons to use. For example, there is strong positive correlation both between synonymous codon bias and gene expression level and between tRNA abundance and codon usage in S. cerevisiae and E. coli (Bennetzen and Hall, 1982; Ikemura, 1981; Sharp and Li, 1987). While the underlying mechanisms for the correlations are not fully understood, simple models of mutation and selection can explain much of the observed variance (Bulmer, 1987; Knight et al., 2001).

As another intriguing example, Mitarai et al. (2008) have found that the choice of synonymous codons along the length of a gene operates to prevent "traffic congestion" by ribosomes moving down the mRNA, potentially increasing expression rates and minimizing incomplete translation of proteins. This works as follows. Some codons are translated faster than others. For instance, Sørensen and Pedersen (1991) have shown that the difference in translation rates between the two synonymous glutamate codons GAA and GAG is threefold. Just as the relative positions of fast and slow regions of highway influence the rate at which traffic can travel and the amount of traffic congestion that occurs, the relative positions of rapidly and slowly translated codons will influence the rate at which ribosomes can move along an mRNA and the amount of congestion that they experience. If rapidly translated codons were to occur early in a gene and slowly translated ones were to occur late, numerous ribosomes could load on the mRNA, race through the early part of the gene, but then back up against one another when the slower codons were reached toward the gene's end. At best this would cause congestion and slowdown; at worst, incomplete translation as the blocked ribosomes disengage from the mRNA. If instead slowly translated codons occur early and rapidly translated ones occur late, more ribosomes can pass along the mRNA in a given time window, and less congestion arises. Looking at highly expressed genes in E. coli, Mitarai et al. (2008) have found evidence of exactly this pattern; there appears to have been selection for using slower synonymous codons in the beginning of these highly expressed genes.

The difference between a protein-genes world and a DNA-based world became clear by taking the perspective of a communications engineer. Throughout the paper, this has been our approach. Correlations, symmetry of mutual information, the parity thesis, arbitrariness, coding — all of these come into focus from a communications viewpoint. We see what makes the genetic code a code, and we get a new perspective on the information language that is part of the everyday working vocabulary of researchers in genetics and evolutionary biology. The transmission sense of information justifies such language as more than shallow metaphor.

References

Adami, C., 2002, BioEssays 24, 1085.
Bennetzen, J., and B. Hall, 1982, Journal of Biological Chemistry 257(6), 3026.
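The saving from Morse's frequency-matched assignment can be seen in a toy calculation. The letter frequencies and the pool of available codeword lengths below are illustrative numbers of our own choosing, not Morse's actual design data:

```python
# Approximate relative frequencies of six common English letters.
freq = {"E": 0.127, "T": 0.091, "A": 0.082, "O": 0.075, "I": 0.070, "N": 0.067}
# Pool of available codeword lengths: two one-pulse codes, four two-pulse codes.
lengths = [1, 1, 2, 2, 2, 2]

def average_length(order):
    """Expected pulses per letter when letters in `order` receive the
    available codeword lengths in order (shortest first)."""
    total = sum(freq.values())
    return sum(freq[letter] * n for letter, n in zip(order, lengths)) / total

by_frequency = sorted(freq, key=freq.get, reverse=True)  # E, T, A, O, I, N
alphabetical = sorted(freq)                              # A, E, I, N, O, T

# Giving the shortest codewords to the most frequent letters shortens the
# average transmission relative to a sequential (alphabetical) assignment.
print(average_length(by_frequency) < average_length(alphabetical))  # True
```

The same comparison scales up to the full alphabet: any reshuffling that moves a short codeword from a frequent letter to a rare one can only lengthen the average transmission.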
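The logic of the frame-shift argument is easy to make concrete. The sketch below is our own illustration, not Itzkovitz and Alon's code: it counts how many codons a ribosome would translate in a shifted reading frame before encountering a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons_until_stop(seq, shift=1):
    """Number of codons read, starting `shift` bases into `seq`,
    before the first stop codon is reached."""
    count = 0
    for i in range(shift, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count  # ran off the end without hitting a stop codon

# In this toy sequence, a +1 frame shift reads AAC, then GGT, then hits TAA:
# two codons of nonsense peptide before termination.
print(codons_until_stop("AAACGGTTAA", shift=1))  # 2
```

Itzkovitz and Alon's comparison amounts to computing such counts, averaged over realistic sequences, under the real code and under each of the permuted alternative codes.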
Bulmer, M., 1987.
Cipra, B. A., 1993, SIAM News 26(1).
Cover, T. M., and J. A. Thomas, 2006, Elements of Information Theory (John Wiley and Sons, New York), second edition.
Crick, F. H. C., 1970, Nature 227, 561.
Crick, F. H. C., J. S. Griffith, and L. E. Orgel, 1957, Proceedings of the National Academy of Sciences USA 43, 416.
de Ruyter van Steveninck, R. R., G. D. Lewen, S. P. Strong, R. Koberle, and W. Bialek, 1997, Science 275, 1805.
Dretske, F. I., 1983, Knowledge and the Flow of Information (MIT Press).
Felsenstein, J., 1971, American Naturalist 105, 1.
Freeland, S. J., and L. D. Hurst, 1998, Journal of Molecular Evolution 47, 238.
Godfrey-Smith, P., 1999, in Where Biology Meets Psychology: Philosophical Essays, edited by V. Hardcastle (MIT Press, Cambridge), pp. 305–331.
Godfrey-Smith, P., 2000a, Philosophy of Science 67(2), 202.
Godfrey-Smith, P., 2000b, Philosophy of Science 67, 26.
Godfrey-Smith, P., 2007, in Stanford Encyclopedia of Philosophy, edited by E. N. Zalta (Center for the Study of Language and Information, Stanford University).
Godfrey-Smith, P., 2008, Information in Biology, in D. L. Hull and M. Ruse (eds.), The Philosophy of Biology (Cambridge University Press, Cambridge), chapter 6, pp. 103–119.
Grice, H. P., 1957, Philosophical Review 66, 377.
Griesemer, J. R., 2005, The Informational Gene and the Substantial Body: On the Generalization of Evolutionary Theory by Abstraction, in M. R. Jones and N. Cartwright (eds.), Idealization XII: Correcting the Model, Idealization and Abstraction in the Sciences (Rodopi, Amsterdam), pp. 59–115.
Griffiths, P. E., 2001, Philosophy of Science 68, 394.
Griffiths, P. E., and R. D. Gray, 1994, Journal of Philosophy XCI, 277.
Griffiths, P. E., and R. D. Knight, 1998, Philosophy of Science 65, 276.
Haig, D., and L. D. Hurst, 1991, Journal of Molecular Evolution, 412.
Ikemura, T., 1981, Journal of Molecular Biology 151(3), 389.
Itzkovitz, S., and U. Alon, 2007, Genome Research 17(4), 405.
Kimura, M., 1961, Genetical Research 2, 127.
Knight, R., S. Freeland, and L. Landweber, 2001, Genome Biology 2(4).
Lewontin, R. C., 1992, New York Review of Books 39(10).
Lloyd, S., and P. Penfield, 2003, Information and Entropy, MIT OpenCourseWare.
Maynard Smith, J., 2000, Philosophy of Science 67(2), 177.
Mitarai, N., K. Sneppen, and S. Pedersen, 2008, Journal of Molecular Biology.
Monod, J., 1971, Chance and Necessity (Collins, London).
Odling-Smee, F. J., K. N. Laland, and M. W. Feldman, 2003, Niche Construction (Princeton University Press).
Pierce, J. R., 1980, An Introduction to Information Theory (Dover).
Seligmann, H., and D. Pollock, 2004, DNA and Cell Biology 23(10), 701.
Shannon, C. E., 1948, A Mathematical Theory of Communication, Bell System Technical Journal 27.
Shannon, C. E., 1950, Bell System Technical Journal 30, 50.
Shannon, C. E., 1956, IEEE Transactions on Information Theory 2.
Shannon, C. E., and W. Weaver, 1949, The Mathematical Theory of Communication (University of Illinois Press, Urbana).
Sharp, P., and W. Li, 1987, Nucleic Acids Research 15(3), 1281.
Shea, N., 2007, Biology and Philosophy 22, 313.
Sonneborn, T. M., 1965, Degeneracy of the genetic code: extent, nature and genetic implications, in V. Bryson and H. J. Vogel (eds.), Evolving Genes and Proteins (Academic Press, Inc., New York, N.Y.), pp. 377–397.
Sørensen, M., and S. Pedersen, 1991, Journal of Molecular Biology 222(2), 265.
Stegmann, U. E., 2004, Biology and Philosophy 19, 205.
Stegmann, U. E., 2005, Philosophy of Science 72, 425.
Sterelny, K., 2000, Philosophy of Science 67, 195.
Sterelny, K., and P. E. Griffiths, 1999, Sex and Death: An Introduction to Philosophy of Biology (University of Chicago Press).
Sterelny, K., K. C. Smith, and M. Dickison, 1996, Biology and Philosophy 11, 377.
Szathmary, E., and J. Maynard Smith, 1995, Nature 374, 227.
Voight, B. F., S. Kudaravalli, X. Wen, and J. K. Pritchard, 2006, PLoS Biology 4, 446.
Woese, C. R., 1965, Proceedings of the National Academy of Sciences USA 54, 1546.
Yeung, R. W., 2002, A First Course in Information Theory (Springer).
