When we look at the genetic formulation, however, we see a different story. The covariance
term, Cov (w , g ), is still insensitive to the transmission aspect of heredity, since it concerns
only the covariance between fitness and breeding value within the ancestor-set.
Importantly, however, it does incorporate the effects of the variance aspect of heritability,
since it is sensitive to the narrow-sense heritability of z: for a given value of Cov (w , z ) ,
lower narrow-sense heritability implies a lower value of Cov (w , g ) .9 The phenotypic and
genetic versions of the Price equation thus agree regarding the term to which they assign
the transmission aspect of heredity, but they differ regarding the term to which they
assign the variance aspect.

The two formulations therefore differ substantively in the way they partition evolutionary
change. Is there any reason to prefer one to the other? In some contexts, the phenotypic
formulation will be more useful. This is notably true in cases in which a character is
transmitted by wholly non-genetic means (e.g., a cultural variant that is uncorrelated with
any allele). In such cases, the breeding value for that character will not even be well
defined: the breeding value is the phenotypic value as predicted by relevant (i.e., correlated)
alleles, and this notion has no meaning if there are no such alleles. We can use the Price
equation to describe the evolution of such characters (for the cultural case, see Henrich
and Boyd 2001; Henrich 2004), but in doing so we must employ the phenotypic
formulation, not the genetic formulation.10

9 In a nutshell, this is because $\mathrm{Cov}(w, g) = \beta_{gz}\,\mathrm{Cov}(w, z)$ in many cases (see Queller 1992a).
10 We may still be able to define a useful analogue of the breeding value, where allelic predictors are
replaced by whatever transmissible predictors are relevant to the phenotype (cf. footnote 8). I do not explore
this possibility here.

For the study of social evolution in non-human societies, however, the genetic formulation
can be particularly useful. This is because it allows us to take account of the evolutionarily
salient correlations that lurk below the surface when genes are differentially expressed. It
is a truism that the phenotype of any given individual depends not merely on its genes,
but also on how those genes are expressed during its life cycle. The result is that two
individuals can have the same genetic value but very different phenotypic values for a
given character; moreover, the phenotypic value of any given individual can vary hugely
over its lifetime. It is sometimes suggested that this truism about development and
physiology vitiates a gene-centred approach to evolution, but the opposite is closer to the
truth: differential gene expression leads to two connected problems for any purely
phenotypic approach to the analysis of social evolution. One is that, in populations where
differential gene expression is rife, the overall fidelity of phenotypic transmission will
often be very poor. As a result, the expectation term in the phenotypic Price equation will
often be large, and may well be more significant than the covariance term (i.e., the term we
usually want to analyse) with regard to the overall direction of evolution. The other
problem is that phenotypic differences between social partners can belie their underlying
genetic similarity, and this genetic similarity can have important evolutionary
consequences regardless of whether it is manifested phenotypically.

Abstract as they may seem, both these problems are vividly illustrated by considering a
eusocial insect colony. In the most complex eusocial societies, workers are highly
differentiated in both morphology and behaviour, and there is a strict division of
reproductive labour: all reproduction is undertaken by the queen (see Chapter 5 for
further details). Usually, theoretical inquiry into the evolution of eusociality pays close
attention to the genetic relatedness between the workers and the queen. This genetic
relatedness can help explain how conditionally expressed genes for altruistic behaviour
can positively co-vary with fitness, since these genes are present in the queen (i.e., the
beneficiary of the workers’ altruism), as well as in the workers who incur the cost. Moreover,
the reliable transmission of these genes from the queen to her offspring helps explain how
altruistic behaviour reappears in each newly founded colony.

What would happen if we ignored genotypes, and looked only at phenotypic correlations
between workers, and between the queen and her offspring? The answer is that the
evolutionary stability of eusociality would be much more difficult to explain. The altruistic
phenotypes of the workers do not co-vary positively with fitness: they are genuinely
altruistic in that they detract from the lifetime fitness of their bearers, and as such (in
contrast to the genes underlying them) co-vary negatively with fitness. Without
considering genetics, we would expect these behaviours to disappear rapidly from the
population. What we would see instead, however, is that they are retained in the
population by what would look like a bizarre bias in the transmission of phenotypes: the
queen, rather than producing offspring that resemble her phenotypically, continually
produces offspring with morphological and behavioural phenotypes that differ
dramatically from her own, and that systematically tend to be a great deal more altruistic.
This ‘bias in phenotypic transmission’ would appear to fortuitously counterbalance the
selection against altruistic phenotypes in each generation. Until we see the underlying
genetic similarity behind the phenotypic heterogeneity within and across generations, the
stability of altruism defies any deeper explanation.

3.3 Selection, transmission, and ‘spill-over’

3.3.1 Interpreting the Price formalism

The standard Price equation is commonly thought to separate the overall evolutionary
change into a component attributable to natural selection and a component attributable to
biased transmission (see, e.g., Frank 1995, 1997a, 1998; Gardner et al. 2007; Gardner 2008;
Gardner and Foster 2008; Wenseleers et al. 2010; Gardner et al. 2011). The covariance term
is taken to quantify the former, while the expectation term is taken to quantify the latter.
As Samir Okasha (2006) notes, however, it is doubtful whether this standard interpretation
is correct in general. The problem is that, in the standard Price equation, both terms
functionally depend on differential fitness. For recall that the second term—the term
supposedly attributable to ‘transmission bias’—is an expectation of w Δg . Since each
individual’s value for Δg is weighted by its fitness, the personal transmission biases of
fitter individuals will make a bigger difference to the value of this term than those of less
fit individuals.

A rearrangement of the Price equation, first derived by Frank (1997a, 1998), yields a
‘modified Price equation’ with an expectation term that is independent of fitness
differences:

\[
\Delta\bar{g} = \frac{1}{\bar{w}}\left[\mathrm{Cov}(w, g') + \bar{w}\,E(\Delta g)\right] \tag{3.3.1}
\]

Frank’s equation differs from the standard equation in two important respects. First, the
covariance term replaces g, an individual’s personal breeding value, with g′ , the average
breeding value of its descendants. Second, the expectation term is no longer weighted by
fitness: we look at the difference between an individual’s breeding value and the average
breeding value of its descendants without taking into account its relative contribution to
the descendant-set.

Okasha (2006) suggests that the modified Price equation succeeds where the standard
Price equation fails: that is, it does provide a clean separation of the effects of selection and
biased transmission. In response to Okasha, Peter Godfrey-Smith (2007a) and Ken Waters
(2011) have separately argued that matters are not quite so straightforward. For, although
replacing E (wΔg ) has the effect of making the second term independent of fitness
differences, replacing g with g′ has the effect of making Cov (w , g′ ) sensitive to variation in
transmission biases. Suppose, for example, that Cov (w , g ) is positive, but that fitter
individuals tend to transmit their breeding value less reliably than less fit individuals. In
this scenario, Cov (w , g′ ) would be less than Cov (w , g ).

Here, then, is the overall picture: the standard Price equation collates all the effects of
biased transmission in the expectation term, yielding a covariance term that is
independent of transmission bias. But it does not cleanly separate the effects of selection
and transmission, because the expectation term is sensitive to fitness differences as well as
transmission biases. The modified Price equation, by contrast, collates all the effects of
differential fitness in the covariance term, yielding an expectation term that is independent
of fitness differences. But it too cannot be said to cleanly separate the effects of selection
and transmission, because the covariance term is sensitive to transmission biases as well as
fitness differences. Hence, neither version separates the effects of selection and
transmission without some degree of ‘spill-over’.
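
The contrast between the two versions can be checked numerically. The following is a minimal sketch on a hypothetical four-member ancestor-set (the values of `w`, `g` and `g_prime` are invented for illustration, and the helper names `mean` and `cov` are not from the text). It assumes that each ancestor contributes `w` descendants whose average breeding value is `g_prime`, and it mirrors the scenario in which the fittest individual transmits its breeding value least faithfully.

```python
from fractions import Fraction as F

def mean(xs):
    """Unweighted average over the ancestor-set."""
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance: E[(x - E x)(y - E y)]."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

# Hypothetical ancestor-set (illustrative values only).
w = [F(1), F(2), F(3), F(2)]             # fitness: number of descendants
g = [F(0), F(1), F(2), F(1)]             # breeding value of each ancestor
g_prime = [F(0), F(1), F(3, 2), F(1)]    # average breeding value of its descendants
dg = [gp - gi for gp, gi in zip(g_prime, g)]   # personal transmission bias

w_bar = mean(w)

# Actual change in mean breeding value, computed directly from the descendants.
actual = sum(wi * gp for wi, gp in zip(w, g_prime)) / sum(w) - mean(g)

# Standard Price equation: Cov(w, g) + E(w * dg), divided by mean fitness.
standard = (cov(w, g) + mean([wi * d for wi, d in zip(w, dg)])) / w_bar

# Modified Price equation: Cov(w, g') + w_bar * E(dg), divided by mean fitness.
modified = (cov(w, g_prime) + w_bar * mean(dg)) / w_bar

print(actual, standard, modified)     # 1/16 1/16 1/16
print(cov(w, g), cov(w, g_prime))     # 1/2 3/8
```

Both versions recover the same total change, but in this example the covariance term of the modified equation is smaller than that of the standard equation, because the fittest ancestor has a negative transmission bias.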

Why does this apparently inescapable spill-over arise? Why is it so difficult to partition the
overall change cleanly into two terms, one attributable to selection alone, and the other to
transmission alone? As Godfrey-Smith (2007a) and Okasha (2011) note, there is in fact a
simple explanation. An individual’s personal transmission bias is a character, and, like any
other character, it is possible for it to co-vary with fitness. When such covariance occurs,
there will be a component of the change in g that depends on both differential fitness and
biased transmission, and so cannot be attributed to either process acting alone. This
component is equal to Cov (w , Δg ) , and it accounts for the ‘spill-over’ in both the standard
and modified Price equations. This is easier to see if we note the following notational
identities:

\[
\mathrm{Cov}(w, \Delta g) = \mathrm{Cov}(w, g') - \mathrm{Cov}(w, g)
\]

\[
\mathrm{Cov}(w, \Delta g) = E(w\,\Delta g) - \bar{w}\,E(\Delta g)
\]

The standard Price equation accounts for Cov (w , Δg ) as part of the expectation term, and,
as a result, this term is sensitive to variation in fitness as well as to individual transmission
biases. The modified Price equation, by contrast, accounts for Cov (w , Δg ) as part of the
covariance term, and, as a result, this term is sensitive to variation in transmission biases
as well as to variation in fitness.
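
Both identities follow from elementary properties of covariance. A brief derivation, assuming only the definition Δg = g′ − g and writing w̄ for E(w):

```latex
\begin{align*}
\mathrm{Cov}(w, \Delta g)
  &= \mathrm{Cov}(w, g' - g)
   = \mathrm{Cov}(w, g') - \mathrm{Cov}(w, g)
  && \text{(linearity in the second argument)} \\
\mathrm{Cov}(w, \Delta g)
  &= E(w\,\Delta g) - E(w)\,E(\Delta g)
   = E(w\,\Delta g) - \bar{w}\,E(\Delta g)
  && \text{(since } \mathrm{Cov}(X, Y) = E(XY) - E(X)\,E(Y)\text{)}
\end{align*}
```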

The only way to avoid spill-over of this kind is to represent Cov (w , Δg ) explicitly as a
separate term in the Price equation, rather than incorporating it into one of the other
terms. The result is a third version of the equation which partitions the overall change into
three components rather than two (Godfrey-Smith 2007a; Okasha 2011):

\[
\Delta\bar{g} = \frac{1}{\bar{w}}\left[\mathrm{Cov}(w, g) + \mathrm{Cov}(w, \Delta g) + \bar{w}\,E(\Delta g)\right] \tag{3.3.2}
\]

The first term, Cov (w , g ), is identical to the covariance term in the standard Price equation
and is independent of transmission bias. The third term, $\bar{w}\,E(\Delta g)$, is identical to the
expectation term in the modified Price equation and is independent of differential fitness.
The second term, Cov (w , Δg ) , is sensitive to variation in fitness and to variation in
transmission bias. The terms are interpretable as quantifying, respectively, the effects
selection has independently of biased transmission, the effects biased transmission has
independently of selection, and the effect of directional selection on transmission bias
(Figure 2.1).

When Cov (w , Δg ) = 0 , the differences between the standard, modified and three-term Price
equations collapse: all three provide quantitatively identical partitions of the overall
change. But when Cov (w , Δg ) ≠ 0 , only the three-term version provides a complete causal
decomposition of the effects of selection and transmission, since both the standard and
modified Price equations have too few terms to separate the distinct effects of selection,
transmission and the interaction of the two processes. The standard Price equation treats
the change due to the interaction of selection and transmission as if it were attributable to
transmission alone; while the modified Price equation treats this effect as if it were
attributable to selection alone. By adding a third term explicitly representing the
interaction of selection and transmission, we avoid both types of causal misattribution.
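
A minimal numerical sketch of the three-term partition, using a hypothetical ancestor-set (the data and the variable names `selection`, `interaction` and `transmission` are illustrative assumptions, not part of the formalism). It shows where the interaction component ends up in each of the two-term versions.

```python
from fractions import Fraction as F

def mean(xs):
    """Unweighted average over the ancestor-set."""
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance: E[(x - E x)(y - E y)]."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

# Hypothetical ancestor-set (illustrative values only).
w = [F(1), F(2), F(3), F(2)]             # fitness
g = [F(0), F(1), F(2), F(1)]             # breeding value
g_prime = [F(0), F(1), F(3, 2), F(1)]    # average breeding value of descendants
dg = [gp - gi for gp, gi in zip(g_prime, g)]
w_bar = mean(w)

# The three components of the change in mean breeding value.
selection = cov(w, g) / w_bar             # Cov(w, g) / w_bar
interaction = cov(w, dg) / w_bar          # Cov(w, dg) / w_bar
transmission = mean(dg)                   # w_bar * E(dg) / w_bar
total = selection + interaction + transmission

# Where the interaction component is placed by the two-term versions.
standard_expectation = mean([wi * d for wi, d in zip(w, dg)]) / w_bar
modified_covariance = cov(w, g_prime) / w_bar

print(selection, interaction, transmission, total)   # 1/4 -1/16 -1/8 1/16
print(standard_expectation == interaction + transmission)   # True
print(modified_covariance == selection + interaction)        # True
```

In this example the interaction component is negative, so the standard equation's expectation term overstates the loss due to transmission alone, while the modified equation's covariance term understates the gain due to selection on g alone.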

[Figure: the three-term Price equation with each term labelled. Cov(w, g): change due to selection on g. Cov(w, Δg): change due to selection on transmission biases with respect to g. w̄E(Δg): change due to transmission bias alone.]

Figure 2.2: The three-term Price equation, with its associated causal interpretation.

Under what ecological conditions should we expect to find a non-zero Cov (w , Δg ) term?
Such circumstances may be rare (cf. Okasha 2011), but they are certainly not inconceivable.
Imagine, for example, a population in which a mutation arises that disposes its bearer to
help its parents raise additional offspring. Let g represent the breeding value for this
cooperative trait; and suppose that, owing to the appearance of the mutation in their
offspring, some parents have positive values of Δg. Now suppose that these parents
receive a fitness benefit by virtue of their positive value for Δg, since their offspring help
them produce additional offspring. This effect would show up in the Price equation in the
form of a positive Cov (w , Δg ) term.11

11 Another possible case in which $\mathrm{Cov}(w, \Delta g) \neq 0$, involving horizontal gene transfer, is considered in
Chapter 5.

3.3.2 A further complication

The three-term Price equation distinguishes two different ways in which natural selection
may cause the population mean of a character, z, to increase. One is fairly intuitive: if
individuals with the genes for z tend to have more offspring than individuals without
those genes, then (in the absence of a countervailing transmission bias) those genes will
increase in frequency, and this may issue in a positive change in z . This effect is captured
in Cov (w , g ) , the covariance between an individual’s fitness and its breeding value for z.
The other effect is much less intuitive: if individuals whose offspring have a greater
breeding value than their own tend to enjoy increased fitness as a result, this too can issue
in a positive change in z . This effect is captured in Cov (w , Δg ) , the covariance between an
individual’s fitness and its personal transmission bias with respect to the character of
interest. The latter effect seems likely to be much rarer than the former, and smaller when
it obtains, but it remains a mathematical and conceptual possibility. I suggest we call the
former effect the primary effect of natural selection, and call the latter effect the secondary
effect of natural selection.

That is not quite the end of the story. For there is a third way in which natural selection
may affect z . To see what it is, we need to return to the definition of breeding value. A
breeding value, recall, is a sum of allelic values weighted by their average effects (sensu
Fisher) on the character of interest. When introducing the notion of a breeding value, we
noted briefly that the average effects of an allele can be altered by changes in allele
frequency in the presence of non-additive interactions (i.e., dominance or epistasis)
between alleles. In essence, this is because non-additive interactions imply that the
difference an allele makes to the phenotype of its bearer is context-dependent (cf. Sterelny
and Kitcher 1988; Okasha 2006). For instance, if an allele A dominates an allele a at a
particular locus, the effect of adding a second copy of A to a diploid individual who
already has one copy will not be the same as adding a copy of A to a diploid individual
with no copies. Whenever allelic effects are context-dependent, the average effect of
substituting one allele for another will depend on the relative frequency with which a copy
of that allele finds itself in one genetic context rather than another—and this in turn will
depend on the overall allele frequencies in the population. Since changes in allele
frequency can be brought about by natural selection, the implication is that natural
selection can alter the weightings that determine the breeding values of individuals in the
descendant-set (cf. Okasha 2008). This change in average effects may in turn produce a
change in g (and, by implication, z ). I will call this the tertiary effect of natural selection,
though this is not intended to imply that it will be any smaller or less important than the
secondary effect.
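
The frequency-dependence of average effects can be illustrated with a small calculation. The sketch below rests on several simplifying assumptions that are not in the text: a single locus with two alleles (A and a), Hardy-Weinberg genotype proportions, and invented genotypic values. It treats the average effect as the least-squares slope of genotypic value on the number of A copies; the function name `average_effect` is hypothetical.

```python
from fractions import Fraction as F

def average_effect(p, values):
    """Least-squares slope of genotypic value on the number of A copies.

    p      -- frequency of allele A (Hardy-Weinberg proportions assumed)
    values -- genotypic values for aa, Aa, AA (0, 1, 2 copies of A)
    """
    q = 1 - p
    freqs = [q * q, 2 * p * q, p * p]
    counts = [0, 1, 2]
    mean_x = sum(f * x for f, x in zip(freqs, counts))
    mean_y = sum(f * y for f, y in zip(freqs, values))
    cov_xy = sum(f * (x - mean_x) * (y - mean_y)
                 for f, x, y in zip(freqs, counts, values))
    var_x = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, counts))
    return cov_xy / var_x

additive = [F(0), F(1, 2), F(1)]   # no dominance: each copy of A adds 1/2
dominant = [F(0), F(1), F(1)]      # A completely dominant over a

for p in (F(1, 4), F(1, 2), F(3, 4)):
    print(p, average_effect(p, additive), average_effect(p, dominant))
# 1/4 1/2 3/4
# 1/2 1/2 1/2
# 3/4 1/2 1/4
```

With additive gene action the slope stays at 1/2 whatever the allele frequency. Under complete dominance it falls from 3/4 to 1/4 as the frequency of A rises from 1/4 to 3/4, so a change in allele frequency alone changes the weightings that enter into breeding values.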

If such an effect occurs, it will not be accounted for in Cov (w , g ) . Some of it may be
accounted for in Cov (w , Δg ) , since it is possible that the alleles subject to changes in their
average effects will be differentially possessed by fitter (or less fit) individuals. But some
of this change is likely to be independent of fitness differences; and the Price equation will
account for this portion in the expectation term, $\bar{w}\,E(\Delta g)$, which we had originally hoped
to interpret as a term attributable to ‘transmission bias alone’. Of course, there is a sense in
which a change in the average effects of alleles due to natural selection is a source of a
transmission bias, for it impairs an individual’s ability to transmit its breeding value
faithfully to its descendants. But there is also a sense in which this label is misleading,
since nothing about the process of genetic transmission is responsible for this effect. It
comes about not because of any bias in the transmission of alleles, but rather because the
breeding value, by definition, requires us to weight alleles by their average effects, and
these average effects can change between generations. And it seems particularly
misleading to attribute this change to transmission bias alone, given that a change in
average effects may be due to changes in gene frequency caused by natural selection.

One might see this further complication as a reason to regard breeding values with
suspicion. For it implies that, when we formulate the Price equation in terms of breeding
values, even the three-term version fails to provide the clean separation of effects we
hoped for: it is quite possible for natural selection to influence the expectation term as well
as the two covariance terms (Box 2.1 summarizes the overall picture). We could remove
this possibility by switching to p-scores, which are not phenotype-relational, and do not
require us to weight alleles by their changeable average effects. Yet we need not see this
feature of breeding values as a drawback. Indeed, seen in a different light, it actually
provides further justification for using breeding values rather than ‘raw’ allelic values or
p-scores in the formulation of fundamental theory. For it also suggests that, if we were to
look only at the evolutionary change in allele frequencies or average p-scores, and assume
a simple relationship between alleles and phenotypes, we would be liable to overlook the
tertiary effect of natural selection on phenotypic change. This effect is real, and any model
of phenotypic change that aims for causal completeness should accommodate it.

Box 2.1: The three effects of natural selection on the evolution of z

• Natural selection may bring about covariance between the genes for z and
fitness. This is the primary effect of natural selection on the evolution of z. It is
quantified by $\mathrm{Cov}(w, g)$, the covariance between an individual’s fitness and its
breeding value for z.

• When transmission fidelity is imperfect, the difference an individual’s personal
transmission bias makes to the overall change in z depends on the number of
descendants it leaves. Because of this, selection can have a secondary effect on the
evolution of z. This effect is captured in $\mathrm{Cov}(w, \Delta g)$, the covariance between an
individual’s fitness and its personal transmission bias.

• Natural selection may also have a tertiary effect on the evolution of z when
changes in allele frequency alter the average effects of alleles. This effect may
contribute towards $\mathrm{Cov}(w, \Delta g)$, but it may also contribute towards $\bar{w}\,E(\Delta g)$.

• When we talk informally about the ‘effect of natural selection’, we should be
clear regarding which of these effects we have in mind.

3.3.3 Analysing partial change

At the start of his Genetical Theory of Natural Selection (1930), Ronald A. Fisher famously
remarks that ‘natural selection is not evolution’. Selection contributes to evolutionary
change, but it is not the whole story: other processes—notably, mutation, migration and
genetic drift—contribute too. In this sense, theories of ‘social evolution’ are inaptly named,
because they are usually theories about the conditions under which natural selection will
favour a social behaviour, rather than theories about the conditions under which social
behaviours will in fact evolve. As we see in the next chapter, it is often useful when
formulating such a theory to focus on the component of the overall evolutionary change
attributable to natural selection (or rather, natural selection at a particular level or levels,
since we also usually want to ignore change attributable to within-organism intragenomic
conflict or gametic selection). To do this is not to assume that other factors are
insignificant. It is simply to abstract away from them, so as to focus on the partial change
caused by the process we take to be largely responsible for the evolution of cooperation.

The reason for the Price formalism’s rise to prominence in recent decades is that, at least
on the face of it, it allows theorists to identify this partial change in a form that makes it a
convenient target for further analysis. The moral of Section 3.3.2, however, was that the
true picture is somewhat more complicated. The term in the Price equation that is usually
taken to represent the partial change attributable to natural selection, Cov (w , g ) , in fact
represents only the primary effect of natural selection. There are two further ways in which
natural selection may influence evolution which are not accounted for in this term.
Nevertheless, in some contexts we may be chiefly interested in determining the conditions
under which the primary effect of natural selection will favour a social behaviour; and, in
these contexts, Cov (w , g ) will be the correct target for analysis. I will introduce the symbol
$\Delta_{1^\circ}\bar{g}$ to denote this partial change:

\[
\Delta_{1^\circ}\bar{g} = \frac{1}{\bar{w}}\,\mathrm{Cov}(w, g) \tag{3.3.3}
\]

In other contexts, we will be able to gain additional insight into the effects of selection by
taking the secondary effect into account, and by taking Cov (w , g′ ) as our target of analysis
(see Chapter 4; see also Frank 1997a, 1998). I will introduce the symbol $\Delta_w\bar{g}$ to denote this
partial change, since it represents the component of the overall change that directly
depends on fitness differences:

\[
\Delta_w\bar{g} = \frac{1}{\bar{w}}\,\mathrm{Cov}(w, g') \tag{3.3.4}
\]

These partial changes will be a frequent target of analysis in subsequent chapters.


Sometimes one sees these partial changes denoted by the subscript ‘S’ or ‘NS’, to indicate
that they reflect the partial change attributable to natural selection. I avoid this here
because it is misleading: it encourages us to neglect the tertiary effect of natural selection,
an effect that is not accounted for by the covariance term in the Price equation but that
may still make a significant contribution to the overall evolutionary change (cf. Chapter 6).

3.4 Grouping organisms

When we derived the Price equation in Section 3.1, we framed the entire discussion in
terms of the properties of individuals. No attempt was made to sort organisms into groups,
types or classes of any kind. In some ways, the fact that we can formulate the Price
equation in purely individualist terms is important (see Grafen 1985a; Godfrey-Smith
2009a). In practice, however, whenever biologists actually use the Price equation as the
starting point for the analysis of a (real or modelled) population, they more commonly
group organisms together in one way or another, so as to focus their attention on the
average properties of groups. In this section, I unify the various ways in which one might
go about grouping organisms under a common formal framework; I use this framework to
show why certain methods of grouping are particularly useful; and I relate this discussion
to philosophical issues.

In Section 3.4.1, I show (following Price 1972a) how, provided we can partition a
population into non-overlapping subsets, it is always possible to partition the overall w/g
covariance into between- and within-subset components. In Section 3.4.2, I consider three
applications of this general principle: trait-groups of interacting organisms, genotypic
classes, and developmental classes. The next two subsections bring the preceding
discussion to bear on philosophical questions. Section 3.4.3 considers the subtle
relationship between the Price formalism and ‘evolutionary nominalism’, the view that
sorting organisms into classes is never obligatory in evolutionary theory (Godfrey-Smith
2009a). Section 3.4.4 turns to the troubled relationship between kin- and group-selectionist
approaches to social evolution. I suggest that the main methodological difference between
these approaches lies not in whether organisms are sorted into groups for the purpose of
analysis, but how.

3.4.1 The general case

The overall covariance between w and g is defined as the expected product of individual
deviations from the population mean with respect to each variable:

\[
\mathrm{Cov}(w, g) = E\left[(w - \bar{w})(g - \bar{g})\right]
\]

As with any expectation value, this quantity may be computed in a variety of ways. Here
is one possible procedure: first, add up the values of $(w - \bar{w})(g - \bar{g})$ for each individual in
the ancestor-set; then, divide this number by the total number of individuals. This
procedure is reflected in the following expression:

\[
\mathrm{Cov}(w, g) = \frac{1}{n}\sum_i (w_i - \bar{w})(g_i - \bar{g}) \tag{3.4.1}
\]

It would be equally reasonable, however, to compute the covariance by means of the
following three-step procedure:

1. Sort the members of the ancestor-set into N non-overlapping subsets (the
subsets need not correspond to natural groupings of any kind; in principle,
individuals can be assigned to subsets arbitrarily).
2. For each subset, compute the average value of $(w - \bar{w})(g - \bar{g})$ within that
subset.
3. Take an average of the subset averages, weighting each subset by its relative
size.
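
The three steps can be sketched in a few lines. The example below uses a hypothetical five-member ancestor-set and an arbitrary partition into two subsets (all values and names are illustrative assumptions), and checks that the subset-based procedure returns the same covariance as the direct one.

```python
from fractions import Fraction as F

def mean(xs):
    """Unweighted average of a list of values."""
    return sum(xs) / len(xs)

# Hypothetical ancestor-set (illustrative values only).
w = [F(1), F(2), F(3), F(2), F(2)]   # fitness
g = [F(0), F(1), F(2), F(1), F(1)]   # breeding value
n = len(w)

w_bar, g_bar = mean(w), mean(g)
products = [(wi - w_bar) * (gi - g_bar) for wi, gi in zip(w, g)]

# Direct procedure: sum over individuals, divide by n.
direct = mean(products)

# Step 1: sort individuals (by index) into non-overlapping subsets, arbitrarily.
subsets = [[0, 1], [2, 3, 4]]

# Step 2: average the products within each subset.
subset_averages = [mean([products[k] for k in members]) for members in subsets]

# Step 3: average the subset averages, weighting each by its relative size.
relative_sizes = [F(len(members), n) for members in subsets]
via_subsets = sum(q * avg for q, avg in zip(relative_sizes, subset_averages))

print(direct, via_subsets)   # 2/5 2/5
```

Any other exhaustive, non-overlapping assignment of the five indices to subsets would give the same result, since weighting by relative size simply undoes the within-subset averaging.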

To rewrite equation (3.4.1) in a way that reflects this alternative averaging procedure, we
can re-label the members of the ancestor-set. Instead of labelling them with a single index,
i, we can label them with two indices, i and j, such that $g_{ij}$ represents the breeding value
of the jth member of the ith subset, and $w_{ij}$ represents the fitness of the jth member of the ith
subset. We can then replace $\sum_i$ with $\sum_{ij}$, indicating that we are to sum over all j for each
value of i (i.e., over all entities in each subset) and then sum over all i (i.e., over all subsets).
Finally, we can define a quantity $q_i = m_i / n$, where $m_i$ represents the size of the ith subset;
$q_i$ thus represents the relative size of the ith subset. Combining these ingredients, we can
rewrite equation (3.4.1) as follows:

\[
\mathrm{Cov}(w, g) = \sum_{ij} \frac{q_i}{m_i}\,(w_{ij} - \bar{w})(g_{ij} - \bar{g}) \tag{3.4.2}
\]

Equations (3.4.1) and (3.4.2) give us two equivalent expressions for the overall w/g
covariance. The former is the simpler of the two, but the latter still has its uses. In
particular, it is possible to partition the summation in equation (3.4.2) as follows, where $G_i$
represents the average breeding value of the ith subset, and $W_i$ represents the average fitness
of the ith subset (see Appendix A for details):

\[
\mathrm{Cov}(w, g) = \sum_{ij} \frac{q_i}{m_i}\,(w_{ij} - W_i)(g_{ij} - G_i) + \sum_i q_i\,(W_i - \bar{w})(G_i - \bar{g})
\]

Both terms on the right-hand side of this equation can be given a statistical interpretation.
The first term is the size-weighted expectation of the w/g covariance within each subset,
while the second term is the size-weighted covariance12 between the subset averages, W and
G. We can therefore rewrite the equation in statistical notation (where the m subscripts
indicate weighting by size, and $\mathrm{Cov}_i$ denotes the within-subset covariance of the ith
subset):

\[
\mathrm{Cov}(w, g) = E_m\left[\mathrm{Cov}_i(w, g)\right] + \mathrm{Cov}_m(W, G) \tag{3.4.3}
\]

This general result, first presented by Price (1972a), bifurcates the overall covariance into
two components, the first of which depends only on genetic variation within subsets, and
the second of which depends only on genetic variation between subsets. Nothing is
assumed about the nature of these subsets, except that they do not overlap.
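
A minimal numerical check of this partition, assuming a hypothetical five-member ancestor-set divided arbitrarily into two subsets of unequal size (the data and the names `within` and `between` are illustrative, not taken from Price):

```python
from fractions import Fraction as F

def mean(xs):
    """Unweighted average of a list of values."""
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance: E[(x - E x)(y - E y)]."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

# Hypothetical ancestor-set (illustrative values only).
w = [F(1), F(2), F(3), F(2), F(2)]   # fitness
g = [F(0), F(1), F(2), F(1), F(1)]   # breeding value
subsets = [[0, 1], [2, 3, 4]]        # arbitrary, non-overlapping, exhaustive

n = len(w)
w_bar, g_bar = mean(w), mean(g)

within = F(0)    # size-weighted expectation of the within-subset covariances
between = F(0)   # size-weighted covariance between the subset averages
for members in subsets:
    q_i = F(len(members), n)                 # relative size of the subset
    w_i = [w[k] for k in members]
    g_i = [g[k] for k in members]
    W_i, G_i = mean(w_i), mean(g_i)          # subset averages
    within += q_i * cov(w_i, g_i)
    between += q_i * (W_i - w_bar) * (G_i - g_bar)

total = cov(w, g)
print(total, within, between)   # 2/5 7/30 1/6
```

The two components sum to the overall covariance (7/30 + 1/6 = 2/5), even though the subsets here were chosen without regard to any biological grouping.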

12 The notions of ‘weighted expectation’ and ‘weighted covariance’ are used frequently in Price’s (1970, 1971,
1972a) original papers, but are rarely seen in other contexts. In standard probability theory, all computations
of expectation and covariance involve weighting outcomes by their probabilities, so this is not explicitly
mentioned; and any other weightings are included in the arguments of E and Cov, rather than being
incorporated into the functions themselves. Price’s stipulative definitions are as follows, where $E_k$ denotes
‘k-weighted expectation’ and $\mathrm{Cov}_k$ denotes ‘k-weighted covariance’:

\[
E_k(X) = E\left[\frac{k}{\bar{k}}\,X\right]
\]

\[
\mathrm{Cov}_k(X, Y) = E_k\left[\bigl(X - E_k(X)\bigr)\bigl(Y - E_k(Y)\bigr)\right]
\]

3.4.2 Three special cases

Section 3.4.1 was deliberately abstract. We saw that, when we sort the members of our
ancestor-set into subsets, we can partition the overall w/g covariance into a component
that depends on genetic variation within subsets, and a component that depends on
genetic variation between subsets; but we said nothing at all about the biological
interpretation of these ‘subsets’. The reason is that biologists sort organisms into groups in
a variety of ways for a variety of theoretical purposes, and considering the general case
allows us to bring all these cases within a unifying framework. In this section, I want to
consider three such cases: groups of interacting organisms, genotypic classes, and
developmental classes.

First, however, it will be helpful to introduce a formal framework in which different ways
of grouping organisms can be conceptualized and compared. As we have emphasized,
equation (3.4.3) holds for any partition of a population into non-overlapping subsets; in
principle, the subsets can be completely arbitrary. But in practice, little insight is gained by
grouping organisms in an arbitrary fashion: we want to group organisms non-arbitrarily.
In broad terms, the way to do this is by identifying a biologically meaningful equivalence
relation among the members of the population under study.13 In set theory, an equivalence
relation, x ~ y, can be any binary relation among the elements of a set that is reflexive,
symmetric and transitive. If we can find a relation with these properties for a given set, we
can use it to partition the set into non-overlapping subsets such that each comprises
elements related to each other by x ~ y. These subsets are known as equivalence classes. For
example, suppose we have a set of balls of varying colours. We can identify an
equivalence relation for this set:

(x ~ y) = (x is the same colour as y)

13 To my knowledge, the first author to apply this notion specifically to the problem of grouping organisms
was Godfrey-Smith (2006), and here I am indebted to his illuminating discussion.

Note that the relation is reflexive, symmetric and transitive. We can then use this relation
to partition the set of balls into equivalence classes, where each equivalence class
comprises all and only those balls of a particular colour.
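
The definition can be made concrete with a minimal Python sketch. The ball labels, the colours and the helper names `is_equivalence` and `equivalence_classes` are illustrative assumptions rather than part of the formal framework; the sketch simply checks the three defining properties by brute force and then builds the classes.

```python
from itertools import product


def is_equivalence(elements, related):
    """Brute-force check that `related` is reflexive, symmetric and transitive."""
    reflexive = all(related(x, x) for x in elements)
    symmetric = all(
        related(y, x) for x, y in product(elements, repeat=2) if related(x, y)
    )
    transitive = all(
        related(x, z)
        for x, y, z in product(elements, repeat=3)
        if related(x, y) and related(y, z)
    )
    return reflexive and symmetric and transitive


def equivalence_classes(elements, related):
    """Partition `elements`; assumes `related` is an equivalence relation."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, cls[0]):  # one representative per class suffices
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes


# Hypothetical data: each ball is a (label, colour) pair.
balls = [("b1", "red"), ("b2", "blue"), ("b3", "red"), ("b4", "green"), ("b5", "blue")]


def same_colour(x, y):
    return x[1] == y[1]


ok = is_equivalence(balls, same_colour)
classes = equivalence_classes(balls, same_colour)
```

Comparing each ball with a single representative of each class is legitimate only because the relation is transitive and symmetric; for a relation that fails these conditions, the classes returned would depend on the order in which the balls are processed.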

Case 1: Groups of interacting organisms


In biology, we are rarely interested in grouping organisms by colour. We often are
interested, however, in grouping them by patterns of interaction. Sometimes it is possible to
partition a population into discrete ‘trait-groups’, where the members of each trait-group
engage in fitness-affecting interactions (with respect to the character(s) of interest) only
with their fellow group members. We can think of ‘trait-groups’ as equivalence classes
defined by the following equivalence relation (see Godfrey-Smith 2006):14

\[ (x \sim y) = (x \text{ has its fitness affected by the character of } y) \]

Grouping by fitness-affecting interactions is a standard approach in the literature on
multi-level selection (see, e.g., Price 1972a; Hamilton 1975; Wilson 1975; Wade 1985;
Queller 1992a; Sober and Wilson 1998; Pepper 2000; Okasha 2006; Gardner and Grafen
2009). Indeed, this was the application for which Price (1972a) originally derived his
partition of the w/g covariance into between- and within-subset components. For ease of
analysis, it is often assumed that the trait-groups are equal in size. In this special case, we
can replace Price’s somewhat confusing ‘weighted expectation’ and ‘weighted covariance’
functions with their standard, unweighted equivalents:

\[ \mathrm{Cov}(w, g) = \mathrm{E}\big[\mathrm{Cov}_i(w, g)\big] + \mathrm{Cov}(W, G) \tag{3.4.4} \]
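
As a numerical check on this partition, the following Python sketch computes both sides of the identity for a toy population. The fitness and breeding values are invented for illustration, the groups are assumed to be equal in size, and covariances are population covariances (dividing by n).

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance (divides by n, not n - 1)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Hypothetical trait-groups of equal size; each entry is (fitness w, breeding value g).
groups = [
    [(1.0, 0.0), (2.0, 1.0), (1.5, 1.0)],
    [(3.0, 1.0), (2.5, 0.0), (4.0, 1.0)],
    [(0.5, 0.0), (1.0, 0.0), (2.0, 1.0)],
]

# Left-hand side: covariance over the whole population.
w_all = [w for grp in groups for w, _ in grp]
g_all = [g for grp in groups for _, g in grp]
total = cov(w_all, g_all)

# First term: unweighted average of the within-group covariances.
within = mean([cov([w for w, _ in grp], [g for _, g in grp]) for grp in groups])

# Second term: covariance between group means W and G.
W = [mean([w for w, _ in grp]) for grp in groups]
G = [mean([g for _, g in grp]) for grp in groups]
between = cov(W, G)

partition_holds = abs(total - (within + between)) < 1e-12
```

If the groups differed in size, the unweighted `mean` and `cov` over groups would have to be replaced by size-weighted versions for the identity to hold.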

14 Such a partition will not be possible for all populations; see Section 3.4.4.

Why group organisms in this way? One reason is that the equivalence classes one obtains
will undoubtedly seem more ‘natural’ than purely arbitrary groupings. But another reason
is practical. If we grouped organisms arbitrarily, any subsequent analysis of the covariance
would have to take account of interactions between the members of different subsets.
After all, we could not rule out the possibility of such interactions, nor could we dismiss
them as irrelevant to the overall response to selection. By contrast, grouping organisms by
patterns of relevant interaction allows us to discount any interactions that cut across group
boundaries. For it ensures that, at least with respect to the character of interest, the fitness
of a given organism is affected only by its fellow group members; and that the average
fitness of the group depends only on how its members behave. In other words, it
guarantees that all interactions relevant to the response to selection will take place within
groups—not across them.

Case 2: Genotypic classes


While it is often useful to sort organisms by patterns of interaction, this is not the only way
in which we may wish to sort organisms for the purposes of analysis. Here is another
possibility: we could sort organisms into subsets by their breeding value for the character(s) of
interest, such that there is exactly one subset for each value of g instantiated in the
population, and each subset contains all and only those members of the population which
instantiate that value. We can call these subsets genotypic classes.15 Genotypic classes,
which are again relative to the character under study, are defined by
the following equivalence relation:

\[ (x \sim y) = (x \text{ has the same breeding value as } y) \]

15 It may be that all members of a genotypic class have the same alleles at all loci relevant to the trait of
interest, but this need not be the case: a given breeding value might be ‘multiply realizable’ by various allele
combinations.

Since every individual within a subset has the same breeding value, an individual’s
breeding value always equals the average breeding value of its subset, i.e., $g_{ij} = G_i$ for all $i$,
$j$. Because breeding value does not vary within any subset, every within-subset covariance
is zero, which eliminates the first term in equation (3.4.3), yielding:

\[ \mathrm{Cov}(w, g) = \mathrm{Cov}_m(W, G) \tag{3.4.5} \]

In effect, sorting organisms into genotypic classes entitles us to assume that the within-
subset covariance will be zero in all subsets, so that all that remains is the size-weighted
covariance between the class averages for fitness and breeding value.
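
A short Python sketch may help to fix the idea. It uses an invented six-member population, groups individuals by breeding value, and checks that the individual-level covariance coincides with the size-weighted covariance between class averages. The data and the helper names `wmean` and `wcov` are assumptions made for the illustration.

```python
from collections import defaultdict


def wmean(xs, ws):
    """Weighted mean of xs with weights ws."""
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


def wcov(xs, ys, ws):
    """Weighted population covariance of xs and ys with weights ws."""
    mx, my = wmean(xs, ws), wmean(ys, ws)
    return wmean([(x - mx) * (y - my) for x, y in zip(xs, ys)], ws)


# Hypothetical individuals: (fitness w, breeding value g).
population = [(1.0, 0.0), (3.0, 0.0), (2.0, 0.5), (2.5, 0.5), (4.0, 0.5), (5.0, 1.0)]

# Genotypic classes: one class per breeding value found in the population.
classes = defaultdict(list)
for w, g in population:
    classes[g].append(w)

# Individual-level covariance: every individual has weight 1.
total = wcov(
    [w for w, _ in population], [g for _, g in population], [1] * len(population)
)

# Class-level quantities: G is constant within a class, W is the class mean fitness.
G = list(classes)
W = [sum(ws) / len(ws) for ws in classes.values()]
sizes = [len(ws) for ws in classes.values()]
between = wcov(W, G, sizes)

# Within each class g does not vary, so the within-class covariance vanishes.
within = [wcov(ws, [g] * len(ws), [1] * len(ws)) for g, ws in classes.items()]
```

Note that fitness varies considerably within the first and second classes, yet this variation makes no difference to `total`, which depends only on the class averages and class sizes.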

Sorting organisms into genotypic classes—in order to track variation in the average
properties of genotypes rather than variation in the properties of individuals—is
extremely common in the kin selection literature. This is in part because kin selection
theory gives paramount importance to considerations of genetic relatedness at relevant loci
(see Chapters 4 and 5). Usefully, we know that the relatedness between any two members of
a genotypic class will always equal 1; moreover, we know that the relatedness between a
member of one genotypic class and a member of another will be the same for all possible
pairings of one member from each class. Hence, by sorting organisms into genotypic
classes, we save ourselves the trouble of having to track degrees of relatedness between
individual actor and recipient pairs: if we know the average relatedness between the actor’s
genotypic class and the recipient’s genotypic class, this tells us everything we need to
know (see Frank 1997b, 1998; see also Chapter 5).

Sorting organisms by genotype is a standard practice in population genetics more
generally. This too is understandable, since the functional relationship between an
individual’s genes and its fitness will often be enormously complex. Individuals with the
same breeding value for the trait of interest will often end up with very different fitness
values, depending on the other traits they inherit, the environment in which they develop,
and the chance events that befall them during their life. It can be very helpful to abstract
away from all this micro-level variation in fitness among individuals with the same
genotype. Equation (3.4.5) shows that we are formally entitled to do this, for it shows that
fitness differences within genotypic classes, no matter how dramatic, have no effect at all
on the overall w/g covariance. All that matters is the variation in average fitness between
genotypic classes.

Case 3: Developmental classes


The third main type of equivalence class arises when organisms are grouped by some
important, non-genotypic property that is functionally distinct from the character we are
studying, but that nonetheless has profound consequences for an organism’s fitness
and/or behaviour. For example, when generations overlap (as they do in insect societies),
it is helpful to sort organisms into age classes; when sexes differ in their ploidy (as they do
in insect societies), it is helpful to sort organisms by sex; and when organisms exhibit
significant morphological differentiation that impacts on their reproductive and
behavioural capacities (as they do in insect societies), it is helpful to sort them by
morphological caste. I will refer to all these forms of equivalence class as developmental classes,
since, in one way or another, they all group organisms by the developmental features they
instantiate, be it the temporal stage of development they are passing through or the
morphology they have come to exhibit. Like genotypic classes, developmental classes are
defined by a similarity-based equivalence relation, though in this case it is some relevant
developmental similarity that matters:

\[ (x \sim y) = (x \text{ is in the same age group/sex/caste as } y) \]

Developmental classes, like genotypic classes, are commonly encountered in kin selection
models. This is because the kin selection approach requires that we track the fitness effects
of particular social behaviours, and there are many cases in which a particular social
behaviour will yield different payoffs depending on the developmental class of the actor
and recipient. For instance, in an age-structured population, an altruistic act that benefits a
very old recipient will tend to confer a smaller fecundity benefit than it would confer if it
were to fall on a younger recipient; conversely, an altruistic act performed by a very old
actor will tend to impose a smaller cost than it would have done if it were performed
by a younger actor with more to lose.

For ease of analysis, kin selection theorists typically assume that there is no genetic
variance between developmental classes at the relevant loci (see Frank 1997b, 1998). This is
often a reasonable assumption. When classes are differentiated by age, there is usually no
particular reason why the overall genotypic composition of one age class would be
different to that of any other; the same applies when classes are differentiated
morphologically, provided the relevant differences arise from differential gene expression
(i.e., phenotypic plasticity) rather than from genetic differences; and the same also applies
when classes are differentiated by sex, provided the genes related to social phenotypes are
not correlated with the genes for sex determination. Of course, one can construct
hypothetical scenarios in which this assumption would fail, but it will hold in many cases.
If we succeed in individuating classes such that there is no genetic variance between
classes, we can eliminate the second term in equation (3.4.3), yielding, in Price’s statistical
notation:

\[ \mathrm{Cov}(w, g) = \mathrm{E}_m\big[\mathrm{Cov}_i(w, g)\big] \tag{3.4.6} \]

Verbally, the overall covariance is equal to a size-weighted (i.e., frequency-weighted)
average of the within-class covariances. A version of this expression, which was (to my
knowledge) first derived by Peter Taylor (1990), is frequently deployed in the kin selection
literature when a problem requires explicit accommodation of class-structure (e.g., Taylor
1990; Taylor and Frank 1996; Frank 1997b, 1998; Wild and Taylor 2006; Taylor et al. 2007;
Wenseleers et al. 2004; Wenseleers et al. 2010).
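
The following Python sketch illustrates equation (3.4.6) under its key assumption. The two age classes, their members and their sizes are invented; what matters is that the mean breeding value is the same in both classes, so that the between-class term vanishes.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Hypothetical developmental classes; each entry is (fitness w, breeding value g).
# Both classes have mean breeding value 0.5, but they differ in size and fitness.
classes = {
    "young": [(1.0, 0.0), (2.0, 1.0), (1.5, 0.0), (3.0, 1.0)],
    "old": [(0.2, 0.0), (0.4, 1.0)],
}

everyone = [pair for members in classes.values() for pair in members]
total = cov([w for w, _ in everyone], [g for _, g in everyone])

# Size-weighted average of the within-class covariances, i.e. E_m[Cov_i(w, g)].
n = len(everyone)
weighted_within = sum(
    len(members) / n * cov([w for w, _ in members], [g for _, g in members])
    for members in classes.values()
)

class_mean_g = {name: mean([g for _, g in members]) for name, members in classes.items()}
```

Here the large difference in average fitness between the young and the old makes no contribution to `total`, precisely because the classes do not differ in average breeding value.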

The three types of equivalence class we have considered provide cross-cutting ways of
grouping organisms: they are very unlikely to ever align with one another.16 Nevertheless,
for any partition of a population into equivalence classes, we are free to treat each
equivalence class as a new population in its own right, and partition it into yet smaller
equivalence classes. As a result, there are various ways in which the three types of class
could be nested within each other. For instance, we could partition a population into trait-
groups, then separately partition each trait-group into developmental classes, then
separately partition each developmental class of each trait-group into genotypic classes. In
practice, the usual procedure in the kin selection literature is to index organisms purely by
genotype and developmental class; while the usual procedure in the multi-level selection
literature is to index organisms purely by patterns of interaction. To my knowledge, the
possibility of combining all three types of equivalence class in a single model has not yet
been explored.
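
To show what such nesting might look like in practice, here is a minimal Python sketch that indexes organisms first by trait-group, then by developmental class, then by genotypic class. The records, the field order and the class labels are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import defaultdict

# Hypothetical records: (organism id, trait-group, caste, breeding value).
organisms = [
    ("a", "nest1", "worker", 0.0),
    ("b", "nest1", "worker", 1.0),
    ("c", "nest1", "queen", 1.0),
    ("d", "nest2", "worker", 0.0),
    ("e", "nest2", "worker", 0.0),
    ("f", "nest2", "queen", 0.0),
]

# nested[trait-group][developmental class][genotypic class] -> list of ids
nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
for oid, group, caste, g in organisms:
    nested[group][caste][g].append(oid)

# The innermost classes still partition the population: every organism
# appears in exactly one of them.
leaves = [
    ids
    for castes in nested.values()
    for gclasses in castes.values()
    for ids in gclasses.values()
]
flattened = sorted(oid for ids in leaves for oid in ids)
```

Each level of nesting applies a different equivalence relation to the classes produced by the level above, so the order of nesting is a modelling choice rather than something fixed by the formalism.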

3.4.3 The Price formalism and evolutionary nominalism

I now want to bring the foregoing discussion of equivalence classes to bear on two
(related) philosophical matters. Godfrey-Smith (2009a) suggests that the Price formalism is
a natural ally of a view he terms evolutionary nominalism:17

16 This would require that each developmental class is genotypically unique, that the members of each class
all have the same genotype, and that organisms only interact with other members of their class.
17 A similar view is defended by Nanay (2010). Nanay, however, frames his view in terms of property-types
and property-tokens; specifically, he argues that ‘biological property-types do not play any explanatory role
in evolutionary explanations’ (2010, 93), a view he terms ‘trope nominalism’. This may or may not be more
radical than Godfrey-Smith’s formulation, depending on whether or not the trope theorist can sanction the
various equivalence relations discussed in Section 3.4.2 (cf. Footnote 18).

[T]he grouping of individuals into types is in no way essential
to Darwinian explanation. Such groupings are convenient tools.
But one always has the choice of using finer or coarser
groupings, ignoring fewer or more differences between
individuals. As categories become finer, they may be occupied
by only one individual each. (Godfrey-Smith 2009a, 35)

One moral from the foregoing discussion is that, while Godfrey-Smith is broadly correct to
highlight an affinity between the Price formalism and evolutionary nominalism, the true
relationship between the two ideas is not straightforward.

We saw earlier in the chapter that the Price equation may be formulated purely in terms of
individuals and their properties, without acknowledging equivalence classes of any kind
(Section 3.1). The result is an extremely general and strictly individualist description of the
evolutionary change in some character. This naturally leads to the suggestion that the
Price equation vindicates evolutionary nominalism. The equation, however, is not always
formulated in individualist terms. Notably, Frank (1995, 1997a, 1998, 2012) derives the
Price equation purely in terms of the properties of types (typically genotypes), where each
type is weighted by its frequency in the computations of covariance and expectation; we
then regard w as a measure of the total number of descendants each type contributes to the
descendant-set. This formulation is no less correct than the more familiar version
expressed in terms of properties of individuals, provided we can individuate types such
that there is no variation within each type (and hence no covariance with w) with respect
to the property of interest. In effect, rather than starting (as I have done) from a purely
individualist formulation and extending it to accommodate various forms of equivalence
class, Frank takes for granted the identity given in equation (3.4.5) and works with types
from the outset.

In a sense, therefore, the Price formalism is promiscuous: it permits both individual-centred
and type-centred formulations. Moreover, we saw in the preceding section that, in
addition to explaining why it is permissible and useful to sort organisms by genotype, the
Price formalism also explains why it is permissible and useful to sort them by causal or
developmental equivalence relations. In all three cases, sorting organisms into equivalence
classes allows us to highlight the variation that matters to the response to selection, while
abstracting away from variation that does not: sorting by patterns of fitness-affecting
interaction allows us to discount interactions that cross subset boundaries; sorting by
breeding value allows us to discount variance in fitness within each class; and sorting by
developmental properties, if they are chosen well, allows us to discount the variance in
fitness between classes.

The upshot is that, if we construe evolutionary nominalism in the strongest possible
sense, as the thesis that evolutionary theory can and should proceed without sorting
organisms into equivalence classes, then the Price formalism is not the ally it may at first
appear. For while it shows that an individualist description of evolutionary change is
always available, it also shows why sorting individuals into equivalence classes often
facilitates a more convenient description. It is doubtful, however, whether anyone would
seriously defend so strong a version of the thesis;18 and Godfrey-Smith certainly does
not. We can more cautiously formulate Godfrey-Smith’s claim as the two-part thesis that
(i) evolutionary explanations never require that we sort organisms into equivalence classes;
but that (ii) there is often a pragmatic justification for doing so. If we construe evolutionary
nominalism like this, the Price formalism provides support for both parts.

18 Nanay 2010 (see footnote 17) may be an exception, though this is not clear. On the face of it, nothing
prevents trope-nominalists from admitting equivalence relations (including causal relations and similarity
relations) into their ontology. They would, however, deny that a similarity relation between two objects is
reducible to their co-exemplification of a common property-type.

3.4.4 The Price formalism and the ‘kin selection’ versus ‘group selection’ debate

For the past five decades, the question of the relationship between kin and group selection
has been one of the most divisive issues in evolutionary theory (see Borrello 2010 for a
historical overview). John Maynard Smith (1964) originally coined the term ‘kin selection’
to describe a process he took to be distinct from group selection; and this view of kin- and
group-selectionist approaches as rivals—a view promoted by, among others, George
Williams (1966) and Richard Dawkins (1976, 1979, 1982)—persists in many quarters to this
day. But theorists have recognized for some time that the true relationship between the
two approaches must be rather more subtle, since kin and group selection models appear
to yield identical predictions under a wide range of conditions, and also seem to have
similar limitations (Hamilton 1975; Grafen 1984; Queller 1992a; Wilson and Dugatkin
1997).

In recent years, a consensus-of-sorts has emerged that the two frameworks are ‘formally
equivalent’, in the sense that they will never disagree regarding the direction of the
response to selection (see Lehmann et al. 2007; Wenseleers et al. 2010; Gardner et al. 2011;
Marshall 2011a,b; for dissent, see van Veelen 2009, 2011; Nowak et al. 2010; van Veelen et
al. 2012). This is certainly correct if one is comparing the most general partition of the Price
equation deployed in multi-level selection theory (i.e., equation (3.4.3)) with the most
general version of Hamilton’s rule deployed in kin selection theory (see ‘HRG’ in Chapter
4). Yet, for those of us who believe there ought to be room for both approaches in
mainstream social evolution theory, this ‘equivalence’ result is a double-edged sword. On
the plus side, if kin and group selection are, at a very general level, formally equivalent,
then it is futile to argue over which approach is objectively superior in general: we can let
both flowers bloom, at least in principle. But the downside is that theorists who have
hitherto completely ignored one or other of these approaches will consider themselves
entitled to carry on doing so. This is often the case in the current kin selection literature,
where multi-level selection is typically introduced as a ‘formally equivalent’ alternative
only so that it can be set aside for serious explanatory purposes (see West et al. 2007a,
2008; Bourke 2011; Birch 2012b). The theoretical pluralist faces a new challenge: given that,
at a very general level, the partitions of the Price equation employed by kin and group
selection theory are formally equivalent, how exactly do the theories differ in a way that
justifies us in retaining and developing both? Is there still any interesting methodological
difference between the two approaches, or should we regard them as nothing more than
redundant notational variants of the same theory and jettison one for the sake of the other?

One often encounters the suggestion that the theories of kin and group selection, though
formally equivalent, offer usefully different ‘perspectives’ on social evolution. The
apparent implication is that, while they may not constitute substantively different theories
as to how social evolution proceeds, the differences between them are not simply
notational. But how exactly should we cash out this idea? One might assume that the
difference lies in the fact that kin selection, in contrast to group selection, provides a
fundamentally individualist perspective on social evolution—a perspective that puts the
individual at the centre of the analysis, and avoids any sorting of organisms into groups.
But I think this is a mistake, and the preceding sections illustrate why. Though it is
possible to formulate the basic idea of kin selection in strictly individualist terms (see
Gardner et al. 2011; see also Chapter 4), detailed modelling of kin selection almost never
holds to strict individualist scruples. When applying kin selection theory to particular
problems, theorists almost always sort organisms into genotypic classes, developmental
classes, or both.

With this in mind, I suggest that one fundamental difference between kin- and group-
selectionist approaches to social evolution consists not in whether one sorts organisms into
subsets for the purposes of analysis, but how. A kin selection analysis prioritizes
considerations of genetic and developmental similarity in assigning organisms to subsets.
Organisms are indexed to genotypic and developmental classes, so that variation within
genotypic classes and between developmental classes can be ignored. The kin selectionist
then studies the ways in which organisms of different classes interact with one another,
with a view to ascertaining which genotypes are likely to be favoured by selection within
each developmental class. By contrast, a multi-level analysis prioritizes considerations of
causal interaction from the beginning. Organisms are indexed to a particular subset on the
grounds that they interact only with the other members of that subset. Considerations of
genetic and developmental similarity are thus trumped by considerations of causation: if
two organisms have the same genotype or belong to the same developmental class but
never interact with one another, the group selectionist will not assign them to the same
subset. The group selectionist then analyses the effects of selection in two parts. She
studies how within-group differences in character cause differences in individual fitness,
and studies separately how differences in the average character of groups cause
differences in their average fitness.

This contrast shows how the frameworks of kin and multi-level selection can be something
less than rivals, but something more than redundant systems of notation. The differences
between the approaches are not particularly substantive, because we will often be able to
switch from a kin selectionist analysis to a group selectionist analysis simply by re-
labelling the individuals in the population, in the same way that we can switch
perspective as we view a duck-rabbit, or a Necker cube (cf. Dawkins 1982; Godfrey-Smith
and Kerr 2002, forthcoming). But the differences between the two frameworks are not
merely notational, because this form of re-indexing will not always be possible. We can
identify, at least in the abstract, cases in which only one of the two methods of indexing
will work.19

19 In Chapter 5, I make a similar point regarding the ‘neighbour-modulated fitness’ and ‘inclusive fitness’
frameworks within kin selection theory. Here, I am considering ‘kin selection theory’ in the broadest
possible sense, glossing over the interesting differences between alternative kin selectionist approaches.

The multi-level method of indexing will not always be available, because the relation ‘x has its
fitness affected by the character of y’ is not always reflexive, transitive and symmetric
(Godfrey-Smith 2006, 2008). In other words, it is not always possible to sort organisms into
non-overlapping subsets such that each subset contains all and only those organisms which
engage in fitness-affecting interactions with other members of the subset. So-called
‘neighbour-structured’ populations, in which each organism interacts with its nearest
neighbours, often resist partition in this way, since it is not true in general that my neighbours’
neighbours are my neighbours (Maynard Smith 1964, 1976, 1987, 2002; Godfrey-Smith
2006, 2008). For a simple illustration (adapted from Godfrey-Smith 2006), imagine a
population with a spatial structure that can be represented by a square lattice. Each
organism occupies one node on the lattice, and no node is unoccupied; and each organism
interacts with all and only those organisms on the four adjacent nodes. In this scenario, ‘x
has its fitness affected by the character of y’ is not transitive: it is not true in general that, if
some organism A has its fitness affected by an organism B, and B has its fitness affected by
a third organism C, then A has its fitness affected by C (in fact, this is true only if A = C).
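
The failure of transitivity can be verified by brute force on a small lattice. In the Python sketch below, the lattice side length, the absence of wrap-around at the edges, and the decision to count each organism as affected by its own character are all assumptions made for the illustration.

```python
from itertools import product

N = 4  # hypothetical side length of the square lattice
nodes = list(product(range(N), repeat=2))


def adjacent(x, y):
    """True if y occupies one of the four nodes adjacent to x (no wrap-around)."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1]) == 1


def affected_by(x, y):
    """x has its fitness affected by the character of y (itself or a neighbour)."""
    return x == y or adjacent(x, y)


# Triples (A, B, C) in which A is affected by B and B by C, but A is not affected by C.
violations = [
    (a, b, c)
    for a, b, c in product(nodes, repeat=3)
    if affected_by(a, b) and affected_by(b, c) and not affected_by(a, c)
]

# For chains through a distinct neighbour B, A is affected by C only when A and C coincide.
only_when_identical = all(
    affected_by(a, c) == (a == c)
    for a, b, c in product(nodes, repeat=3)
    if adjacent(a, b) and adjacent(b, c)
)
```

The second check succeeds because two distinct nodes that share a neighbour on a square lattice are never themselves adjacent, which is why no assignment of organisms to non-overlapping trait-groups can respect every interaction.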

On the face of it, the equivalence relations we need in order to individuate genotypic and
developmental classes (e.g., ‘x has the same breeding value as y’, ‘x is the same age as y’) seem
less likely to fail the conditions of reflexivity, symmetry and transitivity. But the kin
selection method of indexing might still lead to difficulties, particularly when
developmental classes are employed. For recall that, for equation (3.4.6) to apply, we need
to be able to identify developmental classes such that there is no genetic variation between
classes with respect to the character of interest, and this may not always be possible. For
instance, we can imagine cases in which a population is age-structured, and in which
bearers of a particular allele at a relevant locus tend to die off more rapidly than non-
bearers—so that the genotypic composition of the older classes differs from the genotypic
composition of the younger classes. In such cases, we could still assign organisms to
classes, but we would not be able to eliminate the between-class portion of the overall w/g
covariance. As a result, we would not be able to employ any methods of analysis that start
from equation (3.4.6), including the usual methods of neighbour-modulated and inclusive
fitness analysis for class-structured populations (see Taylor 1990; Taylor and Frank 1996;
Frank 1998; Taylor et al. 2007; see also Chapter 5).
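
A toy calculation shows how the difficulty arises. In the Python sketch below, the age classes and all numerical values are invented: bearers of the allele (g = 1) are assumed to be rarer among the old, so the classes differ in mean breeding value and the between-class term no longer vanishes.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Hypothetical age classes of equal size; each entry is (fitness w, breeding value g).
young = [(1.0, 0.0), (2.0, 1.0), (1.0, 0.0), (2.0, 1.0)]
old = [(0.5, 0.0), (0.5, 0.0), (0.5, 0.0), (1.0, 1.0)]  # bearers have died off
classes = [young, old]

everyone = young + old
total = cov([w for w, _ in everyone], [g for _, g in everyone])

# Equal class sizes, so the size-weighted average reduces to a plain mean.
within = mean([cov([w for w, _ in c], [g for _, g in c]) for c in classes])

# Between-class term: covariance of the class means of fitness and breeding value.
W = [mean([w for w, _ in c]) for c in classes]
G = [mean([g for _, g in c]) for c in classes]
between = cov(W, G)
```

With these values, `total` is 0.2265625 while `within` is only 0.171875; the shortfall of 0.0546875 is exactly the between-class term, so an analysis that begins from equation (3.4.6) would misstate the overall covariance.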

Can we say anything about which method of grouping is in general preferable, when both
options are available? Partisans on both sides are likely to insist that their preferred
equivalence relations are more ‘natural’ or ‘meaningful’ than the alternative: the ardent
kin selectionist will claim that classes defined by genetic and developmental similarity are
more natural than groups defined by patterns of causal interaction; the ardent group
selectionist will claim the reverse. But it is hard to see how either party could substantiate
its claim to superior ‘naturalness’. Both approaches involve abstracting, from the great
milieu of objective relations in which organisms stand to one another, some subset of
relations that is particularly salient to the problem at hand. Which approach is superior in
any particular instance is likely to depend on the precise nature of the problem. This
conclusion, of course, is in keeping with the modest brand of evolutionary nominalism
advanced in Section 3.4.3.


FOUR

The Scope and Limits of Hamilton’s Rule

Chapter 2 examined the ecological aspects of social behaviour, and the concepts we use to
sort behaviours into types. Chapter 3 examined an abstract formalism for the
representation of natural selection—the Price formalism—that provides the foundation for
much of contemporary social evolution theory, but which by itself says nothing in
particular about the evolution of social behaviour. The most important bridge from the
abstract world of population genetics to the real world of behavioural ecology is
Hamilton’s rule, a deceptively simple statement of the conditions under which we can
expect a social behaviour to be favoured by natural selection. The rule states, broadly
speaking, that a social behaviour will be favoured by natural selection if and only if
$rb - c > 0$, where $b$ represents the benefit the behaviour confers on the recipient, $c$
represents the cost it imposes on the actor, and $r$ represents the relatedness between actors
and recipients. To describe Hamilton’s (1964) presentation of the rule as ‘enormously
influential’ would be an understatement: 48 years and 9056 citations later,2 Hamilton’s rule
remains a result of paramount importance both to theorists, for whom it is the
foundational principle of kin selection theory, and to field biologists, for whom it is a
versatile rule of thumb with which to rationalize social behaviours observed in the wild.
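
Used as a rule of thumb, the condition is simple to evaluate. The Python sketch below is only an illustration of the inequality as stated here: the function name and the numerical values of b and c are hypothetical, and the relatedness values assume the textbook figures for outbred diploid full siblings (0.5) and first cousins (0.125).

```python
def hamilton_rule_favours(r, b, c):
    """Return True when r * b - c > 0, the condition stated in the text."""
    return r * b - c > 0


# Hypothetical benefit and cost, in units of offspring.
benefit, cost = 3.0, 1.0

favoured_between_siblings = hamilton_rule_favours(r=0.5, b=benefit, c=cost)   # 0.5 * 3 - 1 = 0.5
favoured_between_cousins = hamilton_rule_favours(r=0.125, b=benefit, c=cost)  # 0.125 * 3 - 1 = -0.625
```

The same behaviour, with the same cost and benefit, is favoured when directed at siblings but not when directed at cousins; the sketch says nothing about how b, c and r are to be measured, which is where the controversies discussed in this chapter begin.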

Yet despite (or perhaps, in part, because of) its great influence, Hamilton’s rule has proved
a powerful magnet for controversy and debate.3 The reason, in a nutshell, is that Hamilton
first derived the rule in a one-locus population-genetic model that made a number of
substantive modelling assumptions, including weak selection, fair meiosis, random
mating, the absence of mutation and the additivity of genic effects on fitness. In the
following decades, many theorists (including Hamilton himself) explored the extent to
which these assumptions could be relaxed. The upshot was a variety of routes to
‘$rb - c > 0$’-type results, often with apparently incompatible implications about the
conditions under which the result obtains.4

2 According to Google Scholar, as of 03/09/12.
3 For a recent example, see Nowak, Tarnita and Wilson’s (2010) incendiary claim that Hamilton’s rule
‘almost never holds’, a claim fiercely rebutted by 157 social evolution theorists (Abbot et al. 2011). I discuss
this particular controversy in detail in a separate article (Birch forthcoming). That article can be seen as a
companion piece to this chapter, since they address closely-related issues.
4 See, e.g., Hamilton 1970, 1972, 1975; Levitt 1975; Orlove 1975; Charnov 1977; Charlesworth 1980;
Uyenoyama and Feldman 1980, 1981, 1982; Uyenoyama et al. 1981; Michod 1982; Toro et al. 1982; Cheverud
1984; Grafen 1984, 1985a.

The Price formalism provides a route to a particularly general formulation of Hamilton’s
rule (Hamilton 1970; Queller 1985, 1992a, b, 2011; Grafen 1985a, b; Frank 1998; Gardner et
al. 2007; McElreath and Boyd 2007; Wenseleers et al. 2010; Gardner et al. 2011; Birch
forthcoming). Yet even among proponents of the Price formalism, disagreement persists
regarding the rule’s scope and limits. In particular, a dispute about the consequences of
synergy for Hamilton’s rule has divided two of the most influential living theorists of
social evolution, David C. Queller of Washington University in St Louis and Alan Grafen
of the University of Oxford, for almost thirty years. In the 1980s, Queller argued that the
familiar ‘$rb - c > 0$’ form of Hamilton’s rule must be extended in cases of synergistic
interaction between social partners (that is, cases in which the effect of two individuals
performing a social behaviour differs from the sum of the two effects each behaviour
would have had in the absence of the other) (Queller 1984, 1985). Grafen replied that this
was simply incorrect: the standard version of the rule still holds, he argued, regardless of
whether synergy is present or absent (Grafen 1985a, b). One might expect this to be a
mathematical issue that could be resolved quickly and definitively, one way or the other.
For reasons discussed below, however, matters are not so straightforward and, three
decades on, the two theorists remain unchanged in their views (Queller, personal
communication; Grafen, personal communication).
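
The notion of synergy at issue can be stated as a deviation from additivity. The Python sketch below is an illustrative reading of the parenthetical definition given above, not a reconstruction of Queller’s or Grafen’s models: the function `synergy` and both payoff functions are hypothetical, with arguments equal to 1 if the corresponding individual performs the behaviour and 0 otherwise.

```python
def synergy(outcome):
    """Joint effect of both acts minus the sum of the effects of each act alone.

    Effects are measured relative to the baseline outcome(0, 0), so the
    result is outcome(1, 1) - outcome(1, 0) - outcome(0, 1) + outcome(0, 0).
    """
    baseline = outcome(0, 0)
    joint_effect = outcome(1, 1) - baseline
    effect_of_first_alone = outcome(1, 0) - baseline
    effect_of_second_alone = outcome(0, 1) - baseline
    return joint_effect - (effect_of_first_alone + effect_of_second_alone)


def additive(a, b):
    # Hypothetical payoff in which the two acts contribute independently.
    return 1.0 + 2.0 * a + 3.0 * b


def synergistic(a, b):
    # Hypothetical payoff with an extra 1.5 when both acts occur together.
    return 1.0 + 2.0 * a + 3.0 * b + 1.5 * a * b


no_synergy = synergy(additive)
positive_synergy = synergy(synergistic)
```

In the additive case the deviation is zero; in the second case it equals the coefficient on the interaction term. The dispute concerns whether a non-zero deviation of this kind requires a further term in the rule or can be absorbed into b and c.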

The dispute has taken on broader significance in the intervening years, since the question
of whether Hamilton’s rule can or cannot accommodate synergistic interactions now lies at
the heart of a heated and ongoing debate concerning its usefulness and generality (see
Fletcher and Zwick 2006; Gardner et al. 2007; van Veelen 2009; Nowak et al. 2010; Gardner
et al. 2011; Marshall 2011b; van Veelen et al. 2012; Birch forthcoming). In recent literature,
Grafen’s insistence that the standard version of Hamilton’s rule can accommodate synergy
has been picked up and defended at length by a number of his Oxford colleagues (Gardner

et al. 2007; Gardner et al. 2011), while Queller’s note of caution on this score has (to
Queller’s evident frustration5) been picked up by theorists who would like to see the rule
expunged altogether from serious theorizing (van Veelen 2009, 2011; van Veelen et al.
2012).

In this chapter, I dissect the long-running debate regarding the validity of Hamilton’s rule
in cases of synergy, and I propose an irenic resolution that identifies what is right about
both positions in the original Queller/Grafen dispute. In Section 4.1, I provide the

necessary theoretical background, presenting a derivation of Hamilton’s rule from the


Price equation that closely follows that of Queller (1985). In Section 4.2, I explain the
problem synergy poses for the formulation of the rule derived in the preceding section. In
Section 4.3, I critically examine Queller’s proposed extension to the rule and more recent
developments in a similar vein; in Section 4.4, I critically examine ways of preserving the
original ‘ rb − c > 0 ’ form of the rule in the tradition of Grafen 1985b. Finally, in Section 4.5, I
propose a resolution to the debate. I argue that, in choosing whether to employ generalized
(two-term) or extended (three-or-more-term) versions of Hamilton’s rule in social
5 See his recent article on Joan Strassmann’s ‘Sociobiology’ blog (URL:
http://sociobiology.wordpress.com/2012/04/06/agreement-and-disagreement-in-social-evolution-insight-
from-david-queller/ [Accessed 03/09/12]).


evolution theory, we inevitably face a trade-off between conceptual unification and causal
explanation: generalized formulations buy unification at the expense of causal content,
while extended formulations buy causal content at the expense of unifying power. The
upshot is that the generalized and extended formulations of Hamilton’s rule are apt to
perform different theoretical functions, and may peacefully coexist. Problems arise only
when we choose the wrong formulation for the task at hand.

4.1 The regression route to Hamilton’s rule

4.1.1 Regression analysis of partial change

Partitioning the Price equation


The Price equation is a useful starting point for the analysis of social evolution (and,
indeed, evolution more generally) chiefly because the covariance operator is bilinear, or
linear in both arguments. This implies that, if we can write either one of the arguments as a
function of other variables, we can partition the overall covariance into a sum of
components, one for each of the terms in that function (for instance,

Cov (α X + βY , Z ) = α Cov ( X , Z ) + β Cov (Y , Z ), where α and β are constants). To put the


point in less abstract terms: if we can express an individual’s fitness as a function of
relevant phenotypic predictors weighted by appropriate coefficients, then, by substituting
this function into the Price equation, we can partition the overall w–g covariance into a sum
of components, one for each of those predictors (see Lande and Arnold 1983; Queller
1992a, b).

Here is a simple illustration. We start with equation (3.3.3), which expresses the partial
change due to the primary effect of natural selection as the covariance between an
individual’s fitness, w, and its breeding value, g:

\[ \Delta_{1^{\circ}}\bar{g} = \frac{1}{\bar{w}}\,\mathrm{Cov}(w, g) \]


Next, we express w as a function of an individual’s value for some phenotypic character, or


set of characters. Let us suppose that w depends on two phenotypic characters, z1 and z2 ,
and that it depends on them in a perfectly linear way:

w = α z1 + β z2

Substituting this expression into the Price equation, and exploiting the bilinearity of
covariance, we obtain:

\[ \Delta_{1^{\circ}}\bar{g} = \frac{1}{\bar{w}}\left[\alpha\,\mathrm{Cov}(z_1, g) + \beta\,\mathrm{Cov}(z_2, g)\right] \]

This resolves the overall primary effect of natural selection into a term that depends on
differences in z1 and a term that depends on differences in z2 . Of course, the true fitness
function will rarely (if ever) be this simple, and we have not yet introduced social effects;
the purpose of the above example is merely to illustrate the bilinearity of covariance and
the partitions of the Price equation it facilitates.
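
The bilinearity step can be checked numerically. What follows is only a minimal sketch in Python with NumPy, not part of the original analysis: the population size, the weights α = 0.3 and β = 0.7, the way g, z1 and z2 are generated, and the helper name `cov` are all illustrative assumptions.

```python
import numpy as np

def cov(x, y):
    """Population covariance: mean of products minus product of means."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean(x * y) - np.mean(x) * np.mean(y))

rng = np.random.default_rng(0)            # fixed seed so the run is repeatable
n = 1_000                                 # hypothetical population size
g = rng.uniform(0.0, 1.0, n)              # stand-in breeding values
z1 = g + rng.uniform(0.0, 1.0, n)         # two characters, each correlated with g
z2 = 0.5 * g + rng.uniform(0.0, 1.0, n)

alpha, beta = 0.3, 0.7                    # assumed fitness weights
w = alpha * z1 + beta * z2                # perfectly linear fitness function

whole = cov(w, g)                                  # Cov(w, g)
parts = alpha * cov(z1, g) + beta * cov(z2, g)     # sum of the two components
print(whole, parts)                       # the two agree up to rounding error
```

The agreement does not depend on the particular numbers chosen: it holds for any data and any constant weights, which is what licenses the partition.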

Residuals
It would be naïve to suppose that, in any real biological context, we could read off the
exact fitness of each individual in a population simply by plugging its phenotypic
characters into a linear fitness function. Usefully, however, the models of fitness we use in
the analysis of social evolution do not have to be idealized in this way. We can relax this
idealization by adding to our fitness function a residual term ( ε w ) that, for any individual,
represents the portion of its fitness that is not predicted by the other terms in the equation.
For example:

w = α z1 + β z2 + ε w


Or, more generally, for n phenotypic characters, each weighted by a coefficient, βi :

\[ w = \sum_{i=1}^{n} \beta_i z_i + \varepsilon_w \]

Of course, by adding a residual term, we make any fitness function true by definition:
there is no individual in any population, whether real or modelled, for which the equation
is false. The fitness function will fit some populations better than others, however, and this
will be reflected in the size of the residual terms. If the residuals are typically negligible,
we have probably captured all the significant influences on fitness; if the residuals are
typically enormous, we probably have not.

Partial regression coefficients


In the above fitness function, each phenotype, zi , is weighted by a coefficient, βi . While
these coefficients could in principle be assigned arbitrarily or by guesswork, they are
usually assigned in a more principled way. For any given set of phenotypic characters we
might use as predictors of fitness, {z1 , z2 ,..., zn } , there will be some corresponding set of
weighting coefficients, {β1 , β2 ,..., βn} , that minimizes the sum-of-squares of the residuals for
the population. These special coefficients are known as the partial regression coefficients

for this predictor set. Intuitively, weighting the phenotypic predictors by partial regression
coefficients provides the overall ‘best fit’ between the variable we want to predict (i.e.,
fitness) and the variables doing the predicting (i.e., the phenotypic characters).
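
To make the least-squares idea concrete, here is a hedged sketch of how partial regression coefficients might be obtained for simulated data. The function name `partial_regression`, the three hypothetical characters and their weights are assumptions made for illustration; the sketch also fits a constant (intercept) term, which the fitness functions in the text leave implicit, so that the residuals average to zero.

```python
import numpy as np

def partial_regression(w, predictors):
    """Least-squares regression of w on the predictor columns.

    Returns (intercept, partial regression coefficients, residuals).
    """
    w = np.asarray(w, dtype=float)
    X = np.column_stack([np.ones(len(w)), np.asarray(predictors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)   # minimizes the sum of squared residuals
    residuals = w - X @ coef
    return coef[0], coef[1:], residuals

rng = np.random.default_rng(1)
n = 500                                            # hypothetical population size
z = rng.normal(size=(n, 3))                        # three hypothetical phenotypic characters
w = 1.0 + z @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.3, size=n)

intercept, betas, eps = partial_regression(w, z)

# Any other weighting of the same predictors leaves a larger sum of squares.
perturbed = w - intercept - z @ (betas + np.array([0.05, 0.0, 0.0]))
print(np.sum(eps ** 2) < np.sum(perturbed ** 2))
```

Nudging any coefficient away from its fitted value can only increase the residual sum of squares, which is the sense in which these coefficients give the ‘best fit’.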

The numerical values of the partial regression coefficients are not, in the general case, easy
to compute, since to do so we need to solve a set of n simultaneous equations, one for each
predictor, and this task rapidly becomes very demanding as n increases (Ewens 2010). This
does not, however, prevent partial regression coefficients from playing a fundamental role
in the context of general, abstract theorizing about the nature of the evolutionary process, a
context in which theoretical rigour and conceptual clarity often take precedence over
tractable number-crunching (Fisher 1930; Ewens 2010; Gardner et al. 2011). We have


already encountered one important theoretical role for partial regression coefficients: they
feature in the definition of a breeding value. By weighting allelic values by partial
regression coefficients taken with respect to the phenotypic character of interest, we (by
definition) obtain the best possible prediction of an individual’s phenotype from a linear
combination of its alleles, and this quantity, though difficult to compute exactly, is
theoretically useful (see Chapter 2). For similar reasons, partial regression coefficients are
also commonly used to weight phenotypic predictors in theoretical analyses of social
evolution, where we want to define the best possible prediction of an individual’s fitness
from its behavioural phenotype.

If we want to predict fitness from two predictors or fewer, the relevant partial regression
coefficients are much more easily computed. For a single predictor, ‘partial regression’
gives way to simple regression, and the simple regression of Y on X can be expressed as a
ratio of the covariance between Y and X to the variance in X:6

\[ \beta_{Y,X} = \frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}(X)} \]

For two predictor variables ( X 1 and X 2 ), the partial regression coefficients can be computed

from the corresponding simple regressions by means of the following formulae (Lande and
Arnold 1983; Queller 1992a; Gardner et al. 2011):7

6 Here I introduce a common notation for simple regression coefficients, on which βY , X represents the simple

regression of Y on X.
7 Here I introduce a common notation for partial regression coefficients, on which $\beta_{Y,X_1 \mid X_2}$
represents the partial regression of Y on $X_1$, correcting for correlations with $X_2$. $\rho_{X_1,X_2}$ is the
Pearson correlation coefficient between $X_1$ and $X_2$
($\rho_{X_1,X_2} = \mathrm{Cov}(X_1, X_2)/\sqrt{\mathrm{Var}(X_1)\,\mathrm{Var}(X_2)}$).


\[ \beta_{Y,X_1 \mid X_2} = \frac{\beta_{Y,X_1} - \beta_{Y,X_2}\,\beta_{X_2,X_1}}{1 - \rho_{X_1,X_2}^{2}} \]

\[ \beta_{Y,X_2 \mid X_1} = \frac{\beta_{Y,X_2} - \beta_{Y,X_1}\,\beta_{X_1,X_2}}{1 - \rho_{X_1,X_2}^{2}} \]

These formulae, though specific to the two-predictor case, shed broader light on what
partial regression coefficients are, and how they work. Importantly, although the partial
regression of a dependent variable, Y, on a predictor, X 1 , is sometimes glossed as a
measure of the extent to which differences in X 1 predict differences in Y when we ‘control
for’ correlated variables, talk of ‘control’ in this context is usually misleading. There is
often no literal sense in which correlated variables are being ‘controlled’ or ‘held fixed’ by
the analyst when partial regression coefficients are computed. Rather, the ‘controlling’
occurs purely in the statistics: we take the simple regression of Y on X 1 and adjust it,
subtracting the portion of the overall association between Y and X 1 that is accounted for by
correlation between X 1 and the other predictors. The partial regression of Y on X 1 is
therefore more accurately described as a measure of the extent to which differences in X 1
predict differences in Y when we ‘correct for’ or ‘adjust for’ correlations among predictors.
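
The two-predictor formulae can be checked against a direct least-squares fit. The sketch below is illustrative only: the data are simulated, the helper names (`cov`, `simple_beta`, `partial_betas_two`) are introduced here, and the generating coefficients 2.0 and −1.0 are arbitrary assumptions.

```python
import numpy as np

def cov(x, y):
    """Population covariance."""
    return float(np.mean(x * y) - np.mean(x) * np.mean(y))

def simple_beta(y, x):
    """Simple regression of y on x: Cov(y, x) / Var(x)."""
    return cov(y, x) / cov(x, x)

def partial_betas_two(y, x1, x2):
    """Partial regressions of y on x1 and on x2, built from simple regressions."""
    rho_sq = cov(x1, x2) ** 2 / (cov(x1, x1) * cov(x2, x2))
    b1 = (simple_beta(y, x1) - simple_beta(y, x2) * simple_beta(x2, x1)) / (1 - rho_sq)
    b2 = (simple_beta(y, x2) - simple_beta(y, x1) * simple_beta(x1, x2)) / (1 - rho_sq)
    return b1, b2

rng = np.random.default_rng(2)
n = 2_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)               # predictors are correlated
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

b1, b2 = partial_betas_two(y, x1, x2)

# Cross-check: ordinary least squares with an intercept on both predictors.
X = np.column_stack([np.ones(n), x1, x2])
direct, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b1, b2, direct[1:])                         # same values, two routes
print(simple_beta(y, x1))                         # the unadjusted simple regression differs
```

No variable is held fixed at any point in this computation; the ‘correction’ is arithmetic performed on the simple regressions, as the text describes.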

One final preliminary is needed: it will be convenient to introduce a symbol, $\mathbf{z}$,
representing the complete set of phenotypic predictors we want to use in a particular
regression analysis of fitness:

\[ \mathbf{z} = \{z_1, z_2, z_3, \ldots, z_n\} \]

Naturally, the contents of $\mathbf{z}$ will vary greatly depending on the population and process in
question. We may often want to regress fitness on only one or two phenotypic predictors,
but in principle there is no limit to the number we could use.


4.1.2 The phenotypic formulation of Hamilton’s rule (HRP)

When individuals do not interact socially, a causal analysis of the effects of natural
selection can proceed solely by analysing how an individual’s fitness depends on its own
phenotypic characters. Correlations between characters will often matter to the overall
direction of selection—and these correlations need to be taken into account in the
analysis—but the characters in question will typically be intrinsic properties of the
individual whose fitness they predict (Lande and Arnold 1983). By contrast, when
individuals do interact socially, extrinsic properties matter too: to capture the causal
influences on an individual’s fitness with any degree of accuracy, we need to take into
account not merely its own intrinsic character, but also its social milieu. We therefore need
to include at least two phenotypic predictors in our predictor set. In addition to z, the focal
individual’s phenotypic value for the social trait under investigation,9 we at the very least
need to consider ẑ , the average phenotypic value of its social partners:

\[ \mathbf{z} = \{z, \hat{z}\} \]

Partitioning the Price equation using this predictor set takes us directly to a useful version
of Hamilton’s rule (Queller 1992a). First, we express fitness as a sum of these predictors,
weighted by the relevant partial regression coefficients (and including a residual term, ε w ):

\[ w = \beta_{w,z \mid \hat{z}}\, z + \beta_{w,\hat{z} \mid z}\, \hat{z} + \varepsilon_w \tag{3.1.1} \]

Recall that, owing to the residual term, this equation cannot be false of any individual in
any population, though it may fit some populations much better than others. We then
substitute this equation into equation (2.3.3), yielding the following partition:

9 By ‘focal individual’, I mean the arbitrary individual whose fitness we are attempting to predict, as
opposed to its social partners (Rousset 2004).


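\[ \Delta_{1^{\circ}}\bar{g} = \frac{1}{\bar{w}}\left[\beta_{w,z \mid \hat{z}}\,\mathrm{Cov}(z, g) + \beta_{w,\hat{z} \mid z}\,\mathrm{Cov}(\hat{z}, g) + \mathrm{Cov}(\varepsilon_w, g)\right] \]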

To get from this three-term partition of the Price equation to Hamilton’s rule, we need to
eliminate the third term—that is, we need to assume that Cov (ε w , g ) = 0 . Queller (1992a)
terms this assumption the ‘separation condition’ on the grounds that, if it obtains, the two-
term partition of the Price equation that is left behind fully separates quantities that relate
genotype to phenotype from quantities that relate phenotype to fitness.

There is some confusion in the literature as to whether or not the separation condition
amounts to a substantive assumption. Queller (1992a) claims that it does, and I agree. It is

true enough that least-squares theory guarantees that the residuals in a regression equation
cannot co-vary with any of the predictors (a point I revisit in Section 6.4). At first glance,

therefore, one might take it to be trivial that $\mathrm{Cov}(\varepsilon_w, g) = 0$. Crucially, however, g is not
among the predictors of fitness in our analysis: our predictors are phenotypes, whereas g is
a breeding value. There can be no formal guarantee that a variable outside our predictor
set will not co-vary with the residuals, so there can be no formal guarantee that g will not
co-vary with ε w when the predictors of fitness are phenotypes. The separation condition
thus amounts to a substantive assumption. The circumstances under which it obtains are

discussed at length by Queller (1992a), and will also be considered in detail in Section 4.2.

If the separation condition is indeed satisfied, we obtain the following simplified partition:

\[ \Delta_{1^{\circ}}\bar{g} = \frac{1}{\bar{w}}\left[\beta_{w,z \mid \hat{z}}\,\mathrm{Cov}(z, g) + \beta_{w,\hat{z} \mid z}\,\mathrm{Cov}(\hat{z}, g)\right] \]

This result entails the following condition for a social trait to be favoured by the primary
effect of natural selection10:

10 Here I use ‘iff’ in the standard philosophical sense, as an abbreviation for ‘if and only if’.


\[ \Delta_{1^{\circ}}\bar{g} > 0 \quad \text{iff} \quad \beta_{w,z \mid \hat{z}}\,\mathrm{Cov}(z, g) + \beta_{w,\hat{z} \mid z}\,\mathrm{Cov}(\hat{z}, g) > 0 \]

Owing to the definition of breeding value, $\mathrm{Cov}(z, g) = \mathrm{Var}(g)$.11 Since variance cannot be
negative (and is strictly positive whenever breeding values vary at all), we can divide the
right-hand side of this inequality through by $\mathrm{Var}(g)$ without any risk of reversing its
sign. The result is the following rule:

Δ1 g > 0!!iff!!β w ,z ẑ + β w ,ẑ z


( ) >0
Cov ẑ, g (3.1.2)
Var ( g )
!
This result has the form of Hamilton’s rule. The first term ($\beta_{w,z \mid \hat{z}}$) measures the association
between a social behaviour and the fitness of the actor who performs it, and (with its sign
reversed) can be regarded as a generalized measure of the cost of that behaviour. The first
part of the second term ($\beta_{w,\hat{z} \mid z}$) measures the association between an individual’s fitness and
the behaviour of its social partners, and can be regarded as a generalized measure of the
benefit an individual receives from its neighbours. This quantity is weighted by a third
coefficient ($\mathrm{Cov}(\hat{z}, g)/\mathrm{Var}(g)$ or, equivalently, $\beta_{\hat{z},g}$) that measures the overall association
between the breeding value of an individual and the character of its social partners. This
can be regarded as a generalized measure of the ‘relatedness’ between actors and
recipients (Orlove and Wood 1978; Michod and Hamilton 1980; Seger 1981; Michod 1982;
Queller 1985; Grafen 1985a). Hence, to get from inequality (3.1.2) to the more familiar
rb − c > 0 form of Hamilton’s rule, we need only relabel the coefficients as follows:

\[ \beta_{w,z \mid \hat{z}} = -c, \qquad \beta_{w,\hat{z} \mid z} = b, \qquad \frac{\mathrm{Cov}(\hat{z}, g)}{\mathrm{Var}(g)} = r \]

Yielding:

11 By definition, the simple regression of a phenotypic character on its breeding value is 1 (if it were anything
other than 1, the breeding value would not be the best possible prediction of the phenotype from a linear
combination of allelic values, since we could improve the prediction by multiplying through by $\beta_{z,g}$); and
$\beta_{z,g} = \mathrm{Cov}(z, g)/\mathrm{Var}(g) = 1$ implies that $\mathrm{Cov}(z, g) = \mathrm{Var}(g)$.


\[ \Delta_{1^{\circ}}\bar{g} > 0 \quad \text{iff} \quad rb - c > 0 \tag{HRP} \]

For reasons that will become clearer later on, I will refer to this as the phenotypic
formulation of Hamilton’s rule (HRP). Note that, although talk of costs and benefits
intuitively connotes that costs will detract from an agent’s fitness while benefits increase it,
this need not be the case: the rule is intended to apply regardless of the sign of b or c.
Hence, while the rule is most often associated with the evolution of cooperation (for which
b is positive) and the evolution of altruism (for which b and c are both positive), selfish,
spiteful and mutualistic behaviours are also intended to fall within the scope of Hamilton’s
rule (see Hamilton 1964; Trivers 1985; Bourke and Franks 1995; West et al. 2007; Bourke

2011).
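
The derivation of HRP can be traced through on simulated data. The following is a rough sketch under stated assumptions, not a model taken from the text: four loci with made-up allelic effects, partners that share the focal individual's genotype with probability ½, an additive fitness function with arbitrary coefficients, and helper names (`cov`, `fit`) introduced here. The breeding value g is computed as the least-squares prediction of z from the focal individual's alleles, so that Cov(z, g) = Var(g) holds by construction.

```python
import numpy as np

def cov(x, y):
    """Population covariance."""
    return float(np.mean(x * y) - np.mean(x) * np.mean(y))

def fit(y, X):
    """Least squares with an intercept; returns (slopes, fitted values)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:], A @ coef

rng = np.random.default_rng(3)
n, loci = 5_000, 4
effects = np.array([0.4, 0.3, 0.2, 0.1])             # assumed allelic effects

alleles = rng.integers(0, 3, size=(n, loci)).astype(float)   # 0, 1 or 2 copies per locus
same = rng.random(n) < 0.5                            # partner shares genotype half the time
partner_alleles = np.where(same[:, None], alleles,
                           rng.integers(0, 3, size=(n, loci)).astype(float))

z = alleles @ effects + rng.normal(scale=0.5, size=n)              # focal phenotype
z_hat = partner_alleles @ effects + rng.normal(scale=0.5, size=n)  # partner phenotype
w = 1.0 + 0.5 * z_hat - 0.1 * z + rng.normal(scale=0.1, size=n)    # additive fitness (assumed)

_, g = fit(z, alleles)                                # breeding value for z

(beta_z, beta_zhat), fitted = fit(w, np.column_stack([z, z_hat]))
eps = w - fitted                                      # residual fitness

c, b = -beta_z, beta_zhat                             # relabelled coefficients
r = cov(z_hat, g) / cov(g, g)                         # regression relatedness

lhs = cov(w, g) / cov(g, g)
rhs = r * b - c + cov(eps, g) / cov(g, g)             # rb - c plus the separation term
print(r, b, c, lhs, rhs)
```

The three-term identity (`lhs` equal to `rhs`) holds exactly for any data set; HRP itself additionally requires the final term, Cov(ε_w, g)/Var(g), to vanish. With additive fitness, as assumed here, that term is close to zero in a large sample.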

4.1.3 The regression definition of relatedness

In the above formulation of Hamilton’s rule, relatedness is formally defined as the


regression of an individual’s social partners’ average phenotype on its own breeding value
for the character under study. This regression definition of relatedness is highly abstract
and highly general. It also has several important but rather counterintuitive implications,
and it is worth briefly noting these for future reference. Relatedness coefficients and their
various forms will be discussed in greater detail in Chapter 5.

Relatedness is not genealogical kinship


As we noted in Chapter 1, a high value of r does not require kinship in the traditional,
genealogical sense of the word. What matters is correlation between the breeding value of
the recipient and the phenotype of the actor. Kinship is one way of generating such
correlations, but it is by no means the only way. As Hamilton (1975) himself notes, the
necessary correlation could be ensured by, for instance, genetically-correlated habitat
preferences, or ‘greenbeard’-style recognition mechanisms that allow the bearers of a
particular social gene to seek out and detect other bearers, whether or not they are


genetically similar in other respects (cf. Okasha 2002; Godfrey-Smith 2009; West and
Gardner 2010). If genealogical kinship is the only source of relatedness then, on the
assumption of weak selection,13 relatedness coefficients between classes can be usefully
approximated by traditional pedigree measures (e.g., for diploid individuals: ½ for
offspring and full siblings; ¼ for grandoffspring and half siblings, etc.). But nothing in the
definition of relatedness requires that we measure it this way (Frank 1998; Gardner et al.
2011).14

Relatedness is character-specific
Since breeding values and phenotypic values are strictly character-specific, relatedness too

can be evaluated only relative to a particular character. This raises the conceptual
possibility that two individuals might be closely related with respect to one character, yet
only weakly related with respect to others; greenbeard effects provide one possible
mechanism by which this kind of scenario could arise. It has often been suggested that
such a pattern of relatedness would be unstable over evolutionary time, owing to the
intragenomic conflict to which it could potentially give rise (Ridley and Grafen 1981;
Grafen 1985a; Okasha 2002; Biernaskie et al. 2011). Even so, we should not discount the
possibility of such variation, particularly in microbial populations, where horizontal gene

transfer is known to issue in ephemeral and highly character-specific correlations between


individuals (see Rankin et al. 2011a, b; Birch 2013b; see also Chapter 5).

13 Strong selection distorts family trees, leading to correlations between relatives that differ from traditional
pedigrees (cf. Frank 1998; Gardner et al. 2011). For example, if my siblings would probably have died years
ago had they not inherited a particular gene, then the probability that my living siblings share that gene will
be greater than ½.
14 The possibility of relatedness without genealogical kinship has led to some uncertainty regarding how the
term ‘kin selection’ should be applied. Some authors use the term broadly, to describe any selection process
in which relatedness matters; others reserve the term for only those cases in which relatedness arises
through genealogical kinship. As explained in Chapter 1, I will employ the term in the broader sense.


Relatedness is not ‘shared genes’


Strictly speaking, a high value of r does not even require that the actor and recipient must
share genes for the character under investigation. Recall that, since the breeding value is an
estimate of a phenotypic character on the basis of any relevant alleles, it is possible (at least
in theory) for two organisms to possess the same breeding value while possessing very
different underlying allele combinations—combinations which just happen to have similar
average effects on the phenotypic character in question. Because relatedness is formally
defined in terms of breeding values and phenotypes, and not allelic values, individuals
may, in principle, be closely related according to the regression definition and yet differ
considerably at the more fine-grained level of particular alleles (cf. Fletcher and Doebeli

2009).

Relatedness can be negative


Any regression coefficient can be negative as well as positive, and relatedness, on the
regression definition, is no exception. But in what sort of biological scenario could a
negative value of relatedness arise? It would be one in which social partner phenotypes are
negatively correlated, so that an individual with the genes for the social behaviour under
investigation is less likely than average to interact with a social partner that performs it,

while an individual without the relevant genes is more likely than average to interact with
a social partner that performs it. Hamilton (1970) suggested that such a scenario would be
conducive to the evolution of spite: by inflicting harm on their social partners, even at a
cost to themselves, individuals with the genes for spite could increase the relative
representation of these genes in the next generation. The regression definition of
relatedness allows for such a phenomenon, since, when r is sufficiently negative,
Hamilton’s rule may predict that a social behaviour can be favoured by selection even if its
effects on actor and recipient are also negative (see Gardner and West 2004a, b; West and
Gardner 2010).
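
A small worked example shows how the arithmetic of spite goes through under HRP. The numbers are invented purely for illustration, and the function name `favoured_by_hrp` is an assumption of this sketch.

```python
def favoured_by_hrp(r, b, c):
    """True when rb - c > 0, i.e. when HRP says the trait is favoured."""
    return r * b - c > 0

# Spite: the actor pays a cost (c > 0) and harms its partner (b < 0).
# With negative relatedness: rb - c = (-0.5)(-1.0) - 0.2 = 0.3
print(favoured_by_hrp(r=-0.5, b=-1.0, c=0.2))
# Same behaviour with positive relatedness: rb - c = (0.5)(-1.0) - 0.2 = -0.7
print(favoured_by_hrp(r=0.5, b=-1.0, c=0.2))
```

Only the sign of r differs between the two calls, and that alone reverses the verdict of the rule.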


Relatedness is population-relative
On the regression definition, relatedness is a population-level statistic: it is a measure of
the overall extent to which, in a given population, differences in breeding value predict
differences in social milieu. Any evaluation of relatedness is thus relative to a reference
population. One implication of this is that, strictly speaking, it makes no sense to talk of the
relatedness between one particular individual and its social partner, because regression
coefficients cannot be defined for a single data point. A second implication is that, if we
want to use relatedness as a rough indicator of whether or not altruism is likely to evolve,
the choice of an appropriate reference population is crucial. This point is especially salient
in the case of ‘viscous populations’, in which organisms are confined to a particular locality

and must compete for resources with other nearby organisms. When relatedness is
evaluated relative to the population as a whole, viscosity tends to increase relatedness,
because genealogical kin are confined to the same region. One might intuitively expect,
therefore, that viscosity would be conducive to the evolution of altruism. Yet the fact that r
is high when evaluated relative to the global population does not imply that r will also be
high when evaluated separately for each local subpopulation. Organisms may be
surrounded by kin, but that does not mean they will interact differentially with their
closest kin—and it is differential interaction with genetically similar individuals, relative to

the reference population mean, that matters to the value of r. The upshot of this is that, if
competition within subpopulations is much stronger than competition between them,
altruism might not be favoured after all (Taylor 1992; Queller 1992c). The general moral is
that, if we want to use r-values as a guide to whether altruism will be favoured, we need to
make sure that the reference population with respect to which they are computed is
commensurate with the scale of competition (Queller 1994; West et al. 2002).
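
The dependence of r on the reference population can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions, all of them mine: two demes whose mean breeding values differ, partners drawn at random within each deme, and partner phenotype taken to equal partner breeding value. The names `cov` and `relatedness` are helpers introduced here.

```python
import numpy as np

def cov(x, y):
    """Population covariance."""
    return float(np.mean(x * y) - np.mean(x) * np.mean(y))

def relatedness(z_hat, g):
    """Regression of partner phenotype on focal breeding value."""
    return cov(z_hat, g) / cov(g, g)

rng = np.random.default_rng(4)
n = 20_000                                   # individuals per deme (assumed)
deme_means = (-1.0, 1.0)                     # demes differ in mean breeding value

g_all, zhat_all, local_r = [], [], []
for mu in deme_means:
    g = rng.normal(loc=mu, scale=0.5, size=n)
    partner = rng.permutation(n)             # random partner from the same deme
    z_hat = g[partner]                       # partner phenotype = partner breeding value
    local_r.append(relatedness(z_hat, g))    # r relative to the local subpopulation
    g_all.append(g)
    zhat_all.append(z_hat)

global_r = relatedness(np.concatenate(zhat_all), np.concatenate(g_all))
print(global_r, local_r)
```

Relative to the pooled population, relatedness is high, because partners always come from the same deme; relative to either deme taken alone it is close to zero, because within a deme interaction is random.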


4.2 The problem of synergy

4.2.1 Why synergy matters

Synergy, as I use the term here, refers to any fitness effect that arises from a combination of
social behaviours (performed by two or more individuals) and which is quantitatively
different from (i.e., greater or less than) the sum of the fitness effects that those behaviours
would have had if performed in causal isolation from each other. It is, more informally, a
fitness effect that is either more or less than the sum of its parts, where the parts are the
fitness effects the behaviours in question would have conferred by themselves.

Detecting synergy is not an easy business, partly because detecting quantitative fitness
effects is never an easy business, and partly because we often have no observed instances
of synergy-producing behaviours occurring ‘in causal isolation from each other’ to use as a
contrast class. Nevertheless, there is a widespread consensus that synergy matters in social
evolution—that many important social behaviours generate synergistic benefits, and that
this often helps explain why they evolved in the first place (see especially Queller and
Strassmann 1998; Strassmann et al. 2000; Fletcher and Doebeli 2006; Fletcher and Zwick
2006; Strassmann and Queller 2007, 2011; smith [sic] et al. 2010; Cornforth et al. 2012;
Damore and Gore 2012). Chapter 1 introduced a number of examples of social interaction
that may plausibly be regarded as synergistic, including slime mould aggregation;
collective predation in myxobacteria; bubblenet feeding in humpback whales; and
collective defence, construction and foraging tasks in eusocial insect colonies. Indeed, any
instance of task-based cooperation will count as synergistic if the probability of task
completion is a non-linear function of the number of contributions. In short, therefore,
synergy is rife in nature—not only in the contexts of microbial evolution and of transitions
in individuality, but also in sociobiology’s more traditional entomological heartland. Any
theory of social evolution with aspirations to serious explanatory power should be able to


accommodate synergy and its consequences for social evolution. If HRP cannot, that is a
problem.

4.2.2 Why a one-predictor rule is unreliable when relatives interact

To see why synergy is often thought to present a problem for HRP, it is best to start with a
different question: why do we need HRP at all? Why can we not predict the direction of
selection with a simpler rule that uses a single phenotypic predictor, namely z, the focal
individual’s own value for the character under study? Consider the following regression
equation:

w = βw ,z z + ε w

By substituting this equation into the Price equation and (for now) assuming that

Cov (ε w , g ) = 0 (as in Section 3.2.2), we can derive the following principle:

\[ \Delta_{1^{\circ}}\bar{g} > 0 \quad \text{iff} \quad \beta_{w,z} > 0 \tag{3.2.1} \]

The principle states that a phenotypic character will be favoured by selection if and only if
there is a positive statistical association between the phenotype and fitness. One might
suppose that the main problem with (3.2.1) is that it is just too simple to tell us anything
interesting: in compressing all the causal influences on fitness into a single regression
coefficient, it conflates direct and indirect causal pathways that HRP usefully separates.
That is certainly true, but there is also a more serious problem: when relatives interact
socially, (3.2.1) is liable to be downright false. This is because the separation condition we
took for granted in its derivation (i.e., the assumption that Cov (ε w , g ) = 0 ) is unlikely to be
satisfied in real cases of social interaction.


In general terms, the condition will be satisfied only if the overall association between g
and w is fully accounted for by, on the one hand, the association between g and z; and, on
the other hand, the association between z and w. Accordingly, the separation condition
may be usefully rewritten as β w , g = βw , z β z , g . To judge informally whether this condition is
likely to be satisfied in any given case, we must consider the following question: if we
already knew an individual’s value for z, would additionally learning its value for g enable
us to predict its fitness with any greater accuracy? If it would, this can only be because the
true value of β w , g is not fully accounted for by β w , z β z , g .
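
The step from the separation condition to its rewritten form can be spelled out in two lines. As a sketch, assuming Var(g) > 0 and using only the one-predictor regression equation, the bilinearity of covariance, and the definition of a simple regression coefficient:

```latex
\begin{align*}
\mathrm{Cov}(w, g) &= \beta_{w,z}\,\mathrm{Cov}(z, g) + \mathrm{Cov}(\varepsilon_w, g)
  && \text{(substitute } w = \beta_{w,z}\, z + \varepsilon_w \text{)} \\
\beta_{w,g} &= \beta_{w,z}\,\beta_{z,g} + \frac{\mathrm{Cov}(\varepsilon_w, g)}{\mathrm{Var}(g)}
  && \text{(divide through by } \mathrm{Var}(g) \text{)}
\end{align*}
```

Hence Cov(ε_w, g) = 0 exactly when β_{w,g} = β_{w,z} β_{z,g}; any shortfall between the two sides is the residual covariance, scaled by Var(g).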

Queller (1992a) argues informally but cogently that, if genetic relatives interact socially,

then knowing an individual’s breeding value does provide predictively relevant


information about its fitness over and above that provided by its phenotype. The key
consideration here is that, in general, an individual’s breeding value is a better predictor
than its phenotype of the phenotype of its genetic relatives, because deviations from the
breeding value are not genetically transmissible (and, hence, to the extent that an
individual’s phenotypic value deviates from its breeding value, the deviation is not likely
to reappear in its relatives). When genetic relatives interact socially, this information
becomes predictively relevant to w, because the phenotype of one’s genetic relatives has

consequences for one’s own fitness. The upshot is that, in such cases, knowing the
conjunction of an individual’s breeding value and its phenotypic value tells us more about
its fitness than knowing its phenotypic value alone. This implies that the separation
condition is violated.

An example may help to bring out the logic of this argument. Consider a particular pair of
organisms, A and B. They do not interact with each other, but both interact socially with
their own genetic relatives. Both have exactly the same phenotypic value for some
cooperative behaviour, z, but they differ in their fitness, w, and in their breeding value for
that behaviour, g. Now consider the following question: if we know that A has the greater
breeding value, does this tell us anything about which has the greater fitness? It does. The


information that A has the greater breeding value tells us that the breeding value of A’s
relatives is likely to be greater than the breeding value of B’s relatives. This tells us that the
phenotypic value of A’s relatives is likely to be greater than the phenotypic value of B’s
relatives. And this tells us that A is more likely than B to receive a benefit from its social
partners. Hence, A is likely to have the greater fitness.
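
The same logic can be seen at the population level in a toy simulation. This is a sketch under assumptions of my own choosing: phenotype is breeding value plus independent noise of equal variance, a partner shares the focal individual's breeding value with probability ρ = 0.5, and fitness is additive with benefit B = 1.0 and cost C = 0.35. Under these assumptions the population values are β_{w,z} = (Bρ − 2C)/2 = −0.1 and Cov(w, g) = Bρ − C = 0.15, so the one-predictor rule and the true direction of selection disagree; the sample estimates printed below will differ slightly from these figures.

```python
import numpy as np

def cov(x, y):
    """Population covariance."""
    return float(np.mean(x * y) - np.mean(x) * np.mean(y))

rng = np.random.default_rng(5)
n = 50_000
B, C, rho = 1.0, 0.35, 0.5                    # assumed benefit, cost, genetic similarity

g = rng.normal(size=n)                        # focal breeding value
shared = rng.random(n) < rho
g_partner = np.where(shared, g, rng.normal(size=n))   # partner is a 'relative' with prob. rho
z = g + rng.normal(size=n)                    # focal phenotype = breeding value + noise
z_hat = g_partner + rng.normal(size=n)        # partner phenotype
w = 1.0 + B * z_hat - C * z                   # additive social fitness effects

beta_wz = cov(w, z) / cov(z, z)               # the single coefficient used by rule (3.2.1)
eps = (w - np.mean(w)) - beta_wz * (z - np.mean(z))   # residuals of the one-predictor fit

print(beta_wz)        # negative: the one-predictor rule says 'disfavoured'
print(cov(w, g))      # positive: mean breeding value in fact increases
print(cov(eps, z))    # zero up to rounding: residuals never co-vary with the predictor
print(cov(eps, g))    # clearly positive: the separation condition fails
```

Breeding value predicts residual fitness here for exactly the reason given in the text: g carries information about the partner's phenotype that the noisy focal phenotype z does not.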

4.2.3 Why HRP is unreliable when relatives interact synergistically

The failings of the one-predictor rule can be remedied simply by adding an extra predictor,
ẑ, which explicitly represents the phenotype of the focal individual’s social partners. This
new predictor accounts for the component of the w-g covariance that z alone does not
account for; and the result, of course, is HRP, which succeeds where the one-predictor rule
fails. HRP faces a problem of its own, however, when genetic relatives interact
synergistically (Queller 1985, 1992a, 2011).

The problem is broadly similar to the problem we encounter when we try to apply a
one-predictor rule to additive social interactions. That problem arose because one’s breeding
value is often a more accurate predictor than one’s phenotype of the (non-synergistic)
social effects one can expect to receive, leading to a situation in which breeding value
predicts residual fitness. A two-predictor analysis solves this problem, but runs straight
into another: one’s breeding value is often a more accurate predictor than one’s phenotype
of the synergistic social effects one can expect to receive, again leading to a situation in
which breeding value predicts residual fitness. The main difference is that, this time round,
the problem is not nearly as easy to see, because it arises from the precise way in which
partial regression coefficients compensate for interactions among predictors.


To understand the nature of the problem, it will be helpful to consider simple one-shot,
two-player, game-theoretic models of synergistic interaction (henceforth referred to as
‘synergy games’), characterized by the following payoff matrix (Queller 1984, 1985):

                       COOPERATE (ẑ = 1)    DEFECT (ẑ = 0)

COOPERATE (z = 1)      B − C + D            −C

DEFECT (z = 0)         B                    0

We can translate the two strategies into the language of the Price formalism by defining a
dummy variable, z, such that z = 1 if the row-player cooperates and z = 0 if it defects, and
by defining a dummy variable ẑ such that ẑ = 1 if the column-player cooperates and ẑ = 0
if it defects. The B, C and D parameters represent fecundity payoffs; nothing is assumed
about their sign or magnitude. The D-payoff is what makes the model synergistic, since it
implies that the payoff each player receives when both players cooperate differs from the
sum of the payoffs that would be conferred by each player cooperating in isolation.
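The payoff matrix can be written as a single function of the two dummy variables. The following is a minimal sketch rather than anything taken from the source: the function names `payoff` and `synergy_excess` are hypothetical, and the numerical values in the usage note are purely illustrative.

```python
def payoff(z, z_hat, B, C, D):
    """Fecundity payoff to the row player in the synergy game.

    z and z_hat are the 0/1 dummy variables for the row player and the
    column player; B, C and D are the payoff parameters of the matrix.
    """
    return -C * z + B * z_hat + D * z * z_hat


def synergy_excess(B, C, D):
    """Payoff under joint cooperation minus the sum of the payoffs that
    each player's cooperation would confer in isolation."""
    together = payoff(1, 1, B, C, D)
    in_isolation = payoff(1, 0, B, C, D) + payoff(0, 1, B, C, D)
    return together - in_isolation
```

For example, with the illustrative values B = 3, C = 1 and D = 2, `payoff(1, 1, 3, 1, 2)` gives 4 while the two isolated payoffs sum to 2, so `synergy_excess(3, 1, 2)` returns the D-payoff itself.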

The payoff matrix does not fully specify a model, since models with the same payoff
matrix can differ with respect to other parameters. For example, the strategies of
interacting individuals may be correlated or uncorrelated, and their strategies may or may
not be determined by their genotype. In Appendix C, I analyse two specific synergy
games. In both games, some individuals possess a particular allele (x = 1) while others do
not (x = 0). Social partner genotypes are correlated: a fraction, a, of individuals are
assigned a social partner with an allelic value that is guaranteed to be identical to their
own, while a fraction, (1 − a), are assigned a social partner that is drawn at random from
the (infinite) population. The difference is that, in the first game, allelic value determines
strategy: x = 0 → z = 0 and x = 1 → z = 1. In the second game, allelic value does not fully
determine strategy: only a fraction, k, of x = 1 individuals go on to express the cooperative
phenotype. This difference is critical, for it turns out that HRP is a reliable
guide to the direction of selection in the first game but is not reliable in the second.

What is the source of the trouble? At first glance, one might think it obvious that HRP is
going to break down in synergy games, on the grounds that it takes no account of the D-
payoff (van Veelen 2009; van Veelen et al. 2012). This, however, misunderstands the
meaning of the terms in HRP. The cost and benefit terms in HRP are partial regression
coefficients, not fecundity payoffs, and as such they measure the overall statistical
association between a predictor and fitness (correcting for other predictors), not merely the
payoffs caused directly by that predictor. The implication is that, when computed
correctly, they do take the D-payoff into account (Grafen 1985a, b; Gardner et al. 2007;
Birch forthcoming). What they do, in effect, is account for the expected effects of synergy
through a correction factor that we add to the B and C fecundity payoffs. In other words,
they treat the synergistic payoff not as a third phenotypic pathway distinct from the costs
and benefits of the behaviour in question, but rather as an effect that modulates these costs
and benefits. If D is positive, the cost of the behaviour in question is lessened and its benefit
is boosted; if D is negative, the converse is true (see Appendix C, Part I for details).

In fact, although synergistic interaction does pose a genuine problem for HRP, the source
of the problem is far from obvious. The real issue is that, in scenarios in which genotype is
an imperfect predictor of behaviour, synergistic interaction between genetic relatives leads
to a violation of the separation condition (i.e., Cov(ε_w, g) ≠ 0), and this in turn implies that
HRP is not a reliable guide to the sign and magnitude of the overall w-g covariance.
Queller (1992a) argues for this conclusion informally, but provides no formal argument. In
Appendix C (Part II), I show that Queller’s intuition is indeed correct: when genotypes
determine phenotypes, the separation condition is satisfied; but when genotypes are
imperfect predictors of phenotypes, the separation condition is violated. More specifically,
I show that HRP systematically overcompensates for the effects of synergy on the direction
of selection when phenotypes are not genetically determined. If synergistic effects are
large, this systematic error threatens to make HRP seriously unreliable.15
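The contrast between the two games can be checked numerically. The sketch below is not the derivation in Appendix C; it simply enumerates the joint distribution of genotypes and phenotypes and computes Cov(ε_w, g) for the two-predictor regression exactly. Several modelling choices are assumptions of the sketch: the breeding value g is identified with the allelic value x, the two partners express the phenotype independently given their alleles, and the parameter values are illustrative. All function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def joint(p, a, k):
    """Joint distribution over (x, x_hat, z, z_hat) as {state: probability}."""
    dist = {}
    for x, xh in product((0, 1), repeat=2):
        p_x = p if x else 1 - p
        p_xh_random = p if xh else 1 - p
        # With probability a the partner's allele is identical; otherwise random.
        p_pair = p_x * (a * (1 if xh == x else 0) + (1 - a) * p_xh_random)
        for z, zh in product((0, 1), repeat=2):
            # Assumption: carriers cooperate with probability k, independently.
            p_z = (k if z else 1 - k) if x else (0 if z else 1)
            p_zh = (k if zh else 1 - k) if xh else (0 if zh else 1)
            prob = p_pair * p_z * p_zh
            if prob:
                dist[(x, xh, z, zh)] = prob
    return dist


def mean(dist, f):
    return sum(pr * f(*s) for s, pr in dist.items())


def cov(dist, f, h):
    return mean(dist, lambda *s: f(*s) * h(*s)) - mean(dist, f) * mean(dist, h)


def hrp_residual_cov(p, a, k, B, C, D):
    """Cov(eps_w, g) when fitness is regressed on z and z_hat only."""
    dist = joint(p, a, k)
    g = lambda x, xh, z, zh: x          # assumption: breeding value = allelic value
    z_ = lambda x, xh, z, zh: z
    zh_ = lambda x, xh, z, zh: zh
    w = lambda x, xh, z, zh: -C * z + B * zh + D * z * zh
    # Least-squares partial regression coefficients via the normal equations.
    vz, vzh, czz = cov(dist, z_, z_), cov(dist, zh_, zh_), cov(dist, z_, zh_)
    cwz, cwzh = cov(dist, w, z_), cov(dist, w, zh_)
    det = vz * vzh - czz ** 2
    beta_z = (cwz * vzh - cwzh * czz) / det
    beta_zh = (cwzh * vz - cwz * czz) / det
    predicted = beta_z * cov(dist, z_, g) + beta_zh * cov(dist, zh_, g)
    return cov(dist, w, g) - predicted


half = F(1, 2)
game1_residual = hrp_residual_cov(half, half, F(1), B=2, C=1, D=1)   # k = 1
game2_residual = hrp_residual_cov(half, half, half, B=2, C=1, D=1)   # k = 1/2
additive_residual = hrp_residual_cov(half, half, half, B=2, C=1, D=0)
```

With these illustrative values the residual covariance is exactly zero in the first game and −3/224 in the second, so in the second game the two-predictor partition overstates the w-g covariance. Setting D = 0 restores a zero residual covariance even when k < 1, which fits the claim that the trouble comes from synergy combined with imperfect genetic determination.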

While the formal argument in Appendix C is made in the context of a particular synergy
game, it is important to add that the problem for HRP does not arise from any
idiosyncratic features of this model. On the contrary, the problem is likely to recur in any
scenario in which (i) genetic relatives interact synergistically and (ii) genotype is an
imperfect predictor of phenotype. This is because the problem ultimately arises from a
very general feature of partial regression coefficients. In correcting for the expected
synergistic effect arising from interactions among predictors, a regression analysis only
ever takes into account correlations among the predictors; any other biologically relevant
correlations are ignored. This means that, if our predictors of fitness are phenotypic (and in
real ecological contexts they usually need to be if we want to measure them; cf. Sections 4.4
and 4.5), only phenotypic correlations are taken into account when compensating for
synergy, and any underlying genetic correlations are neglected. The upshot is that,
whenever the underlying genotypic correlations between social partners are stronger than
the manifest phenotypic correlations, knowing an individual’s genotype will give us
predictively relevant information about the synergistic effects it is likely to receive that a
purely phenotypic regression analysis of fitness will fail to take into account. Hence, we
have reason to believe that synergistic interaction among relatives will, in general, lead to
violations of the separation condition. The synergy game analysed in Appendix C is
merely an illustrative case.
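The gap between genotypic and phenotypic correlations can be made concrete in the same toy setting. This is a hedged sketch under stated assumptions: association between partners is measured by the regression of the partner's value on the focal individual's value, partners express the phenotype independently given their alleles, and the parameter values and function names are illustrative rather than drawn from the source.

```python
from fractions import Fraction as F
from itertools import product


def joint(p, a, k):
    """Joint distribution over (x, x_hat, z, z_hat) as {state: probability}."""
    dist = {}
    for x, xh in product((0, 1), repeat=2):
        p_x = p if x else 1 - p
        p_xh_random = p if xh else 1 - p
        p_pair = p_x * (a * (1 if xh == x else 0) + (1 - a) * p_xh_random)
        for z, zh in product((0, 1), repeat=2):
            # Assumption: carriers cooperate with probability k, independently.
            p_z = (k if z else 1 - k) if x else (0 if z else 1)
            p_zh = (k if zh else 1 - k) if xh else (0 if zh else 1)
            prob = p_pair * p_z * p_zh
            if prob:
                dist[(x, xh, z, zh)] = prob
    return dist


def mean(dist, f):
    return sum(pr * f(*s) for s, pr in dist.items())


def cov(dist, f, h):
    return mean(dist, lambda *s: f(*s) * h(*s)) - mean(dist, f) * mean(dist, h)


def partner_regressions(p, a, k):
    """Return (genotypic, phenotypic) regressions of partner on focal individual."""
    dist = joint(p, a, k)
    x_ = lambda x, xh, z, zh: x
    xh_ = lambda x, xh, z, zh: xh
    z_ = lambda x, xh, z, zh: z
    zh_ = lambda x, xh, z, zh: zh
    genotypic = cov(dist, x_, xh_) / cov(dist, x_, x_)
    phenotypic = cov(dist, z_, zh_) / cov(dist, z_, z_)
    return genotypic, phenotypic


half = F(1, 2)
determined = partner_regressions(half, half, F(1))   # genotype fixes phenotype
imperfect = partner_regressions(half, half, half)    # only half of carriers cooperate
```

When genotype determines phenotype the two measures coincide (both equal a = 1/2 here). When only half of the carriers cooperate, the genotypic association stays at 1/2 while the phenotypic association falls to 1/6, which is the kind of hidden genetic correlation a purely phenotypic regression cannot see.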

15 van Veelen et al. (2012) argue for a similar conclusion using similar models, but their understanding of
Queller’s separation condition strikes me as deeply confused. My aim in Appendix C is to show when
synergy leads to a violation of the separation condition as Queller understands it.


4.3 Solution 1: Expand the predictor set

The problem of accommodating synergy within a kin selection framework is serious, but it
is by no means insoluble. Indeed, one finds two different solutions in the kin selection
literature, both of which have proved influential. The first—Queller’s original (1984, 1985)
solution—involves expanding our phenotypic predictor set to explicitly represent
synergistic effects. The second—also developed by Queller (1992b), but more recently
championed by Andy Gardner and colleagues (2011)—involves recasting Hamilton’s rule
as a principle concerned only with the average effects of genotypes, and ignoring
phenotypic pathways altogether. Each solution has its merits and demerits. In the next two
sections, I present and scrutinize each solution in turn. In Section 4.5, I address the
question of which is on balance preferable.

4.3.1 Queller’s extension of Hamilton’s rule (HRQ)

Above, we saw how the deficiencies of a one-predictor regression analysis in cases of social
interaction among genetic relatives can be remedied simply by adding an extra predictor
to our predictor set, so that the phenotypes of an individual’s social partners are explicitly
represented. An analogous response is available in the present context: the deficiencies of
HRP in cases of synergistic interaction among genetic relatives can be remedied by adding
yet another predictor to our predictor set, explicitly representing the synergistic effect.
Since the D-payoff in the synergy game obtains if and only if both players cooperate
(i.e., zẑ = 1), the predictor we need to add is zẑ, the product of the players’ phenotypic
values:

\[ \mathbf{z} = \{ z, \hat{z}, z\hat{z} \} \]

From this predictor set we obtain the following regression equation (Queller 1985, 1992a,
2011):

\[ w = \beta_{w,z \mid \hat{z},z\hat{z}}\, z + \beta_{w,\hat{z} \mid z,z\hat{z}}\, \hat{z} + \beta_{w,z\hat{z} \mid z,\hat{z}}\, z\hat{z} + \varepsilon_w \]

As always, the partial regression coefficients are defined as the weightings that minimize
the sum-of-squares of the residuals. In the synergy game, the residuals will be
minimized—indeed, eliminated altogether—when the three coefficients are equal to the –C,
B and D fecundity payoffs respectively. Substituting the new regression equation into
equation (2.3.3), we obtain the following partition:

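\[ \Delta_1 \bar{g} = \beta_{w,z \mid \hat{z},z\hat{z}}\, \mathrm{Cov}(z, g) + \beta_{w,\hat{z} \mid z,z\hat{z}}\, \mathrm{Cov}(\hat{z}, g) + \beta_{w,z\hat{z} \mid z,\hat{z}}\, \mathrm{Cov}(z\hat{z}, g) + \mathrm{Cov}(g, \varepsilon_w) \]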

Because the three-predictor regression fully accounts for the fitness of every individual in
the synergy game (leaving no residuals at all), it follows that Cov(ε_w, g) = 0, so the
separation condition is satisfied. This allows us to derive the following condition under
which selection will favour cooperation in the synergy game (Queller 1985, 1992a, 2011):

Δ1 g > 0!!!iff!!!β w ,z ẑ ,zẑ + β w ,ẑ z ,zẑ


( )+β
Cov ẑ, g ( ) >0
Cov zẑ, g
Cov ( z, g ) w ,zẑ z ,ẑ
Cov ( z, g )
!

Though broadly similar to Hamilton’s rule, this condition includes an extra term
corresponding to the effect of synergy. By using d to represent β_{w,zẑ|z,ẑ} and s to represent
Cov(zẑ, g)/Cov(z, g), we can produce the following, more memorable formulation:

\[ \Delta_1 \bar{g} > 0 \quad \text{iff} \quad rb - c + sd > 0 \tag{HRQ} \]

This principle is sometimes known as Queller’s rule (Marshall 2011), though Queller
himself describes it as a natural extension of Hamilton’s rule to the case of synergy. I will
refer to it as Queller’s extension of Hamilton’s rule (HRQ).
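The three-predictor analysis can be reproduced in a few lines for the second synergy game, where genotype is an imperfect predictor of phenotype. As before this is only a sketch: g is identified with the allelic value x, partners are assumed to express the phenotype independently given their alleles, the parameter values are illustrative, and the helper names (`joint`, `solve`, and so on) are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def joint(p, a, k):
    """Joint distribution over (x, x_hat, z, z_hat) as {state: probability}."""
    dist = {}
    for x, xh in product((0, 1), repeat=2):
        p_x = p if x else 1 - p
        p_xh_random = p if xh else 1 - p
        p_pair = p_x * (a * (1 if xh == x else 0) + (1 - a) * p_xh_random)
        for z, zh in product((0, 1), repeat=2):
            p_z = (k if z else 1 - k) if x else (0 if z else 1)
            p_zh = (k if zh else 1 - k) if xh else (0 if zh else 1)
            prob = p_pair * p_z * p_zh
            if prob:
                dist[(x, xh, z, zh)] = prob
    return dist


def mean(dist, f):
    return sum(pr * f(*s) for s, pr in dist.items())


def cov(dist, f, h):
    return mean(dist, lambda *s: f(*s) * h(*s)) - mean(dist, f) * mean(dist, h)


def solve(A, b):
    """Gauss-Jordan elimination in exact arithmetic (A must be non-singular)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        lead = M[c][c]
        M[c] = [v / lead for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


B, C, D = 2, 1, 1                       # illustrative fecundity payoffs
dist = joint(F(1, 2), F(1, 2), F(1, 2))
g = lambda x, xh, z, zh: x              # assumption: breeding value = allelic value
w = lambda x, xh, z, zh: -C * z + B * zh + D * z * zh
predictors = [
    lambda x, xh, z, zh: z,
    lambda x, xh, z, zh: zh,
    lambda x, xh, z, zh: z * zh,
]

# Partial regression coefficients of w on (z, z_hat, z * z_hat).
A = [[cov(dist, pi, pj) for pj in predictors] for pi in predictors]
betas = solve(A, [cov(dist, pi, w) for pi in predictors])

c, b, d = -betas[0], betas[1], betas[2]
cov_zg = cov(dist, predictors[0], g)
r = cov(dist, predictors[1], g) / cov_zg
s = cov(dist, predictors[2], g) / cov_zg
hrq = r * b - c + s * d
```

Here the coefficients recover −C, B and D exactly, and `hrq * cov_zg` equals Cov(w, g) with nothing left over, which is the sense in which the separation condition is satisfied once zẑ is included. With the values above, r = 1/2, s = 3/8 and rb − c + sd = 3/8.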


4.3.2 The general method

Because it explicitly represents the effects of synergy, HRQ is more reliable than HRP
when synergy is present: it accurately predicts the direction and magnitude of selection in
cases in which HRP falls foul of the separation condition (and thus fails to account fully for
the heritable variation in fitness). Yet it would be naive to suppose that HRQ constitutes a
fully general extension of Hamilton’s rule. We introduced a third predictor, zẑ, in order to
accommodate the additional complexity of the synergy game in contrast to a game with
perfectly additive payoffs. But the synergy game is still very simple, in the great scheme of
things, and many instances of social behaviour in the real world are likely to have far more
complicated payoff structures. If we want to account for all the w-g covariance in these
more complex contexts, then we are probably going to need more than three predictors.

The general moral to draw from the two cases discussed in Section 4.2, I suggest, is that
our predictor set will fail to account fully for the overall w-g covariance whenever (i) we
have too few predictor variables in our regression analysis to account for the full causal
structure of the social interactions in the population under study, and (ii) an individual’s
genotype is a stronger predictor than its phenotype of the variable(s) we have omitted.
Such a situation arises when we try to apply a one-predictor regression to social interaction
among genetic relatives, and it arises again when we try to apply a two-predictor
regression to synergistic interaction among genetic relatives. Queller’s three-predictor
regression copes with the synergy game only because its predictor set fully captures the
causal influences on fitness in that game. If we were to apply HRQ to social interactions
that have a greater degree of complexity than a three-predictor regression is able to
represent, then there is every chance that the same problem would recur: some of the
w-g covariance would be unaccounted for, and the separation condition would be violated.


Queller (2011) is well aware of this problem, and does not claim that HRQ provides a fully
general characterization of the conditions under which selection will favour a social
behaviour. Instead, he suggests that we should see his derivation of HRQ merely as one
instance of a general method for deriving Hamilton’s rule-type principles to apply to
particular cases. The general method can be characterized in four steps:16

STEP 1: Construct a regression analysis of fitness including all
phenotypic predictors causally relevant to the direction of
selection on the character of interest. In cases of social
behaviour, this will include extrinsic (‘neighbourhood’)
predictors as well as intrinsic predictors.

STEP 2: Substitute this regression equation into the (standard
genetic) Price equation to yield a partition of the overall w-g
covariance.

STEP 3: Assume, if it is reasonable to do so, that the residuals in
the regression analysis are uncorrelated with breeding value.
This leaves behind a partition that cleanly separates quantities
that relate genotype to phenotype from quantities that relate
phenotype to fitness.

STEP 4: Rearrange to derive a rule describing the conditions
under which (the primary effect of) natural selection will
increase the average value of the character of interest.

16 Queller (2011) himself suggests that there are eight steps, because he individuates the steps more finely
than I do. This is not a substantive difference.


Queller (2011) shows how to apply the general method to various particular cases.
Variants of this method have also been fruitfully employed by Steven A. Frank (1997a, b;
1998; 2006) and by jeff smith [sic] and colleagues (2010).

4.3.3 Neighbour-modulated and inclusive fitness

The general method is non-specific enough to encompass a variety of quite different
analytical approaches. One important ambiguity arises in Step 1: we are told to construct a
‘regression analysis of fitness’, but the meaning of fitness in the context of social evolution
is deliberately left unspecified. The reason is that there are two influential bodies of theory
that fall within the purview of Queller’s general method but conceive of social fitness
in very different ways. These are the neighbour-modulated fitness (or direct fitness) and
inclusive fitness approaches to the analysis of kin selection.

The intuitive difference between these approaches is easy enough to grasp. The
neighbour-modulated fitness approach conceives of an individual’s fitness in terms of its own
reproductive output, and analyses the ways in which that output depends on the
behaviour of its social partners (the above derivations of HRP and HRQ are simple
examples of this approach). The inclusive fitness approach, by contrast, conceives of an
individual’s fitness as (roughly speaking) a sum of the fitness components for which its
behaviour is causally responsible, where each component is weighted by the individual’s
relatedness to the organism that is doing the reproducing. It proceeds to analyse how an
organism’s fitness (thus construed) depends on its own behaviour.

Recent years have seen considerable debate as to whether or not the two frameworks
constitute formally equivalent perspectives on social evolution (see, e.g., Frank 1998, 2006;
Taylor et al. 2007; Fletcher and Doebeli 2009; Rosas 2010; Martens 2011). I will weigh into
this debate in Chapter 5. For now, I merely want to note that both approaches involve
Steps 1-4 of the general method; the difference between them lies in the conception of
fitness they take as the target of analysis. Hence, the general method is inclusive enough to
accommodate not only a great plurality of predictor sets, but also a plurality of conceptions
of social fitness.

4.3.4 Contextual analysis as a special case

It is also worth briefly noting the extremely close relationship between Queller’s general
method and the contextual analysis approach to multi-level selection (Heisler and Damuth
1987; Damuth and Heisler 1988; Goodnight et al. 1992; Okasha 2006). In essence, contextual
analysis involves applying Queller’s method with a predictor set that includes group-level
properties. In the simplest case, we might only include Z, the average z-value of the focal
individual’s group:

\[ \mathbf{z} = \{ z, Z \} \]

From this predictor set, we can (applying Steps 1-3) derive the following partition of the
w-g covariance:

\[ \Delta_1 \bar{g} = \beta_{w,z \mid Z}\, \mathrm{Var}(g) + \beta_{w,Z \mid z}\, \mathrm{Cov}(Z, g) \]
And, from this partition, we can (applying Step 4) derive a variant of Hamilton’s rule in
which the ‘relatedness’ is equal to Cov(g, Z)/Var(g), a measure of the association between
an individual’s breeding value and the average z-value of the group in which it finds
itself:17

\[ \Delta_1 \bar{g} > 0 \quad \text{iff} \quad \beta_{w,z \mid Z} + \beta_{w,Z \mid z}\, \frac{\mathrm{Cov}(Z, g)}{\mathrm{Var}(g)} > 0 \]

17 This quantity is sometimes known as the ‘whole-group relatedness’ and has received considerable
attention in recent kin selection literature, particularly in the contexts of microbial and human cooperation
(see, e.g., Pepper 2000; Frank 2006; El Mouden and Gardner 2008; Ross-Gillespie et al. 2009; El Mouden et al.
2010; Rankin et al. 2011; Cornforth et al. 2012).

Of course, the same general method could be applied to more complicated predictor sets,
including sets containing ‘emergent’ group characters (such as specialization or division of
labour). Hence, although contextual analysis is usually considered to fall under the
umbrella of multi-level selection theory, it might equally be regarded as a natural
extension of Queller’s general method for the analysis of kin selection to predictor sets that
include group characters. Its ambiguous status shows just how close to one another kin-
and group-selectionist approaches to social evolution have become.18

4.4 Solution 2: Bypass phenotypes

Queller first presented his extension to Hamilton’s rule in two papers in the mid-1980s
(Queller 1984, 1985). Shortly afterwards, in a paper entitled ‘Hamilton’s rule OK’, Grafen
replied that the proposed extension was unnecessary, since the standard version of the rule
already accommodates synergistic effects (see also Grafen 1985a, 81-2):

The third, synergistic term in Queller’s form can be made to
disappear by agreeing to define benefit and cost as the average
effects on individuals’ fitnesses, rather than as arbitrary terms
in a model of fitness. (1985b, 311)

18 In Section 3.4, I argued that the main difference between kin- and group-selectionist methods of analysis
lies not in whether organisms are assigned to groups, but how. A multi-level analysis typically presupposes
that organisms can be sorted into equivalence classes on the basis of their social interactions, while kin
selection analyses typically presuppose that organisms can be sorted into genotypic classes, developmental
classes, or both. Although I think this is broadly correct, contextual analysis muddies the waters somewhat,
because it does not require that the groups to which it assigns Z-values are non-overlapping equivalence
classes. For instance, suppose we have a population of N organisms subdivided into N groups. Each group is
‘centred’ on a particular individual, comprising all and only those individuals with whom it interacts
(including itself). We can apply contextual analysis to this population by defining an individual’s Z-value as
the average character of the group centred on it (cf. Godfrey-Smith 2006).

For Grafen, the illusion that Hamilton’s rule breaks down in cases of synergy highlights
the fact that ‘care must be taken in applying Hamilton’s rule’ (1985a, 82). He adds, rather
stingingly, that ‘[a] result has no interest as an exception to Hamilton’s rule if it is based on
the wrong interpretation of r, b and c’ (1985a, 82).

There is a sense in which Grafen is right, and a sense in which he is wrong. As we noted in
Section 3.2, one might naïvely assume that Hamilton’s rule could not possibly
accommodate synergy, because simple synergy games involve a D-payoff and Hamilton’s
rule makes no mention of a D-payoff (cf. van Veelen 2009, 2011; van Veelen et al. 2012).
This, however, mistakes the partial regression coefficients in Hamilton’s rule for fecundity
payoffs in a payoff matrix. When computed correctly, the b and c coefficients do indeed
take the D-payoff into account, as Grafen correctly points out. Nevertheless, owing to the
way in which the partial regression coefficients are calculated, there is a systematic
tendency for HRP to overcompensate for the effects of synergy on the response to
selection. If the error is large, rb − c may even be qualitatively inaccurate regarding the
direction of partial change. As Queller (1992a) subsequently made clear (and as I argue
more formally in Appendix C), this is the real problem for HRP in cases of synergy.

Even so, there is still a way in which the synergistic term in HRQ can legitimately be made
to disappear in the spirit of Grafen’s original proposal. We can do this by replacing
phenotypic predictors with purely genetic predictors. By ignoring altogether the
phenotypic pathways linking genotype and fitness, we can avoid failures of the separation
condition and produce a version of Hamilton’s rule that almost never fails. Queller himself
was the first to point out this alternative response to the problem of synergy (Queller
1992b). In recent years, however, it has been defended chiefly by Grafen’s Oxford
colleagues (Gardner et al. 2007; Gardner et al. 2011).


4.4.1 Hamilton’s rule with genetic predictors (HRG)

The derivation of the genetic version of Hamilton’s rule is exactly parallel to the derivation
of HRP, and can be regarded as yet another special case of Queller’s general method. The
only difference is that our phenotypic predictor set is replaced with a genetic predictor set
(which we can denote with the letter g) comprising the breeding value of the focal
individual, g, and the average breeding value of its social partners, ĝ:

\[ \mathbf{g} = \{ g, \hat{g} \} \]

Applying Steps 2-4 of Queller’s method, we can derive the following rule:

Δ1 g > 0!!iff!!β w ,g ĝ + β w , ĝ g


( ) >0
Cov ĝ, g
Var ( g )
!

By using bg to represent βw , g gˆ , −c g to represent βw , gˆ g and rg to represent

Cov ( gˆ , g ) Var ( g ) , we can recast this into more familiar notation:

Δ g > 0!!iff!!rgbg − c g > 0 (HRG)


! 1

I will refer to this as Hamilton’s rule with genetic predictors (HRG). It differs from HRP in
two respects. First, the b_g and c_g terms represent the average effects of genotypes,
whereas the b and c terms in HRP represent the average effects of behavioural phenotypes.
Second, the r_g term represents the correlation between social partner genotypes, whereas
the r term in HRP represents the correlation between actor phenotypes and recipient
genotypes.


4.4.2 On the generality of HRG

As with any other application of Queller’s method, the derivation of HRG requires an
assumption of uncorrelated residuals, that is, Cov(ε_w, g) = 0. As we have seen, this
assumption is often substantive and contentious. In the case of HRG, however, it is
guaranteed to hold. This is because g is now among the predictors in our regression
analysis, and the method of least-squares (i.e., minimizing the sum-of-squares of the
residuals) ensures that the residuals in a regression equation do not co-vary with any of
the predictors (Queller 1992b). In light of this, we could say that HRG cannot possibly fail
the separation condition. Yet it would be rather more accurate to say that the separation
condition does not even apply to HRG, because HRG does not even attempt to separate
quantities which relate genotype to phenotype from quantities which relate phenotype to
fitness. In effect, it bypasses phenotypes altogether: it considers only the overarching
relationships between fitness and breeding value, without any care for how they are
mediated phenotypically.
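The guarantee can be seen at work in the second synergy game, the very case in which the two-predictor phenotypic regression leaves a residual that covaries with g. The sketch below regresses fitness on g and ĝ and checks the residual covariance exactly. It rests on the same labelled assumptions as before: g is identified with the allelic value x and ĝ with the partner's allelic value, partners express the phenotype independently given their alleles, and the parameter values and function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product


def joint(p, a, k):
    """Joint distribution over (x, x_hat, z, z_hat) as {state: probability}."""
    dist = {}
    for x, xh in product((0, 1), repeat=2):
        p_x = p if x else 1 - p
        p_xh_random = p if xh else 1 - p
        p_pair = p_x * (a * (1 if xh == x else 0) + (1 - a) * p_xh_random)
        for z, zh in product((0, 1), repeat=2):
            p_z = (k if z else 1 - k) if x else (0 if z else 1)
            p_zh = (k if zh else 1 - k) if xh else (0 if zh else 1)
            prob = p_pair * p_z * p_zh
            if prob:
                dist[(x, xh, z, zh)] = prob
    return dist


def mean(dist, f):
    return sum(pr * f(*s) for s, pr in dist.items())


def cov(dist, f, h):
    return mean(dist, lambda *s: f(*s) * h(*s)) - mean(dist, f) * mean(dist, h)


B, C, D = 2, 1, 1                        # illustrative fecundity payoffs
dist = joint(F(1, 2), F(1, 2), F(1, 2))  # second game: k = 1/2
g = lambda x, xh, z, zh: x               # assumption: breeding value = allelic value
gh = lambda x, xh, z, zh: xh             # partner's breeding value
w = lambda x, xh, z, zh: -C * z + B * zh + D * z * zh

# Least-squares coefficients of w on (g, g_hat) via the normal equations.
vg, vgh, cgg = cov(dist, g, g), cov(dist, gh, gh), cov(dist, g, gh)
cwg, cwgh = cov(dist, w, g), cov(dist, w, gh)
det = vg * vgh - cgg ** 2
beta_g = (cwg * vgh - cwgh * cgg) / det      # plays the role of -c_g
beta_gh = (cwgh * vg - cwg * cgg) / det      # plays the role of b_g

residual_cov = cwg - (beta_g * vg + beta_gh * cgg)   # Cov(eps_w, g)
r_g = cgg / vg
hrg = r_g * beta_gh + beta_g                          # r_g * b_g - c_g
```

The residual covariance is exactly zero here although genotype is an imperfect predictor of phenotype and D is non-zero. In this toy population r_g equals the assortment parameter a = 1/2, and `hrg * vg` reproduces Cov(w, g) in full.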

The reward for bypassing phenotypes is a principle of extraordinary generality. After all,
Cov(ε_w, g) = 0 is the only substantive assumption in the derivation of HRP, and in the
derivation of HRG this assumption is trivially satisfied. The upshot is that HRG, as a
statement of the statistical conditions under which selection will favour a social behaviour,
is true of any population to which the Price equation applies, and in which the relevant
partial regression coefficients are well defined.

Accordingly, there are only two kinds of case in which HRG can fail. First, there are cases
in which HRG is false because the standard Price equation is also false. Such cases are
likely to be extremely rare, but they are not inconceivable, because the derivation of the
Price equation involves a substantive assumption to the effect that all descendants have the
same number of ancestors (see Kerr and Godfrey-Smith 2009; see also Section 2.1). Second,
there are cases in which HRG is false because, although social behaviour can still evolve,
the coefficients in HRG are undefined. Such cases may be rather less rare, since the terms
in HRG are undefined whenever g and ĝ are perfectly collinear (to see why, note that the
formula for partial regression coefficients given in Section 3.1.1 yields an undefined value
if 1 − ρ² = 0, and perfect collinearity implies that ρ² = 1). It is not hard to envisage
scenarios in which such collinearity might arise, particularly in populations of asexually
reproducing microbes. Yet, on the face of it, these scenarios seem conducive, not hostile, to
the evolution of social behaviour. The fact that HRG does not apply to such cases shows
not that social behaviour can never evolve in them, but rather that HRG is not a fully
general condition for the evolution of social behaviour.19
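The collinearity problem is easy to exhibit in the toy population used for the synergy games. The sketch below sets the assortment parameter a to 1, so that social partners always share the same allelic value, and computes the determinant of the predictor covariance matrix, which equals Var · Var · (1 − ρ²) and therefore vanishes exactly when the predictors are perfectly collinear. The identification of g with the allelic value x, the independence of phenotype expression between partners, the parameter values and the function names are all assumptions of the sketch.

```python
from fractions import Fraction as F
from itertools import product


def joint(p, a, k):
    """Joint distribution over (x, x_hat, z, z_hat) as {state: probability}."""
    dist = {}
    for x, xh in product((0, 1), repeat=2):
        p_x = p if x else 1 - p
        p_xh_random = p if xh else 1 - p
        p_pair = p_x * (a * (1 if xh == x else 0) + (1 - a) * p_xh_random)
        for z, zh in product((0, 1), repeat=2):
            p_z = (k if z else 1 - k) if x else (0 if z else 1)
            p_zh = (k if zh else 1 - k) if xh else (0 if zh else 1)
            prob = p_pair * p_z * p_zh
            if prob:
                dist[(x, xh, z, zh)] = prob
    return dist


def mean(dist, f):
    return sum(pr * f(*s) for s, pr in dist.items())


def cov(dist, f, h):
    return mean(dist, lambda *s: f(*s) * h(*s)) - mean(dist, f) * mean(dist, h)


def predictor_determinants(p, a, k):
    """Determinants of the genetic and phenotypic predictor covariance matrices.

    A zero determinant means the partial regression coefficients are undefined.
    """
    dist = joint(p, a, k)
    g = lambda x, xh, z, zh: x       # assumption: breeding value = allelic value
    gh = lambda x, xh, z, zh: xh
    z_ = lambda x, xh, z, zh: z
    zh_ = lambda x, xh, z, zh: zh
    det_genetic = cov(dist, g, g) * cov(dist, gh, gh) - cov(dist, g, gh) ** 2
    det_phenotypic = cov(dist, z_, z_) * cov(dist, zh_, zh_) - cov(dist, z_, zh_) ** 2
    return det_genetic, det_phenotypic


# Partners always genetically identical (a = 1); half of carriers cooperate.
det_genetic, det_phenotypic = predictor_determinants(F(1, 2), F(1), F(1, 2))
```

With genetically identical partners the genetic determinant is zero, so the coefficients in HRG cannot be computed, while the phenotypic determinant is 1/32 because partners still differ in expressed behaviour. This matches the observation in footnote 19 that HRP is less severely affected.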

4.5 The solutions compared

We now have two solutions to the problem of synergy on the table. Solution 1 is to expand
the predictor set to more accurately represent the causal structure of the phenotypic
pathways linking genotype to fitness. Solution 2 is to bypass phenotypic pathways
altogether so as to recast Hamilton’s rule as a purely genetic principle. Which should we
prefer? In this section, I want to argue that we do not have to choose—or rather, we do not
have to choose one solution to apply across the board. The general moral of the problem of
synergy, I submit, is that there are two explanatory functions traditionally assigned to
Hamilton’s rule. Each solution to the problem prioritizes one of these functions but
neglects the other, and there is no way for a single principle to satisfy both at once. This
gives us grounds for an irenic resolution to the debate: neither solution to the problem
gives us everything we might want, but both solutions give us something worth having.

19 HRP faces a similar problem, but less severely. The coefficients in HRP are still well defined when social
partners have identical genotypes, provided there is still some phenotypic variation between social partners
with respect to the characters under study.


4.5.1 The dual role of Hamilton’s rule

One might intuitively expect that the primary role of Hamilton’s rule would be
predictive—that biologists would estimate its coefficients in order to predict the social
behaviours that natural selection will have built. In fact, the power of Hamilton’s rule (in
any form derived from the Price equation) to predict the long-run trajectory of social
evolution is extremely limited. There are two main reasons for this. The first is that the rule
concerns only the direction of (the primary effect of) social selection—it avoids analysing
any other relevant influences on evolution, including mutation, genetic drift and
intragenomic conflict. We can therefore use Hamilton’s rule to predict the overall direction
of social evolution only on the assumption that these effects are insignificant. The second
reason is that the r, b and c coefficients represent aggregate statistical properties of a
population, and these properties are liable to change as gene frequencies change. The
implication is that, even if Hamilton’s rule is satisfied for a particular social behaviour at a
particular time, that behaviour will not necessarily be favoured by social selection at a later time (cf.
Birch forthcoming).

If the pathways linking breeding value to fitness are frequency-independent (i.e., genes
have frequency-independent effects on phenotype, and phenotypes have frequency-
independent effects on fitness), then there is no particular reason why the direction of
social selection would change as gene frequencies change. In this special case, Hamilton’s
rule can serve as a rough guide to whether a trait will go to fixation in the long run. But
frequency-independence is likely to be the exception rather than the rule in real-world
social evolution. Notably, frequency-dependent effects can be generated by synergy (as in
the synergy game) or by dominance or epistasis among genes, and both kinds of
phenomenon are biologically commonplace. When effects are frequency-dependent,
Hamilton’s rule will only ever provide a static ‘snapshot’ of a dynamic process. The
direction of kin selection may well change over evolutionary time and, if we want to
understand how it changes, we have no choice but to build a more concrete model of the
evolutionary dynamics (cf. Grafen 1985a,b; Frank 1995, 1998, 2012; Traulsen 2010).
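The ‘snapshot’ point can be illustrated with the first synergy game, in which genotype determines phenotype. The sketch below computes Cov(w, g) exactly at three allele frequencies while holding the payoffs and the assortment parameter fixed. It is an illustration under assumptions, not a result from the source: g is identified with the allelic value x, and the payoff values (B = 1, C = 5/4, D = 1, a = 1/2) are chosen only because they make the change of sign easy to see.

```python
from fractions import Fraction as F
from itertools import product


def joint(p, a):
    """Joint distribution over (x, x_hat) in the first game, where z = x."""
    dist = {}
    for x, xh in product((0, 1), repeat=2):
        p_x = p if x else 1 - p
        p_xh_random = p if xh else 1 - p
        # With probability a the partner's allele is identical; otherwise random.
        prob = p_x * (a * (1 if xh == x else 0) + (1 - a) * p_xh_random)
        if prob:
            dist[(x, xh)] = prob
    return dist


def mean(dist, f):
    return sum(pr * f(*s) for s, pr in dist.items())


def cov(dist, f, h):
    return mean(dist, lambda *s: f(*s) * h(*s)) - mean(dist, f) * mean(dist, h)


def cov_w_g(p, a, B, C, D):
    """Cov(w, g) at allele frequency p, with g taken to be the allelic value."""
    dist = joint(p, a)
    g = lambda x, xh: x
    w = lambda x, xh: -C * x + B * xh + D * x * xh   # z = x and z_hat = x_hat
    return cov(dist, w, g)


a, B, C, D = F(1, 2), 1, F(5, 4), 1      # illustrative values only
rare = cov_w_g(F(1, 4), a, B, C, D)
intermediate = cov_w_g(F(1, 2), a, B, C, D)
common = cov_w_g(F(3, 4), a, B, C, D)
```

With these values the covariance is −3/128 when the allele is rare, zero at a frequency of one half and +3/128 when the allele is common. A condition evaluated at any one frequency therefore says nothing, by itself, about whether cooperation would be favoured at another.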

These limitations do not, however, render Hamilton’s rule explanatorily useless.
Explanation in evolutionary biology is not all about long-run predictive success, and there
are important explanatory roles to which Hamilton’s rule, in spite of its predictive
limitations, remains well suited. Two such roles stand out as especially important: that of
providing causal explanations of observed evolutionary outcomes, and that of unifying
diverse dynamical models under a common conceptual framework.

Causal explanation in the field


When we encounter social behaviours in nature, we can usually be fairly confident that
they evolved at least in part because they were favoured by natural selection, but this by
no means tells us everything we want to know. We also want to know why they were so
favoured. In particular, we want to know whether they were favoured by virtue of their
effects on the actor performing them, or whether they were favoured by virtue of their
effects on other individuals. The standard way to answer this question is to estimate the
coefficients in Hamilton’s rule (Grafen 1985a). By estimating the values of r, b and c for a
given population, a behavioural ecologist can assess firstly whether the rule is satisfied,
and secondly how the terms compare. They can thereby make an inference as to why the
trait was originally favoured by selection.
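
The two-step inference just described can be sketched as follows. This is an illustration only: the function names and the numerical estimates are hypothetical, and the four-way classification by the signs of the effects on actor and recipient is assumed to follow the standard usage in the social evolution literature.

```python
def hamilton_rule_satisfied(r, b, c):
    """Return True if the estimates satisfy rb - c > 0."""
    return r * b - c > 0


def selection_components(r, b, c):
    """Split rb - c into the part due to effects on the actor (-c)
    and the part due to effects on related recipients (r * b)."""
    return {"direct": -c, "indirect": r * b}


def classify_behaviour(b, c):
    """Classify a behaviour by the sign of its effect on the actor (-c)
    and on the recipient (b). Boundary cases are left unclassified."""
    actor_effect = -c
    if actor_effect > 0 and b > 0:
        return "mutually beneficial"
    if actor_effect > 0 and b < 0:
        return "selfish"
    if actor_effect < 0 and b > 0:
        return "altruistic"
    if actor_effect < 0 and b < 0:
        return "spiteful"
    return "unclassified"


if __name__ == "__main__":
    r, b, c = 0.5, 3.0, 1.0  # hypothetical field estimates
    print("rule satisfied:", hamilton_rule_satisfied(r, b, c))
    print("components:", selection_components(r, b, c))
    print("classification:", classify_behaviour(b, c))
```

With these made-up estimates the rule is satisfied only because of the indirect component, which is the pattern one would expect if the behaviour was favoured by virtue of its effects on other individuals.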

Since Hamilton’s original (1964) derivation of the rule, numerous empirical studies have
put this method into practice (e.g., Grafen 1984; Gadagkar 2001; Oli 2003; smith et al. 2010;
Waibel et al. 2011; for reviews, see Foster 2009; Westneat and Fox 2010; Bourke 2011).
Importantly, however, the method is usually only practicable if the cost and benefit
coefficients in Hamilton’s rule are understood as average effects of phenotypes rather than
of breeding values. This is because, while we can usually gather data on the behaviours of
particular organisms and their fitness consequences, it is rarely possible to gather data on
the genotypes of particular organisms outside of the laboratory. Indeed, as Grafen (1985a)
notes, this is the main reason why Hamilton’s rule—in contrast to the concepts and
methods of classical population genetics—has come to be so influential among behavioural
ecologists:

In applications to data, Hamilton’s rule comes into its own. The
great differences from models are that usually with data on
social traits, the genotypes of individuals are unknown and the
genetic system controlling the trait is unknown. This makes
worries about dominance, number of loci and mode of gene
action purely academic. In modelling, the fundamental
population genetics method of finding the number of offspring
of each genotype is the main rival to Hamilton’s rule. This
alternative simply cannot be applied to data if the genotypes of
individuals are unknown. Hamilton’s rule can be applied,
provided enough information is available to measure the effects
of social action. (Grafen 1985a, 76)

Conceptual unification of models


Hamilton’s rule has a very different explanatory function in theoretical population
genetics, where it is seen not as a source of causal explanations of particular phenomena,
but rather as a means of unifying diverse dynamical models under a single conceptual
umbrella. This conception of the explanatory role of Hamilton’s rule is forcefully
advocated by Gardner et al. (2007):

The most powerful and simple approach to evolutionary
problems is to start with a method such as population genetics
(including the multilocus approach), game theory or direct-
fitness maximization techniques. The results of these analyses
can then be interpreted within the frameworks that Price’s
theorem and Hamilton’s rule provide. The correct use of these
powerful theorems is to translate the results of such disparate
analyses, conducted with a variety of methodologies and
looking at very different problems, into the common language
of social evolution theory. (Gardner et al. 2007, 224)

Gardner and colleagues’ emphasis on translating results into ‘the common language of
social evolution theory’ shows that the explanatory role being performed by Hamilton’s
rule is unificatory rather than causal-explanatory: the aim is not to add any additional causal
detail to that already included in the underlying dynamical model, but rather to show how
the results of many particular models, from various different modelling traditions, can all
be seen as particular instances of an overarching general principle.

4.5.2 The right rule for the right job

Recognizing the dual explanatory role that Hamilton’s rule is often expected to play in
contemporary sociobiology allows us to see the value in both solutions to the problem of
synergy. For Solution 1 shows us how to extend Hamilton’s rule so as to better enable it to
perform its causal-explanatory function, and Solution 2 shows us how to reformulate it so
as to better enable it to perform its unificatory function.

Let us consider the latter function first. If Hamilton’s rule is to provide a common language
in which to express the results of diverse modelling approaches, generality is paramount: it
is important, in particular, that the rule still holds in models of synergistic interaction. The
genetic formulation of Hamilton’s rule, HRG, provides maximal generality. What it tells
us, in effect, is that all cases of the genetical evolution of social behaviour by natural
selection have something in common: in all such cases, an individual’s fitness depends not
only on its own internal genes, but also on the external genetic milieu in which it finds
itself. It also tells us that, as a consequence, the direction of selection depends on three
factors: the association between an individual’s fitness and its own genes; the association
between an individual’s fitness and any relevant external genes; and the correlation
between its own genes and its external genetic milieu. This may not tell us very much, but
it does tell us something—and it may well be all there is to say in general about the nature
of broad-sense kin selection.

For all its unifying power, however, HRG is dreadfully ill suited to providing causal
explanations of particular social phenomena. There are two main reasons for this. The first
is that, since the cost and benefit terms in HRG represent the average effects of genotypes,
not phenotypes, its terms are extremely difficult to measure accurately in real ecological
contexts. The second is that, even if we could estimate the cost and benefit terms in real
contexts, it is not clear that we would gain much in the way of causal explanation by doing
so, since HRG takes no account of how the overall association between fitness and
breeding value is causally mediated by phenotypic pathways. One consequence of this is
that estimating the terms in HRG would not settle the question of whether a particular
behaviour is selfish, spiteful, altruistic or mutually beneficial, in the standard technical
sense of these terms (see Chapter 2). The information that $b_g$ is positive and $-c_g$ is
negative for a particular social behaviour, for example, would not imply that the behaviour
is altruistic. It would imply only that the genes for that behaviour correlate negatively with
actor fitness and positively with recipient fitness, and this pattern of correlation could in
principle be explained by pleiotropic effects of the relevant genes on a quite different
phenotype. The point is not that such pleiotropy is especially likely (though see Foster et
al. 2004), but merely that HRG does not rule it out: because it says nothing at all about the
causal pathways linking g to w, it radically underdetermines the true causal explanation of
a trait’s evolutionary success.


If we want to derive principles of social evolution that can do serious causal-explanatory
work in real ecological contexts, then we must include phenotypes; and this is where
Solution 1 earns its keep. Queller’s method provides a general recipe for causal analyses of
phenotypic pathways, of which HRQ is merely a simple example. The downside, of course,
is that in applying this method we lose the generality HRG afforded: the traditional two-
term form of Hamilton’s rule (HRP) does not fully account for the w–g covariance even in
simple synergy games, and predictor sets that capture all the causally relevant phenotypes
(and thereby do account for all the w–g covariance) will often need to be large and
complicated, and to vary significantly from one case to the next, if they are to do justice to
the complexity of real, evolving populations (cf. Frank 1998; smith et al. 2010).

Let us review the argument of this section. Hamilton’s rule has two roles in contemporary
biology: it is a causal-explanatory principle that field biologists use to make sense of real
evolutionary outcomes; but it is also employed by theorists as a unifying principle that
captures a general feature of processes of social evolution. It would be convenient—but
also somewhat miraculous—if a single principle could fulfil both explanatory functions.
The moral of the problem of synergy is that this is not the case. To fulfil its causal-
explanatory function, Hamilton’s rule must be formulated using a causally adequate set of
phenotypic predictors, and this leads to extended versions of the rule with as many
predictors as it takes to reflect the causal structure of the problem of interest. To fulfil its
unificatory function, Hamilton’s rule must be formulated in terms of genetic predictors, at
considerable cost to its causal-explanatory power. No single principle can do both the
causal-explanatory work and the unificatory work: HRG is apt to perform the unificatory
work, while the myriad Hamilton’s rule-type principles derived via Queller’s general
method are apt to do the causal-explanatory work. Accordingly, there is no one
formulation uniquely entitled to the name ‘Hamilton’s rule’. The bottom line is that there is
no pressing need to choose between our two solutions to the problem of synergy. It is not
that one solution is correct and the other incorrect; rather, each provides a means of
salvaging a version of Hamilton’s rule that is able to perform one of the rule’s traditional
explanatory functions at the expense of the other.

FIVE

Two Conceptions of Social Fitness

[T]here exist two forms of Hamilton’s rule, each with its own
distinct coefficient of relatedness. ... The similarity in the form
of these coefficients often leads to the mistaken conclusion that
direct and inclusive fitness models are the same process
described in two different ways.
(Frank 1997a, 1719)

In Chapter 4, we saw that Hamilton’s rule in its traditional rb − c > 0 form (where ‘b’ and
‘c’ represent average effects of phenotypes) struggles to accommodate the complexities of
social phenomena in real ecological contexts, where synergistic effects are rife. We also
saw that, although the rule still holds if reformulated in terms of the average effects of
genotypes, it loses much of its causal-explanatory power in the process. If we want to use
kin selection theory to give causal explanations of particular social phenomena, we are
better off applying Queller’s general method: construct a regression model with enough
predictors to fully capture the causal pathways linking genotype to fitness, and substitute
this model into the Price equation to derive a condition for positive social selection on the
character under study. At the time (Section 4.3.3), we noted in passing that this general
method has been developed in two quite different ways by contemporary theorists:
modern kin selection theory incorporates both the neighbour-modulated (or direct) fitness
approach and the inclusive fitness approach. Both are instances of Queller’s general
method, and both are potentially more general than (the phenotypic version of)
Hamilton’s rule. The overarching aim of this chapter is to examine the relationship
between these approaches.

The terminology of ‘neighbour-modulated’ and ‘inclusive’ fitness is originally due to
Hamilton (1964, 5-6), who noted in passing the possibility of two alternative accounting
schemes for fitness in the context of social behaviour. Hamilton himself chose to focus on
developing the second, ‘inclusive fitness’ method, and the alternative has never received
quite the same degree of popular recognition. Nevertheless, a long tradition of theory has
seen the notion of ‘neighbour-modulated’ fitness—often under the name of ‘direct’ or
‘personal’ fitness—gradually and inconspicuously grow into a full-fledged framework for
the analysis of kin selection (see especially Orlove 1975, 1979; Cavalli-Sforza and Feldman
1978; Grafen 1979; Queller 1985; Taylor and Frank 1996; Frank 1998).

Unsurprisingly, this has led to considerable discussion of when—and in what sense—the
two frameworks constitute ‘equivalent perspectives’ on social evolution, rather than
genuine rivals. Many theorists have suggested that the two frameworks are no more than
alternative methods of bookkeeping which, if applied correctly, cannot disagree on any
substantive questions (see, e.g., Dawkins 1982; Rousset 2004; West et al. 2007; Gardner et
al. 2007; Gardner and Foster 2008; Wenseleers et al. 2010; Gardner et al. 2011; Queller
2011). Yet there have always been dissenters from the consensus view, notably including
John Maynard Smith (1980, 1983, 1987), who contrasts ‘the exact “neighbour-modulated
fitness” approach’ with ‘the more intuitive “inclusive fitness” method’ (1983, 315).
Although Maynard Smith ultimately advocates inclusive fitness over the alternative (on
the grounds that it is easier to apply), he suggests that it is unlikely to apply as widely. A
similar sentiment is shared by Jeffrey A. Fletcher and Michael Doebeli (2006, 2009, 2010;
see also Fletcher and Zwick 2006; Fletcher et al. 2006), who controversially argue that only
a neighbour-modulated/direct fitness approach has the resources to cover all cases of the
evolution of altruism. Other authors defend positions that are hard to place squarely in
either camp. Peter D. Taylor and colleagues (2007), for example, argue that if certain
conditions obtain (namely, fair meiosis, weak selection and interactions among
conspecifics only), then the two frameworks will yield equivalent predictions about the
direction of evolutionary change. They allow, however, that when their assumptions are
relaxed, the frameworks might well come apart. Steven A. Frank (1997a,b, 1998) is a
second example: though regularly cited as a defender of equivalence, he writes (in the
passage quoted at the start of this chapter) of the ‘mistaken conclusion that direct and
inclusive fitness models are the same process described in two different ways’ (1997a, 1719).

What is needed is a precise and general statement of the conditions under which the
frameworks are equivalent, and of the conditions under which they are not. My goal in
this chapter is to advance the debate by providing such a statement. The overview of the
chapter is as follows. In Section 5.1, I further motivate the project by introducing two quite
different ways of thinking informally about the role of relatedness in social evolution. I
suggest that, in asking whether neighbour-modulated and inclusive fitness are ‘formally
equivalent’, we are essentially asking whether these ‘ways of thinking’ describe different
mechanisms for the evolution of altruism, or whether they are merely alternative
perspectives on the same mechanism. In Section 5.2, I set out more precisely the
conceptual contrast between neighbour-modulated and inclusive fitness, and highlight a
number of subtle features of inclusive fitness that are often overlooked. I then introduce
Steven A. Frank’s (1997a,b, 1998) influential formalism for neighbour-modulated and
inclusive fitness, which provides the most appropriate framework within which to
address questions of their formal equivalence. In Sections 5.3 and 5.4, I discuss in detail
when the frameworks are equivalent, and when they are not. I argue, in short, that, while
the frameworks can be shown to be equivalent across a wide range of cases, there are
some important classes of case in which they are non-equivalent.


5.1 Why does relatedness matter? Two kinds of answer

It is a platitude of social evolution theory that relatedness between social partners can lead
to the evolution of social behaviour that would not be stable in its absence. The paradigm
cases are cases of altruism, in which an individual negatively impacts its own fitness while
conferring a fitness benefit on another individual. We often say that relatedness between
social partners makes altruism possible. But why does relatedness matter so much? In
contemporary social evolution theory, one finds not one but two kinds of answer to this
question; and it is certainly not obvious that they are equivalent.

5.1.1 The ‘indirect reproduction’ answer

There is a longstanding affinity between kin selection theory and a ‘gene-centred’ or
‘gene’s eye’ view of evolution (see, e.g., Dawkins 1976, 1979, 1982; Bourke 2011a,b). At first
glance, one might wonder whether this is simply an historical accident: a result of the fact
that W. D. Hamilton, in addition to being the first theorist to formulate the core idea of kin
selection, was also a proponent of the gene’s eye view (and therefore formulated his
theory in genetic terms). After all, kin selection theory is fundamentally concerned with
how interactions and patterns of resemblance between organisms influence the outcome of
selection, and it is possible to formulate versions of the theory that do not assume
particulate inheritance (Gardner 2011). I suspect, however, that the widespread association
of these ideas, theoretically separable though they may be, is no accident, for taking a
‘gene’s eye’ perspective on kin selection makes the basic evolutionary logic behind the
theory seem incredibly intuitive.

The familiar ‘gene-centred’ rationale for kin selection goes like this: organisms are
designed by natural selection to do whatever they can to maximize their genetic
representation in the next generation. Broadly speaking, there are two ways for an
organism to do this. The direct way for an organism to increase its genetic representation is
for it to have more offspring of its own, since it will be able to transmit copies of its genes
to the next generation via these offspring. But an organism can also increase its genetic
representation indirectly, by helping other organisms have more offspring. This strategy
will only work, however, if the organism differentially confers benefits on recipients who
are more likely than average to transmit copies of its genes to their offspring. We can think of
these recipients as the organism’s ‘relatives’, although there is no requirement that they
are its genealogical kin: what matters is that, for whatever reason, they are disposed to
transmit copies of its genes. We can intuitively see how altruistic behaviours might evolve
by this route: an organism may sacrifice some of its direct reproduction to help relatives,
but it will be a sacrifice worth making if the increased genetic representation it gains
through the indirect pathway outweighs the representation it loses through the direct
pathway. Here, then, is our first answer to the question of why relatedness matters:

Answer 1: Positive relatedness promotes altruism because it
provides altruists with an indirect means of transmitting their
genes to the next generation.

When an organism gains genetic representation in the next generation through helping
relatives, the process is often glossed as indirect reproduction. Consider, for example, the
following quotations:1

Social insects are characterized by indirect reproduction, in
which most individuals achieve genetic success by helping to
rear the offspring of colony mates. (Strassmann et al. 1989, 268)

[N]onreproductive workers ... can compensate for their loss of
direct reproduction by the indirect reproduction achieved
through helping relatives, provided relatedness is high enough.
(Hastings et al. 1998, 573)

1 See also, e.g., Queller 1989, 1996; Gadagkar and Bonner 1994; Cronk 1991, 2007; Choe and Crespi 1997;
Voland 1998; Queller et al. 2000; Oli 2003; Frank 1998, 2006; Ratnieks et al. 2006; Ratnieks and Wenseleers
2008.

For this reason, I will call this line of thought the ‘indirect reproduction’ explanation for
the importance of relatedness. The idea, in a nutshell, is that high relatedness matters to
the evolution of altruism because it allows social actors to achieve a form of ‘indirect
reproduction’ through altruistic acts.

5.1.2 The ‘positive assortment’ answer

Talk of ‘indirect reproduction’ remains widespread in elementary expositions of kin
selection, particularly those directed at students and field biologists. It is, however,
somewhat out of fashion in theoretical circles. The concern is that, in emphasizing genetic
resemblance between social actors and the offspring of their recipients, the ‘indirect
reproduction’ story unnecessarily limits the domain of phenomena to which kin selection
theory can apply. For many of the current generation of social evolution theorists, kin
selection theory can and should be extended to accommodate and explain any process in
which salient resemblance between individuals leads to the evolution of social behaviour,
be it via ‘narrow-sense’ kin selection based on genealogical kinship, direct or indirect
reciprocity, group selection, greenbeard effects, or any other selection process. Since many
of these processes do not involve social actors securing genetic representation in the next
generation by an indirect pathway (so the line of thought goes), we need to replace the
traditional explanation for the importance of relatedness in social evolution with one that
applies more widely.

This theoretically in-vogue story takes the defining feature of a process of kin selection to
be assortment between recipient genotypes and actor phenotypes (see, e.g., Kerr and
Godfrey-Smith 2002; Fletcher and Zwick 2006; Fletcher and Doebeli 2006, 2009, 2010;
Godfrey-Smith 2009a; Rosas 2010). In the context of the evolution of altruism, what
matters is that bearers of altruistic genotypes differentially receive the benefits of the
altruism. As Fletcher and Doebeli (2009) put it:

[W]hat is necessary for the evolution of altruism is assortment
between focal genotype and phenotypic help, rather than the
assortment among genetic types often emphasized in kin
selection theory. (Fletcher and Doebeli 2009, 17)

Informally, the ‘new’ story goes like this: when some altruistic behaviour evolves ‘by kin
selection’, it evolves because individuals with altruistic genotypes are fitter on average
than non-altruists; and they are fitter on average because there is a statistical tendency for
the benefits of altruism to fall differentially on bearers of altruistic genotypes. We can still
talk of ‘relatedness’ in this framework, but the relatedness that matters is correlation
between one’s own genotype and the phenotypes of the social actors with whom one
interacts; genetic correlation between actors and recipients is strictly optional.

Here, then, is our second answer to the question of why relatedness matters:

Answer 2: Positive relatedness promotes altruism because it
implies that the benefits of altruistic acts fall differentially on
bearers of the genes for altruism.

I will call this the ‘positive assortment’ answer to the question of why relatedness matters.
The attraction over the ‘indirect reproduction’ answer, at least in the eyes of its
proponents, lies in its ability to extend to cases in which there is no strong genetic
similarity between actors and recipients.


5.1.3 The equivalence question

These two answers provide us with pictures of how altruism evolves that are, on the face
of it, quite different from each other (Box 5.1). This naturally leads to the question of when,
if at all, these two pictures constitute equivalent representations of the same evolutionary
process, rather than representations of qualitatively different processes.

Our intuitions on this question could, I think, go either way. On the one hand, we might
intuitively take the two pictures to represent qualitatively different mechanisms for the
evolution of altruism, since there is a qualitative difference in the sort of ‘relatedness’ they
take to be important. For note that, on the ‘indirect reproduction’ answer, what really
matters is correlation between the genotype of the actor and the genotype of the recipient’s
offspring; whereas, on the ‘positive assortment’ picture, what matters is correlation
between the phenotype of the actor and the genotype of the recipient. Though the difference is
subtle, neither kind of correlation strictly implies the other (a point I revisit in Section 6.4).
On the other hand, we might intuitively suspect that, although the two pictures embody
subtly different conceptions of the sort of ‘relatedness’ that matters for the evolution of
altruism, in reality both kinds of relatedness tend to arise from the same family of causal
mechanisms: limited dispersal, kin recognition, greenbeard effects, and so on. If the causal
mechanisms responsible for both kinds of correlation are the same in many cases, then it
would seem more reasonable to regard our two informal pictures (at least in these cases)
not as characterizations of different processes, but as two alternative ways of visualizing
the same process.

I therefore doubt that we can settle the equivalence question by intuition alone: to arrive at
a more satisfactory answer, we must approach the question formally. This is the strategy I
intend to pursue in the coming sections. In the next section, I introduce the neighbour-
modulated and inclusive fitness approaches to the analysis of kin selection, and I argue
that the distinction between these approaches reflects the distinction between our two
informal pictures. As a consequence, the question of when, if at all, ‘indirect reproduction’
and ‘positive assortment’ constitute equivalent perspectives on social evolution may be
recast as the question of when, if at all, the neighbour-modulated and inclusive fitness
approaches are formally equivalent.

Box 5.1: Two ways for altruism to pay

[Diagram for Picture 1: Generation 1 → Generation 2]

Picture 1: Altruism pays due to indirect reproduction. Altruists (red) differentially confer fitness benefits on
recipients who are disposed to transmit the genes for altruism. Recipients thereby provide actors with an indirect
route to genetic representation in the next generation. Actor phenotypes may also correlate with recipient
genotypes, but this is not assumed.

[Diagram for Picture 2: Generation 1 → Generation 2]

Picture 2: Altruism pays due to positive assortment. Altruists (red) differentially receive the benefits of
altruism. They are therefore fitter, on average, than individuals who do not possess the altruistic genotype (blue).
The recipient’s offspring may also bear a genetic resemblance to the actor, but this is not assumed.

5.2 Neighbour-modulated and inclusive fitness

5.2.1 The conceptual contrast

The Price formalism describes the evolutionary change between two sets of entities
connected by a mapping relation R (cf. Kerr and Godfrey-Smith 2009; Frank 2012). In
Chapter 3 we noted that, while in biological contexts the salient mapping relation between
the two sets is usually direct lineal descent (i.e., parenthood, if descendants and ancestors
are separated by a single generation), we may in principle assign descendants to ancestors
in alternative ways. The fundamental difference between neighbour-modulated and
inclusive fitness is that, while the former prioritizes considerations of parenthood in the
assignment of descendants to ancestors, the latter prioritizes considerations of social
causation and control (cf. Frank 1998). This can lead to radically divergent measures of an
individual’s social fitness.

An individual’s neighbour-modulated fitness (Figure 5.1) is a measure of its personal
reproductive success: typically, it is the expected or realized number of offspring of which
it is a parent. The qualifier ‘neighbour-modulated’ is merely an acknowledgement that in
cases of social behaviour, an individual’s personal reproductive success is influenced—
often heavily influenced—by the properties of the individuals with which it interacts.
Note that, although neighbour-modulated fitness can be used to analyse kin selection, the
concept of relatedness does not feature in the definition of neighbour-modulated fitness. To
evaluate the neighbour-modulated fitness of a particular individual, we need only look at
the offspring it personally produces: we need not have any prior information about its
relatedness to its social partners.

An individual’s inclusive fitness (Figure 5.2) is a weighted sum of the causal contributions
its behaviour makes to the reproductive success of individuals, including (but not limited
to) its causal contribution to its own reproductive success. Each contribution is weighted
by a measure of its value to the actor as a route to genetic representation in the
descendant-population, and the correct weights are measures of genetic correlation
between the actor and the recipient’s descendants (Frank 1997a,b; 1998). Note, therefore,
that relatedness does feature explicitly in the definition of inclusive fitness, and that the
relatedness that matters is strictly genetic. In W. D. Hamilton’s original words:

Inclusive fitness may be imagined as the personal fitness which
an individual actually expresses in its production of adult
offspring as it becomes after it has been stripped and
augmented in a certain way. It is stripped of all components
which can be considered as due to the individual’s social
environment, leaving the fitness he would express if not
exposed to any of the harms or benefits of that environment.
This quantity is then augmented by certain fractions of the
quantities of harm and benefit which the individual himself
causes to the fitness of his neighbours. The fractions in question
are simply the coefficients of relationship. (Hamilton 1964, 8)

[Figure 5.1 diagram: B, C and D each confer a fitness benefit b on A; A’s own behaviour costs it c.]

$w_{\mathrm{dir}}(A) = \Omega_A + 3b - c$

Figure 5.1: Neighbour-modulated fitness. In a neighbour-modulated fitness analysis, we
ascribe to A all and only those fitness components that correspond to its personal
reproductive success. Some of these components are influenced by the behaviour of B, C
and D (as indicated by the arrows). A’s total neighbour-modulated fitness is a
straightforward, unweighted sum of these components (3b), plus a component
corresponding to A’s own influence on its reproductive success via the character under
study (−c), plus a baseline component ($\Omega_A$) independent of that character.

[Figure 5.2 diagram: A confers a fitness benefit b on each of B, C and D; A’s own behaviour costs it c.]

$w_{\mathrm{inc}}(A) = \Omega_A + \tau_{AB} b + \tau_{AC} b + \tau_{AD} b - c$

Figure 5.2: Inclusive fitness. In an inclusive fitness analysis, fitness effects are assigned to
the actors whose behaviour was responsible for them. A therefore loses the slices of
personal fitness it obtained through its interactions with B, C, and D. In compensation, it
gains three new slices, taken from the personal fitness of B, C and D, which causally depend
on its own behaviour. To calculate A’s inclusive fitness, these new slices must be weighted
by a suitable measure of their value to A as routes to genetic representation in the next
generation (τ), and this will be a measure of genetic relatedness. In short, therefore, A’s
inclusive fitness is a relatedness-weighted sum of the fitness effects for which it is causally
responsible.
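
The contrast between the two accounting schemes in Figures 5.1 and 5.2 can be sketched numerically. The function names, the baseline, the values of b and c, and the relatedness weights below are all hypothetical, chosen only to show how the same pattern of interactions yields different fitness figures for A under the two schemes.

```python
def neighbour_modulated_fitness(baseline, benefits_received, cost):
    """Unweighted sum of A's personal fitness components (as in Figure 5.1):
    baseline, plus every benefit A receives, minus A's own cost."""
    return baseline + sum(benefits_received) - cost


def inclusive_fitness(baseline, effects_on_others, cost):
    """Relatedness-weighted sum of the fitness effects A causes (as in Figure 5.2).

    effects_on_others is an iterable of (relatedness, benefit) pairs, one per
    recipient of A's behaviour.
    """
    weighted = sum(tau * benefit for tau, benefit in effects_on_others)
    return baseline + weighted - cost


if __name__ == "__main__":
    omega_a, b, c = 1.0, 0.3, 0.2                # hypothetical baseline, benefit, cost
    tau = {"B": 0.5, "C": 0.25, "D": 0.125}      # hypothetical relatedness of A to each partner

    w_dir = neighbour_modulated_fitness(omega_a, [b, b, b], c)
    w_inc = inclusive_fitness(omega_a, [(t, b) for t in tau.values()], c)
    print(f"neighbour-modulated fitness of A: {w_dir:.4f}")
    print(f"inclusive fitness of A:           {w_inc:.4f}")
```

Note that the first function needs no relatedness values at all, whereas the second cannot be evaluated without them. This mirrors the definitional difference drawn in the text.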


5.2.2 Five subtleties of inclusive fitness

The notion of inclusive fitness is conceptually more challenging than that of neighbour-
modulated fitness, for at least five reasons. Several of these arise from the counterintuitive
consequences of the regression definition of relatedness (see Section 3.1.4), while others
arise from the sensitivity of inclusive fitness to considerations of causal responsibility.

Inclusive fitness is an inherently causal notion


An individual’s neighbour-modulated fitness functionally depends on its interactions with
social partners. But the relationship between inclusive fitness and causation is more
intimate than this, for the very notion of inclusive fitness is defined in explicitly causal
terms. To calculate an individual’s inclusive fitness, we need information about the
reproductive success of all and only those individuals with which it has causally interacted. A
failure to appreciate the causal nature of inclusive fitness leads to the widely held but
inaccurate view that an individual’s inclusive fitness depends on the fitness of all
individuals to whom it is related, whether or not it has ever interacted with them. In fact,
having a very successful relative will not increase one’s inclusive fitness unless one is
causally responsible for a portion of that relative’s success (Grafen 1979, 1982, 1984;
Dawkins 1982).

Inclusive fitness is character-relative


In an inclusive fitness analysis, fitness components are assigned to an actor on the basis of
how it has influenced its own reproductive success and that of others through expressing the
character (or set of characters) under study. This introduces a form of character-relativity,
since an individual may have very high inclusive fitness with respect to one character (or
set of characters), and yet have very low inclusive fitness with respect to another. One
might suppose that we could define an organism’s overall inclusive fitness as the sum of its
inclusive fitness with respect to each of its characters, but this would quickly lead to
problems of double counting. For instance, the founding of a new insect colony has many
downstream effects, including the production of new workers and all the work they do. If
we are attempting to estimate the inclusive fitness effect on a foundress of founding a new
nest (rather than staying in her mother’s nest), we will need to take these effects into
account and attribute them to the (distally responsible) foundress. But if we are attempting
to estimate the inclusive fitness effect on a worker of participating in an item of work, we will
need to take some of these very same effects into account—but this time we will need to
attribute them to the (proximally responsible) worker. Because causal responsibility is
inescapably character-relative, so is inclusive fitness.

Inclusive fitness is independent of considerations of parenthood


Inclusive fitness replaces considerations of parenthood with considerations of causal
responsibility, and this has occasionally counterintuitive consequences. On the one hand,
an individual with no personal offspring at all can still have substantial inclusive fitness
with respect to a particular character, if it confers fitness benefits on many related
individuals. This is one reason to suspect that inclusive fitness is extremely important to
the evolution of sterile workers in insect societies. These individuals typically have zero
neighbour-modulated fitness, but, since they devote their lives to assisting the queen, their
inclusive fitness is likely to be much higher. On the other hand, and less intuitively, an
individual with high personal reproductive success could still have low or even zero
inclusive fitness with respect to some social behaviour, should it be the case that (i) its
personal fitness is heavily influenced by instances of the behaviour in other individuals,
and yet (ii) it does not confer any fitness benefits on related individuals by expressing the
behaviour itself. An inclusive fitness analysis will strip this individual of the fitness
components it owes to the behaviour of others; and, since it does not confer any fitness
benefits on others through its own behaviour, it will not gain any new components in
return. The result is that its inclusive fitness will be much lower than its neighbour-
modulated fitness.


Spiteful behaviours can increase inclusive fitness


In Section 4.1.4, we noted that, on the regression definition, relatedness can be negative,
and that this allows for the evolution of spiteful behaviours which detract from the fitness
of both actor and recipient. A further implication is that, if relatedness is sufficiently
negative, and if the harm caused to the recipient is sufficiently greater than the harm
incurred by the actor, then performing a spiteful behaviour can increase an organism’s
inclusive fitness. Even more counterintuitively, performing a mutually beneficial
behaviour can detract from an actor’s inclusive fitness, if relatedness is sufficiently negative
and the benefit conferred on the recipient is sufficiently greater than the benefit obtained
by the actor. This reflects the fact that inclusive fitness is ultimately a measure of how
successful an actor has been in securing genetic representation in the next generation.
Sometimes helping the wrong recipients is counterproductive in this regard, and the result
is that the sign of the inclusive fitness effect differs from the sign of the actual fecundity
payoff.
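A short numerical illustration may help here. The sketch below assumes the simplest possible accounting, in which the inclusive fitness effect of a behaviour is the effect on the actor plus the relatedness-weighted effect on a single recipient; the function name and all numbers are hypothetical.

```python
def inclusive_fitness_effect(effect_on_actor, effect_on_recipient, relatedness):
    """Effect on the actor plus the relatedness-weighted effect on one recipient."""
    return effect_on_actor + relatedness * effect_on_recipient


r = -0.5  # hypothetical negative relatedness on the regression definition

# Spite: actor loses 1 unit, recipient loses 5 units.
spite = inclusive_fitness_effect(-1.0, -5.0, r)

# Mutual benefit: actor gains 1 unit, recipient gains 5 units.
mutual_benefit = inclusive_fitness_effect(1.0, 5.0, r)

print(spite, mutual_benefit)
```

Under these assumed values the spiteful act has a positive inclusive fitness effect and the mutually beneficial act a negative one, which is the sign reversal described above.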

Inclusive fitness, like relatedness, is population-relative


In Section 4.1.4, we noted that relatedness is, strictly speaking, a property of a population,
and that the choice of an appropriate reference population has significant consequences
for the link between relatedness and altruism. Inclusive fitness, by contrast, is a property
of a particular organism, and depends on the fitness effects for which the organism is
personally responsible. Yet because these effects are weighted by coefficients of
relatedness, and relatedness is population-relative, inclusive fitness is also population-
relative: it is a property of a particular organism relative to a reference population. The upshot
is that the choice of reference population may affect whether or not a given behaviour
contributes positively or negatively to inclusive fitness.

Suppose (as in Section 4.1.4) that we are studying a viscous population in which
relatedness between social partners is high relative to the global population mean, but low
relative to the local subpopulation mean: kin cluster together on the whole, but organisms
do not differentially interact with the closest kin in their vicinity. In such a scenario, it may
well be that altruistic behaviours contribute to an organism’s inclusive fitness relative to
the global population, yet detract from an organism’s inclusive fitness relative to the local
subpopulation. The implication is that, if we want to use inclusive fitness considerations to
draw inferences about the kinds of social behaviour that are likely to evolve, we need to
know where the competition is. If competition is mostly local (i.e., within subpopulations),
then inclusive fitness should be evaluated relative to the local subpopulation; if
competition is mostly global (i.e., between subpopulations), then inclusive fitness should be
evaluated relative to the global population.
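The dependence on the reference population can be illustrated with a toy calculation. This is a sketch under strong assumptions that are not in the original text: genotypes are coded 0 or 1, each individual interacts equally with every other member of its own subpopulation, relatedness is taken to be the least-squares slope of partner genotype on focal genotype, and the values of b and c are hypothetical.

```python
from itertools import permutations


def regression_slope(pairs):
    """Least-squares slope of partner genotype (y) on focal genotype (x)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    return cov / var


def interaction_pairs(subpopulation):
    """All ordered (focal, partner) pairs of distinct individuals."""
    return list(permutations(subpopulation, 2))


# Two hypothetical subpopulations: kin cluster, but pairing within each is random.
sub_1 = [1, 1, 1, 1, 0]
sub_2 = [0, 0, 0, 0, 1]

r_local = regression_slope(interaction_pairs(sub_1))
r_global = regression_slope(interaction_pairs(sub_1) + interaction_pairs(sub_2))

b, c = 3.0, 0.5  # hypothetical benefit and cost
print(r_local, r_global)
print(r_local * b - c, r_global * b - c)
```

In this toy population relatedness is positive when measured against the global population but negative when measured against the local subpopulation, so the same altruistic act changes sign depending on the reference population chosen.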

Creel’s paradox
The importance of taking these various subtleties into consideration is aptly illustrated by
‘Creel’s paradox’ (Creel 1990; Queller 1996). Scott R. Creel (1990) argues that, given how
inclusive fitness is usually defined, the queen of a social insect colony seems to have
virtually zero inclusive fitness. After all, her reproductive success owes little to her own
behaviour, and she does nothing at all to aid the reproductive success of other individuals.
She spends her life laying millions of eggs, all the while receiving a steady stream of
fitness benefits from the minions who supply her with food and shelter, and who defend
the colony with their lives. As a result, her personal (neighbour-modulated) fitness is
undoubtedly extremely high, but her inclusive fitness, strictly speaking, must be
negligible. The strange implication appears to be that, in a social insect colony, the workers
have greater inclusive fitness than the queen! Creel considers this result sufficiently
implausible to warrant a significant revision to the theory of inclusive fitness, for there is
plenty of empirical evidence of workers fighting amongst themselves to replace or
displace a queen (see, e.g., Queller and Strassmann 1998), but no evidence of queens and
their daughters fighting amongst themselves for the right to be workers.

Queller (1996) shows, however, that the appearance of paradox (or at least, of a seriously
counterintuitive result) is dispelled when we consider the causal and character-relative
nature of inclusive fitness. It is true enough that the queen has low inclusive fitness with
respect to most of the behaviours routinely expressed by workers: foraging, nest
construction, nest defence, and so on. She contributes little to these tasks, and gains a great
deal from their completion. But the queen has very high inclusive fitness with respect to a
different behaviour: the behaviour of founding a new colony and adopting the queen role.
For in installing herself as queen, rather than choosing to work for another, the queen
makes a vast causal contribution to her own reproductive success.

5.2.3 Frank’s formalism for neighbour-modulated fitness

While the conceptual distinction between neighbour-modulated and inclusive fitness is
easy enough to grasp, we can only understand their subtle relationship to each other by
introducing a formal framework within which both conceptions of social fitness may be
captured and compared. The formalism I introduce in this section is based on that of
Frank (1997a,b, 1998), with a few simplifications made for ease of exposition.2 Frank’s
formalism falls under the broader umbrella of the Price formalism (Chapter 3), and can
also be regarded as an application of Queller’s general method for the regression analysis
of social evolution (Chapter 4).3

2 In particular, Frank sorts organisms into both genotypic classes and developmental classes, whereas I only
employ developmental classes. Sorting by genotype requires the introduction of class reproductive value
weightings, since the fitness of a genotype in one developmental class will not in general equal the fitness of
the same genotype in a different class. Because I only assign fitness values to individuals, not genotypes, I
can avoid introducing class reproductive value weightings.
3 Frank’s formalism for kin selection theory has been influential, but it is not the only option. For a somewhat
different way of formulating neighbour-modulated and inclusive fitness as partitions of the Price equation,
see Grafen 2006a. It would be interesting to see whether results derived within Frank’s formalism could be
recovered within Grafen’s framework; I suspect that any differences would turn out to be superficial.


The starting point


We start with the Price equation for the primary and secondary effects of natural selection,
partitioned into components corresponding to distinct developmental classes (cf. Section
3.4.2, equation 3.4.6):4

$$\Delta_w \bar{g} = \frac{1}{\bar{w}}\, E_m\!\left[\mathrm{Cov}_i(w, g')\right] = \frac{1}{\bar{w}} \sum_i q_i\, \mathrm{Cov}_i(w, g') \tag{5.2.1}$$

Following the notation introduced in Section 2.4, $q_i$ represents the relative size of the ith
class. As a starting point for analysis, two features of equation (5.2.1) are worthy of
comment. First, the equation explicitly accommodates developmental classes. This may
involve grouping organisms by age, sex, morphological caste, or any combination of these.
One might wonder whether this degree of complexity is needed, if the aim is simply to
compare the neighbour-modulated and inclusive fitness frameworks rather than to apply
them to particular biological problems. It is needed, however, because the interesting
differences between the two frameworks mostly disappear if one considers a
homogeneous population with no class structure. To see where the differences lie, it is
essential to consider a population partitioned into classes (cf. Gardner et al. 2007;
Wenseleers et al. 2010; Gardner et al. 2011; Queller 2011). Second, the equation considers
the secondary effect of natural selection as well as the primary: we take $\mathrm{Cov}(w, g')$, not
$\mathrm{Cov}(w, g)$, as the target of analysis. This is an idiosyncratic feature of Frank’s formalism
that is rarely replicated elsewhere (though see Okasha 2006). It is optional when
formulating the neighbour-modulated fitness approach, for this approach is mostly
concerned with analysing assortment among actors and recipients in the ancestor-
population, and this could equally be achieved by starting with $\mathrm{Cov}(w, g)$. It is necessary,
however, if we want to formalize the inclusive fitness approach in a way that does justice
to the intuitive idea of recipients providing actors with an indirect route to genetic
representation in the next generation (cf. Box 4.1). The natural way to capture this idea is
to weight fitness components by the correlation between the genotypes of actors and the
genotypes of their recipients’ descendants; and this requires that we take account of the
recipients’ values for g′ as well as their values for g (cf. Frank 1997a).

4 As previously noted in Section 2.4, this equation assumes that there is no genetic variance between
classes; otherwise a further term is needed to capture the between-class covariance (Frank 1997b, 1998).
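To see how the class-structured form of equation (5.2.1) works as a calculation, here is a minimal sketch. It assumes population (divide-by-n) covariances, takes the mean fitness to be the mean over all individuals in all classes, and uses hypothetical data; the function names are illustrative only.

```python
def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Population covariance (divides by n, not n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def selection_response(classes):
    """(1 / mean fitness) * sum over classes of q_i * Cov_i(w, g').

    classes is a list of (w_values, g_prime_values) pairs, one pair per class.
    """
    total = sum(len(w) for w, _ in classes)
    q = [len(w) / total for w, _ in classes]  # relative class sizes
    w_bar = sum(sum(w) for w, _ in classes) / total
    weighted_cov = sum(qi * covariance(w, gp) for qi, (w, gp) in zip(q, classes))
    return weighted_cov / w_bar


# Hypothetical data: fitness covaries with g' in the first class only.
classes = [
    ([1.0, 3.0], [0.0, 1.0]),
    ([2.0, 2.0], [0.0, 1.0]),
]
print(selection_response(classes))
```

Only the first class contributes to the total in this example, since fitness does not vary in the second; each class contributes in proportion to its relative size.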

Regression equations
Next, we introduce three separate regression analyses:

Regression 1: For the ith class, a regression model of all causally relevant
phenotypic influences on fitness, including both intrinsic and extrinsic (i.e.,
‘neighbourhood’) characters:

$$w_{ik} = \sum_j \beta_{ij} z_{ijk} + \varepsilon$$

Here, $w_{ik}$ denotes the fitness of the kth individual in the class, $\beta_{ij}$ denotes the
average effect of the jth relevant phenotype, and $z_{ijk}$ denotes the value of the
jth phenotype for the kth individual. Note that this equation only analyses the
relationship between phenotype and fitness within a particular class; each
class thus requires a separate analysis. We thereby allow that different
classes may be influenced by different phenotypes, and that the average
fitness effect of a given phenotype may vary depending on the affected
individual’s class.

Regression 2: For each relevant phenotype, a regression analysis of its
statistical association with the g-value of the affected individual:

$$z_{ijk} = \rho_{ij} g_{ik} + \varepsilon$$

Here, $z_{ijk}$ denotes (as above) the value of the jth phenotype for the kth member
of the ith class, $g_{ik}$ denotes the g-value of that individual with respect to the
character under investigation, and $\rho_{ij}$ represents the simple regression of $z_{ijk}$
on $g_{ik}$.

Regression 3: A regression analysis of the fidelity of direct transmission from
ancestors to their direct lineal descendants, averaging over all classes:

$$g' = \tau_0 g + \varepsilon$$

Here, $\tau_0$ denotes the simple regression of descendant genotype on ancestor
genotype, and this may be regarded as an overall measure of the fidelity of
direct, vertical transmission.

Substitution
We now substitute all three ingredients into equation (5.2.1), making the (substantive)
assumption that the residuals in the three different regression models are uncorrelated
with each other. This yields:

$$\Delta_w \bar{g} = \frac{1}{\bar{w}} \left[ \tau_0 \sum_i q_i \sum_j \rho_{ij} \beta_{ij}\, \mathrm{Var}_i(g_{ik}) \right]$$
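The substitution can be unpacked step by step. The following is a sketch for a single class i, assuming (as the text does) that the residuals are uncorrelated with the other variables, and additionally assuming that the population-wide coefficient $\tau_0$ from Regression 3 can be applied within each class:

```latex
\begin{align*}
\mathrm{Cov}_i(w, g')
  &= \mathrm{Cov}_i\Big(\sum_j \beta_{ij} z_{ijk} + \varepsilon,\; g'\Big)
   = \sum_j \beta_{ij}\, \mathrm{Cov}_i(z_{ijk}, g') \\
  &= \sum_j \beta_{ij}\, \mathrm{Cov}_i\big(\rho_{ij} g_{ik} + \varepsilon,\; \tau_0 g_{ik} + \varepsilon\big)
   = \tau_0 \sum_j \rho_{ij} \beta_{ij}\, \mathrm{Var}_i(g_{ik})
\end{align*}
```

Weighting each class by $q_i$, summing over classes and dividing by $\bar{w}$, as in equation (5.2.1), then gives the expression above.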

Conditional on the (again, substantive) assumption that the within-class genetic variance,
$\mathrm{Var}_i(g)$, is the same for all classes, we can exploit the fact that variance and $\bar{w}$ cannot be
negative to obtain the following rule concerning the direction of partial change (Frank
1998):

$$\mathrm{sign}(\Delta_w \bar{g}) = \mathrm{sign}\left( \tau_0 \sum_i q_i \sum_j \rho_{ij} \beta_{ij} \right) \tag{5.2.2}$$


We can think of the term on the right hand side of equation (5.2.2) as a measure of the
overall extent to which transmitted differences in the character under study predict
differences in neighbour-modulated fitness. We can call this the neighbour-modulated
fitness increment for the character under study. The rule tells us that the (primary and
secondary) effect of natural selection will be to drive the evolution of the character in the
same direction as the neighbour-modulated fitness increment.

In the special case in which the evolution of a particular social behaviour is influenced only
by the direct cost it imposes on the actor ($\beta_{ij} = -c$, $\rho_{ij} = 1$) and by the benefit actors tend to
receive from social partners with the same trait ($\beta_{ij} = +b$, $\rho_{ij} = \rho_0$), and in which class
structure is wholly absent, the general rule reduces to the following, much simpler rule:

$$\mathrm{sign}(\Delta_w \bar{g}) = \mathrm{sign}\left[ \tau_0 (\rho_0 b - c) \right]$$

Conditional on the further assumption that the actor transmits to its own direct lineal
descendants with perfect fidelity ($\tau_0 = 1$), we recover a two-term rule that bears a strong
resemblance to Hamilton’s rule in its traditional form (Frank 1997a):

$$\mathrm{sign}(\Delta_w \bar{g}) = \mathrm{sign}(\rho_0 b - c)$$
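The reduction from the general rule to the two-term rule can be checked numerically. The sketch below is illustrative only: the function names, the data layout and the values of b, c, ρ0 and τ0 are assumptions, not part of Frank's formalism.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def nm_increment(tau_0, classes):
    """tau_0 * sum_i q_i * sum_j rho_ij * beta_ij.

    classes is a list of (q_i, terms) pairs, where terms is a list of
    (rho_ij, beta_ij) pairs for the phenotypes affecting class i.
    """
    return tau_0 * sum(
        q * sum(rho * beta for rho, beta in terms) for q, terms in classes
    )


# Hypothetical special case: one class, a cost c borne by the actor (rho = 1)
# and a benefit b received from partners (rho = rho_0).
b, c, rho_0, tau_0 = 3.0, 1.0, 0.5, 1.0
single_class = [(1.0, [(1.0, -c), (rho_0, b)])]
increment = nm_increment(tau_0, single_class)
print(increment, sign(increment) == sign(rho_0 * b - c))
```

With a single class of relative size 1 and these two fitness influences, the general increment collapses to τ0(ρ0 b − c), so its sign matches the two-term rule whenever τ0 is positive.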

5.2.4 Frank’s formalism for inclusive fitness

To formalize inclusive fitness, we start, as before, with equation (5.2.1):

$$\Delta_w \bar{g} = \frac{1}{\bar{w}} \sum_i q_i\, \mathrm{Cov}_i(w, g')$$


Regression equations
An inclusive fitness approach, much like a neighbour-modulated fitness approach,
involves partitioning the within-class covariance through regression analysis. Indeed, the
first regression is exactly the same: we write an individual’s personal fitness as a weighted
sum of correlated phenotypes (whether intrinsic or extrinsic), weighted by partial
regression coefficients. But the second and third regression equations are different. Instead
of relating the correlated phenotypes to the genotype of the recipient, we relate these
phenotypes to the genotype of the actor who controls the phenotype. And instead of relating
the genotypes of ancestors to those of their direct lineal descendants, we relate the
genotypes of actors to those of their recipients’ descendants, as if the recipient had
provided the actor with an indirect channel of transmission.

Regression 1: For the ith class, a regression model of all causally relevant
phenotypic influences on fitness, including both intrinsic and extrinsic (i.e.,
‘neighbourhood’) characters:

$$w_{ik} = \sum_j \beta_{ij} z_{ijk} + \varepsilon$$

Here, $z_{ijk}$ again denotes the value of the jth phenotype for the kth member of
the ith class, and $\beta_{ij}$ again denotes the extent to which the jth phenotype
predicts recipient fitness, correcting for other relevant phenotypes.

Regression 2: For each relevant phenotype, a regression analysis of its
statistical association with the breeding value of the controlling actor:

$$z_{ijk} = d_{ij} g_{ijk} + \varepsilon$$

Here, $z_{ijk}$ denotes (as above) the value of the jth phenotype for the kth member
of the ith class; $g_{ijk}$ denotes the breeding value of the individual who controls
the character; and $d_{ij}$ represents the simple regression of $z_{ijk}$ on $g_{ijk}$. We can
think of a d-coefficient as a measure of the extent to which the jth social
phenotype is predicted by the genotype of the actor who controls it. I will
refer to these d-coefficients as ‘coefficients of control’.

Regression 3: For each relevant phenotype, a regression of the breeding value
of the controlling actor on the average breeding value of the descendants of
the affected individual:

$$g_{ijk} = \tilde{\tau}_{ij}\, g'_{ik} + \varepsilon$$

Substitution
We now substitute all three ingredients into equation (5.2.1), again making the
(substantive) assumption that residuals in the three regression models do not correlate
with each other:

$$\Delta_w \bar{g} = \frac{1}{\bar{w}} \left[ \sum_i q_i \sum_j d_{ij} \beta_{ij} \tilde{\tau}_{ij}\, \mathrm{Var}_i(g'_{ik}) \right]$$

As Frank (1997b, 1998) notes, this result does not yet admit of an intelligible interpretation
in terms of inclusive fitness. We can obtain a result that does admit of such an
interpretation by ‘flipping’ the direction of Regression 3, so that we regress descendant
breeding values on controlling actor breeding values rather than the other way round. We
can do this by noting that, for all ij, $\tilde{\tau}_{ij}\, \mathrm{Var}(g'_{ik}) = \tau_{ij}\, \mathrm{Var}(g_{ijk})$, where $\tau_{ij}$ is the simple
regression of the breeding values of the descendants of the ith recipient class on the
breeding values of the actors who control the jth phenotype. This yields:

$$\Delta_w \bar{g} = \frac{1}{\bar{w}} \left[ \sum_i q_i \sum_j d_{ij} \beta_{ij} \tau_{ij}\, \mathrm{Var}_i(g_{ijk}) \right]$$
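The ‘flipping’ step rests on the symmetry of covariance. As a brief sketch, write $\tilde{\tau}_{ij}$ for the coefficient in Regression 3 (actor breeding value regressed on descendant breeding value) and $\tau_{ij}$ for the flipped coefficient (descendant breeding value regressed on actor breeding value); the tilde is a notational choice made here to keep the two apart. Both are simple regression coefficients, so each is a covariance divided by the variance of the predictor:

```latex
\begin{align*}
\tilde{\tau}_{ij} &= \frac{\mathrm{Cov}(g_{ijk}, g'_{ik})}{\mathrm{Var}(g'_{ik})}, &
\tau_{ij} &= \frac{\mathrm{Cov}(g'_{ik}, g_{ijk})}{\mathrm{Var}(g_{ijk})} \\
\tilde{\tau}_{ij}\, \mathrm{Var}(g'_{ik}) &= \mathrm{Cov}(g_{ijk}, g'_{ik}) = \tau_{ij}\, \mathrm{Var}(g_{ijk})
\end{align*}
```

The two products are equal because they are the same covariance, which is why one regression direction can be exchanged for the other without changing the value of the sum.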


Conditional on the further substantive assumption that the genetic variance among
controlling actors (i.e., $\mathrm{Var}(g_{ijk})$) is the same for every actor-class, we can exploit the fact
that variance and $\bar{w}$ cannot be negative to obtain the following rule concerning the
direction of partial change (Frank 1998):

$$\mathrm{sign}(\Delta_w \bar{g}) = \mathrm{sign}\left( \sum_i q_i \sum_j d_{ij} \beta_{ij} \tau_{ij} \right) \tag{5.2.3}$$

We can think of the term on the right hand side of equation (5.2.3) as a measure of the
overall extent to which ‘transmitted’ differences in the genotypes of controlling actors
predict differences in the fitness effects for which those actors are responsible, where
‘transmitted’ means that the genes reappear not in the direct lineal descendants of the
actors, but rather in the descendants of the recipients they affect. We can interpret this as a
measure of the overall extent to which an actor’s genotype is associated with its inclusive
fitness, where the inclusive fitness of a controlling actor is understood as the sum of fitness
components for which its behaviour is responsible, weighted by τ -coefficients
representing the ‘transmission fidelity’ of the actor’s genes through each component. We
can call this quantity the inclusive fitness increment for the character under study. The
rule tells us that the (primary and secondary) effect of natural selection will be to drive
social evolution in the same direction as the inclusive fitness increment.

In the special case in which the evolution of a particular social behaviour is influenced only
by its direct effect on the actor ($d_{ij} = 1$, $\beta_{ij} = -c$, $\tau_{ij} = \tau_1$) and by its direct effect on a single
recipient ($d_{ij} = 1$, $\beta_{ij} = +b$, $\tau_{ij} = \tau_2$), and in which class structure is wholly absent, the general
rule reduces to the following, much simpler rule (Frank 1997a):

$$\mathrm{sign}(\Delta_w \bar{g}) = \mathrm{sign}(\tau_2 b - \tau_1 c)$$

Conditional on the further assumption that the actor transmits to its own direct offspring
with perfect fidelity ($\tau_1 = 1$), we once again obtain a rule that bears a strong resemblance to
Hamilton’s rule in its most familiar form:

$$\mathrm{sign}(\Delta_w \bar{g}) = \mathrm{sign}(\tau_2 b - c)$$
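As with the neighbour-modulated case, the reduction can be checked with a few lines of code. This is a sketch under assumed values: the function names, the data layout and the numbers chosen for b, c, τ1 and τ2 are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def if_increment(classes):
    """sum_i q_i * sum_j d_ij * beta_ij * tau_ij.

    classes is a list of (q_i, terms) pairs, where terms is a list of
    (d_ij, beta_ij, tau_ij) triples for the phenotypes affecting class i.
    """
    return sum(
        q * sum(d * beta * tau for d, beta, tau in terms) for q, terms in classes
    )


# Hypothetical special case: one class, full control (d = 1) over both effects.
b, c, tau_1, tau_2 = 3.0, 1.0, 1.0, 0.5
single_class = [(1.0, [(1.0, -c, tau_1), (1.0, b, tau_2)])]
increment = if_increment(single_class)
print(increment, sign(increment) == sign(tau_2 * b - c))
```

Here the general sum collapses to τ2 b − τ1 c, and with τ1 set to 1 its sign is that of τ2 b − c. Lowering the assumed τ2 far enough makes the increment negative, so the same behaviour would then be selected against.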

5.2.5 The two pictures revisited

There is a close relationship between the two formal representations of kin selection
outlined above and the two informal explanations for the evolution of altruism discussed
in Section 5.1: neighbour-modulated fitness is the natural framework for analysing
whether altruism pays due to positive assortment (i.e., Picture 1), while inclusive fitness is
the natural framework for analysing whether altruism pays due to indirect reproduction
(i.e., Picture 2). Because of this, the formal equivalence (or otherwise) of the two theoretical
representations would reveal something of wider significance about the equivalence (or
otherwise) of our informal explanations. I will briefly elaborate on these points, because
they will be important later on.

Neighbour-modulated fitness analyses positive assortment


Each of the ρ-coefficients in the neighbour-modulated fitness framework can be
interpreted as a measure of the ‘relatedness’ between recipients of the ith class and the jth
influence on their fitness, in the sense of relatedness introduced in Chapter 4. Note,
however, that ‘relatedness’ in this sense is not purely genetic. What these coefficients
measure is the degree of association between an individual’s breeding value and its
phenotypic characters, where these are considered to include extrinsic characters that
represent aspects of its social milieu. If possessing the genes for altruism makes a member
of a particular class more likely to be surrounded by agents with a particular altruistic
phenotype, $\rho_{ij}$ will be positive for the relevant i and j. If the correlation is strong enough,
the genes for altruism will be favoured by selection. Because the neighbour-modulated
fitness framework analyses patterns of genotype-phenotype assortment within the
ancestor-population, it is naturally regarded as a formal treatment of the informal ‘positive
assortment’ explanation for the evolution of altruism. The stronger the assortment
between possessing the genes for altruism and receiving the benefits of altruism, the more
likely it is that the neighbour-modulated fitness increment will be positive, potentially
leading to a situation in which altruists have higher neighbour-modulated fitness, on
average, than non-altruists.

Inclusive fitness analyses indirect reproduction


The τ-coefficients in Frank’s inclusive fitness formalism can also be regarded as measures
of ‘relatedness’ in some sense. But they differ from the ρ-coefficients of the neighbour-
modulated fitness analysis in two important respects: they are purely genetic, and they
concern cross-generational correlations between the ancestor- and descendant-populations.
Specifically, each of the τ-coefficients measures the association between the genotypes of
the descendants of the ith class and those of the actors who controlled the jth influence on
the fitness of their direct lineal ancestors. As Frank (1997a,b; 1998) notes, these coefficients
are naturally interpreted as measures of the ‘transmission fidelity’ of the actor’s genes
through each of the fitness components for which its behaviour is causally responsible. Of
course, there is usually no literal process by means of which the actor’s (token) genes are
replicated and transmitted to the recipient’s descendants. But, as our informal ‘indirect
reproduction’ story notes, something broadly analogous to this does happen when genetic
relatives interact: a related recipient affords the actor something broadly analogous to an
indirect channel of transmission. If the ‘fidelity’ of this ‘transmission’ is sufficiently high,
then altruism may be favoured by selection, for the genetic representation an altruistic
gene earns via this ‘indirect pathway’ may outweigh what it sacrifices through the direct
pathway.

This brings out the close relationship between inclusive fitness and the ‘indirect
reproduction’ explanation for the evolution of altruism. For, in analysing patterns of τ-correlation
between the genotypes of actors and the genotypes of their recipients’
descendants, the inclusive fitness approach proceeds just as if the recipient of a social
effect provided the actor with an indirect channel of genetic transmission. The framework
thus formally captures the sense in which the ‘indirect reproduction’ metaphor is justified,
by showing in precise terms how the spread of a social gene depends not only on the sign
and magnitude of its fitness effects, but also on its ‘transmission fidelity’ through the
fitness components for which it is causally responsible.

5.3 When the frameworks are formally equivalent

5.3.1 Conditions for formal equivalence

With Frank’s formalism in hand, we can now address the question of when the neighbour-
modulated and inclusive fitness approaches are equivalent. For current purposes, I will
assume that the neighbour-modulated and inclusive fitness frameworks are ‘equivalent’ if
and only if they cannot disagree with regard to the direction of (the primary and
secondary effect of) selection on the character under study. It follows that the frameworks
are ‘equivalent’ under some conditions if and only if the neighbour-modulated fitness
increment is guaranteed to have the same sign as the inclusive fitness increment under
those conditions. A more stringent conception of ‘equivalence’ would require that they
also agree on the magnitude of the change; but in practice the direction is often what we
want to know.

One might think it obvious that the two approaches are equivalent in this sense. After all,
both start with the same version of the Price equation, and both proceed to decompose
that equation through regression analysis. Moreover, both derivations rely on a broadly
similar assumption, namely an assumption of uncorrelated residuals: we assume that the
residuals in the relevant regression equations are uncorrelated with any other variable in
the analysis. In effect, this amounts to the assumption that there is no unexplained
residual covariance between g′ and w once we take account of the statistical associations
described by the relevant regression equations (this is equivalent to the assumption that
Queller’s ‘separation condition’ is satisfied, though with g replaced by g′ ; cf. Chapter 4).
Note, however, that the relevant regression equations differ significantly between the two
frameworks. The neighbour-modulated fitness approach regresses all social phenotypes
on the genotype of the recipient, and considers only the fidelity of direct transmission
between recipients and their descendants. The inclusive fitness approach, by contrast,
regresses all social phenotypes on the genotype of the controlling actor, and separately
considers the transmission fidelity of the actor’s genotype through each fitness
component. The implication is that, while both derivations require an assumption of
uncorrelated residuals, the content of that assumption differs significantly between the two
cases. As a result, we cannot simply assume that the neighbour-modulated and inclusive
fitness increments will always have the same sign.

Given, then, that we cannot expect the frameworks to be equivalent in all possible cases,
what are the conditions under which we can expect them to be equivalent? Here is the
thought I want to develop. In Section 5.1, we noted that the ‘indirect reproduction’ and
‘positive assortment’ pictures invoke different kinds of correlation to explain the success
of altruism. The ‘indirect reproduction’ picture invokes correlations between actor
genotypes and those of their recipients’ descendants (represented by τ-correlations in
Frank’s formalism), while the ‘positive assortment’ picture invokes correlations between
recipient genotypes and actor phenotypes (represented by ρ-correlations in Frank’s
formalism). Both kinds of correlation can be glossed as measures of ‘relatedness’, but it is
interesting to see the subtle difference between the two pictures with respect to the kind of
relatedness they take to matter for the evolution of altruism. We also noted, however, that
in practice the two kinds of correlation might often turn out to be generated by the same
