Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent

William H. Press (a,1) and Freeman J. Dyson (b)

(a) Department of Computer Science and School of Biological Sciences, University of Texas at Austin, Austin, TX 78712; and (b) School of Natural Sciences, Institute for Advanced Study, Princeton, NJ 08540
(1) To whom correspondence should be addressed. E-mail: wpress@cs.utexas.edu
Contributed by William H. Press, April 19, 2012 (sent for review March 14, 2012)
The two-player Iterated Prisoner’s Dilemma game is a model for both sentient and evolutionary behaviors, especially including the emergence of cooperation. It is generally assumed that there exists no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards. Here, we show that such strategies unexpectedly do exist. In particular, a player X who is witting of these strategies can (i) deterministically set her opponent Y’s score, independently of his strategy or response, or (ii) enforce an extortionate linear relation between her and his scores. Against such a player, an evolutionary player’s best response is to accede to the extortion. Only a player with a theory of mind about his opponent can do better, in which case Iterated Prisoner’s Dilemma is an Ultimatum Game.
evolution of cooperation | game theory | tit for tat
Iterated 2 × 2 games, with Iterated Prisoner’s Dilemma (IPD) as the notable example, have long been touchstone models for elucidating both sentient human behaviors, such as cartel pricing, and Darwinian phenomena, such as the evolution of cooperation (1–6). Well-known popular treatments (7–9) have further established IPD as foundational lore in fields as diverse as political science and evolutionary biology. It would be surprising if any significant mathematical feature of IPD has remained undescribed, but that appears to be the case, as we show in this paper.
Fig. 1A shows the setup for a single play of Prisoner’s Dilemma (PD). If X and Y cooperate (c), then each earns a reward R. If one defects (d), the defector gets an even larger payment T, and the naive cooperator gets S, usually zero. However, if both defect, then both get a meager payment P. To be interesting, the game must satisfy two inequalities: T > R > P > S guarantees that the Nash equilibrium of the game is mutual defection, whereas 2R > T + S makes mutual cooperation the globally best outcome. The “conventional values” (T, R, P, S) = (5, 3, 1, 0) occur most often in the literature. We derive most results in the general case, and indicate when there is a specialization to the conventional values.
Fig. 1B shows the iterated game (IPD), consisting of multiple, successive plays by the same opponents. Each player may now condition his or her play on the opponent’s strategy insofar as it can be deduced from the previous play. However, we give each player
only a finite memory of previous play (10). One might have thought
that a player with longer memory always has the advantage over
a more forgetful player. In the game of bridge, for example,
a player who remembers all of the cards played has the advantage
over a player who remembers only the last trick; however, that is
not the case when the same game (same allowed moves and same
payoff matrices) is indefinitely repeated. In fact, it is easy to prove
(Appendix A) that, for any strategy of the longer-memory player Y,
shorter-memory X’s score is exactly the same as if Y had played
a certain shorter-memory strategy (roughly, the marginalization of
Y’s long-memory strategy: its average over states remembered by
Y but not by X), disregarding any history in excess of that shared
with X. This fact is important. We derive strategies for X assuming
that both players have memory of only a single previous move, and
the above theorem shows that this involves no loss of generality.
Longer memory will not give Y any advantage.
Fig. 1C, then, shows the most general memory-one game. The four outcomes of the previous move are labeled 1, ..., 4 for the respective outcomes xy ∈ (cc, cd, dc, dd), where c and d denote cooperation and defection. X’s strategy is p = (p_1, p_2, p_3, p_4), her probabilities for cooperating under each of the previous outcomes. Y’s strategy is analogously q = (q_1, q_2, q_3, q_4) for outcomes seen from his perspective, that is, in the order yx ∈ (cc, cd, dc, dd). The outcome of this play is determined by a product of probabilities, as shown in Fig. 1.
Methods and Results
Zero-Determinant Strategies. As is well understood (10), it is not
necessary to simulate the play of strategies p against q move by
move. Rather, p and q imply a Markov matrix whose stationary
vector v, combined with the respective payoff matrices, yields an
expected outcome for each player. (We discuss the possibility of
nonstationary play later in the paper.) With rows and columns of
the matrix in X’s order, the Markov transition matrix M(p, q) from one move to the next is shown in Fig. 2A.
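For readers who want to experiment, here is a minimal numerical sketch (Python with NumPy; not the authors’ code). The explicit matrix of Fig. 2A is not reproduced in this text, so the entries below are rebuilt from the stated play rules: each player cooperates independently with the probability assigned to the previous outcome, with Y’s strategy re-indexed from his ordering into X’s, i.e., (q_1, q_3, q_2, q_4). The stationary vector is found as the left eigenvector with unit eigenvalue (Eq. 1), and the payoff orderings follow the definitions of S_X and S_Y given with Eq. 5 below.

```python
import numpy as np

def markov_matrix(p, q):
    """Transition matrix over states (cc, cd, dc, dd) in X's ordering (cf. Fig. 2A)."""
    p = np.asarray(p, dtype=float)
    q_x = np.asarray(q, dtype=float)[[0, 2, 1, 3]]    # Y's cooperation probs re-indexed to X's order
    M = np.empty((4, 4))
    for i in range(4):
        a, b = p[i], q_x[i]                           # prob X cooperates, prob Y cooperates
        M[i] = [a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
    return M

def stationary_scores(p, q, T=5.0, R=3.0, P=1.0, S=0.0):
    """Expected per-move scores (s_X, s_Y) in the stationary state."""
    M = markov_matrix(p, q)
    w, V = np.linalg.eig(M.T)                         # left eigenvectors of M
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])     # eigenvector for the unit eigenvalue (Eq. 1)
    v = v / v.sum()
    S_X = np.array([R, S, T, P])                      # X's payoffs over (cc, cd, dc, dd)
    S_Y = np.array([R, T, S, P])                      # Y's payoffs over the same states
    return v @ S_X, v @ S_Y

# Example: two arbitrary memory-one strategies.
print(stationary_scores([0.9, 0.1, 0.9, 0.1], [0.8, 0.2, 0.7, 0.3]))
```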
Because M has a unit eigenvalue, the matrix M′ ≡ M − I is singular and thus has zero determinant. The stationary vector v of the Markov matrix, or any vector proportional to it, satisfies

$v^{T} M = v^{T}$, or $v^{T} M' = 0$.  [1]
Cramer’s rule, applied to the matrix M′, is

$\mathrm{Adj}(M')\, M' = \det(M')\, I = 0,$  [2]

where Adj(M′) is the adjugate matrix (also known as the classical adjoint or, as in high-school algebra, the “matrix of minors”). Eq. 2 implies that every row of Adj(M′) is proportional to v. Choosing the fourth row, we see that the components of v are (up to a sign) the determinants of the 3 × 3 matrices formed from the first three columns of M′, leaving out each one of the four rows in turn. These determinants are unchanged if we add the first column of M′ into the second and third columns.
The result of these manipulations is a formula for the dot product of an arbitrary four-vector f with the stationary vector v of the Markov matrix, v · f ≡ D(p, q, f), where D is the 4 × 4 determinant shown explicitly in Fig. 2B. This result follows from expanding the determinant by minors on its fourth column and noting that the 3 × 3 determinants multiplying each f_i are just the ones described above. What is noteworthy about this formula for v · f is that it is a determinant whose second column,

$\tilde{p} \equiv (-1 + p_1,\ -1 + p_2,\ p_3,\ p_4),$  [3]

is solely under the control of X; whose third column,
$\tilde{q} \equiv (-1 + q_1,\ q_3,\ -1 + q_2,\ q_4),$  [4]

is solely under the control of Y; and whose fourth column is simply f.
X’s payoff matrix is S_X = (R, S, T, P), whereas Y’s is S_Y = (R, T, S, P). In the stationary state, their respective scores are then

$s_X = \frac{v \cdot S_X}{v \cdot \mathbf{1}} = \frac{D(p, q, S_X)}{D(p, q, \mathbf{1})}, \qquad s_Y = \frac{v \cdot S_Y}{v \cdot \mathbf{1}} = \frac{D(p, q, S_Y)}{D(p, q, \mathbf{1})},$  [5]

where 1 is the vector with all components 1. The denominators are needed because v has not previously been normalized to have its components sum to 1 (as required for a stationary probability vector).
Because the scores s in Eq. 5 depend linearly on their corresponding payoff matrices S, the same is true for any linear combination of scores, giving

$\alpha s_X + \beta s_Y + \gamma = \frac{D(p, q, \alpha S_X + \beta S_Y + \gamma \mathbf{1})}{D(p, q, \mathbf{1})}.$  [6]
It is Eq. 6 that now allows much mischief, because both X and Y have the possibility of choosing unilateral strategies that will make the determinant in the numerator vanish. That is, if X chooses a strategy that satisfies $\tilde{p} = \alpha S_X + \beta S_Y + \gamma \mathbf{1}$, or if Y chooses a strategy with $\tilde{q} = \alpha S_X + \beta S_Y + \gamma \mathbf{1}$, then the determinant vanishes and a linear relation between the two scores,

$\alpha s_X + \beta s_Y + \gamma = 0,$  [7]

will be enforced. We call these zero-determinant (ZD) strategies. We are not aware of any previous recognition of these strategies in the literature; they exist algebraically not only in IPD but in all iterated 2 × 2 games. However, not all ZD strategies are feasible, with probabilities p all in the range [0, 1]. Whether they are feasible in any particular instance depends on the particulars of the application, as we now see.
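As a quick numerical check (a sketch, not from the paper; it reuses markov_matrix and stationary_scores from the earlier sketch), one can pick α, β, γ so that the implied p happens to be feasible for the conventional payoffs, and confirm that Eq. 7 holds no matter what q Y plays:

```python
import numpy as np

T, R, P, S = 5.0, 3.0, 1.0, 0.0
S_X = np.array([R, S, T, P])
S_Y = np.array([R, T, S, P])

alpha, beta, gamma = 1/26, -3/26, 1/13               # one choice that yields a feasible p
p_tilde = alpha * S_X + beta * S_Y + gamma * np.ones(4)
p = p_tilde + np.array([1.0, 1.0, 0.0, 0.0])         # invert Eq. 3
assert np.all((0 <= p) & (p <= 1)), "this ZD strategy must be feasible"

rng = np.random.default_rng(0)
for _ in range(5):
    q = rng.uniform(0.05, 0.95, size=4)              # an arbitrary strategy for Y
    sX, sY = stationary_scores(p, q, T, R, P, S)
    print(f"alpha*s_X + beta*s_Y + gamma = {alpha * sX + beta * sY + gamma:+.2e}")
```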
X Unilaterally Sets Y’s Score. One specialization of ZD strategies allows X to unilaterally set Y’s score. From the above, X need only play a fixed strategy satisfying $\tilde{p} = \beta S_Y + \gamma \mathbf{1}$ (i.e., set α = 0 in Eq. 7), four equations that we can solve for p_2 and p_3 in terms of p_1 and p_4, that is, eliminating the nuisance parameters β and γ. The result, for general R, S, T, P (not necessarily a PD game), is

$p_2 = \frac{p_1 (T - P) - (1 + p_4)(T - R)}{R - P}, \qquad p_3 = \frac{(1 - p_1)(P - S) + p_4 (R - S)}{R - P}.$  [8]
Fig. 1. (A) Single play of PD. Players X (blue) and Y (red) each choose to cooperate (c) or defect (d) with respective payoffs R, T, S, or P as shown (along with
the most common numerical values). (B) IPD, where the same two players play arbitrarily many times; each has a strategy based on a finite memory of the
previous plays. (C) Case of two memory-one players. Each player’s strategy is a vector of four probabilities (of cooperation), conditioned on the four outcomes
of the previous move.
Fig. 2. (A) Markov matrix for the memory-one game shown in Fig. 1C. (B)
The dot product of any vector f with the Markov matrix stationary vector v
can be calculated as a determinant in which, notably, a column depends only
on one player’s strategy.
With this substitution, Y’s score (Eq. 5) becomes

$s_Y = \frac{(1 - p_1) P + p_4 R}{(1 - p_1) + p_4}.$  [9]
All PD games satisfy T > R > P > S. By inspection, Eq. 8 then has feasible solutions whenever p_1 is close to (but ≤) 1 and p_4 is close to (but ≥) 0. In that case, p_2 is close to (but ≤) 1 and p_3 is close to (but ≥) zero. Now also by inspection of Eq. 9, a weighted average of P and R with weights (1 − p_1) and p_4, we see that all scores P ≤ s_Y ≤ R (and no others) can be forced by X. That is, X can set Y’s score to any value in the range from the mutual noncooperation score to the mutual cooperation score.
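A brief numerical illustration of this claim (the values of p_1 and p_4 are arbitrary choices of mine; stationary_scores is the helper from the first sketch):

```python
import numpy as np

T, R, P, S = 5.0, 3.0, 1.0, 0.0
p1, p4 = 0.9, 0.05                                   # chosen near 1 and near 0, as the text suggests
p2 = (p1 * (T - P) - (1 + p4) * (T - R)) / (R - P)   # Eq. 8
p3 = ((1 - p1) * (P - S) + p4 * (R - S)) / (R - P)   # Eq. 8
p = [p1, p2, p3, p4]

target = ((1 - p1) * P + p4 * R) / ((1 - p1) + p4)   # Eq. 9: the score X dictates to Y
rng = np.random.default_rng(1)
for _ in range(3):
    q = rng.uniform(0.0, 1.0, size=4)                # Y's strategy is irrelevant to his own score
    _, sY = stationary_scores(p, q, T, R, P, S)
    print(f"s_Y = {sY:.6f}   (Eq. 9 predicts {target:.6f})")
```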
What is surprising is not that Y can, with X’s connivance,
achieve scores in this range, but that X can force any particular
score by a fixed strategy p, independent of Y’s strategy q. In
other words, there is no need for X to react to Y, except on
a timescale of her own choosing. A consequence is that X can
simulate or “spoof” any desired fitness landscape for Y that she
wants, thereby guiding his evolutionary path. For example, X
might condition Y’s score on some arbitrary property of his last
1,000 moves, and thus present him with a simulated fitness
landscape that rewards that arbitrary property. (We discuss the
issue of timescales further, below.)
X Tries to Set Her Own Score. What if X tries to set her own score? The analogous calculation with $\tilde{p} = \alpha S_X + \gamma \mathbf{1}$ yields

$p_2 = \frac{(1 + p_4)(R - S) - p_1 (P - S)}{R - P} \ge 1, \qquad p_3 = \frac{-(1 - p_1)(T - P) - p_4 (T - R)}{R - P} \le 0.$  [10]

This strategy has only one feasible point, the singular strategy p = (1, 1, 0, 0), “always cooperate or never cooperate.” Thus, X cannot unilaterally set her own score in IPD.
X Demands and Gets an Extortionate Share. Next, what if X attempts to enforce an extortionate share of payoffs larger than the mutual noncooperation value P? She can do this by choosing

$\tilde{p} = \phi\,[(S_X - P\mathbf{1}) - \chi (S_Y - P\mathbf{1})],$  [11]

where χ ≥ 1 is the extortion factor. Solving these four equations for the p’s gives

$p_1 = 1 - \phi(\chi - 1)\frac{R - P}{P - S}, \quad p_2 = 1 - \phi\left(1 + \chi\,\frac{T - P}{P - S}\right), \quad p_3 = \phi\left(\chi + \frac{T - P}{P - S}\right), \quad p_4 = 0.$  [12]

Evidently, feasible strategies exist for any χ and sufficiently small φ. It is easy to check that the allowed range of φ is

$0 < \phi \le \frac{P - S}{(P - S) + \chi(T - P)}.$  [13]
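A small helper capturing Eqs. 12 and 13 (a sketch; the function name and defaults are mine, and defaulting φ to the midpoint of its allowed range is simply a convenient choice):

```python
def extortionate_strategy(chi, phi=None, T=5.0, R=3.0, P=1.0, S=0.0):
    """X's cooperation probabilities for extortion factor chi (Eqs. 12 and 13)."""
    phi_max = (P - S) / ((P - S) + chi * (T - P))   # upper limit of Eq. 13
    if phi is None:
        phi = 0.5 * phi_max                         # assumption: default to the midpoint of the range
    assert 0.0 < phi <= phi_max, "phi is outside the feasible range of Eq. 13"
    p1 = 1.0 - phi * (chi - 1.0) * (R - P) / (P - S)
    p2 = 1.0 - phi * (1.0 + chi * (T - P) / (P - S))
    p3 = phi * (chi + (T - P) / (P - S))
    p4 = 0.0
    return [p1, p2, p3, p4]
```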
Under the extortionate strategy, X’s score depends on Y’s strategy q, and both are maximized when Y fully cooperates, with q = (1, 1, 1, 1). If Y decides (or evolves) to maximize his score by cooperating fully, then X’s score under this strategy is

$s_X = \frac{P(T - R) + \chi[R(T - S) - P(T - R)]}{(T - R) + \chi(R - S)}.$  [14]

The coefficients in the numerator and denominator are all positive as a consequence of T > R > P > S. The case φ = 0 is formally allowed, but produces only the singular strategy (1, 1, 0, 0) mentioned above.
The above discussion can be made more concrete by specializing to the conventional IPD values (5, 3, 1, 0); then, Eq. 12 becomes

$p = [\,1 - 2\phi(\chi - 1),\ 1 - \phi(4\chi + 1),\ \phi(\chi + 4),\ 0\,],$  [15]

a solution that is both feasible and extortionate for $0 < \phi \le (4\chi + 1)^{-1}$. X’s and Y’s best respective scores are

$s_X = \frac{2 + 13\chi}{2 + 3\chi}, \qquad s_Y = \frac{12 + 3\chi}{2 + 3\chi}.$  [16]

With χ > 1, X’s score is always greater than the mutual cooperation value of 3, and Y’s is always less. X’s limiting score as χ → ∞ is 13/3. However, in that limit, Y’s score is always 1, so there is no incentive for him to cooperate. X’s greed is thus limited by the necessity of providing some incentive to Y. The value of φ is irrelevant, except that singular cases (where strategies result in infinitely long “duels”) are more likely at its extreme values. By way of concreteness, the strategy for X that enforces an extortion factor 3 and sets φ at its midpoint value is p = (11/13, 1/2, 7/26, 0), with best scores about s_X = 3.73 and s_Y = 1.91.
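Reusing the helpers above, the quoted χ = 3 numbers can be checked directly (a sketch; note that the fractions are merely recovered from floating-point values):

```python
from fractions import Fraction

p = extortionate_strategy(chi=3.0)                       # default phi is the midpoint, 1/26
print([Fraction(x).limit_denominator(1000) for x in p])  # -> [11/13, 1/2, 7/26, 0]

sX, sY = stationary_scores(p, [1.0, 1.0, 1.0, 1.0])      # against a fully cooperating Y
print(f"s_X = {sX:.2f}, s_Y = {sY:.2f}")                 # -> about 3.73 and 1.91
```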
In the special case χ = 1, implying fairness, and φ = 1/5 (one of its limit values), Eq. 15 reduces to the strategy (1, 0, 1, 0), which is the well-known tit-for-tat (TFT) strategy (7). Knowing only TFT among ZD strategies, one might have thought that strategies by which X links her score deterministically to Y’s must always be symmetric, hence fair, with X and Y rewarded equally. The existence of the general ZD strategy shows this not to be the case.
Extortionate Strategy Against an Evolutionary Player. We can say, loosely, that Y is an evolutionary player if he adjusts his strategy q according to some optimization scheme designed to maximize his score s_Y, but does not otherwise explicitly consider X’s score or her strategy. In the alternative case that Y imputes to X an independent strategy, and the ability to alter it in response to his actions, we can say that Y has a theory of mind about X (11–13).

Against X’s fixed extortionate ZD strategy, a particularly simple evolutionary strategy for Y, close to if not exactly Darwinian, is for him to make successive small adjustments in q and thus climb the gradient in s_Y. [We note that true Darwinian evolution of a trait with multiple loci is, in a population, not strictly “evolutionary” in our loose sense (14).]
Because Y may start out with a fully noncooperative strategy q_0 = (0, 0, 0, 0), it is in X’s interest that her extortionate strategy yield a positive gradient for Y’s cooperation at this value of q. That gradient is readily calculated as

$\left.\frac{\partial s_Y}{\partial q}\right|_{q = q_0} = \left(0,\ 0,\ 0,\ \frac{(T - S)(S + T - 2P)}{(P - S) + \chi(T - P)}\right).$  [17]

The fourth component is positive for the conventional values (T, R, P, S) = (5, 3, 1, 0), but we see that it can become negative as P approaches R, because we have 2R > S + T. With the conventional values, however, evolution away from the origin yields positive gradients for the other three components.
We have not proved analytically that there exist in all cases evolutionary paths for Y that lead to the maximum possible scores (Eq. 16) and that have positive directional derivatives everywhere along them. However, this assertion seems likely from numerical evidence, at least for the conventional values. Fig. 3 shows a typical numerical experiment in which X plays an extortionate strategy (here, χ = 5, with maximum scores s_X = 3.94 and s_Y = 1.59), and Y takes small steps that locally increase his score. Y has no unique gradient direction because the mapping from the score gradient (a covariant vector) to the step direction (a contravariant vector) involves an arbitrary metric, signifying how easily Y can evolve in each direction. Fig. 3 shows 10 arbitrary choices for this metric. In no case does Y’s evolution get hung up at a local maximum. That is, all of the evolutions shown (and all of a much larger number tried) reach the value of Eq. 14.
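A rough re-creation of this experiment is sketched below (the step size, starting point, finite-difference gradient, and random diagonal metric are my assumptions, not the authors’ settings; it reuses extortionate_strategy and stationary_scores from the earlier sketches):

```python
import numpy as np

chi = 5.0
p_x = extortionate_strategy(chi)                  # X's fixed ZD strategy
rng = np.random.default_rng(2)

def s_y(q):
    return stationary_scores(p_x, q)[1]           # Y's expected per-move score

for trial in range(3):                            # Fig. 3 shows 10 such runs
    q = np.full(4, 0.01)                          # start nearly fully noncooperative
    metric = rng.uniform(0.2, 1.0, size=4)        # arbitrary per-component "ease of evolution"
    for step in range(2000):
        grad = np.zeros(4)
        for k in range(4):                        # crude central-difference gradient of s_Y
            dq = np.zeros(4)
            dq[k] = 1e-4
            grad[k] = (s_y(np.clip(q + dq, 0, 1)) - s_y(np.clip(q - dq, 0, 1))) / 2e-4
        q = np.clip(q + 0.01 * metric * grad, 1e-6, 1.0)
    sX, sY = stationary_scores(p_x, q)
    print(f"trial {trial}: s_X = {sX:.2f}, s_Y = {sY:.2f}")   # should climb toward about 3.94 and 1.59
```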
Discussion
We have several times alluded to issues of timescale. The ZD strategies are derived mathematically under the assumption that the players’ expected scores are generated by a Markov stationary state defined by their respective strategies p and q. However, we have also suggested situations in which X may vary her ZD strategy so as to spoof Y with a fictitious fitness landscape. The question also arises whether Y can somehow vary his strategy on timescales faster than that for Markov equilibrium to be established. Perhaps by playing “inside the equilibration timescale” he can evade the linear constraint on scores (Eq. 7) imposed by X.

Interestingly, it is easy to prove that this latter situation cannot occur (Appendix B). If X plays a constant ZD strategy, then any strategy of Y’s, rapidly varying or not, turns out to be equivalent (from X’s perspective) to a fixed strategy against which X’s imposition of a constraint is effective.

In the former situation, where it is X whose strategies are changing (e.g., among ZD strategies that set Y’s score), things are not as crisp. Because X must be basing her decisions on Y’s behavior, which only becomes evident with averaging over time, the possibility of a race condition between X’s and Y’s responses is present with or without Markov equilibration. This reason is sufficient for X to vary her strategy only slowly. If X chooses components of p in Eqs. 8 and 9 that are bounded away from the extreme values 0 and 1, then the Markov equilibration time will not be long and thus not a consideration. In short, a deliberate X still has the upper hand.
The extortionate ZD strategies have the peculiar property of sharply distinguishing between “sentient” players, who have a theory of mind about their opponents, and “evolutionary” players, who may be arbitrarily good at exploring a fitness landscape (either locally or globally), but who have no theory of mind. The distinction does not depend on the details of any particular theory of mind, but only on Y’s ability to impute to X an ability to alter her strategy.

If X alone is witting of ZD strategies, then IPD reduces to one of two cases, depending on whether Y has a theory of mind. If Y has a theory of mind, then IPD is simply an ultimatum game (15, 16), where X proposes an unfair division and Y can either accept or reject the proposal. If he does not (or if, equivalently, X has fixed her strategy and then gone to lunch), then the game is dilemma-free for Y. He can maximize his own score only by giving X even more; there is no benefit to him in defecting.

If X and Y are both witting of ZD, then they may choose to negotiate to each set the other’s score to the maximum cooperative value. Unlike naive PD, there is no advantage in defection, because neither can affect his or her own score and each can punish any irrational defection by the other. Nor is this equivalent to the classical TFT strategy (7), which produces indeterminate scores if played by both players.

To summarize, player X, witting of ZD strategies, sees IPD as a very different game from how it is conventionally viewed. She chooses an extortion factor χ, say 3, and commences play. Now, if she thinks that Y has no theory of mind about her (13) (e.g., he is an evolutionary player), then she should go to lunch, leaving her fixed strategy mindlessly in place. Y’s evolution will bestow a disproportionate reward on her. However, if she imputes to Y a theory of mind about herself, then she should remain engaged and watch for evidence of Y’s refusing the ultimatum (e.g., lack of evolution favorable to both). If she finds such evidence, then her options are those of the ultimatum game (16). For example, she may reduce the value of χ, perhaps to its “fair” value of 1.

Now consider Y’s perspective, if he has a theory of mind about X. His only alternative to accepting positive, but meager, rewards is to refuse them, hurting both himself and X. He does this in the hope that X will eventually reduce her extortion factor. However, if she has gone to lunch, then his resistance is futile.

It is worth contemplating that, though an evolutionary player Y is so easily beaten within the confines of the IPD game, it is exactly evolution, on the hugely larger canvas of DNA-based life, that ultimately has produced X, the player with the mind.
Appendix A: Shortest-Memory Player Sets the Rules of the Game. In iterated play of a fixed game, one might have thought that a player Y with longer memory of past outcomes has the advantage over a more forgetful player X. For example, one might have thought that player Y could devise an intricate strategy that uses X’s last 1,000 plays as input data in a decision algorithm, and that can then beat X’s strategy, conditioned on only the last one iteration. However, that is not the case when the same game (same allowed moves and same payoff matrices) is indefinitely repeated. In fact, for any strategy of the longer-memory player Y, X’s score is exactly the same as if Y had played a certain shorter-memory strategy (roughly, the marginalization of Y’s long-memory strategy), disregarding any history in excess of that shared with X.
Let X and Y be random variables with values x and y that are the players’ respective moves on a given iteration. Because their scores depend only on (x, y) separately at each time, a sufficient statistic is the expectation of the joint probability of (X, Y) over past histories H (of course in their proportion seen). Let H = [H_0, H_1], where H_0 is the recent history shared by both X and Y, and H_1 is the older history seen only by Y. Then a straightforward calculation is,
[Fig. 3 plot: score (vertical axis, 0–5) versus step number (horizontal axis, 0–2,000); see caption.]
Fig. 3. Evolution of X’s score (blue) and Y’s score (red) in 10 instances. X plays a fixed extortionate strategy with extortion factor χ = 5. Y evolves by making small steps in a gradient direction that increases his score. The 10 instances show different choices for the weights that Y assigns to different components of the gradient, i.e., how easily he can evolve along each. In all cases, X achieves her maximum possible (extortionate) score.
$\langle P(x, y \mid H_0, H_1) \rangle_{H_0, H_1} = \sum_{H_0, H_1} P(x, y \mid H_0, H_1)\, P(H_0, H_1)$
$\;\; = \sum_{H_0, H_1} P(x \mid H_0)\, P(y \mid H_0, H_1)\, P(H_0, H_1)$
$\;\; = \sum_{H_0} P(x \mid H_0) \Big[ \sum_{H_1} P(y \mid H_0, H_1)\, P(H_1 \mid H_0)\, P(H_0) \Big]$
$\;\; = \sum_{H_0} P(x \mid H_0) \Big[ \sum_{H_1} P(y, H_1 \mid H_0) \Big] P(H_0)$
$\;\; = \sum_{H_0} P(x \mid H_0)\, P(y \mid H_0)\, P(H_0)$
$\;\; = \langle P(x, y \mid H_0) \rangle_{H_0}.$  [18]
Here, the first line makes explicit the expectation, and the second
line expresses conditional independence.
Thus, the result is a game conditioned only on H_0, where Y plays the marginalized strategy

$P(y \mid H_0) \equiv \sum_{H_1} P(y, H_1 \mid H_0).$  [19]
Because this strategy depends on H_0 only, it is a short-memory strategy that produces exactly the same game results as Y’s original long-memory strategy.
Note that if Y actually wants to compute the short-memory strategy equivalent to his long-memory strategy, he has to play or simulate the game long enough to compute the above expectations over the histories that would have occurred for his long-memory strategy. Then, knowing these expectations, he can, if he wants, switch to the equivalent short-memory strategy.
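The claim can also be checked by brute force. The Monte Carlo sketch below is my own construction (the particular memory-two strategy for Y is arbitrary): it plays a memory-one X against a memory-two Y, estimates the marginalized strategy of Eq. 19 from the same play, and confirms that X’s long-run score is unchanged when Y switches to it.

```python
import numpy as np

rng = np.random.default_rng(3)
T, R, P, S = 5.0, 3.0, 1.0, 0.0
payoff_x = np.array([R, S, T, P])                 # X's payoff for outcomes (cc, cd, dc, dd), X's order
to_y = np.array([0, 2, 1, 3])                     # re-index an outcome into Y's perspective

p = np.array([0.8, 0.3, 0.6, 0.2])                # X: memory-one strategy (arbitrary choice)
q2 = rng.uniform(0.0, 1.0, size=(4, 4))           # Y: memory-two strategy, indexed by (older, recent)

def play(y_coop_prob, n=200_000):
    """Iterate the game; y_coop_prob(older, recent) returns Y's cooperation probability."""
    older, recent = 0, 0                          # arbitrary initial history
    total = 0.0
    visits = np.zeros(4)
    coop_sum = np.zeros(4)
    for _ in range(n):
        qy = y_coop_prob(older, recent)
        x_c = rng.random() < p[recent]
        y_c = rng.random() < qy
        outcome = (0 if y_c else 1) if x_c else (2 if y_c else 3)
        total += payoff_x[outcome]
        visits[recent] += 1                       # tallies needed for the marginalization of Eq. 19
        coop_sum[recent] += qy
        older, recent = recent, outcome
    return total / n, coop_sum / visits

sx_long, q_marginal = play(lambda o, r: q2[to_y[o], to_y[r]])
sx_short, _ = play(lambda o, r: q_marginal[r])    # Y now plays the marginalized memory-one strategy
print(f"X vs memory-two Y           : {sx_long:.3f}")
print(f"X vs marginalized memory-one: {sx_short:.3f}   (agreement up to Monte Carlo noise)")
```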
To understand this result intuitively, we can view the game from the forgetful player X’s perspective: If X thinks that Y’s memory is the same as her own, she imputes to Y a vector of probabilities the same length as her own. Because the score for the play at time t depends only on expectations over the players’ conditionally independent moves at that time, Y’s use of a longer history, from X’s perspective, is merely a peculiar kind of random number generator whose use does not affect either player. So Y’s switching between a long- and short-memory strategy is completely undetectable (and irrelevant) to X.

The importance of this result is that the player with the shortest memory in effect sets the rules of the game. A player with a good memory-one strategy can force the game to be played, effectively, as memory-one. She cannot be undone by another player’s longer-memory strategy.
Appendix B: ZD Strategies Succeed Without Markov Equilibrium. We
here prove that Y cannot evade X’s ZD strategy by changing his
own strategy on a short timescale—even arbitrarily on every
move of the game. The point is that Y cannot usefully “keep the
game out of Markov equilibrium” or play “inside the Markov
equilibration time scale.”
For arbitrary p and q, the Markov matrix is as shown in Fig. 2A. We suppose that X is playing a ZD strategy with some fixed p_z and write M(q) ≡ M(p_z, q). The key point is that each row of M(q) is linear in exactly one component of q. Thus, the average of any number of different M(q_i)’s satisfies

$\langle M(q_i) \rangle_i = M\big( \langle q_i \rangle_i \big).$  [20]
Now consider the result of N consecutive plays, i = 1, 2, ..., N, where N is a large number. The game goes through N states α_i, with α ∈ {cc, cd, dc, dd}. Comparing times i and i + 1, the game goes from state α_i to state α_{i+1} by a draw from the four probabilities $M_{\alpha_i \alpha_{i+1}}(q_{i\alpha_i})$, α_{i+1} ∈ {cc, cd, dc, dd}, where $q_{i\alpha_i}$ is the α_i-th component of q_i (at time i). So the expected number of times that the game is in state β is

$\langle N_\beta \rangle = \sum_{i=1}^{N} M_{\alpha_i \beta}(q_{i\alpha_i}) = \sum_{\alpha} \sum_{i \mid \alpha} M_{\alpha\beta}(q_{i\alpha}) = \sum_{\alpha} N_\alpha \big\langle M_{\alpha\beta}(q_{i\alpha}) \big\rangle_{i \mid \alpha} = \sum_{\alpha} N_\alpha M_{\alpha\beta}\big( \langle q_{i\alpha} \rangle_{i \mid \alpha} \big).$  [21]
Here the notation i | α is to be read as “for values of i such that α_i = α.”

Now taking the (ensemble) expectation of the right-hand side and defining probabilities

$P_\alpha = \frac{1}{N} \langle N_\alpha \rangle,$  [22]

Eq. 21 becomes

$P_\beta = \sum_{\alpha} P_\alpha M_{\alpha\beta}\big( \langle q_{i\alpha} \rangle_{i \mid \alpha} \big).$  [23]
This result shows that a distribution of states identical to those actually observed would be the stationary distribution of Y’s playing the fixed strategy $q_\alpha = \langle q_{i\alpha} \rangle_{i \mid \alpha}$. Because X’s ZD strategy is independent of any fixed strategy of Y’s, we have shown that, for large N, X’s strategy is not spoiled by Y’s move-to-move strategy changes.
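A simulation sketch of this conclusion (my construction; it reuses extortionate_strategy from an earlier sketch): X plays the fixed χ = 3 extortionate strategy while Y draws a brand-new random strategy on every single move, and the long-run scores still satisfy the enforced relation s_X − P = χ(s_Y − P) of Eq. 7, up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(4)
T, R, P, S = 5.0, 3.0, 1.0, 0.0
payoff_x = np.array([R, S, T, P])
payoff_y = np.array([R, T, S, P])
to_y = np.array([0, 2, 1, 3])

chi = 3.0
p = np.array(extortionate_strategy(chi))          # X's fixed ZD strategy

state, tot_x, tot_y, n = 0, 0.0, 0.0, 500_000
for _ in range(n):
    q = rng.uniform(0.0, 1.0, size=4)             # Y adopts a brand-new strategy on every move
    x_c = rng.random() < p[state]
    y_c = rng.random() < q[to_y[state]]
    state = (0 if y_c else 1) if x_c else (2 if y_c else 3)
    tot_x += payoff_x[state]
    tot_y += payoff_y[state]

sX, sY = tot_x / n, tot_y / n
print(f"s_X - P = {sX - P:.3f}   chi*(s_Y - P) = {chi * (sY - P):.3f}")
```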
That the proofs in Appendix A and Appendix B have a similar flavor is not coincidental; both exemplify situations where Y devises a supposedly intricate strategy that an oblivious X automatically marginalizes over.
ACKNOWLEDGMENTS. We thank Michael Brenner, Joshua Plotkin, Drew
Fudenberg, Jeff Hussmann, and Richard Rapp for helpful comments
and discussion.
1. Axelrod R, Hamilton WD (1981) The evolution of cooperation. Science 211:1390–1396.
2. Roberts K (1985) Cartel behavior and adverse selection. J Industr Econ 33:401–413.
3. Axelrod R, Dion D (1988) The further evolution of cooperation. Science 242:1385–1390.
4. Nowak M, Sigmund K (1993) A strategy of win-stay, lose-shift that outperforms tit-
for-tat in the Prisoner’s Dilemma game. Nature 364:56–58.
5. Nowak MA (2006) Five rules for the evolution of cooperation. Science 314:1560–1563.
6. Kendall G, Yao X, Chong SY (2007) The Iterated Prisoners’ Dilemma 20 Years On
(World Scientific, Singapore).
7. Axelrod R (1984) The Evolution of Cooperation (Basic Books, New York).
8. Dawkins R (1988) The Selfish Gene (Oxford Univ Press, New York).
9. Poundstone W (1992) Prisoner’s Dilemma (Doubleday, New York).
10. Hauert Ch, Schuster HG (1997) Effects of increasing the number of players and memory
steps in the Iterated Prisoner’s Dilemma, a numerical approach. Proc Biol Sci 264:513–519.
11. Premack DG, Woodruff G (1978) Does the chimpanzee have a theory of mind? Behav
Brain Sci 1:515–526.
12. Saxe R, Baron-Cohen S, eds (2007) Theory of Mind: A Special Issue of Social Neuro-
science (Psychology Press, London).
13. Lurz RW (2011) Mindreading Animals: The Debate over What Animals Know about
Other Minds (MIT Press, Cambridge, MA).
14. Ewens WJ (1989) An interpretation and proof of the fundamental theorem of natural
selection. Theor Popul Biol 36:167–180.
15. Güth W, Schmittberger R, Schwarze B (1982) An experimental analysis of ultimatum
bargaining. J Econ Behav Organ 3:367–388.
16. Nowak MA, Page KM, Sigmund K (2000) Fairness versus reason in the
ultimatum game. Science 289:1773–1775.
