Professional Documents
Culture Documents
Effective Choice in Prisoner's Dilemma PDF
Effective Choice in Prisoner's Dilemma PDF
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms
Sage Publications, Inc. is collaborating with JSTOR to digitize, preserve and extend access to
The Journal of Conflict Resolution
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Effective Choice in
the Prisoner's Dilemma
ROBERT AXELROD
Institute of Public Policy Studies
University of Michigan
This is a "primer" on how to play the iterated Prisoner's Dilemma game effectively.
Existing research approaches offer the participant limited help in understanding how to
cope effectively with such interactions. To gain a deeper understanding of how to be
effective in such a partially competitive and partially cooperative environment, a computer
tournament was conducted for the iterated Prisoner's Dilemma. Decision rules were
submitted by entrants who were recruited primarily from experts in game theory from
a variety of disciplines: psychology, political science, economics, sociology, and mathema-
tics. The results of the tournament demonstrate that there are subtle reasons for an in-
dividualistic pragmatist to cooperate as long as the other side does, to be somewhat for-
giving, and to be optimistic about the other side's responsiveness.
INTRODUCTION
AUTHOR'S NOTE: I would like to thank those who made this project possible, the
participants whose names and affiliations are given in the Appendix. I would also like to
thank Jeff Pynnonen, my research assistant, for helping to prepare the executive program
for the tournament. And for their helpful suggestions concerning the design and analysis
of the tournament, I would like to thank John Chamberlin, Michael Cohen, and Serge
Taylor. For its support of this research, I owe thanks to the Institute of Public Policy
Studies of the University of Michigan.
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
4 JOURNAL OF CONFLICT RESOLUTION
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axelrod / THE PRISONER'S DILEMMA 5
TABLE 1
Column Player
Cooperate Defect
Row Player
(The payoff to the row player is given first in each pair of numbers.)
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
6 JOURNAL OF CONFLICT RESOLUTION
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axelrod / THE PRISONER'S DILEMMA 7
bilities. It would also have to take into account two important facts
about strategic interaction in a non-zero sum setting. First, what is
effective is likely to depend not only upon the characteristics of a par-
ticular strategy, but also upon the nature of the other strategies with
which it must interact. The second point follows directly from the
first. An effective strategy must be able at any point to take into account
the history of the interaction as it has developed so far.
A computer tournament for the study of effective choice in the
iterated Prisoner's Dilemma meets these needs. In a computer tourna-
ment, each entrant writes a program which embodies a decision rule
to select the cooperative or noncooperative choice on each move. The
program has available to it the history of the game so far, and may use
this history in making a choice. Because the participants are recruited
primarily from those who have written on game theory and especially
the Prisoner's Dilemma, the entrants are assured that their decision
rule will be facing rules of other experts. Such recruitment also guaran-
tees that the state of the art is represented in the tournament.
Just such a tournament was conducted. The results contain some real
surprises. These surprises in turn offer new insights into the questions
of how to understand and how to cope with an environment which
contains aspects of the Prisoner's Dilemma.
THE TOURNAMENT
The Winner
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
8 JOURNAL OF CONFLICT RESOLUTION
The tournament was run as a round robin so that each entry was
paired with each other entry. As announced in the rules of the tourna-
ment, each entry was also paired with its own twin and with RANDOM,
a program that randomly cooperates and defects with equal probability.
Each game consisted of exactly 200 moves. The payoff matrix for
each move awards both players 3 points for mutual cooperation, and
1 point for mutual defection. If one player defects while the other player
cooperates, the defecting player receives 5 points and the cooperating
player receives 0 points (see Table 1).
No entry was disqualified for exceeding the allotted time. In fact,
the entire round robin tournament was run five times to get a more
stable estimate of the scores for each pair of players. In all, there were
120,000 moves, making for 240,000 separate choices.
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axe/rod / THE PRISONER'S DILEMMA 9
The Contestants
The fourteen submitted entries came from three countries and five
disciplines. The Appendix lists the names and affiliations of the sixteen
people who submitted these entries, and it gives the rank and score
of their entries.
One remarkable aspect of the tournament is that it allowed people
from different disciplines to interact with each other in a common
format and language. Most of the entrants were recruited from those
who had published articles on game theory in general or the Prisoner's
Dilemma in particular.
Analysis of the results shows that neither the discipline of the author,
the brevity of the program, nor its length accounts for a rule's relative
success. What does?
Before answering this question, a remark on the interpretation of
numerical scores is in order. A useful benchmark for very good perfor-
mance is 600 points, which is equivalent to the score attained by a
player when both sides always cooperate with each other. A useful
benchmark for very poor performance is 200 points, which is equivalent
to the score attained by a player when both sides never cooperate with
each other. As we shall see, most scores range between 200 and 600
points, although scores from 0 to 1000 points are possible. The winner,
TIT FOR TAT, averaged 504 points per game.
Niceness
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
10 JOURNAL OF CONFLICT RESOLUTION
separated the more successful rules from the less successful rules in this
Computer Prisoner's Dilemma Tournament.
Each of the nice rules got about 600 points with each of the 7 other
nice rules and with its own twin. This is because when 2 nice rules play,
they are sure to cooperate with each other until virtually the end of
the game. Actually the minor variations in end-game tactics did not
account for much variation in the scores.
The scores of each player with each of the others are displayed in
Table 2. Notice that the numbers in the upper left are all near 600, due
to the fact that the top ranking entries are all nice and therefore cooper-
ate with each other.
The bottom left section of Table 2 shows that the rules which are not
nice rarely got more than 400 points with any of the nice rules. There-
fore, the nice rules had a tremendous advantage in that they cooperated
with each other, but were not very cooperative with the rules which
were the first to defect.
So far we have seen that niceness accounts for which rules are in the
top eight ranks and which rules are below. But what accounts for the
relative ranking among the top eight?
Since the nice rules all got within a few points of 600 with each other,
the thing which distinguishes the relative rankings among the nice
rules is their scores with the rules which are not nice. This much is
obvious.
What is not obvious is that the relative ranking of the eight top rules
was largely determined by just two of the other seven rules. These two
rules are kingmakers because they do not do so well for themselves,
but they largely determine the rankings among the top contenders.
The two kingmakers are GRAASKAMP and DOWNING. (Entries
are indicated by capital letters. All entries except the familiar TIT FOR
TAT are named after their authors.)
DOWNING is the most important kingmaker because the range of
scores achieved with it by the nice rules was the largest of any rule.
The scores of the nice rules playing DOWNING ranged from a high
of 601 by TIDEMAN AND CHIERUZZI to a low of 158 by NYDEG-
GER.
DOWNING is a particularly interesting rule in its own right, bec
unlike most of the others, its logic is not based on a variant of TIT
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
6 'Z 't C) 1? CI N Xo M CI 1- 14 X t - C' o
C)) C X X wo 000000 C~~~~-
C) a", C' C) C) w o C~~- C
m m CT w w w w 1t C ^^
11 00 C- t 00 r C M It m > m m
IO 0 t 0C a m C) 00 0 It O 0 0
ssor C- -4 r- m r- t - t M 't m cl I' "o
C) - 't CO 0t ON tn 0- 00 0 00 r- O
dNNV 00 V N 0 C- j 00 > C M m 1 t 0 00
CT T m CT T C1 CT T CI "t C' C' C' m m
ONINMOU C) o ) r- Co C )t No
G~~~~~ ^^^^^mNV
H O ^ ? NITLS
00 o > > 00 C m m00 m
NVWUSY1D C) C) -- 't o om C) C O C
o lgflHIS gt g~ a" o0 g m t ^
~~EIDD3RJA o o o ooo o~ o
? ? ? ? ? ? ? ? ~~~~~~~~~~~t m It It m It
lo~~~~~~
IV N A II all all 00 00 11 oI ^t
7i
Ho)ZN _
cl s
;J.~~~~~~~~~~~1
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
12 JOURNAL OF CONFLICT RESOLUTION
TAT. DOWNING assumes that the other side bases its choice on
DOWNING's own previous choice. DOWNING estimates the pro-
bability that the other player cooperates after it (DOWNING) cooper-
ates, and also the probability that the other player cooperates after
DOWNING defects. Each move, it updates its estimate of these two
conditional probabilities and then selects the choice which will maxi-
mize its own long-term payoff under the assumption that it has correctly
modeled the other player. The decision rule is based on "an outcome
maximization" principle originally developed as a possible interpreta-
tion of what human subjects do in Prisoner's Dilemma laboratory
experiments (Downing, 1975). If the two conditional probabilities
have similar values, DOWNING determines that it pays to defect, since
the other player seems to be doing the same thing whether DOWNING
cooperates or not. Conversely, if the other player tends to cooperate
after a cooperation but not after a defection by DOWNING, then the
other player seems responsive, and DOWNING will calculate that the
best thing to do with a responsive player is to cooperate. Under certain
circumstances, DOWNING will even determine that the best strategy
is to alternate cooperation and defection.
At the start of a game, DOWNING does not know the values of these
conditional probabilities for the other players. It assumes that they are
both .5, but gives no weight to this estimate when information actually
does come in during the play of the game.
This is a fairly sophisticated decision rule, but its implementation
does have one flaw. By initially assuming that the other player is un-
responsive, DOWNING is doomed to defect on the first two moves.
These first two defections led many other rules to punish DOWNING,
so things usually got off to a bad start. But this is precisely why DOW-
NING served so well as a kingmaker. First ranking TIT FOR TAT
and second ranking TIDEMAN AND CHIERUZZI both reacted in
such a way that DOWNING learned to expect that defection does
not pay but that cooperation does. All of the other nice rules went
downhill with DOWNING.
The other kingmaker is GRAASKAMP. This is a very clever pro-
gram. It plays tit for tat for 50 moves, defects once, plays tit for tat for
another 5 moves, and then examines the history of the game so far.
Its defection on move 51 allows it to recognize its own twin and be
cooperative with it. Similarly, it can check to see if the other player
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axelrod / THE PRISONER'S D)ILEMMA 13
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
14 JOURNAL OF CONFLICT RESOLUTION
Forgiveness
We have seen that the nice rules did well in the tournament largely
because they did so well with each other, and because there were enough
of them to substantially raise each other's average score. As long as the
other player did not defect, each of the nice rules was certain to continue
cooperating until virtually the end of the game.
But what happened if there was a defection? As we shall see, different
rules responded quite differently, and their response was important
in determining their overall success. A key concept in this regard is
the forgiveness of a decision rule. Forgiveness of a rule is its propensity
to cooperate in the moves after the other player has defected.
The least forgiving rule is also a nice one. This is FRIEDMAN, which
as we have just seen is cooperative until the other player defects; then
it never cooperates again. TIT FOR TAT is unforgiving for one move,
but then is totally forgiving of an isolated defection. SHUBIK, on the
other hand, gets progressively less forgiving: It punishes the first defec-
tion with one defection, then punishes the next with two defections,
and so on.
One of the main reasons why the rules which are not nice did not do
well in the tournament is that most of the rules in the tournament were
not very forgiving. A concrete illustration will help. Consider the case
of JOSS. This decision rule is a variation of TIT FOR TAT. Like
TIT FOR TAT, it always defects immediately after the other player
defects. But instead of always cooperating after the other player cooper-
ates, 10% of the time it defects after the other player cooperates. Thus
it tries to get away with an occasional exploitation of the other player.
This decision rule seems like a fairly small variation of TIT FOR
TAT, but in fact its overall performance was much worse, and it is
interesting to see exactly why. Table 3 shows the move-by-move history
of a game between JOSS and TIT FOR TAT. At first both players
cooperated, but on the sixth move, JOSS selected one of its probabilis-
tic defections. On the next move JOSS cooperated again, but TIT FOR
TAT defected in response to JOSS's previous move. Then JOSS de-
fected in response to TIT FOR TAT's defection. In effect, the single
defection of JOSS on the sixth move created an echo back and forth
between JOSS and TIT FOR TAT. This echo resulted in JOSS de-
fecting on all the subsequent even numbered moves and TIT FOR TAT
defecting on all the subsequent odd numbered moves.
During this alternation of cooperation and defection by the two
players, each was getting an average of 2.5 points per turn (5 when the
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axelrod / THE PRISONER'S L)ILEMMA 15
TABLE 3
other cooperated and 0 when the other defected). This is not too much
below the average of 3 points per turn when both were cooperating.
However, worse was to come. JOSS was still cooperating on moves
7, 9, 11, 13, and so on. But on the twenty-fifth move, JOSS selected
another of its probabilistic defections. Of course TIT FOR TAT de-
fected on move 26 and another reverberating echo began. This echo
had JOSS defecting on the odd numbered moves and TIT FOR TAT
defecting on the even numbered moves. Together these two echoes
resulted in both players defecting on every move after move 25. This
meant that for the next 175 moves they both got only one point per
turn. The final score of this game was 236 for TIT FOR TAT and 241
for JOSS. Notice that while JOSS did a little better than TIT FOR TAT,
both did poorly.2
2. In the five games between them, the average scores were 225 for TIT FOR TAT
and 230 for JOSS.
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
16 JOURNAL OF CONFLICT RESOLUTION
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axelrox/ / THE PRISONER'S DILEMMA 17
3. As might be expected, the results could depend on the payoff matrix. For example,
if P is changed from I to 2, then TIDEMAN AND CHIERUZZI would have come in
first, followed by STEIN AND RAPOPORT, FRIEDMAN, and TIT FOR TAT. Of
course, the entrants knew the actual payoff matrix. Had the announced payoff matri
been different, some of them might have submitted different programs, and this wou
have affected everyone's score.
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
18 JOURNAL OF CONFLICT RESOL UTION
4. TULLOCK was also involved in a cycle of scores. TULLOCK did about fifty
points better than SHUBIK in their games together. And SHUBIK did about ten points
better than DOWNING in their games together. But DOWNING did fifty points better
than TULLOCK in their games together.
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axelrod / THE PRISONER'S DILEMMA 19
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
20 JOURNAL OF CONFLICT RESOLUTION
(a) The sample program sent to prospective contestants to show them how to make
a submission would in fact have won the tournament if anyone had simply clipped
it and mailed it in! But no one did. This rule is to defect only if the other player de-
fected on the previous two moves. It is a more forgiving version of TIT FOR TAT
in that it does not punish isolated defections. The excellent performance of the TIT
FOR TWO TATS rule highlights the fact that a common error of the contestants
was to expect that gains could be made from being relatively less forgiving than
TIT FOR TAT, whereas in fact there were big gains to be made from being even
more forgiving. The implication of this finding is striking, since it suggests that
expert strategists do not give sufficient weight to the importance of forgiveness.
(b) Another rule which would have won the tournament was also available to most
of the contestants. This was the rule which won the preliminary tournament, a
report of which was used in recruiting the contestants (Axelrod, 1978). Called
LOOK AHEAD, it was inspired by the tree searching techniques used in the
artificial intelligence programs to play chess. It is interesting that artificial in-
telligence techniques could have inspired a rule which was in fact better than any
of the rules designed by game theorists specifically for the Prisoner's Dilemma.
(c) A third rule which would have won the tournament was a slight modification of
DOWNING. If DOWNING had started with initial beliefs that the other players
would be responsive rather than unresponsive, it too would have won and won
by a large margin.5 A kingmaker could have been king. DOWNING's initial
beliefs about the other players were pessimistic. It turned out that optimism
about their responsiveness would not only have been more accurate but would also
have led to more successful performance. It would have resulted in first place
rather than tenth place.
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axelrod / THE PRISONER'S DILEMMA 21
Appendix
First Place with 504.5 points is TIT FOR TAT submitted by Anatol Rapo-
port of the Department of Psychology, University of Toronto. This rule is
only four lines long in FORTRAN. It cooperates on the first move, and then
does whatever the other player did on the previous move. It has a long history,
since it can be identified with the ancient lex talionis or an eye for an eye. A
recent mathematical treatment of its strategic properties is given in Taylor
(1976). Discussions of its successful performance with human subjects include
Oskamp (1971) and Wilson (1971).
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
22 JOURNAL OF CONFLICT RESOLUTION
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axe/rod / THE PRISONER'S DILEMMA 23
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
24 JOURNAL OF CONFLICT RESOLUTION
REFERENCES
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms
Axe/rod / THE PRISONER'S DILEMMA 25
This content downloaded from 14.139.157.21 on Wed, 10 Jul 2019 17:05:23 UTC
All use subject to https://about.jstor.org/terms