
Complex Strategies in the Iterated Prisoner's Dilemma

Jean-Paul Delahaye and Philippe Mathieu

Laboratoire d'Informatique Fondamentale de Lille
U.A. 369 du C.N.R.S., University of Lille I,
59655 Villeneuve d'Ascq Cedex. FRANCE.
e-mail: {delahaye,mathieu}@lifl.fr

A modified version of the Iterated Prisoner's Dilemma is proposed and examined. At
each move of the game a definitive renunciation of further interactions with the other player
is allowed. The results of a simulation are presented. This simulation uses the 95 strategies
proposed in a tournament organized by the French edition of Scientific American. The conclu-
sions are analogous to those of R. Axelrod except on the importance of simplicity. Several
arguments are given in favor of the view that complexity is necessary to obtain good strategies
in the Iterated Prisoner's Dilemma (classical or modified version), and that there is no limit
to the expected strength of strategies: new strategies, more and more complex and efficient,
will appear if sufficiently rich environments are constructed and simulated. Finally we argue
that in such a game, the whole perspective of an evolution of intelligence is probable. Thus we
have a new argument in favor of a law of "complexification" in the universe.

1 The Classical Iterated Prisoner's Dilemma (CIPD)

In the Classical Prisoner's Dilemma [31, 18, 17, 8, 30] each player has two choices: cooperate (c)
or defect (d). The reward for mutual cooperation [c,c] is, say, R=3. The sucker's payoff [c,d]
is S=0 and the temptation to defect [d,c] is T=5. The punishment for mutual defection [d,d] is P=1.
If the game is only played once, then each player gets a higher payoff from defecting than from
cooperating, regardless of what the other player does.
When the game is played repeatedly we obtain the Classical Iterated Prisoner's Dilemma (CIPD)
[4, 1, 2, 8, 18, 17, 9, 30]. The exact rules of the game are:
1. The interactions are between pairs of players (i.e., strategies).
2. The number of moves is fixed but unknown to the two players (in our experiment, we chose
1000 moves).
3. Each player has two possible choices on each move: cooperate or defect. Choices are made
simultaneously.
4. The payoffs R, S, T and P have been determined before the game and announced to the
players.
To see which strategies would be effective in exploiting the opportunities for cooperation, two
round-robin computer tournaments were organized by R. Axelrod [4, 1]. The winner of both
tournaments was TIT-FOR-TAT, a strategy that cooperates on the first move of the
game and then plays whatever the other player chose on the previous move.
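The TIT-FOR-TAT rule just described is simple enough to sketch in a few lines (an illustrative Python rendering; the function and variable names are ours, not from the tournament code):

```python
def tit_for_tat(my_history, other_history):
    """Cooperate on the first move, then echo the opponent's last move."""
    if not other_history:
        return 'c'
    return other_history[-1]

# A short self-play trace: two TIT-FOR-TAT players cooperate forever.
mine, theirs = [], []
for _ in range(5):
    a = tit_for_tat(mine, theirs)
    b = tit_for_tat(theirs, mine)
    mine.append(a)
    theirs.append(b)
```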
This performance of the TIT-FOR-TAT strategy did not prove that TIT-FOR-TAT would perform
well as an evolutionary strategy. Hence an ecological simulation was conducted. The population
dynamics of the ecological simulation were determined by setting the change in frequency of each
strategy in any given round to be proportional to its relative success in the previous round.
The result obtained by R. Axelrod with the ecological simulation is that TIT-FOR-TAT quickly
becomes the most common strategy. Other experiments and studies about this game, or some of
its variants have been done (see [2, 5, 3, 6, 13, 14, 15, 20, 24, 25, 26, 28, 29]). This game has been
put to use in various ways ([22, 23, 7]).
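The ecological update just described (frequency change proportional to relative success) can be sketched as a replicator-style step. This is a simplified illustration, not Axelrod's actual code; the two-strategy payoff numbers below are our own approximations of 1000-move game averages:

```python
def ecological_step(freqs, payoff):
    """One generation: a strategy's new frequency is proportional to its
    old frequency times its average score against the current population."""
    score = {s: sum(freqs[t] * payoff[s][t] for t in freqs) for s in freqs}
    mean = sum(freqs[s] * score[s] for s in freqs)
    return {s: freqs[s] * score[s] / mean for s in freqs}

# Toy example: a reciprocator ('tft') against an unconditional defector
# ('alld'); entries are illustrative per-move averages over a long game.
payoff = {'tft':  {'tft': 3.0,   'alld': 0.999},
          'alld': {'tft': 1.004, 'alld': 1.0}}
freqs = {'tft': 0.5, 'alld': 0.5}
for _ in range(100):
    freqs = ecological_step(freqs, payoff)
```

Starting from equal shares, the reciprocator's frequency grows toward 1, in line with Axelrod's observation that TIT-FOR-TAT quickly becomes the most common strategy.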

2 The Iterated Prisoner's Dilemma with Renunciation

Now we are going to explore a new version of the Iterated Prisoner's Dilemma in which each
strategy can give up the game. This choice is irreversible, and each player then obtains
a payoff of N = 2 for every remaining move until the game is over (the version of the iterated
prisoner's dilemma with reversible renunciation has been considered by [32]). This value for N
is chosen to be greater than S = 0 or P = 1, on the basis of the fact that when you go your
own way, you get a better result than when you are in a conflictual situation, or when somebody
exploits you. The value for N is chosen to be less than T = 5 or R = 3 since, when you cooperate
or when you exploit someone, you obtain a better result than when you are isolated.
This version of the iterated prisoner's dilemma is a more realistic model of the real world than
the original one (in our everyday life, we are able to stop interacting with a "player" that seems
too weird or too aggressive). Yet this game is almost as simple as the original one. So it is
interesting to examine whether it confirms Axelrod's results.
The exact values S = 0, P = 1, N = 2, R = 3, T = 5 are not important, provided that

(T + S)/2 < R

S < P < N < R < T

The first relation is assumed in order to avoid that a game [c,d] [d,c] [c,d] [d,c] [c,d] [d,c] etc.,
provides each player with a better reward than [c,c] [c,c] [c,c] etc.
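These two conditions are easy to check mechanically for the paper's values (a trivial sketch, using the payoff names defined above):

```python
S, P, N, R, T = 0, 1, 2, 3, 5

# Alternating [c,d]/[d,c] must pay less per move than steady cooperation...
assert (T + S) / 2 < R
# ...and the five payoffs must be strictly ordered.
assert S < P < N < R < T
```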
Here are 3 examples of strategies:
• HARD. I defect if the other player cooperates. If he defects once, I give up.
• TESTER-4. In the first 4 moves, I play cooperate, cooperate, defect, defect. Then, if the
other player has defected 3 or 4 times in his first 4 moves, I give up; otherwise I cooperate
until the game is over.
• TIT-FOR-TAT-WITH-THRESHOLD. I play TIT-FOR-TAT, but every five moves I
compute my average payoff. If I got less than 2 as an average, then I give up.
Let us consider games of 1000 moves.
The confrontation of HARD versus TESTER-4 gives the following results. In the first move HARD
defects and TESTER-4 cooperates [d,c]; in the second move, HARD defects and TESTER-4
cooperates [d,c]; in the third move, HARD defects and TESTER-4 defects [d,d]; in the fourth
move, HARD gives up since his opponent has defected in the previous move. We use the notation:
[d,c] [d,c] [d,d] [r].
Hence the score with a game of 1000 moves is 5 + 5 + 1 + 997 × 2 = 2005 for HARD, and 0 + 0 +
1 + 997 × 2 = 1995 for TESTER-4. Note that when a player gives up, every move gives 2 points
to each player until the end of the game (this is the result of their "solitary work" once they are
separated).
The game HARD versus TIT-FOR-TAT-WITH-THRESHOLD gives [d,c] [d,d] [r]. The strategy
HARD obtains 5 + 1 + 998 × 2 = 2002, and TIT-FOR-TAT-WITH-THRESHOLD obtains 0 + 1 +
998 × 2 = 1997.

The game TESTER-4 versus TIT-FOR-TAT-WITH-THRESHOLD gives [c,c] [c,c] [d,c] [d,d] [c,d]
[c,c] [c,c] [c,c] ... TESTER-4 obtains 3 + 3 + 5 + 1 + 0 + 3 × 995 = 2997 and TIT-FOR-TAT-WITH-
THRESHOLD gets 3 + 3 + 0 + 1 + 5 + 3 × 995 = 2997.
HARD versus HARD produces 1 + 999 × 2 = 1999.
TESTER-4 versus TESTER-4 gives 3 + 3 + 1 + 1 + 996 × 3 = 2996.
In this mini-tournament, the final result is 7994 points for TIT-FOR-TAT-WITH-THRESHOLD,
the winner, 7988 points for TESTER-4, and 6006 points for HARD.
In this example we verified again a basic fact of the theory of cooperation: HARD wins when
confronted with each of the other two strategies, but that is not sufficient to win the mini-tournament.
In fact HARD is the worst strategy when all the scores are in, because HARD does not succeed
in creating cooperation: "to win against other strategies" is not the same as "to get a good score",
because aggressiveness is an obstacle to cooperation.
Note that, in the new game, it is easy to obtain 2000 points: all a player has to do is give up
on the first move. But such a "solitary strategy" cannot take advantage of possible cooperation
with other cooperative strategies: extreme caution is not a good idea.

3 The experiment with Pour La Science

With the cooperation of Pour La Science, the French edition of Scientific American, we organized
a tournament [9, 11, 12]. The rules of the tournament were as follows:
• Each participant could submit only one strategy, either by writing a program for his strategy
(instructions were given for writing strategies in C), or by sending us a description of his
strategy (in this case the definition was limited to 100 words).
• A general round-robin tournament was run (all the strategies are opposed two by two). The
exact length of each game was not known before the end, but was known to be between 100 and
1000 rounds (finally, the simulation was made with 1000 rounds).
• The ranking of the strategies was determined by the overall number of points obtained.
• The winner won a five-year subscription to Pour La Science.
We received 104 submissions. Some of the strategies proposed (9 of them) were incomprehensible
and had to be rejected. No other strategies were added to the 95 remaining strategies.

The winner is a kind of TIT-FOR-TAT-WITH-THRESHOLD. We call it FIRST:

On the first move I cooperate.
Every 20 moves, I compute the average of my past payoffs. If this average is less than 1.5, I give
up.
Each time the other player defects, and if I am not already in a period of retaliation, I start a new
period of retaliation. The N-th period of retaliation is the succession of N(N + 1)/2 defections
followed by 2 cooperations.
This strategy (conceived by Christophe Dziengelewski, a student in computer science) has several
interesting properties:
• FIRST is a reactive strategy.
• FIRST possesses a threshold mechanism.
• The retaliations of FIRST when the other player does not cooperate are progressive.
• After a period of retaliation, FIRST tries to calm the other player with two consecutive
unconditional cooperations.
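FIRST's graduated retaliation schedule can be sketched as follows. Note that the N(N + 1)/2 count is our reading of the garbled formula in the source, so treat the exact growth rate as an assumption:

```python
def retaliation_period(n):
    """Moves of FIRST's n-th retaliation period: n(n+1)/2 defections
    (our reading of the paper's formula) followed by the two calming
    unconditional cooperations."""
    return ['d'] * (n * (n + 1) // 2) + ['c', 'c']

# The periods grow progressively: 1, then 3, then 6, ... defections,
# each period ending with 'c', 'c'.
schedule = [retaliation_period(n) for n in (1, 2, 3)]
```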

The strategy FIRST is not simple. In fact, there is no simple strategy among the first 20 strategies.
The second highest score in the tournament was obtained by the following strategy:
I play 5 moves of TIT-FOR-TAT, 5 moves of ALL-C (the strategy which always cooperates), five
moves of SPITEFUL ("I always defect if the other defects once"), and five moves of PERIODIC-C-C-D
("I play periodically cooperation, cooperation, defection, ...").
I compute the averages of the rewards obtained with the last four moves of each strategy. (*) If
the best average is less than 1.5, I give up. If not, I play 12 moves of the strategy (among the 4)
which has produced the best average.
Then I compute the new averages for each of the 4 strategies. I go to (*).
Note that this strategy again uses the idea of a threshold, but it is based on another principle.
The third strategy in the tournament was:
On the first move, I cooperate and I am calm (i.e., I am in the state called "calm").
When I am calm, I play TIT-FOR-TAT and I stay calm, but when the other player defects, I
become irritated: I pass into the state "irritated."
When I am irritated, if the other player cooperates, I cooperate and I return to the state calm.
But when I am irritated, if the other player defects, I become "furious."
When I am furious I always defect, except if the other player has defected 12 consecutive times,
in which case I compute his number of cooperations N1 and his number of defections N2. If
N1 < N2, I give up. If N1 ≥ N2, I cooperate and I return to the state irritated.
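The calm/irritated/furious automaton can be sketched as a small state machine (names are ours; which move is played at the moment of each state change is our reading of the description):

```python
def third_move(state, other_history):
    """One move of the calm/irritated/furious strategy; returns
    (move, new_state), where move 'r' means renouncing."""
    if not other_history:
        return 'c', 'calm'
    last = other_history[-1]
    if state == 'calm':
        return ('c', 'calm') if last == 'c' else ('d', 'irritated')
    if state == 'irritated':
        return ('c', 'calm') if last == 'c' else ('d', 'furious')
    # Furious: always defect, unless the other defected 12 consecutive times.
    if other_history[-12:] == ['d'] * 12:
        n1 = other_history.count('c')
        n2 = other_history.count('d')
        return ('r', state) if n1 < n2 else ('c', 'irritated')
    return 'd', 'furious'

# Trace against an opponent who always defects: cooperate once,
# escalate to fury, then give up after 12 consecutive defections.
state, seen, moves = 'calm', [], []
for _ in range(13):
    move, state = third_move(state, seen)
    moves.append(move)
    seen.append('d')
```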

4 Analysis
This variant of the Iterated Prisoner's Dilemma is not obvious, and the good strategies in the
Classical Iterated Prisoner's Dilemma (which are still strategies in this variant) are no longer very
good. Renunciation is useful: the best strategy that does not use the possibility of renunciation
is number 16. TIT-FOR-TAT is number 50.
As in the classical case, aggressiveness (to initiate defection) is always bad. When an ecological
simulation is run, all the aggressive strategies are rapidly eliminated, including SECOND (which
is aggressive since it plays PERIODIC-C-C-D). In the ecological simulation, FIRST remains first.
The good ideas for building up good strategies are easy to understand. Here are some of them:
• "Don't be aggressive; especially, don't defect on the first move."
• "Be reactive (take into account the choices of your opponent)."
• "Use a threshold for renunciation."
• "Retaliate (immediately defect after an uncalled-for defection from the other)."
• "Use gradual retaliation."
• "Be capable of forgiving."
• "From time to time, try to create new conditions for cooperation (like FIRST, which cooperates
twice after a period of retaliation)."
• "Be a tester (play several moves and analyze the reaction, but be careful when choosing the
sequence of tests)."
• "Try to simulate several strategies and, according to the results obtained, continue to play
the best one (the main idea of SECOND)."
The strategies incorporating only one or two of these principles obtained poor results. Many
strategies used the idea of a threshold and the ideas incorporated in TIT-FOR-TAT. According
to the choice of the parameters in the threshold, they ranked between 7 and 47.
The use of renunciation seems required to obtain a good score: among the first 40 strategies,
only 3 do not use renunciation.
With slight modifications in FIRST (we made it count defections during periods of retaliation),
SECOND (we made it avoid aggressiveness), and THIRD (we made it slightly lenient), we were able
to obtain 3 new strategies FIRST', SECOND', THIRD', which are better than FIRST, SECOND,
THIRD when added to the 95 strategies submitted in the Pour La Science tournament.
We verified that our results are not sensitive to the parameter values S = 0, P = 1, N = 2, R = 3,
T = 5, or to the number of moves (1000).

Two remarkable facts stand out:
1. In this game, simplicity does not seem to bring the advantages it did in R. Axelrod's results
and analyses.
2. There does not appear to be a robust (a kind of optimal) strategy in such a game.¹

5 Complex strategies in CIPD and IPDR

Let us recall some of Axelrod's conclusions [1]:
The advice takes the form of four simple suggestions for how to do well in a durable iterated
Prisoner's Dilemma:
1. Don't be envious.
2. Don't be the first to defect.
3. Reciprocate both cooperation and defection.
4. Don't be too clever (p. 110).
The very sophisticated rules did not do better than the simple ones (p. 120).
One way to account for TIT-FOR-TAT's great success in the tournament is that it has great
clarity: it is eminently comprehensible to the other player (p. 123).
Too much complexity can appear to be total chaos. If you are using a strategy which appears random,
then you also appear unresponsive to the other player. If you are unresponsive, then the other
player has no incentive to cooperate with you. So being so complex as to be incomprehensible
is very dangerous (p. 122).
In the iterated Prisoner's Dilemma, you benefit from the other player's cooperation. The trick
is to encourage that cooperation. A good way to do it is to make it clear that you will
reciprocate (p. 123).
These remarks are not totally compatible with the results of our simulations of the Iterated
Prisoner's Dilemma with Renunciation (nor with some other simulations of the Classical Iterated
Prisoner's Dilemma [10]). But we think that Axelrod's proposed analyses of simplicity are not
general, and fail to take into account some important points.
¹ Details on the strategies (including the code) and on the tournament are given in [12].

5.1 To Be Uncompromising With Consistent, Rigid Strategies Is Not Always the Best Choice
Even if a strategy is not cooperative (and even if you do not understand it perfectly well), that is not
a good reason to stop interacting with it. We experience such situations in everyday life, and we
sometimes choose not to follow Axelrod's recommendation; we consider instead that interactions
with generally inflexible persons, even if risky, are often preferable to conflict or separation.
The example of PERIODIC-C-C-D is clear. TIT-FOR-TAT does not play optimally against it.
A very slight modification of TIT-FOR-TAT (add the following instructions: identify periodic
behavior and, after the 5th period, exploit it) gives a strategy which is strictly better than
TIT-FOR-TAT.
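This can be checked directly with a sketch (our own construction, not a submitted strategy; names and the 5-period detection window are ours):

```python
R, S, T, P = 3, 0, 5, 1
PAY = {('c', 'c'): R, ('c', 'd'): S, ('d', 'c'): T, ('d', 'd'): P}

def score_against_periodic(strategy, nmoves=999):
    """Total payoff of `strategy` over nmoves against c, c, d, c, c, d, ..."""
    mine, theirs, total = [], [], 0
    for t in range(nmoves):
        opp = 'ccd'[t % 3]
        move = strategy(mine, theirs)
        total += PAY[(move, opp)]
        mine.append(move)
        theirs.append(opp)
    return total

def tft(mine, theirs):
    return theirs[-1] if theirs else 'c'

def tft_exploiter(mine, theirs):
    # After observing 5 full c,c,d periods, assume the opponent is
    # rigidly periodic and defect for the rest of the game.
    if len(theirs) >= 15 and theirs[:15] == list('ccd' * 5):
        return 'd'
    return tft(mine, theirs)
```

Against PERIODIC-C-C-D, plain TIT-FOR-TAT settles into 8 points per 3-move period, while the exploiting variant, once locked on, collects 11; over 999 moves the scores are 2662 versus 3646.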
The idea of exploiting consistent strategies is exactly what men do with domestic animals. Their
relations are exactly an Iterated Prisoner's Dilemma with possible Renunciation: men choose to
turn to their advantage the rigid (and not always cooperative) behavior of cats, for example, and
this is profitable to both (if cats had not been useful to us, there would certainly not be as many
of them today).
5.2 Partially Random or Partially Unintelligible Strategies Require Cleverness
Not trying to understand (at least partially) complex random behavior is sometimes a bad idea.
It is again possible to improve TIT-FOR-TAT by enabling it to identify (not with certainty, but
with a certain probability) a random strategy with a cooperation parameter greater than 1/3.
Even in the classical Iterated Prisoner's Dilemma, some form of cleverness is useful. If you are
confronted with a random strategy [2/3 c, 1/3 d], you would be better off always cooperating than
always defecting, or giving up the game. In order to identify such kinds of favorable randomness,
you must be clever. And, of course, confrontation with more complex strategies requires more
subtle analysis.
A bad property of TIT-FOR-TAT is that it generates strings of defections when there is noise in
communication. In order to avoid this, some modifications are possible and necessary [25, 26, 5].
5.3 To Have a Clear Behavior Is Not Always a Good Strategy
The argument that in order to obtain cooperation you must have an easy-to-understand (hence
simple) behavior is only valid with strategies that are ready to cooperate: you must show that
you are ready to cooperate, and the best method is simply to cooperate and to react quickly to
defection. But nothing is said about the complexity of the retaliation, and we have observed
that even in the Classical Iterated Prisoner's Dilemma progressive retaliations (as used
by the strategy FIRST) give better results than TIT-FOR-TAT.
In fact, in our daily life we know that with some people it is preferable not to show exactly who
we are, and a good rule is often "be clear in the presence of cooperation, but do not be too clear and
predictable in the presence of strange adversaries." To give up too rapidly is a bad choice, since it may
be that after a period necessary to achieve mutual comprehension, some kind of reasonable, partial
cooperation will be possible, and such partial cooperation is better than war or renunciation.
When you are confronted with a strategy that tries to exploit you or shows chaotic behavior (a
random strategy, or worse, some "psychotic" strategy), you must try to find the optimal behavior
relative to this strategy, and it cannot be expected that such a difficult adjustment is easy to define.
Think of a difficult character (an artist perhaps): to stop any relation with him is certainly a bad
idea in certain cases; trying to adjust and to elaborate a clever and cautious (but not too cautious)
strategy may benefit you (you may become his impresario!).
In his first and second tournaments, Axelrod ran many complex, but not clever, strategies. We
suspect that this is why he missed the point that some forms of clever TIT-FOR-TAT are possible
and give better results. The scarcity of good instances of TIT-FOR-TAT in real life (see, for
example, [19] p. 125) is presumably due to the simple fact that TIT-FOR-TAT is not optimal,
which provides an indirect confirmation of our analyses.
We believe that the Iterated Prisoner's Dilemma allows for an unlimited perspective of improve-
ments in strategies. Only the first steps of an infinite progression of ever more clever and complex
strategies have been observed. Were we to run large simulations (with unlimited computation
resources), we think that a kind of evolution toward complexity and intelligent strategies would
emerge. If future experiments support this conjecture, we will have new arguments for the idea
that complexity is not a contingent feature of the living world, but an unavoidable and spontaneous
natural fact.
R. Axelrod argues that "The discrimination of others may be among the most important abilities
because it allows one to handle interaction with many individuals without having to treat them all
the same, thus making possible the rewarding of cooperation from one individual and the punishing
of defection from another" ([2] p. 94). This is a first outcome of complexity in cooperation, and we
agree with the importance R. Axelrod assigns it in the theory of cooperation (and hence in the
theory of evolution). But we think the development of good strategies more complex than TIT-
FOR-TAT will prove to be a more basic advantage of complexity. Cooperation would then give us
a further line of argument for the spontaneous development of complexity in the evolutionary process.
We do not think that the evolution of strategies toward complexity is deterministic. As in real life,
there may be much indeterminism in the outcomes of evolution (on this, see [16]). Perhaps some
simulations will produce worlds with descendants of FIRST prevailing everywhere, and others will
give worlds with descendants of SECOND occupying the space. Automatic "complexification" in
the evolutionary process (which we think is highly probable) does not imply evolutionary determinism,
i.e., the necessity of the actual world of life.

6 Conclusion and Research Program

Complex strategies can do better in the Iterated Prisoner's Dilemma (classical version and variant
with renunciation) than simple strategies. This idea follows both from our experimentation and
from abstract analysis. New experiments will be needed to give a more satisfactory proof of this
result, which is crucial for demonstrating that complexity arises spontaneously in natural processes.
In this perspective, we are running new simulations using genetic algorithms. But contrary to [2]
(see also [7]), we cannot limit our strategies to a memory of the last three moves: complexity will
appear only if we give it enough room.
Acknowledgments: We would like to acknowledge helpful comments and bibliographical information
from Ejan Mackaay and Pierre Lemieux.

7 Bibliography
[1] Axelrod R. The Evolution of Cooperation. Basic Books, New York, 1984. (French translation: Donnant donnant: Theorie du comportement cooperatif. Editions Odile Jacob, Paris.)
[2] Axelrod R. The Evolution of Strategies in the Iterated Prisoner's Dilemma. In "Genetic Algorithms and Simulated Annealing", L. Davis Ed., Pitman, London, 1987. pp. 32-41.
[3] Axelrod R., Dion D. The Further Evolution of Cooperation. Science, V. 242, 9 December 1988. pp. 1385-1390.
[4] Axelrod R., Hamilton W. D. The evolution of cooperation. Science, V. 211, 27 March 1981.
[5] Bendor J. In Good Times and Bad: Reciprocity in an Uncertain World. Am. J. Polit. Sci. 31, 1987. pp. 531-558.
[6] Boyd R., Lorberbaum J. P. No pure strategy is evolutionarily stable in the repeated Prisoner's Dilemma game. Nature, V. 327, 7 May 1987. pp. 58-59.
[7] Danielson P. A. Evolving Artificial Moralities: Genetic Strategies, Spontaneous Orders, and Moral Catastrophe. "Chaos and Society" at L'Universite de Hull a Quebec, June 1-2, 1994. (P. Lemieux ed., to appear.)
[8] Dawkins R. The Selfish Gene. Oxford University Press, 1976. Second edition, 1989. (French translation: Le Gene Egoiste. Editions Colin, Paris, 1990.)
[9] Delahaye J.-P. L'altruisme recompense ? Pour La Science (French edition of Scientific American), November 1992. pp. 150-156.
[10] Delahaye J.-P., Mathieu P. Experiences sur le dilemme itere des prisonniers. Rapport de Recherche du Laboratoire d'Informatique Fondamentale de Lille (URA CNRS 369), no. 233, June 1992.
[11] Delahaye J.-P., Mathieu P. (a). L'altruisme perfectionne. Pour La Science (French edition of Scientific American), May 1993. pp. 102-107.
[12] Delahaye J.-P., Mathieu P. (b). L'altruisme perfectionne : details sur le concours. Rapport de Recherche du Laboratoire d'Informatique Fondamentale de Lille (URA CNRS 369), no. 249, May 1993.
[13] Fader P. S., Hauser J. Implicit Coalitions in the Generalized Prisoner's Dilemma. Journal of Conflict Resolution, 32, 3, 1988. pp. 533-582.
[14] Feldman M. W., Thomas E. A. C. Behavior-dependent Context for Repeated Plays of the Prisoner's Dilemma II: Dynamical Aspects of the Evolution of Cooperation. J. Theor. Biol., 1987. pp. 297-315.
[15] Godfray H. C. J. The evolution of forgiveness. Nature, V. 355, 16 January 1992. pp. 206-207.
[16] Gould S. J. Wonderful Life. W. W. Norton, 1989.
[17] Hofstadter D. R. Metamagical Themas: Questing for the Essence of Mind and Pattern. Basic Books, 1985; Bantam Books, New York, 1986. (French translation: Ma Themagie. InterEditions, Paris, 1988.)
[18] Hofstadter D. R. The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation. Scientific American, No. 248, May 1983. pp. 16-26.
[19] Jaisson P. La fourmi et le sociobiologiste. Editions Odile Jacob, Paris, 1993.
[20] Joshi N. V. Evolution of cooperation by reciprocation within structured demes. J. Genet., V. 66-1, 1987. pp. 69-84.
[22] Lemieux P. Chaos et Anarchie. "Chaos and Society" at L'Universite de Hull a Quebec, June 1-2, 1994. (P. Lemieux ed., to appear.)
[23] Mackaay E. L'ordre spontane comme fondement du droit - un survol de l'emergence des regles dans la societe civile. Revue Internationale de Droit Economique, 3, 1989. pp. 247-287.
[24] May R. M. More evolution of cooperation. Nature, V. 327, May 1987. pp. 15-17.
[25] Molander P. The Optimal Level of Generosity in a Selfish, Uncertain Environment. Journal of Conflict Resolution, Vol. 29-4, December 1985. pp. 611-618.
[26] Mueller U. Optimal Retaliation for Optimal Cooperation. Journal of Conflict Resolution, 31, 4, December 1987. pp. 692-724.
[27] Nowak M. Stochastic Strategies in the Prisoner's Dilemma. Theoretical Population Biology, 38, 1990. pp. 93-112.
[28] Nowak M., Sigmund K. Oscillations in the Evolution of Reciprocity. J. Theoretical Biology, 137, 1989. pp. 21-26.
[29] Nowak M., Sigmund K. Tit for tat in heterogeneous populations. Nature, V. 355, 16 January 1992. pp. 250-253.
[30] Poundstone W. Prisoner's Dilemma. Oxford University Press, 1993.
[31] Rapoport A., Chammah A. M. Prisoner's Dilemma: A Study in Conflict and Cooperation. The University of Michigan Press, Ann Arbor, 1965.
[32] Vanberg V. J., Congleton R. D. Rationality, Morality, and Exit. American Political Science Review, Vol. 86, No. 2, June 1992. pp. 418-431.