Complex Strategies in the Iterated Prisoner's Dilemma

Jean-Paul Delahaye and Philippe Mathieu

Laboratoire d'Informatique Fondamentale de Lille, U.A. 369 du C.N.R.S., University of Lille I, 59655 Villeneuve d'Ascq Cedex, France. e-mail: {delahaye,mathieu}@lifl.fr

Abstract

A modified version of the Iterated Prisoner's Dilemma is proposed and examined. At each move of the game a definitive renunciation of further interactions with the other player is allowed. The results of a simulation are presented. This simulation uses the 95 strategies proposed in a tournament organized by the French edition of Scientific American. The conclusions are analogous to those of R. Axelrod except on the importance of simplicity. Several arguments are given in favor of the view that complexity is necessary to obtain good strategies in the Iterated Prisoner's Dilemma (classical or modified version), and that there is no limit to the expected strength of strategies: new strategies, more and more complex and efficient, will appear if sufficiently rich environments are constructed and simulated. Finally we argue that in such a game, the whole perspective of an evolution of intelligence is probable. Thus we have a new argument in favor of a law of "complexification" in the universe.

1 The Classical Iterated Prisoner's Dilemma (CIPD)
In the Classical Prisoner's Dilemma [31, 18, 17, 8, 30] each player has two choices: cooperate (c) or defect (d). The reward for mutual cooperation [c,c] is, say, R=3. The sucker's payoff and the temptation to defect [c,d] are S=0 and T=5. The punishment for mutual defection [d,d] is P=1. If the game is only played once, then each player gets a higher payoff from defecting than from cooperating, regardless of what the other player does. When the game is played repeatedly we obtain the Classical Iterated Prisoner's Dilemma (CIPD) [4, 1, 2, 8, 18, 17, 9, 30]. The exact rules of the game are:

1. The interactions are between pairs of players (i.e., strategies).
2. The number of moves is fixed but unknown to the two players (in our experiment, we chose 1000 moves).
3. Each player has two possible choices on each move: cooperate or defect. Choices are made simultaneously.
4. The payoffs R, S, T and P have been determined before the game and announced to the players.

To see which strategies would be effective in exploiting the opportunities for cooperation, two round-robin computer tournaments were organized by R. Axelrod [4, 1]. The winner of both tournaments was TIT-FOR-TAT, a strategy that cooperates on the first move of the game and then plays whatever the other player chose on the previous move. This performance of the TIT-FOR-TAT strategy did not prove that TIT-FOR-TAT would perform well as an evolutionary strategy. Hence an ecological simulation was conducted. The population dynamics of the ecological simulation was determined by setting the change in frequency of each

strategy in any given round to be proportional to its relative success in the previous round. The result obtained by R. Axelrod with the ecological simulation is that TIT-FOR-TAT quickly becomes the most common strategy. Other experiments and studies about this game, or some of its variants, have been done (see [2, 5, 3, 6, 13, 14, 15, 20, 24, 25, 26, 28, 29]). This game has been put to use in various ways ([22, 23, 7]).
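In outline, the ecological dynamics can be sketched as follows (a minimal illustration with a hypothetical two-strategy population and fixed per-round scores; in the real simulation the scores themselves are recomputed each round from the current population mix):

```python
def ecological_step(freqs, scores):
    """One generation of the ecological simulation: each strategy's new
    frequency is proportional to (current frequency x average score)."""
    fitness = {s: freqs[s] * scores[s] for s in freqs}
    total = sum(fitness.values())
    return {s: fitness[s] / total for s in freqs}

# Toy population with fixed per-round scores (an assumption made only
# to keep the sketch self-contained).
freqs = {'TIT-FOR-TAT': 0.5, 'ALL-D': 0.5}
scores = {'TIT-FOR-TAT': 3.0, 'ALL-D': 1.0}
for _ in range(10):
    freqs = ecological_step(freqs, scores)
```

Under these toy scores the higher-scoring strategy quickly takes over almost the whole population, which is the qualitative behavior Axelrod reports for TIT-FOR-TAT.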

2 The Iterated Prisoner's Dilemma with Renunciation
Now we are going to explore a new version of the Iterated Prisoner's Dilemma in which each strategy can give up the game. This choice is irreversible, and each player then obtains a payoff of N=2 for every remaining move until the game is over (a version of the iterated prisoner's dilemma with reversible renunciation has been considered by [32]). This value for N is chosen to be greater than S=0 or P=1, on the grounds that when you go your own way, you get a better result than when you are in a conflictual situation, or when somebody exploits you. The value for N is chosen to be less than T=5 or R=3 since, when you cooperate or when you exploit someone, you obtain a better result than when you are isolated. This version of the iterated prisoner's dilemma is a more realistic model of the real world than the original one (in our everyday life, we are able to stop interacting with a "player" that seems too weird or too aggressive). Yet this game is almost as simple as the original one. So it is interesting to examine whether it confirms Axelrod's results. The exact values S=0, P=1, N=2, R=3, T=5 are not important, provided that

(T + S)/2 < R   and   S < P < N < R < T.
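These constraints are easy to check mechanically (a two-line sanity check with the values used throughout the paper):

```python
S, P, N, R, T = 0, 1, 2, 3, 5  # sucker, punishment, renunciation, reward, temptation

# Renunciation beats conflict and exploitation, but loses to cooperation:
assert S < P < N < R < T
# Alternating exploitation must not beat steady mutual cooperation:
assert (T + S) / 2 < R
```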

The first relation is assumed in order to avoid that a game [c,d] [d,c] [c,d] [d,c] [c,d] [d,c] etc. provides each player with a better reward than [c,c] [c,c] [c,c] etc. Here are 3 examples of strategies:

HARD. I defect if the other player cooperates. If he defects once, I give up.

TESTER-4. In the first 4 moves, I play cooperate, cooperate, defect, defect. Then, if the other player has defected 3 or 4 times in his first 4 moves, I give up; otherwise I cooperate until the game is over.

TIT-FOR-TAT-WITH-THRESHOLD. I play TIT-FOR-TAT, but every five moves I compute my average payoff. If I got less than 2 as an average, then I give up.

Let us consider games of 1000 moves. The confrontation of HARD versus TESTER-4 gives the following results. In the first move HARD defects and TESTER-4 cooperates [d,c]; in the second move, HARD defects and TESTER-4 cooperates [d,c]; in the third move, HARD defects and TESTER-4 defects [d,d]; in the fourth move, HARD gives up since his opponent has defected in the previous move. We use the notation: [d,c] [d,c] [d,d] [r]. Hence the score with a game of 1000 moves is 5 + 5 + 1 + 997 × 2 = 2005 for HARD, and 0 + 0 + 1 + 997 × 2 = 1995 for TESTER-4. Note that when a player gives up, every move gives 2 points to each player until the end of the game (this is the result of their "solitary work" when they are isolated). The game HARD versus TIT-FOR-TAT-WITH-THRESHOLD gives [d,c] [d,d] [r]. The strategy HARD obtains 5 + 1 + 998 × 2 = 2002, and TIT-FOR-TAT-WITH-THRESHOLD obtains 0 + 1 + 998 × 2 = 1997. The game TESTER-4 versus TIT-FOR-TAT-WITH-THRESHOLD gives [c,c] [c,c] [d,c] [d,d] [c,d] [c,c] [c,c] [c,c] ... TESTER-4 obtains 3 + 3 + 5 + 1 + 0 + 3 × 995 = 2997 and TIT-FOR-TAT-WITH-THRESHOLD gets 3 + 3 + 0 + 1 + 5 + 3 × 995 = 2997.


HARD versus HARD produces 1 + 999 × 2 = 1999. TIT-FOR-TAT-WITH-THRESHOLD versus TIT-FOR-TAT-WITH-THRESHOLD produces 1000 × 3 = 3000. TESTER-4 versus TESTER-4 gives 3 + 3 + 1 + 1 + 996 × 3 = 2996. In this mini-tournament, the final result is 7994 points for TIT-FOR-TAT-WITH-THRESHOLD, the winner, 7988 points for TESTER-4, and 6006 points for HARD. In this example we verified again a basic fact of the theory of cooperation: HARD wins when confronted with each of the other two strategies, but that is not sufficient to win the mini-tournament. In fact HARD is the worst strategy when all the scores are in, because HARD does not succeed in creating cooperation: "to win against other strategies" is not the same as "to get a good score", because aggressiveness is an obstacle to cooperation. Note that, in the new game, it is easy to obtain 2000 points: all the player has to do is to give up on the first move. But such a "solitary strategy" cannot take advantage of possible cooperation with other cooperative strategies: extreme caution is not a good idea.

3 The experiment with Pour La Science
With the cooperation of Pour La Science, the French edition of Scientific American, we organized a tournament [9, 11, 12]. The rules of the tournament were as follows: Each participant could submit only one strategy, either by writing a program for his strategy (instructions were given for writing strategies in C), or by sending us a description of his strategy (in this case the definition was limited to 100 words). A general round-robin tournament was run (all the strategies are opposed two by two). The exact length of each game was not known in advance, but was known to be between 100 and 1000 rounds (finally, the simulation was made with 1000 rounds). The ranking of the strategies was determined by the overall number of points obtained. The winner won a five-year subscription to Pour La Science. We received 104 submissions. Some of the strategies proposed (9 of them) were incomprehensible and had to be rejected. No other strategies were added to the 95 remaining strategies. The winner is a kind of TIT-FOR-TAT-WITH-THRESHOLD. We call it FIRST: On the first move I cooperate. Every 20 moves, I compute the average of my past payoffs. If this average is less than 1.5, I give up. Each time the other player defects, if I am not already in a period of retaliation, I start a new period of retaliation. The Nth period of retaliation is a succession of N(N+1)/2 defections followed by 2 cooperations.


This strategy (conceived by Christophe Dziengelewski, a student in computer science) has several interesting properties: FIRST is a reactive strategy; FIRST possesses a threshold mechanism; the retaliations of FIRST when the other player does not cooperate are progressive; after a period of retaliation, FIRST tries to calm the other player with two consecutive unconditional cooperations.
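The description of FIRST can be sketched as follows. This is our reconstruction from the text, not the submitted C program; in particular the retaliation length N(N+1)/2 for the Nth period is read from the description and may differ in detail from the original:

```python
def make_first():
    """Sketch of FIRST: threshold renunciation every 20 moves,
    progressive retaliation, and two calming cooperations per period."""
    state = {'periods': 0, 'queue': []}  # pending retaliation moves

    def first(mine, theirs, my_score):
        t = len(mine)
        if t > 0 and t % 20 == 0 and my_score / t < 1.5:
            return 'r'  # threshold: give up when doing too badly
        if state['queue']:
            return state['queue'].pop(0)  # finish the current period
        if theirs and theirs[-1] == 'd':
            state['periods'] += 1
            n = state['periods']
            # Nth period: N(N+1)/2 defections, then 2 cooperations
            state['queue'] = ['d'] * (n * (n + 1) // 2) + ['c', 'c']
            return state['queue'].pop(0)
        return 'c'

    return first
```

For instance, after the opponent's first defection this sketch answers with a single defection followed by two unconditional cooperations; later defections trigger longer and longer punishments.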

The strategy FIRST is not simple. In fact, there is no simple strategy among the first 20 strategies. The second highest score in the tournament was obtained by the following strategy, SECOND: I play 5 moves of TIT-FOR-TAT, 5 moves of ALL-C (the strategy which always cooperates), five moves of SPITEFUL ("I always defect if the other has defected once"), and five moves of PERIODIC-C-C-D ("I play periodically cooperation, cooperation, defection, ..."). I compute the averages of the rewards obtained with the last four moves of each strategy. (*) If the best average is less than 1.5, I give up. If not, I play 12 moves of the strategy (among the 4) which has produced the best average. Then I compute the new averages for each of the 4 strategies. I go to (*). Note that this strategy again uses the idea of a threshold, but it is based on another principle. The third strategy in the tournament, THIRD, was: On the first move, I cooperate and I am calm, i.e., I am in the state called "calm." When I am calm, I play TIT-FOR-TAT and I stay calm, but when the other player defects, I become irritated: I pass into the state "irritated." When I am irritated, if the other player cooperates, I cooperate and I return to the state calm. But when I am irritated, if the other player defects, I become "furious." When I am furious I always defect, except if the other player has defected 12 consecutive times, in which case I compute his number of cooperations N1 and his number of defections N2. If N1 < N2, I give up. If N1 ≥ N2, I cooperate and I return to the state irritated.
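THIRD's three moods form a small state machine. A sketch follows (our reconstruction from the description; the reply chosen at the exact moment of a mood change is our reading of the text):

```python
def make_third():
    """Sketch of THIRD's calm / irritated / furious state machine."""
    state = {'mood': 'calm'}

    def third(mine, theirs, my_score):
        if not theirs:
            return 'c'  # first move: cooperate, stay calm
        last = theirs[-1]
        if state['mood'] == 'calm':
            if last == 'd':
                state['mood'] = 'irritated'
                return 'd'  # TIT-FOR-TAT reply while turning irritated
            return 'c'
        if state['mood'] == 'irritated':
            if last == 'c':
                state['mood'] = 'calm'
                return 'c'
            state['mood'] = 'furious'
            return 'd'
        # furious: always defect, unless 12 consecutive defections force
        # a decision based on the opponent's overall record
        if len(theirs) >= 12 and all(m == 'd' for m in theirs[-12:]):
            n1, n2 = theirs.count('c'), theirs.count('d')
            if n1 < n2:
                return 'r'  # give up
            state['mood'] = 'irritated'
            return 'c'
        return 'd'

    return third
```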



4 Analysis
This variant of the Iterated Prisoner's Dilemma is not obvious, and the good strategies of the Classical Iterated Prisoner's Dilemma (which are still strategies in this variant) are not very good anymore. Renunciation is useful: the best strategy that does not use the possibility of renunciation is number 16. TIT-FOR-TAT is number 50. As in the classical case, aggressiveness (to initiate defection) is always bad. When an ecological simulation is run, all the aggressive strategies are rapidly eliminated, including SECOND (which is aggressive since it plays PERIODIC-C-C-D). In the ecological simulation, FIRST remains first. The good ideas for building good strategies are easy to understand. Here are some of them: "Don't be aggressive; in particular, don't defect on the first move." "Be reactive (take into account the choices of your opponent)." "Use a threshold for renunciation." "Retaliate (immediately defect after an uncalled-for defection from the other)." "Use gradual retaliation." "Be capable of forgiving." "From time to time, try to create new conditions for cooperation (like FIRST, which cooperates twice after a period of retaliation)." "Be a tester (play several moves and analyze the reaction, but be careful when choosing the sequence of tests)."

"Try to simulate several strategies and, according to the results obtained, continue to play the best one (the main idea of SECOND)." The strategies incorporating only one or two of these principles obtained poor results. Many strategies used the idea of a threshold and the ideas incorporated in TIT-FOR-TAT. According to the choice of the parameters in the threshold, they ranked between 7 and 47. The use of renunciation seems required to obtain a good score: among the first 40 strategies, only 3 do not use renunciation. With slight modifications of FIRST (we made it count defections during periods of retaliation), SECOND (we made it avoid aggressiveness), and THIRD (we made it slightly lenient), we were able to obtain 3 new strategies, FIRST', SECOND', THIRD', which are better than FIRST, SECOND, THIRD when added to the 95 strategies submitted in the Pour La Science tournament. We verified that our results are not sensitive to the parameter values S=0, P=1, N=2, R=3, T=5, or to the number of moves (1000). Two remarkable facts stand out:

1. In this game, simplicity does not seem to bring the advantages it did in R. Axelrod's results and analyses.

2. There does not appear to be a robust (a kind of optimal) strategy in such a game.(1)

5 Complex strategies in CIPD and IPDR
Let us recall some of Axelrod's conclusions [1]: The advice takes the form of four simple suggestions for how to do well in a durable iterated Prisoner's Dilemma: 1. Don't be envious. 2. Don't be the first to defect. 3. Reciprocate both cooperation and defection. 4. Don't be too clever (p. 110).

"The very sophisticated rules did not do better than the simple ones" (p. 120). "One way to account for TIT-FOR-TAT's great success in the tournament is that it has great clarity: it is eminently comprehensible to the other player" (p. 123). "Too much complexity can appear to be total chaos. If you are using a strategy which appears random, then you also appear unresponsive to the other player. If you are unresponsive, then the other player has no incentive to cooperate with you. So being so complex as to be incomprehensible is very dangerous" (p. 122). "In the iterated Prisoner's Dilemma, you benefit from the other player's cooperation. The trick is to encourage that cooperation. A good way to do it is to make it clear that you will reciprocate" (p. 123). These remarks are not totally compatible with the results of our simulations of the Iterated Prisoner's Dilemma with Renunciation (nor with some other simulations of the Classical Iterated Prisoner's Dilemma [10]). But we think that Axelrod's proposed analyses of simplicity are not general, and fail to take into account some important points.
(1) Details on the strategies (including the code) and on the tournament are given in [12].


5.1 To Be Uncompromising With Consistent, Rigid Strategies Is Not Always the Best Choice

Even if a strategy is not cooperative (and even if you do not understand it perfectly well), that is not a good reason to stop interacting with it. We experience such situations in everyday life, and we sometimes choose not to follow Axelrod's recommendation; we consider instead that interactions with generally inflexible persons, even if risky, are often preferable to conflict or separation. The example of PERIODIC-C-C-D is clear. TIT-FOR-TAT does not play optimally against it. A very slight modification of TIT-FOR-TAT (add the following instructions: identify periodic behavior and, after the 5th period, exploit it) gives a strategy which is strictly better than TIT-FOR-TAT. The idea of exploiting consistent strategies is exactly what men do with domestic animals. Their relations are exactly an Iterated Prisoner's Dilemma with possible Renunciation: men choose to turn to their advantage the rigid (and not always cooperative) behavior of cats, for example, and this is profitable to both (if cats had not been useful to us, there would certainly not be as many of them today).
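The modification of TIT-FOR-TAT described above can be sketched as follows. The detection heuristic and its parameters are our own illustration; to stay safe, the pattern must mix cooperations and defections (so that an unconditional cooperator is never "exploited"), and a serious implementation would also have to probe for reactivity, as the text's testers do:

```python
def tft_exploit_periodic(mine, theirs, max_period=6, min_reps=5):
    """TIT-FOR-TAT, modified to exploit consistent periodic opponents.

    Once the opponent's whole history has repeated a short mixed pattern
    for at least min_reps full periods, treat him as non-reactive and
    defect from then on: permanent defection is the best reply to any
    fixed move sequence, since T > R and P > S."""
    for p in range(1, max_period + 1):
        pattern = theirs[:p]
        if (len(theirs) >= min_reps * p
                and 'c' in pattern and 'd' in pattern
                and all(theirs[i] == theirs[i % p] for i in range(len(theirs)))):
            return 'd'
    return 'c' if not theirs else theirs[-1]  # plain TIT-FOR-TAT otherwise

# Against PERIODIC-C-C-D the modification locks onto the pattern after
# 5 full periods (15 moves) and defects for the rest of the game.
mine, theirs = [], []
for t in range(18):
    mine.append(tft_exploit_periodic(mine, theirs))
    theirs.append('ccd'[t % 3])
```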

5.2 Partially Random or Partially Unintelligible Strategies Require Cleverness
Not to try to understand (at least partially) complex random behavior is sometimes a bad idea. It is again possible to improve TIT-FOR-TAT by enabling it to identify (not with certainty, but with a certain probability) a random strategy with a cooperation parameter greater than 1/3. Even in the classical Iterated Prisoner's Dilemma, some form of cleverness is useful. If you are confronted with a random strategy [2/3 c, 1/3 d], you would be better off always cooperating than always defecting, or giving up the game. In order to identify such kinds of favorable randomness, you must be clever. And, of course, confrontation with more complex strategies requires more subtle analysis. A bad property of TIT-FOR-TAT is that it generates strings of defections when there is noise in communication. In order to avoid this, some modifications are possible and necessary [25, 26, 5].

5.3 To Have a Clear Behavior Is Not Always a Good Strategy

The argument that in order to obtain cooperation you must have an easy-to-understand (hence simple) behavior is only valid with strategies that are ready to cooperate: you must show that you are ready to cooperate, and the best method is simply to cooperate and to react quickly to defection. But nothing is said about the complexity of the retaliation, and we have observed that even in the Classical Iterated Prisoner's Dilemma progressive retaliations (as used by the strategy FIRST) give better results than TIT-FOR-TAT. In fact, in our daily life we know that with some people it is preferable not to show exactly who we are, and a good rule is often "be clear in the presence of cooperation, but do not be too clear and predictable in the presence of strange adversaries." To give up too rapidly is a bad choice, since it may be that after a period necessary to achieve mutual comprehension, some kind of reasonable, partial cooperation will be possible, and such partial cooperation is better than war or renunciation. When you are confronted with a strategy that tries to exploit you or shows chaotic behavior (a random strategy, or worse, some "psychotic" strategy), you must try to find the optimal behavior relative to this strategy, and it cannot be expected that such a difficult adjustment is easy to define. Think of a difficult character (an artist perhaps): to stop any relation with him is certainly a bad idea in certain cases; trying to adjust and to elaborate a clever and cautious (but not too cautious) strategy may benefit you (you may become his impresario!). In his first and second tournaments, Axelrod ran many complex, but non-clever, strategies. We suspect that this is why he missed the point that some forms of clever TIT-FOR-TAT are possible and give better results. The scarcity of good instances of TIT-FOR-TAT in real life (see, for

example, [19] p. 125) is presumably due to the simple fact that TIT-FOR-TAT is not optimal, which provides an indirect confirmation of our analyses. We believe that the Iterated Prisoner's Dilemma allows for an unlimited perspective of improvements in strategies. Only the first steps of an infinite progression of ever more clever and complex strategies have been observed. Were we to run large simulations (with unlimited computational resources), we think that a kind of evolution toward complexity and intelligent strategies would emerge. If future experiments support this conjecture, we will have new arguments for the idea that complexity is not a contingent feature of the living world, but an unavoidable and spontaneous natural fact. R. Axelrod argues that "The discrimination of others may be among the most important abilities because it allows one to handle interactions with many individuals without having to treat them all the same, thus making possible the rewarding of cooperation from one individual and the punishing of defection from another" ([2] p. 94). This is a first outcome of complexity in cooperation, and we agree with the importance R. Axelrod assigns it in the theory of cooperation (and hence in the theory of evolution). But we think the development of good strategies more complex than TIT-FOR-TAT will prove to be a more basic advantage of complexity. Cooperation would then give us a further line of argument for the spontaneous development of complexity in evolutionary processes. We do not think that the evolution of strategies toward complexity is deterministic. As in real life, there may be much indeterminism in the outcomes of evolution (on this, see [16]). Perhaps some simulations will produce worlds with descendants of FIRST prevailing everywhere, and others will give worlds with descendants of SECOND occupying the space. Automatic "complexification" in evolutionary processes (as we think is highly probable) does not imply evolutionary determinism, i.e., the necessity of the actual world of life.

6 Conclusion and Research Program
Complex strategies can do better in the Iterated Prisoner's Dilemma (classical version and variant with renunciation) than simple strategies. This idea follows both from our experimentation and from abstract analysis. New experiments will be needed to give a more satisfactory proof of this result, which is crucial for demonstrating that complexity spontaneously arises in natural processes. In this perspective, we are running new simulations using genetic algorithms. But contrary to [2] (see also [7]), we cannot limit our strategies to the last three moves: complexity will appear only if we give it enough room. Acknowledgments: We would like to acknowledge helpful comments and bibliographical information from Ejan Mackaay and Pierre Lemieux.

7 References
[1] Axelrod R. The Evolution of Cooperation. Basic Books, New York, 1984. (Traduction francaise: Donnant donnant: Theorie du comportement cooperatif. Editions Odile Jacob, Paris, 1992.)
[2] Axelrod R. The Evolution of Strategies in the Iterated Prisoner's Dilemma. In "Genetic Algorithms and Simulated Annealing", L. Davis Ed. Pitman, London, 1987. pp. 32-41.
[3] Axelrod R., D. Dion. The Further Evolution of Cooperation. Science, V. 242. 9 December 1988. pp. 1385-1390.

[4] Axelrod R., W. D. Hamilton. The evolution of cooperation. Science, V. 211. 27 March 1981. pp. 1390-1396.
[5] Bendor J. In Good Times and Bad: Reciprocity in an Uncertain World. Am. J. Polit. Sci. 31. 1987. pp. 531-558.
[6] Boyd R., J. P. Lorberbaum. No pure strategy is evolutionarily stable in the repeated Prisoner's Dilemma game. Nature, V. 327. 7 May 1987. pp. 58-59.
[7] Danielson P. A. Evolving Artificial Moralities: Genetic Strategies, Spontaneous Orders, and Moral Catastrophe. "Chaos and Society" at L'Universite de Hull a Quebec, June 1-2, 1994. (P. Lemieux ed., to appear).
[8] Dawkins R. The Selfish Gene. Oxford University Press, 1976. Second Edition, 1989. (Traduction francaise: Le Gene Egoiste. Editions Colin, Paris, 1990.)
[9] Delahaye J.-P. L'altruisme recompense ? Pour La Science (French edition of Scientific American), novembre 1992. pp. 150-156.
[10] Delahaye J.-P., P. Mathieu. Experiences sur le dilemme itere des prisonniers. Rapport de Recherche du Laboratoire d'Informatique Fondamentale de Lille (URA CNRS 369), no 233, juin 1992.
[11] Delahaye J.-P., P. Mathieu (a). L'altruisme perfectionne. Pour La Science (French edition of Scientific American), mai 1993. pp. 102-107.
[12] Delahaye J.-P., P. Mathieu (b). L'altruisme perfectionne : details sur le concours. Rapport de Recherche du Laboratoire d'Informatique Fondamentale de Lille (URA CNRS 369), no 249, mai 1993.
[13] Fader P. S., J. Hauser. Implicit Coalitions in the Generalized Prisoner's Dilemma. Journal of Conflict Resolution, 32, 3. 1988. pp. 533-582.
[14] Feldman M. W., E. A. C. Thomas. Behavior-dependent Context for Repeated Plays of the Prisoner's Dilemma II: Dynamical Aspects of the Evolution of Cooperation. J. Theor. Biol. 1987. pp. 297-315.
[15] Godfray H. C. J. The evolution of forgiveness. Nature, V. 355. 16 January 1992. pp. 206-207.
[16] Gould S. J. Wonderful Life. W. W. Norton, 1989.
[17] Hofstadter D. R. Metamagical Themas: Questing for the Essence of Mind and Pattern. Basic Books, 1985; Bantam Books, New York, 1986. (Traduction francaise: Ma Themagie. InterEditions, Paris, 1988.)
[18] Hofstadter D. R. The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation. Scientific American, No 248, May 1983. pp. 16-26.
[19] Jaisson P. La fourmi et le sociobiologiste. Editions Odile Jacob, Paris, 1993.
[20] Joshi N. V. Evolution of cooperation by reciprocation within structured demes. J. Genet. V. 66-1. 1987. pp. 69-84.
[22] Lemieux P. Chaos et Anarchie. "Chaos and Society" at L'Universite de Hull a Quebec, June 1-2, 1994. (P. Lemieux ed., to appear).
[23] Mackaay E. L'ordre spontane comme fondement du droit - un survol de l'emergence des regles dans la societe civile. Revue Internationale de Droit Economique, 3, 1989. pp. 247-287.
[24] May R. M. More evolution of cooperation. Nature, V. 327. May 1987. pp. 15-17.

[25] Molander P. The Optimal Level of Generosity in a Selfish, Uncertain Environment. Journal of Conflict Resolution, Vol. 29-4. December 1985. pp. 611-618.
[26] Mueller U. Optimal Retaliation for Optimal Cooperation. Journal of Conflict Resolution, 31, 4. December 1987. pp. 692-724.
[27] Nowak M. Stochastic Strategies in the Prisoner's Dilemma. Theoretical Population Biology, 38. 1990. pp. 93-112.
[28] Nowak M., K. Sigmund. Oscillations in the Evolution of Reciprocity. J. Theoretical Biology, 137. 1989. pp. 21-26.
[29] Nowak M., K. Sigmund. Tit for tat in heterogeneous populations. Nature, V. 355. 16 January 1992. pp. 250-253.
[30] Poundstone W. Prisoner's Dilemma. Oxford University Press, 1993.
[31] Rapoport A., A. M. Chammah. Prisoner's Dilemma: A Study in Conflict and Cooperation. The University of Michigan Press, Ann Arbor, 1965.
[32] Vanberg V. J., R. D. Congleton. Rationality, Morality, and Exit. American Political Science Review, Vol. 86, no 2, June 1992. pp. 418-431.
