
Laboratoire d'Informatique Fondamentale de Lille

U.A. 369 du C.N.R.S., University of Lille I,

59655 Villeneuve d'Ascq Cedex. FRANCE.

e-mail: {delahaye,mathieu}@lifl.fr

Abstract

A modified version of the Iterated Prisoner's Dilemma is proposed and examined. At each move of the game, a definitive renunciation of further interactions with the other player is allowed. The results of a simulation are presented. This simulation uses the 95 strategies proposed in a tournament organized by the French edition of Scientific American. The conclusions are analogous to those of R. Axelrod, except on the importance of simplicity. Several arguments are given in favor of the view that complexity is necessary to obtain good strategies in the Iterated Prisoner's Dilemma (classical or modified version), and that there is no limit to the expected strength of strategies: new strategies, more and more complex and efficient, will appear if sufficiently rich environments are constructed and simulated. Finally, we argue that in such a game, a whole perspective of evolution of intelligence is probable. Thus we have a new argument in favor of a law of "complexification" in the universe.

In the Classical Prisoner's Dilemma [31, 18, 17, 8, 30] each player has two choices: cooperate (c) or defect (d). The reward for mutual cooperation [c,c] is, say, R = 3. The sucker's payoff and the temptation to defect [c,d] are S = 0 and T = 5. The punishment for mutual defection [d,d] is P = 1. If the game is only played once, then each player gets a higher payoff from defecting than from cooperating, regardless of what the other player does.
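The dominance of defection in the one-shot game can be checked directly. The following sketch (in Python, with names of our own choosing; the paper itself contains no code) encodes the payoff table above:

```python
# One-shot Prisoner's Dilemma payoffs from the text: R=3, S=0, T=5, P=1.
PAYOFF = {
    ("c", "c"): (3, 3),  # mutual cooperation: R for each
    ("c", "d"): (0, 5),  # sucker's payoff S vs. temptation T
    ("d", "c"): (5, 0),
    ("d", "d"): (1, 1),  # mutual defection: P for each
}

def my_payoff(my_move, other_move):
    return PAYOFF[(my_move, other_move)][0]

# Whatever the other player does, defecting pays strictly more:
for other in ("c", "d"):
    assert my_payoff("d", other) > my_payoff("c", other)
```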

When the game is played repeatedly we obtain the Classical Iterated Prisoner's Dilemma (CIPD) [4, 1, 2, 8, 18, 17, 9, 30]. The exact rules of the game are:

1. The interactions are between pairs of players (i.e., strategies).

2. The number of moves is fixed but unknown to the two players (in our experiment, we chose 1000 moves).

3. Each player has two possible choices on each move: cooperate or defect. Choices are made simultaneously.

4. The payoffs R, S, T and P have been determined before the game and announced to the players.

To see which strategies would be effective in exploiting the opportunities for cooperation, two round-robin computer tournaments were organized by R. Axelrod [4, 1]. The winner of the two tournaments was TIT-FOR-TAT, a strategy that cooperates on the first move of the game and then plays whatever the other player chose on the previous move.

This performance of the TIT-FOR-TAT strategy did not prove that TIT-FOR-TAT would perform well as an evolutionary strategy. Hence an ecological simulation was conducted. The population dynamics of the ecological simulation were determined by setting the change in frequency of each strategy in any given round to be proportional to its relative success in the previous round.
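The proportional-update rule described above can be sketched as follows. This is a toy illustration of ours with a two-strategy population (TFT vs. ALL-D over a 1000-move game), not Axelrod's actual tournament data:

```python
# Ecological (replicator-style) update: each strategy's next-round share is
# its current share weighted by its average score against the population.
# score[a][b] = points strategy a earns in a 1000-move game against b.
score = {
    "TFT":   {"TFT": 3000, "ALL-D": 999},   # TFT vs ALL-D: S once, then 999 x P
    "ALL-D": {"TFT": 1004, "ALL-D": 1000},  # ALL-D vs TFT: T once, then 999 x P
}

freq = {"TFT": 0.6, "ALL-D": 0.4}

for generation in range(100):
    # Average payoff of each strategy against the current population mix.
    fitness = {s: sum(freq[t] * score[s][t] for t in freq) for s in freq}
    mean = sum(freq[s] * fitness[s] for s in freq)
    freq = {s: freq[s] * fitness[s] / mean for s in freq}

# With enough initial TFT players, TFT takes over the population.
assert freq["TFT"] > 0.99
```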

The result obtained by R. Axelrod with the ecological simulation is that TIT-FOR-TAT quickly becomes the most common strategy. Other experiments and studies about this game, or some of its variants, have been carried out (see [2, 5, 3, 6, 13, 14, 15, 20, 24, 25, 26, 28, 29]). This game has been put to use in various ways ([22, 23, 7]).

Now we are going to explore a new version of the Iterated Prisoner's Dilemma in which each strategy can give up the game. This choice is irreversible, and each player then obtains a payoff of N = 2 for every remaining move until the game is over (the version of the iterated prisoner's dilemma with reversible renunciation has been considered by [32]). This value for N is chosen to be greater than S = 0 or P = 1, on the basis of the fact that when you go your own way, you get a better result than when you are in a conflictual situation, or when somebody exploits you. The value for N is chosen to be less than T = 5 or R = 3 since, when you cooperate or when you exploit someone, you obtain a better result than when you are isolated.

This version of the iterated prisoner's dilemma is a more realistic model of the real world than the original one (in our everyday life, we are able to stop interacting with a "player" that seems too weird or too aggressive). Yet, this game is almost as simple as the original one. So, it is interesting to examine whether it confirms Axelrod's results.

The exact values S = 0, P = 1, N = 2, R = 3, T = 5 are not important, provided that the relations

(T + S)/2 < R and S < P < N < R < T

remain true. The first relation is assumed in order to avoid that a game [c,d] [d,c] [c,d] [d,c] [c,d] [d,c] etc. provides each player with a better reward than [c,c] [c,c] [c,c] etc.
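With the concrete values of the text, both constraints are a one-line check:

```python
S, P, N, R, T = 0, 1, 2, 3, 5

# Alternating [c,d] [d,c] would average (T+S)/2 = 2.5 per move,
# which must stay below mutual cooperation's R = 3.
assert (T + S) / 2 < R

# The renunciation payoff N sits strictly between the "bad" payoffs S, P
# and the "good" payoffs R, T.
assert S < P < N < R < T
```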

Here are 3 examples of strategies:

HARD. I defect as long as the other player cooperates. If he defects once, I give up.

TESTER-4. In the first 4 moves, I play cooperate, cooperate, defect, defect. Then, if the other player has defected 3 or 4 times in his first 4 moves, I give up; otherwise I cooperate until the game is over.

TIT-FOR-TAT-WITH-THRESHOLD. I play TIT-FOR-TAT, but every five moves I compute my average payoff. If I got less than 2 as an average, then I give up.

Let us consider games of 1000 moves.

The confrontation of HARD versus TESTER-4 gives the following results. In the first move, HARD defects and TESTER-4 cooperates [d,c]; in the second move, HARD defects and TESTER-4 cooperates [d,c]; in the third move, HARD defects and TESTER-4 defects [d,d]; in the fourth move, HARD gives up since his opponent defected in the previous move. We use the notation: [d,c] [d,c] [d,d] [r].

Hence the score in a game of 1000 moves is 5 + 5 + 1 + 997 x 2 = 2005 for HARD, and 0 + 0 + 1 + 997 x 2 = 1995 for TESTER-4. Note that when a player gives up, every move gives 2 points to each player until the end of the game (this is the result of their "solitary work" when they are isolated).

The game HARD versus TIT-FOR-TAT-WITH-THRESHOLD gives [d,c] [d,d] [r]. The strategy HARD obtains 5 + 1 + 998 x 2 = 2002, and TIT-FOR-TAT-WITH-THRESHOLD obtains 0 + 1 + 998 x 2 = 1997.

The game TESTER-4 versus TIT-FOR-TAT-WITH-THRESHOLD gives [c,c] [c,c] [d,c] [d,d] [c,d] [c,c] [c,c] [c,c] ... TESTER-4 obtains 3 + 3 + 5 + 1 + 0 + 3 x 995 = 2997 and TIT-FOR-TAT-WITH-THRESHOLD obtains 3 + 3 + 0 + 1 + 5 + 3 x 995 = 2997.

HARD versus HARD produces 1 + 999 x 2 = 1999. TIT-FOR-TAT-WITH-THRESHOLD versus itself gives 1000 x 3 = 3000. TESTER-4 versus TESTER-4 gives 3 + 3 + 1 + 1 + 996 x 3 = 2996.

Adding everything up gives 7994 points for TIT-FOR-TAT-WITH-THRESHOLD, the winner, 7988 points for TESTER-4, and 6006 points for HARD.

In this example we verified again a basic fact of cooperation theory: HARD wins when confronted with each of the other two strategies, but that is not sufficient to win the mini-tournament. In fact, HARD is the worst strategy when all the scores are in, because HARD does not succeed in creating cooperation: "to win against other strategies" is not the same as "to get a good score", because aggressiveness is an obstacle to cooperation.

Note that, in the new game, it is easy to obtain 2000 points: all a player has to do is give up on the first move. But such a "solitary strategy" cannot take advantage of possible cooperation with other cooperative strategies: extreme caution is not a good idea.

With the cooperation of Pour La Science, the French edition of Scientific American, we organized a tournament [9, 11, 12]. The rules of the tournament were as follows:

Each participant could submit only one strategy, either by writing a program for his strategy (instructions were given for writing strategies in C), or by sending us a description of his strategy (in this case the definition was limited to 100 words).

A general round-robin tournament was run (all the strategies are opposed two by two). The exact length of each game was not known before the end, but was known to be between 100 and 1000 rounds (finally, the simulation was made with 1000 rounds).

The ranking of the strategies was determined by the overall number of points obtained.

The winner won a five-year subscription to Pour La Science. We received 104 submissions. Some of the proposed strategies (9 of them) were incomprehensible and had to be rejected. No other strategies were added to the 95 remaining strategies.

The highest score was obtained by the following strategy:

FIRST

In the first move, I cooperate.

Every 20 moves, I compute the average of my past payoffs. If this average is less than 1.5, I give up.

Each time the other player defects, if I am not already in a period of retaliation, I start a new period of retaliation. The Nth period of retaliation is a succession of N(N+1)/2 defections followed by 2 cooperations.

This strategy (conceived by Christophe Dziengelewski, a student in computer science) has several interesting properties:

FIRST is a reactive strategy.

FIRST possesses a threshold system.

The retaliations of FIRST, when the other player does not cooperate, are progressive.

After a period of retaliation, FIRST tries to calm the other player with two consecutive unconditional cooperations.
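Read this way, FIRST can be sketched as a small stateful function. This is a reconstruction of ours based on the description above (the actual code is given in [12]); in particular, the N(N+1)/2 retaliation schedule is our reading of the description:

```python
# Illustrative reconstruction of FIRST; names and interface are ours.
def make_first():
    state = {"queue": [], "period": 0}

    def first(mine, his, my_score):
        t = len(mine)
        # Threshold: every 20 moves, give up if the average payoff is < 1.5.
        if t > 0 and t % 20 == 0 and my_score / t < 1.5:
            return "r"
        if t == 0:
            return "c"
        # Start a new retaliation period on a defection, if not already in one.
        if his[-1] == "d" and not state["queue"]:
            state["period"] += 1
            n = state["period"]
            # Nth period: N(N+1)/2 defections, then 2 calming cooperations.
            state["queue"] = ["d"] * (n * (n + 1) // 2) + ["c", "c"]
        if state["queue"]:
            return state["queue"].pop(0)
        return "c"

    return first
```

Against a first defection this plays one retaliatory defection and two cooperations; a second defection triggers three defections, and so on, which is the "gradual" behavior the text credits for FIRST's success.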


The strategy FIRST is not simple. In fact, there is no simple strategy among the first 20 strategies. The second highest score in the tournament was obtained by the following strategy:

SECOND

I play 5 moves of TIT-FOR-TAT, 5 moves of ALL-C (the strategy which always cooperates), 5 moves of SPITEFUL ("I always defect if the other defects once"), and 5 moves of PERIODIC-C-C-D ("I play periodically cooperation, cooperation, defection, ...").

I compute the averages of the rewards obtained with the last four moves of each strategy. (*) If the best average is less than 1.5, I give up. If not, I play 12 moves of the strategy (among the 4) which has produced the best average.

Then I compute the new averages for each of the 4 strategies and go back to (*).

Note that this strategy again uses the idea of a threshold, but it is based on another principle.

The third strategy in the tournament was:

THIRD

In the first move, I cooperate and I am calm (i.e., I am in the state called "calm").

When I am calm, I play TIT-FOR-TAT and I stay calm, but when the other player defects, I become irritated (I pass into the state "irritated").

When I am irritated, if the other player cooperates, I cooperate and I return to the state calm. But when I am irritated, if the other player defects, I become "furious."

When I am furious I always defect, except if the other player has defected 12 consecutive times, in which case I compute his number of cooperations N1 and his number of defections N2. If N1 < N2, ...
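The calm / irritated / furious transitions form a small finite-state machine, sketched below (our own rendering; the clause about 12 consecutive defections and the counts N1, N2 is incomplete in our source and is therefore omitted):

```python
# Illustrative state-machine sketch of THIRD; names and interface are ours.
def make_third():
    state = {"mood": "calm"}

    def third(mine, his, my_score):
        if not mine:
            return "c"  # first move: cooperate, start calm
        last = his[-1]
        if state["mood"] == "calm":
            if last == "d":
                state["mood"] = "irritated"
            return last  # TIT-FOR-TAT while calm
        if state["mood"] == "irritated":
            if last == "c":
                state["mood"] = "calm"
                return "c"
            state["mood"] = "furious"
            return "d"
        return "d"  # furious: always defect (truncated clause omitted)

    return third
```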

4 Analysis

This variant of the Iterated Prisoner's Dilemma is not obvious, and the good strategies of the Classical Iterated Prisoner's Dilemma (which are still strategies in this variant) are no longer very good. Renunciation is useful: the best strategy that does not use the possibility of renunciation is number 16. TIT-FOR-TAT is number 50.

As in the classical case, aggressiveness (to initiate defection) is always bad. When an ecological simulation is run, all the aggressive strategies are rapidly eliminated, including SECOND (which is aggressive since it plays PERIODIC-C-C-D). In the ecological simulation, FIRST remains first.

The good ideas for building up good strategies are easy to understand. Here are some of them:

"Don't be aggressive; especially, don't defect on the first move."

"Be reactive (take into account the choices of your opponent)."

"Use a threshold for renunciation."

"Retaliate (immediately defect after an uncalled-for defection from the other)."

"Use gradual retaliation."

"Be capable of forgiving."

"From time to time, try to create new conditions for cooperation (like FIRST, which cooperates twice after a period of retaliation)."

"Be a tester (play several moves and analyze the reaction, but be careful when choosing the sequence of tests)."

"Try to simulate several strategies and, according to the results obtained, continue to play the best strategy (the main idea of SECOND)."

The strategies incorporating only one or two of these principles obtained poor results. Many strategies used the idea of a threshold and the ideas incorporated in TIT-FOR-TAT. Depending on the choice of the parameters in the threshold, they ranked between 7 and 47.

The use of renunciation seems required to obtain a good score: among the first 40 strategies, only 3 do not use renunciation.

With slight modifications to FIRST (we made it count defections during periods of retaliation), SECOND (we made it avoid aggressiveness), and THIRD (we made it slightly lenient), we were able to obtain 3 new strategies, FIRST', SECOND', THIRD', which are better than FIRST, SECOND, THIRD when added to the 95 strategies submitted in the Pour La Science tournament.

We verified that our results are not sensitive to the parameter values S = 0, P = 1, N = 2, R = 3, T = 5.

Two remarkable facts stand out:

1. In this game, simplicity does not seem to bring the advantages it did in R. Axelrod's results and analyses.

2. There does not appear to be a robust (a kind of optimal) strategy in such a game.1

Let us recall some of Axelrod's conclusions [1]:

The advice takes the form of four simple suggestions for how to do well in a durable iterated Prisoner's Dilemma:

1. Don't be envious.

2. Don't be the first to defect.

3. Reciprocate both cooperation and defection.

4. Don't be too clever (p. 110).

The very sophisticated rules did not do better than the simple ones (p. 120).

One way to account for TIT-FOR-TAT's great success in the tournament is that it has great clarity: it is eminently comprehensible to the other player (p. 123).

Too much complexity can appear total chaos. If you are using a strategy which appears random, then you also appear unresponsive to the other player. If you are unresponsive, then the other player has no incentive to cooperate with you. So being so complex as to be incomprehensible is very dangerous (p. 122).

In the iterated Prisoner's Dilemma, you benefit from the other player's cooperation. The trick is to encourage that cooperation. A good way to do it is to make it clear that you will reciprocate (p. 123).

These remarks are not totally compatible with the results of our simulations of the Iterated Prisoner's Dilemma with Renunciation (nor with some other simulations of the Classical Iterated Prisoner's Dilemma [10]). But we think that Axelrod's proposed analyses of simplicity are not general, and fail to take into account some important points.

1 Details on the strategies (including the code) and on the tournament are given in [12].


5.1 To Be Uncompromising With Consistent, Rigid Strategies Is Not Always the Best Choice

Even if a strategy is not cooperative (and even if you do not understand it perfectly well), that is not a good reason to stop interacting with it. We experience such situations in everyday life, and we sometimes choose not to follow Axelrod's recommendation; we consider instead that interactions with generally inflexible persons, even if risky, are often preferable to conflict or separation.

The example of PERIODIC-C-C-D is clear. TIT-FOR-TAT does not play optimally against it. A very slight modification to TIT-FOR-TAT (add the following instructions: identify periodic behavior and, after the 5th period, exploit it) gives a strategy which is strictly better than TIT-FOR-TAT.
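A concrete sketch of such a modification (ours; the detection rule and names are illustrative, and against a blind periodic opponent "exploit" simply means defect):

```python
# TIT-FOR-TAT vs. a variant that detects and exploits periodic opponents.
R, S, T, P = 3, 0, 5, 1
PAY = {("c", "c"): (R, R), ("c", "d"): (S, T),
       ("d", "c"): (T, S), ("d", "d"): (P, P)}

def periodic_ccd(t, his):
    return "ccd"[t % 3]

def tft(t, his):
    return "c" if t == 0 else his[-1]

def tft_exploiter(t, his):
    # After 5 full periods, check whether the history repeats with period 3.
    if t >= 15 and all(his[i] == his[i % 3] for i in range(t)):
        return "d"  # the opponent is blind: exploitation is safe
    return tft(t, his)

def score(strat, opponent, moves=1000):
    ha, hb, s = [], [], 0
    for t in range(moves):
        a, b = strat(t, hb), opponent(t, ha)
        s += PAY[(a, b)][0]
        ha.append(a)
        hb.append(b)
    return s

assert score(tft_exploiter, periodic_ccd) > score(tft, periodic_ccd)
```

The exploiter plays exactly like TIT-FOR-TAT until the pattern is confirmed, so against responsive strategies it behaves no worse.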

The idea of exploiting consistent strategies is exactly what men do with domestic animals. Their relations are exactly an Iterated Prisoner's Dilemma with possible Renunciation: men choose to turn to their advantage the rigid (and not always cooperative) behavior of cats, for example, and this is profitable to both (if cats had not been useful to us, there would certainly not be as many of them today).

5.2 Partially Random or Partially Unintelligible Strategies Require Cleverness

Not trying to understand (at least partially) complex random behavior is sometimes a bad idea. It is again possible to improve TIT-FOR-TAT by enabling it to identify (not with certainty, but with a certain probability) a random strategy with a cooperation parameter greater than 1/3.

Even in the classical Iterated Prisoner's Dilemma, some form of cleverness is useful. If you are confronted with a random strategy [2/3 c, 1/3 d], you would be better off always defecting than always cooperating or giving up the game. In order to identify such kinds of favorable randomness, you must be clever. And, of course, confrontation with more complex strategies requires more subtle analysis.
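The arithmetic behind this is short. Against a blind random player who cooperates with probability p, the expected per-move payoff of each fixed response follows directly from the paper's payoffs (function names are ours):

```python
# Expected per-move payoff against a random player cooperating with prob. p,
# under the paper's payoffs R=3, S=0, T=5, P=1 (and renunciation payoff N=2).
def ev_cooperate(p):   # you cooperate: R with prob p, S otherwise
    return 3 * p + 0 * (1 - p)

def ev_defect(p):      # you defect: T with prob p, P otherwise
    return 5 * p + 1 * (1 - p)   # simplifies to 4p + 1

p = 2 / 3
assert ev_defect(p) > ev_cooperate(p)  # defecting beats cooperating
assert ev_defect(p) > 2                # ... and beats renouncing (N = 2)

# Staying and defecting beats giving up whenever 4p + 1 > 2, i.e. p > 1/4;
# a cooperation parameter above 1/3 leaves a comfortable margin.
assert ev_defect(1 / 3 + 0.01) > 2
```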

A bad property of TIT-FOR-TAT is that it generates strings of defections when there is noise in communication. In order to avoid this, some modifications are possible and necessary [25, 26, 5].

5.3 To Have a Clear Behavior Is Not Always a Good Strategy

The argument that, in order to obtain cooperation, you must have an easy-to-understand (hence simple) behavior is only valid with strategies that are ready to cooperate: you must show that you are ready to cooperate, and the best method is simply to cooperate and to react quickly to defection. But nothing is said about the complexity of the retaliation, and we have observed that even in the Classical Iterated Prisoner's Dilemma progressive retaliations (as used by the strategy FIRST) give better results than TIT-FOR-TAT.

In fact, in our daily life we know that with some people it is preferable not to show exactly who we are, and a good rule is often "be clear in the presence of cooperation, but do not be too clear and predictable in the presence of strange adversaries." To give up too rapidly is a bad choice, since it may be that after a period necessary to achieve mutual comprehension, some kind of reasonable, partial cooperation will be possible, and such partial cooperation is better than war or renunciation.

When you are confronted with a strategy that tries to exploit you or shows chaotic behavior (a random strategy, or worse, some "psychotic" strategy), you must try to find the optimal behavior relative to this strategy, and it cannot be expected that such a difficult adjustment is easy to define. Think of a difficult character (an artist perhaps): to stop any relation with him is certainly a bad idea in certain cases; trying to adjust and to elaborate a clever and cautious (but not too cautious) strategy may benefit you (you may become his impresario!).

In his first and second tournaments, Axelrod ran many complex, but non-clever, strategies. We suspect that this is why he missed the point that some forms of clever TIT-FOR-TAT are possible and give better results. The scarcity of good instances of TIT-FOR-TAT in real life (see, for example, [19] p. 125) is presumably due to the simple fact that TIT-FOR-TAT is not optimal, which provides an indirect confirmation of our analyses.

We believe that the Iterated Prisoner's Dilemma allows for an unlimited perspective of improvements in strategies. Only the first steps of an infinite progression of ever more clever and complex strategies have been observed. Were we to run large simulations (with unlimited computation resources), we think that a kind of evolution toward complexity and intelligent strategies would emerge. If future experiments support this conjecture, we will have new arguments for the idea that complexity is not a contingent feature of the living world, but an unavoidable and spontaneous natural fact.

R. Axelrod argues that "The discrimination of others may be among the most important abilities because it allows one to handle interaction with many individuals without having to treat them all the same, thus making possible the rewarding of cooperation from one individual and the punishing of defection from another" ([2] p. 94). This is a first outcome of complexity in cooperation, and we agree with the importance R. Axelrod assigns it in the theory of cooperation (and hence in the theory of evolution). But we think the development of good strategies more complex than TIT-FOR-TAT will prove to be a more basic advantage of complexity. Cooperation would then give us a further line of argument for the spontaneous development of complexity in the evolutionary process.

We do not think that the evolution of strategies toward complexity is deterministic. As in real life, there may be much indeterminism in the outcomes of evolution (on this, see [16]). Perhaps some simulations will produce worlds with descendants of FIRST prevailing everywhere, and others will give worlds with descendants of SECOND occupying the space. Automatic "complexification" in the evolutionary process (which we think is highly probable) does not imply evolutionary determinism, i.e., the necessity of the actual world of life.

Complex strategies can do better in the Iterated Prisoner's Dilemma (classical version and the variant with renunciation) than simple strategies. This idea follows both from our experimentation and from abstract analysis. New experiments will be needed to give a more satisfactory proof of this result, which is crucial for demonstrating that complexity spontaneously arises in natural processes. In this perspective, we are running new simulations using genetic algorithms. But contrary to [2] (see also [7]), we cannot limit our strategies to a memory of the last three moves: complexity will appear only if we give it enough room.

Acknowledgments: We would like to acknowledge helpful comments and bibliographical information from Ejan Mackaay and Pierre Lemieux.

7 Bibliography

[1] Axelrod R. The Evolution of Cooperation. Basic Books, New York, 1984. (French translation: Donnant donnant : Théorie du comportement coopératif. Editions Odile Jacob, Paris, 1992.)

[2] Axelrod R. The Evolution of Strategies in the Iterated Prisoner's Dilemma. In "Genetic Algorithms and Simulated Annealing", L. Davis ed., Pitman, London, 1987, pp. 32-41.

[3] Axelrod R., Dion D. The Further Evolution of Cooperation. Science, V. 242, 9 December 1988, pp. 1385-1390.


[4] Axelrod R., Hamilton W. D. The evolution of cooperation. Science, V. 211, 27 March 1981, pp. 1390-1396.

[5] Bendor J. In Good Times and Bad: Reciprocity in an Uncertain World. Am. J. Polit. Sci., 31, 1987, pp. 531-558.

[6] Boyd R., Lorberbaum J. P. No pure strategy is evolutionarily stable in the repeated Prisoner's Dilemma game. Nature, V. 327, 7 May 1987, pp. 58-59.

[7] Danielson P. A. Evolving Artificial Moralities: Genetic Strategies, Spontaneous Orders, and Moral Catastrophe. "Chaos and Society", l'Université de Hull à Québec, June 1-2, 1994. (P. Lemieux ed., to appear.)

[8] Dawkins R. The Selfish Gene. Oxford University Press, 1976; second edition, 1989. (French translation: Le Gène égoïste, Editions Colin, Paris, 1990.)

[9] Delahaye J.-P. L'altruisme récompensé ? Pour La Science (French edition of Scientific American), November 1992, pp. 150-156.

[10] Delahaye J.-P., Mathieu P. Expériences sur le dilemme itéré des prisonniers. Rapport de Recherche du Laboratoire d'Informatique Fondamentale de Lille (URA CNRS 369), no. 233, June 1992.

[11] Delahaye J.-P., Mathieu P. (a). L'altruisme perfectionné. Pour La Science (French edition of Scientific American), May 1993, pp. 102-107.

[12] Delahaye J.-P., Mathieu P. (b). L'altruisme perfectionné : détails sur le concours. Rapport de Recherche du Laboratoire d'Informatique Fondamentale de Lille (URA CNRS 369), no. 249, May 1993.

[13] Fader P. S., Hauser J. Implicit Coalitions in the Generalized Prisoner's Dilemma. Journal of Conflict Resolution, 32, 3, 1988, pp. 533-582.

[14] Feldman M. W., Thomas E. A. C. Behavior-dependent Contexts for Repeated Plays of the Prisoner's Dilemma II: Dynamical Aspects of the Evolution of Cooperation. J. Theor. Biol., 1987, pp. 297-315.

[15] Godfray H. C. J. The evolution of forgiveness. Nature, V. 355, 16 January 1992, pp. 206-207.

[16] Gould S. J. Wonderful Life. W. W. Norton, 1989.

[17] Hofstadter D. R. Metamagical Themas: Questing for the Essence of Mind and Pattern. Basic Books, 1985; Bantam Books, New York, 1986. (French translation: Ma Thémagie. InterEditions, Paris, 1988.)

[18] Hofstadter D. R. The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation. Scientific American, No. 248, May 1983, pp. 16-26.

[19] Jaisson P. La fourmi et le sociobiologiste. Editions Odile Jacob, Paris, 1993.

[20] Joshi N. V. Evolution of cooperation by reciprocation within structured demes. J. Genet., V. 66-1, 1987, pp. 69-84.

[22] Lemieux P. Chaos et Anarchie. "Chaos and Society", l'Université de Hull à Québec, June 1-2, 1994. (P. Lemieux ed., to appear.)

[23] Mackaay E. L'ordre spontané comme fondement du droit - un survol de l'émergence des règles dans la société civile. Revue Internationale de Droit Economique, 3, 1989, pp. 247-287.

[24] May R. M. More evolution of cooperation. Nature, V. 327, May 1987, pp. 15-17.


[25] Molander P. The Optimal Level of Generosity in a Selfish, Uncertain Environment. Journal of Conflict Resolution, Vol. 29-4, December 1985, pp. 611-618.

[26] Mueller U. Optimal Retaliation for Optimal Cooperation. Journal of Conflict Resolution, 31, 4, December 1987, pp. 692-724.

[27] Nowak M. Stochastic Strategies in the Prisoner's Dilemma. Theoretical Population Biology, 38, 1990, pp. 93-112.

[28] Nowak M., Sigmund K. Oscillations in the Evolution of Reciprocity. J. Theoretical Biology, 137, 1989, pp. 21-26.

[29] Nowak M., Sigmund K. Tit for tat in heterogeneous populations. Nature, V. 355, 16 January 1992, pp. 250-253.

[30] Poundstone W. Prisoner's Dilemma. Oxford University Press, 1993.

[31] Rapoport A., Chammah A. M. Prisoner's Dilemma: A Study in Conflict and Cooperation. The University of Michigan Press, Ann Arbor, 1965.

[32] Vanberg V. J., Congleton R. D. Rationality, Morality, and Exit. American Political Science Review, Vol. 86, no. 2, June 1992, pp. 418-431.
