and Defection
gzz786
Table of Contents:
Introduction
Axelrod’s Tournament
Creating a Tournament
Repeated Tournaments
Evaluation of Research
Works Cited
Research Question: “How can designing an iterative tournament of the Prisoner’s Dilemma to
determine the most effective strategy teach us about the value of cooperation and defection?”
Introduction:
The research question of this Mathematics Extended Essay is “How can designing an
iterative tournament of the Prisoner’s Dilemma to determine the most effective strategy
teach us about the value of cooperation and defection?”, on the topic of the application of
Game Theory. I have chosen the context of the applicability of mathematics to solve both
real and abstract problems, as this gives me insight into decision making in a
mathematical sense: whether it is more beneficial to aim for cooperation and mutual benefit or
to defect for individual gain.
The problem I wanted to investigate came from Axelrod’s tournament around 1980,
wherein Robert Axelrod, a professor of Political Science, invited strategists from all around the world
Game Theory that teaches about cooperation and defection (Duignan). There are multiple
versions of the game, but the original scenario is as follows: two criminals are caught and
interrogated separately, with no communication allowed between them. Both are
given a choice: to confess or to remain silent, with the outcome depending on both of their choices.
Should both remain silent, their sentences are reduced. If both confess, both are
given a long sentence. If one confesses and the other remains silent, the confessor goes free and
the one who remained silent receives a much longer sentence (Kuhn). Here is a summary of the
payoffs:
Fig. 1 - An example of the outcomes for each combination of the players’ choices. The goal is to have as few
points as possible, with points corresponding to the number of years spent in prison ("Prisoner's Dilemma").
Looking at the table above, both players’ choices affect the overall outcome, and one
player’s choice alone does not determine the result. In analyzing the payoffs in Fig. 1, a few
observations can be made:
● In any situation, a player is better off and earns fewer points by confessing.
● The highest total number of points (worst overall outcome) is generally attained when both
players confess.
● The lowest total number of points (best overall outcome) is reached when both players
remain silent.
● The best individual outcome is when you confess and your opponent decides to
remain silent.
● The worst individual outcome is when you choose to remain silent but your
opponent confesses.
Axelrod’s Tournament
The Axelrod Tournament brought the Iterated Prisoner’s Dilemma (IPD) to prominence,
with the main difference from the original being that the game is played repeatedly by the same
players. Robert Axelrod wanted Game Theory experts to submit computer programs, called agents, that,
when played against one another in a round-robin tournament of 200 rounds per pairing, would earn the
highest total payoff in his iterated version (Axelrod project developers). Because of the large
number of rounds, these “agents” can learn from each other to gain “experience” and may adapt in
future rounds, which is not possible in the single-round original. Another
difference is the terminology used, which changes from a prisoner’s choice to remain silent or confess
to the more general choice to cooperate or defect, which is more applicable to real life than the
original:
Fig. 2 - Axelrod’s tournament was set up so that mutual cooperation would yield 3 points, mutual defection 1
point, exploiting a cooperator 5 points and being exploited 0 points. The goal of this game
is to have the most points by the end of the tournament (Moore).
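The payoffs described in Fig. 2 can be written as a small lookup table. The Python sketch below is only an illustration and is not code from Axelrod's tournament; the names `PAYOFF` and `score_round` are assumptions introduced here.

```python
# Payoffs from Fig. 2, keyed by (my move, opponent's move).
# "C" = cooperate, "D" = defect. All names are illustrative.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # being exploited
    ("D", "C"): 5,  # exploiting a cooperator
    ("D", "D"): 1,  # mutual defection
}


def score_round(move_a, move_b):
    """Return the points earned by player A and player B in one round."""
    return PAYOFF[(move_a, move_b)], PAYOFF[(move_b, move_a)]


print(score_round("D", "C"))  # (5, 0)
```

Keying the table by the pair of moves means one dictionary serves both players: player B's score is simply the entry with the two moves swapped.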
For the sake of convenience and ease of understanding, I will refer to the programs in the
Iterated Prisoner's Dilemma as “players”. To shorten the terminology, I will occasionally refer
To investigate this game further, knowledge of game theory comes into play.
can lead to outcomes with respect to the utility or preferred outcome of those players. An
important concept in Game Theory is the idea of rationality and the assumption of maximization,
in which each player makes the choice that leads to their own maximum payoff in the game.
example of a two-player non-zero-sum game, in which the gains of one player are not equally
offset by the losses of the other; in other words, both players can, but do not always, win or lose
simultaneously. A major difference between a zero-sum and a non-zero-sum game is that non-zero-sum
games are not strictly competitive and have different degrees of cooperation between players. In
a zero-sum game there would be a definite winner and a definite loser, which can, but not always
Going back to the Prisoner’s Dilemma, cooperation is present throughout the game even if
communication is not allowed, as the outcome is decided by the combined choices
of the players. The highest degree of cooperation is reached when both players decide to cooperate
(remain silent), as this gives the highest combined number of points for both players.
● The other player cooperates, leading to a gain for both players. This is considered the
socially optimal outcome.
● The other player defects, leading to a gain for only one player. This is called a locally
optimal outcome for the other player, since it maximizes personal utility.
Since Game Theory and the Axelrod Tournament are based upon the idea of maximization,
players are likely to defect when they are certain that their opponent will cooperate.
It would seem that cooperating is a terrible decision and that defecting always
leads to a better local outcome for the player, but that is where a concept in game theory comes in:
player’s actions. This is based on the idea that players would only choose the option that
benefits them the most, thus maximizing their gain. In the Prisoner’s Dilemma, the rational choice
is for a player to defect regardless of their opponent’s choice, making the choice of
cooperation irrational. However, when both players use this idea of rationality to maximize their
gain, this leads to a paradoxical situation in which rational play leads to a poorer outcome
than irrational play ("Rationality and Game Theory"). This is further explained by another
concept.
have the existence of a Nash equilibrium, a situation in which no player can benefit by changing
In a game such as the Iterated Prisoner's Dilemma, a Nash equilibrium is reached when
both players repeatedly decide to defect. This is the case because it would be unwise for a player to
change their decision to cooperation and be exploited. Going back to the research question, this
outcome, mutual defection, would be considered the lowest overall value of cooperation and
defection, where both players try to “stab each other in the back”. This exposes the limits of
rationality: each player, hoping to gain from the other by thinking rationally and defecting,
instead ends up worse off than if both had been “irrational” and cooperated.
The poor outcome at this Nash equilibrium discourages players from defecting all the time. This is
shown by the Iterated Prisoner’s Dilemma, as consistent mutual defection leads to a very low
outcome when compared to constant mutual cooperation (Darity). However, an important thing that I
noticed is that the socially optimal solution, when both players cooperate, is not
a Nash equilibrium, as either player would benefit more from defecting and exploiting
the other player.
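The claim that mutual defection is the only Nash equilibrium can be checked by brute force for a single round. The following is a minimal sketch, assuming the Fig. 2 payoffs and a one-round game; the helper name `is_nash` is hypothetical and introduced only for this illustration.

```python
from itertools import product

# Fig. 2 payoffs, keyed by (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MOVES = ("C", "D")


def is_nash(move_a, move_b):
    """True if neither player can gain by changing only their own move."""
    a_is_best = all(PAYOFF[(move_a, move_b)] >= PAYOFF[(alt, move_b)] for alt in MOVES)
    b_is_best = all(PAYOFF[(move_b, move_a)] >= PAYOFF[(alt, move_a)] for alt in MOVES)
    return a_is_best and b_is_best


equilibria = [pair for pair in product(MOVES, repeat=2) if is_nash(*pair)]
print(equilibria)  # [('D', 'D')]
```

Mutual cooperation fails the check because switching from C to D against a cooperator raises the payoff from 3 to 5, which is exactly the temptation to exploit described above.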
2. Constant Defection
a. Gains points no matter what, but receives less through mutual defection
3. “Tit-For-Tat”
a. Starts by cooperating, then copies the last action taken by the opposing player
4. “Grudger” (Spiteful)
a. Cooperates until the opponent defects once, then permanently defects
Constant cooperation and constant defection are easy to understand, as they do not
change their choice under any circumstance. “Constant Cooperation” risks being exploited in
exchange for the chance of mutual cooperation, the highest overall outcome. “Constant Defection”,
on the other hand, is the safer strategy, as the player can never gain nothing in a round and
will gain either one or five points. The other strategies might seem a bit confusing,
so this is how they would play against a random test strategy I made to illustrate them
Round #          1  2  3  4  5  6  7
Random           D  D  C  C  D  C  D
“Tit-For-Tat”    C  D  D  C  C  D  C

Round #          1  2  3  4  5  6  7
Random           C  D  C  C  C  D  C
“Grudger”        C  C  D  D  D  D  D
Fig. 3 – Choices made by “Tit-For-Tat” and “Grudger” against a Random Strategy
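The two rows of Fig. 3 can be reproduced with a few lines of Python. This is a sketch under the assumption that each strategy only sees the opponent's past moves; the function names `tit_for_tat`, `grudger` and `respond` are introduced here for illustration.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def grudger(opponent_history):
    """Cooperate until the opponent has defected once, then defect forever."""
    return "D" if "D" in opponent_history else "C"


def respond(strategy, opponent_moves):
    """Moves the strategy plays against a fixed sequence of opponent moves."""
    return [strategy(opponent_moves[:i]) for i in range(len(opponent_moves))]


# The two Random sequences used in Fig. 3.
print("".join(respond(tit_for_tat, list("DDCCDCD"))))  # CDDCCDC
print("".join(respond(grudger, list("CDCCCDC"))))      # CCDDDDD
```

In round `i` each strategy is given only the opponent's moves from rounds before `i`, which is why Tit-For-Tat's row looks like the Random row shifted one step to the right.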
As shown by the first table, “Tit-For-Tat” always cooperates in the first round and then
replicates the opponent’s last action, marked by the blue and orange pairs of diagonal boxes.
“Grudger” only needs one defection from the opponent before permanently defecting, as seen in the
second table.
As I made the tables above, I noticed some basic ideas behind each strategy.
“Tit-For-Tat” tries to balance out the overall score by reciprocating the opponent’s last action.
The “Grudger” strategy intends to cooperate but heavily punishes those who defect even
once.
Creating a Tournament
The Iterated Prisoner’s Dilemma is not only a small one-versus-one competition similar to
the original version, but a large tournament. This means that each player competes against
every other player for a set number of rounds (Jensen). Here is a table I made to show the
Fig. 4 – Table of the results of 200 rounds in an Iterated Prisoner’s Dilemma, with the IPD grid beside as a basis
in finding the values
The table above shows the scores from two hundred rounds between each pair of players. For now, I
have not included a player's interaction with its own strategy (shown as black squares), as I
wanted a simple tournament where each player goes against everyone but itself. I have placed
the table for the IPD beside it for reference. Here is how I calculated the numbers for the
tournament:
o This would be similar to the exploitation scenario shown in explaining the IPD.
§ Constant C = 0 x 200 = 0
For-Tat vs Grudger:
o These two are similar in results because both Tit-For-Tat and Grudger cooperate
in the first round, but against Constant Defection the rest of the rounds lead to
mutual defection. This means that round 1 is always (D,C) in favor of
Constant Defection and the remaining 199 rounds are (D,D)
Round #                  1  2  3  4  5  6  7
Cons. Def.               D  D  D  D  D  D  D
Grudger / Tit-For-Tat    C  D  D  D  D  D  D
Fig. 5 – Table showing how Constant Defection would go against either Tit-For-Tat or Grudger
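The score for a set like the one in Fig. 5 can be checked by playing it out round by round. The sketch below assumes the Fig. 2 payoffs and 200 rounds per set; `play_set` and the strategy function names are illustrative and not taken from any library.

```python
# Fig. 2 payoffs, keyed by (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def always_defect(opponent_history):
    return "D"


def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]


def play_set(strategy_a, strategy_b, rounds=200):
    """Play one set of rounds and return both players' total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each player chooses based only on the opponent's earlier moves.
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play_set(always_defect, tit_for_tat))  # (204, 199)
```

The result matches the reasoning above: Constant Defection earns 5 in round 1 and 1 in each of the remaining 199 rounds, while Tit-For-Tat earns 0 and then 1 per round.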
After calculating the score for each set of rounds for each player, I then calculated the total
• Constant Cooperation
• Constant Defection
• Tit-For-Tat
• Grudger
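The totals for this first tournament can be recomputed with a short simulation. This is a sketch under my own assumptions (Fig. 2 payoffs, 200 rounds per set, each pair of different strategies meeting once, no self-play); the names `STRATEGIES`, `play_set` and `totals` are illustrative.

```python
from itertools import combinations

# Fig. 2 payoffs, keyed by (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Each strategy maps the opponent's past moves to the next move.
STRATEGIES = {
    "Constant Cooperation": lambda opp: "C",
    "Constant Defection": lambda opp: "D",
    "Tit-For-Tat": lambda opp: opp[-1] if opp else "C",
    "Grudger": lambda opp: "D" if "D" in opp else "C",
}


def play_set(name_a, name_b, rounds=200):
    """Play one set and return the two players' total scores."""
    a, b = STRATEGIES[name_a], STRATEGIES[name_b]
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


totals = {name: 0 for name in STRATEGIES}
for name_a, name_b in combinations(STRATEGIES, 2):  # every pair once, no self-play
    score_a, score_b = play_set(name_a, name_b)
    totals[name_a] += score_a
    totals[name_b] += score_b

for name, total in totals.items():
    print(f"{name}: {total}")
```

Under these assumptions the sketch gives 1200 for Constant Cooperation, 1408 for Constant Defection, and 1399 each for Tit-For-Tat and Grudger.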
As seen from the results above, Constant Defection has the highest overall score. This
would imply that in a situation where everyone’s strategy is unique yet predictable, the one
that exploits the most players and rounds ends up victorious. I would say that this is the case
because it obtained the highest number of points in any set, 1000 against Constant Cooperation,
which is well over a third more than the second-highest score in a set, 600. Since Constant Cooperation never retaliates,
Constant Defection is able to exploit it for all 200 rounds, unlike Tit-For-Tat and
Grudger, which are only exploited in the first round. However, I believe that a different result will
be obtained when players use their strategy against itself, as Constant Defection would be at a
massive disadvantage. To remove this total uniqueness of each strategy, I decided to calculate the
result when a strategy goes against itself and add it to the total:
§ 1 x 200 = 200
o For a similar reason since no one would have the incentive to defect, mutual
§ 3 x 200 = 600
I decided to add this because, most of the time, players do not have a unique strategy and may
end up going against someone with a similar plan. This is also applicable in real life,
where people interact with one another in either a different or a similar way (Moore). In the first
version of the tournament, defecting was easily the best choice, but going up against itself
leads to a different story. If I include these scores in the tournament, the results are as
follows:
• Constant Cooperation:
• Constant Defection:
Looking at these results, the highest scorer in the previous tournament, Constant
Defection, became the lowest in this version, as it was not able to exploit itself. On the other hand,
Tit-For-Tat and Grudger benefit from playing themselves through mutual cooperation, but are not as
naïve as Constant Cooperation when
up against defection. This shows how effective it is to learn from the previous round and decide
whether to change one’s decision in future rounds, while sticking to a selfish strategy is
detrimental.
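The effect of adding self-play can be seen by computing each strategy's total with and without a set against itself. As before, this is only a sketch assuming the Fig. 2 payoffs and 200 rounds per set; `set_score` and `tournament_totals` are hypothetical helper names.

```python
# Fig. 2 payoffs, keyed by (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

STRATEGIES = {
    "Constant Cooperation": lambda opp: "C",
    "Constant Defection": lambda opp: "D",
    "Tit-For-Tat": lambda opp: opp[-1] if opp else "C",
    "Grudger": lambda opp: "D" if "D" in opp else "C",
}


def set_score(name_a, name_b, rounds=200):
    """Points earned by a player using name_a over one set against name_b."""
    a, b = STRATEGIES[name_a], STRATEGIES[name_b]
    hist_a, hist_b, total = [], [], 0
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)
        total += PAYOFF[(move_a, move_b)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total


def tournament_totals(include_self):
    """Total score per strategy, optionally counting a set against itself."""
    return {
        me: sum(
            set_score(me, other)
            for other in STRATEGIES
            if include_self or other != me
        )
        for me in STRATEGIES
    }


without_self = tournament_totals(include_self=False)
with_self = tournament_totals(include_self=True)
for name in STRATEGIES:
    print(f"{name}: {without_self[name]} -> {with_self[name]}")
```

Self-play adds 600 points to every strategy that cooperates with itself but only 200 to Constant Defection, which is enough to move it from first place to last under these assumptions.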
Since in real life many people have, in most cases, similar levels of
cooperation and defection, I decided to create a different tournament in which there would be
multiple people with the same strategy playing against everyone, both people with different strategies
and people with the same strategy.
Repeated Tournaments
In the first tournament, where each player faces every other player only once, trust is
built only within a single set, and a player never needs to face that particular player again for
the rest of the tournament. Here is how the repeated tournament goes (Case):
1. Each strategy is used by 3 players, who play a tournament as
usual, going against everyone, including players with the same strategy.
2. The bottom 3 are eliminated. If there is a tie, select randomly among the tied players.
3. The top 3 are “cloned”. If there is a tie, select randomly among the tied players.
This is done based on the idea that people tend not to replicate “losing” behaviors and
tend to imitate “successful” behaviors (Case). For the sake of ease, I limited the number of
rounds per set to 20. I decided to remove the Grudger strategy in this tournament as it would act
similarly to Tit-For-Tat.
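The elimination-and-cloning procedure described above can be sketched as a short simulation. The code assumes the Fig. 2 payoffs, 20 rounds per set, three starting players per strategy, and random tie-breaking; `match_score` and `next_generation` are names made up for this illustration.

```python
import random
from collections import Counter

# Fig. 2 payoffs, keyed by (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

STRATEGIES = {
    "Constant Cooperation": lambda opp: "C",
    "Constant Defection": lambda opp: "D",
    "Tit-For-Tat": lambda opp: opp[-1] if opp else "C",
}


def match_score(name_a, name_b, rounds=20):
    """Points earned by a player using name_a against one using name_b."""
    a, b = STRATEGIES[name_a], STRATEGIES[name_b]
    hist_a, hist_b, total = [], [], 0
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)
        total += PAYOFF[(move_a, move_b)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total


def next_generation(population, cut=3):
    """Score everyone against everyone else, drop the bottom `cut`, clone the top `cut`."""
    scores = [
        sum(match_score(me, other) for j, other in enumerate(population) if j != i)
        for i, me in enumerate(population)
    ]
    # Sort players by score; the random second key breaks ties at random.
    order = sorted(range(len(population)), key=lambda i: (scores[i], random.random()))
    survivors = [population[i] for i in order[cut:]]
    clones = [population[i] for i in order[-cut:]]
    return survivors + clones


population = (
    ["Constant Cooperation"] * 3 + ["Constant Defection"] * 3 + ["Tit-For-Tat"] * 3
)
for generation in range(1, 4):
    population = next_generation(population)
    print(generation, dict(Counter(population)))
```

Although ties are broken at random, the tied players here always share a strategy, so the number of players per strategy after each generation does not depend on the random choice.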
Here are the calculations I needed for a network map I intend to make to represent this
repeated tournament:
• Constant Cooperation:
§ 20 x 3 = 60 x 2 players = 120
§ 20 x 3 = 60 x 3 players = 180
§ 20 x 0 = 0 x 3 players = 0
• Constant Defection:
§ 20 x 1 = 20 x 2 players = 40
§ 20 x 5 = 100 x 3 players = 300
§ 5 + 1 x 19 = 24 x 3 players = 72
• Tit-For-Tat:
§ 20 x 3 = 60 x 2 players = 120
§ 20 x 3 = 60 x 3 players = 180
§ 0 + 1 x 19 = 19 x 3 players = 57
Fig. 7 – Network graph of the first tournament with the total scores for each strategy
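The calculations for this tournament all follow one pattern, which can be written compactly. As a sketch in notation introduced here (the symbols $n_t$, $m(s,t)$ and $S_s$ do not come from the sources): let $n_t$ be the number of players using strategy $t$, and let $m(s,t)$ be the points a player using strategy $s$ earns in one 20-round set against a player using strategy $t$. Since a player faces everyone except itself, its total is

```latex
S_s = \sum_{t} \bigl(n_t - \delta_{st}\bigr)\, m(s,t),
\qquad
\delta_{st} =
\begin{cases}
1 & \text{if } s = t,\\
0 & \text{otherwise.}
\end{cases}
```

For example, with three players per strategy, a Tit-For-Tat player earns $S = 2 \cdot 60 + 3 \cdot 60 + 3 \cdot 19 = 357$, which is the sum of the three Tit-For-Tat lines in the calculation.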
The results show that Constant Defection still wins in this style of tournament.
Since the bottom three players are all Constant Cooperation (crossed out in the next
figure), they are replaced by clones of the top scorers. This leaves a 6 vs 3 in favor of Constant
Defection.
Fig. 8 – Results of the first tournament. The winners to be cloned are marked with crowns, while the
losers are crossed out
Now that Constant Cooperation has been eliminated by Constant Defection, they would
• Constant Defection:
§ 20 x 1 = 20 x 5 players = 100
§ 5 + 1 x 19 = 24 x 3 players = 72
• Tit-For-Tat:
§ 20 x 3 = 60 x 2 players = 120
§ 0 + 1 x 19 = 19 x 6 players = 114
This time, the Constant Defection players end up hurting one another, while the Tit-For-Tat players
help each other, and a new network map develops:
For the final tournament I’m sure it will be clearly visible which strategy wins:
• Constant Defection:
§ 20 x 1 = 20 x 2 players = 40
§ 5 + 1 x 19 = 24 x 6 players = 144
• Tit-For-Tat:
§ 20 x 3 = 60 x 5 players = 300
§ 0 + 1 x 19 = 19 x 3 players = 57
Despite Constant Defection having the advantage in the first part of the repeated
tournament, its strategy became its own undoing as more and more players imitated it.
Fig. 10 – Results of the third and final tournament, as only one strategy remains
This would generally be the result regardless of the number of players or rounds and, in most cases,
regardless of which strategies were put in. Constant Defection or any “harsh” strategy will take out the more “naive”
strategies, but when up against those that can retaliate while “encouraging” cooperation, it will be
removed over time. In the IPD, most winning strategies employ a “Tit-For-Tat” style proving
Evaluation of Research
In making the two types of tournaments, I have made a number of assumptions based on
what I have learned. First, although cooperation between people eventually leads to greater
success, being able to adapt against defection is crucial to avoid being used. This is seen in that the
player that uses Constant Cooperation receives the lowest points in both tournament styles.
Another assumption made is to always start by cooperating. This rule is prominent in Tit-For-Tat
as well as Grudger, because starting with defection would, when up against itself, lead to a
back-and-forth cycle of
defection. In a real-life situation, immediately rebelling against a community would lead to doubt
and eventually weaken the integrity of the entire group. Lastly, not everyone is predictable.
The simulations I made were based on the assumption that each strategy sticks to its respective
rule, which is not the case in real life. People make mistakes and may unintentionally make a
wrong decision that, when others do not know it was an accident, may lead to a spiral of defection.
be connected to real life, I would say that defection is more valuable only in a limited
number of circumstances. If there is only “one round”, then defection is profitable,
as there is no chance of revenge in the future. Defection is also more valuable when you are certain
that the “opponent” is gullible and easily exploited, as the opposition
would be too stubborn to change his or her strategy. Defection can also be beneficial when you know
who you are up against, but only with the right timing. Due to these very specific
circumstances, defection is hardly valued overall, as it does not fit easily with real-life
situations and would only present you as selfish, changing the views of other people towards
you.
On the other hand, adaptable cooperation, and not total cooperation, proves to be more
valuable in a large number of circumstances, especially in real life. This allows the more
cooperative ideas and people to flourish while slowing down the progress of ideas built on
defection. When communication is added, proving oneself to be cooperative would cement a
stronger bond when mutual cooperation is achieved. Overall, cooperation is invaluable, as proven by
Works Cited
Arora, Sanjeev. "Lecture 19: Equilibria and algorithms." Advanced Algorithm Design,
https://axelrod.readthedocs.io/en/stable/reference/description.html
---. "Welcome to the documentation for the Axelrod Python library." Apr. 2016, Reading.
https://axelrod.readthedocs.io/en/stable/index.html
Brook, Thomas. Computing the Mixed Strategy Nash Equilibria for Zero-Sum Games.
www.cs.bath.ac.uk/~mdv/courses/CM30082/projects.bho/2006-7/Brook-T-dissertation-
2006-07.pdf.
Cohen, Samuel N., and Victor Fedyashov. "Nash equilibria for nonzero-sum ergodic stochastic
differential games." Journal of Applied Probability, vol. 54, no. 04, 2017, pp. 977-
994, cambridge.org.
Davis, Morton D., and Steven J. Brams. "Game Theory | Mathematics." Encyclopedia
www.encyclopediaofmath.org/index.php/Differential_games.
Duffy, Jenny. "Game Theory and Nash Equilibrium." 2015, Lakehead University. Thunder Bay,
Presentation.
Ihavenotv.com, ihavenotv.com/the-joy-of-winning.
"The Iterated Prisoner's Dilemma and The Evolution of Cooperation." YouTube, This Place,
resource/easy-iterated-prisoners-dilemma/.
Johnson, Noel D., and Alexandra A. Mislin. "Trust games: A meta-analysis." Journal of
plato.stanford.edu/entries/prisoner-dilemma/.
www.lesswrong.com/posts/hamma4XgeNrsvAJv5/prisoner-s-dilemma-tournament-
results.
Feedback." SIAM Journal on Control and Optimization, vol. 43, no. 4, 2004, pp. 1222-
1233.
Professional Development,
www.learner.org/courses/mathilluminated/units/9/textbook/04.php.
Mathieu, Philippe, and Jean-Paul Delahaye. "New Winning Strategies for the Iterated Prisoner's
Dilemma." Journal of Artificial Societies and Social Simulation, vol. 20, no. 4, 2017.
McDonough, Michele. "Non-Zero-Sum Games Vs. Zero Sum Games: Examples and
management/61459-comparing-zero-sum-and-non-zero-sum-games/.
www.investopedia.com/articles/financial-theory/08/game-theory-basics.asp.
Moore, Doug. "This Adorable Game Explains the Math Behind Interpersonal Trust." Free
www.clearerthinking.org/single-post/2017/08/14/This-adorable-game-explains-the-
math-behind-interpersonal-trust.
equilibrium.asp.
Conference. 2004,
strategies-decisionmaking.asp.
dilemma.asp.
www.ams.org/publicoutreach/feature-column/fcarc-rationality.
Solan, Eilon, and Eran Shmaya. "Two-player nonzero-sum stopping games in discrete
time." The Annals of Probability, vol. 32, no. 3B, 2004, pp. 2733-2764.
"Sophisticated IPD Strategies Beat Simple Ones." Vince Knight, 28 July 2017,
vknight.org/unpeudemath/math/2017/07/28/sophisticated-ipd-strategies-beat-simple-
ones.html.
Tesfatsion, Leigh. "Game Theory: Basic Concepts and Terminology." 24 Oct. 2017,
www2.econ.iastate.edu/tesfatsi/GameDef.pdf.
"Two Person Games (Setting up the Pay-off Matrix)." University of Notre Dame,
www3.nd.edu/~apilking/Math10120/Lectures/Topic%2025.pdf.