Non-Zero Sum Game Theory On Cooperation and Defection

Research Question: “How can designing an iterative tournament of the Prisoner’s Dilemma to
determine the most effective strategy teach us about the value of cooperation and defection?”
IB Extended Essay – Mathematics
gzz786
Essay Word Count: 3949
Exam Session: May 2019

Table of Contents:
Introduction
Axelrod’s Tournament
What is Game Theory?
Idea of Rationality and Maximization
The Nash Equilibrium
Simple Strategies in IPD
Creating a Tournament
Repeated Tournaments
Evaluation of Research
The Value of Cooperation and Defection
Works Cited
Introduction:
The research question of this Mathematics Extended Essay is “How can designing an
iterative tournament of the Prisoner’s Dilemma to determine the most effective strategy
teach us about the value of cooperation and defection?”, on the topic of the application of
Game Theory. I chose the context of the applicability of mathematics to both real and abstract
problems, as this would help me understand decision making in a mathematical sense: whether it
is more beneficial to cooperate for a mutual benefit or to opt for a solo victory.
The problem I wanted to investigate came from Axelrod’s tournament around 1980,
in which Robert Axelrod, a professor of Political Science, invited strategists from around the
world to compete in his Iterated Prisoner’s Dilemma (Axelrod Project Developers).
To summarize the game, the Prisoner’s Dilemma is a hypothetical scenario presented in
Game Theory that teaches about cooperation and defection (Duignan). There are multiple
versions of the game, but the original scenario is as follows: two criminals are caught and
separately interrogated, and no communication between them is allowed. Both are given a
choice: to confess or to remain silent, with the outcome depending on both of their choices.
Should both remain silent, their sentences are reduced. If both confess, both are given a long
sentence. If one confesses and the other remains silent, the confessor goes free and the one
who remained silent receives a much longer sentence (Kuhn). Here is the summary of the
payoffs:
Fig. 1 - This table is an example of the outcomes for each combination of the players’ choices. The goal is to
have as few points as possible, as points correspond to years spent in prison ("Prisoner's Dilemma").
Looking at the table above, both players’ choices affect the overall outcome, and one
player’s choice alone does not determine the result. In analyzing the payoffs in Fig. 1, a few
observations can be made:
● In any situation, a player is better off and earns fewer points by confessing
● The highest total number of points (worst overall outcome) is reached when both
players confess
● The lowest total number of points (best overall outcome) is reached when both players
remain silent
● The best individual outcome is when you confess and your opponent decides to
remain silent
● The worst individual outcome is when you choose to remain silent but your
opponent confesses
Axelrod’s Tournament
Axelrod’s Tournament popularized the Iterated Prisoner’s Dilemma (IPD), in which the
same players play the game repeatedly. Robert Axelrod asked Game Theory experts to submit
computer programs, called agents, that would earn the highest possible payoff when played in a
round-robin tournament against one another 200 times each (Axelrod project developers). The
main consequence of the large number of rounds is that these “agents” can learn from each
other, gaining “experience” that lets them adapt in future rounds. Another difference from the
original version is the terminology: a prisoner’s choice to remain silent or confess becomes
the more general choice to cooperate or defect, which is more applicable to real life than the
original:
Fig. 2 - Axelrod’s tournament was set up so that mutual cooperation would yield 3 points, mutual defection 1
point, exploiting a cooperator would yield 5 points and being exploited would receive 0. The goal of this game
is to have the most points by the end of the tournament (Moore).
For the sake of convenience and ease of understanding, I will refer to the programs in the
Iterated Prisoner's Dilemma as “players”. To shorten the terminology, I will occasionally write
“C” for cooperation and “D” for defection.
Looking at Axelrod’s iterated version, I calculated the most noticeable outcomes in a
pair’s 200 rounds:
• The most points a player could receive:
o All-D exploiting All-C = 5 x 200 = 1000
• The least points a player could receive:
o All-C being exploited by All-D = 0 x 200 = 0
• The result of a constant mutual cooperation:
o All-C against All-C = 3 x 200 = 600
• The result of a constant mutual defection:
o All-D against All-D = 1 x 200 = 200
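These four values can be reproduced directly from the payoffs in Fig. 2. The snippet below is a minimal sketch; PAYOFF, ROUNDS and constant_pair_score are hypothetical names chosen for illustration, not part of any library.

```python
# Payoffs from Fig. 2, keyed by (my move, opponent's move).
# "C" = cooperate, "D" = defect. All names here are illustrative.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ROUNDS = 200


def constant_pair_score(my_move, their_move, rounds=ROUNDS):
    """Score of a player who always plays my_move against an
    opponent who always plays their_move."""
    return PAYOFF[(my_move, their_move)] * rounds


results = {
    "All-D exploiting All-C": constant_pair_score("D", "C"),
    "All-C exploited by All-D": constant_pair_score("C", "D"),
    "All-C against All-C": constant_pair_score("C", "C"),
    "All-D against All-D": constant_pair_score("D", "D"),
}

for label, score in results.items():
    print(f"{label}: {score}")
```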
To investigate this game further, some knowledge of game theory is needed.
What is Game Theory?

Game theory is the study of how interacting choices between players lead to outcomes
with respect to the utility, or preferred outcome, of those players. An important concept in
Game theory is the idea of rationality and the assumption of maximization, in which each
player makes the choice that leads to their own maximum payoff in the game.
To explain the game in a more mathematical sense, the Prisoner’s Dilemma is an
example of a two-player non-zero-sum game, in which the gains of one player are not equally
offset by the losses of the other. In other words, both players can, but do not always, win or
lose simultaneously. A major difference between a zero-sum and a non-zero-sum game is that
non-zero-sum games are not strictly competitive and allow different degrees of cooperation
between players. In a zero-sum game one player’s gain is the other’s loss, so there is a definite
winner and a definite loser, which can, but not always, be avoided in a non-zero-sum game.
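One way to see this with the payoffs from Fig. 2 is to add up both players' points for every combination of choices. The sketch below does that; PAYOFF and joint_payoff are hypothetical names chosen for illustration. If the game were zero-sum (or, more generally, constant-sum), every combination would give the same total.

```python
# Payoffs from Fig. 2, keyed by (my move, opponent's move).
# "C" = cooperate, "D" = defect. PAYOFF and joint_payoff are illustrative names.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def joint_payoff(move_a, move_b):
    """Total points both players receive in one round."""
    return PAYOFF[(move_a, move_b)] + PAYOFF[(move_b, move_a)]


# Combined score for each of the four possible outcomes.
totals = {(a, b): joint_payoff(a, b) for a in "CD" for b in "CD"}

# A constant-sum game would have a single distinct total.
is_constant_sum = len(set(totals.values())) == 1

for outcome, total in totals.items():
    print(outcome, total)
print("Constant-sum game:", is_constant_sum)
```

The totals are 6 for mutual cooperation, 5 when one player exploits the other, and 2 for mutual defection, so the combined result depends on how the players choose.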
Going back to the Prisoner’s Dilemma, cooperation is present throughout the game even
though communication is not allowed, as the outcome is decided by the combined choices of
both players. The highest degree of cooperation is reached when both players decide to remain
silent, as this gives the best combined outcome for both players.
If a player chooses to cooperate, two different outcomes are possible:
● The other player cooperates, leading to a gain for both players. This is considered the
socially optimal solution, as it maximizes the players’ total payoff
● The other player defects, leading to a gain for only one player. This is called a locally
optimal outcome for the other player, since it maximizes their personal utility
Since Game Theory and the Axelrod Tournament are based upon the idea of maximization,
players are likely to defect when they are certain that their opponent will cooperate.
It would seem that cooperating is a terrible decision and that defecting always leads to a
better local scenario for the player, but that is where another concept in game theory comes in:
Idea of Rationality and Maximization

In game theory, rationality is one of the main concepts used to determine the limits of a
player’s actions. It is based on the idea that players only choose the option that benefits
them the most, thus maximizing their gain. In the Prisoner’s Dilemma, the rational choice is
to defect regardless of the opponent’s choice, which makes the choice of cooperation
irrational. However, when both players use this idea of rationality to maximize their gain,
the result is a paradoxical situation in which rational play leads to a poorer outcome
than irrational play ("Rationality and Game Theory"). This is further explained by another
concept.
The Nash Equilibrium

A further concept that applies to non-zero-sum games such as the Prisoner’s Dilemma is the
Nash equilibrium, a situation in which no player can benefit by changing their current strategy
while the other players keep theirs unchanged (Duffy).
In a game such as the Iterated Prisoner's Dilemma, a Nash equilibrium is achieved when
both players repeatedly decide to defect. This is the case because it would be unwise for a
player to change his or her decision to cooperation and be exploited. Going back to the research
question, this outcome, mutual defection, has the lowest overall value of all the combinations
of cooperation and defection, as both players try to “stab each other in the back”. This
undermines the idea of rationality: each player, hoping to gain from the other by thinking
rationally and defecting, instead ends up worse off than if both had been “irrational” and
cooperated.
Recognizing this Nash equilibrium discourages players from defecting all the time. This is
shown by the Iterated Prisoner’s Dilemma, as consistently defecting against each other leads to
a very low outcome compared to constant cooperation (Darity). An important thing I noticed,
however, is that the socially optimal solution, when both players cooperate, is not a Nash
equilibrium, as either player would benefit more from defecting and exploiting the other
player.
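The equilibrium claim can be checked for a single round with the payoffs from Fig. 2. The following is a minimal sketch, not a general solver: it simply tests, for each pair of moves, whether either player could score more by switching alone. The names PAYOFF, MOVES and is_nash are hypothetical, introduced only for this example.

```python
from itertools import product

# Payoffs from Fig. 2, keyed by (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MOVES = ("C", "D")


def is_nash(move_a, move_b):
    """True if neither player gains by changing only their own move."""
    a_is_best = all(PAYOFF[(move_a, move_b)] >= PAYOFF[(alt, move_b)] for alt in MOVES)
    b_is_best = all(PAYOFF[(move_b, move_a)] >= PAYOFF[(alt, move_a)] for alt in MOVES)
    return a_is_best and b_is_best


equilibria = [(a, b) for a, b in product(MOVES, repeat=2) if is_nash(a, b)]
print(equilibria)  # [('D', 'D')]
```

Only mutual defection passes the test. Mutual cooperation fails because a player who switches to defecting moves from 3 points to 5.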
Simple Strategies in Iterated Prisoner’s Dilemma

In the Axelrod Tournament, a few main and common strategies were found in
the entries submitted by strategists (Axelrod project developers)(Moore):
1. Constant Cooperation
a. Easily exploited by defecting, but rewards cooperation
2. Constant Defection
a. Gains points no matter what, but receives less through mutual defection
3. “Tit-For-Tat”
a. Starts by cooperating, then copies the last action taken by the opposing player
4. “Grudger” (Spiteful)
a. Starts by cooperating, then permanently switches to defecting if the other player
defects
Constant cooperation and constant defection are easy to understand, as they do not
change their choice under any circumstance. “Constant Cooperation” risks being exploited in
exchange for the chance of mutual cooperation, the highest overall outcome. “Constant
Defection”, on the other hand, is the safer strategy, as it is impossible for it to gain nothing
in a round: it gains either one or five points. The other two strategies might seem a bit
confusing, so the tables below show what they do against a random test strategy I made:
Round #        1  2  3  4  5  6  7
Random         D  D  C  C  D  C  D
“Tit-For-Tat”  C  D  D  C  C  D  C

Round #        1  2  3  4  5  6  7
Random         C  D  C  C  C  D  C
“Grudger”      C  C  D  D  D  D  D
Fig. 3 – Choices made by “Tit-For-Tat” and “Grudger” against a Random Strategy
As shown by the first table, “Tit-For-Tat” always cooperates in the first round and then
replicates the opponent’s previous action, so from round 2 onward each of its choices matches
the Random choice one round earlier. “Grudger” needs only one defection from the opponent
(round 2 in the second table) before permanently defecting.
While making the tables above, I noticed the basic idea behind each strategy.
“Tit-For-Tat” tries to balance out the overall score by reciprocating the opponent’s last action.
The “Grudger” strategy does intend to cooperate but heavily punishes those who defect even
once.
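The two rules can also be written as short functions. The sketch below is one possible way to express them, assuming each strategy is given the list of the opponent's earlier moves; the function names and the helper respond_to are hypothetical. It replays the Random sequences from Fig. 3.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def grudger(opponent_history):
    """Cooperate until the opponent has defected once, then always defect."""
    return "D" if "D" in opponent_history else "C"


def respond_to(strategy, opponent_moves):
    """Moves the strategy makes against a fixed sequence of opponent moves."""
    seen = []       # opponent moves revealed so far
    responses = []
    for opponent_move in opponent_moves:
        responses.append(strategy(seen))  # choose before seeing this round's move
        seen.append(opponent_move)
    return responses


print(" ".join(respond_to(tit_for_tat, "DDCCDCD")))  # C D D C C D C
print(" ".join(respond_to(grudger, "CDCCCDC")))      # C C D D D D D
```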
Creating a Tournament
The Iterated Prisoner’s Dilemma is not only a small one-versus-one competition like
the original version, but a large tournament. This means that each player competes against
every other player for a set number of rounds (Jensen). Here is a table I made to show the
results of two hundred rounds, as in Axelrod’s Iterated Prisoner’s Dilemma:
Fig. 4 – Table of the results of 200 rounds in an Iterated Prisoner’s Dilemma, with the IPD grid beside as a basis
in finding the values
The table above includes the score of two hundred rounds between each pair of players.
For now I have not included the interaction of a player with its own strategy, shown as black
squares, as I wanted a simple tournament where each player goes against everyone but itself.
I have placed the table for the IPD beside it for reference. Here is how I calculated the
numbers for the tournament:
• Constant Cooperation vs Constant Defection:
o This would be similar to the exploitation scenario shown in explaining the IPD.
§ Constant C = 0 x 200 = 0
§ Constant D = 5 x 200 = 1000
• Constant Cooperation vs Tit-For-Tat AND Constant Cooperation vs Grudger AND
Tit-For-Tat vs Grudger:
o All three of these result in constant mutual cooperation, as no player ever defects
first, so nothing triggers retaliation from Tit-For-Tat or Grudger.
§ Constant C = 3 x 200 = 600
§ Tit-For-Tat = 3 x 200 = 600
§ Grudger = 3 x 200 = 600
• Constant Defection vs Tit-For-Tat AND Constant Defection vs Grudger:
o These two have the same results because both Tit-For-Tat and Grudger cooperate
in the first round, but Constant Defection then pushes them into mutual defection
for the rest of the rounds. This means that round 1 is always (D,C) in favor of
Constant Defection and the remaining 199 rounds are (D,D)
Round #                1  2  3  4  5  6  7
Cons. Def.             D  D  D  D  D  D  D
Grudger / Tit-For-Tat  C  D  D  D  D  D  D
Fig. 5 – Table showing how Constant Defection would go against either Tit-For-Tat or Grudger
§ Constant D = 5 + (199 x 1) = 204
§ Tit-For-Tat = 0 + (199 x 1) = 199
§ Grudger = 0 + (199 x 1) = 199
After calculating the score for each set of rounds for each player, I then calculated the total
score each player received throughout the small tournament:
• Constant Cooperation
o 0 + 600 + 600 = 1200
• Constant Defection
o 1000 + 204 + 204 = 1408
• Tit-For-Tat
o 600 + 199 + 600 = 1399
• Grudger
o 600 + 199 + 600 = 1399
As seen from the results above, Constant Defection has the highest overall score. This
would imply that in a situation where everyone’s strategy is unique yet predictable, the one
that exploits the most players and rounds ends up victorious. I would say that this is the case
because it obtained 1000 points against Constant Cooperation, two-thirds more than the
second-highest score in a set, 600. Since Constant Cooperation never retaliates,
Constant Defection is able to exploit it for all 200 rounds, unlike Tit-For-Tat and
Grudger, which are only exploited in the first round. However, I believe that a different result
will be obtained when players use their strategy against itself, as Constant Defection would be
at a massive disadvantage. To remove this total uniqueness of each strategy, I decided to
calculate the result when a strategy goes against itself and add it to the total:
• Constant Defect vs Constant Defect:
o Similar to the All-D vs All-D scenario in the Iterated Prisoner’s Dilemma
§ 1 x 200 = 200
• Constant Cooperation, Tit-For-Tat & Grudger against themselves:
o Similarly, since neither player ever defects first, mutual
cooperation is constant throughout the 200 rounds
§ 3 x 200 = 600
I decided to add this because most of the time players don’t have a unique strategy and may
end up going against someone with a similar plan. This is also applicable in real life, as
people interact with others who behave in either a different or a similar way (Moore). In the
first version of the tournament, defecting was easily the best choice, but going up against
itself leads to a different story. If I include these scores in the tournament, the results are as
follows:
• Constant Cooperation:
o 1200 + 600 = 1800
• Constant Defection:
o 1408 + 200 = 1608
• Tit-For-Tat AND Grudger (having similar results):
o 1399 + 600 = 1999
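The hand calculations for this tournament can be cross-checked with a short simulation. The following is a sketch under the rules described above (200 rounds per set, payoffs from Fig. 2), with an option to include a set against a copy of the same strategy. All function and variable names are hypothetical and not taken from any library.

```python
from itertools import combinations

# Payoffs from Fig. 2, keyed by (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Each strategy sees only the opponent's earlier moves.
STRATEGIES = {
    "Constant Cooperation": lambda opp: "C",
    "Constant Defection": lambda opp: "D",
    "Tit-For-Tat": lambda opp: opp[-1] if opp else "C",
    "Grudger": lambda opp: "D" if "D" in opp else "C",
}


def play_set(strategy_a, strategy_b, rounds=200):
    """Play one set of rounds and return both players' scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
    return score_a, score_b


def tournament(include_self_play, rounds=200):
    """Round-robin totals, optionally adding one set against an identical strategy."""
    totals = dict.fromkeys(STRATEGIES, 0)
    for name_a, name_b in combinations(STRATEGIES, 2):
        score_a, score_b = play_set(STRATEGIES[name_a], STRATEGIES[name_b], rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    if include_self_play:
        for name, strategy in STRATEGIES.items():
            totals[name] += play_set(strategy, strategy, rounds)[0]
    return totals


print(tournament(include_self_play=False))
print(tournament(include_self_play=True))
```

With self-play included, Constant Defection adds only 200 points to its 1408, while each of the other three strategies adds 600.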
Looking at these results, the highest scorer in the previous tournament, Constant
Defection, became the lowest in this version, as it was not able to exploit itself. On the other
hand, Tit-For-Tat and Grudger benefit from playing themselves through mutual cooperation, but
are not as naïve when up against defection. This shows how effective it is to learn from the
previous round and decide whether to change one’s decision in future rounds, while sticking to
a selfish strategy is detrimental.
Since in real life many people have similar levels of cooperation and defection, I decided
to create a different tournament in which multiple people share the same strategy and play
against everyone, both people with different strategies and people with their own.
Repeated Tournaments
In the first tournament, players battle each other player only once, so trust is built
only within a single set, and a player never needs to go up against that particular player again
for the rest of the tournament. Here is how the repeated tournament goes (Case):
Fig. 6 – diagram of how a repeated tournament would be played
1. Each strategy is used by 3 players, who play a tournament as usual, going against
everyone, including players with the same strategy.
2. The bottom 3 are eliminated. If there is a tie, then randomly select between them.
3. The top 3 are “cloned”. If there is a tie, then randomly select between them.
4. Repeat until only one strategy remains or a continuous tie occurs.
This is done based on the idea that people tend not to replicate “losing” behaviors and
tend to imitate “successful” behaviors (Case). For the sake of ease, I limited the number of
rounds per set to 20. I decided to remove the Grudger strategy in this tournament as it would act
similarly to Tit-For-Tat.
Here are the calculations I needed for a network map I intend to make to represent this
repeated tournament:
• Constant Cooperation:
o When against Constant Cooperation:
§ (20 x 3) x 2 players = 60 x 2 = 120
o When against Tit-For-Tat:
§ (20 x 3) x 3 players = 60 x 3 = 180
o When against Constant Defection:
§ (20 x 0) x 3 players = 0 x 3 = 0
o Total Score = 120 + 180 + 0 = 300
• Constant Defection:
o When against Constant Defection:
§ (20 x 1) x 2 players = 20 x 2 = 40
o When against Constant Cooperation:
§ (20 x 5) x 3 players = 100 x 3 = 300
o When against Tit-For-Tat:
§ (5 + 1 x 19) x 3 players = 24 x 3 = 72
o Total Score = 40 + 300 + 72 = 412
• Tit-For-Tat:
o When against Tit-For-Tat:
§ (20 x 3) x 2 players = 60 x 2 = 120
o When against Constant Cooperation:
§ (20 x 3) x 3 players = 60 x 3 = 180
o When against Constant Defection:
§ (0 + 1 x 19) x 3 players = 19 x 3 = 57
o Total Score = 120 + 180 + 57 = 357
Fig. 7 – Network graph of the first tournament with the total scores for each strategy
The results show that Constant Defection still wins in this style of tournament.
Since the bottom three players all use Constant Cooperation (crossed out in the next
figure), they are replaced by clones of the top scorers. This leaves a 6 vs 3 split in favor of
Constant Defection, as shown by the image below:
Fig. 8 – Results of the first tournament. The winners to be cloned are represented with crowns while the
losers are crossed out
Now that Constant Cooperation has been eliminated, the Constant Defection players
have no one left to exploit in the second tournament:
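• Constant Defection:
o When against Constant Defection:
§ (20 x 1) x 5 players = 20 x 5 = 100
o When against Tit-For-Tat:
§ (5 + 1 x 19) x 3 players = 24 x 3 = 72
o Total Score = 100 + 72 = 172
• Tit-For-Tat:
o When against Tit-For-Tat:
§ (20 x 3) x 2 players = 60 x 2 = 120
o When against Constant Defection:
§ (0 + 1 x 19) x 6 players = 19 x 6 = 114
o Total Score = 120 + 114 = 234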
This time, the Constant Defection players end up hurting one another, while the Tit-For-Tat
players have been helpful to one another, and a new network map has developed:
Fig. 9 – Results of the second tournament
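After the second tournament, three Constant Defection players are eliminated and the three
Tit-For-Tat players are cloned, leaving 3 Constant Defection players against 6 Tit-For-Tat
players. For the final tournament it is clearly visible which strategy wins:
• Constant Defection:
o When against Constant Defection:
§ (20 x 1) x 2 players = 20 x 2 = 40
o When against Tit-For-Tat:
§ (5 + 1 x 19) x 6 players = 24 x 6 = 144
o Total Score = 40 + 144 = 184
• Tit-For-Tat:
o When against Tit-For-Tat:
§ (20 x 3) x 5 players = 60 x 5 = 300
o When against Constant Defection:
§ (0 + 1 x 19) x 3 players = 19 x 3 = 57
o Total Score = 300 + 57 = 357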
Despite Constant Defection having the advantage in the first part of the repeated
tournament, the strategy became its own undoing as more and more players imitated it.
This leaves Tit-For-Tat as the last strategy standing:
Fig. 10 – Results of the third and final tournament, as only one strategy remains
A similar result tends to appear when the number of players, rounds or, in most cases,
strategies is changed. Constant Defection or any “harsh” strategy will take out the more “naive”
strategies, but when up against those that can retaliate while “encouraging” cooperation, it will
be removed over time. In the IPD, most winning strategies employ a “Tit-For-Tat” style, which
supports this idea.
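The elimination-and-cloning procedure can also be simulated. The sketch below assumes the setup described above (three players per strategy, 20 rounds per set, bottom three eliminated, top three cloned, ties broken at random) and the payoffs from Fig. 2. Every name in it is hypothetical, and the fixed limit on generations stands in for the "continuous tie" stopping rule.

```python
import random

# Payoffs from Fig. 2, keyed by (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Each strategy sees only the opponent's earlier moves.
STRATEGIES = {
    "Constant Cooperation": lambda opp: "C",
    "Constant Defection": lambda opp: "D",
    "Tit-For-Tat": lambda opp: opp[-1] if opp else "C",
}


def play_set(name_a, name_b, rounds=20):
    """Play one set and return both players' scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = STRATEGIES[name_a](hist_b)
        move_b = STRATEGIES[name_b](hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
    return score_a, score_b


def score_population(population):
    """Every player plays every other player once; one total per player."""
    scores = [0] * len(population)
    for i in range(len(population)):
        for j in range(i + 1, len(population)):
            score_i, score_j = play_set(population[i], population[j])
            scores[i] += score_i
            scores[j] += score_j
    return scores


def next_generation(population, scores, rng, swap=3):
    """Eliminate the bottom `swap` players and clone the top `swap`."""
    order = list(range(len(population)))
    rng.shuffle(order)                   # random order first, so ties break randomly
    order.sort(key=lambda i: scores[i])  # stable sort keeps shuffled order within ties
    losers, winners = set(order[:swap]), order[-swap:]
    survivors = [p for i, p in enumerate(population) if i not in losers]
    return survivors + [population[i] for i in winners]


def count(population):
    """Number of players using each strategy."""
    return {name: population.count(name) for name in sorted(set(population))}


def evolve(population, seed=0, max_generations=10):
    """Repeat tournaments until one strategy remains or the limit is reached."""
    rng = random.Random(seed)
    history = [count(population)]
    for _ in range(max_generations):
        if len(set(population)) == 1:
            break
        population = next_generation(population, score_population(population), rng)
        history.append(count(population))
    return history


start = [name for name in STRATEGIES for _ in range(3)]
for generation, counts in enumerate(evolve(start)):
    print(generation, counts)
```

Because the ties in this particular setup only occur between players of the same strategy, the random tie-breaking does not change which strategies gain or lose players.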
Evaluation of Research
In making the two types of tournaments, I have made a number of assumptions based on
what I have learned. First, although cooperation between people eventually leads to greater
success, being able to adapt against defection is crucial to avoid being used. This is seen in the
player that uses Constant Cooperation, who receives the lowest points in the first round-robin
tournament and in the repeated tournament. Another assumption is to always start by
cooperating. This rule is prominent in Tit-For-Tat as well as Grudger, because starting with a
defection when up against itself would lead to a back-and-forth cycle of defection. In a real
life situation, immediately rebelling against a community would lead to doubt and eventually
weaken the integrity of the entire group. Lastly, not everyone is predictable. The simulations I
made were based on the assumption that each strategy sticks to its respective rule, which is not
the case in real life. People make mistakes and may unintentionally make a wrong decision that,
when others do not know it was an accident, may lead to a spiral of defection.
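The effect of a single accident can be illustrated with a small, deterministic sketch. It assumes two Tit-For-Tat players, 20 rounds per set and the payoffs from Fig. 2, and it forces one player's move to “D” in a single round to stand in for a mistake. The slip round and all names are hypothetical choices made for illustration.

```python
# Payoffs from Fig. 2, keyed by (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def play_tft_pair(rounds=20, slip_round=None):
    """Two Tit-For-Tat players; player A defects by mistake in slip_round (1-based)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for round_no in range(1, rounds + 1):
        move_a = tit_for_tat(hist_b)
        move_b = tit_for_tat(hist_a)
        if round_no == slip_round:
            move_a = "D"  # accidental defection, not part of the strategy
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
    return "".join(hist_a), "".join(hist_b), score_a, score_b


print(play_tft_pair())              # no mistake
print(play_tft_pair(slip_round=5))  # one mistake by player A in round 5
```

Without the slip both players score 60. With it, the defection echoes back and forth between the two players and each ends on 52, even though neither strategy changed.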
The Value of Cooperation and Defection

In making this tournament to connect the usefulness of cooperation and defection to
real life, I would say that defection is more valuable only in a limited number of
circumstances. If there is only “one round”, then defection is profitable, as there is no chance
of revenge in the future. Defection is also more valuable when you are certain that the
“opponent” is gullible and easily exploited, as the opposition would be too stubborn to change
his or her strategy. When you know who you are up against, defection can be beneficial, but
only with the right timing. Due to these very specific circumstances, defection is hardly valued
overall, as it does not easily fit real life situations and would only present you as selfish,
changing the views of other people towards you.
On the other hand, adaptable cooperation, rather than total cooperation, proves to be more
valuable in a large number of circumstances, especially in real life. It allows the more
cooperative ideas and people to flourish while slowing down the progress of more defective
ideas. When communication is added, proving oneself to be cooperative cements a
stronger bond once mutual cooperation is achieved. Overall, cooperation is invaluable, as shown
by the most effective strategy in an iterative format of the Prisoner’s Dilemma.
Works Cited
Arora, Sanjeev. "Lecture 19: Equilibria and algorithms." Advanced Algorithm Design,
Princeton University. Princeton, NJ 08544, USA. Lecture.
Axelrod project developers. "Background to Axelrod’s Tournament." Reading.
https://axelrod.readthedocs.io/en/stable/reference/description.html
---. "Welcome to the documentation for the Axelrod Python library." Apr. 2016, Reading.
https://axelrod.readthedocs.io/en/stable/index.html
Brook, Thomas. Computing the Mixed Strategy Nash Equilibria for Zero-Sum Games.
University of Bath, 2007,
www.cs.bath.ac.uk/~mdv/courses/CM30082/projects.bho/2006-7/Brook-T-dissertation-
2006-07.pdf.
Case, Nicky. "The Evolution of Trust." It's Nicky Case!, ncase.me/trust/.
Cohen, Samuel N., and Victor Fedyashov. "Nash equilibria for nonzero-sum ergodic stochastic
differential games." Journal of Applied Probability, vol. 54, no. 04, 2017, pp. 977-
994, cambridge.org.
Darity, William A. "Nash Equilibrium." International Encyclopedia of the Social Sciences,
2nd ed., Thomson/Gale, 2008, pp. 540 - 542.
Davis, Morton D., and Steven J. Brams. "Game Theory | Mathematics." Encyclopedia
Britannica, 1999, www.britannica.com/science/game-theory.
"Differential Games." Encyclopedia of Mathematics, 2012,
www.encyclopediaofmath.org/index.php/Differential_games.
Duffy, Jenny. "Game Theory and Nash Equilibrium." 2015, Lakehead University. Thunder Bay,
Ontario, Canada. Presentation.
Duignan, Brian. "Prisoner's Dilemma | Game Theory." Encyclopedia Britannica, edited
by Darshana Das, 2009, www.britannica.com/topic/prisoners-dilemma.
Frazzoli, Emilio. "Principles of Autonomy and Decision Making - Sequential Games."
Aeronautics and Astronautics, 6 Dec. 2010, Massachusetts Institute of Technology.
Presentation.
Fry, Hannah. "The Joy of Winning." Documentaries - Free Online Documentaries -
Ihavenotv.com, ihavenotv.com/the-joy-of-winning.
"Game Theory." Department of Computer Science, University of Maryland. Presentation.
"The Iterated Prisoner's Dilemma and The Evolution of Cooperation." YouTube, This Place,
2 July 2016, www.youtube.com/watch?v=BOvAbjfJ0x0&frags=pl%2Cwn.
Jensen, Christopher J. "Easy Iterated Prisoner’s Dilemma." Christopher X J. Jensen,
6 Apr. 2016, www.christopherxjjensen.com/research/projects/online-cooperative-
resource/easy-iterated-prisoners-dilemma/.
Johnson, Noel D., and Alexandra A. Mislin. "Trust games: A meta-analysis." Journal of
Economic Psychology, vol. 32, no. 5, 26 May 2011, pp. 865-889.
Karaman, Sertac. "Differential Games." Principles of Autonomy and Decision Making,
8 Dec. 2010, Massachusetts Institute of Technology. Presentation.
Kaznatcheev, Artem. "Short History of Iterated Prisoner’s Dilemma Tournaments." Theory,
Evolution, and Games Group, 3 Mar. 2015, egtheory.wordpress.com/2015/03/02/ipd/.
Kuhn, Steven. "Prisoner's Dilemma (Stanford Encyclopedia of Philosophy)." Stanford
Encyclopedia of Philosophy, edited by Edward N. Zalta, 2017 ed., 1997,
plato.stanford.edu/entries/prisoner-dilemma/.
LaValle, Steven M. "13.5.2 Differential Game Theory." Planning Algorithms / Motion
Planning, Cambridge UP, 20 Apr. 2012, planning.cs.uiuc.edu/node710.html.
Lesswrong. "Prisoner's Dilemma Tournament Results." LessWrong 2.0, 6 Sept. 2011,
www.lesswrong.com/posts/hamma4XgeNrsvAJv5/prisoner-s-dilemma-tournament-
results.
Mannucci, Paola. "Nonzero-Sum Stochastic Differential Games with Discontinuous
Feedback." SIAM Journal on Control and Optimization, vol. 43, no. 4, 2004, pp. 1222-
1233.
"Mathematics Illuminated | Unit 9 | 9.4 Prisoner's Dilemma." Annenberg Learner - Teacher
Professional Development,
www.learner.org/courses/mathilluminated/units/9/textbook/04.php.
Mathieu, Philippe, and Jean-Paul Delahaye. "New Winning Strategies for the Iterated Prisoner's
Dilemma." Journal of Artificial Societies and Social Simulation, vol. 20, no. 4, 2017.
McDonough, Michele. "Non-Zero-Sum Games Vs. Zero Sum Games: Examples and
Definitions." Brighthub Project Management, 2 Jan. 2011, www.brighthubpm.com/risk-
management/61459-comparing-zero-sum-and-non-zero-sum-games/.
McNulty, Daniel. "The Basics Of Game Theory." Investopedia, 25 Mar. 2018,
www.investopedia.com/articles/financial-theory/08/game-theory-basics.asp.
Moore, Doug. "This Adorable Game Explains the Math Behind Interpersonal Trust." Free
Courses for Decision Making And Reasoning - ClearerThinking.org, 14 Aug. 2017,
www.clearerthinking.org/single-post/2017/08/14/This-adorable-game-explains-the-
math-behind-interpersonal-trust.
"Nash Equilibrium." Investopedia, 5 July 2018, www.investopedia.com/terms/n/nash-
equilibrium.asp.
Ozdaglar, Asu. "Computation of NE in finite games." Game Theory with Engineering
Applications, 4 Mar. 2010, MIT. Lecture.
Pantaleão, Luiz, and Helena Azevedo. USE OF A NON-ZERO-SUM GAME AS A TEACHING
TOOL ABOUT ORGANIZATIONAL INDICATORS: OPTIMUM LOCAL X NASH
EQUILIBRIUM. Second World Conference on POM and 15th Annual POM
Conference. 2004,
Picardo, Elvis. "Advanced Game Theory Strategies for Decision-Making." Investopedia,
26 Mar. 2018, www.investopedia.com/articles/investing/111113/advanced-game-theory-
strategies-decisionmaking.asp.
"Prisoner's Dilemma." Investopedia, 5 July 2018, www.investopedia.com/terms/p/prisoners-
dilemma.asp.
"Rationality and Game Theory." American Mathematical Society,
www.ams.org/publicoutreach/feature-column/fcarc-rationality.
Ross, Don. "Game Theory." Stanford Encyclopedia of Philosophy, edited by Edward N.
Zalta, 2018, plato.stanford.edu/entries/game-theory/#Mot.
Solan, Eilon, and Eran Shmaya. "Two-player nonzero-sum stopping games in discrete
time." The Annals of Probability, vol. 32, no. 3B, 2004, pp. 2733-2764.
"Sophisticated IPD Strategies Beat Simple Ones." Vince Knight, 28 July 2017,
vknight.org/unpeudemath/math/2017/07/28/sophisticated-ipd-strategies-beat-simple-
ones.html.
Tesfatsion, Leigh. "Game Theory: Basic Concepts and Terminology." 24 Oct. 2017,
www2.econ.iastate.edu/tesfatsi/GameDef.pdf.
"Two Person Games (Setting up the Pay-off Matrix)." University of Notre Dame,
www3.nd.edu/~apilking/Math10120/Lectures/Topic%2025.pdf.
