The game
The Prisoner’s Dilemma is a game that illustrates how players might prefer
not to cooperate although cooperation would be mutually more beneficial.
It has been extensively investigated by game theorists but has also attracted
interest in other disciplines such as philosophy, psychology, economics and
the social sciences. The game was designed by Merrill Flood and Melvin
Dresher in 1950, whereas the name Prisoner’s Dilemma and its popularization
are attributed to Albert Tucker (see [Kuh09], [HHV04] pp.172,180). Game
theory labels games of this type, where every player makes a single move,
static games. As the actual order of these moves, or decisions, is not
relevant to the outcome, they are also called simultaneous decision games
([Web07] p.61).
In short, Kowalski narrates the following gameplay story: You and your
friend John successfully rob a bank, getting away with 1 million pounds.
When you are stopped by the police for a broken headlight, the money is
discovered in the course of a routine investigation and you are arrested on
suspicion of robbery. The police, lacking witnesses and your confession,
can convict you only for the lesser offense of possessing stolen property,
with a penalty of one year in jail. You are separated and questioned without
the possibility to communicate with each other. If one of you turns witness
while the other refuses, the witness will walk free and the other person
will be sentenced to six years in jail. If both of you turn witness, you will
each be sentenced to 3 years in jail. However, if you both refuse to turn
witness, you will both be sentenced to 1 year in jail. The following table
(taken from [Kow11] p.145) summarizes your options:
Action            State of the world
                  John turns witness       John refuses
I turn witness    I get 3 years in jail    I get 0 years in jail
I refuse          I get 6 years in jail    I get 1 year in jail
The exact story being told, as well as the numbers of years in jail, vary
throughout the literature, but they are all instances of the general form
presented in the Stanford Encyclopedia of Philosophy ([Kuh09]):
            Cooperate   Defect
Cooperate   R,R         S,T
Defect      T,S         P,P
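This general form can be written down directly as a small table of payoff pairs. The following Python sketch is illustrative (the names are mine, not from the text); the concrete numbers instantiate T, R, P, S with the utilities of Kowalski's story, where N years in jail count as −N:

```python
# General form of the Prisoner's Dilemma as pairs (Row payoff, Column payoff).
# Instantiated with Kowalski's story, utility of N years in jail being -N:
# T = 0 (defecting alone), R = -1, P = -3, S = -6 (cooperating alone).
T, R, P, S = 0, -1, -3, -6

payoffs = {
    ("cooperate", "cooperate"): (R, R),
    ("cooperate", "defect"):    (S, T),
    ("defect",    "cooperate"): (T, S),
    ("defect",    "defect"):    (P, P),
}

# The defining inequalities of a Prisoner's Dilemma:
assert T > R > P > S
```

Any assignment of numbers satisfying T > R > P > S yields an instance of the dilemma.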
The two players are called Row and Column, both having the possible
moves cooperate or defect[1]. The table, referred to as decision table
or payoff matrix[2] in the literature, shows the consequences of the
players’ choices in pairs of (Row, Column), where the variables are to be
read Reward for mutual cooperation, Punishment for mutual defection,
Temptation for defecting alone and Sucker for cooperating alone, and have
to satisfy the inequalities T > R > P > S. The values of T, R, P, S are
the results of the utility function, which maps the chosen action
to a numerical value often referred to as payoff. Intuitively, spending less
[1] Note that cooperate corresponds to refusing, defect corresponds to
turning witness, Row corresponds to I and Column corresponds to John.
[2] Such a tabular description of the game is sometimes also called normal
form or strategic form (see [Web07] p.64).
time in jail is of higher utility; thus, following Kowalski’s instance of the
game, the utility of N years in jail can be defined as −N, resulting in the
following utility functions for John and me:
[3] A formal definition of strategy can be found in [Web07] pp.24-27. In the
case of the Prisoner’s Dilemma, with just one move per player, the strategies
correspond to the possible actions, that is turn witness and refuse.
[4] Strict domination is also called strong domination in the literature
(e.g. in [RN03]). Webb provides a formal definition of dominance in
[Web07] p.66.
[5] The name was given in honor of the mathematician John Nash (1928-present)
who proved that every finite game has an equilibrium of this type (see
[RN03] p.634). For a formal definition see [Web07] pp.69-71. Nash’s idea is
extensively discussed in chapter 2 of [HHV04].
[6] Read policies as strategies.
No matter what John does, defection is always the favorable option for you.
Looking at the situation from John’s perspective, defection again strictly
dominates cooperation. The rational choice[7] is the action with the highest
expected utility, that is to turn witness, thereby producing a sub-optimal
result of 3 years in jail for both players, which is Pareto dominated by
the result (−1, −1); that is, mutual cooperation would be the (socially)
efficient strategy. Checking the Nash equilibrium’s criterion for this result,
it becomes apparent that if you change your strategy to cooperation while
John sticks to defection, you get 6 years in jail instead of 3, that is, you are
worse off by switching. The analogous argument holds for John, therefore
mutual defection is indeed a Nash equilibrium of this game[8]. Russell and
Norvig describe this property to the point: “That is the attractive
power of an equilibrium point.” ([RN03] p.634).
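The dominance argument and the equilibrium check can be reproduced mechanically by enumerating all four strategy profiles; a minimal Python sketch (variable and function names are illustrative):

```python
# Find Nash equilibria of the one-shot game by enumerating all strategy
# profiles. Sentences in years, utility of N years in jail being -N.
years = {  # (my_move, johns_move) -> (my_years, johns_years)
    ("witness", "witness"): (3, 3),
    ("witness", "refuse"):  (0, 6),
    ("refuse",  "witness"): (6, 0),
    ("refuse",  "refuse"):  (1, 1),
}
u = {k: (-a, -b) for k, (a, b) in years.items()}
moves = ["witness", "refuse"]

def is_nash(mine, johns):
    # Neither player can gain by unilaterally switching his move.
    best_mine = all(u[(mine, johns)][0] >= u[(m, johns)][0] for m in moves)
    best_johns = all(u[(mine, johns)][1] >= u[(mine, j)][1] for j in moves)
    return best_mine and best_johns

equilibria = [(m, j) for m in moves for j in moves if is_nash(m, j)]
print(equilibria)  # [('witness', 'witness')]
```

The enumeration confirms that mutual defection is the only Nash equilibrium, even though (refuse, refuse) would leave both players better off.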
Kowalski’s solution
Kowalski starts with the observation that a player cannot control the
actions of others, but he can at least estimate a likelihood for their possible
choices. He then suggests incorporating this estimate into the computation
of the overall expected utility of an action, by weighting the utility of each
possible outcome of the action by its probability and then summing up the
obtained weighted utilities. That is, given the n alternative outcomes of an
action with associated utilities u1, u2, ..., un and the respective probabilities
p1, p2, ..., pn, an overall expected utility p1 u1 + p2 u2 + ... + pn un of that action
is obtained. Sticking to the definition of the utility of getting N years in jail
as −N, and assuming the probability of John turning witness to be P and of
John refusing to be (1 − P), he presents the following decision table ([Kow11] p.150):
Action            State of the world                          Expected utility
                  John turns witness    John refuses with     P × utility1 +
                  with probability P    probability (1 − P)   (1 − P) × utility2
I turn witness    I get 3 years         I get 0 years         −3P
                  with utility1 = −3    with utility2 = 0
I refuse          I get 6 years         I get 1 year          −6P − (1 − P)
                  with utility1 = −6    with utility2 = −1    = −5P − 1
[7] Informally speaking, we can call a player rational if he has a preference
for the results of his actions such that given any two actions he can always
compare them (completeness) and such that he can list all actions in a
preference ordering (transitivity). For a more detailed discussion and formal
definitions of rationality see [Web07] pp.11-17 and [HHV04] pp.7-8.
[8] Investigating all other combinations reveals that there is indeed no other
Nash equilibrium of this game.
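The expected utilities in the table can be verified with a short computation; a Python sketch (the function name is mine):

```python
# Expected utilities for both actions, given an estimate P that John
# turns witness; the utilities follow the table: -3/0 and -6/-1.
def expected_utilities(P):
    eu_witness = P * (-3) + (1 - P) * 0     # = -3P
    eu_refuse = P * (-6) + (1 - P) * (-1)   # = -5P - 1
    return eu_witness, eu_refuse

# Turning witness yields the higher expected utility for every
# admissible probability 0 <= P <= 1:
assert all(expected_utilities(p / 100)[0] > expected_utilities(p / 100)[1]
           for p in range(101))
```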
The expected utility −3P is greater than −5P − 1 if P > −1/2, which is
always the case as probabilities have values 1 ≥ P ≥ 0. Therefore defection
stays the dominant strategy so far. Kowalski argues that the reason for
this is that you did not consider the utility of your choice to John in your
payoff calculations. If you value the time John serves in jail equally to your
own sentence, the payoff table (taken from [Kow11] p.151) looks like this:
Action            State of the world                          Expected utility
                  John turns witness    John refuses with     P × utility1 +
                  with probability P    probability (1 − P)   (1 − P) × utility2
I turn witness    I get 3 years         I get 0 years         −6P − 6(1 − P)
                  John gets 3 years     John gets 6 years     = −6
                  with utility1 = −6    with utility2 = −6
I refuse          I get 6 years         I get 1 year          −6P − 2(1 − P)
                  John gets 0 years     John gets 1 year      = −4P − 2
                  with utility1 = −6    with utility2 = −2
The expected utilities satisfy the inequality −4P − 2 ≥ −6 for all values
of 1 ≥ P ≥ 0, that is, cooperation weakly dominates defection[9]. Assuming
that John shares your beliefs, the game changes entirely. Cooperation is
now the dominant strategy for both of you, therefore (refuse, refuse) is a
Nash equilibrium of the game[10] and, in contrast to previous outcomes, this
result is actually Pareto optimal. However, Kowalski points out that it
might not be realistic that you value a punishment equally for John and
yourself, and gives an example of a scenario where you value sentences for
John only half as severely as for yourself. Your new utility function maps
N years in jail for John to −N/2. The corresponding decision table (taken
from [Kow11] p.151) is:
Action            State of the world                          Expected utility
                  John turns witness    John refuses with     P × utility1 +
                  with probability P    probability (1 − P)   (1 − P) × utility2
I turn witness    I get 3 years         I get 0 years         −4.5P − 3(1 − P)
                  John gets 3 years     John gets 6 years     = −1.5P − 3
                  with utility1 = −4.5  with utility2 = −3
I refuse          I get 6 years         I get 1 year          −6P − 1.5(1 − P)
                  John gets 0 years     John gets 1 year      = −4.5P − 1.5
                  with utility1 = −6    with utility2 = −1.5
Comparing the expected utilities −1.5P − 3 and −4.5P − 1.5 shows that if the
probability of John turning witness is less than 50%, then refusing to turn
witness is the rational choice for you.
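The variants discussed above differ only in the weight you give to John's jail time. The following Python sketch parameterizes that weight (the function and parameter names are mine, not Kowalski's); w = 0 reproduces the purely selfish table, w = 1 the fully sympathetic one, and w = 0.5 the last table:

```python
# Expected utility of an action when John's jail time is weighted by a
# factor w (w = 0: purely selfish, w = 1: John's years count like yours).
def eu(action, P, w):
    years = {  # (my_move, johns_move) -> (my_years, johns_years)
        ("witness", "witness"): (3, 3), ("witness", "refuse"): (0, 6),
        ("refuse",  "witness"): (6, 0), ("refuse",  "refuse"): (1, 1),
    }
    def u(johns_move):
        mine, johns = years[(action, johns_move)]
        return -mine - w * johns
    return P * u("witness") + (1 - P) * u("refuse")

# With w = 0.5, refusing is rational exactly when P < 0.5:
assert eu("refuse", 0.4, 0.5) > eu("witness", 0.4, 0.5)
assert eu("witness", 0.6, 0.5) > eu("refuse", 0.6, 0.5)
```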
The assumptions made so far always led to a symmetrical structure of
the game, as depicted in the general form in the second table. That is
not necessarily always the case. John and you might be facing different
punishments (e.g. if one of you already has a criminal record and the
other does not). Also, John might estimate different probabilities than you
do and consequently come up with different utility values. Such changes
would lead to asymmetry in the game structure. Asymmetry is addressed
in section 2 of [Kuh09]. Further variants of the Prisoner’s Dilemma are
discussed in subsequent sections of [Kuh09].
[11] The picture is taken from the online draft version of the book
(http://www.doc.ic.ac.uk/~rak/papers/newbook.pdf, p.139). The printed
version of the book contains a grayscale version of the image on p.124.
Figure 1: An agent’s observation-thought-decision-action cycle (taken from
[Kow11])
Your lack of knowledge about John’s decision at this point reflects the
situation told in the introductory story. And if you make your decision
based on the consequences you just derived, you end up with the well-known
sub-optimal result. The solution presented in the previous section
can be realized by adding beliefs describing the consequences for John,
your estimates for John’s choice and your valuation of the punishment:
John gets 3 years in jail if I turn witness and john turns witness.
John gets 6 years in jail if I turn witness and john refuses to turn witness.
John gets 0 years in jail if I refuse to turn witness and john turns witness.
John gets 1 year in jail if I refuse to turn witness and john refuses to turn witness.
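As an illustration only (this is not Kowalski's logic-programming syntax), these four beliefs amount to a lookup from the joint choice to John's sentence:

```python
# John's years in jail as a function of (my choice, John's choice),
# mirroring the four belief rules above. Key names are illustrative.
johns_years = {
    ("turn_witness", "turn_witness"): 3,
    ("turn_witness", "refuse"):       6,
    ("refuse",       "turn_witness"): 0,
    ("refuse",       "refuse"):       1,
}
print(johns_years[("refuse", "refuse")])  # 1
```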
If the agent sticks with the principled approach of the solution as presented
above[12] then, at this point, it does not make a fundamental difference whether
additional rules are incorporated or whether a pre-assembled solution, such as a
library function, is used to calculate the formula. That is the case because
all variable subterms of the formula are already represented as beliefs and
can therefore be reevaluated by the agent.
Kowalski argues that such calculations are a normative ideal and that in
practice they are often approximated by heuristics. The exemplary goals
he gives in [Kow11] p.152 prevent an agent from performing an action
if that would harm another person. In the case of the Prisoner’s Dilemma
that intuitively seems to correspond to absolute (or naive) trust or to mere
altruism, and does, without further refinement, effectively prevent defection.
But of course further refinement is possible. Kowalski concludes the
chapter by making an argument for smart choices (e.g. identifying higher-level
goals) as an alternative to the computational methods used above
([Kow11] pp.152-153).
Defection remains the dominant strategy. This observation is often used as
an argument for the necessity of an enforcement agency such as the
government ([HHV04] p.174). Moreover, the authors point out that this trust
issue arises in every imperfectly synchronized economic exchange, such as
purchases over the internet, as well as in settings with imperfect information,
like the acquisition of second-hand goods, where the buyer discovers the
true quality of the purchased good over a period of time after the
transaction. They show how the game can be transformed into the free rider
problem, which essentially extends the illustration of the conflict between
individual and collective rationality to many players. This extension is then
used to address group issues such as public goods, disarmament and
corruption ([HHV04] pp.176-180). Kuhn discusses various approaches to this
extension and their adequacy in section 4 of [Kuh09].
Chapter 5.3 of [HHV04] (pp.180-185) extensively reviews experimental
results on the Prisoner’s Dilemma and the free rider problem, revealing that
people do not always stick to the logic suggested by the game. I would
like to conclude this essay with a few observations they report that I find
notable: In 100 plays under the surveillance of Flood and Dresher, mutual
defection occurred only 14 times, whereas 60 times the result was mutual
cooperation. In subsequent experiments the cooperation rate was between
30% and 70%. Comparable numbers are reported from experiments with
the many-player version. Moreover, the authors present a particular
experiment where the following findings were statistically significant:
The authors also report that the results (3) and (4) have been replicated
in other free rider experiments. After presenting and discussing further
empirical material, the authors conclude the chapter by covering the topic
of cooperation at length ([HHV04] pp.185-209).
References
[HHV04] Shaun P. Hargreaves-Heap and Yanis Varoufakis. Game Theory:
A Critical Introduction (2nd Edition). Routledge, London and
New York, 2004.