Prisoner's dilemma

The prisoner's dilemma constitutes a problem in game theory. It was originally framed by
Merrill Flood and Melvin Dresher working at RAND in 1950. Albert W. Tucker formalized the
game with prison sentence payoffs and gave it the "prisoner's dilemma" name (Poundstone,
1992).

In its classical form, the prisoner's dilemma ("PD") is presented as follows:

Two suspects are arrested by the police. The police have insufficient evidence for a
conviction, and, having separated both prisoners, visit each of them to offer the same
deal. If one testifies (defects from the other) for the prosecution against the other and the
other remains silent (cooperates with the other), the betrayer goes free and the silent
accomplice receives the full 10-year sentence. If both remain silent, both prisoners are
sentenced to only six months in jail for a minor charge. If each betrays the other, each
receives a five-year sentence. Each prisoner must choose to betray the other or to remain
silent. Each one is assured that the other would not know about the betrayal before the
end of the investigation. How should the prisoners act?

If we assume that each player cares only about minimizing his own time in jail, then the
prisoner's dilemma forms a non-zero-sum game in which two players may each cooperate with
or defect from (betray) the other player. In this game, as in all game theory, the only concern of
each individual player (prisoner) is maximizing his own payoff, without any concern for the
other player's payoff. The unique equilibrium for this game is a Pareto-suboptimal solution, that
is, rational choice leads the two players to both play defect, even though each player's individual
reward would be greater if they both played cooperatively.

In the classic form of this game, cooperating is strictly dominated by defecting, so the only
possible equilibrium of the game is for both players to defect. No matter what the other player
does, a player always gains a greater payoff by playing defect. Since in every situation
playing defect is more beneficial than cooperating, all rational players will play defect, other
things being equal.
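
The dominance argument can be checked mechanically. Here is a minimal Python sketch (an
illustration added to the text, using the sentence lengths from the story above; each player
minimizes jail time):

# (my move, other's move) -> my sentence in years.
SENTENCE = {("silent", "silent"): 0.5, ("silent", "betray"): 10,
            ("betray", "silent"): 0,   ("betray", "betray"): 5}

for other in ("silent", "betray"):
    best = min(("silent", "betray"), key=lambda me: SENTENCE[(me, other)])
    print("if the other plays", other, "-> my best reply:", best)
# 'betray' in both cases: betraying strictly dominates staying silent.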

In the iterated prisoner's dilemma, the game is played repeatedly. Thus each player has an
opportunity to punish the other player for previous non-cooperative play. If the number of rounds
is known by both players in advance, economic theory says that the two players should defect
again and again, no matter how many times the game is played: by backward induction, defection
is optimal in the final round, since no punishment can follow, and therefore also in the round
before it, and so on back to the first. Only when the players play an indefinite or random number
of times can cooperation be an equilibrium. In this case, the incentive to defect can be overcome
by the threat of punishment. When the game is infinitely repeated, cooperation may be a subgame
perfect equilibrium, although both players defecting always remains an equilibrium and there are
many other equilibrium outcomes.
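
The punishment mechanism can be illustrated with a short simulation. In the sketch below the
retaliating strategy "tit for tat" and the point values (3, 0, 5, 1) are illustrative
assumptions, not taken from the text; any payoffs with the same ordering give the same logic.

# Payoff points per round, keyed by (my move, other's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(opponent_moves):
    # Cooperate first, then repeat whatever the opponent did last round.
    return opponent_moves[-1] if opponent_moves else "C"

def always_defect(opponent_moves):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)      # each strategy sees the opponent's history
        b = strategy_b(moves_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (30, 30): cooperation is sustained
print(play(tit_for_tat, always_defect))  # (9, 14): one exploitation, then punishment

Tit for tat cooperates until it is betrayed and then mirrors the defection, which is exactly the
kind of punishment of non-cooperative play described above.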

In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching
the formal criteria of the classic or iterative games, for instance, those in which two entities could
gain important benefits from cooperating or suffer from the failure to do so, but find it merely
difficult or expensive, not necessarily impossible, to coordinate their activities to achieve
cooperation.

Strategy for the classical prisoner's dilemma


The classical prisoner's dilemma can be summarized thus:

                            Prisoner B stays silent       Prisoner B betrays

Prisoner A stays silent     Each serves 6 months          Prisoner A: 10 years
                                                          Prisoner B: goes free

Prisoner A betrays          Prisoner A: goes free         Each serves 5 years
                            Prisoner B: 10 years

In this game, regardless of what the opponent chooses, each player always receives a higher
payoff (lesser sentence) by betraying; that is to say that betraying is the strictly dominant
strategy. For instance, Prisoner A can accurately say, "No matter what Prisoner B does, I
personally am better off betraying than staying silent. Therefore, for my own sake, I should
betray." However, if the other player acts similarly, then they both betray and both get a lower
payoff than they would get by staying silent. Rational self-interested decisions result in each
prisoner being worse off than if each chose to lessen the sentence of the accomplice at the cost of
staying a little longer in jail himself (hence the seeming dilemma). In game theory, this
demonstrates very elegantly that in a non-zero-sum game a Nash equilibrium need not be a
Pareto optimum.
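
This last claim can be made concrete. The following Python sketch (an added illustration, using
payoffs equal to the negated sentence lengths from the table above, so that a higher payoff means
less jail time) enumerates the four outcomes, marking the Nash equilibria and the Pareto-dominated
outcomes:

from itertools import product

MOVES = ("silent", "betray")
# (A's move, B's move) -> (A's payoff, B's payoff), as negated years in jail.
PAYOFFS = {("silent", "silent"): (-0.5, -0.5), ("silent", "betray"): (-10, 0),
           ("betray", "silent"): (0, -10),     ("betray", "betray"): (-5, -5)}

def is_nash(a, b):
    # Neither player can gain by deviating unilaterally.
    pa, pb = PAYOFFS[(a, b)]
    return (all(PAYOFFS[(a2, b)][0] <= pa for a2 in MOVES) and
            all(PAYOFFS[(a, b2)][1] <= pb for b2 in MOVES))

def pareto_dominated(outcome):
    # Some other outcome makes no one worse off and someone strictly better off.
    return any(all(x >= y for x, y in zip(PAYOFFS[o], PAYOFFS[outcome])) and
               PAYOFFS[o] != PAYOFFS[outcome]
               for o in PAYOFFS)

for outcome in product(MOVES, MOVES):
    print(outcome, "Nash" if is_nash(*outcome) else "-",
          "Pareto-dominated" if pareto_dominated(outcome) else "-")
# Only (betray, betray) is a Nash equilibrium, and it is also the only outcome
# that is Pareto-dominated (by mutual silence).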

The Prisoner's Dilemma


The prisoner's dilemma is the story of two criminals who have been arrested for a heinous
crime and are being interrogated separately. Each knows that if neither of them talks, the case
against them is weak and they will be convicted and punished for lesser charges. If this happens,
each will get one year in prison. If both confess, each will get 20 years in prison. If only one
confesses and testifies against the other, the one who did not cooperate with the police will get a
life sentence and the one who did cooperate will get parole. The table below illustrates the
structure of payoffs.

                            Prisoner B stays silent       Prisoner B confesses

Prisoner A stays silent     Each serves 1 year            Prisoner A: life sentence
                                                          Prisoner B: parole

Prisoner A confesses        Prisoner A: parole            Each serves 20 years
                            Prisoner B: life sentence
Given this set of payoffs, there is a strong tendency for each to confess, which you can see by
considering the choices and payoffs of either one. If prisoner A remains silent, prisoner B is
better off confessing (because parole is better than a year in jail). However, B is also better off
confessing if A confesses (because 20 years is better than life). Hence, B will tend to confess
regardless of what A does; and by an identical argument, A will also tend to confess.
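
The sentences can be encoded numerically to check this argument. The exact numbers below are
assumptions (life is simply encoded as a very long sentence); only their ordering matters:
parole < 1 year < 20 years < life.

YEARS = {"parole": 0, "1 year": 1, "20 years": 20, "life": 99}

# (A's move, B's move) -> (A's sentence, B's sentence)
OUTCOME = {("silent", "silent"):   ("1 year", "1 year"),
           ("silent", "confess"):  ("life", "parole"),
           ("confess", "silent"):  ("parole", "life"),
           ("confess", "confess"): ("20 years", "20 years")}

for b in ("silent", "confess"):
    best = min(("silent", "confess"), key=lambda a: YEARS[OUTCOME[(a, b)][0]])
    print("if B plays", b, "-> A's best reply:", best)
# 'confess' in both cases; by symmetry the same holds for B.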

The prisoner's dilemma is a case in which actions determined by self-interest are not in the
group's interest (where the group is defined to include only the criminals, not the larger society).
It is a story that would have pleased Thomas Hobbes, a political theorist of the mid-17th century.
Hobbes is a grandparent of economics because he introduced into intellectual discussion two
assumptions: first, that the individual is the starting point of social analysis, and, second, that
people are motivated by self-interest. He believed that the unrestrained pursuit of self-interest
would result in chaos and that government, with its power to coerce people, was necessary to
bring order out of chaos.

The Prisoners' Dilemma


Cooperation is usually analysed in game theory by means of a non-zero-sum game called the "Prisoner's
Dilemma" (Axelrod, 1984). The two players in the game can choose between two moves, either
"cooperate" or "defect". The idea is that each player gains when both cooperate, but if only one of them
cooperates, the other one, who defects, will gain more. If both defect, both lose (or gain very little) but
not as much as the "cheated" cooperator whose cooperation is not returned. The whole game situation
and its different outcomes can be summarized by table 1, where hypothetical "points" are given as an
example of how the differences in result might be quantified.
Action of A \ Action of B      Cooperate              Defect

Cooperate                      Fairly good [+5]       Bad [-10]

Defect                         Good [+10]             Mediocre [0]

Table 1: outcomes for actor A (in words, and in hypothetical "points") depending on the
combination of A's action and B's action, in the "prisoner's dilemma" game situation. A similar
scheme applies to the outcomes for B.
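
In conventional game-theory notation (not used in the text above) these four values are called T
(temptation), R (reward), P (punishment) and S (sucker's payoff). A short sketch of table 1 as
data, with the defining inequality of the dilemma checked:

T, R, P, S = 10, 5, 0, -10
PAYOFF_A = {("C", "C"): R, ("C", "D"): S,   # A's payoff for (A's move, B's move)
            ("D", "C"): T, ("D", "D"): P}
# One-sided defection beats mutual cooperation, which beats mutual defection,
# which beats being the cheated cooperator.
assert T > R > P > S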

The game got its name from the following hypothetical situation: imagine two criminals arrested
on suspicion of having committed a crime together. However, the police do not have
sufficient proof to have them convicted. The two prisoners are isolated from each other,
and the police visit each of them and offer a deal: the one who offers evidence against the other
will be freed. If neither of them accepts the offer, they are in fact cooperating against the
police, and both of them will get only a small punishment because of the lack of proof. They both
gain. However, if one of them betrays the other by confessing to the police, the defector will
gain more, since he is freed; the one who remained silent, on the other hand, will receive the full
punishment, since he did not help the police and his accomplice's confession now provides
sufficient proof. If both betray, both will be punished, but less severely than if they had refused
to talk. The dilemma resides in the fact that each prisoner has a choice between only two options,
but cannot make a good decision without knowing what the other one will do.

Such a distribution of losses and gains seems natural for many situations, since the cooperator
whose action is not returned will lose resources to the defector, without either of them being able
to collect the additional gain coming from the "synergy" of their cooperation. For simplicity we
might consider the Prisoner's Dilemma as zero-sum insofar as there is no mutual cooperation:
either each gets 0 when both defect, or, when only one of them cooperates, the defector gets +10 and
the cooperator -10, for a total of 0. On the other hand, if both cooperate, the resulting synergy creates
an additional gain that makes the sum positive: each of them gets 5, for a total of 10.
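
Continuing the small sketch above, these sums can be checked directly; by symmetry, B's payoff is
A's payoff with the moves swapped:

for a, b in [("D", "D"), ("C", "D"), ("C", "C")]:
    print((a, b), "sum of payoffs:", PAYOFF_A[(a, b)] + PAYOFF_A[(b, a)])
# (D, D) -> 0, (C, D) -> 0, (C, C) -> 10: zero-sum except under mutual cooperation.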

The gain for mutual cooperation (5) in the prisoner's dilemma is kept smaller than the gain for
one-sided defection (10), so that there would always be a "temptation" to defect. This assumption
is not generally valid. For example, it is easy to imagine that two wolves together would be able
to kill an animal that is more than twice as large as the largest one each of them might have
killed on his own. Even if an altruistic wolf killed a rabbit and gave it to another wolf, and
the other wolf did nothing in return, the selfish wolf would still have less to eat than if he
had helped his companion kill a deer. Yet we will assume that the synergistic effect is smaller
than the gains made by defection (i.e. letting someone help you without doing anything in
return).

This is realistic if we take into account the fact that the synergy usually only reaches its full power
after a long-term process of mutual cooperation (hunting a deer is quite a time-consuming and
complicated business). The prisoner's dilemma is meant to study short-term decision-making
where the actors do not have any specific expectations about future interactions or collaborations
(as is the case in the original situation of the jailed criminals). This is the normal situation during
blind-variation-and-selective-retention evolution. Long-term cooperation can only evolve after
short-term cooperation has been selected: evolution is cumulative, adding small improvements upon
small improvements, but without blindly making major jumps.

The problem with the prisoner's dilemma is that if both decision-makers were purely rational,
they would never cooperate. Indeed, rational decision-making means making the decision
that is best for you whatever the other actor chooses. Suppose the other one defects: then
it is rational to defect yourself, since you won't gain anything, but if you cooperate you will be
stuck with a -10 loss. Suppose the other one cooperates: then you will gain either way, but
you will gain more by defecting, so here too the rational choice is to defect. The
problem is that if both actors are rational, both will decide to defect, and neither of them will gain
anything. However, if both "irrationally" decided to cooperate, both would gain 5 points.
This seeming paradox can be formulated more explicitly through the principle of
suboptimization.

Pareto efficiency
Pareto efficiency, or Pareto optimality, is an important concept in economics with broad
applications in game theory, engineering and the social sciences. The term is named after
Vilfredo Pareto, an Italian economist who used the concept in his studies of economic efficiency
and income distribution. Informally, Pareto efficient situations are those in which any change to
make any person better off would make someone else worse off.

Given a set of alternative allocations of, say, goods or income for a set of individuals, a change
from one allocation to another that can make at least one individual better off without making
any other individual worse off is called a Pareto improvement. An allocation is defined as
Pareto efficient or Pareto optimal when no further Pareto improvements can be made. Such an
allocation is often called a strong Pareto optimum (SPO) by way of setting it apart from mere
"weak Pareto optima" as defined below.

Formally, a (strong/weak) Pareto optimum is a maximal element for the partial order relation of
Pareto improvement/strict Pareto improvement: it is an allocation such that no other allocation is
"better" in the sense of the order relation.

A common criticism of a state of Pareto efficiency is that it does not necessarily result in a
socially desirable distribution of resources, as it makes no statement about equality or the overall
well-being of a society.

The problem of suboptimization


Optimizing the outcome for a subsystem will in general not optimize the outcome for
the system as a whole. This intrinsic difficulty may degenerate into the "tragedy of the
commons": the exhaustion of shared resources because of competition between the
subsystems.

When you try to optimize the global outcome for a system consisting of distinct subsystems (e.g.
maximizing the amount of prey hunted for a pack of wolves, or minimizing the total punishment for the
system consisting of the two prisoners in the Prisoners' Dilemma game), you might try to do this by
optimizing the result for each of the subsystems separately. This is called "suboptimization". The
principle of suboptimization states that suboptimization in general does not lead to global optimization.
Indeed, the optimization for each of the wolves separately is to let the others do the hunting, and then
come to eat from their captures. Yet if all wolves acted like that, no prey would ever be captured
and all wolves would starve. Similarly, the suboptimization for each of the prisoners separately is to
betray the other one, but this leads to both of them being punished rather severely, whereas they might
have escaped with a mild punishment if they had both stayed silent.
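
The contrast can be stated in a few lines of code. This sketch (an added illustration, reusing the
sentence lengths of the classic story) computes each prisoner's individually optimal move and the
globally optimal pair of moves:

MOVES = ("silent", "betray")
# (A's move, B's move) -> (A's years in jail, B's years in jail); lower is better.
SENT = {("silent", "silent"): (0.5, 0.5), ("silent", "betray"): (10, 0),
        ("betray", "silent"): (0, 10),    ("betray", "betray"): (5, 5)}

# Suboptimization: the move that is best for a prisoner whatever the other does.
for me in MOVES:
    if all(SENT[(me, o)][0] <= SENT[(alt, o)][0] for o in MOVES for alt in MOVES):
        print("dominant move for each prisoner:", me)          # betray

# Global optimization: the pair of moves minimizing the total sentence.
print("global optimum:", min(SENT, key=lambda pair: sum(SENT[pair])))
# -> ('silent', 'silent'): 1 year in total, versus 10 years under (betray, betray).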

The principle of suboptimization can be derived from the more basic systemic principle stating
that "the whole is more than the sum of its parts". If the system (e.g. the wolf pack) were a
simple sum or "aggregate" of its parts, then the outcome for the system as a whole (total prey
killed) would be the sum of the outcomes for the parts (prey killed by each wolf separately), but
that is clearly not the case when there is interaction (and in particular cooperation) between the
parts. Indeed, a pack of wolves together can kill animals (e.g. a moose or a deer) that are too big
to be killed by any wolf on its own. Another way of expressing this aspect of "non-linearity" is
to say that the interaction the different wolves are engaged in is a non-zero-sum game: the
sum of resources that can be gained is not constant, and depends on the specific interactions
between the wolves.

As a last example, suppose you want to buy a new car, and you have the choice between a
normal model and a model with a catalyser (catalytic converter), which strongly reduces the
poisonous substances in the exhaust. The model with the catalyser is definitely more expensive,
but the advantage for you is minimal, since the pollution from your exhaust is diffused in the air
and you yourself will never be able to distinguish any effect on your health of the pollution
coming from your own car. Rational or optimizing decision-making on your part would lead you
to buy the car without a catalyser. However, if everybody made that choice, the total amount of
pollution produced would have a very serious effect on everybody's health, including your own,
certainly one worth the relatively small investment of buying a catalyser. The suboptimizing
decision (no catalyser) is inconsistent with the globally optimizing one (everybody buys a
catalyser). The reason is that there is interaction between the different subsystems (owners and
their cars), since everybody inhales the pollutants produced by everybody else. Hence, there is
also an interaction between the decision problems of each of the subsystems, and the combination
of the optimal decisions for each of the subproblems will differ from the optimal decision for the
global problem.
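
A toy calculation makes this structure explicit; all numbers below are made up for illustration:

n = 1_000_000        # people, each of whom drives one car
c = 300.0            # extra cost of the catalyser
h = 5_000_000.0      # total health damage caused by one car without a catalyser

# My benefit from fitting a catalyser is only my share of my own car's damage:
my_benefit = h / n                   # 5.0: far below c, so suboptimizing says skip it
# If everybody skips, n cars cause n * h damage shared over n people:
harm_per_person = n * h / n          # 5,000,000.0: far above c
print(my_benefit < c, harm_per_person > c)   # True True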

The problem of suboptimization underlies most of the problems appearing in evolutionary ethics.
Indeed, ethics tries to achieve the "greatest good for the greatest number", but the greatest good
(optimal outcome) for an individual is in general different from the greatest good for a system
(e.g. society) of individuals.

Another, more dramatic implication of the problem of suboptimization is what Garrett Hardin
has called the "tragedy of the commons". The example is simple: imagine a group of shepherds
who let their animals graze on a common pasture. Each animal that is added brings additional
profit to its shepherd. However, it also diminishes the overall profit of the group, since the
grass eaten by that animal will no longer be available to the other animals. Yet the loss of profit
for the owner because of reduced grass will always be smaller than his gain from the
additional animal. Thus, for each individual shepherd, the optimal decision is to increase his
herd. For the system consisting of all shepherds together, however, this strategy will result in
overgrazing of the pasture and the eventual exhaustion of the common resource.
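
The same toy-model treatment works here (the numbers are again made up): the owner pockets the
full profit of an extra animal but bears only a fraction of the grazing loss, which is shared by
the whole group:

k = 10          # shepherds sharing the pasture
p = 100.0       # profit an extra animal brings its owner
d = 200.0       # value of the grass that the extra animal destroys

owner_net = p - d / k    # 100 - 20 = +80: each shepherd keeps adding animals
group_net = p - d        # 100 - 200 = -100: each addition impoverishes the group
print(owner_net > 0, group_net < 0)   # True True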
