
Game Theory Week #9

Outline:

Repeated prisoner's dilemma


Simultaneous moves & infinite horizons
Discounted average payoffs
Best responses & NE
Feasible payoffs
Minimax payoff
Nash folk theorems
Repeated Prisoner's Dilemma

      C      D
 C   2,2    0,3
 D   3,0    1,1
Suppose game is repeated N times
Define: Payoff of trajectory to be the sum of stage payoffs
Example (N = 3): Action sequence {(C, C), (C, D), (D, D)} results in payoffs of

(2 + 0 + 1, 2 + 3 + 1) = (3, 6)

for Players 1 & 2, respectively.
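As a quick check of the example above, here is a minimal Python sketch; the `PAYOFFS` dictionary encoding of the matrix and the helper name are my own, not from the notes:

```python
# Stage payoffs of the PD above, keyed by (Player 1 action, Player 2 action).
PAYOFFS = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}

def trajectory_payoffs(path):
    """Payoff of a trajectory: the sum of stage payoffs, per player."""
    u1 = sum(PAYOFFS[a][0] for a in path)
    u2 = sum(PAYOFFS[a][1] for a in path)
    return (u1, u2)

# The N = 3 example from the text:
print(trajectory_payoffs([("C", "C"), ("C", "D"), ("D", "D")]))  # (3, 6)
```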


This looks like an extensive form game, except with simultaneous moves
Redefine:
(Sub)history, h: Sequence of joint moves
Player function, P (h): Set of players whose turn follows subhistory h
As before, a strategy for player i is an action selection for every subhistory after which it is
that player's turn.
Example (N = 3): A strategy for either player must specify an action in each of the
following subhistories:

∅ (the empty history), {(C, C)} , {(C, D)} , {(D, C)} , {(D, D)} ,


{(C, C), (C, C)} , {(C, C), (C, D)} , {(C, C), (D, C)} , ..., {(D, D), (D, D)}

for a total of
1 + 4 + 16 = 21
subhistories.
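The count above can be sketched in a few lines of Python (the function name is my own):

```python
# Joint action profiles of the stage game.
JOINT_ACTIONS = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

def count_subhistories(N):
    # A strategy must specify an action after every subhistory of
    # length 0, 1, ..., N - 1; there are 4^t subhistories of length t.
    return sum(len(JOINT_ACTIONS) ** t for t in range(N))

print(count_subhistories(3))  # 21, matching 1 + 4 + 16
```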

Subgame perfect equilibria and backwards induction

Can extend same concepts to simultaneous moves:


Subgame perfect equilibrium: Strategy profile that is a NE for all subgames
Backwards induction: Same as before, except... there need not be a single optimal action
at subhistory h. Rather, one might need to specify a one-shot NE.
Example: Repeated BoS
What is a subgame perfect equilibrium in the repeated Prisoner's Dilemma?
(final) Stage N: (D, D), regardless of history.
Stage N − 1: No future consequences, so (D, D), regardless of history.
...
Stage 1: No future consequences, so (D, D).


How can cooperation ever be justified?

Infinite horizon and discounting

The concept of extensive form games generalizes to infinite horizons:


A terminal history is an infinite sequence of actions, e.g.,

{(C, C), (C, C), (C, C), (D, C), (D, D), (D, D), ...}

Must now specify preferences over infinite terminal histories


Discounted averages:
Stages: t = 1, 2, 3, ...
Notation:
a(t): the joint profile at stage t
ui (a(t)): the stage game payoff
Ui (a()): the repeated game payoff
For δ < 1, define the discounted average payoff to Player i as

Ui(a(·)) = (1 − δ) [ ui(a(1)) + δ ui(a(2)) + δ^2 ui(a(3)) + ... ]

         = (1 − δ) Σ_{t=1}^∞ δ^(t−1) ui(a(t))

The (1 − δ) factor is for normalization. Recall

Σ_{t=1}^∞ δ^(t−1) = 1 / (1 − δ)

Aside: An alternative (and equivalent) interpretation of discounting is that the game ends after
every turn with probability 1 − δ.
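A small numerical sketch of the definition; the helper (my own) assumes the stream's last entry repeats forever, which is convenient for the examples that follow:

```python
def discounted_average(stream, delta):
    """Discounted average (1 - delta) * sum of delta^(t-1) * u(t), where the
    stream's last entry is assumed to repeat forever (a convenience here)."""
    head = (1 - delta) * sum(delta**t * u for t, u in enumerate(stream))
    tail = delta ** len(stream) * stream[-1]
    return head + tail

# A constant stream of 2's has discounted average 2 for any delta < 1,
# which is what the (1 - delta) normalization buys us:
print(round(discounted_average([2], 0.9), 10))  # 2.0
```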

Infinite horizon strategies, best response, and NE

Must now specify strategies for infinite set of subhistories


Repeated Prisoner's Dilemma examples:
Always cooperate: ai (t) = C
Always defect: ai (t) = D
Grim Trigger: Cooperate until opponent defects.

ai(t) = C, if a−i(τ) ≠ D for all τ ≤ t − 1;
        D, otherwise

Tit-for-Tat: Start with C, then copy the opponent's last move:
ai(1) = C,  ai(t) = a−i(t − 1) for t ≥ 2
Limited Punishment:
Play C unless...
If opponent ever plays D, then play D for k turns, and revert to C afterwards
As before, a strategy profile (s1 , s2 ) is a NE if each strategy is a best response to the
other.
Simple example: Best response to always C is always D.
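The strategies above can be simulated directly. This sketch is mine: each strategy is encoded as a function of the list of the opponent's past moves, a simplification of the full subhistory definition:

```python
def always_C(h):
    return "C"

def always_D(h):
    return "D"

def grim_trigger(h):
    # Cooperate until the opponent has ever defected.
    return "D" if "D" in h else "C"

def tit_for_tat(h):
    # Copy the opponent's last move; start with C.
    return h[-1] if h else "C"

def play(s1, s2, T):
    """First T joint actions when strategy s1 faces s2."""
    h1, h2 = [], []  # opponent moves observed by Players 1 and 2
    path = []
    for _ in range(T):
        a1, a2 = s1(h1), s2(h2)
        path.append((a1, a2))
        h1.append(a2)
        h2.append(a1)
    return path

print(play(always_D, grim_trigger, 3))  # [('D', 'C'), ('D', 'D'), ('D', 'D')]
```

Grim Trigger cooperates once, then punishes forever; Tit-for-Tat against itself cooperates at every stage.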

Best response to Grim Trigger

Assume Player 2 uses Grim Trigger.


If Player 1 uses Grim Trigger, resulting terminal history is

{(C, C), (C, C), (C, C), ...}

and resulting discounted average payoff is 2


If Player 1 uses any strategy that results in D at period T, the resulting payoff stream is:

{2, 2, ..., 2, 3, u1(a(T + 1)), u1(a(T + 2)), u1(a(T + 3)), ...}

where
u1(a(τ)) ≤ 1, for all τ ≥ T + 1
since Player 2 (playing Grim Trigger) plays D forever after; therefore, Player 1's best continuation is

a1(τ) = D, for all τ ≥ T + 1

Compare before & after:

{2, 2, ..., 2, 2, 2, 2, 2, ...}


vs
{2, 2, ..., 2, 3, 1, 1, 1, ...}

Fact: We can analyze difference as though deviation occurred at T = 1 (why?)


Inspect the discounted average of the deviation stream {3, 1, 1, ...}:

(1 − δ)(3 + δ + δ^2 + ...) = (1 − δ)(3 + δ/(1 − δ)) = 3 − 2δ

Cooperating forever yields 2, so deviating is unprofitable when 3 − 2δ ≤ 2, i.e., δ ≥ 1/2.
Conclusion: Grim Trigger is a best response to Grim Trigger for δ ≥ 1/2
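The δ = 1/2 threshold can be verified numerically by truncating the deviation series (function name and truncation length are my own):

```python
def deviation_value(delta, terms=5000):
    # (1 - delta)(3 + delta + delta^2 + ...), truncated at `terms` terms;
    # the closed form is 3 - 2*delta.
    return (1 - delta) * (3 + sum(delta**t for t in range(1, terms)))

for delta in (0.4, 0.5, 0.6):
    # Deviating beats the cooperation payoff of 2 only when delta < 1/2.
    print(delta, deviation_value(delta) > 2)
```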

Best response to Limited Punishment

Assume punishment for k periods


Inspect discounted payoffs over the window of defect & punishment (k + 1 periods)
Again, can assume the window occurs at T = 1:

(1 − δ)(2 + 2δ + 2δ^2 + ... + 2δ^k)

vs

(1 − δ)(3 + δ + δ^2 + ... + δ^k)

or

2(1 − δ^(k+1)) vs 2(1 − δ) + (1 − δ^(k+1))

Limited Punishment is a best response to Limited Punishment if
2(1 − δ^(k+1)) ≥ 2(1 − δ) + (1 − δ^(k+1)):

For k = 2: Require δ ≥ 0.62
For k = 3: Require δ ≥ 0.55
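The thresholds can be recovered by bisecting the gain from deviating, which simplifies to 1 − 2δ + δ^(k+1) (the helper names are mine):

```python
def deviation_gain(delta, k):
    # dev - stay = [2(1 - delta) + (1 - delta^(k+1))] - 2(1 - delta^(k+1))
    #            = 1 - 2*delta + delta^(k+1)
    return 1 - 2 * delta + delta ** (k + 1)

def threshold(k, tol=1e-9):
    # Bisect for the delta where deviating stops being profitable.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if deviation_gain(mid, k) > 0:  # still profitable to deviate
            lo = mid
        else:
            hi = mid
    return hi

print(round(threshold(2), 3))  # 0.618
print(round(threshold(3), 3))  # 0.544
```

For k = 2 the exact root is the golden-ratio conjugate (√5 − 1)/2 ≈ 0.618; longer punishments allow smaller discount factors, approaching the Grim Trigger threshold of 1/2 as k → ∞.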

Best response to Tit-for-Tat

Inspect payoff stream over window of {D, D, ..., C}


{3, 0}
{3, 1, 0}
{3, 1, 1, 0}
...
{3, 1, 1, ...} (infinite)
End of window results in same state as start of window, so if defecting is profitable, then
one should repeat.
In each case, must compare payoff to

{2, 2, ..., 2}

including effects of discounting


In each case, δ ≥ 1/2 is the threshold.
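A quick check that every deviation window ties the cooperation window exactly at δ = 1/2 (the comparison code is my own sketch):

```python
def window_value(stream, delta):
    # Plain discounted sum over a finite window; no (1 - delta) factor is
    # needed since we compare windows of equal length.
    return sum(delta**t * u for t, u in enumerate(stream))

delta = 0.5
for n in range(1, 5):
    deviate = [3] + [1] * (n - 1) + [0]   # defect n times, then return to C
    stay = [2] * (n + 1)
    print(n, window_value(deviate, delta) - window_value(stay, delta))
```

At δ = 1/2 the printed differences are all zero: every deviation window is exactly payoff-neutral, which is why 1/2 is the threshold for each case.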

Feasible discounted average payoffs

We have seen that a discounted average payoff of (2, 2) can be supported by a NE
(for suitable δ)
A discounted average payoff of (1, 1) can also be supported by a NE (both players always
defect)
Conclusion: The set of NE payoffs of the repeated game with discounting is richer than that of the
one-shot game
Next questions:
What is the complete set of repeated game discounted average payoffs?
What is the complete set of NE payoffs?

Feasible payoffs, cont

Detour: What is the discounted average of a periodic sequence, e.g., {a, b, c, a, b, c, ...}?

(1 − δ)(a + δb + δ^2 c + δ^3 a + δ^4 b + δ^5 c + ...)
  = (1 − δ)(a(1 + δ^3 + δ^6 + ...) + δb(1 + δ^3 + δ^6 + ...) + δ^2 c(1 + δ^3 + δ^6 + ...))
  = (1 − δ)/(1 − δ^3) · (a + δb + δ^2 c)

Fact:
lim_{δ→1} (1 − δ)/(1 − δ^3) = 1/3

Therefore,
lim_{δ→1} (1 − δ)/(1 − δ^3) · (a + δb + δ^2 c) = (a + b + c)/3
Result generalizes to periodic sequences of any period
Fact: The set of feasible repeated game payoffs (as δ → 1) is the set of weighted averages of the
one-shot game payoffs
Repeated prisoner's dilemma: Take weighted averages of

{(0, 3), (3, 0), (1, 1), (2, 2)}
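The period-3 formula above can be checked numerically; using exact `Fraction` arithmetic (the function name is my own), the discounted average visibly approaches the plain average as δ → 1:

```python
from fractions import Fraction

def periodic_discounted_average(cycle, delta):
    # (1 - delta)/(1 - delta^n) * (c0 + delta*c1 + ... + delta^(n-1)*c_{n-1})
    n = len(cycle)
    one_cycle = sum(delta**t * c for t, c in enumerate(cycle))
    return (1 - delta) / (1 - delta**n) * one_cycle

# The discounted average of {2, 0, 3, 2, 0, 3, ...} approaches (2+0+3)/3 = 5/3:
for delta in (Fraction(9, 10), Fraction(99, 100), Fraction(999, 1000)):
    print(float(periodic_discounted_average([2, 0, 3], delta)))
```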

[Figure: feasible payoff set, the convex hull of (0,3), (2,2), (1,1), (3,0), plotted with Player 1 payoff on the horizontal axis and Player 2 payoff on the vertical axis]

Lower bound on achievable payoff

Repeated prisoner's dilemma: A player can guarantee a payoff of at least 1 by defecting
every day.
Implication: The payoff to each player in a NE of the repeated game must be at least 1.
This restricts the set of rational feasible payoffs in the repeated game:

[Figure: feasible payoff set, restricted to payoffs of at least 1 for each player]

Greedy strategy

Notation:
ai(t) = si(h(t)): The action of player i at time t, ai(t), is a function of the repeated
game strategy, si(·), evaluated at the history at time t, h(t).
bi(a−i): the stage game best response function
Bi(s−i): the repeated game best response function
Minimax payoff:
Suppose the opponents commit to strategy s−i.
What is the guaranteed payoff of si = Bi(s−i)?
Lower bound: Define the greedy response, Gi(s−i):

ai(t) = bi(s−i(h(t)))

i.e., at stage t, play the stage game best response to a−i(t)

Ui(Bi(s−i), s−i) ≥ Ui(Gi(s−i), s−i)

i.e., the repeated game payoff using a repeated game best response is at least the
repeated game payoff using a day-by-day greedy response.
Why aren't these equal?!

Minimax payoff

Define minimax payoff:

ℓi = min_{a−i} max_{ai} ui(ai, a−i)

Interpretation: If player i plays a greedy best response, the payoff is at least ℓi

This is not the same (over pure strategies) as the security strategy payoff:

vi = max_{ai} min_{a−i} ui(ai, a−i)

From the previous discussion,

Ui(Bi(s−i), s−i) ≥ Ui(Gi(s−i), s−i) ≥ ℓi

Recap: Discounted average payoffs at a NE of a repeated game


Must be feasible
Must be at least the minimax payoffs
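For the prisoner's dilemma, both the minimax and the security (maximin) payoffs can be computed by brute force over pure actions; here they happen to coincide at 1, though they need not be equal in general (the variable names are my own):

```python
ACTIONS = ("C", "D")
# Player 1's stage payoffs from the PD matrix.
U1 = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

# Minimax: the opponent first commits to the action that minimizes
# Player 1's best-response payoff.
minimax = min(max(U1[(a1, a2)] for a1 in ACTIONS) for a2 in ACTIONS)
# Maximin (security payoff): Player 1 commits first, opponent minimizes.
maximin = max(min(U1[(a1, a2)] for a2 in ACTIONS) for a1 in ACTIONS)
print(minimax, maximin)  # 1 1
```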

Nash folk theorem

Nash folk theorem: Let (w1, w2) be payoff levels that are i) feasible and ii) at least the
minimax levels (ℓ1, ℓ2). Then for δ sufficiently close to 1, there exists a NE with payoffs
(w1, w2).
This is the converse of the statement on the previous slide.
Statement holds for more than 2 players.
Idea of proof:
Both players agree on an action path that leads to (w1 , w2 ).
Apply grim trigger if any player deviates from agreed upon path.
Best response to grim trigger is to stick to path
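To make the proof idea concrete, a target payoff in the feasible set can be approached by a periodic action path; the cycle and helper below are my own illustration, and both limiting payoffs are above the minimax level of 1, so the theorem applies:

```python
from fractions import Fraction

PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def limiting_average(cycle):
    # Average stage payoff over one cycle; equals the discounted
    # average of the infinite periodic path as delta -> 1.
    n = len(cycle)
    return (Fraction(sum(PAYOFF[a][0] for a in cycle), n),
            Fraction(sum(PAYOFF[a][1] for a in cycle), n))

# A 3-period cycle: (C, C) twice, (D, C) once.
path = [("C", "C"), ("C", "C"), ("D", "C")]
print(limiting_average(path))  # (Fraction(7, 3), Fraction(4, 3))
```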

Subgame perfect?

Interpretation of subgame perfect: Tail of strategy is a best response to tail of strategy


for any subhistory...even impossible subhistories
GT vs GT forms a Nash equilibrium that supports cooperation. Is it subgame perfect?
(Or is GT a credible threat?)
Consider subhistory of:
{(C, C), (C, C), (C, D), ...}
Player 1 will play D
Player 2 will continue to play GT
Player 1 is not playing a best response to Player 2
New approach: Double Grim Trigger (DGT)
Start off playing C and continue to play C until...
Either player plays D (including oneself), then play D always
Fact:
DGT is a best response to DGT for δ sufficiently large
DGT is subgame perfect
What is response to previous subhistory example?
Fact: Nash folk theorem holds for subgame perfect equilibria

