
Game Theory Week #9

Outline:

Repeated prisoner's dilemma


Simultaneous moves & infinite horizons
Discounted average payoffs
Best responses & NE
Feasible payoffs
Minimax payoff
Nash folk theorems
Repeated Prisoner's Dilemma

      C      D
 C   2,2    0,3
 D   3,0    1,1
Suppose game is repeated N times
Define: Payoff of trajectory to be the sum of stage payoffs
Example (N = 3): Action sequence {(C, C), (C, D), (D, D)} results in payoffs of

(2 + 0 + 1, 2 + 3 + 1) = (3, 6)

for Players 1 & 2, respectively.
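As a quick check of the example above, here is a minimal Python sketch; the `PAYOFFS` dictionary encoding of the matrix and the helper name are my own, not from the notes:

```python
# Stage payoffs of the PD above, keyed by (Player 1 action, Player 2 action).
PAYOFFS = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}

def trajectory_payoffs(path):
    """Payoff of a trajectory: the sum of stage payoffs, per player."""
    u1 = sum(PAYOFFS[a][0] for a in path)
    u2 = sum(PAYOFFS[a][1] for a in path)
    return (u1, u2)

# The N = 3 example from the text:
print(trajectory_payoffs([("C", "C"), ("C", "D"), ("D", "D")]))  # (3, 6)
```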


This looks like an extensive form game, except with simultaneous moves
Redefine:
(Sub)history, h: Sequence of joint moves
Player function, P (h): Set of players whose turn follows subhistory h
As before, a strategy for player i is an action selection for every subhistory after which it is
that player's turn.
Example (N = 3): A strategy for either player must specify an action in each of the
following subhistories:

∅ (the empty history), {(C, C)} , {(C, D)} , {(D, C)} , {(D, D)} ,


{(C, C), (C, C)} , {(C, C), (C, D)} , {(C, C), (D, C)} , ..., {(D, D), (D, D)}

for a total of
1 + 4 + 16 = 21
subhistories.
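The count above can be sketched in a few lines of Python (the function name is my own):

```python
# Joint action profiles of the stage game.
JOINT_ACTIONS = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

def count_subhistories(N):
    # A strategy must specify an action after every subhistory of
    # length 0, 1, ..., N - 1; there are 4^t subhistories of length t.
    return sum(len(JOINT_ACTIONS) ** t for t in range(N))

print(count_subhistories(3))  # 21, matching 1 + 4 + 16
```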

Subgame perfect equilibria and backwards induction

Can extend same concepts to simultaneous moves:


Subgame perfect equilibrium: Strategy profile that is a NE for all subgames
Backwards induction: Same as before, except... there need not be a single optimal action
at subhistory h. Rather, one might need to specify a one-shot NE.
Example: Repeated BoS
What is a subgame perfect equilibrium in the repeated Prisoner's Dilemma?
(final) Stage N: (D, D), regardless of history.
Stage N − 1: No future consequences, so (D, D), regardless of history.
...
Stage 1: No future consequences, so (D, D).


How can cooperation ever be justified?

Infinite horizon and discounting

The concept of extensive form games generalizes to infinite horizons:


A terminal history is an infinite sequence of actions, e.g.,

{(C, C), (C, C), (C, C), (D, C), (D, D), (D, D), ...}

Must now specify preferences over infinite terminal histories


Discounted averages:
Stages: t = 1, 2, 3, ...
Notation:
a(t): the joint profile at stage t
ui (a(t)): the stage game payoff
Ui (a()): the repeated game payoff
For δ < 1, define the discounted average payoff to Player i as

Ui(a(·)) = (1 − δ) [ ui(a(1)) + δ ui(a(2)) + δ^2 ui(a(3)) + ... ]

         = (1 − δ) Σ_{t=1}^∞ δ^(t−1) ui(a(t))

The (1 − δ) factor is for normalization. Recall

Σ_{t=1}^∞ δ^(t−1) = 1 / (1 − δ)

Aside: An alternative (and equivalent) interpretation of discounting is that the game ends after
every turn with probability 1 − δ.
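A small numerical sketch of the definition; the helper (my own) assumes the stream's last entry repeats forever, which is convenient for the examples that follow:

```python
def discounted_average(stream, delta):
    """Discounted average (1 - delta) * sum of delta^(t-1) * u(t), where the
    stream's last entry is assumed to repeat forever (a convenience here)."""
    head = (1 - delta) * sum(delta**t * u for t, u in enumerate(stream))
    tail = delta ** len(stream) * stream[-1]
    return head + tail

# A constant stream of 2's has discounted average 2 for any delta < 1,
# which is what the (1 - delta) normalization buys us:
print(round(discounted_average([2], 0.9), 10))  # 2.0
```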

Infinite horizon strategies, best response, and NE

Must now specify strategies for infinite set of subhistories


Repeated Prisoner's Dilemma examples:
Always cooperate: ai (t) = C
Always defect: ai (t) = D
Grim Trigger: Cooperate until opponent defects.

ai(t) = C, if a−i(τ) ≠ D for all τ ≤ t − 1;
        D, otherwise

Tit-for-Tat: Start with C, then copy the opponent's last move:
ai(1) = C,  ai(t) = a−i(t − 1) for t ≥ 2
Limited Punishment:
Play C unless...
If opponent ever plays D, then play D for k turns, and revert to C afterwards
As before, a strategy profile (s1 , s2 ) is a NE if each strategy is a best response to the
other.
Simple example: Best response to always C is always D.
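The strategies above can be simulated directly. This sketch is mine: each strategy is encoded as a function of the list of the opponent's past moves, a simplification of the full subhistory definition:

```python
def always_C(h):
    return "C"

def always_D(h):
    return "D"

def grim_trigger(h):
    # Cooperate until the opponent has ever defected.
    return "D" if "D" in h else "C"

def tit_for_tat(h):
    # Copy the opponent's last move; start with C.
    return h[-1] if h else "C"

def play(s1, s2, T):
    """First T joint actions when strategy s1 faces s2."""
    h1, h2 = [], []  # opponent moves observed by Players 1 and 2
    path = []
    for _ in range(T):
        a1, a2 = s1(h1), s2(h2)
        path.append((a1, a2))
        h1.append(a2)
        h2.append(a1)
    return path

print(play(always_D, grim_trigger, 3))  # [('D', 'C'), ('D', 'D'), ('D', 'D')]
```

Grim Trigger cooperates once, then punishes forever; Tit-for-Tat against itself cooperates at every stage.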

Best response to Grim Trigger

Assume Player 2 uses Grim Trigger.


If Player 1 uses Grim Trigger, resulting terminal history is

{(C, C), (C, C), (C, C), ...}

and resulting discounted average payoff is 2


If Player 1 uses any strategy that results in D at period T, the resulting payoff stream is:

{2, 2, ..., 2, 3, u1(a(T + 1)), u1(a(T + 2)), u1(a(T + 3)), ...}

where
u1(a(τ)) ≤ 1, for all τ ≥ T + 1
since Player 2 (playing Grim Trigger) plays D forever after; therefore, Player 1's best continuation is

a1(τ) = D, for all τ ≥ T + 1

Compare before & after:

{2, 2, ..., 2, 2, 2, 2, 2, ...}


vs
{2, 2, ..., 2, 3, 1, 1, 1, ...}

Fact: We can analyze difference as though deviation occurred at T = 1 (why?)


Inspect the discounted average of the deviation stream {3, 1, 1, ...}:

(1 − δ)(3 + δ + δ^2 + ...) = (1 − δ)(3 + δ/(1 − δ)) = 3 − 2δ

Cooperating forever yields 2, so deviating is unprofitable when 3 − 2δ ≤ 2, i.e., δ ≥ 1/2.
Conclusion: Grim Trigger is a best response to Grim Trigger for δ ≥ 1/2
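The δ = 1/2 threshold can be verified numerically by truncating the deviation series (function name and truncation length are my own):

```python
def deviation_value(delta, terms=5000):
    # (1 - delta)(3 + delta + delta^2 + ...), truncated at `terms` terms;
    # the closed form is 3 - 2*delta.
    return (1 - delta) * (3 + sum(delta**t for t in range(1, terms)))

for delta in (0.4, 0.5, 0.6):
    # Deviating beats the cooperation payoff of 2 only when delta < 1/2.
    print(delta, deviation_value(delta) > 2)
```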

Best response to Limited Punishment

Assume punishment for k periods


Inspect discounted payoffs over the window of defect & punishment (k + 1 periods)
Again, can assume the window occurs at T = 1:

(1 − δ)(2 + 2δ + 2δ^2 + ... + 2δ^k)

vs

(1 − δ)(3 + δ + δ^2 + ... + δ^k)

or

2(1 − δ^(k+1)) vs 2(1 − δ) + (1 − δ^(k+1))

Limited Punishment is a best response to Limited Punishment if
2(1 − δ^(k+1)) ≥ 2(1 − δ) + (1 − δ^(k+1)):

For k = 2: Require δ ≥ 0.62
For k = 3: Require δ ≥ 0.55
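The thresholds can be recovered by bisecting the gain from deviating, which simplifies to 1 − 2δ + δ^(k+1) (the helper names are mine):

```python
def deviation_gain(delta, k):
    # dev - stay = [2(1 - delta) + (1 - delta^(k+1))] - 2(1 - delta^(k+1))
    #            = 1 - 2*delta + delta^(k+1)
    return 1 - 2 * delta + delta ** (k + 1)

def threshold(k, tol=1e-9):
    # Bisect for the delta where deviating stops being profitable.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if deviation_gain(mid, k) > 0:  # still profitable to deviate
            lo = mid
        else:
            hi = mid
    return hi

print(round(threshold(2), 3))  # 0.618
print(round(threshold(3), 3))  # 0.544
```

For k = 2 the exact root is the golden-ratio conjugate (√5 − 1)/2 ≈ 0.618; longer punishments allow smaller discount factors, approaching the Grim Trigger threshold of 1/2 as k → ∞.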

Best response to Tit-for-Tat

Inspect payoff stream over window of {D, D, ..., C}


{3, 0}
{3, 1, 0}
{3, 1, 1, 0}
...
{3, 1, 1, ...} (infinite)
End of window results in same state as start of window, so if defecting is profitable, then
one should repeat.
In each case, must compare payoff to

{2, 2, ..., 2}

including effects of discounting


In each case, δ ≥ 1/2 is the threshold.
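A quick check that every deviation window ties the cooperation window exactly at δ = 1/2 (the comparison code is my own sketch):

```python
def window_value(stream, delta):
    # Plain discounted sum over a finite window; no (1 - delta) factor is
    # needed since we compare windows of equal length.
    return sum(delta**t * u for t, u in enumerate(stream))

delta = 0.5
for n in range(1, 5):
    deviate = [3] + [1] * (n - 1) + [0]   # defect n times, then return to C
    stay = [2] * (n + 1)
    print(n, window_value(deviate, delta) - window_value(stay, delta))
```

At δ = 1/2 the printed differences are all zero: every deviation window is exactly payoff-neutral, which is why 1/2 is the threshold for each case.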

Feasible discounted average payoffs

We have seen that a discounted average payoff of (2, 2) can be supported by a NE
(for suitable δ)
A discounted average payoff of (1, 1) can also be supported by a NE (both players always
defect)
Conclusion: The set of NE payoffs of the repeated game with discounting is richer than that of the
one-shot game
Next questions:
What is the complete set of repeated game discounted average payoffs?
What is the complete set of NE payoffs?

Feasible payoffs, cont

Detour: What is the discounted average of a periodic sequence, e.g., {a, b, c, a, b, c, ...}?

(1 − δ)(a + δb + δ^2 c + δ^3 a + δ^4 b + δ^5 c + ...)
  = (1 − δ)(a(1 + δ^3 + δ^6 + ...) + δb(1 + δ^3 + δ^6 + ...) + δ^2 c(1 + δ^3 + δ^6 + ...))
  = (1 − δ)/(1 − δ^3) · (a + δb + δ^2 c)

Fact:
lim_{δ→1} (1 − δ)/(1 − δ^3) = 1/3

Therefore,
lim_{δ→1} (1 − δ)/(1 − δ^3) · (a + δb + δ^2 c) = (a + b + c)/3
Result generalizes to periodic sequences of any period
Fact: The set of feasible repeated game payoffs (as δ → 1) is the set of weighted averages of the
one-shot game payoffs
Repeated prisoner's dilemma: Take weighted averages of

{(0, 3), (3, 0), (1, 1), (2, 2)}
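The period-3 formula above can be checked numerically; using exact `Fraction` arithmetic (the function name is my own), the discounted average visibly approaches the plain average as δ → 1:

```python
from fractions import Fraction

def periodic_discounted_average(cycle, delta):
    # (1 - delta)/(1 - delta^n) * (c0 + delta*c1 + ... + delta^(n-1)*c_{n-1})
    n = len(cycle)
    one_cycle = sum(delta**t * c for t, c in enumerate(cycle))
    return (1 - delta) / (1 - delta**n) * one_cycle

# The discounted average of {2, 0, 3, 2, 0, 3, ...} approaches (2+0+3)/3 = 5/3:
for delta in (Fraction(9, 10), Fraction(99, 100), Fraction(999, 1000)):
    print(float(periodic_discounted_average([2, 0, 3], delta)))
```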

[Figure: feasible payoff set, the convex hull of (0,3), (2,2), (1,1), (3,0), plotted with Player 1 payoff on the horizontal axis and Player 2 payoff on the vertical axis]

Lower bound on achievable payoff

Repeated prisoner's dilemma: A player can guarantee a payoff of at least 1 by defecting
every day.
Implication: The payoff to each player in a NE of the repeated game must be at least 1.
This restricts the set of rational feasible payoffs in the repeated game:

[Figure: feasible payoff set, restricted to payoffs of at least 1 for each player]

Greedy strategy

Notation:
ai(t) = si(h(t)): The action of player i at time t, ai(t), is a function of the repeated
game strategy, si(·), evaluated at the history at time t, h(t).
bi(a−i): the stage game best response function
Bi(s−i): the repeated game best response function
Minimax payoff:
Suppose the opponents commit to strategy s−i.
What is the guaranteed payoff of si = Bi(s−i)?
Lower bound: Define the greedy response, Gi(s−i):

ai(t) = bi(s−i(h(t)))

i.e., at stage t, play the stage game best response to a−i(t)

Ui(Bi(s−i), s−i) ≥ Ui(Gi(s−i), s−i)

i.e., the repeated game payoff using a repeated game best response is at least the
repeated game payoff using a day-by-day greedy response.
Why aren't these equal?!

Minimax payoff

Define minimax payoff:

ℓi = min_{a−i} max_{ai} ui(ai, a−i)

Interpretation: If player i plays a greedy best response, the payoff is at least ℓi

This is not the same (over pure strategies) as the security strategy payoff:

vi = max_{ai} min_{a−i} ui(ai, a−i)

From the previous discussion,

Ui(Bi(s−i), s−i) ≥ Ui(Gi(s−i), s−i) ≥ ℓi

Recap: Discounted average payoffs at a NE of a repeated game


Must be feasible
Must be at least the minimax payoffs
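For the prisoner's dilemma, both the minimax and the security (maximin) payoffs can be computed by brute force over pure actions; here they happen to coincide at 1, though they need not be equal in general (the variable names are my own):

```python
ACTIONS = ("C", "D")
# Player 1's stage payoffs from the PD matrix.
U1 = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

# Minimax: the opponent first commits to the action that minimizes
# Player 1's best-response payoff.
minimax = min(max(U1[(a1, a2)] for a1 in ACTIONS) for a2 in ACTIONS)
# Maximin (security payoff): Player 1 commits first, opponent minimizes.
maximin = max(min(U1[(a1, a2)] for a2 in ACTIONS) for a1 in ACTIONS)
print(minimax, maximin)  # 1 1
```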

Nash folk theorem

Nash folk theorem: Let (w1, w2) be payoff levels that are i) feasible and ii) at least the
minimax levels (ℓ1, ℓ2). Then for δ sufficiently close to 1, there exists a NE with payoffs
(w1, w2).
This is the converse of the statement on the previous slide.
Statement holds for more than 2 players.
Idea of proof:
Both players agree on an action path that leads to (w1 , w2 ).
Apply grim trigger if any player deviates from agreed upon path.
Best response to grim trigger is to stick to path
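To make the proof idea concrete, a target payoff in the feasible set can be approached by a periodic action path; the cycle and helper below are my own illustration, and both limiting payoffs are above the minimax level of 1, so the theorem applies:

```python
from fractions import Fraction

PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def limiting_average(cycle):
    # Average stage payoff over one cycle; equals the discounted
    # average of the infinite periodic path as delta -> 1.
    n = len(cycle)
    return (Fraction(sum(PAYOFF[a][0] for a in cycle), n),
            Fraction(sum(PAYOFF[a][1] for a in cycle), n))

# A 3-period cycle: (C, C) twice, (D, C) once.
path = [("C", "C"), ("C", "C"), ("D", "C")]
print(limiting_average(path))  # (Fraction(7, 3), Fraction(4, 3))
```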

Subgame perfect?

Interpretation of subgame perfect: Tail of strategy is a best response to tail of strategy


for any subhistory...even impossible subhistories
GT vs GT forms a Nash equilibrium that supports cooperation. Is it subgame perfect?
(Or is GT a credible threat?)
Consider subhistory of:
{(C, C), (C, C), (C, D), ...}
Player 1 will play D
Player 2 will continue to play GT
Player 1 is not playing a best response to Player 2
New approach: Double Grim Trigger (DGT)
Start off playing C and continue to play C until...
Either player plays D (including oneself), then play D always
Fact:
DGT is a best response to DGT for δ sufficiently large
DGT is subgame perfect
What is response to previous subhistory example?
Fact: Nash folk theorem holds for subgame perfect equilibria

