
EE6417: Incentive-Centered Design

Lecture 08

Bharadwaj Satchidanandan

Department of Electrical Engineering


Indian Institute of Technology Madras

Bharadwaj Satchidanandan EE6417: Incentive-Centered Design Lecture 08 1 / 22


A Slice of History (The 70s)

Once CE was introduced, it was quickly recognized that


▶ all convex combinations of NEs are CEs, and
▶ there are, in general, CEs outside the convex hull of NEs.
The second observation implies that there could be CEs that
Pareto-dominate all NEs. Aumann himself provides examples of such
CEs in his paper.
It was also recognized that for two-player zero-sum games, no CE
could Pareto-dominate the NEs.
Following this, a question of interest was to characterize the set of all
games whose NEs cannot be “improved upon” by a CE.



A Slice of History (The 70s)

To address this problem, Moulin introduced Coarse Correlated Equilibrium (CCE), a generalization of CE, as an intermediate theoretical tool.
His idea was to show that if NE cannot be improved upon by a CCE, then it cannot be improved upon by a CE either, since every CE is also a CCE. Showing that NE cannot be improved upon by a CCE was mathematically easier.
Using this trick, he characterized the games whose mixed Nash equilibria (MNEs) cannot be improved upon by any CE (or, for that matter, any CCE). These are known as strategically zero-sum games.
Hervé Moulin
Today, CCEs are studied for an altogether
different reason. They emerge as limits of
no-regret learning rules (more on this in Part II
of the course).



Coarse Correlated Equilibrium
The Setup

Consider an $n$-player game in normal form and denote by $S_i$ and $u_i$ the strategy set and the utility function, respectively, of Player $i$.

There is a central coordinator who can recommend to each player the strategy that she has to choose. Denote by $\hat{S}_i$ the strategy recommended to Player $i$ and by $\hat{S} = (\hat{S}_1, \ldots, \hat{S}_n)$ the recommended strategy profile.

The recommendation $\hat{S}$ can, in general, be a random variable. I.e., the coordinator can sample $\hat{S}$ at random from some joint distribution over the players’ strategies.

Denote by $p : S_1 \times \ldots \times S_n \to [0, 1]$ the joint distribution, so that $p(\hat{s}_1, \ldots, \hat{s}_n)$ is the probability that the coordinator recommends the strategy $\hat{s}_1$ to Player 1, the strategy $\hat{s}_2$ to Player 2, and so on.



The Definition

Definition
A joint distribution $p$ over the players’ strategies is a Coarse Correlated Equilibrium (CCE) if for every $i$ and every $s_i \in S_i$,

$$\mathbb{E}\big[u_i(\hat{S}_i, \hat{S}_{-i})\big] \geq \mathbb{E}\big[u_i(s_i, \hat{S}_{-i})\big], \qquad (1)$$

where $\hat{S} \sim p$ and the expectations are taken with respect to $p$.

In words: if all other players play the coordinator’s recommendation, then, a priori, it is best for me as a player to also play the coordinator’s recommendation.



An Example

Consider the game


 
$$\begin{pmatrix}
(1,1) & (-1,-1) & (0,0) \\
(-1,-1) & (1,1) & (0,0) \\
(0,0) & (0,0) & (-1,-1)
\end{pmatrix}.$$

The joint distribution


 
$$\begin{pmatrix}
1/3 & 0 & 0 \\
0 & 1/3 & 0 \\
0 & 0 & 1/3
\end{pmatrix}$$

is a Coarse Correlated Equilibrium. However, it is not a Correlated Equilibrium.
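Both claims can be checked numerically. Below is a minimal sketch (the 0/1/2 action indexing, the use of `numpy`, and the tolerances are my assumptions, not part of the slides) that verifies inequality (1) for every fixed deviation, and then tests the stronger CE condition conditionally on each recommendation:

```python
import numpy as np

# Payoff matrices: U1[a, b] is Player 1's payoff when the profile is (a, b).
# The game is symmetric, so Player 2's payoff is also U1[a, b].
U1 = np.array([[1, -1, 0],
               [-1, 1, 0],
               [0, 0, -1]])
p = np.diag([1/3, 1/3, 1/3])   # joint distribution over strategy profiles

# CCE check (Eq. (1)): committing a priori to any fixed s1 should not help.
eq_payoff = np.sum(p * U1)            # E[u1(S^1, S^2)] under p
marg2 = p.sum(axis=0)                 # marginal of Player 2's recommendation
dev_payoffs = U1 @ marg2              # E[u1(s1, S^2)] for each fixed s1
is_cce = bool(np.all(eq_payoff >= dev_payoffs - 1e-12))

# CE check: deviating AFTER seeing one's recommendation should not help.
is_ce = True
for a in range(3):
    cond = p[a, :]                    # (unnormalized) conditional on S^1 = a
    if cond.sum() > 0 and np.max(U1 @ cond) > U1[a, :] @ cond + 1e-12:
        is_ce = False                 # profitable deviation after recommendation a

print(is_cce, is_ce)  # True False
```

The equilibrium payoff is 1/3, while any fixed deviation earns at most 0, so $p$ is a CCE. But a player recommended the third strategy knows the opponent will play it too, and deviating then raises her payoff from $-1$ to $0$, so $p$ is not a CE. Player 2's conditions hold by symmetry.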



Some intuition for CE and CCE

Imagine that you, as a player, are getting into a contract with a central
coordinator before the game begins.

The term of the contract is that you play the coordinator’s recommended
strategy.

The contract also specifies the joint distribution p from which the
coordinator would sample the players’ strategies.

You have reason to believe that all other players would obey the
coordinator’s recommendation.

The coordinator will give you her recommendation only if you sign the
contract. Hence, if you choose not to sign it, you can only play a strategy
that is independent of the coordinator’s recommendation.



Some intuition for CE and CCE

The joint distribution p is a CCE if (and only if) you prefer signing the
contract to not signing it.

However, after signing the contract and looking at the coordinator’s recommendation, you may want to break the contract. This is because, after looking at your recommendation, you may learn something about the other players’ strategies, leading to the realization that you could profitably deviate from the coordinator’s recommendation.
For this reason, CCE is said to be a “non-self-enforcing” equilibrium.
The equilibrium can be implemented only if there are mechanisms
outside the game to enforce the contract (such as a legal system).



Some intuition for CE and CCE

The joint distribution p is a CE if (and only if)


(i) you prefer signing the contract to not signing it, and
(ii) you would not want to break the contract even after observing the
coordinator’s recommendation, no matter what it is.

For this reason, CE is said to be a “self-enforcing” equilibrium. It is in the players’ own best interests to obey the contract, and one doesn’t need to rely on any external power to enforce it.

NE, being a subset of CE, is also self-enforcing.



Part II: Learning in Games
The Aim of the Field

Recall that von Neumann’s original goal in developing game theory was to predict how strategic agents would behave in a given situation.
It is to answer this question that the notion of equilibrium came into existence. Von Neumann, Nash, Aumann, etc. teach us that players play NE, CE, CCE, or variants thereof, depending on the setting.
However, an unsettling aspect of the theory is the non-uniqueness of equilibrium. Given that there may be multiple equilibria, which equilibrium will the players settle in?



The Aim of the Field

Two broad directions were pursued to answer this question:


(i) A theory of equilibrium selection, which develops refined notions of NE,
(ii) A theory that expresses equilibria as limits of some learning procedure employed by the agents.

The latter direction leads to the rich theory of “Learning in Games.”


Not only can it answer which equilibrium the players would settle in,
but also whether they would converge to an equilibrium at all, starting
from an arbitrary strategy profile, and if so, how fast.
Hence, the theory of learning in games would have a lot to say even if
the equilibrium were unique.



An Analogy
Consider a ball on an arbitrary terrain.

Where will the ball eventually end up?


The theory of statics answers this question. It defines the notion of an equilibrium
(stable/unstable/neutral/limit cycle) and says that the ball will eventually end up
at an equilibrium point.
There may be a multiplicity of equilibria (12 in the above example ignoring limit
cycles). While the theory of statics assures that the ball will end up at one of the
equilibrium points, it cannot say which one of those points the ball will end up at.



An Analogy

Such questions come under the purview of the theory of dynamics.

The theory of dynamics provides a methodology to compute how the state of a


system evolves over time, thereby answering the question of which equilibrium
point the system will eventually (as t → ∞) end up at.

Part I of the course can be thought of as the analog of the theory of statics. It develops the notion of equilibrium for self-interested agents’ behaviors. If the players’ behaviors converge at all, it is to an equilibrium of the game that they converge.

Part II of the course can be thought of as the analog of dynamics. It gives a


fine-grained explanation of how players will adapt their behavior at each time,
thereby answering questions such as whether players’ behaviors will converge or
not, and if so, which equilibrium they will converge to.



A Slice of History (The 50s)
George Brown, in 1951, introduced Fictitious Play as
a means to compute a Nash equilibrium.

Today, it is interpreted as one of the simplest plausible learning rules that players in a repeated game might employ to learn the best strategy to play at each step.

Brown only proposed the algorithm; he did not analyze it. Shortly after Brown proposed the algorithm, Julia Robinson established the properties of Fictitious Play and published her results in a seminal paper in 1951. Her main result is that Fictitious Play converges, in a certain sense, to a Nash equilibrium for all zero-sum games.

Julia Robinson

Due to this important result, Fictitious Play is also referred to as the Brown-Robinson method.



Fictitious Play
The Setup

Two players repeatedly engage in a game.


Player 1 has m strategies at his disposal and Player 2 has n strategies at hers.
Each player knows his/her payoff function but may or may not know
the payoff function of the other player.
Each player observes the strategy employed by the other player at the
end of each round.
Both players have perfect recall in that they remember the strategies
played in all previous rounds.



Fictitious Play in a Nutshell

Each player maintains a vector denoting the empirical distribution of the other player’s strategies, which is updated after every round.

At each round, each player pretends that the other player is playing a
mixed strategy specified by the empirical distribution and best responds to
that strategy.
Ties are broken arbitrarily.



The Algorithm
Algorithm Fictitious Play (Player 1’s Algorithm)
1: Initialize weights as $w_1(i) \leftarrow 0$ for all $i \in \{1, \ldots, n\}$.
2: Play an arbitrary strategy $a_1(1)$ at time 1.
3: for $t \leftarrow 2, 3, \ldots$ do
4:   Observe the strategy $a_2(t-1)$ played by Player 2 at time $t-1$ and update, for all $i \in \{1, \ldots, n\}$,
$$w_t(i) \leftarrow w_{t-1}(i) + \mathbb{1}\{i = a_2(t-1)\}. \qquad (2)$$
5:   Set $p_t \leftarrow w_t / \|w_t\|_1$.
6:   Assume $p_t$ to be the mixed strategy that Player 2 plays at time $t$ and play any $a_1(t) \in S_1$ that is a best response to $p_t$, i.e.,
$$a_1(t) \in \arg\max_{a \in S_1} u_1(a, p_t).$$
7: end for
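Steps 4–6 of the loop translate directly into code. Below is a sketch of one iteration of Player 1's update (assuming `numpy`; ties are broken toward the lowest index by `np.argmax`, which is one of the arbitrary tie-breaking rules the algorithm permits):

```python
import numpy as np

def fp_step(w, a2_prev, U1):
    """One iteration of Player 1's Fictitious Play loop.

    w       : weight vector counting Player 2's past plays (length n)
    a2_prev : strategy Player 2 was observed playing in the previous round
    U1      : Player 1's payoff matrix, U1[a, b] = u1(a, b)
    Returns the updated weights and Player 1's next strategy.
    """
    w = w.copy()
    w[a2_prev] += 1                # step 4: Eq. (2)
    p = w / w.sum()                # step 5: empirical distribution of Player 2
    a1 = int(np.argmax(U1 @ p))    # step 6: best response to the mix p
    return w, a1

# Matching Pennies from the matcher's side, with 0 = H and 1 = T:
U1 = np.array([[1, -1],
               [-1, 1]])
w, a1 = fp_step(np.zeros(2), a2_prev=0, U1=U1)  # Player 2 played H last round
print(a1)  # 0: the best response to an opponent who has only played H is H
```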
An Example

Take the example of the repeated version of Matching Pennies.


At t = 1, both players play an arbitrary strategy. Say both play H.
With this initial condition, and assuming ties are broken in favor of H,
Fictitious Play will produce the strategy profile sequence

(H, H), (H, T), (H, T), (T, T), (T, T), (T, T), (T, H), (T, H), . . .

The initial condition (H, T ) will lead to the play path

(H, T), (T, T), (T, H), (T, H), (H, H), (H, H), (H, H), (H, T), . . .
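Both play paths can be reproduced by simulating the two players simultaneously. A minimal sketch, encoding H as 0 and T as 1, with Player 1 as the matcher (an assumption, since the slides don't fix the payoffs) and ties broken in favor of H:

```python
import numpy as np

def fictitious_play(a1, a2, T):
    """Fictitious Play in Matching Pennies from the initial profile (a1, a2).

    Player 1 (the matcher) earns +1 on a match and -1 otherwise; Player 2
    earns the negative. np.argmax breaks ties toward index 0, i.e., H.
    """
    U1 = np.array([[1, -1], [-1, 1]])
    w1, w2 = np.zeros(2), np.zeros(2)   # counts of the opponent's past plays
    path = [(a1, a2)]
    for _ in range(T - 1):
        w1[a2] += 1
        w2[a1] += 1
        a1 = int(np.argmax(U1 @ (w1 / w1.sum())))     # matcher's best response
        a2 = int(np.argmax(-U1.T @ (w2 / w2.sum())))  # mismatcher's best response
        path.append((a1, a2))
    return path

print(fictitious_play(0, 0, 8))
# [(0, 0), (0, 1), (0, 1), (1, 1), (1, 1), (1, 1), (1, 0), (1, 0)]
print(fictitious_play(0, 1, 8))
# [(0, 1), (1, 1), (1, 0), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1)]
```

These are exactly the two sequences listed above for the initial conditions (H, H) and (H, T).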



Two Convergence Results

What can we say about Fictitious Play?

Theorem (Robinson, 1951)


If the stage game is zero-sum, then the empirical distribution of the
players’ strategies converges to a Nash equilibrium.

A decade later, Miyazawa showed a similar result by (i) removing the restriction that the game be zero-sum, and (ii) adding the restriction that each player has only two pure strategies.

Theorem (Miyazawa, 1961)


The empirical distribution of the players’ strategies along a Fictitious Play path converges to a Nash equilibrium for all 2 × 2 games.
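Robinson's theorem can be watched in action. The sketch below runs Fictitious Play in Matching Pennies, a zero-sum stage game whose unique NE has each player mixing (1/2, 1/2), and checks that the empirical distributions of play approach it; the horizon and the 0.05 tolerance are arbitrary choices of mine:

```python
import numpy as np

def empirical_distributions(T):
    """Both players run Fictitious Play in Matching Pennies for T rounds;
    returns each player's empirical distribution of her own plays."""
    U1 = np.array([[1, -1], [-1, 1]])   # matcher's payoffs; Player 2 gets -U1
    w1, w2 = np.zeros(2), np.zeros(2)   # counts of the opponent's past plays
    c1, c2 = np.zeros(2), np.zeros(2)   # counts of one's own plays
    a1 = a2 = 0                          # both start by playing H
    c1[a1] += 1
    c2[a2] += 1
    for _ in range(T - 1):
        w1[a2] += 1
        w2[a1] += 1
        a1 = int(np.argmax(U1 @ (w1 / w1.sum())))
        a2 = int(np.argmax(-U1.T @ (w2 / w2.sum())))
        c1[a1] += 1
        c2[a2] += 1
    return c1 / T, c2 / T

f1, f2 = empirical_distributions(100_000)
print(np.abs(f1 - 0.5).max() < 0.05, np.abs(f2 - 0.5).max() < 0.05)  # True True
```

Note that it is the empirical (time-averaged) distributions that converge; the per-round play keeps cycling through ever longer runs of H and T.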

