
EE6417: Incentive-Centered Design

Lecture 08

Bharadwaj Satchidanandan

Department of Electrical Engineering


Indian Institute of Technology Madras

Bharadwaj Satchidanandan EE6417: Incentive-Centered Design Lecture 08 1 / 22


A Slice of History (The 70s)

Once CE was introduced, it was quickly recognized that


▶ all convex combinations of NEs are CEs, and
▶ there are, in general, CEs outside the convex hull of NEs.
The second observation implies that there could be CEs that
Pareto-dominate all NEs. Aumann himself provides examples of such
CEs in his paper.
It was also recognized that for two-player zero-sum games, no CE
could Pareto-dominate the NEs.
Following this, a question of interest was to characterize the set of all
games whose NEs cannot be “improved upon” by a CE.



A Slice of History (The 70s)

To address this problem, Moulin introduced Coarse Correlated Equilibrium (CCE), a generalization of CE, as an intermediate theoretical tool.
His idea was to show that if NE cannot be improved upon by a CCE, then it cannot be improved upon by a CE either, since every CE is also a CCE. Showing that NE cannot be improved upon by a CCE was mathematically easier.
Using this trick, he characterized the games whose mixed Nash equilibria (MNEs) cannot be improved upon by any CE (or, for that matter, any CCE). These are known as strategically zero-sum games.
Hervé Moulin
Today, CCEs are studied for an altogether
different reason. They emerge as limits of
no-regret learning rules (more on this in Part II
of the course).



Coarse Correlated Equilibrium
The Setup

Consider an $n$-player game in normal form and denote by $S_i$ and $u_i$ the strategy set and the utility function, respectively, of Player $i$.

There is a central coordinator who can recommend to each player the strategy that she has to choose. Denote by $\hat{S}_i$ the strategy recommended to Player $i$ and by $\hat{S} = (\hat{S}_1, \ldots, \hat{S}_n)$ the recommended strategy profile.

The recommendation $\hat{S}$ can, in general, be a random variable. I.e., the coordinator can sample $\hat{S}$ at random from some joint distribution over the players’ strategies.

Denote by $p : S_1 \times \ldots \times S_n \to [0, 1]$ the joint distribution, so that $p(\hat{s}_1, \ldots, \hat{s}_n)$ is the probability that the coordinator recommends the strategy $\hat{s}_1$ to Player 1, the strategy $\hat{s}_2$ to Player 2, and so on.



The Definition

Definition
A joint distribution $p$ over the players’ strategies is a Coarse Correlated Equilibrium (CCE) if for every $i$ and every $s_i \in S_i$,

$$\mathbb{E}\big[u_i(\hat{S}_i, \hat{S}_{-i})\big] \geq \mathbb{E}\big[u_i(s_i, \hat{S}_{-i})\big], \qquad (1)$$

where $\hat{S} \sim p$ and the expectations are taken with respect to $p$.

In words: if all other players play the coordinator’s recommendation, then, a priori, it is best for me as a player to also play the coordinator’s recommendation.



An Example

Consider the game


 
$$\begin{pmatrix}
(1,1) & (-1,-1) & (0,0) \\
(-1,-1) & (1,1) & (0,0) \\
(0,0) & (0,0) & (-1,-1)
\end{pmatrix}.$$

The joint distribution


 
$$\begin{pmatrix}
1/3 & 0 & 0 \\
0 & 1/3 & 0 \\
0 & 0 & 1/3
\end{pmatrix}$$

is a Coarse Correlated Equilibrium. However, it is not a Correlated Equilibrium.
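Both claims can be checked numerically. Below is a minimal sketch (the 0/1/2 action indexing, the use of `numpy`, and the tolerances are my assumptions, not part of the slides) that verifies inequality (1) for every fixed deviation, and then tests the stronger CE condition conditionally on each recommendation:

```python
import numpy as np

# Payoff matrices: U1[a, b] is Player 1's payoff when the profile is (a, b).
# The game is symmetric, so Player 2's payoff is also U1[a, b].
U1 = np.array([[1, -1, 0],
               [-1, 1, 0],
               [0, 0, -1]])
p = np.diag([1/3, 1/3, 1/3])   # joint distribution over strategy profiles

# CCE check (Eq. (1)): committing a priori to any fixed s1 should not help.
eq_payoff = np.sum(p * U1)            # E[u1(S^1, S^2)] under p
marg2 = p.sum(axis=0)                 # marginal of Player 2's recommendation
dev_payoffs = U1 @ marg2              # E[u1(s1, S^2)] for each fixed s1
is_cce = bool(np.all(eq_payoff >= dev_payoffs - 1e-12))

# CE check: deviating AFTER seeing one's recommendation should not help.
is_ce = True
for a in range(3):
    cond = p[a, :]                    # (unnormalized) conditional on S^1 = a
    if cond.sum() > 0 and np.max(U1 @ cond) > U1[a, :] @ cond + 1e-12:
        is_ce = False                 # profitable deviation after recommendation a

print(is_cce, is_ce)  # True False
```

The equilibrium payoff is 1/3, while any fixed deviation earns at most 0, so $p$ is a CCE. But a player recommended the third strategy knows the opponent will play it too, and deviating then raises her payoff from $-1$ to $0$, so $p$ is not a CE. Player 2's conditions hold by symmetry.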



Some intuition for CE and CCE

Imagine that you, as a player, are getting into a contract with a central
coordinator before the game begins.

The term of the contract is that you play the coordinator’s recommended
strategy.

The contract also specifies the joint distribution p from which the
coordinator would sample the players’ strategies.

You have reason to believe that all other players would obey the
coordinator’s recommendation.

The coordinator will give you her recommendation only if you sign the
contract. Hence, if you choose not to sign it, you can only play a strategy
that is independent of the coordinator’s recommendation.



Some intuition for CE and CCE

The joint distribution p is a CCE if (and only if) you prefer signing the
contract to not signing it.

However, after signing the contract and looking at the coordinator’s recommendation, you may want to break the contract. This is because, after looking at your recommendation, you may learn something about the other players’ strategies, leading to the realization that you could profitably deviate from the coordinator’s recommendation.
For this reason, CCE is said to be a “non-self-enforcing” equilibrium.
The equilibrium can be implemented only if there are mechanisms
outside the game to enforce the contract (such as a legal system).



Some intuition for CE and CCE

The joint distribution p is a CE if (and only if)


(i) you prefer signing the contract to not signing it, and
(ii) you would not want to break the contract even after observing the
coordinator’s recommendation, no matter what it is.

For this reason, CE is said to be a “self-enforcing” equilibrium. It is in the players’ own best interests to obey the contract, and one doesn’t need to rely on any external power to enforce it.

NE, being a subset of CE, is also self-enforcing.



Part II: Learning in Games
The Aim of the Field

Recall that von Neumann’s original goal in developing game theory was to predict how strategic agents would behave in a given situation.
It is to answer this question that the notion of equilibrium came into existence. Von Neumann, Nash, Aumann, etc. teach us that players play NE, CE, CCE, or variants thereof, depending on the setting.
However, an unsettling aspect of the theory is the non-uniqueness of equilibrium. Given that there may be multiple equilibria, which equilibrium will the players settle in?



The Aim of the Field

Two broad directions were pursued to answer this question:


(i) A theory of equilibrium selection, which develops refined notions of NE,
(ii) A theory that expresses equilibria as limits of some learning procedure employed by the agents.

The latter direction leads to the rich theory of “Learning in Games.”


Not only can it answer which equilibrium the players would settle in,
but also whether they would converge to an equilibrium at all, starting
from an arbitrary strategy profile, and if so, how fast.
Hence, the theory of learning in games would have a lot to say even if
the equilibrium were unique.



An Analogy
Consider a ball on an arbitrary terrain.

Where will the ball eventually end up?


The theory of statics answers this question. It defines the notion of an equilibrium
(stable/unstable/neutral/limit cycle) and says that the ball will eventually end up
at an equilibrium point.
There may be a multiplicity of equilibria (12 in the above example ignoring limit
cycles). While the theory of statics assures that the ball will end up at one of the
equilibrium points, it cannot say which one of those points the ball will end up at.



An Analogy

Such questions come under the purview of the theory of dynamics.

The theory of dynamics provides a methodology to compute how the state of a


system evolves over time, thereby answering the question of which equilibrium
point the system will eventually (as t → ∞) end up at.

Part I of the course can be thought of as the analog of the theory of statics. It develops the notion of equilibrium for self-interested agents’ behaviors. If the players’ behaviors converge at all, it is to an equilibrium of the game that they converge.

Part II of the course can be thought of as the analog of dynamics. It gives a


fine-grained explanation of how players will adapt their behavior at each time,
thereby answering questions such as whether players’ behaviors will converge or
not, and if so, which equilibrium they will converge to.



A Slice of History (The 50s)
George Brown, in 1951, introduced Fictitious Play as
a means to compute a Nash equilibrium.

Today, it is interpreted as one of the simplest plausible learning rules that players in a repeated game might employ to learn the best strategy to play at each step.

Brown only proposed the algorithm; he did not analyze it. Shortly after Brown proposed the algorithm, Julia Robinson established the properties of Fictitious Play and published her results in a seminal paper in 1951. Her main result is that Fictitious Play converges, in a certain sense, to a Nash equilibrium for all zero-sum games.

Julia Robinson

Due to this important result, Fictitious Play is also referred to as the Brown-Robinson method.



Fictitious Play
The Setup

Two players repeatedly engage in a game.


Player 1 has m strategies at his disposal and Player 2 has n strategies at hers.
Each player knows his/her payoff function but may or may not know
the payoff function of the other player.
Each player observes the strategy employed by the other player at the
end of each round.
Both players have perfect recall in that they remember the strategies
played in all previous rounds.



Fictitious Play in a Nutshell

Each player maintains a vector denoting the empirical distribution of the other player’s strategies, which is updated after every round.

At each round, each player pretends that the other player is playing a
mixed strategy specified by the empirical distribution and best responds to
that strategy.
Ties are broken arbitrarily.



The Algorithm
Algorithm Fictitious Play (Player 1’s Algorithm)
1: Initialize weights as $w_1(i) \leftarrow 0$ for all $i \in \{1, \ldots, n\}$.
2: Play an arbitrary strategy $a_1(1)$ at time 1.
3: for $t \leftarrow 2, 3, \ldots$ do
4:   Observe the strategy $a_2(t-1)$ played by Player 2 at time $t-1$ and update, for all $i \in \{1, \ldots, n\}$,
$$w_t(i) \leftarrow w_{t-1}(i) + \mathbb{1}\{i = a_2(t-1)\}. \qquad (2)$$
5:   Set $p_t \leftarrow w_t / \|w_t\|_1$.
6:   Assume $p_t$ to be the mixed strategy that Player 2 plays at time $t$ and play any $a_1(t) \in S_1$ that is a best response to $p_t$, i.e.,
$$a_1(t) \in \arg\max_{a \in S_1} u_1(a, p_t).$$
7: end for
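Steps 4–6 of the loop translate directly into code. Below is a sketch of one iteration of Player 1's update (assuming `numpy`; ties are broken toward the lowest index by `np.argmax`, which is one of the arbitrary tie-breaking rules the algorithm permits):

```python
import numpy as np

def fp_step(w, a2_prev, U1):
    """One iteration of Player 1's Fictitious Play loop.

    w       : weight vector counting Player 2's past plays (length n)
    a2_prev : strategy Player 2 was observed playing in the previous round
    U1      : Player 1's payoff matrix, U1[a, b] = u1(a, b)
    Returns the updated weights and Player 1's next strategy.
    """
    w = w.copy()
    w[a2_prev] += 1                # step 4: Eq. (2)
    p = w / w.sum()                # step 5: empirical distribution of Player 2
    a1 = int(np.argmax(U1 @ p))    # step 6: best response to the mix p
    return w, a1

# Matching Pennies from the matcher's side, with 0 = H and 1 = T:
U1 = np.array([[1, -1],
               [-1, 1]])
w, a1 = fp_step(np.zeros(2), a2_prev=0, U1=U1)  # Player 2 played H last round
print(a1)  # 0: the best response to an opponent who has only played H is H
```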
An Example

Take the example of the repeated version of Matching Pennies.


At t = 1, both players play an arbitrary strategy. Say both play H.
With this initial condition, and assuming ties are broken in favor of H,
Fictitious Play will produce the strategy profile sequence

(H, H), (H, T), (H, T), (T, T), (T, T), (T, T), (T, H), (T, H), . . .

The initial condition (H, T ) will lead to the play path

(H, T), (T, T), (T, H), (T, H), (H, H), (H, H), (H, H), (H, T), . . .
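Both play paths can be reproduced by simulating the two players simultaneously. A minimal sketch, encoding H as 0 and T as 1, with Player 1 as the matcher (an assumption, since the slides don't fix the payoffs) and ties broken in favor of H:

```python
import numpy as np

def fictitious_play(a1, a2, T):
    """Fictitious Play in Matching Pennies from the initial profile (a1, a2).

    Player 1 (the matcher) earns +1 on a match and -1 otherwise; Player 2
    earns the negative. np.argmax breaks ties toward index 0, i.e., H.
    """
    U1 = np.array([[1, -1], [-1, 1]])
    w1, w2 = np.zeros(2), np.zeros(2)   # counts of the opponent's past plays
    path = [(a1, a2)]
    for _ in range(T - 1):
        w1[a2] += 1
        w2[a1] += 1
        a1 = int(np.argmax(U1 @ (w1 / w1.sum())))     # matcher's best response
        a2 = int(np.argmax(-U1.T @ (w2 / w2.sum())))  # mismatcher's best response
        path.append((a1, a2))
    return path

print(fictitious_play(0, 0, 8))
# [(0, 0), (0, 1), (0, 1), (1, 1), (1, 1), (1, 1), (1, 0), (1, 0)]
print(fictitious_play(0, 1, 8))
# [(0, 1), (1, 1), (1, 0), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1)]
```

These are exactly the two sequences listed above for the initial conditions (H, H) and (H, T).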



Two Convergence Results

What can we say about Fictitious Play?

Theorem (Robinson, 1951)


If the stage game is zero-sum, then the empirical distribution of the
players’ strategies converges to a Nash equilibrium.

A decade later, Miyazawa showed a similar result by (i) removing the restriction that the game be zero-sum, and (ii) adding the restriction that each player has only two pure strategies.

Theorem (Miyazawa, 1961)


The empirical distribution of the players’ strategies along a Fictitious Play path converges to a Nash equilibrium for all 2 × 2 games.
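Robinson's theorem can be watched in action. The sketch below runs Fictitious Play in Matching Pennies, a zero-sum stage game whose unique NE has each player mixing (1/2, 1/2), and checks that the empirical distributions of play approach it; the horizon and the 0.05 tolerance are arbitrary choices of mine:

```python
import numpy as np

def empirical_distributions(T):
    """Both players run Fictitious Play in Matching Pennies for T rounds;
    returns each player's empirical distribution of her own plays."""
    U1 = np.array([[1, -1], [-1, 1]])   # matcher's payoffs; Player 2 gets -U1
    w1, w2 = np.zeros(2), np.zeros(2)   # counts of the opponent's past plays
    c1, c2 = np.zeros(2), np.zeros(2)   # counts of one's own plays
    a1 = a2 = 0                          # both start by playing H
    c1[a1] += 1
    c2[a2] += 1
    for _ in range(T - 1):
        w1[a2] += 1
        w2[a1] += 1
        a1 = int(np.argmax(U1 @ (w1 / w1.sum())))
        a2 = int(np.argmax(-U1.T @ (w2 / w2.sum())))
        c1[a1] += 1
        c2[a2] += 1
    return c1 / T, c2 / T

f1, f2 = empirical_distributions(100_000)
print(np.abs(f1 - 0.5).max() < 0.05, np.abs(f2 - 0.5).max() < 0.05)  # True True
```

Note that it is the empirical (time-averaged) distributions that converge; the per-round play keeps cycling through ever longer runs of H and T.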

