An Introduction to Dynamic Games
A. Haurie J. Krawczyk
March 28, 2000
Contents
1 Foreword 9
1.1 What are Dynamic Games? . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 Origins of these Lecture Notes . . . . . . . . . . . . . . . . . . . . . 9
1.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
I Elements of Classical Game Theory 13
2 Decision Analysis with Many Agents 15
2.1 The Basic Concepts of Game Theory . . . . . . . . . . . . . . . . . . 15
2.2 Games in Extensive Form . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Description of moves, information and randomness . . . . . . 16
2.2.2 Comparing Random Perspectives . . . . . . . . . . . . . . . 18
2.3 Additional concepts about information . . . . . . . . . . . . . . . . . 20
2.3.1 Complete and perfect information . . . . . . . . . . . . . . . 20
2.3.2 Commitment . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.3 Binding agreement . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Games in Normal Form . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.1 Playing games through strategies . . . . . . . . . . . . . . . . 21
2.4.2 From the extensive form to the strategic or normal form . . . 22
2.4.3 Mixed and Behavior Strategies . . . . . . . . . . . . . . . . . 24
3 Solution concepts for noncooperative games 27
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Matrix Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.1 Saddle Points . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 Mixed strategies . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.3 Algorithms for the Computation of Saddle Points . . . . . . 34
3.3 Bimatrix Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.1 Nash Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.2 Shortcomings of the Nash equilibrium concept . . . . . . . 38
3.3.3 Algorithms for the Computation of Nash Equilibria in Bimatrix Games . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 Concave m-Person Games . . . . . . . . . . . . . . . . . . . . . 44
3.4.1 Existence of Coupled Equilibria . . . . . . . . . . . . . . . . 45
3.4.2 Normalized Equilibria . . . . . . . . . . . . . . . . . . . . . 47
3.4.3 Uniqueness of Equilibrium . . . . . . . . . . . . . . . . . . . 48
3.4.4 A numerical technique . . . . . . . . . . . . . . . . . . . . . 50
3.4.5 A variational inequality formulation . . . . . . . . . . . . . . 50
3.5 Cournot equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5.1 The static Cournot model . . . . . . . . . . . . . . . . . . . . 51
3.5.2 Formulation of a Cournot equilibrium as a nonlinear complementarity problem . . . . . . . . . . . . . . . . . . . . . . 52
3.5.3 Computing the solution of a classical Cournot model . . . . . 55
3.6 Correlated equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.6.1 Example of a game with correlated equilibria . . . . . . . . 56
3.6.2 A general definition of correlated equilibria . . . . . . . . 59
3.7 Bayesian equilibrium with incomplete information . . . . . . . . . 60
3.7.1 Example of a game with unknown type for a player . . . . . . 60
3.7.2 Reformulation as a game with imperfect information . . . . . 61
3.7.3 A general definition of Bayesian equilibria . . . . . . . . . 63
3.8 Appendix on Kakutani Fixed-point theorem . . . . . . . . . . . . 64
3.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
II Repeated and Sequential Games 67
4 Repeated games and memory strategies 69
4.1 Repeating a game in normal form . . . . . . . . . . . . . . . . . . . 70
4.1.1 Repeated bimatrix games . . . . . . . . . . . . . . . . . . . . 70
4.1.2 Repeated concave games . . . . . . . . . . . . . . . . . . . . 71
4.2 Folk theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2.1 Repeated games played by automata . . . . . . . . . . . . . . 74
4.2.2 Minimax point . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2.3 Set of outcomes dominating the minimax point . . . . . . . . 76
4.3 Collusive equilibrium in a repeated Cournot game . . . . . . . . . . . 77
4.3.1 Finite vs infinite horizon . . . . . . . . . . . . . . . . . . . . 79
4.3.2 A repeated stochastic Cournot game with discounting and imperfect information . . . . . . . . . . . . . . . . . . . . . . . 80
4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5 Shapley’s Zero Sum Markov Game 83
5.1 Process and rewards dynamics . . . . . . . . . . . . . . . . . . . . . 83
5.2 Information structure and strategies . . . . . . . . . . . . . . . . . . 84
5.2.1 The extensive form of the game . . . . . . . . . . . . . . . . 84
5.2.2 Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3 Shapley-Denardo operator formalism . . . . . . . . . . . . . . . . . 86
5.3.1 Dynamic programming operators . . . . . . . . . . . . . . . 86
5.3.2 Existence of sequential saddle points . . . . . . . . . . . . . 87
6 Nonzero-sum Markov and Sequential games 89
6.1 Sequential Game with Discrete state and action sets . . . . . . . . . . 89
6.1.1 Markov game dynamics . . . . . . . . . . . . . . . . . . . . 89
6.1.2 Markov strategies . . . . . . . . . . . . . . . . . . . . . . . . 90
6.1.3 Feedback-Nash equilibrium . . . . . . . . . . . . . . . . . . 90
6.1.4 Sobel-Whitt operator formalism . . . . . . . . . . . . . . . . 90
6.1.5 Existence of Nash equilibria . . . . . . . . . . . . . . . . . . 91
6.2 Sequential Games on Borel Spaces . . . . . . . . . . . . . . . . . . . 92
6.2.1 Description of the game . . . . . . . . . . . . . . . . . . . . 92
6.2.2 Dynamic programming formalism . . . . . . . . . . . . . . . 92
6.3 Application to a Stochastic Duopoly Model . . . . . . . . . . . . . . 93
6.3.1 A stochastic repeated duopoly . . . . . . . . . . . . . . . . . 93
6.3.2 A class of trigger strategies based on a monitoring device . . . 94
6.3.3 Interpretation as a communication device . . . . . . . . . . . 97
III Differential games 99
7 Controlled dynamical systems 101
7.1 A capital accumulation process . . . . . . . . . . . . . . . . . . . . . 101
7.2 State equations for controlled dynamical systems . . . . . . . . . . . 102
7.2.1 Regularity conditions . . . . . . . . . . . . . . . . . . . . . . 102
7.2.2 The case of stationary systems . . . . . . . . . . . . . . . . . 102
7.2.3 The case of linear systems . . . . . . . . . . . . . . . . . . . 103
7.3 Feedback control and the stability issue . . . . . . . . . . . . . . . . 103
7.3.1 Feedback control of stationary linear systems . . . . . . . . . 104
7.3.2 Stabilizing a linear system with a feedback control . . . . . . 104
7.4 Optimal control problems . . . . . . . . . . . . . . . . . . . . . . . . 104
7.5 A model of optimal capital accumulation . . . . . . . . . . . . . . . . 104
7.6 The optimal control paradigm . . . . . . . . . . . . . . . . . . . . . 105
7.7 The Euler equations and the Maximum principle . . . . . . . . . . . . 106
7.8 An economic interpretation of the Maximum Principle . . . . . . . . 108
7.9 Synthesis of the optimal control . . . . . . . . . . . . . . . . . . . . 109
7.10 Dynamic programming and the optimal feedback control . . . . . . . 109
7.11 Competitive dynamical systems . . . . . . . . . . . . . . . . . . . . 110
7.12 Competition through capital accumulation . . . . . . . . . . . . . . . 110
7.13 Open-loop differential games . . . . . . . . . . . . . . . . . . . . . 110
7.13.1 Open-loop information structure . . . . . . . . . . . . . . . . 110
7.13.2 An equilibrium principle . . . . . . . . . . . . . . . . . . . . 110
7.14 Feedback differential games . . . . . . . . . . . . . . . . . . . . . . 111
7.14.1 Feedback information structure . . . . . . . . . . . . . . . . 111
7.14.2 A verification theorem . . . . . . . . . . . . . . . . . . . . . 111
7.15 Why are feedback Nash equilibria outcomes different from open-loop Nash outcomes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.16 The subgame perfectness issue . . . . . . . . . . . . . . . . . . . . . 111
7.17 Memory differential games . . . . . . . . . . . . . . . . . . . . . . . 111
7.18 Characterizing all the possible equilibria . . . . . . . . . . . . . . . . 111
IV A Differential Game Model 113
7.19 A Game of R&D Investment . . . . . . . . . . . . . . . . . . . . . . 115
7.19.1 Dynamics of R&D competition . . . . . . . . . . . . . . . . 115
7.19.2 Product Differentiation . . . . . . . . . . . . . . . . . . . . . 116
7.19.3 Economics of innovation . . . . . . . . . . . . . . . . . . . . 117
7.20 Information structure . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.20.1 State variables . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.20.2 Piecewise open-loop game . . . . . . . . . . . . . . . . . . . 118
7.20.3 A Sequential Game Reformulation . . . . . . . . . . . . . . . 118
Chapter 1
Foreword
1.1 What are Dynamic Games?
Dynamic games are mathematical models of the interaction between different agents who are controlling a dynamical system. Such situations occur in many instances, like armed conflicts (e.g. the duel between a bomber and a jet fighter), economic competition (e.g. investments in R&D for computer companies) and parlor games (Chess, Bridge). These examples concern dynamical systems since the actions of the agents (also called players) influence the evolution over time of the state of a system (the position and velocity of aircraft, the capital of know-how for Hi-Tech firms, the positions of the remaining pieces on a chess board, etc.). The difficulty in deciding what the behavior of these agents should be stems from the fact that each action an agent takes at a given time will influence the reaction of the opponent(s) at later times. These notes are intended to present the basic concepts and models which have been proposed in the burgeoning literature on game theory for a representation of these dynamic interactions.
1.2 Origins of these Lecture Notes
These notes are based on several courses on Dynamic Games taught by the authors, in different universities or summer schools, to a variety of students in engineering, economics and management science. The notes also use some documents prepared in cooperation with other authors, in particular B. Tolwinski [Tolwinski, 1988].

These notes are written for control engineers, economists or management scientists interested in the analysis of multiagent optimization problems, with a particular emphasis on the modeling of conflict situations. This means that the level of mathematics involved in the presentation will not go beyond what is expected to be known by a student specializing in control engineering, quantitative economics or management science. These notes are aimed at last-year undergraduate and first-year graduate students.
Control engineers will certainly observe that we present dynamic games as an extension of optimal control, whereas economists will see that dynamic games are only a particular aspect of the classical theory of games, which is considered to have been launched in [Von Neumann & Morgenstern 1944]. Economic models of imperfect competition, presented as variations on the "classic" Cournot model [Cournot, 1838], will serve recurrently as illustrations of the concepts introduced and of the theories developed. An interesting domain of application of dynamic games, which is described in these notes, relates to environmental management. The conflict situations occurring in fisheries exploitation by multiple agents, or in policy coordination for achieving global environmental control (e.g. in the control of a possible global warming effect), are well captured in the realm of this theory.
The objects studied in this book will be dynamic. The term dynamic comes from the Greek dynasthai (which means "to be able") and refers to phenomena which undergo a time evolution. In these notes, most of the dynamic models will be in discrete time. This implies that, for the mathematical description of the dynamics, difference (rather than differential) equations will be used. That, in turn, should make a great part of the notes accessible, and attractive, to students who have not done advanced mathematics. However, there will still be some developments involving a continuous-time description of the dynamics, which have been written for readers with a stronger mathematical background.
1.3 Motivation
There is no doubt that a course on dynamic games suitable for both control engineering students and economics or management science students requires a specialized textbook.

Since we emphasize the detailed description of the dynamics of some specific systems controlled by the players, we have to present rather sophisticated mathematical notions related to control theory. This presentation of the dynamics must be accompanied by an introduction to the specific mathematical concepts of game theory. The originality of our approach is in the mixing of these two branches of applied mathematics.

There are many good books on classical game theory. A nonexhaustive list includes [Owen, 1982], [Shubik, 1975a], [Shubik, 1975b], [Aumann, 1989], and more recently [Friedman 1986] and [Fudenberg & Tirole, 1991]. However, they do not introduce the reader to the most general dynamic games. [Başar & Olsder, 1982] does cover the dynamic game paradigms extensively; however, readers without a strong mathematical background will probably find that book difficult. This text is therefore a modest attempt to bridge the gap.
Part I
Elements of Classical Game Theory
Chapter 2
Decision Analysis with Many Agents
As we said in the introduction to these notes, dynamic games constitute a subclass of the mathematical models studied in what is usually called the classical theory of games. It is therefore proper to start our exposition with those basic concepts of game theory which provide the fundamental thread of the theory of dynamic games. For an exhaustive treatment of most of the definitions of classical game theory see e.g. [Owen, 1982], [Shubik, 1975a], [Friedman 1986] and [Fudenberg & Tirole, 1991].
2.1 The Basic Concepts of Game Theory
In a game we deal with the following concepts:

• Players. They will compete in the game. Notice that a player may be an individual or a set of individuals (a team, a corporation, a political party, a nation), a pilot of an aircraft, a captain of a submarine, etc.
• A move or a decision will be a player's action. Also, borrowing a term from control theory, a move will be the realization of a player's control or, simply, his control.
• A player's (pure) strategy will be a rule (or function) that associates a player's move with the information available to him¹ at the time when he decides which move to choose.

¹Political correctness promotes the usage of the gender-inclusive pronouns "they" and "their". However, in games, we will frequently have to address an individual player's action and distinguish it from a collective action taken by a set of several players. As far as we know, in English, this distinction is only possible through the usage of the traditional grammar gender-exclusive pronouns: the possessive "his", "her" and the personal "he", "she". We find that the traditional grammar better suits our purpose (to avoid confusion) and we will refer in this book to a singular genderless agent as "he" and to the agent's possession as "his".
• A player’s mixed strategy is a probability measure on the player’s space of pure
strategies. In other words, a mixed strategy consists of a random draw of a pure
strategy. The player controls the probabilities in this random experiment.
• A player's behavioral strategy is a rule which defines a random draw of the admissible move as a function of the information available². These strategies are intimately linked with mixed strategies and it was proved early on [Kuhn, 1953] that, for many games, the two concepts coincide.
• Payoffs are real numbers measuring the desirability of the possible outcomes of the game, e.g. the amounts of money the players may win (or lose). Other names for payoffs are: rewards, performance indices or criteria, utility measures, etc.
The concepts we have introduced above are described in relatively imprecise terms. A more rigorous definition can be given if we set the theory in the realm of decision analysis, where decision trees give a representation of the dependence of outcomes on actions and uncertainties. This will be called the extensive form of a game.
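To make these concepts concrete, the following minimal Python sketch represents pure strategies as rules from information to moves, and payoffs as numbers attached to the outcome. The one-shot game and all names are invented for illustration and are not taken from the notes.

```python
# Pure strategies as rules (functions) mapping information to moves,
# and payoffs as real numbers attached to outcomes. Illustrative only.

def strategy_p1(info):
    """Player 1 moves with no information: he always plays head."""
    return "H"

def strategy_p2(info):
    """Player 2 moves simultaneously, so his rule cannot depend on
    Player 1's move; here he always plays tail."""
    return "T"

def payoff(move1, move2):
    """Player 1 wins 1 if the moves differ, loses 1 if they match;
    the game is zero-sum, so Player 2 receives the opposite."""
    v1 = 1 if move1 != move2 else -1
    return v1, -v1

m1, m2 = strategy_p1(None), strategy_p2(None)
print(payoff(m1, m2))  # (1, -1)
```

Replacing either function by a different rule changes the strategy vector and hence the outcome, which is exactly the role a strategy plays in the definitions above.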
2.2 Games in Extensive Form
A game in extensive form is a graph (i.e. a set of nodes and a set of arcs) which has the structure of a tree³ and which represents the possible sequences of actions and random perturbations which influence the outcome of a game played by a set of players.
2.2.1 Description of moves, information and randomness
A game in extensive form is described by a set of players, including one particular player called Nature, and a set of positions described as nodes on a tree structure. At each node one particular player has the right to move, i.e. he has to select a possible action from an admissible set represented by the arcs emanating from the node.
The information at the disposal of each player at the nodes where he has to select an action is described by the information structure of the game. In general the player may not know exactly at which node of the tree structure the game is currently located.

²A similar concept has been introduced in control theory under the name of relaxed controls.

³A tree is a graph where all nodes are connected but there are no cycles. In a tree there is a single node without a "parent", called the "root", and a set of nodes without descendants, the "leaves". There is always a single path from the root to any leaf.
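The tree property stated in footnote 3 (all nodes connected, no cycles, a unique path from the root) can be checked mechanically; here is a small sketch with invented node names.

```python
# Verifying the tree property of footnote 3: every node is reachable
# from the root, and no node is reached twice (which would indicate a
# cycle or a node with two parents). Node names are illustrative.

children = {
    "root": ["D2a", "D2b"],
    "D2a": ["leaf1", "leaf2"],
    "D2b": ["leaf3"],
    "leaf1": [], "leaf2": [], "leaf3": [],
}

def is_tree(children, root="root"):
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:          # reached twice: not a tree
            return False
        seen.add(node)
        stack.extend(children[node])
    return seen == set(children)  # every node reachable from the root

print(is_tree(children))  # True
```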
His information has the following form:
he knows that the current position of the game is an element in a given
subset of nodes. He does not know which speciﬁc one it is.
When the player selects a move, this corresponds to selecting an arc of the graph which defines a transition to a new node, where another player has to select his move, etc. Among the players, Nature plays randomly, i.e. Nature's moves are selected at random. The game has a stopping rule described by the terminal nodes of the tree. Then the players are paid their rewards, also called payoffs.
Figure 2.1 shows the extensive form of a two-player, one-stage stochastic game with simultaneous moves. We also say that this game has the simultaneous move information structure. It corresponds to a situation where Player 2 does not know which action has been selected by Player 1 and vice versa. In this figure the node marked D_1 corresponds to the move of Player 1 and the nodes marked D_2 correspond to the moves of Player 2.

The information of the second player is represented by the oval box. Therefore Player 2 does not know what action has been chosen by Player 1. The nodes marked E correspond to Nature's moves. In this particular case we assume that the three possible elementary events are equiprobable. The nodes represented by dark circles are the terminal nodes, where the game stops and the payoffs are collected.
This representation of games is obviously inspired by parlor games like Chess, Poker, Bridge, etc., which can be, at least theoretically, correctly described in this framework. In such a context, the randomness of Nature's play is the representation of card or dice draws realized in the course of the game.
The extensive form provides indeed a very detailed description of the game. It is, however, rather impractical because the size of the tree very quickly becomes huge, even for simple games. An attempt to provide a complete description of a complex game like Bridge, using an extensive form, would lead to a combinatorial explosion. Another drawback of the extensive form description is that the states (nodes) and actions (arcs) are essentially finite or enumerable. In many models we want to deal with, actions and states will often be continuous variables. For such models we will need a different method of problem description.
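To get a feeling for this combinatorial explosion, one can count the nodes of a complete tree as a function of its branching factor b and depth d; the numbers below are purely illustrative.

```python
# Node count of a complete tree: 1 + b + b^2 + ... + b^d, i.e. the
# geometric sum (b**(d+1) - 1) // (b - 1). Illustrative figures only.

def tree_size(b, d):
    """Number of nodes in a complete tree of branching b and depth d."""
    return (b ** (d + 1) - 1) // (b - 1)

for d in (2, 10, 40):
    print(d, tree_size(4, d))   # grows exponentially in the depth
```

Even with only four moves available at each node, a forty-move game already has more nodes than could ever be enumerated, which is why the extensive form is impractical as a computational description.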
[Figure 2.1: A game in extensive form. Player 1 moves first at the node D_1, choosing one of his two actions; Player 2, without observing this choice, selects one of his two actions at the nodes D_2, which are enclosed in a single information set (the oval box); each action pair leads to a chance node E where Nature selects one of three equiprobable branches (probability 1/3 each) ending at terminal payoff nodes.]

Nevertheless the extensive form is useful in many ways. In particular it provides the fundamental illustration of the dynamic structure of a game. The ordering of the sequence of moves, highlighted by the extensive form, is present in most games. Dynamic games theory is also about the sequencing of actions and reactions. Here, however, different mathematical tools are used for the representation of the game dynamics. In particular, differential and/or difference equations are utilized for this purpose.
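As a foretaste of that machinery, here is a toy discrete-time state equation driven by two players' controls; the linear form and all numbers are invented for illustration.

```python
# A toy difference equation of the kind used later in the notes:
# the state evolves under the joint influence of both players' controls.
# x(t+1) = 0.9 * x(t) + u1(t) - u2(t); all values are illustrative.

def step(x, u1, u2):
    """One period of the state dynamics."""
    return 0.9 * x + u1 - u2

x = 10.0                          # initial state
for t in range(3):                # three periods, constant controls
    x = step(x, u1=1.0, u2=0.5)
print(round(x, 3))  # 8.645
```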
2.2.2 Comparing Random Perspectives
Due to Nature’s randomness, the players will have to compare and choose among
different random perspectives in their decision making. The fundamental decision
structure is described in Figure 2.2. If the player chooses action a_1 he faces a random perspective of expected value 100. If he chooses action a_2 he faces a sure gain of 100. If the player is risk neutral he will be indifferent between the two actions. If he is risk averse he will choose action a_2; if he is a risk lover he will choose action a_1.

[Figure 2.2: Decision in uncertainty. At the decision node D, action a_1 leads to a chance node E with three equiprobable branches (payoffs 0, 100 and 200; expected value 100), while action a_2 leads to the sure payoff 100.]

In order to
represent the attitude toward risk of a decision maker Von Neumann and Morgenstern
introduced the concept of cardinal utility [Von Neumann & Morgenstern 1944]. If one
accepts the axioms of utility theory then a rational player should take the action which
leads toward the random perspective with the highest expected utility .
This solves the problem of comparing random perspectives. However this also
introduces a new way to play the game. A player can set a random experiment in order
to generate his decision. Since he uses utility functions the principle of maximization
of expected utility permits him to compare deterministic action choices with random
ones.
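A small numerical sketch of these risk attitudes, assuming the lottery of Figure 2.2 with equiprobable outcomes 0, 100 and 200 (expected value 100), and using standard textbook utility functions that are not taken from the notes:

```python
# Comparing the sure gain with the random perspective of Figure 2.2
# under three attitudes toward risk. Utility functions are the usual
# textbook choices (identity, concave, convex), not from the notes.
import math

lottery = [(1/3, 0), (1/3, 100), (1/3, 200)]   # equiprobable outcomes
sure = 100

def expected_utility(u, lottery):
    return sum(p * u(z) for p, z in lottery)

u_neutral = lambda z: z              # risk neutral: linear utility
u_averse = lambda z: math.sqrt(z)    # risk averse: concave utility
u_lover = lambda z: z ** 2           # risk loving: convex utility

print(round(expected_utility(u_neutral, lottery), 6), u_neutral(sure))
print(expected_utility(u_averse, lottery) < u_averse(sure))   # True
print(expected_utility(u_lover, lottery) > u_lover(sure))     # True
```

The risk-neutral player is indifferent (both sides equal 100), the risk-averse player prefers the sure gain, and the risk lover prefers the lottery, exactly as described above.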
As a final reminder of the foundations of utility theory, let us recall that the Von Neumann-Morgenstern utility function is defined up to a positive affine transformation. This says that the players' choices will not be affected if the utilities are modified through such a transformation.
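This invariance is easy to check numerically; the lotteries and the utility function below are invented for illustration.

```python
# Ranking lotteries by expected utility is unchanged when the utility
# u is replaced by a*u + b with a > 0. Lotteries are illustrative.

lotteries = {
    "risky": [(0.5, 0), (0.5, 200)],
    "safe":  [(1.0, 90)],
}

def eu(u, lottery):
    return sum(p * u(z) for p, z in lottery)

u = lambda z: z ** 0.5          # some VNM utility function
v = lambda z: 3 * u(z) + 7      # a positive affine transform of u

rank_u = sorted(lotteries, key=lambda name: eu(u, lotteries[name]))
rank_v = sorted(lotteries, key=lambda name: eu(v, lotteries[name]))
print(rank_u == rank_v)  # True: the preference order is the same
```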
2.3 Additional concepts about information
What is known by the players who interact in a game is of paramount importance. We refer briefly to the concepts of complete and perfect information.
2.3.1 Complete and perfect information
The information structure of a game indicates what is known by each player at the time
the game starts and at each of his moves.
Complete vs Incomplete Information
Let us first consider the information available to the players when they enter the game.
• who the players are
• the set of actions available to all players
• all possible outcomes to all players.
A game with complete information and common knowledge is a game where all players have complete information and all players know that the other players have complete information.
Perfect vs Imperfect Information
We now consider the information available to a player when he decides about a specific move. In a game defined in its extensive form, if each information set consists of just one node, then we say that the players have perfect information. If that is not the case, the game is one of imperfect information.
Example 2.3.1 A game with simultaneous moves, as e.g. the one shown in Figure 2.1,
is of imperfect information.
Perfect recall
If the information structure is such that a player can always remember all the past moves he has selected, and the information he has received, then the game is one of perfect recall. Otherwise it is one of imperfect recall.
2.3.2 Commitment
A commitment is an action taken by a player that is binding on him and that is known
to the other players. In making a commitment a player can persuade the other players
to take actions that are favorable to him. To be effective, commitments have to be credible. A particular class of commitments is threats.
2.3.3 Binding agreement
Binding agreements are restrictions on the possible actions decided by two or more players, with a binding contract that forces the implementation of the agreement. Usually, to be binding, an agreement requires an outside authority that can monitor the agreement at no cost and impose on violators sanctions so severe that cheating is prevented.
2.4 Games in Normal Form
2.4.1 Playing games through strategies
Let M = {1, . . . , m} be the set of players. A pure strategy γ_j for Player j is a mapping which transforms the information available to Player j, at a decision node where he is making a move, into his set of admissible actions. We call strategy vector the m-tuple γ = (γ_j)_{j=1,...,m}. Once a strategy is selected by each player, the strategy vector γ is defined and the game is played as if it were controlled by an automaton⁴.
An outcome (expressed in terms of expected utility to each player if the game includes chance nodes) is associated with a strategy vector γ. We denote by Γ_j the set of strategies for Player j. Then the game can be represented by the m mappings

    V_j : Γ_1 × · · · × Γ_m → IR,   j ∈ M,

that associate a unique (expected utility) outcome V_j(γ) for each player j ∈ M with a given strategy vector γ ∈ Γ_1 × · · · × Γ_m. One then says that the game is defined in its normal form.

⁴This idea of playing games through the use of automata will be discussed in more detail when we present the folk theorem for repeated games in Part II.
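A minimal sketch of a game in normal form: each mapping V_j sends a strategy vector to a real number. The strategy names and payoffs below are invented for illustration.

```python
# A tiny two-player game in normal form. Each V_j maps a strategy
# vector (gamma_1, gamma_2) to a payoff. All values are illustrative.

Gamma1 = ["top", "bottom"]      # strategy set of Player 1
Gamma2 = ["left", "right"]      # strategy set of Player 2

V = {   # strategy vector -> (V_1, V_2)
    ("top", "left"): (3, 1), ("top", "right"): (0, 0),
    ("bottom", "left"): (1, 2), ("bottom", "right"): (2, 3),
}

def V1(gamma): return V[gamma][0]   # payoff mapping of Player 1
def V2(gamma): return V[gamma][1]   # payoff mapping of Player 2

print(V1(("top", "left")), V2(("top", "left")))  # 3 1
```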
2.4.2 From the extensive form to the strategic or normal form
We consider a simple two-player game, called "matching pennies". The rules of the game are as follows:

The game is played over two stages. At the first stage each player chooses head (H) or tail (T) without knowing the other player's choice. Then they reveal their choices to one another. If the coins do not match, Player 1 wins $5 and Player 2 loses $5. If the coins match, Player 2 wins $5 and Player 1 loses $5. At the second stage, the player who lost at stage 1 has the choice of either quitting the game (Q) or playing another penny-matching round (H, T) with the same type of payoffs as in the first stage.
The extensive form tree
This game is represented in its extensive form in Figure 2.3. The terminal payoffs rep
resent what Player 1 wins; Player 2 receives the opposite values. We have represented
the information structure in a slightly different way here. A dotted line connects the
different nodes forming an information set for a player. The player who has the move
is indicated on top of the graph.
Listing all strategies
In Table 2.1 we have identified the 12 different strategies that can be used by each of the two players in the game of matching pennies. Each player moves twice. At the first move the players have no information; at the second move they know what choices were made at the first stage. We can easily identify the whole set of possible strategies.
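The count of 12 strategies can be recovered programmatically. The encoding below, by win/lose contingencies, is a rearrangement of Table 2.1 (which indexes the second move by the opponent's first-stage choice); it is one way to see why there are exactly 12.

```python
# Enumerating one player's strategies in the matching pennies game:
# a strategy fixes the first move, the reply after losing stage 1
# (the loser may also quit), and the reply after winning stage 1.
from itertools import product

first_moves = ["H", "T"]
after_losing = ["Q", "H", "T"]   # the stage-1 loser may stop (Q)
after_winning = ["H", "T"]       # the winner simply plays again

strategies = [
    (first, lose_reply, win_reply)
    for first, lose_reply, win_reply
    in product(first_moves, after_losing, after_winning)
]
print(len(strategies))  # 12, as listed in Table 2.1
```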
Figure 2.3: The extensive form tree of the matching pennies game
[The figure shows the game tree: Player 1 first chooses H or T; Player 2, not observing that choice (dotted lines link the nodes of his information sets), chooses H or T; the loser of the first stage then chooses Q, H or T; if he continues, the second matching round is played; the leaves carry Player 1's payoffs, of magnitude 0, 5 or 10. The order of movers along the levels of the tree is P1, P2, P1, P2, P1.]
Payoff matrix
In Table 2.2 we have represented the payoffs that are obtained by Player 1 when both
players choose one of the 12 possible strategies.
      Strategies of Player 1              Strategies of Player 2
      1st      2nd move if Player 2      1st      2nd move if Player 1
      move     has played                move     has played
               H         T                        H         T
1 H Q H H H Q
2 H Q T H T Q
3 H H H H H H
4 H H T H T H
5 H T H H H T
6 H T T H T T
7 T H Q T Q H
8 T T Q T Q T
9 T H H T H H
10 T T H T H T
11 T H T T T H
12 T T T T T T
Table 2.1: List of strategies
2.4.3 Mixed and Behavior Strategies
Mixing strategies
Since a player evaluates outcomes according to his VNM utility function, he can envision "mixing" strategies by selecting one of them randomly, according to a lottery that he will define. This introduces one supplementary chance move into the game description.

For example, if Player j has p pure strategies γ_jk, k = 1, . . . , p, he can select the strategy he will play through a lottery which gives a probability x_jk to the pure strategy γ_jk, k = 1, . . . , p. Now the possible choices of action by Player j are elements of the set of all probability distributions

    A_j = { x_j = (x_jk)_{k=1,...,p} | x_jk ≥ 0, Σ_{k=1}^{p} x_jk = 1 }.

We note that the set A_j is compact and convex in IR^p.
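Playing a mixed strategy then amounts to one draw from the lottery x_j; a sketch with invented strategy names and weights:

```python
# Realizing a mixed strategy: x_j is a point of the simplex A_j, and
# each play of the game draws one pure strategy according to x_j.
# Names, weights and the seed are illustrative.
import random

pure_strategies = ["gamma_j1", "gamma_j2", "gamma_j3"]
x_j = [0.5, 0.3, 0.2]                    # x_jk >= 0 and sum to 1

assert all(w >= 0 for w in x_j) and abs(sum(x_j) - 1.0) < 1e-12

rng = random.Random(0)                   # fixed seed for reproducibility

def play_mixed():
    """One realization: draw a pure strategy according to x_j."""
    return rng.choices(pure_strategies, weights=x_j, k=1)[0]

draws = [play_mixed() for _ in range(10_000)]
freq = draws.count("gamma_j1") / 10_000
print(abs(freq - 0.5) < 0.05)  # True: empirical frequency near x_j1
```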
1 2 3 4 5 6 7 8 9 10 11 12
1 5 5 5 5 5 5 5 5 0 0 10 10
2 5 5 5 5 5 5 5 5 10 10 0 0
3 0 10 0 10 10 0 5 5 0 0 10 10
4 10 0 10 0 10 0 5 5 10 10 0 0
5 0 10 0 10 0 10 5 5 0 0 10 10
6 0 10 0 10 0 10 5 5 10 10 0 0
7 5 5 0 0 10 10 5 5 5 5 5 5
8 5 5 10 10 0 0 5 5 5 5 5 5
9 5 5 0 0 10 10 10 0 10 0 10 0
10 5 5 10 10 0 0 10 0 10 0 10 0
11 5 5 0 0 10 10 0 10 0 10 0 10
12 5 5 10 10 0 0 0 10 0 10 0 10
Table 2.2: Payoff matrix
Behavior strategies
A behavior strategy is defined as a mapping which associates with the information available to Player j, at a decision node where he is making a move, a probability distribution over his set of actions.

The difference between mixed and behavior strategies is subtle. In a mixed strategy, the player considers the set of possible strategies and picks one, at random, according to a carefully designed lottery. In a behavior strategy, the player designs a strategy that consists in deciding at each decision node according to a carefully designed lottery, this design being contingent on the information available at that node. In summary we can say that a behavior strategy is a strategy that includes randomness at each decision node. A famous theorem [Kuhn, 1953], which we give without proof, establishes that these two ways of introducing randomness in the choice of actions are equivalent in a large class of games.
Theorem 2.4.1 In an extensive game of perfect recall all mixed strategies can be represented as behavior strategies.
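A miniature illustration of this equivalence, with two invented decision nodes and illustrative probabilities: an independent lottery at each node (a behavior strategy) induces exactly the product distribution over the four pure strategies, i.e. a particular mixed strategy.

```python
# Kuhn's theorem in miniature: with perfect recall, per-node lotteries
# (a behavior strategy) induce a distribution over pure strategies
# equal to a product mixed strategy. Probabilities are illustrative.
from itertools import product

p_node1 = {"H": 0.6, "T": 0.4}   # lottery at the first decision node
p_node2 = {"H": 0.3, "T": 0.7}   # lottery at the second decision node

# The induced mixed strategy: one lottery over the four pure
# strategies, where a pure strategy fixes the move at both nodes.
mixed = {
    (a1, a2): p_node1[a1] * p_node2[a2]
    for a1, a2 in product("HT", "HT")
}

print(round(mixed[("H", "T")], 2))             # 0.42
print(abs(sum(mixed.values()) - 1.0) < 1e-12)  # True: a distribution
```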
Chapter 3
Solution concepts for noncooperative
games
3.1 Introduction

In this chapter we consider games described in their normal form and we propose different solution concepts under the assumption that the players do not cooperate. In noncooperative games the players do not communicate with each other and do not enter into negotiations for achieving a common course of action. They know that they are playing a game. They know how the actions, their own and those of the other players, will determine the payoffs of every player. However they will not cooperate.

To speak of a solution concept for a game one needs, first of all, to describe the game in its normal form. The solution of an m-player game will thus be a set of strategy vectors γ that have attractive properties expressed in terms of the payoffs received by the players.
Recall that an m-person game in normal form is defined by the following data

{M, (Γ_j)_{j ∈ M}, (V_j)_{j ∈ M}},

where M = {1, 2, ..., m} is the set of players and, for each player j ∈ M, Γ_j is the set of strategies (also called the strategy space) and V_j is the payoff function that assigns a real number V_j(γ) to each strategy vector γ ∈ Γ_1 × Γ_2 × ... × Γ_m.
In this chapter we shall study different classes of games in normal form. The first category consists of the so-called matrix games, describing a situation where two players are in a completely antagonistic situation, since what one player gains the other player loses, and where each player has a finite choice of strategies. Matrix games are also called two-player zero-sum finite games. The second category will consist of two-player games, again with a finite strategy set for each player, but where the payoffs are not zero-sum. These are the nonzero-sum matrix games or bimatrix games. The third category will be the so-called concave games, which encompass the previous classes of matrix and bimatrix games and for which we will be able to prove nice existence, uniqueness and stability results for a noncooperative game solution concept called equilibrium.
3.2 Matrix Games
Definition 3.2.1 A game is zero-sum if the sum of the players' payoffs is always zero. Otherwise the game is nonzero-sum. A two-player zero-sum game is also called a duel.
Definition 3.2.2 A two-player zero-sum game in which each player has only a finite number of actions to choose from is called a matrix game.
Let us explore how matrix games can be "solved". We number the players 1 and 2 respectively. Conventionally, Player 1 is the maximizer and has m (pure) strategies, say i = 1, 2, ..., m, and Player 2 is the minimizer and has n strategies to choose from, say j = 1, 2, ..., n. If Player 1 chooses strategy i while Player 2 picks strategy j, then Player 2 pays Player 1 the amount a_ij.¹ The set of all possible payoffs that Player 1 can obtain is represented in the form of the m × n matrix A with entries a_ij for i = 1, 2, ..., m and j = 1, 2, ..., n. Now, the element in the i-th row and j-th column of the matrix A corresponds to the amount that Player 2 will pay Player 1 if the latter chooses strategy i and the former chooses strategy j. Thus one can say that in the game under consideration, Player 1 (the maximizer) selects rows of A while Player 2 (the minimizer) selects columns of that matrix, and as the result of the play, Player 2 pays Player 1 the amount of money specified by the element of the matrix in the selected row and column.
Example 3.2.1 Consider a game defined by the following matrix:

[ 3   1   8 ]
[ 4  10   0 ]

¹ Negative payments are allowed. We could also have said that Player 1 receives the amount a_ij and Player 2 receives the amount −a_ij.
The question now is what can be considered as the players' best strategies.
One possibility is to consider the players' security levels. It is easy to see that if Player 1 chooses the first row, then, whatever Player 2 does, Player 1 will get a payoff equal to at least 1 (util²). By choosing the second row, on the other hand, Player 1 risks getting 0. Similarly, by choosing the first column Player 2 ensures that he will not have to pay more than 4, while the choice of the second or third column may cost him 10 or 8, respectively. Thus we say that Player 1's security level is 1, which is ensured by the choice of the first row, while Player 2's security level is 4, ensured by the choice of the first column. Notice that

1 = max_i min_j a_ij

and

4 = min_j max_i a_ij,

which is the reason why the strategy that ensures that Player 1 will get at least the payoff equal to his security level is called his maximin strategy. Symmetrically, the strategy that ensures that Player 2 will not have to pay more than his security level is called his minimax strategy.
Lemma 3.2.1 The following inequality holds

max_i min_j a_ij ≤ min_j max_i a_ij. (3.1)
Proof: The proof of this result is based on the remark that, since both security levels are achievable, they necessarily satisfy the inequality (3.1). We give below a more detailed proof.
We note the obvious set of inequalities

min_j a_kj ≤ a_kl ≤ max_i a_il (3.2)

which holds for all possible k and l. More formally, let (i_*, j_*) and (i^*, j^*) be defined by

a_{i_* j_*} = max_i min_j a_ij (3.3)

and

a_{i^* j^*} = min_j max_i a_ij (3.4)
² A util is the utility unit.
respectively. Now consider the payoff a_{i_* j^*}. Then, by (3.2) with k = i_* and l = j^*, we get

max_i (min_j a_ij) ≤ a_{i_* j^*} ≤ min_j (max_i a_ij).

QED.
An important observation is that if Player 1 has to move first and Player 2 acts having seen the move made by Player 1, then the maximin strategy is Player 1's best choice, which leads to the payoff equal to 1. If the situation is reversed and it is Player 2 who moves first, then his best choice will be the minimax strategy and he will have to pay 4. Now the question is what happens if the players move simultaneously. A careful study of the example shows that when the players move simultaneously the minimax and maximin strategies are not satisfactory "solutions" to this game. Notice that the players may try to improve their payoffs by anticipating each other's strategy. As a result, a process of mutual outguessing develops which, in this case, does not converge to any solution.
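The two security levels discussed above can be checked mechanically. The following sketch (plain Python, an illustration not taken from the text) computes the maximin and minimax values for the matrix of Example 3.2.1:

```python
# Security levels (maximin / minimax) for the matrix game of Example 3.2.1.
# Player 1 (maximizer) picks rows, Player 2 (minimizer) picks columns.
A = [[3, 1, 8],
     [4, 10, 0]]

# Player 1's security level: the best of the row minima (maximin).
maximin = max(min(row) for row in A)

# Player 2's security level: the best of the column maxima (minimax).
n_cols = len(A[0])
minimax = min(max(row[j] for row in A) for j in range(n_cols))

print(maximin, minimax)  # prints "1 4": the levels differ, so no pure saddle point
```

Since maximin (1) is strictly below minimax (4), inequality (3.1) is strict here, confirming that this game has no satisfactory pure strategy solution.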
Consider now another example.
Example 3.2.2 Let the matrix game A be given as follows

[ 10  −15*  20 ]
[ 20  −30   40 ]
[ 30  −45   60 ].
Can we ﬁnd satisfactory strategy pairs?
It is easy to see that

max_i min_j a_ij = max{−15, −30, −45} = −15

and

min_j max_i a_ij = min{30, −15, 60} = −15

and that the pair of maximin and minimax strategies is given by

(i, j) = (1, 2).

That means that Player 1 should choose the first row while Player 2 should select the second column, which will lead to the payoff equal to −15.
In the above example, we can see that the players’ maximin and minimax strategies
“solve” the game in the sense that the players will be best off if they use these strategies.
3.2.1 Saddle Points
Let us explore in more depth this class of strategies that solve the zero-sum matrix game.
Definition 3.2.3 If in a matrix game A = [a_ij], i = 1, ..., m; j = 1, ..., n, there exists a pair (i*, j*) such that, for all i = 1, ..., m and j = 1, ..., n,

a_{i j*} ≤ a_{i* j*} ≤ a_{i* j} (3.5)

we say that the pair (i*, j*) is a saddle point in pure strategies for the matrix game.
As an immediate consequence, we see that, at a saddle point of a zero-sum game, the security levels of the two players are equal, i.e.,

max_i min_j a_ij = min_j max_i a_ij = a_{i* j*}.
What is less obvious is the fact that, if the security levels are equal then there exists a
saddle point.
Lemma 3.2.2 If, in a matrix game, the following holds

max_i min_j a_ij = min_j max_i a_ij = v

then the game admits a saddle point in pure strategies.
Proof: Let i* and j* be a strategy pair that yields the security level payoffs v (resp. −v) for Player 1 (resp. Player 2). We thus have for all i = 1, ..., m and j = 1, ..., n

a_{i* j} ≥ min_j a_{i* j} = max_i min_j a_ij (3.6)

a_{i j*} ≤ max_i a_{i j*} = min_j max_i a_ij. (3.7)

Since

max_i min_j a_ij = min_j max_i a_ij = a_{i* j*} = v,

by (3.6)-(3.7) we obtain

a_{i j*} ≤ a_{i* j*} ≤ a_{i* j}

which is the saddle point condition. QED.
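Condition (3.5) can also be checked by direct enumeration. The fragment below (an illustrative Python sketch, not from the text) marks an entry as a saddle point when it is simultaneously the minimum of its row and the maximum of its column, and applies this to the matrix of Example 3.2.2:

```python
# Enumerative check of the saddle-point condition (3.5): a pair (i*, j*) is a
# pure-strategy saddle point when a[i*][j*] is at once a minimum of its row
# (Player 2 cannot gain by changing columns) and a maximum of its column
# (Player 1 cannot gain by changing rows).
def pure_saddle_points(A):
    m, n = len(A), len(A[0])
    return [(i, j) for i in range(m) for j in range(n)
            if A[i][j] == min(A[i])
            and A[i][j] == max(A[k][j] for k in range(m))]

# Example 3.2.2: the saddle point is (row 1, column 2), value -15.
A = [[10, -15, 20],
     [20, -30, 40],
     [30, -45, 60]]
print(pure_saddle_points(A))  # prints "[(0, 1)]" in 0-based indexing
```

Running the same function on the matrix of Example 3.2.1 returns an empty list, consistent with the earlier observation that that game has no pure strategy saddle point.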
Saddle point strategies provide a solution to the game problem even if the players
move simultaneously. Indeed, in Example 3.2.2, if Player 1 expects Player 2 to choose
the second column, then the ﬁrst row will be his optimal choice. On the other hand,
if Player 2 expects Player 1 to choose the ﬁrst row, then it will be optimal for him
to choose the second column. In other words, neither player can gain anything by
unilaterally deviating from his saddle point strategy. Each strategy constitutes the best
reply the player can have to the strategy choice of his opponent. This observation leads
to the following deﬁnition.
Remark 3.2.1 Using strategies i* and j*, Players 1 and 2 cannot improve their payoff by unilaterally deviating from i* or j* respectively. We call such strategies an equilibrium.
Saddle point strategies, as shown in Example 3.2.2, lead to both an equilibrium and a pair of guaranteed payoffs. Therefore such a strategy pair, if it exists, provides a solution to a matrix game which is "good" in the sense that rational players are likely to adopt it.
3.2.2 Mixed strategies
We have already indicated in Chapter 2 that a player could "mix" his strategies by resorting to a lottery to decide what to play. A reason to introduce mixed strategies in a matrix game is to enlarge the set of possible choices. We have noticed that, as in Example 3.2.1, many matrix games do not possess saddle points in the class of pure strategies. However, von Neumann proved that saddle point strategy pairs always exist in the class of mixed strategies. Consider the matrix game defined by an m × n matrix A. (As before, Player 1 has m strategies, Player 2 has n strategies.) A mixed strategy for Player 1 is an m-tuple

x = (x_1, x_2, ..., x_m)

where the x_i are nonnegative for i = 1, 2, ..., m, and x_1 + x_2 + ... + x_m = 1. Similarly, a mixed strategy for Player 2 is an n-tuple

y = (y_1, y_2, ..., y_n)

where the y_j are nonnegative for j = 1, 2, ..., n, and y_1 + y_2 + ... + y_n = 1.
Note that a pure strategy can be considered as a particular mixed strategy with one coordinate equal to one and all others equal to zero. The set of possible mixed strategies of Player 1 constitutes a simplex in the space IR^m. This is illustrated in Figure 3.1 for m = 3. Similarly, the set of mixed strategies of Player 2 is a simplex in IR^n. A simplex is, by construction, the smallest closed convex set that contains n + 1 given points in IR^n.
Figure 3.1: The simplex of mixed strategies
The interpretation of a mixed strategy, say x, is that Player 1 chooses his pure strategy i with probability x_i, i = 1, 2, ..., m. Since the two lotteries defining the random draws are independent events, the joint probability that the strategy pair (i, j) be selected is given by x_i y_j. Therefore, with each pair of mixed strategies (x, y) we can associate an expected payoff given by the quadratic expression (in x, y)³

Σ_{i=1}^m Σ_{j=1}^n x_i y_j a_ij = x^T A y.
The first important result of game theory, proved in [Von Neumann 1928], is the following.
Theorem 3.2.1 Any matrix game has a saddle point in the class of mixed strategies, i.e., there exist probability vectors x* and y* such that

max_x min_y x^T A y = min_y max_x x^T A y = (x*)^T A y* = v*

where v* is called the value of the game.
³ The superscript T denotes the transposition operator on a matrix.
We shall not repeat the complex proof given by von Neumann. Instead, we shall relate the search for saddle points to the solution of linear programs. A well known duality property will then give us the saddle point existence result.
3.2.3 Algorithms for the Computation of Saddle Points
Matrix games can be solved as linear programs. It is easy to show that the following two relations hold:

v* = max_x min_y x^T A y = max_x min_j Σ_{i=1}^m x_i a_ij (3.8)

and

z* = min_y max_x x^T A y = min_y max_i Σ_{j=1}^n y_j a_ij. (3.9)
These two relations imply that the value of the matrix game can be obtained by solving either of the following two linear programs:

1. Primal problem

max v
subject to
v ≤ Σ_{i=1}^m x_i a_ij,  j = 1, 2, ..., n
1 = Σ_{i=1}^m x_i
x_i ≥ 0,  i = 1, 2, ..., m

2. Dual problem

min z
subject to
z ≥ Σ_{j=1}^n y_j a_ij,  i = 1, 2, ..., m
1 = Σ_{j=1}^n y_j
y_j ≥ 0,  j = 1, 2, ..., n
The following theorem relates the two programs together.
Theorem 3.2.2 (Von Neumann [Von Neumann 1928]): Any finite two-person zero-sum matrix game A has a value.
Proof: The value v* of the zero-sum matrix game A is obtained as the common optimal value of the following pair of dual linear programming problems. The respective optimal programs define the saddle-point mixed strategies.

Primal:  max v  subject to  x^T A ≥ v 1^T,  x^T 1 = 1,  x ≥ 0
Dual:    min z  subject to  A y ≤ z 1,  1^T y = 1,  y ≥ 0

where 1 = (1, ..., 1)^T denotes a vector of appropriate dimension with all components equal to 1. One needs to solve only one of the programs. The primal and dual solutions give a pair of saddle point strategies. •
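For intuition about relation (3.8), note that when Player 1 has only two rows his mixed strategy is x = (t, 1 − t), so the outer maximization can be scanned on a one-dimensional grid. The sketch below (illustrative Python, a brute-force stand-in for the linear program, applied to the matrix solved in Example 3.2.3 below by a different method) recovers the value 1/2 at t = 3/4:

```python
# Brute-force illustration of relation (3.8): for m = 2 a mixed strategy is
# x = (t, 1 - t), so v* = max_x min_j sum_i x_i a_ij can be scanned on a grid.
# This is only a sanity check; the linear programs above solve the general case.
A = [[1, 0],
     [-1, 2]]

def inner_min(t):
    # expected payoff of each column against x = (t, 1 - t); Player 2 minimizes
    return min(t * A[0][j] + (1 - t) * A[1][j] for j in range(2))

grid = [k / 1000 for k in range(1001)]
t_best = max(grid, key=inner_min)
print(t_best, inner_min(t_best))  # close to t = 0.75, value v* = 0.5
```

The inner minimum is the lower envelope of the two column payoffs, min(2t − 1, 2 − 2t), whose peak at t = 3/4 gives the game value 1/2.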
Remark 3.2.2 Simple n × n games can be solved more easily (see [Owen, 1982]). Suppose A is an n × n matrix game which does not have a saddle point in pure strategies. The players' unique saddle point mixed strategies and the game value are given by:

x = (1^T A^D) / (1^T A^D 1) (3.10)

y = (A^D 1) / (1^T A^D 1) (3.11)

v = det A / (1^T A^D 1) (3.12)

where A^D is the adjoint matrix of A, det A the determinant of A, and 1 the vector of ones as before.
Let us illustrate the usefulness of the above formulae on the following example.
Example 3.2.3 We want to solve the matrix game

[  1  0 ]
[ −1  2 ].
The game, obviously, has no saddle point (in pure strategies). The adjoint A^D is

[ 2  0 ]
[ 1  1 ]

and 1^T A^D = [3  1], A^D 1 = [2  2]^T, 1^T A^D 1 = 4, det A = 2. Hence the best mixed strategies for the players are:

x = [3/4, 1/4],  y = [1/2, 1/2]
and the value of the play is:

v = 1/2.

In other words, in the long run Player 1 can expect to win 0.5 per play if he uses the first row 75% of the time and the second row 25% of the time. Player 2's best strategy will be to use the first and the second column 50% of the time each, which ensures him a loss of (only) 0.5; using other strategies he should expect to lose more.
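The computation with formulas (3.10)-(3.12) can be replayed in code. The sketch below (illustrative Python, using exact rational arithmetic from the standard fractions module) solves the game of Example 3.2.3:

```python
# The 2 x 2 case of formulas (3.10)-(3.12), applied to the game of Example 3.2.3.
# For a 2 x 2 matrix [[a, b], [c, d]] the adjoint (adjugate) is [[d, -b], [-c, a]].
from fractions import Fraction as F

A = [[F(1), F(0)],
     [F(-1), F(2)]]
AD = [[A[1][1], -A[0][1]],
      [-A[1][0], A[0][0]]]                 # adjoint A^D
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

ones_AD = [AD[0][j] + AD[1][j] for j in range(2)]   # row vector 1^T A^D
AD_ones = [AD[i][0] + AD[i][1] for i in range(2)]   # column vector A^D 1
s = sum(ones_AD)                                    # scalar 1^T A^D 1

x = [c / s for c in ones_AD]   # Player 1's mixed strategy: x = [3/4, 1/4]
y = [c / s for c in AD_ones]   # Player 2's mixed strategy: y = [1/2, 1/2]
v = det / s                    # game value: v = 1/2
```

The exact fractions reproduce the values derived in the example: x = [3/4, 1/4], y = [1/2, 1/2] and v = 1/2.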
3.3 Bimatrix Games
We shall now extend the theory to the case of a nonzero-sum game. A bimatrix game conveniently represents a two-person nonzero-sum game where each player has a finite set of possible pure strategies.
In a bimatrix game there are two players, say Player 1 and Player 2, who have m and n pure strategies to choose from, respectively. Now, if the players select a pair of pure strategies, say (i, j), then Player 1 obtains the payoff a_ij and Player 2 obtains b_ij, where a_ij and b_ij are some given numbers. The payoffs for the two players corresponding to all possible combinations of pure strategies can be represented by two m × n payoff matrices A and B with entries a_ij and b_ij respectively (hence the name).
Notice that a (zero-sum) matrix game is a bimatrix game where b_ij = −a_ij. When a_ij + b_ij = 0 the game is a zero-sum matrix game; otherwise, the game is nonzero-sum. (As a_ij and b_ij are the players' payoffs, this conclusion agrees with Definition 3.2.1.)
Example 3.3.1 Consider the bimatrix game defined by the following matrices

A = [ 52  44  44 ]    and    B = [ 50  44  41 ]
    [ 42  46  39 ]               [ 42  49  43 ].
It is often convenient to combine the data contained in the two matrices and write it in the form of one matrix whose entries are ordered pairs (a_ij, b_ij). In this case one obtains

[ (52, 50)*  (44, 44)   (44, 41) ]
[ (42, 42)   (46, 49)*  (39, 43) ].

In the above bimatrix some cells have been marked with asterisks *. They correspond to outcomes resulting from equilibria, a concept that we shall introduce now.
If Player 1 chooses strategy "row 1", the best reply by Player 2 is to choose strategy "column 1", and vice versa. Therefore we say that the outcome (52, 50) is associated with the equilibrium pair ("row 1", "column 1").
The situation is the same for the outcome (46, 49), which is associated with another equilibrium pair ("row 2", "column 2").
We already see in this simple example that a bimatrix game can have many equilibria. However, there are other examples where no equilibrium can be found in pure strategies. So, as we have already done with (zero-sum) matrix games, let us expand the strategy sets to include mixed strategies.
3.3.1 Nash Equilibria
Assume that the players may use mixed strategies. For zero-sum matrix games we have shown the existence of a saddle point in mixed strategies which exhibits equilibrium properties. We now formulate the Nash equilibrium concept for bimatrix games. The same concept will be defined later on for the more general m-player case.
Definition 3.3.1 A pair of mixed strategies (x*, y*) is said to be a Nash equilibrium of the bimatrix game if

1. (x*)^T A y* ≥ x^T A y* for every mixed strategy x, and
2. (x*)^T B y* ≥ (x*)^T B y for every mixed strategy y.
In an equilibrium, no player can improve his payoff by deviating unilaterally from his
equilibrium strategy.
Remark 3.3.1 A Nash equilibrium extends to nonzero-sum games the equilibrium property that was observed for the saddle point solution of a zero-sum matrix game. The big difference with the saddle point concept is that, in a nonzero-sum context, the equilibrium strategy of a player does not guarantee him that he will receive at least the equilibrium payoff. Indeed, if his opponent does not play "well", i.e. does not use his equilibrium strategy, the outcome for a player can be anything; there is no guarantee.
Another important step in the development of the theory of games has been the following theorem [Nash, 1951].
Theorem 3.3.1 Every finite bimatrix game has at least one Nash equilibrium in mixed strategies.
Proof: A more general existence proof, including the case of bimatrix games, will be given in the next section. •
3.3.2 Shortcomings of the Nash equilibrium concept
Multiple equilibria
As noticed in Example 3.3.1, a bimatrix game may have several equilibria in pure strategies. There may be additional equilibria in mixed strategies as well. The nonuniqueness of Nash equilibria in bimatrix games is a serious theoretical and practical problem. In Example 3.3.1 one equilibrium strictly dominates the other, i.e., gives both players higher payoffs. Thus, it can be argued that even without any consultations the players will naturally pick the strategy pair (i, j) = (1, 1).
Example 3.3.2 Consider the following bimatrix game

[ (2, 1)*  (0, 0)  ]
[ (0, 0)   (1, 2)* ]
It is easy to see that this game has two equilibria (in pure strategies), neither of which dominates the other. Moreover, Player 1 will obviously prefer the solution (1, 1), while Player 2 would rather have (2, 2). It is difficult to decide how this game should be played if the players are to arrive at their decisions independently of one another.
The prisoner's dilemma
There is a famous example of a bimatrix game that is used in many contexts to argue that the Nash equilibrium solution is not always a "good" solution to a noncooperative game.
Example 3.3.3 Suppose that two suspects are held on the suspicion of committing
a serious crime. Each of them can be convicted only if the other provides evidence
against him, otherwise he will be convicted as guilty of a lesser charge. By agreeing
to give evidence against the other guy, a suspect can shorten his sentence by half.
Of course, the prisoners are held in separate cells and cannot communicate with each
other. The situation is as described in Table 3.1 with the entries giving the length of the
prison sentence for each suspect, in every possible situation. In this case, the players
are assumed to minimize rather than maximize the outcome of the play.

                     Suspect II:
Suspect I:           refuses     agrees to testify
refuses              (2, 2)      (10, 1)
agrees to testify    (1, 10)     (5, 5)*

Table 3.1: The Prisoner's Dilemma.

The unique Nash equilibrium of this game is given by the pair of pure strategies (agrees to testify, agrees to testify), with the outcome that both suspects will spend five years in prison. This outcome is strictly dominated by the strategy pair (refuses to testify, refuses to testify), which however is not an equilibrium and thus is not a realistic solution of the problem when the players cannot make binding agreements. This example shows that Nash equilibria can result in outcomes that are very far from efficiency.
3.3.3 Algorithms for the Computation of Nash Equilibria in Bimatrix Games
Linear programming is closely associated with the characterization and computation of saddle points in matrix games. For bimatrix games one has to rely on algorithms solving either quadratic programming or complementarity problems. There are also a few algorithms (see [Aumann, 1989], [Owen, 1982]) which permit us to find an equilibrium of simple bimatrix games. We will show one for a 2 × 2 bimatrix game and then introduce the quadratic programming [Mangasarian and Stone, 1964] and complementarity problem [Lemke & Howson 1964] formulations.
Equilibrium computation in a 2 × 2 bimatrix game
For a simple 2 × 2 bimatrix game one can easily find a mixed strategy equilibrium, as shown in the following example.
Example 3.3.4 Consider the game with the payoff matrix given below.

[ (1, 0)       (0, 1) ]
[ (1/2, 1/3)   (1, 0) ]
Notice that this game has no pure strategy equilibrium. Let us find a mixed strategy equilibrium.
Assume Player 2 chooses his equilibrium strategy y (i.e. 100y% of the time use the first column, 100(1 − y)% of the time use the second column) in such a way that Player 1 (in equilibrium) will get as much payoff using the first row as using the second row, i.e.

1·y + 0·(1 − y) = (1/2)·y + 1·(1 − y).

This is true for y* = 2/3.
Symmetrically, assume Player 1 is using a strategy x (i.e. 100x% of the time use the first row, 100(1 − x)% of the time use the second row) such that Player 2 will get as much payoff using the first column as using the second column, i.e.

0·x + (1/3)·(1 − x) = 1·x + 0·(1 − x).

This is true for x* = 1/4. The players' equilibrium payoffs will be, respectively, 2/3 and 1/4.
Then the pair of mixed strategies

((x*, 1 − x*), (y*, 1 − y*))

is an equilibrium in mixed strategies.
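The indifference computation above works for any nondegenerate 2 × 2 bimatrix game. The sketch below (an illustrative Python fragment, using exact rational arithmetic) solves the indifference equations in closed form for the game of Example 3.3.4:

```python
# Mixed equilibrium of the 2 x 2 bimatrix game of Example 3.3.4 via the
# indifference conditions: each player's mixture makes the opponent
# indifferent between his two pure strategies.
from fractions import Fraction as F

A = [[F(1), F(0)], [F(1, 2), F(1)]]   # Player 1's payoffs
B = [[F(0), F(1)], [F(1, 3), F(0)]]   # Player 2's payoffs

# y makes Player 1 indifferent between the rows:
#   y*A[0][0] + (1-y)*A[0][1] = y*A[1][0] + (1-y)*A[1][1]
y = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])   # y = 2/3

# x makes Player 2 indifferent between the columns:
#   x*B[0][0] + (1-x)*B[1][0] = x*B[0][1] + (1-x)*B[1][1]
x = (B[1][1] - B[1][0]) / (B[0][0] - B[1][0] - B[0][1] + B[1][1])   # x = 1/4

# equilibrium payoffs: 2/3 for Player 1, 1/4 for Player 2
p1 = x * (y * A[0][0] + (1 - y) * A[0][1]) \
     + (1 - x) * (y * A[1][0] + (1 - y) * A[1][1])
p2 = x * (y * B[0][0] + (1 - y) * B[0][1]) \
     + (1 - x) * (y * B[1][0] + (1 - y) * B[1][1])
```

The closed-form quotients are just the solved indifference equations; they reproduce x* = 1/4, y* = 2/3 and the payoffs 2/3 and 1/4 computed in the example.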
Links between quadratic programming and Nash equilibria in bimatrix games
Mangasarian and Stone (1964) have proved the following result that links quadratic programming with the search for equilibria in bimatrix games. Consider a bimatrix game (A, B). We associate with it the quadratic program
max [x^T A y + x^T B y − v_1 − v_2] (3.13)

s.t.

A y ≤ v_1 1_m (3.14)
B^T x ≤ v_2 1_n (3.15)
x, y ≥ 0 (3.16)
x^T 1_m = 1 (3.17)
y^T 1_n = 1 (3.18)
v_1, v_2 ∈ IR. (3.19)
Lemma 3.3.1 The following two assertions are equivalent:

(i) (x, y, v_1, v_2) is a solution to the quadratic programming problem (3.13)-(3.19);
(ii) (x, y) is an equilibrium for the bimatrix game.
Proof: From the constraints it follows that x^T A y ≤ v_1 and x^T B y ≤ v_2 for any feasible (x, y, v_1, v_2). Hence the maximum of the program is at most 0. Assume that (x, y) is an equilibrium for the bimatrix game. Then the quadruple

(x, y, v_1 = x^T A y, v_2 = x^T B y)

is feasible, i.e. satisfies (3.14)-(3.19), and gives the objective function (3.13) the value 0. Hence the equilibrium defines a solution to the quadratic programming problem (3.13)-(3.19).
Conversely, let (x*, y*, v_1*, v_2*) be a solution to the quadratic programming problem (3.13)-(3.19). We know that an equilibrium exists for a bimatrix game (Nash theorem). We know that this equilibrium is a solution to the quadratic programming problem (3.13)-(3.19) with optimal value 0. Hence the optimal program (x*, y*, v_1*, v_2*) must also give the value 0 to the objective function and thus be such that

(x*)^T A y* + (x*)^T B y* = v_1* + v_2*. (3.20)
For any x ≥ 0 and y ≥ 0 such that x^T 1_m = 1 and y^T 1_n = 1 we have, by (3.14), (3.15), (3.17) and (3.18),

x^T A y* ≤ v_1*
(x*)^T B y ≤ v_2*.
In particular we must have

(x*)^T A y* ≤ v_1*
(x*)^T B y* ≤ v_2*.

These two conditions with (3.20) imply

(x*)^T A y* = v_1*
(x*)^T B y* = v_2*.
Therefore we can conclude that, for any x ≥ 0 and y ≥ 0 such that x^T 1_m = 1 and y^T 1_n = 1,

x^T A y* ≤ (x*)^T A y*
(x*)^T B y ≤ (x*)^T B y*

and hence (x*, y*) is a Nash equilibrium for the bimatrix game. QED
A Complementarity Problem Formulation
We have seen that the search for equilibria can be done by solving a quadratic programming problem. Here we show that the solution of a bimatrix game can also be obtained as the solution of a complementarity problem.
There is no loss of generality if we assume that the payoff matrices are m × n and have only positive entries (A > 0 and B > 0). This is not restrictive, since VNM utilities are defined up to an increasing affine transformation. A strategy for Player 1 is defined as a vector x ∈ IR^m that satisfies

x ≥ 0 (3.21)
x^T 1_m = 1 (3.22)

and similarly for Player 2

y ≥ 0 (3.23)
y^T 1_n = 1. (3.24)
It is easily shown that a pair (x*, y*) satisfying (3.21)-(3.24) is an equilibrium iff

((x*)^T A y*) 1_m ≥ A y*   (A > 0)
((x*)^T B y*) 1_n ≥ B^T x*   (B > 0)
(3.25)

i.e. iff the equilibrium condition is satisfied for pure strategy alternatives only.
Consider the following set of constraints, with v_1 ∈ IR and v_2 ∈ IR:

v_1 1_m ≥ A y*
v_2 1_n ≥ B^T x*

and

(x*)^T (A y* − v_1 1_m) = 0
(y*)^T (B^T x* − v_2 1_n) = 0.
(3.26)
The relations on the right are called complementarity constraints. For mixed strategies (x*, y*) satisfying (3.21)-(3.24), they simplify to (x*)^T A y* = v_1 and (x*)^T B y* = v_2. This shows that the above system (3.26) of constraints is equivalent to the system (3.25).
Defining s_1 = x/v_2, s_2 = y/v_1 and introducing the slack variables u_1 and u_2, the system of constraints (3.21)-(3.24) and (3.26) can be rewritten as

(u_1, u_2) = (1_m, 1_n) − [ 0  A ; B^T  0 ] (s_1, s_2) (3.27)
0 = (u_1, u_2)^T (s_1, s_2) (3.28)
0 ≤ (u_1, u_2) (3.29)
0 ≤ (s_1, s_2). (3.30)
Introducing obvious new variables permits us to rewrite (3.27)-(3.30) in the generic formulation

u = q + M s (3.31)
0 = u^T s (3.32)
u ≥ 0 (3.33)
s ≥ 0 (3.34)

of a so-called complementarity problem.
A pivoting algorithm ([Lemke & Howson 1964], [Lemke 1965]) has been proposed to solve such problems. This algorithm also applies to quadratic programming, which confirms that solving a bimatrix game is of the same level of difficulty as solving a quadratic programming problem.
Remark 3.3.2 Once we obtain s_1 and s_2, a solution to (3.27)-(3.30), we reconstruct the equilibrium strategies through the formulae

x = s_1/(s_1^T 1_m) (3.35)
y = s_2/(s_2^T 1_n). (3.36)
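The whole construction can be traced on the game of Example 3.3.4. The sketch below (illustrative Python, exact arithmetic) first shifts the payoffs by +1 so that A > 0 and B > 0, an affine change that leaves the equilibrium unchanged, then verifies the complementarity conditions at s_1 = x/v_2, s_2 = y/v_1 and recovers the strategies via (3.35)-(3.36):

```python
# Tracing the complementarity formulation (3.27)-(3.30) on Example 3.3.4.
from fractions import Fraction as F

shift = F(1)                            # makes all payoffs positive
A = [[F(1) + shift, F(0) + shift], [F(1, 2) + shift, F(1) + shift]]
B = [[F(0) + shift, F(1) + shift], [F(1, 3) + shift, F(0) + shift]]
x = [F(1, 4), F(3, 4)]                  # equilibrium of Example 3.3.4
y = [F(2, 3), F(1, 3)]

Ay = [A[i][0] * y[0] + A[i][1] * y[1] for i in range(2)]    # A y
BTx = [B[0][j] * x[0] + B[1][j] * x[1] for j in range(2)]   # B^T x
v1 = x[0] * Ay[0] + x[1] * Ay[1]        # x^T A y > 0
v2 = y[0] * BTx[0] + y[1] * BTx[1]      # x^T B y > 0

s1 = [xi / v2 for xi in x]              # s_1 = x / v_2
s2 = [yj / v1 for yj in y]              # s_2 = y / v_1
u1 = [1 - (A[i][0] * s2[0] + A[i][1] * s2[1]) for i in range(2)]   # 1_m - A s_2
u2 = [1 - (B[0][j] * s1[0] + B[1][j] * s1[1]) for j in range(2)]   # 1_n - B^T s_1

assert all(c >= 0 for c in u1 + u2 + s1 + s2)               # (3.29)-(3.30)
comp = sum(u1[i] * s1[i] for i in range(2)) + sum(u2[j] * s2[j] for j in range(2))
assert comp == 0                                            # (3.28)

# recover the strategies as in (3.35)-(3.36)
x_rec = [c / sum(s1) for c in s1]
y_rec = [c / sum(s2) for c in s2]
```

Normalizing s_1 and s_2 returns exactly the original mixed strategies, as Remark 3.3.2 states.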
3.4 Concave m-Person Games
The nonuniqueness of equilibria in bimatrix games, and a fortiori in m-player matrix games, poses a delicate problem. If there are many equilibria, in a situation where one assumes that the players cannot communicate or enter into preplay negotiations, how will a given player choose among the strategies corresponding to the different equilibrium candidates? In single agent optimization theory we know that strict concavity of the (maximized) objective function, together with compactness and convexity of the constraint set, leads to existence and uniqueness of the solution. The following question thus arises:
Can we generalize the mathematical programming approach to a situation where the optimization criterion is a Nash-Cournot equilibrium? Can we then give sufficient conditions for existence and uniqueness of an equilibrium solution?
The answer has been given by Rosen in a seminal paper [Rosen, 1965] dealing with concave m-person games.
A concave m-person game is described in terms of individual strategies represented by vectors in compact subsets of Euclidean spaces (IR^{m_j} for Player j) and by payoffs represented, for each player, by a continuous function which is concave w.r.t. his own strategic variable. This is a generalization of the concept of mixed strategies introduced in the previous sections. Indeed, in a matrix or a bimatrix game the mixed strategies of a player are represented as elements of a simplex, i.e. a compact convex set, and the payoffs are bilinear or multilinear forms of the strategies; hence, for each player the payoff is concave w.r.t. his own strategic variable. This structure is thus generalized in two ways: (i) the strategies can be vectors constrained to be in a more general compact convex set and (ii) the payoffs are represented by more general continuous concave functions.
Let us thus introduce the following game in strategic form:

• Each player j ∈ M = {1, ..., m} controls the action u_j ∈ U_j, a compact convex subset of IR^{m_j}, where m_j is a given integer, and gets a payoff ψ_j(u_1, ..., u_j, ..., u_m), where ψ_j : U_1 × ... × U_j × ... × U_m → IR is continuous in each u_i and concave in u_j.

• A coupled constraint is defined as a proper subset 𝒰 of U_1 × ... × U_j × ... × U_m. The constraint is that the joint action u = (u_1, ..., u_m) must be in 𝒰.
Definition 3.4.1 An equilibrium under the coupled constraint 𝒰 is defined as a decision m-tuple (u_1*, ..., u_j*, ..., u_m*) ∈ 𝒰 such that for each player j ∈ M

ψ_j(u_1*, ..., u_j*, ..., u_m*) ≥ ψ_j(u_1*, ..., u_j, ..., u_m*) (3.37)

for all u_j ∈ U_j s.t. (u_1*, ..., u_j, ..., u_m*) ∈ 𝒰. (3.38)
Remark 3.4.1 The consideration of a coupled constraint is a new feature. Now each player's strategy space may depend on the strategies of the other players. This may look awkward in the context of noncooperative games where the players cannot enter into communication or cannot coordinate their actions. However, the concept is mathematically well defined. We shall see later on that it fits very well some interesting aspects of environmental management. One can think, for example, of a global emission constraint that is imposed on a finite set of firms that are competing on the same market. This environmental example will be further developed in forthcoming chapters.
3.4.1 Existence of Coupled Equilibria
Definition 3.4.2 A coupled equilibrium is a vector u* such that

ψ_j(u*) = max_{u_j} {ψ_j(u_1*, ..., u_j, ..., u_m*) | (u_1*, ..., u_j, ..., u_m*) ∈ 𝒰}. (3.39)

At such a point no player can improve his payoff by a unilateral change in his strategy which keeps the combined vector in 𝒰.
Let us first show that an equilibrium is actually defined through a fixed point condition. For that purpose we introduce a so-called global reaction function θ, defined for u, v ∈ 𝒰 and a positive weighting r by

θ(u, v, r) = Σ_{j=1}^m r_j ψ_j(u_1, ..., v_j, ..., u_m), (3.40)

where the r_j > 0, j = 1, ..., m, are given weights. The precise role of this weighting scheme will be explained later. For the moment we could as well take r_j ≡ 1. Notice that, even if u and v are in 𝒰, the combined vectors (u_1, ..., v_j, ..., u_m) are elements of the larger set U_1 × ... × U_m. This function is continuous in u and concave in v for every fixed u. We call it a reaction function since the vector v can be interpreted as composed of the reactions of the different players to the given vector u. This function is helpful as shown in the following result.
Lemma 3.4.1 Let u* ∈ 𝒰 be such that

θ(u*, u*, r) = max_{u ∈ 𝒰} θ(u*, u, r). (3.41)

Then u* is a coupled equilibrium.
Proof: Assume u* satisfies (3.41) but is not a coupled equilibrium, i.e. does not satisfy (3.39). Then, for one player, say ℓ, there would exist a vector

ū = (u_1*, ..., u_ℓ, ..., u_m*) ∈ 𝒰

such that

ψ_ℓ(u_1*, ..., u_ℓ, ..., u_m*) > ψ_ℓ(u*).

Then we would also have θ(u*, ū, r) > θ(u*, u*, r), which contradicts (3.41). QED
This result has two important consequences.

1. It shows that proving the existence of an equilibrium reduces to proving that a fixed point exists for an appropriately defined reaction mapping ($u^*$ is the best reply to $u^*$ in (3.41));

2. it associates with an equilibrium an implicit maximization problem, defined in (3.41). We say that this problem is implicit since it is defined in terms of its solution $u^*$.
To make the fixed-point argument more precise we introduce a coupled reaction mapping.

Definition 3.4.3 The point-to-set mapping

$$\Gamma(u, r) = \{ v \mid \theta(u, v, r) = \max_{w \in \mathcal{U}} \theta(u, w, r) \} \qquad (3.42)$$

is called the coupled reaction mapping associated with the positive weighting $r$. A fixed point of $\Gamma(\cdot, r)$ is a vector $u^*$ such that $u^* \in \Gamma(u^*, r)$.
By Lemma 3.4.1 a fixed point of $\Gamma(\cdot, r)$ is a coupled equilibrium.

Theorem 3.4.1 For any positive weighting $r$ there exists a fixed point of $\Gamma(\cdot, r)$, i.e. a point $u^*$ s.t. $u^* \in \Gamma(u^*, r)$. Hence a coupled equilibrium exists.

Proof: The proof is based on the Kakutani fixed-point theorem given in the appendix of Section 3.8. One is required to show that the point-to-set mapping is upper semicontinuous. QED
Remark 3.4.2 This existence theorem is very close, in spirit, to the theorem of Nash. It uses a fixed-point result which is "topological" and not "constructive", i.e. it does not provide a computational method. However, the definition of a normalized equilibrium introduced by Rosen establishes a link between mathematical programming and concave games with coupled constraints.
3.4.2 Normalized Equilibria

Kuhn-Tucker multipliers

Suppose that the coupled constraint (3.39) can be defined by a set of inequalities

$$h_k(u) \geq 0, \quad k = 1, \ldots, p, \qquad (3.43)$$

where $h_k : U_1 \times \cdots \times U_m \to \mathbb{R}$, $k = 1, \ldots, p$, are given concave functions. Let us further assume that the payoff functions $\psi_j(\cdot)$ as well as the constraint functions $h_k(\cdot)$ are continuously differentiable and satisfy the constraint qualification conditions, so that Kuhn-Tucker multipliers exist for each of the implicit single-agent optimization problems defined below.
Assume all players other than Player $j$ use their strategies $u^*_i$, $i \in M$, $i \neq j$. Then the equilibrium conditions (3.37)-(3.39) define a single-agent optimization problem with concave objective function and convex compact admissible set. Under the assumed constraint qualification there exists a vector of Kuhn-Tucker multipliers $\lambda_j = (\lambda_{jk})_{k=1,\ldots,p}$ such that the Lagrangean

$$L_j([u^{*-j}, u_j], \lambda_j) = \psi_j([u^{*-j}, u_j]) + \sum_{k=1}^{p} \lambda_{jk}\, h_k([u^{*-j}, u_j]) \qquad (3.44)$$

verifies, at the optimum,

$$0 = \frac{\partial}{\partial u_j} L_j([u^{*-j}, u^*_j], \lambda_j) \qquad (3.45)$$
$$0 \leq \lambda_j \qquad (3.46)$$
$$0 = \lambda_{jk}\, h_k([u^{*-j}, u^*_j]), \quad k = 1, \ldots, p. \qquad (3.47)$$
Definition 3.4.4 We say that the equilibrium is normalized if the different multipliers $\lambda_j$, $j \in M$, are colinear with a common vector $\lambda_0$, namely

$$\lambda_j = \frac{1}{r_j} \lambda_0, \qquad (3.48)$$

where $r_j > 0$ is a given weighting of the players.
Actually, this common multiplier $\lambda_0$ is associated with the implicit mathematical programming problem

$$\max_{u \in \mathcal{U}} \theta(u^*, u, r) \qquad (3.49)$$

to which we associate the Lagrangean

$$L_0(u, \lambda_0) = \sum_{j \in M} r_j\, \psi_j([u^{*-j}, u_j]) + \sum_{k=1}^{p} \lambda_{0k}\, h_k(u) \qquad (3.50)$$
and the first-order necessary conditions

$$0 = \frac{\partial}{\partial u_j} \Big\{ r_j\, \psi_j(u^*) + \sum_{k=1}^{p} \lambda_{0k}\, h_k(u^*) \Big\}, \quad j \in M \qquad (3.51)$$
$$0 \leq \lambda_0 \qquad (3.52)$$
$$0 = \lambda_{0k}\, h_k(u^*), \quad k = 1, \ldots, p. \qquad (3.53)$$
An economic interpretation
The multiplier, in a mathematical programming framework, can be interpreted as a marginal cost associated with the right-hand side of the constraint. More precisely, it indicates the sensitivity of the optimal solution to marginal changes in this right-hand side. The multiplier also permits a price decentralization, in the sense that through an ad hoc pricing mechanism the optimizing agent is induced to satisfy the constraints. In a normalized equilibrium, the shadow-cost interpretation is not so apparent; however, the price decomposition principle is still valid. Once the common multiplier has been defined, with the associated weighting $r_j > 0$, $j = 1, \ldots, m$, the coupled constraint will be satisfied by equilibrium-seeking players when they use as payoffs the Lagrangeans

$$L_j([u^{*-j}, u_j], \lambda_j) = \psi_j([u^{*-j}, u_j]) + \frac{1}{r_j} \sum_{k=1}^{p} \lambda_{0k}\, h_k([u^{*-j}, u_j]), \quad j = 1, \ldots, m. \qquad (3.54)$$

The common multiplier then permits an "implicit pricing" of the common constraint so that it remains compatible with the equilibrium structure. For this result to be useful one needs uniqueness of the normalized equilibrium associated with a given weighting $r_j > 0$, $j = 1, \ldots, m$. In a mathematical programming framework, uniqueness of an optimum results from strict concavity of the objective function to be maximized. In a game structure, uniqueness of the equilibrium will result from a more stringent strict concavity requirement, called by Rosen strict diagonal concavity.
3.4.3 Uniqueness of Equilibrium
Let us consider the so-called pseudo-gradient defined as the vector

$$g(u, r) = \begin{pmatrix} r_1 \frac{\partial}{\partial u_1} \psi_1(u) \\ r_2 \frac{\partial}{\partial u_2} \psi_2(u) \\ \vdots \\ r_m \frac{\partial}{\partial u_m} \psi_m(u) \end{pmatrix} \qquad (3.55)$$
We notice that this expression is composed of the partial gradients of the different payoffs with respect to the decision variables of the corresponding players. We also consider the function

$$\sigma(u, r) = \sum_{j=1}^{m} r_j\, \psi_j(u). \qquad (3.56)$$

Definition 3.4.5 The function $\sigma(u, r)$ is diagonally strictly concave on $\mathcal{U}$ if, for every $u^1$ and $u^2$ in $\mathcal{U}$, the following holds

$$(u^2 - u^1)^T g(u^1, r) + (u^1 - u^2)^T g(u^2, r) > 0. \qquad (3.57)$$

A sufficient condition for $\sigma(u, r)$ to be diagonally strictly concave is that the symmetric matrix $[G(u, r) + G(u, r)^T]$ be negative definite for every $u$ in $\mathcal{U}$, where $G(u, r)$ is the Jacobian of $g(u, r)$ with respect to $u$.
Theorem 3.4.2 If $\sigma(u, r)$ is diagonally strictly concave on the convex set $\mathcal{U}$, with the assumptions ensuring existence of Kuhn-Tucker multipliers, then for every $r > 0$ there exists a unique normalized equilibrium.

Proof: We sketch below the proof given by Rosen [Rosen, 1965]. Assume that for some $r > 0$ we have two equilibria $u^1$ and $u^2$. Then we must have

$$h(u^1) \geq 0 \qquad (3.58)$$
$$h(u^2) \geq 0 \qquad (3.59)$$
and there exist multipliers $\lambda^1 \geq 0$, $\lambda^2 \geq 0$, such that

$$\lambda^{1\,T} h(u^1) = 0 \qquad (3.60)$$
$$\lambda^{2\,T} h(u^2) = 0 \qquad (3.61)$$

and for which the following holds true for each player $j \in M$:

$$r_j \nabla_{u_j} \psi_j(u^1) + \lambda^{1\,T} \nabla_{u_j} h(u^1) = 0 \qquad (3.62)$$
$$r_j \nabla_{u_j} \psi_j(u^2) + \lambda^{2\,T} \nabla_{u_j} h(u^2) = 0. \qquad (3.63)$$
We multiply (3.62) by $(u^2 - u^1)^T$ and (3.63) by $(u^1 - u^2)^T$ and sum to obtain an expression $\beta + \gamma = 0$, where, due to the concavity of the $h_k$ and the conditions (3.58)-(3.61),

$$\begin{aligned}
\gamma &= \sum_{j \in M} \sum_{k=1}^{p} \left\{ \lambda^1_k (u^2 - u^1)^T \nabla_{u_j} h_k(u^1) + \lambda^2_k (u^1 - u^2)^T \nabla_{u_j} h_k(u^2) \right\} \\
&\geq \sum_{j \in M} \left\{ \lambda^{1\,T} [h(u^2) - h(u^1)] + \lambda^{2\,T} [h(u^1) - h(u^2)] \right\} \\
&= \sum_{j \in M} \left\{ \lambda^{1\,T} h(u^2) + \lambda^{2\,T} h(u^1) \right\} \geq 0, \qquad (3.64)
\end{aligned}$$
and

$$\beta = \sum_{j \in M} r_j \left[ (u^2 - u^1)^T \nabla_{u_j} \psi_j(u^1) + (u^1 - u^2)^T \nabla_{u_j} \psi_j(u^2) \right]. \qquad (3.65)$$

Since $\sigma(u, r)$ is diagonally strictly concave we have $\beta > 0$, which contradicts $\beta + \gamma = 0$. QED
3.4.4 A numerical technique
The diagonal strict concavity property that yielded the uniqueness result of Theorem 3.4.2 also provides an interesting extension of the gradient method for the computation of the equilibrium. The basic idea is to "project", at each step, the pseudo-gradient $g(u^\ell, r)$ on the constraint set $\mathcal{U} = \{u : h(u) \geq 0\}$ (call $\bar{g}(u^\ell, r)$ this projection) and to proceed through the usual steepest-ascent step

$$u^{\ell+1} = u^\ell + \tau_\ell\, \bar{g}(u^\ell, r).$$

Rosen shows that, at each step, the step size $\tau_\ell > 0$ can be chosen small enough to obtain a reduction of the norm of the projected gradient. This yields convergence of the procedure toward the unique equilibrium.
3.4.5 A variational inequality formulation
Theorem 3.4.3 Under Assumption 3.5.1 the vector $q^* = (q^*_1, \ldots, q^*_m)$ is a Nash-Cournot equilibrium if and only if it satisfies the following variational inequality

$$(q - q^*)^T g(q^*) \leq 0 \quad \text{for every admissible } q, \qquad (3.66)$$

where $g(q^*)$ is the pseudo-gradient at $q^*$ with weighting $r \equiv 1$.

Proof: Apply the first-order necessary and sufficient optimality conditions for each player and aggregate to obtain (3.66). QED

Remark 3.4.3 The diagonal strict concavity assumption is then equivalent to the property of strong monotonicity of the operator $g(\cdot)$ in the parlance of variational inequality theory.
3.5 Cournot equilibrium

The model proposed in 1838 by [Cournot, 1838] is still one of the most frequently used game models in economic theory. It represents the competition between different firms supplying a market for a divisible good.
3.5.1 The static Cournot model
Let us first of all recall the basic Cournot oligopoly model. We consider a single market on which $m$ firms compete. The market is characterized by its (inverse) demand law $p = D(Q)$, where $p$ is the market clearing price and $Q = \sum_{j=1,\ldots,m} q_j$ is the total supply of goods on the market. Firm $j$ faces a cost of production $C_j(q_j)$; hence, letting $q = (q_1, \ldots, q_m)$ represent the production decision vector of the $m$ firms together, the profit of firm $j$ is $\pi_j(q) = q_j D(Q) - C_j(q_j)$. The following assumptions are placed on the model.
Assumption 3.5.1 The market demand and the firms' cost functions satisfy the following: (i) The inverse demand function is finite valued, nonnegative and defined for all $Q \in [0, \infty)$. It is also twice differentiable, with $D'(Q) < 0$ wherever $D(Q) > 0$. In addition $D(0) > 0$. (ii) $C_j(q_j)$ is defined for all $q_j \in [0, \infty)$, nonnegative, convex, twice continuously differentiable, and such that $C'_j(q_j) > 0$. (iii) $Q\,D(Q)$ is bounded and strictly concave for all $Q$ such that $D(Q) > 0$.
If one assumes that each firm $j$ chooses a supply level $q_j \in [0, \bar{q}_j]$, this model satisfies the definition of a concave game à la Rosen (see Exercise 3.7). There exists an equilibrium. Let us consider the uniqueness issue in the duopoly case.

Consider the pseudo-gradient (there is no need of a weighting $(r_1, r_2)$ since the constraints are not coupled)
$$g(q_1, q_2) = \begin{pmatrix} D(Q) + q_1 D'(Q) - C'_1(q_1) \\ D(Q) + q_2 D'(Q) - C'_2(q_2) \end{pmatrix}$$

and the Jacobian matrix

$$G(q_1, q_2) = \begin{pmatrix} 2D'(Q) + q_1 D''(Q) - C''_1(q_1) & D'(Q) + q_1 D''(Q) \\ D'(Q) + q_2 D''(Q) & 2D'(Q) + q_2 D''(Q) - C''_2(q_2) \end{pmatrix}.$$
The negative definiteness of the symmetric matrix

$$\frac{1}{2}\left[ G(q_1, q_2) + G(q_1, q_2)^T \right] = \begin{pmatrix} 2D'(Q) + q_1 D''(Q) - C''_1(q_1) & D'(Q) + \frac{1}{2} Q\, D''(Q) \\ D'(Q) + \frac{1}{2} Q\, D''(Q) & 2D'(Q) + q_2 D''(Q) - C''_2(q_2) \end{pmatrix}$$
implies uniqueness of the equilibrium.
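To make the condition concrete, the following sketch checks negative definiteness of this symmetric matrix for an assumed duopoly with linear inverse demand $D(Q) = a - bQ$ and quadratic costs $C_j(q) = c_j q + d_j q^2$ (these specific functional forms and parameter values are illustrative assumptions, not taken from the text):

```python
def half_sym_jacobian(q1, q2, b=1.0, d1=0.5, d2=0.5):
    # For D(Q) = a - b*Q: D' = -b, D'' = 0.
    # For C_j(q) = c_j*q + d_j*q**2: C_j'' = 2*d_j.
    Q = q1 + q2
    Dp, Dpp = -b, 0.0
    m11 = 2.0 * Dp + q1 * Dpp - 2.0 * d1   # 2D' + q1*D'' - C1''
    m22 = 2.0 * Dp + q2 * Dpp - 2.0 * d2   # 2D' + q2*D'' - C2''
    m12 = Dp + 0.5 * Q * Dpp               # D' + (1/2)*Q*D''
    return m11, m12, m22

def negative_definite(m11, m12, m22):
    # A symmetric 2x2 matrix is negative definite iff its trace is
    # negative and its determinant is positive.
    return (m11 + m22) < 0 and (m11 * m22 - m12 * m12) > 0

print(negative_definite(*half_sym_jacobian(3.0, 4.0)))  # True
```

For these linear-quadratic data the matrix is negative definite at every output pair, so the duopoly equilibrium is unique by Theorem 3.4.2.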
3.5.2 Formulation of a Cournot equilibrium as a nonlinear complementarity problem

We have seen, when we dealt with Nash equilibria in m-matrix games, that a solution could be found through the solution of a linear complementarity problem. A similar formulation holds for a Cournot equilibrium, and it will in general lead to a nonlinear complementarity problem, or NLCP, a class of problems that has recently been the object of considerable development in the field of mathematical programming (see e.g. the book [Ferris & Pang, 1997a] and the survey paper [Ferris & Pang, 1997b]). We can rewrite the equilibrium conditions as an NLCP in canonical form

$$q_j \geq 0 \qquad (3.67)$$
$$f_j(q) \geq 0 \qquad (3.68)$$
$$q_j\, f_j(q) = 0, \qquad (3.69)$$

where

$$f_j(q) = -D(Q) - q_j D'(Q) + C'_j(q_j). \qquad (3.70)$$
The relations (3.67)-(3.69) express that, at a Cournot equilibrium, either $q_j = 0$ and the gradient $g_j(q) \leq 0$, or $q_j > 0$ and $g_j(q) = 0$. There is now optimization software that solves these types of problems. In [Rutherford, 1995] an extension of the GAMS modeling software (see [Brooke et al., 1992]) that provides the ability to solve this type of complementarity problem is described. More recently the modeling language AMPL (see [Fourer et al., 1993]) has also been adapted to handle a problem of this type and to submit it to an efficient NLCP solver like PATH (see [Ferris & Munson, 1994]).
Example 3.5.1 This example comes from the AMPL website
http://www.ampl.com/ampl/TRYAMPL/
Consider an oligopoly with $n = 10$ firms. The production cost function of firm $i$ is given by

$$c_i(q_i) = c_i q_i + \frac{\beta_i}{1 + \beta_i} (L_i q_i)^{(1+\beta_i)/\beta_i}. \qquad (3.71)$$

The market demand function is defined by

$$D(Q) = (A/Q)^{1/\gamma}. \qquad (3.72)$$

Therefore

$$D'(Q) = \frac{1}{\gamma} (A/Q)^{1/\gamma - 1} \left( -\frac{A}{Q^2} \right) = -\frac{1}{\gamma}\, D(Q)/Q. \qquad (3.73)$$
The Nash-Cournot equilibrium will be the solution of the NLCP (3.67)-(3.69) where

$$f_i(q) = -D(Q) - q_i D'(Q) + C'_i(q_i) = -(A/Q)^{1/\gamma} + \frac{1}{\gamma}\, q_i\, D(Q)/Q + c_i + (L_i q_i)^{1/\beta_i}.$$
AMPL input
%%%%%%%FILE nash.mod%%%%%%%%%%
#==>nash.gms
# A noncooperative game example: a Nash equilibrium is sought.
# References:
# F.H. Murphy, H.D. Sherali, and A.L. Soyster, "A Mathematical
# Programming Approach for Determining Oligopolistic Market
# Equilibrium", Mathematical Programming 24 (1986), pp. 92-106.
# P.T. Harker, "Accelerating the convergence . . .", Mathematical
# Programming 41 (1988), pp. 29-59.

set Rn := 1 .. 10;

param gamma := 1.2;
param c {i in Rn};
param beta {i in Rn};
param L {i in Rn} := 10;

var q {i in Rn} >= 0;              # production vector
var Q = sum {i in Rn} q[i];
var divQ = (5000/Q)**(1/gamma);

s.t. feas {i in Rn}:
    q[i] >= 0 complements
    0 <= c[i] + (L[i] * q[i])**(1/beta[i]) - divQ
         - q[i] * (1/gamma) * divQ / Q;
%%%%%%%FILE nash.dat%%%%%%%%%%
data;
param c :=
1 5
2 3
3 8
4 5
5 1
6 3
7 7
8 4
9 6
10 3
;
param beta :=
1 1.2
2 1
3 .9
4 .6
5 1.5
6 1
7 .7
8 1.1
9 .95
10 .75
;
%%%%%%%FILE nash.run%%%%%%%%%%
model nash.mod;
data nash.dat;
set initpoint := 1 .. 4;
param initval {Rn,initpoint} >= 0;
data;
param initval :   1    2    3    4 :=
1 1 10 1.0 7
2 1 10 1.2 4
3 1 10 1.4 3
4 1 10 1.6 1
5 1 10 1.8 18
6 1 10 2.1 4
7 1 10 2.3 1
8 1 10 2.5 6
9 1 10 2.7 3
10 1 10 2.9 2
;
for {point in initpoint}
{
let{i in Rn} q[i] := initval[i,point];
solve;
include compchk
}
3.5.3 Computing the solution of a classical Cournot model

There are many approaches to finding a solution to a static Cournot game. We have described above the one based on the solution of an NLCP. We list below a few other alternatives:

(a) one discretizes the decision space and uses the complementarity algorithm proposed in [Lemke & Howson 1964] to compute a solution to the associated bimatrix game;

(b) one exploits the fact that the players are coupled only through the condition that $Q = \sum_{j=1,\ldots,m} q_j$, and a sort of primal-dual search technique is used [Murphy, Sherali & Soyster, 1982];

(c) one formulates the Nash-Cournot equilibrium problem as a variational inequality. Assumption 3.5.1 implies strict diagonal concavity (SDC) in the sense of Rosen, i.e. strict monotonicity of the pseudo-gradient operator, hence uniqueness of the equilibrium and convergence of various gradient-like algorithms.
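As a simple illustration of such iterative schemes, the sketch below computes the Nash-Cournot equilibrium of a hypothetical symmetric oligopoly with linear inverse demand $D(Q) = a - bQ$ and constant marginal cost $c$ (illustrative data, not from the text) by damped simultaneous best-response iteration; the limit satisfies the complementarity conditions (3.67)-(3.69):

```python
def best_response(Q_others, a=10.0, b=1.0, c=1.0):
    # Maximizer of q*(a - b*(q + Q_others)) - c*q over q >= 0.
    return max(0.0, (a - b * Q_others - c) / (2.0 * b))

def cournot_equilibrium(m=3, damping=0.5, iters=300):
    q = [0.0] * m
    for _ in range(iters):
        br = [best_response(sum(q) - q[j]) for j in range(m)]
        # Damping avoids the oscillation of undamped simultaneous replies.
        q = [(1 - damping) * qj + damping * bj for qj, bj in zip(q, br)]
    return q

q = cournot_equilibrium()
print(q)  # each q_j close to (a - c)/((m + 1)*b) = 2.25
```

At the limit, $f_j(q) = -(a - bQ) + b q_j + c = 0$ for every firm, which is the interior case of the complementarity conditions.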
3.6 Correlated equilibria
Aumann, [Aumann, 1974], has proposed a mechanism that permits the players to enter
into preplay arrangements so that their strategy choices could be correlated, instead
of being independently chosen, while keeping an equilibrium property. He called this
type of solution correlated equilibrium.
3.6.1 Example of a game with correlated equilibria

Example 3.6.1 This example was initially proposed by Aumann [Aumann, 1974]. Consider a simple bimatrix game defined as follows:

          c1      c2
    r1   5, 1    0, 0
    r2   4, 4    1, 5

This game has two pure strategy equilibria $(r_1, c_1)$ and $(r_2, c_2)$ and a mixed strategy equilibrium where each player puts the same probability 0.5 on each possible pure strategy. The respective outcomes are shown in Table 3.2.

    $(r_1, c_1)$ : 5, 1
    $(r_2, c_2)$ : 1, 5
    $(0.5-0.5,\ 0.5-0.5)$ : 2.5, 2.5

Table 3.2: Outcomes of the three equilibria

Now, if the players agree to play by jointly observing a "coin flip" and playing $(r_1, c_1)$ if the result is "head" and $(r_2, c_2)$ if it is "tail", then they expect the outcome 3, 3, which is a convex combination of the two pure equilibrium outcomes.
By letting a coin flip decide which equilibrium to play, a new type of equilibrium has been introduced that yields an outcome located in the convex hull of
the set of Nash equilibrium outcomes of the bimatrix game. In Figure 3.2 we have
represented these different outcomes. The three full circles represent the three Nash
equilibrium outcomes. The triangle deﬁned by these three points is the convex hull of
the Nash equilibrium outcomes. The empty circle represents the outcome obtained by
agreeing to play according to the coin ﬂip mechanism.
[Figure: the Nash equilibrium outcomes (5,1), (1,5) and (2.5,2.5), shown as full circles, span a triangle; the empty circle at (3,3) is the coin-flip outcome]
Figure 3.2: The convex hull of Nash equilibria
It is easy to see that this way of playing the game defines an equilibrium for the extensive game shown in Figure 3.3. This is an expanded version of the initial game where, in a preliminary stage, "Nature" decides randomly the signal that will be observed by the players⁴. The result of the "coin flip" is a "public information", in the sense that
[Figure: extensive form tree; "Nature" first chooses Head or Tail with probability 0.5 each, then Player 1 picks r1 or r2 and Player 2, without observing Player 1's move, picks c1 or c2; the leaves of each subtree carry the payoffs (5,1), (0,0), (4,4), (1,5)]
Figure 3.3: Extensive game formulation with "Nature" playing first
it is shared by all the players. One can easily check that the correlated equilibrium is a Nash equilibrium for the expanded game.

We now push the example one step further by assuming that the players agree to play according to the following mechanism: a random device selects one cell in the game matrix with the probabilities shown in (3.74).

          c1     c2
    r1    1/3    0
    r2    1/3    1/3                    (3.74)
When a cell is selected, each player is told to play the corresponding pure strategy. The trick is that a player is told what to play but is not told what the recommendation to the other player is. The information received by each player is not "public" anymore. More precisely, the three possible signals are $(r_1, c_1)$, $(r_2, c_1)$, $(r_2, c_2)$. When Player 1
⁴Dotted lines represent information sets of Player 2.
receives the signal "play $r_2$" he knows that with probability $\frac{1}{2}$ the other player has been told "play $c_1$" and with probability $\frac{1}{2}$ the other player has been told "play $c_2$". When Player 1 receives the signal "play $r_1$" he knows that with probability 1 the other player has been told "play $c_1$". Consider now what Player 1 can do if he assumes that the other player plays according to the recommendation. If Player 1 has been told "play $r_2$" and if he plays so, he expects

$$\tfrac{1}{2} \cdot 4 + \tfrac{1}{2} \cdot 1 = 2.5;$$

if he plays $r_1$ instead he expects

$$\tfrac{1}{2} \cdot 5 + \tfrac{1}{2} \cdot 0 = 2.5,$$

so he cannot improve his expected reward. If Player 1 has been told "play $r_1$" and if he plays so he expects 5, whereas if he plays $r_2$ he expects 4. So, for Player 1, obeying the recommendation is the best reply to Player 2's behavior when Player 2 himself plays according to the suggestion of the signalling scheme. Now we can repeat the verification for Player 2. If he has been told "play $c_1$" he expects

$$\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 4 = 2.5;$$

if he plays $c_2$ instead he expects

$$\tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 5 = 2.5,$$

so he cannot improve. If he has been told "play $c_2$" he expects 5, whereas if he plays $c_1$ instead he expects 4, so he is better off with the suggested play. So we have checked that an equilibrium property holds for this way of playing the game. All in all, each player expects

$$\tfrac{1}{3} \cdot 5 + \tfrac{1}{3} \cdot 1 + \tfrac{1}{3} \cdot 4 = 3 + \tfrac{1}{3}$$

from a game played in this way. This is illustrated in Figure 3.4, where the black spade shows the expected outcome of this mode of play. Aumann called it a "correlated equilibrium". Indeed we can now mix these equilibria and still keep the correlated equilibrium property, as indicated by the dotted line in Figure 3.4; also we can construct an expanded game in extensive form for which the correlated equilibrium constructed as above defines a Nash equilibrium (see Exercise 3.6).
In the above example we have seen that, by expanding the game via the adjunction of a first stage where "Nature" plays and gives information to the players, a new class of equilibria can be reached that dominate, in the outcome space, some of the original Nash equilibria. If the random device gives information which is common to all players, then it permits a mixing of the different pure strategy Nash equilibria and the outcome is in the convex hull of the Nash equilibrium outcomes. If the random device gives information which may differ from one player to the other, then the correlated equilibrium can have an outcome which lies outside the convex hull of the Nash equilibrium outcomes.
[Figure: the convex hull of the Nash equilibrium outcomes (5,1), (1,5), (2.5,2.5) with the coin-flip outcome (3,3); the spade at (3.33,3.33) marks the dominating correlated equilibrium outcome]
Figure 3.4: The dominating correlated equilibrium
3.6.2 A general definition of correlated equilibria

Let us give a general definition of a correlated equilibrium in an m-player normal form game. We shall actually give two definitions. The first one describes the construct of an expanded game with a random device distributing some preplay information to the players. The second definition, which is valid for m-matrix games, is much simpler although equivalent.
Nash equilibrium in an expanded game

Assume that a game is described in normal form, with $m$ players $j = 1, \ldots, m$, their respective strategy sets $\Gamma_j$ and payoffs $V_j(\gamma_1, \ldots, \gamma_j, \ldots, \gamma_m)$. This will be called the original normal form game.

Assume the players may enter into a phase of preplay communication during which they design a correlation device that will randomly provide a signal called the proposed mode of play. Let $E = \{1, 2, \ldots, L\}$ be the finite set of the possible modes of play. The correlation device will propose the mode of play $\ell$ with probability $\lambda(\ell)$. The device will then give the different players some information about the proposed mode of play. More precisely, let $H_j$ be a class of subsets of $E$, called the information structure of Player $j$. Player $j$, when the mode of play $\ell$ has been selected, receives an information denoted $h_j(\ell) \in H_j$. Now, we associate with each player $j$ a meta-strategy denoted $\tilde{\gamma}_j : H_j \to \Gamma_j$ that determines a strategy for the original normal form game on the basis of the information received. All this construct is summarized by the data $(E, \{\lambda(\ell)\}_{\ell \in E}, \{h_j(\ell) \in H_j\}_{j \in M, \ell \in E}, \{\tilde{\gamma}^*_j : H_j \to \Gamma_j\}_{j \in M})$ that defines an expanded game.
Definition 3.6.1 The data $(E, \{\lambda(\ell)\}_{\ell \in E}, \{h_j(\ell) \in H_j\}_{j \in M, \ell \in E}, \{\tilde{\gamma}^*_j : H_j \to \Gamma_j\}_{j \in M})$
defines a correlated equilibrium of the original normal form game if it is a Nash equilibrium for the expanded game, i.e. if no player can improve his expected payoff by unilaterally changing his meta-strategy, i.e. by playing $\tilde{\gamma}_j(h_j(\ell))$ instead of $\tilde{\gamma}^*_j(h_j(\ell))$ when he receives the signal $h_j(\ell) \in H_j$:

$$\sum_{\ell \in E} \lambda(\ell)\, V_j\big([\tilde{\gamma}^*_j(h_j(\ell)), \tilde{\gamma}^*_{M-j}(h_{M-j}(\ell))]\big) \geq \sum_{\ell \in E} \lambda(\ell)\, V_j\big([\tilde{\gamma}_j(h_j(\ell)), \tilde{\gamma}^*_{M-j}(h_{M-j}(\ell))]\big). \qquad (3.75)$$
An equivalent definition for m-matrix games

In the case of an m-matrix game, the definition given above can be replaced with the following one, which is much simpler.

Definition 3.6.2 A correlated equilibrium is a probability distribution $\pi(s)$ over the set of pure strategies $S = S_1 \times S_2 \times \cdots \times S_m$ such that, for every player $j$ and any mapping $\delta_j : S_j \to S_j$, the following holds

$$\sum_{s \in S} \pi(s)\, V_j([s_j, s_{M-j}]) \geq \sum_{s \in S} \pi(s)\, V_j([\delta_j(s_j), s_{M-j}]). \qquad (3.76)$$
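Condition (3.76) is easy to verify mechanically. The following sketch checks it for the distribution used in Example 3.6.1 (probability 1/3 on each of $(r_1,c_1)$, $(r_2,c_1)$, $(r_2,c_2)$) by enumerating every deviation map $\delta_j$; the data layout is of course our own choice:

```python
from itertools import product

# Bimatrix game of Example 3.6.1: payoff[(row, col)] = (V1, V2)
payoff = {('r1', 'c1'): (5, 1), ('r1', 'c2'): (0, 0),
          ('r2', 'c1'): (4, 4), ('r2', 'c2'): (1, 5)}
pi = {('r1', 'c1'): 1/3, ('r1', 'c2'): 0.0,
      ('r2', 'c1'): 1/3, ('r2', 'c2'): 1/3}

def expected(j, delta=None):
    # Expected payoff of player j (0 = row, 1 = column) when player j
    # replaces each recommendation s_j by delta[s_j]; delta=None obeys.
    total = 0.0
    for (row, col), p in pi.items():
        if delta is not None:
            if j == 0:
                row = delta[row]
            else:
                col = delta[col]
        total += p * payoff[(row, col)][j]
    return total

def is_correlated_equilibrium(tol=1e-12):
    for j, S in ((0, ('r1', 'r2')), (1, ('c1', 'c2'))):
        for images in product(S, repeat=len(S)):   # all maps S_j -> S_j
            delta = dict(zip(S, images))
            if expected(j, delta) > expected(j) + tol:
                return False
    return True

print(is_correlated_equilibrium())  # True: (3.76) holds for every delta_j
```

Both players' expected payoffs equal 10/3, the dominating outcome found in the example.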
3.7 Bayesian equilibrium with incomplete information

Up to now we have considered only games where each player knows everything concerning the rules, the players' types (i.e. their payoff functions, their strategy sets), etc. We were dealing with games of complete information. In this section we look at a particular class of games with incomplete information and we explore the class of so-called Bayesian equilibria.
3.7.1 Example of a game with unknown type for a player
In a game of incomplete information, some players do not know exactly the characteristics of the other players. For example, in a two-player game, Player 2 may not know exactly what the payoff function of Player 1 is.

Example 3.7.1 Consider the case where Player 1 could be of two types, called $\theta_1$ and $\theta_2$ respectively. We define below two matrix games, corresponding to the two possible
types respectively:

    θ1     c1        c2             θ2     c1        c2
    r1    0, -1     2, 0            r1    1.5, -1   3.5, 0
    r2    2, 1      3, 0            r2    2, 1      3, 0
           Game 1                          Game 2

If Player 1 is of type $\theta_1$, then bimatrix game 1 is played; if Player 1 is of type $\theta_2$, then bimatrix game 2 is played. The problem is that Player 2 does not know the type of Player 1.
3.7.2 Reformulation as a game with imperfect information
Harsanyi, in [Harsanyi, 1967-68], has proposed a transformation of a game of incomplete information into a game with imperfect information. For Example 3.7.1 this transformation introduces a preliminary chance move, played by Nature, which decides randomly the type $\theta_i$ of Player 1. The probability distribution of each type, denoted $p_1$ and $p_2$ respectively, represents the beliefs of Player 2, given here in terms of prior probabilities, about facing a player of type $\theta_1$ or $\theta_2$. One assumes that Player 1 also knows these beliefs; the prior probabilities are thus common knowledge. The information structure⁵ in the associated extensive game shown in Figure 3.5 indicates that Player 1 knows his type when deciding, whereas Player 2 observes neither the type nor, in this game of simultaneous moves, the action chosen by Player 1. Call $x_i$ (resp. $1 - x_i$) the probability of Player 1 choosing $r_1$ (resp. $r_2$) when he implements a mixed strategy, knowing that he is of type $\theta_i$. Call $y$ (resp. $1 - y$) the probability of Player 2 choosing $c_1$ (resp. $c_2$) when he implements a mixed strategy.
We can define the optimal response of Player 1 to the mixed strategy $(y, 1-y)$ of Player 2 by solving⁶

$$\max_{i=1,2}\; a^{\theta_1}_{i1}\, y + a^{\theta_1}_{i2}\, (1-y)$$

if the type is $\theta_1$, and

$$\max_{i=1,2}\; a^{\theta_2}_{i1}\, y + a^{\theta_2}_{i2}\, (1-y)$$

if the type is $\theta_2$.

We can define the optimal response of Player 2 to the pair of mixed strategies $(x_i, 1-x_i)$, $i = 1, 2$, of Player 1 by solving

$$\max_{j=1,2}\; p_1 \big( x_1 b^{\theta_1}_{1j} + (1-x_1) b^{\theta_1}_{2j} \big) + p_2 \big( x_2 b^{\theta_2}_{1j} + (1-x_2) b^{\theta_2}_{2j} \big).$$

⁵The dotted line in Figure 3.5 represents the information set of Player 2.
⁶We call $a^{\theta_\ell}_{ij}$ and $b^{\theta_\ell}_{ij}$ the payoffs of Player 1 and Player 2 respectively when the type is $\theta_\ell$.
[Figure: extensive form tree; Nature first draws the type θ1 (probability p1) or θ2 (probability p2), then Player 1 picks r1 or r2 and Player 2, without observing type or action, picks c1 or c2; the leaves carry the payoffs (0,-1), (2,0), (2,1), (3,0) for type θ1 and (1.5,-1), (3.5,0), (2,1), (3,0) for type θ2]
Figure 3.5: Extensive game formulation of the game of incomplete information
Let us rewrite these conditions with the data of the game illustrated in Figure 3.5. First consider the reaction function of Player 1:

$$\theta_1 \Rightarrow \max\{0 \cdot y + 2(1-y),\; 2y + 3(1-y)\}$$
$$\theta_2 \Rightarrow \max\{1.5\,y + 3.5(1-y),\; 2y + 3(1-y)\}.$$

We draw in Figure 3.6 the lines corresponding to these comparisons between two linear functions. We observe that, if Player 1's type is $\theta_1$, he will always choose $r_2$, whatever $y$, whereas, if his type is $\theta_2$, he will choose $r_1$ if $y$ is small enough and switch to $r_2$ when $y > 0.5$. For $y = 0.5$, the best reply of Player 1 could be any mixed strategy $(x, 1-x)$.
Consider now the optimal reply of Player 2. We know that Player 1, if he is of type $\theta_1$, always chooses action $r_2$, i.e. $x_1 = 0$. So the best reply conditions for Player 2 can be written as follows

$$\max\big\{ p_1 \cdot 1 + (1-p_1)\big(x_2(-1) + (1-x_2) \cdot 1\big),\; p_1 \cdot 0 + (1-p_1)\big(x_2 \cdot 0 + (1-x_2) \cdot 0\big) \big\},$$

which boils down to

$$\max\{1 - 2(1-p_1)x_2,\; 0\}.$$

We conclude that Player 2 chooses action $c_1$ if $x_2 < \frac{1}{2(1-p_1)}$, action $c_2$ if $x_2 > \frac{1}{2(1-p_1)}$, and any mixed action with $y \in [0, 1]$ if $x_2 = \frac{1}{2(1-p_1)}$. We conclude easily from these observations that the equilibria of this game are characterized as follows:
[Figure: two panels plotting, for y ∈ [0,1], the expected payoffs of r1 and r2; for type θ1 the r2 line (from 3 to 2) lies above the r1 line (from 2 to 0) for all y; for type θ2 the r1 line (from 3.5 to 1.5) and the r2 line (from 3 to 2) cross at y = 0.5]
Figure 3.6: Optimal reaction of Player 1 to Player 2's mixed strategy (y, 1-y)
$x_1 \equiv 0$, and

- if $p_1 \leq 0.5$: $x_2 = 0,\ y = 1$; or $x_2 = 1,\ y = 0$; or $x_2 = \frac{1}{2(1-p_1)},\ y = 0.5$;
- if $p_1 > 0.5$: $x_2 = 0,\ y = 1$.
3.7.3 A general deﬁnition of Bayesian equilibria
We can generalize the analysis performed on the previous example and introduce the following definitions. Let $M$ be a set of $m$ players. Each player $j \in M$ may be of different possible types. Let $\Theta_j$ be the finite set of types for Player $j$. Whatever his type, Player $j$ has the same set of pure strategies $S_j$. If $\theta = (\theta_1, \ldots, \theta_m) \in \Theta = \Theta_1 \times \cdots \times \Theta_m$ is a type specification for every player, then the normal form of the game is specified by the payoff functions

$$u_j(\theta; \cdot, \ldots, \cdot) : S_1 \times \cdots \times S_m \to \mathbb{R}, \quad j \in M. \qquad (3.77)$$
A prior probability distribution $p(\theta_1, \ldots, \theta_m)$ on $\Theta$ is given as common knowledge. We assume that all the marginal probabilities are nonzero,

$$p_j(\theta_j) > 0, \quad \forall j \in M.$$

When Player $j$ observes that he is of type $\theta_j \in \Theta_j$ he can construct his revised conditional probability distribution on the types $\theta_{M-j}$ of the other players through the Bayes formula

$$p(\theta_{M-j} \mid \theta_j) = \frac{p([\theta_j, \theta_{M-j}])}{p_j(\theta_j)}. \qquad (3.78)$$
We can now introduce an expanded game where Nature draws randomly, according to the prior probability distribution $p(\cdot)$, a type vector $\theta \in \Theta$ for all players. Player $j$ can observe only his own type $\theta_j \in \Theta_j$. Then each player $j \in M$ picks a strategy in his own strategy set $S_j$. The outcome is then defined by the payoff functions (3.77).
Definition 3.7.1 In a game of incomplete information with $m$ players having respective type sets $\Theta_j$ and pure strategy sets $S_j$, $j = 1, \ldots, m$, a Bayesian equilibrium is a Nash equilibrium in the "expanded game" in which each player's pure strategy $\gamma_j$ is a map from $\Theta_j$ to $S_j$.
The expanded game can be described in its normal form as follows. Each player $j \in M$ has a strategy set $\Gamma_j$ where a strategy is defined as a mapping $\gamma_j : \Theta_j \to S_j$. Associated with a strategy profile $\gamma = (\gamma_1, \ldots, \gamma_m)$, the payoff to Player $j$ is given by

$$V_j(\gamma) = \sum_{\theta_j \in \Theta_j} p_j(\theta_j) \sum_{\theta_{M-j} \in \Theta_{M-j}} p(\theta_{M-j} \mid \theta_j)\; u_j\big([\theta_j, \theta_{M-j}]; \gamma_1(\theta_1), \ldots, \gamma_j(\theta_j), \ldots, \gamma_m(\theta_m)\big). \qquad (3.79)$$
As usual, a Nash equilibrium is a strategy profile $\gamma^* = (\gamma^*_1, \ldots, \gamma^*_m) \in \Gamma = \Gamma_1 \times \cdots \times \Gamma_m$ such that

$$V_j(\gamma^*) \geq V_j(\gamma_j, \gamma^*_{M-j}) \quad \forall \gamma_j \in \Gamma_j. \qquad (3.80)$$
It is easy to see that, since each $p_j(\theta_j)$ is positive, the equilibrium conditions (3.80) lead to the following conditions

$$\gamma^*_j(\theta_j) = \operatorname{argmax}_{s_j \in S_j} \sum_{\theta_{M-j} \in \Theta_{M-j}} p(\theta_{M-j} \mid \theta_j)\; u_j\big([\theta_j, \theta_{M-j}]; [s_j, \gamma^*_{M-j}(\theta_{M-j})]\big), \quad j \in M. \qquad (3.81)$$
Remark 3.7.1 Since the sets $\Theta_j$ and $S_j$ are finite, the set $\Gamma_j$ of mappings from $\Theta_j$ to $S_j$ is also finite. Therefore the expanded game is an m-matrix game and, by Nash's theorem, there exists at least one mixed strategy equilibrium.
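For Example 3.7.1 the conditions (3.81) can be checked by brute force: enumerate the pure strategies $\gamma_1 : \Theta_1 \to \{r_1, r_2\}$ of Player 1 and the actions of Player 2, and keep the profiles where Player 1 best-responds type by type and Player 2 best-responds in expectation over the types. A sketch of that enumeration (the data layout and names are our own choices):

```python
from itertools import product

# Example 3.7.1: payoffs (to Player 1, to Player 2) per type of Player 1
U = {'theta1': {('r1', 'c1'): (0.0, -1.0), ('r1', 'c2'): (2.0, 0.0),
                ('r2', 'c1'): (2.0,  1.0), ('r2', 'c2'): (3.0, 0.0)},
     'theta2': {('r1', 'c1'): (1.5, -1.0), ('r1', 'c2'): (3.5, 0.0),
                ('r2', 'c1'): (2.0,  1.0), ('r2', 'c2'): (3.0, 0.0)}}

def pure_bayesian_equilibria(p1):
    prob = {'theta1': p1, 'theta2': 1.0 - p1}
    eqs = []
    for rows in product(('r1', 'r2'), repeat=2):
        g1 = dict(zip(('theta1', 'theta2'), rows))  # gamma_1: type -> row
        for c in ('c1', 'c2'):
            # condition (3.81): Player 1 best-responds type by type
            ok1 = all(U[t][(g1[t], c)][0] >= U[t][(row, c)][0]
                      for t in prob for row in ('r1', 'r2'))
            # Player 2 best-responds in expectation over the types
            def v2(col):
                return sum(prob[t] * U[t][(g1[t], col)][1] for t in prob)
            ok2 = all(v2(c) >= v2(col) for col in ('c1', 'c2'))
            if ok1 and ok2:
                eqs.append((g1, c))
    return eqs

# For p1 > 0.5 the unique pure equilibrium has Player 1 playing r2 for
# both types and Player 2 playing c1, as found in Section 3.7.2.
print(pure_bayesian_equilibria(0.6))
```

For $p_1 \leq 0.5$ the same enumeration recovers the two pure equilibria of the characterization above (the mixed one is outside this pure-strategy search).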
3.8 Appendix on the Kakutani fixed-point theorem

Definition 3.8.1 Let $\Phi : \mathbb{R}^m \to 2^{\mathbb{R}^n}$ be a point-to-set mapping. We say that this mapping is upper semicontinuous if, whenever the sequence $\{x^k\}_{k=1,2,\ldots}$ converges in $\mathbb{R}^m$ toward $x^0$, then any accumulation point $y^0$ of the sequence $\{y^k\}_{k=1,2,\ldots}$ in $\mathbb{R}^n$, where $y^k \in \Phi(x^k)$, $k = 1, 2, \ldots$, is such that $y^0 \in \Phi(x^0)$.
Theorem 3.8.1 Let $\Phi : A \to 2^A$ be an upper semicontinuous point-to-set mapping with nonempty convex values, where $A$ is a compact⁷ convex subset of $\mathbb{R}^m$. Then there exists a fixed point for $\Phi$. That is, $x^* \in \Phi(x^*)$ for some $x^* \in A$.

⁷i.e. a closed and bounded subset.
3.9 Exercises

Exercise 3.1: Consider the matrix game

$$\begin{pmatrix} 3 & 1 & 8 \\ 4 & 10 & 0 \end{pmatrix}$$

Assume that the players are in the simultaneous move information structure, and that they try to guess and counter-guess the optimal behavior of the opponent in order to determine their optimal strategy choices. Show that this leads to an unstable process.
Exercise 3.2: Find the value and the saddle-point mixed strategies for the above matrix game.

Exercise 3.3: Define the quadratic programming problem that will find a Nash equilibrium for the bimatrix game

$$\begin{pmatrix} (52, 50)^* & (44, 44) & (44, 41) \\ (42, 42) & (46, 49)^* & (39, 43) \end{pmatrix}.$$

Verify that the entries marked with a $*$ correspond to a solution of the associated quadratic programming problem.
Exercise 3.4: Do the same as above but using now the complementarity problem
formulation.
Exercise 3.5: Consider the two-player game where the strategy sets are the intervals U_1 = [0, 100] and U_2 = [0, 100] respectively, and the payoffs are ψ_1(u) = 25u_1 + 15u_1u_2 − 4u_1² and ψ_2(u) = 100u_2 − 50u_1 − u_1u_2 − 2u_2² respectively. Define the best-reply mapping. Find an equilibrium point. Is it unique?
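A quick numerical cross-check for this exercise: the payoffs are concave in each player's own action, so the best replies follow from the first-order conditions. The sketch below (illustrative, not part of the original text) iterates the best-reply mapping to its fixed point.

```python
# Best-reply (Gauss-Seidel) iteration for Exercise 3.5.
# First-order conditions of the concave payoffs give the best replies
#   dpsi1/du1 = 25 + 15*u2 - 8*u1 = 0  ->  u1 = (25 + 15*u2) / 8
#   dpsi2/du2 = 100 - u1 - 4*u2  = 0   ->  u2 = (100 - u1) / 4
# each clipped to the strategy interval [0, 100].

def br1(u2):
    return min(max((25.0 + 15.0 * u2) / 8.0, 0.0), 100.0)

def br2(u1):
    return min(max((100.0 - u1) / 4.0, 0.0), 100.0)

def best_reply_iteration(u1=0.0, u2=0.0, sweeps=200):
    for _ in range(sweeps):
        u1 = br1(u2)
        u2 = br2(u1)
    return u1, u2

u1, u2 = best_reply_iteration()
# The composed map contracts (slope 15/32 per sweep), so the iteration
# converges to the unique interior equilibrium u1 = 1600/47, u2 = 775/47.
```

Since the composed best-reply map is a contraction on the strategy box, the fixed point is the unique equilibrium.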
Exercise 3.6: Consider the two-player game where the strategy sets are the intervals U_1 = [10, 20] and U_2 = [0, 15] respectively, and the payoffs are ψ_1(u) = 40u_1 + 5u_1u_2 − 2u_1² and ψ_2(u) = 50u_2 − 3u_1u_2 − 2u_2² respectively. Define the best-reply mapping. Find an equilibrium point. Is it unique?
Exercise 3.7: Consider an oligopoly model. Show that Assumptions 3.5.1 define a concave game in the sense of Rosen.
Exercise 3.8: In Example 3.7.1 a correlated equilibrium has been constructed for the game

            c_1      c_2
    r_1    5, 1     0, 0
    r_2    4, 4     1, 5

Find the associated extensive form game for which the proposed correlated equilibrium corresponds to a Nash equilibrium.
Part II
Repeated and sequential Games
Chapter 4
Repeated games and memory strategies
In this chapter we begin our analysis of “dynamic games”. To say it in an imprecise
but intuitive way, the dynamic structure of a game expresses a change over time of the
conditions under which the game is played. In a repeated game this structural change
over time is due to the accumulation of information about the “history” of the game. As
time unfolds the information at the disposal of the players changes and, since strategies
transform this information into actions, the players’ strategic choices are affected. If
a game is repeated twice, i.e. the same game is played in two successive stages and a
reward is obtained at each stage, one can easily see that, at the second stage, the players
can decide on the basis of the outcome of the ﬁrst stage. The situation becomes more
and more complex as the number of stages increases, since the players can base their
decisions on histories represented by sequences of actions and outcomes observed
over increasing numbers of stages. We may also consider games that are played over an
inﬁnite number of stages. These inﬁnitely repeated games are particularly interesting,
since, at each stage, there is always an inﬁnite number of remaining stages over which
the game will be played and, therefore, there will not be the so-called "end of horizon effect", where the fact that the game is coming to an end affects the players' behavior. Indeed,
the evaluation of strategies will have to be based on the comparison of inﬁnite streams
of rewards and we shall explore different ways to do that. In summary, the important
concept to understand here is that, even if it is the “same game” which is repeated over
a number of periods, the global “repeated game” becomes a fully dynamic system with
a much more complex strategic structure than the one-stage game.
In this chapter we shall consider repeated bimatrix games and repeated concave m-player games. We shall explore, in particular, the class of equilibria when the number of periods over which the game is repeated (the horizon) becomes infinite. We will show how the class of equilibria in repeated games can be expanded considerably if one
allows memory in the information structure. The class of memory strategies allows the players to incorporate threats in their strategic choices. A threat is meant to deter the opponents from entering into a detrimental behavior. In order to be effective a threat must be credible. We shall explore the credibility issue and show how the subgame perfectness property of equilibria introduces a refinement criterion that makes these threats credible.
Notice that, at a given stage, the players' actions have no direct influence on the normal form description of the games that will be played in the forthcoming periods. A more complex dynamic structure would be obtained if one assumed that the players' actions at one stage influence the type of game to be played in the next stage. This is the case for the class of Markov or sequential games and, a fortiori, for the class of differential games, where time flows continuously. These dynamic games will be discussed in subsequent chapters of this volume.
4.1 Repeating a game in normal form
Repeated games have also been called games in semi-extensive form in [Friedman 1977]. Indeed this corresponds to an explicit dynamic structure, as in the extensive form, but with a game in normal form defined at each period of time. In the repeated game, the global strategy becomes a way to define the choice of a stage strategy at each period, on the basis of the knowledge accumulated by the players about the "history" of the game. Indeed the repetition of the same game in normal form permits the players to adapt their strategies to the observed history of the game. In particular, they have the opportunity to implement a class of strategies that incorporate threats.
The payoff associated with a repeated game is usually defined as the sum of the payoffs obtained at each period. However, when the number of periods tends to ∞, a total payoff that implies an infinite sum may not converge to a finite value. There are several ways of dealing with the comparison of infinite streams of payoffs. We shall discover some of them in the subsequent examples.
4.1.1 Repeated bimatrix games
Figure 4.1 describes a repeated bimatrix game structure. The same matrix game is played repeatedly, over a number T of periods or stages that represent the passing of time. In the one-stage game, Player 1 has p pure strategies and Player 2 has q pure strategies, and the one-stage payoff pair associated with the strategy pair (k, ℓ) is given by (α^j_{kℓ})_{j=1,2}. The payoffs are accumulated over time. As the players may recall the past history of the game, the extensive form of those repeated games is quite complex.
[Figure: two copies of the p × q payoff table, with entries (α^j_{kℓ})_{j=1,2} for k = 1, . . . , p and ℓ = 1, . . . , q, one for each successive stage of the repeated game.]
Figure 4.1: A repeated game
The game is defined in a semi-extensive form since at each period a normal form (i.e. an already aggregated description) defines the consequences of the strategic choices in a one-stage game. Even if the same game seems to be played at each period (it is similar to the others in its normal form description), in fact, the possibility of using the history of the game as a source of information to adapt the strategic choice at each stage makes a repeated game a fully dynamic "object".

We shall use this structure to prove an interesting result, called the "folk theorem", which shows that we can build an infinitely repeated bimatrix game, played with the help of finite automata, where the class of equilibria permits the approximation of any individually rational outcome of the one-stage bimatrix game.
4.1.2 Repeated concave games
Consider now the class of dynamic games where one repeats a concave game à la Rosen. Let t ∈ {0, 1, . . . , T − 1} be the time index. At each time period t the players enter into a noncooperative game defined by

(M; U_j, ψ_j(·), j ∈ M)

where, as indicated in Section 3.4, M = {1, 2, . . . , m} is the set of players, U_j is a compact convex set describing the actions available to Player j at each period and ψ_j(u(t)) is the payoff to Player j when the action vector u(t) = (u_1(t), . . . , u_m(t)) ∈ U = U_1 × U_2 × · · · × U_m is chosen at period t by the m players. This function is assumed to be concave in u_j.
We shall denote ũ = (u(t) : t = 0, 1, . . . , T − 1) the action sequence over the T periods. The total payoff of Player j over the T-horizon is then defined by

V^T_j(ũ) = Σ_{t=0}^{T−1} ψ_j(u(t)).   (4.1)
Open-loop information structure

At period t, each player j knows only the current time t and what he has played in periods τ = 0, 1, . . . , t − 1. He does not observe what the other players do. We implicitly assume that the players cannot use the rewards they receive at each period to infer, through e.g. an appropriate filtering, the actions of the other players. The open-loop information structure actually eliminates almost every aspect of the dynamic structure of the repeated game context.
Closed-loop information structure

At period t, each player j knows not only the current time t and what he has played in periods τ = 0, 1, . . . , t − 1, but also what the other players have done at previous periods. We call history of the game at period τ the sequence

h(τ) = (u(t) : t = 0, 1, . . . , τ − 1).

A closed-loop strategy for Player j is defined as a sequence of mappings

γ^τ_j : h(τ) → U_j.
Due to the repetition over time of the same game, each player j can adjust his choice of one-stage action u_j to the history of the game. This permits the consideration of memory strategies where some threats are included in the announced strategies. For example, a player can declare that some particular histories would "trigger" a retaliation on his part. The description of a trigger strategy will therefore include

• a nominal "mood" of play that contributes to the expected or desired outcomes;

• a retaliation "mood" of play that is used as a threat;

• the set of histories that trigger a switch from the nominal mood of play to the retaliation mood of play.
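The three ingredients above can be sketched as a closed-loop strategy mapping histories to actions; the names below are illustrative, not from the text.

```python
# Sketch of a trigger strategy: play a "nominal" action as long as the observed
# history matches the expected nominal play; switch permanently to a
# "retaliation" action otherwise.

def make_trigger_strategy(nominal_action, retaliation_action, expected_profile):
    """Return a closed-loop strategy h(t) -> action.

    expected_profile is the joint-action profile that should be observed at
    every past period under the nominal mood of play.
    """
    def strategy(history):
        # history = (u(0), ..., u(t-1)), the past joint-action profiles
        if all(u == expected_profile for u in history):
            return nominal_action        # nominal mood of play
        return retaliation_action        # retaliation mood of play (the threat)
    return strategy
```

For instance, with actions "C" (cooperate) and "P" (punish), any single observed deviation from ("C", "C") switches the strategy into the retaliation mood forever.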
Many authors have studied equilibria obtained in repeated games through the use of
trigger strategies (see for example [Radner, 1980], [Radner, 1981], [Friedman 1986]).
Infinite horizon games

If the game is repeated over an infinite number of periods then payoffs are represented by infinite streams of rewards ψ_j(u(t)), t = 0, 1, 2, . . . .
Discounted sums: We have assumed that the action sets U_j are compact; therefore, since the functions ψ_j(·) are continuous, the one-stage rewards ψ_j(u(t)) are uniformly bounded. Hence the discounted payoffs

V_j(ũ) = Σ_{t=0}^{∞} β_j^t ψ_j(u(t)),   (4.2)

where β_j ∈ [0, 1) is the discount factor of Player j, are well defined. Therefore, if the players discount time, we can compare strategies on the basis of the discounted sums of the rewards they generate for each player.
Average rewards per period: If the players do not discount time, the comparison of strategies implies the consideration of infinite streams of rewards that sum up to infinity. A way to circumvent the difficulty is to consider a limit average reward per period in the following way

g_j(ũ) = liminf_{T→∞} (1/T) Σ_{t=0}^{T−1} ψ_j(u(t)).   (4.3)
To link this evaluation of infinite streams of rewards with the previous one based on discounted sums, one may define the equivalent discounted constant reward

g^{β_j}_j(ũ) = (1 − β_j) Σ_{t=0}^{∞} β_j^t ψ_j(u(t)).   (4.4)

One expects that, for well-behaved games, the following limiting property holds true

lim_{β_j→1} g^{β_j}_j(ũ) = g_j(ũ).   (4.5)
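The relation between (4.4) and (4.5) can be illustrated numerically. For the assumed periodic reward stream 1, 3, 1, 3, . . . (an example of ours, not from the text), the limit average per period is 2, and the equivalent discounted constant reward approaches it as β_j → 1.

```python
# Equivalent discounted constant reward (4.4) vs limit average reward (4.3)
# for a periodic reward stream given by a repeating cycle.

def discounted_constant_reward(cycle, beta, horizon=200000):
    """(1 - beta) * sum_t beta^t * r_t, truncated at a long finite horizon."""
    total = sum(beta ** t * cycle[t % len(cycle)] for t in range(horizon))
    return (1.0 - beta) * total

def limit_average(cycle):
    """Limit average per period of a periodic stream = mean over one cycle."""
    return sum(cycle) / len(cycle)
```

For the cycle [1, 3] a closed form is available, (1 + 3β)/(1 + β), which tends to 2 as β → 1, in agreement with (4.5).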
In infinite horizon repeated games each player always has enough time to retaliate. He will always have the possibility to implement an announced threat, if the opponents are not playing according to some nominal sequence corresponding to an agreement. We shall see that the equilibria based on the use of threats constitute a very large class. Therefore the set of strategies with the memory information structure is much more encompassing than the open-loop one.
Existence of a "trivial" dynamic equilibrium

We know, by Rosen's existence theorem, that a concave "static game"

(M; U_j, ψ_j(·), j ∈ M)

admits an equilibrium u*. It should be clear (see Exercise 4.1) that the sequence ũ* = (u(t) ≡ u* : t = 0, 1, . . . , T − 1) is also an equilibrium for the repeated game in both the open-loop and closed-loop (memory) information structures.
Assume the one-period game is such that a unique equilibrium exists. The open-loop repeated game will then also have a unique equilibrium, whereas the closed-loop repeated game will still have a plethora of equilibria, as we shall see shortly.
4.2 Folk theorem
In this section we present the celebrated Folk theorem. This name is due to the fact that there is no well-defined author of its first version. The idea is that an infinitely repeated game should permit the players to design equilibria, supported by threats, with outcomes being Pareto efficient. The strategies will be based on memory, each player remembering the past moves of the game. Since the infinite horizon could give rise to an infinite storage of information, there is a question of designing strategies with the help of mechanisms that would exploit the history of the game with only a finite information storage capacity. Finite automata provide such systems.
4.2.1 Repeated games played by automata
A finite automaton is a logical system that, when stimulated by an input, generates an output. For example, an automaton could be used by a player to determine the stage-game action he takes given the information he has received. The automaton is finite if the length of the logical input (a stream of 0–1 bits) it can store is finite. So, in our repeated game context, a finite automaton will not permit a player to base his stage-game action on an infinite memory of what has happened before. To describe a finite automaton one needs the following:

• A list of all possible stored input configurations; this list is finite and each element is called a state. One element of this list is the initial state.

• An output function that determines the action taken by the automaton, given its current state.

• A transition function that tells the automaton how to move from one state to another after it has received a new input element.
In our paradigm of a repeated game played by automata, the input element for the automaton a of Player 1 is the stage-game action of Player 2 in the preceding stage, and symmetrically for Player 2. In this way, each player can choose a way to process the information he receives in the course of the repeated plays, in order to decide the next action.
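The three components listed above (a finite state list with an initial state, an output function, and a transition function) can be sketched as a small Moore machine. The "grim trigger" machine below is an assumed illustration, not one defined in the text: it plays C until the opponent's observed action is D, and then plays D forever.

```python
# A minimal finite automaton for repeated play, given as a dictionary with the
# three ingredients described in the text. The input at each stage is the
# opponent's action in the preceding stage.

GRIM = {
    "initial": "nominal",
    "output": {"nominal": "C", "punish": "D"},       # action in each state
    # next state as a function of (current state, opponent's last action)
    "transition": {("nominal", "C"): "nominal",
                   ("nominal", "D"): "punish",
                   ("punish",  "C"): "punish",
                   ("punish",  "D"): "punish"},
}

def run(automaton, opponent_actions):
    """Feed the opponent's action stream; return the actions the machine plays."""
    state, played = automaton["initial"], []
    for a in opponent_actions:
        played.append(automaton["output"][state])    # act on the current state
        state = automaton["transition"][(state, a)]  # then process the input
    return played
```

Note that the machine outputs its action before processing the latest input, which reflects the one-stage observation delay of the repeated game.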
An important aspect of finite automata is that, when they are used to play a repeated game, they necessarily generate cycles in the successive states and hence in the actions taken in the successive stage-games.
Let G be a finite two-player game, i.e. a bimatrix game, which defines the one-stage game. U and V are the sets of pure strategies in the game G for Players 1 and 2 respectively. Denote by g_j(u, v) the payoff to Player j in the one-stage game when the pure strategy pair (u, v) ∈ U × V has been selected by the two players.

The game is repeated indefinitely. Although the set of strategies for an infinitely repeated game is enormously rich, we shall restrict the strategy choices of the players to the class of finite automata. A pure strategy for Player 1 is an automaton a ∈ A which has for input the actions v ∈ V of Player 2. Symmetrically, a pure strategy for Player 2 is an automaton b ∈ B which has for input the actions u ∈ U of Player 1.
We associate with the pair of finite automata (a, b) ∈ A × B a pair of average payoffs per stage defined as follows

g̃_j(a, b) = (1/N) Σ_{n=1}^{N} g_j(u_n, v_n),   j = 1, 2,   (4.6)

where N is the length of the cycle associated with the pair of automata (a, b) and (u_n, v_n) is the action pair at the n-th stage in this cycle. One notices that the expression (4.6) is also the limit average reward per period, due to the cycling behavior of the two automata.

We call 𝒢 the game defined by the strategy sets A and B and the payoff functions (4.6).
4.2.2 Minimax point
Assume Player 1 wants to threaten Player 2. An effective threat would be to define his action in a one-stage game through the solution of

m̄_2 = min_u max_v g_2(u, v).

Similarly, if Player 2 wants to threaten Player 1, an effective threat would be to define his action in a one-stage game through the solution of

m̄_1 = min_v max_u g_1(u, v).

We call minimax point the pair m̄ = (m̄_1, m̄_2) in IR².
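The minimax point is straightforward to compute for a small bimatrix game. Below is a sketch using, for illustration, the payoff matrices of the 2 × 2 bimatrix from the correlated-equilibrium exercise at the end of Chapter 3 (the choice of game is ours).

```python
# Minimax point of a bimatrix game, following the definitions above:
#   m2 = min over u of max over v of g2(u, v)   (Player 1's threat on Player 2)
#   m1 = min over v of max over u of g1(u, v)   (Player 2's threat on Player 1)

G1 = [[5, 0],   # g1(u, v), rows indexed by u, columns by v
      [4, 1]]
G2 = [[1, 0],   # g2(u, v)
      [4, 5]]

def minimax_point(g1, g2):
    m2 = min(max(row) for row in g2)                    # rows: Player 1's u
    m1 = min(max(g1[u][v] for u in range(len(g1)))      # columns: Player 2's v
             for v in range(len(g1[0])))
    return m1, m2
```

For this game the minimax point is (1, 1), so every outcome paying each player more than 1 dominates it.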
4.2.3 Set of outcomes dominating the minimax point
Theorem 4.2.1 Let T_m̄ be the set of outcomes in the bimatrix game that dominate the minimax point m̄. Then the outcomes corresponding to Nash equilibria in pure strategies¹ of the game 𝒢 are dense in T_m̄.
Proof: Let g^1, g^2, . . . , g^K be the pairs of payoffs that appear in the bimatrix game when it is written in vector form, that is

g^k = (g_1(u^k, v^k), g_2(u^k, v^k)),

where (u^k, v^k) is a possible pure strategy pair. Let q_1, q_2, . . . , q_K be nonnegative rational numbers that sum to 1. Then the convex combination

g* = q_1 g^1 + q_2 g^2 + · · · + q_K g^K

is an element of T_m̄ if g*_1 ≥ m̄_1 and g*_2 ≥ m̄_2. The set of such points g* is dense in T_m̄. Each q_k, being a rational number, can be written as a ratio of two integers, that is, a fraction. It is possible to write the K fractions with a common denominator, hence

q_k = n_k / N,

with

n_1 + n_2 + . . . + n_K = N

since the q_k's must sum to 1.
We construct two automata a* and b* such that, together, they play (u^1, v^1) during n_1 stages, (u^2, v^2) during n_2 stages, etc. They complete the sequence by playing (u^K, v^K) during n_K stages, and then the N-stage cycle begins again. The limit average per period reward of Player j in the game 𝒢 played by these automata is given by

g̃_j(a*, b*) = (1/N) Σ_{k=1}^{K} n_k g_j(u^k, v^k) = Σ_{k=1}^{K} q_k g_j(u^k, v^k) = g*_j.

Therefore these two automata will achieve the payoff pair g*. Now we refine the structure of these automata as follows:

• Let ũ(n) and ṽ(n) be the pure strategies that Player 1 and Player 2 respectively are supposed to play at stage n, according to the cycle defined above.
¹ A pure strategy in the game 𝒢 is a deterministic choice of an automaton; this automaton will process information coming from the repeated use of mixed strategies in the repeated game.
• The automaton a* plays ũ(n + 1) at stage n + 1 if Player 2's automaton has played ṽ(n) at stage n; otherwise it plays ū, the strategy that minimaxes Player 2's one-shot payoff, and this strategy is then played forever.

• Symmetrically, the automaton b* plays ṽ(n + 1) at stage n + 1 if Player 1's automaton has played ũ(n) at stage n; otherwise it plays v̄, the strategy that minimaxes Player 1's one-shot payoff, and this strategy is then played forever.
It should be clear that the two automata a* and b* defined as above constitute a Nash equilibrium for the repeated game 𝒢 whenever g* dominates m̄, i.e. g* ∈ T_m̄. Indeed, if Player 1 unilaterally changes his automaton to a, then the automaton b* will select, for all periods except a finite number, the minimax action v̄ and, as a consequence, the limit average per period reward obtained by Player 1 is

g̃_1(a, b*) ≤ m̄_1 ≤ g̃_1(a*, b*).

Similarly for Player 2, a unilateral change to b ∈ B leads to a payoff

g̃_2(a*, b) ≤ m̄_2 ≤ g̃_2(a*, b*).

Q.E.D.
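The cycle construction used in the proof can be checked numerically: rational weights q_k = n_k/N define an N-stage cycle in which pair k is played n_k times, and the resulting average payoff equals the convex combination Σ_k q_k g^k. The payoff pairs and weights below are assumed for illustration.

```python
# Building the N-stage cycle from rational weights and verifying that its
# average payoff equals the target convex combination of payoff pairs.
from fractions import Fraction
from math import lcm

# assumed example: three payoff pairs g^k and rational weights q_k summing to 1
pairs = [(5.0, 1.0), (1.0, 5.0), (4.0, 4.0)]
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]

# common denominator N and stage counts n_k = q_k * N
N = lcm(*(w.denominator for w in q))
n = [int(w * N) for w in q]

# the N-stage cycle: pair k repeated n_k times
cycle = [pairs[k] for k in range(len(pairs)) for _ in range(n[k])]
avg = tuple(sum(p[j] for p in cycle) / N for j in range(2))
# avg equals the convex combination q1*g^1 + q2*g^2 + q3*g^3 componentwise
```

Here N = 4, the stage counts are (1, 1, 2), and both the cycle average and the convex combination give the payoff pair (3.5, 3.5).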
Remark 4.2.1 This result is known as the Folk theorem, because it has always been part of the "folklore" of game theory without it being known to whom it should be attributed.
The class of equilibria is more restricted if one imposes a condition that the threats
used by the players in their announced strategies have to be credible. This is the issue
discussed in one of the following sections.
4.3 Collusive equilibrium in a repeated Cournot game
Consider a Cournot oligopoly game repeated over an infinite sequence of periods. Let t = 0, 1, . . . be the sequence of time periods. Let q(t) ∈ IR^m be the production vector chosen by the m firms at period t and ψ_j(q(t)) the payoff to Player j at period t. We denote

q̃ = (q(0), q(1), . . . , q(t), . . .)

the infinite sequence of production decisions of the m firms. Over the infinite time horizon the rewards of the players are defined as

Ṽ_j(q̃) = Σ_{t=0}^{∞} β_j^t ψ_j(q(t))
where β_j ∈ [0, 1) is the discount factor used by Player j.

We assume that at each period t all players have the same information h(t), which is the history of past productions h(t) = (q(0), q(1), . . . , q(t − 1)).
We consider, for the one-stage game, two possible outcomes:

(i) the Cournot equilibrium, supposed to be uniquely defined by q^c = (q^c_j)_{j=1,...,m}; and

(ii) a Pareto outcome resulting from the production levels q* = (q*_j)_{j=1,...,m} and such that ψ_j(q*) ≥ ψ_j(q^c), j = 1, . . . , m. We say that this Pareto outcome is a collusive outcome dominating the Cournot equilibrium.
A Cournot equilibrium for the repeated game is defined by the strategy consisting, for each Player j, of choosing repeatedly the production level q_j(t) ≡ q^c_j, j = 1, . . . , m. However, there are many other equilibria that can be obtained through the use of memory strategies.
Let us define

Ṽ*_j = Σ_{t=0}^{∞} β_j^t ψ_j(q*)   (4.7)

Φ_j(q*) = max_{q_j} ψ_j([q*^{(−j)}, q_j]).   (4.8)
The following has been shown in [Friedman 1977].

Lemma 4.3.1 There exists an equilibrium for the repeated Cournot game that yields the payoffs Ṽ*_j if the following inequality holds

β_j ≥ (Φ_j(q*) − ψ_j(q*)) / (Φ_j(q*) − ψ_j(q^c)).   (4.9)
This equilibrium is reached through the use of a so-called trigger strategy defined as follows:

if (q(0), q(1), . . . , q(t − 1)) = (q*, q*, . . . , q*) then q_j(t) = q*_j;
otherwise q_j(t) = q^c_j.   (4.10)
Proof: If the players play according to (4.10) the payoff they obtain is given by

Ṽ*_j = Σ_{t=0}^{∞} β_j^t ψ_j(q*) = ψ_j(q*) / (1 − β_j).
Assume Player j decides to deviate unilaterally at period 0. He knows that his deviation will be detected at period 1 and that, thereafter, the Cournot equilibrium production levels q^c will be played. So the best he can expect from this deviation is the following payoff, which combines the maximal reward he can get in period 0 when the other firms play q*_k with the discounted value of an infinite stream of Cournot outcomes:

Ṽ_j(q̃) = Φ_j(q*) + (β_j / (1 − β_j)) ψ_j(q^c).
This unilateral deviation is not profitable if the following inequality holds

ψ_j(q*) − Φ_j(q*) + (β_j / (1 − β_j)) (ψ_j(q*) − ψ_j(q^c)) ≥ 0,
which can be rewritten as

(1 − β_j)(ψ_j(q*) − Φ_j(q*)) + β_j (ψ_j(q*) − ψ_j(q^c)) ≥ 0,
from which we get

ψ_j(q*) − Φ_j(q*) − β_j (ψ_j(q^c) − Φ_j(q*)) ≥ 0

and, finally, since ψ_j(q^c) − Φ_j(q*) < 0, the condition (4.9)

β_j ≥ (Φ_j(q*) − ψ_j(q*)) / (Φ_j(q*) − ψ_j(q^c)).

The same reasoning holds if the deviation occurs at any other period t. Q.E.D.
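A worked check of condition (4.9) under an assumed specification (ours, not the text's): a symmetric linear Cournot duopoly with inverse demand p = a − Q and constant marginal cost c, normalized so that a − c = 1.

```python
# Condition (4.9) in a symmetric linear Cournot duopoly with a - c = 1.
# Standard per-firm profits (each firm's role is symmetric):
#   Cournot profit     psi(q^c) = 1/9    (each firm produces 1/3)
#   collusive profit   psi(q^*) = 1/8    (each produces half the monopoly
#                                         output, i.e. 1/4)
#   best deviation     Phi(q^*) = 9/64   (best reply to the other firm's 1/4:
#                                         q = 3/8, profit (3/8)^2)

psi_c = 1.0 / 9.0
psi_star = 1.0 / 8.0
phi = 9.0 / 64.0

beta_min = (phi - psi_star) / (phi - psi_c)   # right-hand side of (4.9)
# beta_min = 9/17: the trigger strategy (4.10) sustains collusion whenever
# each firm's discount factor beta_j exceeds 9/17 (about 0.53).
```

So under these assumed demand and cost parameters, the collusive outcome is an equilibrium outcome of the repeated game as soon as the firms are sufficiently patient.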
Remark 4.3.1 According to this strategy the q* production levels correspond to a cooperative mood of play, while the q^c levels correspond to a punitive mood of play. The punitive mood is a threat. As the punitive mood of play consists in playing the Cournot solution, which is indeed an equilibrium, it is a credible threat. Therefore the players will always choose to play in accordance with the Pareto solution q*, and thus the equilibrium defines a nondominated outcome.

It is important to notice the difference between the threats used in the folk theorem and the ones used in this repeated Cournot game. Here the threats themselves constitute an equilibrium for the dynamic game. We say that the memory (trigger-strategy) equilibrium is subgame perfect in the sense of Selten [Selten, 1975].
4.3.1 Finite vs inﬁnite horizon
There is an important difference between finitely and infinitely repeated games. If the same game is repeated only a finite number of times then, in a subgame perfect equilibrium, a single-game equilibrium will be repeated at each period. This is due to the impossibility of retaliating at the last period. This situation is then translated backward in time until the initial period. When the game is repeated over an infinite time horizon there is always enough time to retaliate and possibly resume cooperation. For example, let us consider a repeated Cournot game, without discounting, where the players' payoffs are determined by the long-term average of the one-stage rewards. If the players have perfect information with delay, they can observe without errors the moves selected by their opponents in previous stages. There might be a delay of several stages before this observation is made. In that case it is easy to show that, to any cooperative solution dominating a static Nash equilibrium, there corresponds a subgame perfect sequential equilibrium for the repeated game, based on the use of trigger strategies. These strategies use as a threat a switch to the noncooperative solution for all remaining periods, as soon as a player has been observed not to play according to the cooperative solution. Since the long-term average only depends on the infinitely repeated one-stage rewards, it is clear that the threat gives rise to the Cournot equilibrium payoff in the long run. This is the subgame perfect version of the Folk theorem [Friedman 1986].
4.3.2 A repeated stochastic Cournot game with discounting and imperfect information

We now give an example of a stochastic repeated game for which the construction of equilibria based on the use of threats is not trivial. This is a stochastic repeated game based on the Cournot model with imperfect information (see [Green & Porter, 1985], [Porter, 1983]).
Let X(t) = Σ_{j=1}^{m} x_j(t) be the total supply of the m firms at time t and let θ(t) be a sequence of i.i.d.² random variables affecting the market price. Each firm j chooses its supply x_j(t) ≥ 0, desiring to maximize its total expected discounted profit

V_j(x_1(·), . . . , x_j(·), . . . , x_m(·)) = E_θ [ Σ_{t=0}^{∞} β^t (D(X(t), θ(t)) x_j(t) − c_j(x_j(t))) ],   j = 1, . . . , m,
where β < 1 is the discount factor. The assumed information structure does not allow each player to observe the opponents' actions used in the past. Only the sequence of past prices, corrupted by the random noise, is available. Therefore the players cannot monitor without error the past actions of their opponent(s). This impossibility to detect a breach of cooperation without error increases considerably the difficulty of building equilibria based on the use of credible threats. In [Green & Porter, 1985], [Porter, 1983] it is shown that there exist subgame perfect equilibria, based on the use of trigger strategies, which dominate the repeated Cournot–Nash equilibrium. This construct uses the concept of a Markov game and will be further discussed in Part 3 of these Notes. It will be shown that these equilibria are in fact closely related to the so-called correlated and/or communication equilibria discussed at the end of Part 2.

² Independent and identically distributed.
4.4 Exercises
Exercise 4.1: Prove that the repeated one-stage equilibrium is an equilibrium for the repeated game with additive payoffs and a finite number of periods. Extend the result to infinitely repeated games.

Exercise 4.2: Show that condition (4.9) holds if β_j = 1. Adapt Lemma 4.3.1 to the case where the game is evaluated through the long-term average reward criterion.
Chapter 5
Shapley's Zero-Sum Markov Game
5.1 Process and rewards dynamics
The concept of a Markov game was introduced, in a zero-sum game framework, in [Shapley, 1953]. The structure of this game can also be described as a controlled Markov chain with two competing agents.
Let S = {1, 2, . . . , n} be the set of possible states of a discrete-time stochastic process {x(t) : t = 0, 1, . . .}. Let U_j = {1, 2, . . . , λ_j}, j = 1, 2, be the finite action sets of the two players. The process dynamics is described by the transition probabilities

p_{s,s′}(u) = P[x(t + 1) = s′ | x(t) = s, u],   s, s′ ∈ S, u ∈ U_1 × U_2,

which satisfy, for all u ∈ U_1 × U_2,

p_{s,s′}(u) ≥ 0,   Σ_{s′∈S} p_{s,s′}(u) = 1,   s ∈ S.
As the transition probabilities depend on the players' actions, we speak of a controlled Markov chain. A transition reward function

r(s, u),   s ∈ S, u ∈ U_1 × U_2,

defines the gain of Player 1 when the process is in state s and the two players take the action pair u. Player 2's reward is given by −r(s, u), since the game is assumed to be zero-sum.
5.2 Information structure and strategies
5.2.1 The extensive form of the game
The game defined above corresponds to a game in extensive form with an infinite number of moves. We assume an information structure where the players choose their actions sequentially and simultaneously, with perfect recall. This is illustrated in Figure 5.1.
[Figure: one stage of the event tree. From state s, Player 1 (P1) chooses u1 ∈ {u1¹, u1²}; Player 2 (P2), without observing this choice (information set D2), chooses u2 ∈ {u2¹, u2²}; nature then selects the next state s′ according to the transition probabilities p_{s,·}(u), and the construction repeats.]
Figure 5.1: A Markov game in extensive form
5.2.2 Strategies
A strategy is a device which transforms the information into action. Since the players
may recall the past observations, the general form of a strategy can be quite complex.
Markov strategies: These strategies are also called feedback strategies. The assumed information structure allows the use of strategies defined as mappings

γ_j : S → P(U_j),   j = 1, 2,

where P(U_j) is the class of probability distributions over the action set U_j. Since the strategies are based only on the information conveyed by the current state of the x-process, they are called Markov strategies.
When the two players have chosen their respective strategies, the state process becomes a Markov chain with transition probabilities

P^{γ_1,γ_2}_{s,s′} = Σ_{k=1}^{λ_1} Σ_{ℓ=1}^{λ_2} μ^{γ_1(s)}_k ν^{γ_2(s)}_ℓ p_{s,s′}(u^k_1, u^ℓ_2),

where we have denoted μ^{γ_1(s)}_k, k = 1, . . . , λ_1, and ν^{γ_2(s)}_ℓ, ℓ = 1, . . . , λ_2, the probability distributions induced on U_1 and U_2 by γ_1(s) and γ_2(s) respectively. In order to formulate the normal form of the game, it remains to define the payoffs associated with the admissible strategies of the two players. Let β ∈ [0, 1) be a given discount factor. Player 1's payoff, when the game starts in state s_0, is defined as the discounted sum over the infinite time horizon of the expected transition rewards, i.e.

V(s_0; γ_1, γ_2) = E_{γ_1,γ_2} [ Σ_{t=0}^{∞} β^t Σ_{k=1}^{λ_1} Σ_{ℓ=1}^{λ_2} μ^{γ_1(x(t))}_k ν^{γ_2(x(t))}_ℓ r(x(t), u^k_1, u^ℓ_2) | x(0) = s_0 ].

Player 2's payoff is equal to −V(s_0; γ_1, γ_2).
Definition 5.2.1 A pair of strategies (γ*_1, γ*_2) is a saddle point if, for all strategies γ_1 and γ_2 of Players 1 and 2 respectively, and for all s ∈ S,

V(s; γ_1, γ*_2) ≤ V(s; γ*_1, γ*_2) ≤ V(s; γ*_1, γ_2).   (5.1)

The number

v*(s) = V(s; γ*_1, γ*_2)

is called the value of the game at state s.
Memory strategies: Since the game is of perfect recall, the information on which a player can base his/her decision at period t is the whole state history

h(t) = {x(0), x(1), . . . , x(t)}.

Notice that we don't assume that the players have direct access to the actions used in the past by their opponent; they can only observe the state history. However, as in the case of a single-player Markov decision process, it can be shown that optimality does not require more than the use of Markov strategies.
5.3 The Shapley–Denardo operator formalism

We introduce in this section the powerful formalism of dynamic programming operators, which was formally introduced in [Denardo, 1967] but was already implicit in Shapley's work. The solution of the dynamic programming equations for the stochastic game is obtained as a fixed point of an operator acting on the space of value functions.
5.3.1 Dynamic programming operators
In a seminal paper [Shapley, 1953] the existence of optimal stationary Markov strategies
was established via a fixed-point argument involving a contracting operator. We give
here a brief account of the method, taking our inspiration from [Denardo, 1967] and
[Filar & Vrieze, 1997]. Let v(·) = (v(s) : s ∈ S) be an arbitrary real-valued function
defined over S. Since we assume that S is finite, this is also a vector. Introduce, for
any s ∈ S, the so-called local reward functions
\[
h(v(\cdot), s, u_1, u_2) = r(s, u_1, u_2) + \beta \sum_{s' \in S} p_{s,s'}(u_1, u_2)\, v(s'), \qquad (u_1, u_2) \in U_1 \times U_2.
\]
Define now, for each s ∈ S, the zero-sum matrix game with pure strategies U_1 and U_2
and with payoffs
\[
H(v(\cdot), s) = \big[h(v(\cdot), s, u_1^k, u_2^\ell)\big]_{k = 1, \ldots, \lambda_1;\; \ell = 1, \ldots, \lambda_2}.
\]
Denote the value of each of these games by
\[
T(v(\cdot), s) := \operatorname{val}[H(v(\cdot), s)]
\]
and let T(v(·)) = (T(v(·), s) : s ∈ S). This defines a mapping T : IR^n → IR^n, also
called the dynamic programming operator of the Markov game.
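The fixed-point characterization translates directly into a numerical scheme: apply T repeatedly (value iteration). The following is a minimal sketch, assuming two states and two actions per player, so that each local game H(v(·), s) can be solved by the closed-form value of a 2×2 matrix game; the data r and p are illustrative, not from the text.

```python
import numpy as np

def value_2x2(M):
    """Value of a 2x2 zero-sum matrix game (row player maximises)."""
    maximin = max(min(M[0]), min(M[1]))
    minimax = min(max(M[:, 0]), max(M[:, 1]))
    if maximin == minimax:                      # saddle point in pure strategies
        return maximin
    (a, b), (c, d) = M                          # otherwise the mixed value
    return (a * d - b * c) / (a + d - b - c)

def shapley_T(v, r, p, beta):
    """One application of the dynamic programming operator T:
    (T v)(s) = val[ r(s,.,.) + beta * sum_s' p_{s,s'}(.,.) v(s') ]."""
    return np.array([value_2x2(r[s] + beta * p[s] @ v) for s in range(len(v))])

# Illustrative data: 2 states, 2 actions per player.
rng = np.random.default_rng(0)
r = rng.uniform(-1.0, 1.0, size=(2, 2, 2))      # r[s, u1, u2]
p = rng.uniform(size=(2, 2, 2, 2))              # p[s, u1, u2, s']
p /= p.sum(axis=-1, keepdims=True)              # transition rows sum to one
beta = 0.9

v = np.zeros(2)
for _ in range(300):                            # value iteration: v <- T(v)
    v = shapley_T(v, r, p, beta)
residual = np.max(np.abs(shapley_T(v, r, p, beta) - v))
```

Since T is a β-contraction (Theorem 5.3.1 below makes this precise), the iterates converge geometrically, so `residual` is negligible after a few hundred steps; the saddle-point strategies at each state are then the optimal mixed strategies of the local game H(v*(·), s).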
5.3.2 Existence of sequential saddle points
Lemma 5.3.1 If A and B are two matrices of the same dimensions, then
\[
\big|\operatorname{val}[A] - \operatorname{val}[B]\big| \le \max_{k,\ell} |a_{k,\ell} - b_{k,\ell}|. \tag{5.2}
\]
The proof of this lemma is left as exercise 6.1.
Lemma 5.3.2 If v(·), γ_1 and γ_2 are such that, for all s ∈ S,
\[
v(s) \le (\text{resp. } \ge, \text{ resp. } =)\; h(v(\cdot), s, \gamma_1(s), \gamma_2(s)) = r(s, \gamma_1(s), \gamma_2(s)) + \beta \sum_{s' \in S} p_{s,s'}(\gamma_1(s), \gamma_2(s))\, v(s') \tag{5.3}
\]
then
\[
v(s) \le (\text{resp. } \ge, \text{ resp. } =)\; V(s; \gamma_1, \gamma_2). \tag{5.4}
\]
Proof: The proof is relatively straightforward and consists in iterating inequality (5.3). QED
We can now establish the following result.

Theorem 5.3.1 The mapping T is contracting in the "max" norm
\[
\|v(\cdot)\| = \max_{s \in S} |v(s)|
\]
and admits the value function introduced in Definition 5.2.1 as its unique fixed point,
\[
v^*(\cdot) = T(v^*(\cdot)).
\]
Furthermore the optimal (saddle-point) strategies are defined as the mixed strategies
yielding this value, i.e.
\[
h(v^*(\cdot), s, \gamma_1^*(s), \gamma_2^*(s)) = \operatorname{val}[H(v^*(\cdot), s)], \qquad s \in S. \tag{5.5}
\]
Proof: We first establish the contraction property. We use Lemma 5.3.1 and the
transition probability properties to establish the following inequalities:
\[
\begin{aligned}
\|T(v(\cdot)) - T(w(\cdot))\| &\le \max_{s \in S}\, \max_{u_1 \in U_1;\, u_2 \in U_2} \Big| r(s, u_1, u_2) + \beta \sum_{s' \in S} p_{s,s'}(u_1, u_2)\, v(s') \\
&\qquad\qquad - r(s, u_1, u_2) - \beta \sum_{s' \in S} p_{s,s'}(u_1, u_2)\, w(s') \Big| \\
&= \max_{s \in S;\, u_1 \in U_1;\, u_2 \in U_2} \Big| \beta \sum_{s' \in S} p_{s,s'}(u_1, u_2)\, \big(v(s') - w(s')\big) \Big| \\
&\le \max_{s \in S;\, u_1 \in U_1;\, u_2 \in U_2} \beta \sum_{s' \in S} p_{s,s'}(u_1, u_2)\, |v(s') - w(s')| \\
&\le \max_{s \in S;\, u_1 \in U_1;\, u_2 \in U_2} \beta \sum_{s' \in S} p_{s,s'}(u_1, u_2)\, \|v(\cdot) - w(\cdot)\| \\
&= \beta\, \|v(\cdot) - w(\cdot)\|.
\end{aligned}
\]
Hence T is a contraction, since 0 ≤ β < 1. By the Banach contraction theorem this
implies that there exists a unique fixed point v*(·) of the operator T.

We now show that there exist stationary Markov strategies γ*_1, γ*_2 for which the
saddle-point condition holds:
\[
V(s; \gamma_1, \gamma_2^*) \le V(s; \gamma_1^*, \gamma_2^*) \le V(s; \gamma_1^*, \gamma_2). \tag{5.6}
\]
Let γ*_1(s), γ*_2(s) be the saddle-point strategies for the local matrix game with payoffs
\[
\big[h(v^*(\cdot), s, u_1, u_2)\big]_{u_1 \in U_1,\, u_2 \in U_2} = H(v^*(\cdot), s). \tag{5.7}
\]
Consider any strategy γ_2 for Player 2. Then, by definition,
\[
h(v^*(\cdot), s, \gamma_1^*(s), \gamma_2(s)) \ge v^*(s) \qquad \forall s \in S. \tag{5.8}
\]
By Lemma 5.3.2 the inequality (5.8) implies that for all s ∈ S
\[
V(s; \gamma_1^*, \gamma_2) \ge v^*(s). \tag{5.9}
\]
Similarly we would obtain that for any strategy γ_1 and for all s ∈ S
\[
V(s; \gamma_1, \gamma_2^*) \le v^*(s), \tag{5.10}
\]
and
\[
V(s; \gamma_1^*, \gamma_2^*) = v^*(s). \tag{5.11}
\]
This establishes the saddle-point property. QED
In the single-player case (MDP, or Markov decision process) the contraction and
monotonicity of the dynamic programming operator readily yield a converging numerical
algorithm for the computation of the optimal strategies [Howard, 1960]. In the
two-player zero-sum case, even if the existence theorem can also be translated into a
converging algorithm, some difficulties arise (see e.g. [Filar & Tolwinski, 1991] for a
recently proposed converging algorithm).
Chapter 6

Nonzero-sum Markov and Sequential Games
In this chapter we consider the possible extension of the Shapley Markov game formalism
to nonzero-sum games. We shall consider two steps in this extension: (i) use
the same finite state and finite action formalism as in Shapley's work and introduce
different reward functions for the different players; (ii) introduce a more general
framework where the states and actions can be continuous variables.
6.1 Sequential Game with Discrete state and action sets
We consider a stochastic game played noncooperatively by players who are not in a
purely antagonistic situation. Shapley's formalism extends without difficulty to a
nonzero-sum setting.
6.1.1 Markov game dynamics
Introduce S = {1, 2, ..., n}, the set of possible states; U_j(s) = {1, 2, ..., λ_j(s)},
j = 1, ..., m, the finite action sets at state s of the m players; and the transition
probabilities
\[
p_{s,s'}(u) = P[x(t+1) = s' \mid x(t) = s, u], \qquad s, s' \in S,\; u \in U_1(s) \times \cdots \times U_m(s).
\]
Define the transition reward of player j, when the process is in state s and the players
take the action vector u, by
\[
r_j(s, u), \qquad s \in S,\; u \in U_1(s) \times \cdots \times U_m(s).
\]
Remark 6.1.1 Notice that we permit the action sets of the different players to depend
on the current state s of the game. Shapley’s theory remains valid in that case too.
6.1.2 Markov strategies
Markov strategies are defined as in the zero-sum case. We denote μ_k^{γ_j(x(t))} the
probability given to action u_j^k by player j when he uses strategy γ_j and the current
state is x(t).
6.1.3 Feedback-Nash equilibrium

Define Markov strategies as above, i.e. as mappings from S into the players' mixed
actions. Player j's payoff is thus defined as
\[
V_j(s_0; \gamma_1, \ldots, \gamma_m) = E_{\gamma_1,\ldots,\gamma_m}\left[\sum_{t=0}^{\infty} \beta^t \sum_{k=1}^{\lambda_1} \cdots \sum_{\ell=1}^{\lambda_m} \mu_k^{\gamma_1(x(t))} \cdots \mu_\ell^{\gamma_m(x(t))}\, r_j(x(t), u_1^k, \ldots, u_m^\ell) \,\Big|\, x(0) = s_0 \right].
\]
Deﬁnition 6.1.1 An m-tuple of Markov strategies (γ*_1, ..., γ*_m) is a feedback-Nash
equilibrium if, for all s ∈ S,
\[
V_j(s; \gamma_1^*, \ldots, \gamma_j, \ldots, \gamma_m^*) \le V_j(s; \gamma_1^*, \ldots, \gamma_j^*, \ldots, \gamma_m^*)
\]
for all strategies γ_j of player j, j ∈ M. The number
\[
v_j^*(s) = V_j(s; \gamma_1^*, \ldots, \gamma_j^*, \ldots, \gamma_m^*)
\]
will be called the equilibrium value of the game at state s for Player j.
6.1.4 Sobel-Whitt operator formalism

The first author to extend Shapley's work to a nonzero-sum framework was Sobel [?].
A more recent treatment of nonzero-sum sequential games can be found in [Whitt, 1980].
We introduce the so-called local reward functions
\[
h_j(s, v_j(\cdot), u) = r_j(s, u) + \beta \sum_{s' \in S} p_{s,s'}(u)\, v_j(s'), \qquad j \in M, \tag{6.1}
\]
where the functions v_j(·) : S → IR are given reward-to-go functionals (in this case
they are vectors of dimension n = card(S)) defined for each player j. The local
reward (6.1) is the sum of the transition reward for player j and the discounted expected
reward-to-go from the new state reached after the transition. For a given s and a given
set of reward-to-go functionals v_j(·), j ∈ M, the local rewards (6.1) define a matrix
game over the pure strategy sets U_j(s), j ∈ M.
We now define, for any given Markov policy vector γ = {γ_j}_{j∈M}, an operator
H_γ, acting on the space of reward-to-go functionals (i.e. n-dimensional vectors) and
defined as
\[
(H_\gamma v(\cdot))(s) = \big\{E_{\gamma(s)}[h_j(s, v_j(\cdot), \tilde u)]\big\}_{j \in M}. \tag{6.2}
\]
We also introduce the operator F_γ defined as
\[
(F_\gamma v(\cdot))(s) = \Big\{\sup_{\gamma_j} E_{\gamma^{(j)}(s)}[h_j(s, v_j(\cdot), \tilde u)]\Big\}_{j \in M}, \tag{6.3}
\]
where ũ is the random action vector and γ^{(j)} is the Markov policy obtained when only
Player j adjusts his/her policy, while the other players keep their γ-policies fixed. In
other words, Eq. (6.3) defines the optimal reply of each player j to the Markov strategies
chosen by the other players.
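For a fixed stationary policy profile, the fixed point of H_γ can be computed directly: averaging out the mixed actions turns (6.2) into one linear system per player, v_j = r_γ^j + β P_γ v_j. The following is a hedged sketch for two players; the array layout (and the helper name `policy_payoffs`) are assumptions for illustration.

```python
import numpy as np

def policy_payoffs(g1, g2, r, p, beta):
    """Fixed point of H_gamma for a two-player game: solves
    (I - beta * P_gamma) v_j = r_gamma_j for each player j.
    g1[s], g2[s]: mixed actions at state s; r[j, s, k, l]; p[s, k, l, s']."""
    n = p.shape[0]
    # transition matrix and per-player expected one-step rewards under gamma
    P = np.einsum('sk,sl,sklt->st', g1, g2, p)
    R = np.einsum('sk,sl,jskl->js', g1, g2, r)
    return np.linalg.solve(np.eye(n) - beta * P, R.T).T   # shape (players, n)
```

Computing the best-response fixed point f_γ as well, and checking f_γ(s) = v_γ(s) for every s, is exactly the equilibrium test stated in Theorem 6.1.1.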
6.1.5 Existence of Nash equilibria

The dynamic programming formalism introduced above leads to the following powerful
results (see [Whitt, 1980] for a recent proof).
Theorem 6.1.1 Consider the sequential game defined above. Then:

1. The expected payoff vector associated with a stationary Markov policy γ is given
by the unique fixed point v_γ(·) of the contracting operator H_γ.

2. The operator F_γ is also contracting and thus admits a unique fixed point f_γ(·).

3. The stationary Markov policy γ* is an equilibrium strategy iff
\[
f_{\gamma^*}(s) = v_{\gamma^*}(s), \qquad \forall s \in S. \tag{6.4}
\]

4. There exists an equilibrium defined by a stationary Markov policy.
Remark 6.1.2 In the nonzero-sum case the existence theorem is not based on a
contraction property of the "equilibrium" dynamic programming operator. As a
consequence, the existence result does not translate easily into a converging algorithm
for the numerical solution of these games (see [?]).
6.2 Sequential Games on Borel Spaces
The theory of noncooperative Markov games has been extended by several authors to
the case where the states and actions lie in continuous sets. Since we are dealing
with stochastic processes, the apparatus of measure theory is now essential.
6.2.1 Description of the game
An m-player sequential game is defined by the following objects:
\[
(S, \Sigma),\; U_j,\; \Gamma_j(\cdot),\; r_j(\cdot, \cdot),\; Q(\cdot \mid \cdot),\; \beta,
\]
where:

1. (S, Σ) is a measurable state space with a countably generated σ-algebra Σ of
subsets of S.

2. U_j is a compact metric space of actions for player j.

3. Γ_j(·) is a lower measurable map from S into nonempty compact subsets of U_j.
For each s, Γ_j(s) represents the set of admissible actions for player j.

4. r_j(·, ·) : S × U → IR is a bounded measurable transition reward function for
player j. These functions are assumed to be continuous on U, for every s ∈ S.

5. Q(· | ·) is a product measurable transition probability from S × U to S. It is
assumed that Q(· | ·) satisfies some regularity conditions which are too technical
to be given here. We refer the reader to [Nowak, 199?] for a more precise
statement.

6. β ∈ (0, 1) is the discount factor.

A stationary Markov strategy for player j is a measurable map γ_j(·) from S into the
set P(U_j) of probability measures on U_j such that γ_j(s) ∈ P(Γ_j(s)) for every s ∈ S.
6.2.2 Dynamic programming formalism
The definition of local reward functions given in (6.1) in the discrete state case has to
be adapted to the continuous state format; it becomes
\[
h_j(s, v_j(\cdot), u) = r_j(s, u) + \beta \int_S v_j(t)\, Q(dt \mid s, u), \qquad j \in M. \tag{6.5}
\]
The operators H_γ and F_γ are defined as above. The existence of equilibria is difficult
to establish for this general class of sequential games. In [Whitt, 1980] the existence of
ε-equilibria is proved, using an approximation theory in dynamic programming. The
existence of equilibria has been obtained only for special cases.¹
6.3 Application to a Stochastic Duopoly Model
6.3.1 A stochastic repeated duopoly
Consider the stochastic duopoly model defined by the following linear demand equation
\[
x(t+1) = \alpha - \rho\,[u_1(t) + u_2(t)] + \varepsilon(t),
\]
which determines the price x(t+1) of a good at period t+1, given the total supply
u_1(t) + u_2(t) decided at the end of period t by Players 1 and 2. Assume a unit
production cost equal to γ. The profits at the end of period t of the two players
(firms) are then determined as
\[
\pi_j(t) = (x(t+1) - \delta)\, u_j(t).
\]
Assume that the two firms have the same discount rate β; then, over an infinite time
horizon, the payoff to Player j will be given by
\[
V_j = \sum_{t=0}^{\infty} \beta^t\, \pi_j(t).
\]
This game is repeated; therefore an obvious equilibrium solution consists in playing
repeatedly the (static) Cournot solution
\[
u_j^c(t) = \frac{\alpha - \delta}{3\rho}, \qquad j = 1, 2, \tag{6.6}
\]
which generates the payoffs
\[
V_j^c = \frac{(\alpha - \delta)^2}{9\rho(1 - \beta)}, \qquad j = 1, 2. \tag{6.7}
\]
A symmetric Pareto (nondominated) solution is given by the repeated actions
\[
u_j^P(t) = \frac{\alpha - \delta}{4\rho}, \qquad j = 1, 2,
\]
¹See [Nowak, 1985], [Nowak, 1987], [Nowak, 1993], [Parthasarathy & Sinha, 1989].
and the associated payoffs
\[
V_j^P = \frac{(\alpha - \delta)^2}{8\rho(1 - \beta)}, \qquad j = 1, 2,
\]
where δ = γ is the unit production cost.
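The formulas above are easy to check numerically. The sketch below uses illustrative parameter values (an assumption; any α > δ works) and verifies that the Cournot action is a best reply to itself, that the Pareto payoff dominates the Cournot payoff, and that the Pareto actions nevertheless invite deviation in the one-shot game:

```python
alpha, delta, rho, beta = 10.0, 2.0, 1.0, 0.9         # illustrative values

u_c = (alpha - delta) / (3 * rho)                     # Cournot action (6.6)
V_c = (alpha - delta) ** 2 / (9 * rho * (1 - beta))   # Cournot payoff (6.7)
u_p = (alpha - delta) / (4 * rho)                     # symmetric Pareto action
V_p = (alpha - delta) ** 2 / (8 * rho * (1 - beta))   # Pareto payoff

# expected per-period profit of firm 1 and its best reply to a given u2
profit = lambda u1, u2: (alpha - rho * (u1 + u2) - delta) * u1
best_reply = lambda u2: (alpha - delta - rho * u2) / (2 * rho)

print(abs(best_reply(u_c) - u_c))                       # ~0: Cournot is a fixed point
print(V_p > V_c)                                        # True: Pareto dominates
print(profit(best_reply(u_p), u_p) > profit(u_p, u_p))  # True: temptation to cheat
```

The last line is the crux of what follows: the Pareto actions dominate but are not self-enforcing, which motivates the trigger-strategy construction.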
The Pareto outcome dominates the Cournot equilibrium, but it does not represent
an equilibrium. The question is the following:

is it possible to construct a pair of memory strategies which would define
an equilibrium with an outcome dominating the repeated Cournot strategy
outcome and which would be as close as possible to the Pareto nondominated
solution?
6.3.2 A class of trigger strategies based on a monitoring device
The random perturbations affecting the price mechanism do not permit a direct
extension of the approach described in the deterministic context. Since it is assumed
that the actions of players are not directly observable, one needs to filter the sequence
of observed states in order to monitor possible breaches of agreement.

In [Green & Porter, 1985] a dominating memory strategy equilibrium is constructed,
based on a one-step memory scheme. We propose below another scheme, using a
multi-step memory, that yields an outcome lying closer to the Pareto solution.
The basic idea consists in extending the state space by introducing a new state variable,
denoted v, which is used to monitor a cooperative policy that all players have agreed
to play and which is defined as φ : v → u_j = φ(v). The state equation governing the
evolution of this state variable is designed as follows:
\[
v(t+1) = \max\{-K,\; v(t) + x^e(t+1) - x(t+1)\}, \tag{6.8}
\]
where x^e is the expected outcome if both players use the cooperative policy, i.e.
\[
x^e(t+1) = \alpha - 2\rho\,\phi(v(t)).
\]
It should be clear that the new state variable v provides a cumulative measure of the
positive discrepancies between the expected prices x^e and the realized ones x(t + 1).
The parameter −K defines a lower bound for v, introduced to prevent the compensation
of positive discrepancies by negative ones. A positive discrepancy could be an
indication of oversupply, i.e. an indication that at least one player is not respecting the
agreement and may be trying to gain an advantage over the other player.
If these discrepancies accumulate too fast, the evidence of cheating is mounting
and thus some retaliation should be expected. To model the retaliation process we
introduce another state variable, denoted y, which is a binary variable, i.e. y ∈ {0, 1}.
This new state variable is an indicator of the prevailing mood of play. If y = 1
the game is played cooperatively; if y = 0 the game is played in a noncooperative
manner, interpreted as a punitive or retaliatory mood of play.

This state variable is assumed to evolve according to the following state equation:
\[
y(t+1) = \begin{cases} 1 & \text{if } y(t) = 1 \text{ and } v(t+1) < \theta(v(t)) \\ 0 & \text{otherwise,} \end{cases} \tag{6.9}
\]
where the positive-valued function θ : v → θ(v) is a design parameter of this
monitoring scheme.

According to this state equation, the cooperative mood of play will be maintained
provided the cumulative positive discrepancies do not increase too fast from one period
to the next. The state equation also tells us that once y(t) = 0, then y(t') ≡ 0 for all
periods t' > t, i.e. a punitive mood of play lasts forever. In the models discussed later
on we shall relax this assumption of everlasting punishment.
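The monitoring dynamics (6.8)-(6.9) are easy to simulate. The sketch below uses illustrative choices of φ, θ, K and the noise law (all assumptions, not from the text); it exhibits the two defining features of the scheme: v never falls below −K, and the mood sequence y(t) can only switch from cooperation to everlasting punishment.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, rho, K = 10.0, 1.0, 5.0
phi = lambda v: 2.0                    # agreed cooperative supply (constant here)
theta = lambda v: v + 1.5              # tolerated one-step growth of v
cournot_u = 8.0 / 3.0                  # punishment action (static Cournot)

v, y, moods, stats = 0.0, 1, [], []
for t in range(200):
    u = phi(v) if y == 1 else cournot_u             # symmetric play by both firms
    x = alpha - rho * 2 * u + rng.normal()          # realised price
    x_e = alpha - 2 * rho * phi(v)                  # expected price under cooperation
    v_new = max(-K, v + x_e - x)                    # monitoring statistic (6.8)
    y = 1 if (y == 1 and v_new < theta(v)) else 0   # mood of play (6.9)
    v = v_new
    moods.append(y)
    stats.append(v)
```

With these choices punishment is typically triggered eventually even though no one cheats (a run of positive price discrepancies suffices), which is why the text later relaxes the everlasting-punishment assumption.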
When the mood of play is noncooperative, i.e. when y = 0, both players use as
a punishment (or retaliation) the static Cournot solution forever. This generates the
expected payoffs V^c_j, j = 1, 2, defined in Eq. (6.7). Since the two players are identical
we shall not use the subscript j anymore.
When the mood of play is cooperative, i.e. when y = 1, both players use an
agreed-upon policy which determines their respective controls as a function of the
state variable v. This agreement policy is defined by the function φ(v). The expected
payoff is then a function W(v) of the state variable v.

For this agreement to be stable, i.e. to give no player a temptation to cheat, one
imposes that it be an equilibrium. Note that the game is now a sequential Markov
game with a continuous state space. The dynamic programming equation
characterizing an equilibrium is given below:
\[
\begin{aligned}
W(v) = \max_u \big\{ & [\alpha - \delta - \rho(\phi(v) + u)]\,u \\
& + \beta\, P[v' \ge \theta(v)]\, V^c \\
& + \beta\, P[v' < \theta(v)]\, E[W(v') \mid v' < \theta(v)] \big\}, \tag{6.10}
\end{aligned}
\]
where we have denoted by
\[
v' = \max\{-K,\; v + \rho(u - \phi(v)) - \varepsilon\}
\]
the random value of the state variable v after the transition generated by the controls
(u, φ(v)).
In Eq. (6.10) we recognize the immediate reward [α − δ − ρ(φ(v) + u)]u of Player 1
when he plays u while the opponent sticks to φ(v). To it are added the conditional
expected payoffs after the transition to either the punishment mood of play,
corresponding to y = 0, or the cooperative mood of play, corresponding to y = 1.

A solution of these DP equations can be found by solving an associated fixed-point
problem, as indicated in [Haurie & Tolwinski, 1990]. To summarize the approach we
introduce the operator
\[
\begin{aligned}
(T_\phi W)(v, u) = {} & [\alpha - \delta - \rho(u + \phi(v))]\, u + \beta\, \frac{(\alpha - \delta)^2}{9\rho(1 - \beta)}\, F(s - \theta(v)) \\
& + \beta\, W(-K)\,[1 - F(s + K)] \\
& + \beta \int_{-K}^{\theta(v)} W(\tau)\, f(s - \tau)\, d\tau, \tag{6.11}
\end{aligned}
\]
where F(·) and f(·) are the cumulative distribution function and the probability density
function, respectively, of the random disturbance ε. We have also used the following
notation:
\[
s = v + \rho(u - \phi(v)).
\]
An equilibrium solution is a pair of functions (W(·), φ(·)) such that
\[
W(v) = \max_u\, (T_\phi W)(v, u) \tag{6.12}
\]
\[
W(v) = (T_\phi W)(v, \phi(v)). \tag{6.13}
\]
In [Haurie & Tolwinski, 1990] it is shown how an adaptation of the Howard policy
improvement algorithm [Howard, 1960] permits the computation of the solution of
this sort of fixed-point problem. The case treated in [Haurie & Tolwinski, 1990]
corresponds to the use of a quadratic function θ(·) and a Gaussian distribution law for
ε. The numerical experiments reported in [Haurie & Tolwinski, 1990] show that one
can define, using this approach, a subgame perfect equilibrium which dominates the
repeated Cournot solution.
In [Porter, 1983] this problem has been studied in the case where the (inverse) demand
law is subject to a multiplicative noise. One then obtains an existence proof for a
dominating equilibrium based on a simple one-step memory scheme where the variable
v satisfies the following equation:
\[
v(t+1) = \frac{x^e - x(t+1)}{x(t)}.
\]
This is the case where one does not monitor the cooperative policy through the use of a
cumulated discrepancy function but rather on the basis of repeated identical tests. Also,
in Porter's approach the punishment period is finite.
In [Haurie & Tolwinski, 1990] it is shown that the approach can be extended to
a full-fledged Markov game, i.e. a sequential game rather than a repeated game. A
simple model of fisheries management was used in that work to illustrate this type of
sequential game cooperative equilibrium.
6.3.3 Interpretation as a communication device
In our approach, by extending the state space description (i.e. introducing the new
variables v and y), we retained a Markov game formalism for an extended game, and this
has permitted us to use dynamic programming for the characterization of subgame
perfect equilibria. This is of course reminiscent of the concept of communication device
considered in [Forges, 1986] for repeated games and discussed in Part 1. An easy
extension of the approach described above would lead to random transitions between the
two moods of play, with transition probabilities depending on the monitoring statistic
v. A punishment of random duration is also possible in this model. In the next section
we illustrate these features when we propose a differential game model with random
moods of play.

The monitoring scheme is a communication device which receives as input the
observation of the state of the system and sends as output a public signal suggesting
play according to one of two different moods of play.
Part III
Differential games
Chapter 7
Controlled dynamical systems
7.1 A capital accumulation process
A firm is producing a good with some equipment that we call its physical capital, or
more simply its capital. At time t we denote by x(t) ∈ IR_+ the amount of capital
accumulated by the firm. In the parlance of dynamical systems, x(t) is the state
variable. A capital accumulation path over a time interval [t_0, t_f] is therefore defined
as the graph of the function x(·) : [t_0, t_f] → IR_+, t → x(t). In the parlance of
dynamical systems this will also be called a state trajectory.
The firm accumulates capital through the investment process. Denote by u(t) ∈ IR_+
the investment realised at time t. This is the control variable of the dynamical
system. We also assume that the capital depreciates (wears out) at a fixed rate µ.

Let us denote by ẋ(t) the derivative over time dx(t)/dt. The evolution over time of the
accumulated capital is then described by a differential equation with initial data x_0,
called the state equation of the dynamical system:
\[
\dot x(t) = u(t) - \mu x(t) \tag{7.1}
\]
\[
x(t_0) = x_0. \tag{7.2}
\]
Now, once the firm has chosen an investment schedule u(·) : [t_0, t_f] → IR_+, and
the initial capital stock x_0 ∈ IR_+ being given, there exists a unique capital accumulation
path, determined as the unique solution of the differential equation (7.1) with
initial data (7.2). It is easy to check that, since the investment rates are never negative,
the capital stock will remain nonnegative (∈ IR_+).
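The state equation (7.1)-(7.2) is straightforward to integrate numerically. A forward-Euler sketch with illustrative parameter values (assumed, not from the text): with a constant investment rate u the stock converges to the steady state u/µ, and the numerical path can be checked against the closed-form solution x(t) = u/µ + (x_0 − u/µ)e^{−µt}.

```python
import math

mu, x0, u = 0.1, 1.0, 0.5            # depreciation rate, initial stock, investment
dt, T = 0.01, 100.0
steps = int(T / dt)

x, path = x0, []
for k in range(steps):
    x += dt * (u - mu * x)           # Euler step for x' = u - mu * x
    path.append(x)

x_exact = u / mu + (x0 - u / mu) * math.exp(-mu * T)
print(abs(path[-1] - x_exact))       # small: Euler tracks the closed form
print(abs(path[-1] - u / mu))        # small: the stock approaches u/mu
```

The same integration loop, with u(t) read from a schedule instead of held constant, generates the state trajectory of any admissible control.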
7.2 State equations for controlled dynamical systems
In a more general setting, a controlled dynamical system is described by the state
equation
\[
\dot x(t) = f(x(t), u(t), t) \tag{7.3}
\]
\[
x(t_0) = x_0, \tag{7.4}
\]
\[
u(t) \in U \subset \mathbb{R}^m, \tag{7.5}
\]
where x ∈ IR^n is the state variable, u ∈ IR^m is the control variable constrained to
remain in the set U ⊂ IR^m, f(·, ·, ·) : IR^n × IR^m × IR → IR^n is the velocity function, and
x_0 ∈ IR^n is the initial state. Under some regularity conditions that we shall describe
shortly, if the controller chooses a control function u(·) : [t_0, t_f] → U, there results
a unique solution x(·) : [t_0, t_f] → IR^n of the differential equation (7.3), with
initial condition (7.4). This solution x(·) is called the state trajectory generated by the
control u(·) from the initial state x_0.
Indeed one must impose some regularity conditions to ensure that the fundamental
theorem of existence and uniqueness of the solution to a system of differential
equations can be invoked. We give below the simplest set of conditions. More
general ones will be needed later on, but for the moment we shall content
ourselves with the following.

7.2.1 Regularity conditions

1. The velocity function f(x, u, t) is C¹ in x and continuous in u and t.

2. The control function u(·) is piecewise continuous.
7.2.2 The case of stationary systems
If the velocity function does not depend explicitly on t we say that the system is sta
tionary. The capital accumulation process of section 7.1 is an example of a stationary
system.
7.2.3 The case of linear systems
A linear stationary system is described by a linear state equation
\[
\dot x(t) = A x(t) + B u(t) \tag{7.6}
\]
\[
x(t_0) = x_0, \tag{7.7}
\]
where A (n × n) and B (n × m) are given matrices.

For linear systems we can give the explicit form of the solution to Eqs. (7.6, 7.7):
\[
x(t) = \Phi(t, t_0)\, x_0 + \Phi(t, t_0) \int_{t_0}^{t} \Phi(s, t_0)^{-1} B\, u(s)\, ds, \tag{7.8}
\]
where Φ(t, t_0) is the n × n transfer matrix that verifies
\[
\frac{d}{dt}\Phi(t, t_0) = A\,\Phi(t, t_0) \tag{7.9}
\]
\[
\Phi(t_0, t_0) = I_n. \tag{7.10}
\]
Indeed, we can easily check in Eq. (7.8) that x(t_0) = x_0. Furthermore, if one
differentiates Eq. (7.8) over time one obtains
\[
\dot x(t) = A\,\Phi(t, t_0)\, x_0 + A\,\Phi(t, t_0) \int_{t_0}^{t} \Phi(s, t_0)^{-1} B\, u(s)\, ds + \Phi(t, t_0)\,\Phi(t, t_0)^{-1} B\, u(t) \tag{7.11}
\]
\[
\dot x(t) = A x(t) + B u(t). \tag{7.12}
\]
Therefore the function defined by (7.8) satisfies the differential equation (7.6) with
initial condition (7.7).
7.3 Feedback control and the stability issue
An interesting way to implement a control on a dynamical system described by the
general state equation (7.3, 7.4) is to define the control variable u(t) as a function
σ(t, x(t)) of the current time t and state x(t). We then say that the control is defined by
the feedback law σ(·, ·) : IR × IR^n → IR^m, or more directly that one uses the feedback
control σ. The associated trajectory will then be the solution to the differential equation
\[
\dot x(t) = F^\sigma(t, x(t)) \tag{7.13}
\]
\[
x(t_0) = x_0, \tag{7.14}
\]
where we use the notation
\[
F^\sigma(t, x(t)) = f(x(t), \sigma(t, x(t)), t).
\]
We see that, if F^σ(·, ·) is regular enough, the feedback control σ will uniquely determine
an associated trajectory emanating from x_0.
7.3.1 Feedback control of stationary linear systems
Consider the linear system defined by the state equation (7.6, 7.7) and suppose that one
uses a linear feedback law
\[
u(t) = K(t)\, x(t), \tag{7.15}
\]
where the matrix K(t) (m × n) is called the feedback gain at time t. The controlled
system is now described by the differential equation
\[
\dot x(t) = (A + B K(t))\, x(t) \tag{7.16}
\]
\[
x(t_0) = x_0. \tag{7.17}
\]
7.3.2 Stabilizing a linear system with a feedback control
When a stationary system is observed over an infinite time horizon it may be natural
to use a constant feedback gain K. Stability is an important qualitative property of
dynamical systems observed over an infinite time horizon, that is, when t_f → ∞.

Definition 7.3.1 The dynamical system (7.16, 7.17) is

asymptotically stable if
\[
\lim_{t \to \infty} x(t) = 0; \tag{7.18}
\]

globally asymptotically stable if (7.18) holds true for any initial condition x_0.
7.4 Optimal control problems
7.5 A model of optimal capital accumulation
Consider the system described in section 7.1, concerning a firm which accumulates a
productive capital. The state equation describing this controlled accumulation process
is (7.1, 7.2):
\[
\dot x(t) = u(t) - \mu x(t)
\]
\[
x(t_0) = x_0,
\]
where x(t) and u(t) are the capital stock and the investment rate at time t, respectively.
Assume that the output y(t) produced by the firm at time t is given by a production
function y(t) = φ(x(t)). Assume now that this output, considered as a homogeneous
good, can be either consumed, that is, distributed as profit to the owner of the firm at
a rate c(t), or invested at a rate u(t). Thus one must have at any instant t
\[
c(t) + u(t) = y(t) = \phi(x(t)). \tag{7.19}
\]
Notice that we permit negative values for c(t); this would correspond to borrowing
in order to invest more. The owner wants to maximise the total profit accumulated
over the time horizon [t_0, t_f], which is given by the integral payoff
\[
J(x_0; x(\cdot), u(\cdot)) = \int_{t_0}^{t_f} c(t)\, dt = \int_{t_0}^{t_f} \big[\phi(x(t)) - u(t)\big]\, dt. \tag{7.20}
\]
We have used the notation J(x_0; x(·), u(·)) to emphasize the fact that the payoff
depends on the initial capital stock and on the investment schedule, which determines
the capital accumulation path, that is, the trajectory x(·). The search for the optimal
investment schedule can now be formulated as follows:
\[
\max_{x(\cdot), u(\cdot)} \int_{t_0}^{t_f} \big[\phi(x(t)) - u(t)\big]\, dt \tag{7.21}
\]
subject to
\[
\dot x(t) = u(t) - \mu x(t) \tag{7.22}
\]
\[
x(t_0) = x_0 \tag{7.23}
\]
\[
u(t) \ge 0 \tag{7.24}
\]
\[
t \in [t_0, t_f].
\]
Notice that in this formulation the optimization is performed over the set of admissible
control functions and state trajectories that satisfy the constraints given by the state
equation, the initial state, and the nonnegativity of the investment rates at any
time t ∈ [t_0, t_f]. The optimisation problem (7.21)-(7.24) is an instance of the
generic optimal control problem that we describe now.
7.6 The optimal control paradigm
We consider the dynamical system introduced in section 7.2 with a criterion function
\[
J(t_0, x_0; x(\cdot), u(\cdot)) = \int_{t_0}^{t_f} L(x(t), u(t), t)\, dt \tag{7.25}
\]
that is to be maximised through the appropriate choice of an admissible control
function. This problem can be formulated as follows:
\[
\max_{x(\cdot), u(\cdot)} \int_{t_0}^{t_f} L(x(t), u(t), t)\, dt \tag{7.26}
\]
subject to
\[
\dot x(t) = f(x(t), u(t), t) \tag{7.27}
\]
\[
x(t_0) = x_0 \tag{7.28}
\]
\[
u(t) \in U \subset \mathbb{R}^m \tag{7.29}
\]
\[
t \in [t_0, t_f].
\]
7.7 The Euler equations and the Maximum principle
We assume here that U = IR^m and that the functions f(x, u, t) and L(x, u, t) are both
C¹ in x and u. Under these more stringent conditions we can derive necessary
conditions for an optimal control by using some simple arguments from the calculus of
variations. These conditions are also called Euler equations, as they are equivalent to
the celebrated conditions of Euler for the classical variational problems.

Definition 7.7.1 The functions δx(·) and δu(·) are called variations locally compatible
with the constraints if they satisfy
\[
\delta\dot x(t) = f_x(x(t), u(t), t)\, \delta x(t) + f_u(x(t), u(t), t)\, \delta u(t) \tag{7.30}
\]
\[
\delta x(t_0) = 0. \tag{7.31}
\]
Lemma 7.7.1 If u*(·) and x*(·) are the optimal control and the associated optimal
trajectory, then the differential at (x*(·), u*(·)) of the performance criterion must be
equal to 0 for all variations that are locally compatible with the constraints.

The differential at (x*(·), u*(·)) of the integral performance criterion is given by
\[
\delta J(x_0; x^*(\cdot), u^*(\cdot)) = \int_{t_0}^{t_f} \big[L_x(x^*(t), u^*(t), t)\, \delta x(t) + L_u(x^*(t), u^*(t), t)\, \delta u(t)\big]\, dt. \tag{7.32}
\]
In order to express that we only consider variations δx(·) and δu(·) that are compatible
with the constraints, we say that δu(·) is chosen arbitrarily and δx(·) is computed
according to the "variational equations" (7.30, 7.31).
Now we use a "trick". We introduce an auxiliary function λ(·) : [t_0, t_f] → IR^n,
where λ(t) is of the same dimension as the state variable x(t) and will be called the
costate variable. If Eq. (7.30) is verified we may rewrite the expression (7.32) of
δJ as follows:
\[
\delta J = \int_{t_0}^{t_f} \big[L_x(x^*(t), u^*(t), t)\, \delta x(t) + L_u(x^*(t), u^*(t), t)\, \delta u(t) + \lambda(t)'\{-\delta\dot x(t) + f_x(x^*(t), u^*(t), t)\, \delta x(t) + f_u(x^*(t), u^*(t), t)\, \delta u(t)\}\big]\, dt. \tag{7.33}
\]
We integrate by parts the term in λ(t)′δẋ(t) and obtain
\[
\int_{t_0}^{t_f} \lambda(t)'\, \delta\dot x(t)\, dt = \lambda(t_f)'\, \delta x(t_f) - \lambda(t_0)'\, \delta x(t_0) - \int_{t_0}^{t_f} \dot\lambda(t)'\, \delta x(t)\, dt.
\]
We can thus rewrite Eq. (7.33) as follows (we no longer write the arguments
(x*(t), u*(t), t)):
\[
\delta J = -\lambda(t_f)'\, \delta x(t_f) + \lambda(t_0)'\, \delta x(t_0) + \int_{t_0}^{t_f} \big[\{L_x + \lambda(t)' f_x + \dot\lambda(t)'\}\, \delta x(t) + \{L_u + \lambda(t)' f_u\}\, \delta u(t)\big]\, dt. \tag{7.34}
\]
Let us introduce the Hamiltonian function
\[
H(\lambda, x, u, t) = L(x, u, t) + \lambda' f(x, u, t). \tag{7.35}
\]
Then we can rewrite Eq. (7.34) as follows:
\[
\delta J = -\lambda(t_f)'\, \delta x(t_f) + \lambda(t_0)'\, \delta x(t_0) + \int_{t_0}^{t_f} \big[\{H_x(\lambda(t), x^*(t), u^*(t), t) + \dot\lambda(t)'\}\, \delta x(t) + H_u(\lambda(t), x^*(t), u^*(t), t)\, \delta u(t)\big]\, dt. \tag{7.36}
\]
As we can choose the function λ(·) as we like, we may impose that it satisfy the
following adjoint variational equations:
\[
\dot\lambda(t)' = -H_x(\lambda(t), x^*(t), u^*(t), t) \tag{7.37}
\]
\[
\lambda(t_f) = 0. \tag{7.38}
\]
With this particular choice of λ(·) the differential δJ becomes simply
\[
\delta J = \lambda(t_0)'\, \delta x(t_0) + \int_{t_0}^{t_f} H_u(\lambda(t), x^*(t), u^*(t), t)\, \delta u(t)\, dt. \tag{7.39}
\]
Since δx(t_0) = 0 and δu(t) is arbitrary, δJ = 0 for all variations locally compatible
with the constraints only if
\[
H_u(\lambda(t), x^*(t), u^*(t), t) \equiv 0. \tag{7.40}
\]
We can summarise Eqs. (7.37)-(7.40) in the following theorem.
Theorem 7.7.1 If u*(·) and x*(·) are the optimal control and the associated optimal
trajectory, then there exists a function λ(·) : [t_0, t_f] → IR^n which satisfies the adjoint
variational equations
\[
\dot\lambda(t)' = -H_x(\lambda(t), x^*(t), u^*(t), t), \qquad t \in [t_0, t_f] \tag{7.41}
\]
\[
\lambda(t_f) = 0, \tag{7.42}
\]
and such that
\[
H_u(\lambda(t), x^*(t), u^*(t), t) \equiv 0, \qquad t \in [t_0, t_f], \tag{7.43}
\]
where the Hamiltonian function H is defined by
\[
H(\lambda, x, u, t) = L(x, u, t) + \lambda' f(x, u, t), \qquad t \in [t_0, t_f]. \tag{7.44}
\]
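Theorem 7.7.1 can be exercised on a small hypothetical problem (an illustration, not from the text): maximise −½∫_0^T (x² + u²) dt subject to ẋ = u, x(0) = x_0. Here H = −(x² + u²)/2 + λu, so (7.43) gives u = λ, and (7.41)-(7.42) give λ̇ = x with λ(T) = 0. The unknown initial costate λ(0), whose exact value for this problem is −x_0 tanh(T), can be found by shooting:

```python
import math

T, x0, dt = 1.0, 1.0, 1e-4
n = int(T / dt)

def lam_T(lam0):
    """Integrate x' = lam, lam' = x forward in time; return lam(T)."""
    x, lam = x0, lam0
    for _ in range(n):
        x, lam = x + dt * lam, lam + dt * x
    return lam

lo, hi = -10.0, 10.0
for _ in range(60):                  # bisection: lam_T is increasing in lam0
    mid = 0.5 * (lo + hi)
    if lam_T(mid) > 0.0:
        hi = mid
    else:
        lo = mid

lam0 = 0.5 * (lo + hi)
print(lam0, -x0 * math.tanh(T))      # the two values agree closely
```

The optimal control at time 0 is then u*(0) = λ(0); integrating forward with u = λ produces the whole optimal trajectory.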
These necessary optimality conditions, called the Euler equations, have been
generalized by Pontryagin [?] to dynamical systems where the velocity f(x, u, t) and
the payoff rate L(x, u, t) are C¹ in x, continuous in u and t, and where the control
constraint set U is compact. The following theorem is the famous "Maximum
principle" of the theory of optimal control.
Theorem 7.7.2 If u*(·) and x*(·) are the optimal control and the associated optimal trajectory, then there exists a function λ(·) : [t_0, t_f] → IR^n which satisfies the adjoint variational equations

λ̇(t) = −H_x(λ(t), x*(t), u*(t), t),   t ∈ [t_0, t_f]    (7.45)
λ(t_f) = 0,    (7.46)

and such that

H(λ(t), x*(t), u*(t), t) = max_{u∈U} H(λ(t), x*(t), u, t),   t ∈ [t_0, t_f],    (7.47)

where the hamiltonian function H is defined as in Eq. (7.44).
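As a concrete illustration, the conditions (7.45)–(7.47) can be checked numerically on a small example of our own (it is not taken from the text): maximise ∫₀¹ (x − u²/2) dt subject to ẋ = u, x(0) = 0, with unconstrained control. A minimal Python sketch:

```python
# Toy problem (our own example): maximize ∫_0^1 (x - u²/2) dt,  ẋ = u, x(0) = 0.
# Hamiltonian: H = x - u²/2 + λu.
# Adjoint equations (7.45)-(7.46): λ' = -H_x = -1, λ(1) = 0  =>  λ(t) = 1 - t.
# Maximum condition (7.47):        H_u = -u + λ = 0           =>  u*(t) = λ(t).

N = 1000
dt = 1.0 / N

# Integrate the adjoint equation backward from the terminal condition λ(t_f) = 0.
lam = [0.0] * (N + 1)
for k in range(N, 0, -1):
    lam[k - 1] = lam[k] + dt * 1.0      # stepping λ' = -1 backward in time

u = lam                                  # the maximizing control u*(t) = λ(t)

# Forward Euler for the state under the candidate optimal control.
x = [0.0] * (N + 1)
for k in range(N):
    x[k + 1] = x[k] + dt * u[k]

# Compare with the closed-form solution λ(t) = 1 - t, x(t) = t - t²/2 at t = 0.5.
assert abs(lam[N // 2] - 0.5) < 1e-9
assert abs(x[N // 2] - 0.375) < 1e-3
```

The backward integration of the costate followed by pointwise maximisation of H is exactly the structure the theorem prescribes; here the problem is simple enough that both steps can be checked against the analytic solution.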
7.8 An economic interpretation of the Maximum Principle

It can be shown that the costate variable λ(t) indicates the sensitivity of the performance criterion to a marginal variation of the state x(t). It can therefore be interpreted
as a marginal value of the state variable. Along the optimal trajectory, the Hamiltonian is thus composed of two "values": the current payoff rate L(x*(t), u*(t), t) and the value of the current state modification λ(t)′f(x*(t), u*(t), t). It is this global value that is maximised w.r.t. u at any point of the optimal trajectory.
The adjoint variational equations express the evolution of the marginal value of the
state variables along the trajectory.
7.9 Synthesis of the optimal control
7.10 Dynamic programming and the optimal feedback
control
Theorem 7.10.1 Assume that there exist a function V*(·, ·) : IR × IR^n → IR and a feedback law σ*(t, x) that satisfy the following functional equations

−∂/∂t V*(t, x) = max_{u∈U} H(∂/∂x V*(t, x), x, u, t),   ∀t, ∀x    (7.48)
              = H(∂/∂x V*(t, x), x, σ*(t, x), t),   ∀t, ∀x    (7.49)
V*(t_f, x) = 0.    (7.50)

Then V*(t_i, x_i) is the optimal value of the performance criterion for the optimal control problem defined with initial data x(t_i) = x_i, t_i ∈ [t_0, t_f]. Furthermore the solution of the maximisation in the R.H.S. of Eq. (7.48) defines the optimal feedback control.
Proof:
Remark 7.10.1 Notice here that the partial derivative ∂/∂x V*(t, x) appearing in the Hamiltonian in the R.H.S. of Eq. (7.48) is consistent with the interpretation of the costate variable λ(t) as the sensitivity of the optimal payoff to marginal state variations.
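To make the functional equations (7.48)–(7.50) concrete, here is a minimal sketch on a scalar linear-quadratic problem of our own choosing (not from the text). The quadratic guess V*(t, x) = s(t)x²/2 reduces the HJB equation to a scalar Riccati differential equation that can be integrated backward from the terminal condition (7.50):

```python
import math

# Toy problem (our own example): maximize ∫_t^T -(x² + u²)/2 ds with ẋ = u.
# With the guess V*(t,x) = s(t)·x²/2, the maximizer in (7.48) is
# u* = ∂V*/∂x = s(t)·x, and (7.48) becomes
#     -s'(t)·x²/2 = -x²/2 + s(t)²·x²/2,   i.e.
#     s'(t) = 1 - s(t)²,  s(T) = 0,   whose solution is s(t) = -tanh(T - t).

T, N = 2.0, 20000
dt = T / N

s = 0.0                      # terminal condition s(T) = 0, from (7.50)
for _ in range(N):           # integrate the Riccati equation backward in time
    s -= dt * (1.0 - s * s)

assert abs(s - (-math.tanh(T))) < 1e-3     # s(0) ≈ -tanh(T)

# The solution of the maximization in the R.H.S. of (7.48) is the feedback law:
def sigma(t, x):
    return -math.tanh(T - t) * x
```

The feedback gain −tanh(T − t) is stabilizing and vanishes at the horizon, as one expects from the terminal condition V*(t_f, x) = 0.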
7.11 Competitive dynamical systems
7.12 Competition through capital accumulation
7.13 Open-loop differential games
J_j(t_0, x_0; x(·), u_1(·), . . . , u_m(·)) = ∫_{t_0}^{t_f} L_j(x(t), u_1(t), . . . , u_m(t), t) dt    (7.51)

s.t.

ẋ(t) = f(x(t), u_1(t), . . . , u_m(t), t)    (7.52)
x(t_0) = x_0    (7.53)
u_j(t) ∈ U_j ⊂ IR^{m_j}    (7.54)
t ∈ [t_0, t_f].
7.13.1 Open-loop information structure

Each player knows the initial data t_0, x_0 and all the other elements of the dynamical system; Player j selects a control function u_j(·) : [t_0, t_f] → U_j; the game is played as a single simultaneous move.
7.13.2 An equilibrium principle
Theorem 7.13.1 If u*(·) and x*(·) are the Nash equilibrium controls and the associated optimal trajectory, then there exist m functions λ_j(·) : [t_0, t_f] → IR^n which satisfy the adjoint variational equations

λ̇_j(t) = −H^j_x(λ_j(t), x*(t), u*, t),   t ∈ [t_0, t_f]    (7.55)
λ_j(t_f) = 0,    (7.56)

and such that

H^j(λ_j(t), x*(t), u*, t) = max_{u_j∈U_j} H^j(λ_j(t), x*(t), [u*^{−j}, u_j], t),   t ∈ [t_0, t_f],    (7.57)

where the hamiltonian function H^j is defined as follows

H^j(λ_j, x, u, t) = L_j(x, u, t) + λ_j′ f(x, u, t).    (7.58)
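As an illustration of this equilibrium principle, the sketch below computes a symmetric open-loop Nash equilibrium for a toy two-player game of our own (not from the text) by shooting on the coupled state–costate system derived from (7.55)–(7.57):

```python
# Toy game (our own example): ẋ = u1 + u2, x(0) = 1, with payoffs
# J_j = ∫_0^T -(x² + u_j²)/2 dt. Then H^j = -(x² + u_j²)/2 + λ_j(u1 + u2),
# so (7.57) gives u_j* = λ_j, and (7.55)-(7.56) give λ_j' = x, λ_j(T) = 0.
# By symmetry λ_1 = λ_2 = λ, and the equilibrium solves the two-point
# boundary-value problem  ẋ = 2λ,  λ' = x,  x(0) = 1,  λ(T) = 0.

T, N = 1.0, 4000
dt = T / N

def terminal_costate(lam0, x0=1.0):
    """Integrate forward from a guessed λ(0) and return λ(T)."""
    x, lam = x0, lam0
    for _ in range(N):
        x, lam = x + dt * 2 * lam, lam + dt * x     # explicit Euler step
    return lam

# Shooting: bisect on λ(0) until the terminal condition λ(T) = 0 holds.
lo, hi = -10.0, 10.0
for _ in range(60):
    mid = (lo + hi) / 2
    if terminal_costate(mid) > 0:
        hi = mid
    else:
        lo = mid
lam0 = (lo + hi) / 2

assert abs(terminal_costate(lam0)) < 1e-6
assert lam0 < 0     # with x(0) > 0 each player initially pushes the state down
```

Shooting works here because the system is linear, so λ(T) is monotone in the guessed λ(0); for general games the coupled boundary-value problem usually requires more robust numerical methods.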
7.14 Feedback differential games
7.14.1 Feedback information structure
Player j can observe time and state (t, x(t)); he defines his control as the result of a feedback law σ_j(t, x); the game is played as a single simultaneous move where each player announces the strategy he will use.
7.14.2 A veriﬁcation theorem
Theorem 7.14.1 Assume that there exist m functions V*_j(·, ·) : IR × IR^n → IR and m feedback strategies σ*_j(t, x) that satisfy the following functional equations

−∂/∂t V*_j(t, x) = max_{u_j∈U_j} H^j(∂/∂x V*_j(t, x), x, [σ*^{−j}(t, x), u_j], t),
                   t ∈ [t_0, t_f], x ∈ IR^n    (7.59)
V*_j(t_f, x) = 0.    (7.60)

Then V*_j(t_i, x_i) is the equilibrium value of the performance criterion of Player j, for the feedback Nash differential game defined with initial data x(t_i) = x_i, t_i ∈ [t_0, t_f]. Furthermore the solution of the maximisation in the R.H.S. of Eq. (7.59) defines the equilibrium feedback control of Player j.
7.15 Why are feedback Nash equilibria outcomes different from open-loop Nash outcomes?
7.16 The subgame perfectness issue
7.17 Memory differential games
7.18 Characterizing all the possible equilibria
Part IV
A Differential Game Model
In this last part of our presentation we propose a stochastic differential game model of economic competition through technological innovation (R&D competition), and we show how it is connected to the theory of Markov and/or sequential games discussed in Part 2. This example deals with a stochastic game where the random disturbances are modelled as a controlled jump process.
In control theory, an interesting class of stochastic systems has been studied, where the random perturbations appear as controlled jump processes. In [?], [?] the relationship between these control models and discrete event dynamic programming models is developed. In [?] a first adaptation of this approach to the case of dynamic stochastic games has been proposed. Below we illustrate this class of games through a model of competition through investment in R&D.
7.19 A Game of R&D Investment
7.19.1 Dynamics of R&D competition
We propose here a model of competition through R&D which extends earlier formulations which can be found in [?]. Consider m firms competing on a market with differentiated products. Each firm can invest in R&D for the purpose of making a technological advance which could provide it with a competitive advantage. The competitive advantage and product differentiation concepts will be discussed later. We focus here on the description of the R&D dynamics. Let x_j be the level of accumulation of R&D capital by firm j. The state equation describing the capital accumulation process is
ẋ_j(t) = u_j(t) − μ_j x_j(t)
x_j(0) = x_j^0,
    (7.61)

where the control variable u_j ∈ U_j gives the investment rate in R&D and μ_j is the depreciation rate of R&D capital.
We represent firm j's policy of R&D investment as a function

u_j(·) : [0, ∞) → U_j.

Denote by 𝒰_j the class of admissible functions u_j(·). With an initial condition x_j(0) = x_j^0 and a piecewise continuous control function u_j(·) ∈ 𝒰_j is associated a unique evolution x_j(·) of the capital stock, the solution of equation (7.61).
The R&D capital brings innovation through a discrete event stochastic process. Let
T be a random stopping time which deﬁnes the date at which the advance takes place.
Consider first the case with a single firm j. The elementary conditional probability is given by

P^{u_j(·)}[T ∈ (t, t + dt) | T > t] = ω_j(x_j(t)) dt + o(dt),    (7.62)

where lim_{dt→0} o(dt)/dt = 0 uniformly in x_j. The function ω_j(x_j) represents the controlled intensity of the jump process which describes the innovation. The probability of having a technological advance occurring between times t and t + dt is given by
P^{u_j(·)}[t < T < t + dt] = ω_j(x_j(t)) e^{−∫_0^t ω_j(x_j(s)) ds} dt + o(dt).    (7.63)
Therefore the probability of having the occurrence before time τ is given by

P^{u_j(·)}[T ≤ τ] = ∫_0^τ ω_j(x_j(t)) e^{−∫_0^t ω_j(x_j(s)) ds} dt.    (7.64)

Equations (7.62)–(7.64) define the dynamics of innovation occurrences for firm j.
Since there are m firms the combined jump rate will be

ω(x(t)) = Σ_{j=1}^{m} ω_j(x_j(t)).
Given that a jump occurs at time τ, the conditional probability that the innovation comes from firm j is given by

ω_j(x_j(τ)) / ω(x(τ)).
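The innovation dynamics (7.61)–(7.64) can be simulated directly. The sketch below uses assumed ingredients of our own — constant investment rates and a hypothetical linear hazard ω_j(x_j) = a_j x_j, since the text leaves the form of ω_j unspecified:

```python
import random

# Two firms; all numerical values are made-up illustrations.
mu = [0.1, 0.1]      # depreciation rates μ_j
u  = [1.0, 0.5]      # constant R&D investment rates u_j(t)
a  = [0.3, 0.3]      # coefficients of the assumed hazard ω_j(x_j) = a_j·x_j
x0 = [0.0, 0.0]      # initial R&D capital x_j(0)
dt = 0.01

def simulate_innovation(rng):
    """Return (time τ of the first technological advance, innovating firm)."""
    xs = list(x0)
    threshold = rng.expovariate(1.0)     # inverse-transform sampling of T
    cum, t = 0.0, 0.0
    while cum < threshold:               # stop when ∫_0^t ω(x(s)) ds hits it
        for j in range(2):               # Euler step of the capital dynamics (7.61)
            xs[j] += dt * (u[j] - mu[j] * xs[j])
        cum += dt * sum(a[j] * xs[j] for j in range(2))
        t += dt
    # The innovator is firm j with probability ω_j(x_j(τ)) / ω(x(τ)).
    w = [a[j] * xs[j] for j in range(2)]
    j = 0 if rng.random() < w[0] / sum(w) else 1
    return t, j

rng = random.Random(0)
times, winners = zip(*(simulate_innovation(rng) for _ in range(2000)))
# Firm 1 invests twice as much, so it should innovate more often than firm 2.
assert winners.count(0) > winners.count(1)
```

The simulation makes the controlled-intensity idea tangible: a larger R&D capital raises the hazard and therefore shortens the expected time to innovation and raises the firm's chance of being the innovator.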
The impact of innovation on the competitive edge of the firm is now discussed.
7.19.2 Product Differentiation
We model the market with differentiated products as in [?]. We assume that each firm j markets a product characterized by a quality index q_j ∈ IR and a price p_j ∈ IR. Consider first a static situation where N consumers are buying this type of goods. It is assumed that the (indirect) utility of a consumer buying variant j is

Ṽ_j = y − p_j + θ q_j + ε_j,    (7.65)
where θ is the valuation of quality and the ε_j are i.i.d., with standard deviation μ, according to the double exponential distribution (see [?] for details). Then the demand for variant j is given by

D̃_j = N P_j,    (7.66)

where

P_j = exp[(θ q_j − p_j)/μ] / Σ_{i=1}^{m} exp[(θ q_i − p_i)/μ],   j = 1, . . . , m.    (7.67)
This corresponds to a multinomial logit demand model.

In a dynamic framework we assume a fixed number N of consumers with a demand rate per unit of time given by (7.65)–(7.67). But now both the quality index q_j(·) and the price p_j(·) are functions of t. Actually they will be the instruments used by the firms to compete on this market.
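As a sketch, the demand system (7.66)–(7.67) transcribes directly into code; the qualities, prices and parameter values below are made-up illustrations, not taken from the text:

```python
import math

def logit_demand(q, p, theta=1.0, mu=0.5, N=1000):
    """Multinomial logit demand: D_j = N·P_j with P_j given by (7.67)."""
    score = [math.exp((theta * qj - pj) / mu) for qj, pj in zip(q, p)]
    total = sum(score)
    return [N * s / total for s in score]

D = logit_demand(q=[1.0, 1.2, 0.8], p=[2.0, 2.5, 1.8])
assert abs(sum(D) - 1000) < 1e-6              # the shares P_j sum to one
# With equal prices, the higher-quality variant attracts more demand:
hi, lo = logit_demand([2.0, 1.0], [2.0, 2.0])
assert hi > lo
```

Note how demand depends only on the quality-adjusted margins θq_j − p_j, so an innovation that raises q_j shifts demand toward firm j at unchanged prices.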
7.19.3 Economics of innovation
The economics of innovation is based on the costs and the advantages associated with the R&D activity.
R&D operations and investment costs. Let L_j(x_j, u_j) be a function which defines the cost rate of R&D operations and investment for firm j.
Quality improvement. When a firm j makes a technological advance, the quality index of the variant increases by some amount. We use a random variable ∆q_j(τ) to represent this quality improvement. The probability distribution of this random variable, specified by the cumulative probability function F_j(·, x_j), is supposed to also depend on the current amount of know-how, indicated by the capital x_j(τ). The higher this stock of capital, the higher the probability given to important quality gains.
Adjustment cost. Let Φ_j(x_j) be a monotone decreasing function of x_j which represents the adjustment cost for taking advantage of the advance. The higher the capital x_j at the time of the innovation, the lower the adjustment cost.
Production costs. Assume that the marginal production cost c_j for firm j is constant and does not vary at the time of an innovation.
Remark 7.19.1 We implicitly assume that the firm which benefits from a technological innovation will use that advantage. We could extend the formalism to the case where a firm stores its technological advance and delays its implementation.
7.20 Information structure
We assume that the firms observe the state of the game at each occurrence of the discrete event which represents a technological advance. This system is an instance where two dynamics, a fast one and a slow one, are combined, as shown in the definition of the state variables.
7.20.1 State variables
Fast dynamics. The levels of capital accumulation x_j(t), j = 1, . . . , m, define the fast moving state of this system. These levels change according to the differential state equations (7.61).
Slow dynamics. The quality levels q_j(t), j = 1, . . . , m, define a slow moving state. These levels change according to the random jump process defined by the jump rates (7.62) and the distribution functions F_j(·, x_j). If firm j makes a technological advance at time τ then the quality index changes abruptly

q_j(τ) = q_j(τ⁻) + ∆q_j(τ).
7.20.2 Piecewise open-loop game

Assume that at each random time τ of technological innovation all firms observe the state of the game, i.e. the vector

s(τ) = (x(τ), q(τ)) = (x_j(τ), q_j(τ))_{j=1,...,m}.
Then each firm j selects a price schedule p_j(·) : [τ, ∞) → IR and an R&D investment schedule u_j(·) : [τ, ∞) → IR which are defined as open-loop controls. These controls will be used until the next discrete event (technological advance) occurs. We are thus in the context of piecewise deterministic games as introduced in [?].
7.20.3 A Sequential Game Reformulation
In [?] it is shown how piecewise deterministic games can be studied as particular instances of sequential games with Borel state and action spaces. The fundamental element of a sequential game is the so-called local reward functional, which permits the definition of the dynamic programming operators.
Let v_j(s) be the expected reward-to-go for player j when the system is in initial state s. The local reward functional describes the total expected payoff for firm j when all firms play according to the open-loop controls p_j(·) : [τ, ∞) → IR and u_j(·) : [τ, ∞) → IR until the next technological event occurs and the subsequent rewards are defined by the v_j(s) function. We can write

h_j(s, v_j(·), u(·), p(·)) = E_{s,u(·)} [ ∫_0^τ e^{−ρt} { D̃_j(t)(p_j(t) − c_j) − L_j(x_j(t), u_j(t)) } dt + e^{−ρτ} v_j(s(τ)) ],    (7.68)

j = 1, . . . , m,
where ρ is the discount rate, τ is the random time of the next technological event and s(τ) is the new random state reached right after the occurrence of the next technological event, i.e. at time τ. From the expression (7.68) we see immediately that:

1. We are in a sequential game formalism, with continuous state space and functional sets as action spaces (infinite-horizon controls).

2. The price schedule p_j(t), t ≥ τ, can be taken as a constant (until the next technological event) and is actually given by the solution of the static oligopoly game with differentiated products of qualities q_j(t), j = 1, . . . , m. The solution of this oligopoly game is discussed in [?], where it is shown to be uniquely defined.
3. The R&D investment schedule is obtained as a trade-off between the random (due to the random stopping time τ) transition cost given by

∫_0^τ e^{−ρt} L_j(x_j(t), u_j(t)) dt

and the expected reward-to-go after the next technological event.
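The local reward functional (7.68) can also be evaluated by Monte Carlo. The sketch below uses drastic simplifying assumptions of our own: a constant jump rate ω (so τ is exponential), a constant net reward rate r = D̃_j(p_j − c_j) − L_j, and a constant continuation value v, in which case (7.68) collapses to the closed form h = r/(ρ+ω) + v·ω/(ρ+ω) against which the simulation can be checked:

```python
import math
import random

rho, omega, r, v = 0.05, 0.5, 1.0, 10.0     # made-up illustrative values

rng = random.Random(1)
n = 200_000
total = 0.0
for _ in range(n):
    tau = rng.expovariate(omega)            # random time of the next event
    # discounted reward up to τ, plus the discounted continuation value
    total += r * (1.0 - math.exp(-rho * tau)) / rho + math.exp(-rho * tau) * v
h_mc = total / n

h_exact = r / (rho + omega) + v * omega / (rho + omega)
assert abs(h_mc - h_exact) < 0.05
```

In the actual game neither the jump rate nor the reward rate is constant, but the same simulate-and-discount structure carries over and underlies the numerical approach mentioned below.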
We shall limit our discussion to this definition of the model and its reformulation as a sequential game in Borel spaces. The sequential game format does not lend itself easily to qualitative analysis via analytical solutions. A numerical analysis seems more appropriate to explore the consequences of these economic assumptions. The reader will find in [?] more information about the numerical approximation of solutions for this type of games.
Bibliography
[Alj & Haurie, 1983] A. ALJ AND A. HAURIE, Dynamic Equilibria in Multigeneration Stochastic Games, IEEE Trans. Autom. Control, Vol. AC-28, 1983, pp. 193–203.

[Aumann, 1974] R. J. AUMANN, Subjectivity and correlation in randomized strategies, J. Economic Theory, Vol. 1, 1974, pp. 67–96.

[Aumann, 1989] R. J. AUMANN, Lectures on Game Theory, Westview Press, Boulder etc., 1989.

[Başar & Olsder, 1982] T. BAŞAR AND G. J. OLSDER, Dynamic Noncooperative Game Theory, Academic Press, New York, 1982.

[Başar, 1989a] T. BAŞAR, Time consistency and robustness of equilibria in noncooperative dynamic games, in F. VAN DER PLOEG AND A. J. DE ZEEUW eds., Dynamic Policy Games in Economics, North Holland, Amsterdam, 1989.

[Başar, 1989b] T. BAŞAR, A Dynamic Games Approach to Controller Design: Disturbance Rejection in Discrete Time, in Proceedings of the 28th Conference on Decision and Control, IEEE, Tampa, Florida, 1989.

[Bertsekas, 1987] D. P. BERTSEKAS, Dynamic Programming: Deterministic and Stochastic Models, Prentice Hall, Englewood Cliffs, New Jersey, 1987.

[Brooke et al., 1992] A. BROOKE, D. KENDRICK AND A. MEERAUS, GAMS. A User's Guide, Release 2.25, Scientific Press/Duxbury Press, 1992.

[Cournot, 1838] A. COURNOT, Recherches sur les principes mathématiques de la théorie des richesses, Librairie des sciences politiques et sociales, Paris, 1838.

[Denardo, 1967] E. V. DENARDO, Contraction Mappings in the Theory Underlying Dynamic Programming, SIAM Rev., Vol. 9, 1967, pp. 165–177.

[Ferris & Munson, 1994] M. C. FERRIS AND T. S. MUNSON, Interfaces to PATH 3.0, to appear in Computational Optimization and Applications.
[Ferris & Pang, 1997a] M. C. FERRIS AND J. S. PANG, Complementarity and Variational Problems: State of the Art, SIAM, 1997.

[Ferris & Pang, 1997b] M. C. FERRIS AND J. S. PANG, Engineering and economic applications of complementarity problems, SIAM Review, Vol. 39, 1997, pp. 669–713.

[Filar & Vrieze, 1997] J. A. FILAR AND K. VRIEZE, Competitive Markov Decision Processes, Springer-Verlag, New York, 1997.

[Filar & Tolwinski, 1991] J. A. FILAR AND B. TOLWINSKI, On the Algorithm of Pollatschek and Avi-Itzhak, in T. E. S. Raghavan et al. (eds), Stochastic Games and Related Topics, Kluwer Academic Publishers, 1991.

[Forges, 1986] F. FORGES, An Approach to Communication Equilibria, Econometrica, Vol. 54, 1986, pp. 1375–1385.

[Fourer et al., 1993] R. FOURER, D. M. GAY AND B. W. KERNIGHAN, AMPL: A Modeling Language for Mathematical Programming, Scientific Press/Duxbury Press, 1993.

[Fudenberg & Tirole, 1991] D. FUDENBERG AND J. TIROLE, Game Theory, The MIT Press, Cambridge, Massachusetts, 1991.
[Friedman, 1977] J. W. FRIEDMAN, Oligopoly and the Theory of Games, North-Holland, Amsterdam, 1977.

[Friedman, 1986] J. W. FRIEDMAN, Game Theory with Economic Applications, Oxford University Press, Oxford, 1986.

[Green & Porter, 1985] E. J. GREEN AND R. H. PORTER, Noncooperative collusion under imperfect price information, Econometrica, Vol. 52, 1984, pp. 87–100.

[Harsanyi, 1967-68] J. HARSANYI, Games with incomplete information played by Bayesian players, Management Science, Vol. 14, 1967-68, pp. 159–182, 320–334, 486–502.

[Haurie & Tolwinski, 1985a] A. HAURIE AND B. TOLWINSKI, Acceptable Equilibria in Dynamic Bargaining Games, Large Scale Systems, Vol. 6, 1984, pp. 73–89.
[Haurie & Tolwinski, 1985b] A. HAURIE AND B. TOLWINSKI, Definition and Properties of Cooperative Equilibria in a Two-Player Game of Infinite Duration, Journal of Optimization Theory and Applications, Vol. 46, No. 4, 1985, pp. 525–534.

[Haurie & Tolwinski, 1990] A. HAURIE AND B. TOLWINSKI, Cooperative Equilibria in Discounted Stochastic Sequential Games, Journal of Optimization Theory and Applications, Vol. 64, No. 3, 1990, pp. 511–535.

[Howard, 1960] R. HOWARD, Dynamic Programming and Markov Processes, MIT Press, Cambridge, Mass., 1960.

[Moulin & Vial, 1978] H. MOULIN AND J.-P. VIAL, Strategically Zero-sum Games: The Class of Games whose Completely Mixed Equilibria Cannot be Improved Upon, International Journal of Game Theory, Vol. 7, 1978, pp. 201–221.

[Kuhn, 1953] H. W. KUHN, Extensive games and the problem of information, in H. W. Kuhn and A. W. Tucker eds., Contributions to the Theory of Games, Vol. 2, Annals of Mathematical Studies No. 28, Princeton University Press, Princeton, New Jersey, 1953, pp. 193–216.

[Lemke & Howson, 1964] C. E. LEMKE AND J. T. HOWSON, Equilibrium points of bimatrix games, J. Soc. Indust. Appl. Math., Vol. 12, 1964, pp. 413–423.

[Lemke, 1965] C. E. LEMKE, Bimatrix equilibrium points and mathematical programming, Management Sci., Vol. 11, 1965, pp. 681–689.

[Luenberger, 1969] D. G. LUENBERGER, Optimization by Vector Space Methods, J. Wiley & Sons, New York, 1969.

[Luenberger, 1979] D. G. LUENBERGER, Introduction to Dynamic Systems: Theory, Models & Applications, J. Wiley & Sons, New York, 1979.

[Mangasarian & Stone, 1964] O. L. MANGASARIAN AND H. STONE, Two-Person Nonzero-sum Games and Quadratic Programming, J. Math. Anal. Applic., Vol. 9, 1964, pp. 348–355.

[Merz, 1976] A. W. MERZ, The Game of Identical Cars, in: Multicriteria Decision Making and Differential Games, G. Leitmann ed., Plenum Press, New York and London, 1976.

[Murphy, Sherali & Soyster, 1982] F. H. MURPHY, H. D. SHERALI AND A. L. SOYSTER, A mathematical programming approach for determining oligopolistic market equilibrium, Mathematical Programming, Vol. 24, 1982, pp. 92–106.

[Nash, 1951] J. F. NASH, Noncooperative games, Annals of Mathematics, Vol. 54, 1951, pp. 286–295.
[Nowak, 1985] A. S. NOWAK, Existence of Equilibrium Stationary Strategies in Discounted Noncooperative Stochastic Games with Uncountable State Space, J. Optim. Theory Appl., Vol. 45, 1985, pp. 591–602.

[Nowak, 1987] A. S. NOWAK, Nonrandomized Strategy Equilibria in Noncooperative Stochastic Games with Additive Transition and Reward Structures, J. Optim. Theory Appl., Vol. 52, 1987, pp. 429–441.

[Nowak, 1993] A. S. NOWAK, Stationary Equilibria for Nonzero-Sum Average Payoff Ergodic Stochastic Games with General State Space, Annals of the International Society of Dynamic Games, Birkhäuser, 1993.

[Nowak & Raghavan, 199?] A. S. NOWAK AND T. E. S. RAGHAVAN, Existence of Stationary Correlated Equilibria with Symmetric Information for Discounted Stochastic Games, Mathematics of Operations Research, to appear.

[Owen, 1982] G. OWEN, Game Theory, Academic Press, New York, London etc., 1982.

[Parthasarathy & Sinha, 1989] T. PARTHASARATHY AND S. SINHA, Existence of Stationary Equilibrium Strategies in Nonzero-Sum Discounted Stochastic Games with Uncountable State Space and State Independent Transitions, International Journal of Game Theory, Vol. 18, 1989, pp. 189–194.

[Petit, 1990] M. L. PETIT, Control Theory and Dynamic Games in Economic Policy Analysis, Cambridge University Press, Cambridge etc., 1990.

[Porter, 1983] R. H. PORTER, Optimal Cartel Trigger Strategies, Journal of Economic Theory, Vol. 29, 1983, pp. 313–338.

[Radner, 1980] R. RADNER, Collusive Behavior in Noncooperative ε-Equilibria of Oligopolies with Long but Finite Lives, Journal of Economic Theory, Vol. 22, 1980, pp. 136–154.

[Radner, 1981] R. RADNER, Monitoring Cooperative Agreement in a Repeated Principal-Agent Relationship, Econometrica, Vol. 49, 1981, pp. 1127–1148.

[Radner, 1985] R. RADNER, Repeated Principal-Agent Games with Discounting, Econometrica, Vol. 53, 1985, pp. 1173–1198.

[Rosen, 1965] J. B. ROSEN, Existence and Uniqueness of Equilibrium Points for Concave n-Person Games, Econometrica, Vol. 33, 1965, pp. 520–534.

[Rutherford, 1995] T. F. RUTHERFORD, Extensions of GAMS for complementarity problems arising in applied economic analysis, Journal of Economic Dynamics and Control, Vol. 19, 1995, pp. 1299–1324.
[Selten, 1975] R. SELTEN, Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games, International Journal of Game Theory, Vol. 4, 1975, pp. 25–55.

[Shapley, 1953] L. SHAPLEY, Stochastic Games, Proceedings of the National Academy of Sciences, Vol. 39, 1953, pp. 1095–1100.

[Shubik, 1975a] M. SHUBIK, Games for Society, Business and War, Elsevier, New York etc., 1975.

[Shubik, 1975b] M. SHUBIK, The Uses and Methods of Gaming, Elsevier, New York etc., 1975.

[Simaan & Cruz, 1976a] M. SIMAAN AND J. B. CRUZ, On the Stackelberg Strategy in Nonzero-Sum Games, in: G. LEITMANN ed., Multicriteria Decision Making and Differential Games, Plenum Press, New York and London, 1976.

[Simaan & Cruz, 1976b] M. SIMAAN AND J. B. CRUZ, Additional Aspects of the Stackelberg Strategy in Nonzero-Sum Games, in: G. LEITMANN ed., Multicriteria Decision Making and Differential Games, Plenum Press, New York and London, 1976.

[Sobel, 1971] M. J. SOBEL, Noncooperative Stochastic Games, Annals of Mathematical Statistics, Vol. 42, 1971, pp. 1930–1935.

[Tolwinski, 1988] B. TOLWINSKI, Introduction to Sequential Games with Applications, Discussion Paper No. 46, Victoria University of Wellington, Wellington, 1988.

[Tolwinski, Haurie & Leitmann, 1986] B. TOLWINSKI, A. HAURIE AND G. LEITMANN, Cooperative Equilibria in Differential Games, Journal of Mathematical Analysis and Applications, Vol. 119, 1986, pp. 182–202.

[Von Neumann, 1928] J. VON NEUMANN, Zur Theorie der Gesellschaftsspiele, Math. Annalen, Vol. 100, 1928, pp. 295–320.

[Von Neumann & Morgenstern, 1944] J. VON NEUMANN AND O. MORGENSTERN, Theory of Games and Economic Behavior, Princeton University Press, Princeton, 1944.

[Whitt, 1980] W. WHITT, Representation and Approximation of Noncooperative Sequential Games, SIAM J. Control, Vol. 18, 1980, pp. 33–48.

[Whittle, 1982] P. WHITTLE, Optimization Over Time: Dynamic Programming and Stochastic Control, Vol. 1, J. Wiley, Chichester, 1982.
Chapter 1

Foreword

1.1 What are Dynamic Games?

Dynamic Games are mathematical models of the interaction between different agents who are controlling a dynamical system. Such situations occur in many instances like armed conflicts (e.g. the duel between a bomber and a jet fighter), economic competition (e.g. investments in R&D for computer companies), parlor games (Chess, Bridge). These examples concern dynamical systems since the actions of the agents (also called players) influence the evolution over time of the state of a system (position and velocity of aircraft, capital of know-how for Hi-Tech firms, positions of remaining pieces on a chess board, etc). The difficulty in deciding what should be the behavior of these agents stems from the fact that each action an agent takes at a given time will influence the reaction of the opponent(s) at a later time. These notes are intended to present the basic concepts and models which have been proposed in the burgeoning literature on game theory for a representation of these dynamic interactions.

1.2 Origins of these Lecture Notes

These notes are based on several courses on Dynamic Games taught by the authors, in different universities or summer schools, to a variety of students in engineering, economics and management science. The notes also use some documents prepared in cooperation with other authors, in particular B. Tolwinski [Tolwinski, 1988]. These notes are written for control engineers, economists or management scientists interested in the analysis of multiagent optimization problems, with a particular emphasis on the modeling of conflict situations.

The objects studied in this book will be dynamic. The term dynamic comes from the Greek dynasthai (which means to be able) and refers to phenomena which undergo a time-evolution. In these notes, most of the dynamic models will be discrete time. This implies that, for the mathematical description of the dynamics, difference (rather than differential) equations will be used. That, in turn, should make a great part of the notes accessible, and attractive, to students who have not done advanced mathematics. However, there will still be some developments involving a continuous-time description of the dynamics which have been written for readers with a stronger mathematical background.

1.3 Motivation

There is no doubt that a course on dynamic games suitable for both control engineering students and economics or management science students requires a specialized textbook. Since we emphasize the detailed description of the dynamics of some specific systems controlled by the players, we have to present rather sophisticated mathematical notions, related to control theory. This presentation of the dynamics must be accompanied by an introduction to the specific mathematical concepts of game theory. The originality of our approach is in the mixing of these two branches of applied mathematics. This means that the level of mathematics involved in the presentation will not go beyond what is expected to be known by a student specializing in control engineering, quantitative economics or management science.

These notes are aimed at last-year undergraduate, first-year graduate students. Control engineers will certainly observe that we present dynamic games as an extension of optimal control, whereas economists will see also that dynamic games are only a particular aspect of the classical theory of games, which is considered to have been launched in [Von Neumann & Morgenstern 1944].

Economic models of imperfect competition, presented as variations on the "classic" Cournot model [Cournot, 1838], will serve recurrently as an illustration of the concepts introduced and of the theories developed. An interesting domain of application of dynamic games, which is described in these notes, relates to environmental management. The conflict situations occurring in fisheries exploitation by multiple agents or in policy coordination for achieving global environmental control (e.g. in the control of a possible global warming effect) are well captured in the realm of this theory.

There are many good books on classical game theory. A nonexhaustive list includes [Owen, 1982], [Shubik, 1975a], [Shubik, 1975b], [Aumann, 1989], and more recently [Friedman 1986] and [Fudenberg & Tirole, 1991]. However, they do not introduce the reader to the most general dynamic games. [Basar & Olsder, 1982] does cover extensively the dynamic game paradigms; however, readers without a strong mathematical background will probably find that book difficult. This text is therefore a modest attempt to bridge the gap.
Part I

Elements of Classical Game Theory
Chapter 2

Decision Analysis with Many Agents

As we said in the introduction to these notes, dynamic games constitute a subclass of the mathematical models studied in what is usually called the classical theory of games. It is therefore proper to start our exposition with those basic concepts of game theory which provide the fundamental thread of the theory of dynamic games. For an exhaustive treatment of most of the definitions of classical game theory see e.g. [Owen, 1982], [Shubik, 1975a], [Friedman 1986] and [Fudenberg & Tirole, 1991].

2.1 The Basic Concepts of Game Theory

In a game we deal with the following concepts:

• Players. They will compete in the game. Notice that a player may be an individual, a set of individuals (or a team), a corporation, a political party, a nation, a pilot of an aircraft, a captain of a submarine, etc.

• A move or a decision will be a player's action. Also, borrowing a term from control theory, a move will be a realization of a player's control or, simply, his control.

• A player's (pure) strategy will be a rule (or function) that associates a player's move with the information available to him¹ at the time when he decides which move to choose.

• A player's mixed strategy is a probability measure on the player's space of pure strategies. In other words, a mixed strategy consists of a random draw of a pure strategy. The player controls the probabilities in this random experiment.

• A player's behavioral strategy is a rule which defines a random draw of the admissible move as a function of the information available². These strategies are intimately linked with mixed strategies and it has been proved early [Kuhn, 1953] that, for many games, the two concepts coincide.

• Payoffs are real numbers measuring the desirability of the possible outcomes of the game, e.g. the amounts of money the players may win (or lose). Other names for payoffs can be: rewards, performance indices or criteria, utility measures, etc.

¹ In games, we will frequently have to address an individual player's action and distinguish it from a collective action taken by a set of several players. As far as we know, in English, this distinction is only possible through usage of the traditional grammar gender-exclusive pronouns: possessive "his", "her" and personal "he", "she". Political correctness promotes the usage of gender-inclusive pronouns "they" and "their". We find that the traditional grammar better suits our purpose (to avoid confusion) and we will refer in this book to a singular genderless agent as "he" and the agent's possession as "his".

² A similar concept has been introduced in control theory under the name of relaxed controls.

The concepts we have introduced above are described in relatively imprecise terms. A more rigorous definition can be given if we set the theory in the realm of decision analysis, where decision trees give a representation of the dependence of outcomes on actions and uncertainties. This will be called the extensive form of a game.

2.2 Games in Extensive Form

A game in extensive form is a graph (i.e. a set of nodes and a set of arcs) which has the structure of a tree³ and which represents the possible sequence of actions and random perturbations which influence the outcome of a game played by a set of players.

³ A tree is a graph where all nodes are connected but there are no cycles. In a tree there is a single node without "parent", called the "root", and a set of nodes without descendants, the "leaves". There is always a single path from the root to any leaf.

2.2.1 Description of moves, information and randomness

A game in extensive form is described by a set of players, including one particular player called Nature, and a set of positions described as nodes on a tree structure. At each node one particular player has the right to move, i.e. he has to select a possible action in an admissible set represented by the arcs emanating from the node. The information at the disposal of each player at the nodes where he has to select an action is described by the information structure of the game. In general the player
may not know exactly at which node of the tree structure the game is currently located. His information has the following form: he knows that the current position of the game is an element in a given subset of nodes. He does not know which specific one it is.

When the player selects a move, this corresponds to selecting an arc of the graph which defines a transition to a new node, where another player has to select his move, etc. Among the players, Nature is playing randomly, i.e. Nature's moves are selected at random. The game has a stopping rule described by the terminal nodes of the tree. Then the players are paid their rewards, also called payoffs.

Figure 2.1 shows the extensive form of a two-player, one-stage stochastic game with simultaneous moves. In this figure the node marked D1 corresponds to the move of Player 1 and the nodes marked D2 correspond to the moves of Player 2. The nodes marked E correspond to Nature's moves. In that particular case we assume that the three possible elementary events are equiprobable. The nodes represented by dark circles are the terminal nodes where the game stops and the payoffs are collected. The information of the second player is represented by the oval box. Therefore Player 2 does not know what has been the action chosen by Player 1. It corresponds to a situation where Player 2 does not know which action has been selected by Player 1 and vice versa. We also say that this game has the simultaneous move information structure.

Figure 2.1: A game in extensive form

This representation of games is obviously inspired from parlor games like Chess, Poker, Bridge, etc, which can be, at least theoretically, correctly described in this framework. Here, the randomness of Nature's play is the representation of card or dice draws realized in the course of the game.

The extensive form provides indeed a very detailed description of the game. In particular it provides the fundamental illustration of the dynamic structure of a game. It is however rather impractical because the size of the tree becomes very quickly, even for simple games, absolutely huge. An attempt to provide a complete description of a complex game like Bridge, using an extensive form, would lead to a combinatorial explosion. Another drawback of the extensive form description is that the states (nodes) and actions (arcs) are essentially finite or enumerable. In many models we want to deal with, however, actions and states will often be continuous variables. For such models, we will need a different method of problem description. Nevertheless the extensive form is useful in many ways. The ordering of the sequence of moves, highlighted by the extensive form, is present in most games, and dynamic games theory is also about the sequencing of actions and reactions. In such a context, different mathematical tools are used for the representation of the game dynamics; in particular, differential and/or difference equations are utilized for this purpose.

2.2.2 Comparing Random Perspectives

Due to Nature's randomness, the players will have to compare and choose among different random perspectives in their decision making. The fundamental decision structure is described in Figure 2.2. If the player chooses action a1 he faces a random perspective of expected value 100. If he chooses action a2 he faces a sure gain of 100. If the player is risk neutral he will be indifferent between the two actions. If he is risk averse he will choose action a2; if he is a risk lover he will choose action a1.

Figure 2.2: Decision in uncertainty (action a1 leads with probability 1/3 each to the payoffs 0, 100 and 200, an expected value of 100; action a2 gives 100 for sure)

In order to represent the attitude toward risk of a decision maker, Von Neumann and Morgenstern introduced the concept of cardinal utility [Von Neumann & Morgenstern 1944]. If one accepts the axioms of utility theory then a rational player should take the action which leads toward the random perspective with the highest expected utility. This solves the problem of comparing random perspectives. However this also introduces a new way to play the game: a player can set up a random experiment in order to generate his decision. Since he uses utility functions, the principle of maximization of expected utility permits him to compare deterministic action choices with random ones.

As a final reminder of the foundations of utility theory, let's recall that the Von Neumann-Morgenstern utility function is defined up to an affine transformation. This says that the player's choices will not be affected if the utilities are modified through an affine transformation.
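The comparison in Figure 2.2 can be made concrete with a small numerical sketch. The lottery data (payoffs 0, 100, 200 with probability 1/3 each, against a sure 100) come from the figure; the three utility functions below are illustrative choices of our own, with concavity modeling risk aversion and convexity risk loving:

```python
def expected_utility(lottery, u):
    """Expected utility of a lottery given as (probability, payoff) pairs."""
    return sum(p * u(z) for p, z in lottery)

a1 = [(1/3, 0), (1/3, 100), (1/3, 200)]   # the risky action of Figure 2.2
a2 = [(1.0, 100)]                         # the sure gain

def risk_neutral(z): return z             # linear utility
def risk_averse(z):  return z ** 0.5      # concave utility
def risk_lover(z):   return z ** 2        # convex utility

# Under the linear utility both actions have expected utility 100, so the
# risk-neutral player is indifferent; the concave utility favors the sure
# gain a2, while the convex utility favors the lottery a1.
```

Maximizing expected utility thus reproduces exactly the three attitudes toward risk described in the text.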
2.3 Additional concepts about information

What is known by the players who interact in a game is of paramount importance. The information structure of a game indicates what is known by each player at the time the game starts and at each of his moves. We refer briefly to the concepts of complete and perfect information.

2.3.1 Complete and perfect information

Complete vs Incomplete Information

Let us consider first the information available to the players when they enter a game play. A player has complete information if he knows

• who the players are,
• the set of actions available to all players,
• all possible outcomes to all players.

A game with complete information and common knowledge is a game where all players have complete information and all players know that the other players have complete information.

Perfect vs Imperfect Information

We consider now the information available to a player when he decides about a specific move. In a game defined in its extensive form, if each information set consists of just one node, then we say that the players have perfect information. If that is not the case the game is one of imperfect information.

Example 2.3.1 A game with simultaneous moves, as e.g. the one shown in Figure 2.1, is of imperfect information.

Perfect recall

If the information structure is such that a player can always remember all past moves he has selected, and the information he has received, then the game is one of perfect recall. Otherwise it is one of imperfect recall.

2.3.2 Commitment

A commitment is an action taken by a player that is binding on him and that is known to the other players. In making a commitment a player can persuade the other players to take actions that are favorable to him. To be effective, commitments have to be credible. A particular class of commitments are threats.

2.3.3 Binding agreement

Binding agreements are restrictions on the possible actions decided by two or more players, with a binding contract that forces the implementation of the agreement. Usually, to be binding an agreement requires an outside authority that can monitor the agreement at no cost and impose on violators sanctions so severe that cheating is prevented.

2.4 Games in Normal Form

2.4.1 Playing games through strategies

Let M = {1, ..., m} be the set of players. A pure strategy γ_j for Player j is a mapping which transforms the information available to Player j at a decision node where he is making a move into his set of admissible actions. We call strategy vector the m-tuple γ = (γ_j)_{j=1,...,m}. Once a strategy is selected by each player, the strategy vector γ is defined and the game is played as if it were controlled by an automaton⁴. An outcome (expressed in terms of expected utility to each player if the game includes chance nodes) is associated with a strategy vector γ.

⁴ This idea of playing games through the use of automata will be discussed in more detail when we present the folk theorem for repeated games in Part II.

We denote by Γ_j the set
of strategies for Player j. Then the game can be represented by the m mappings

V_j : Γ_1 × ··· × Γ_j × ··· × Γ_m → ℝ,   j ∈ M,

that associate a unique (expected utility) outcome V_j(γ) for each player j ∈ M with a given strategy vector γ ∈ Γ_1 × ··· × Γ_m. One then says that the game is defined in its normal form.

2.4.2 From the extensive form to the strategic or normal form

We consider a simple two-player game, called "matching pennies". The rules of the game are as follows. The game is played over two stages. At the first stage each player chooses head (H) or tail (T) without knowing the other player's choice. Then they reveal their choices to one another. If the coins match, Player 2 wins $5 and Player 1 loses $5. If the coins do not match, Player 1 wins $5 and Player 2 loses $5. At the second stage, the player who lost at stage 1 has the choice of either stopping the game (Q) or playing another penny matching (H, T) with the same type of payoffs as in the first stage.

The extensive form tree

This game is represented in its extensive form in Figure 2.3. Each player moves twice. In the first move the players have no information; in the second move they know what the choices made at the first stage have been. We have represented the information structure in a slightly different way here. The player who has the move is indicated on top of the graph. A dotted line connects the different nodes forming an information set for a player. The terminal payoffs represent what Player 1 wins; Player 2 receives the opposite values.

Figure 2.3: The extensive form tree of the matching pennies game (terminal payoffs to Player 1 lie in {−10, −5, 0, 5, 10})

Listing all strategies

We can easily identify the whole set of possible strategies. In Table 2.1 we have identified the 12 different strategies that can be used by each of the two players in the game of Matching pennies.

Payoff matrix

In Table 2.2 we have represented the payoffs that are obtained by Player 1 when both players choose one of the 12 possible strategies.
Table 2.1: List of strategies (12 strategies for each player: a first move, H or T, combined with a second move, Q, H or T, contingent on the first move observed from the other player)

Table 2.2: Payoff matrix (Player 1's payoffs for the 12 × 12 strategy pairs)

2.4.3 Mixed and Behavior Strategies

Mixing strategies

Since a player evaluates outcomes according to his VNM-utility functions, he can envision "mixing" strategies by selecting one of them randomly, according to a lottery that he will define. This introduces one supplementary chance move in the game description. For example, if Player j has p pure strategies γ_{jk}, k = 1, ..., p, he can select the strategy he will play through a lottery which gives a probability x_{jk} to the pure strategy γ_{jk}. Now the possible choices of action by Player j are elements of the set of all the probability distributions

X_j = { x_j = (x_{jk})_{k=1,...,p} : x_{jk} ≥ 0, k = 1, ..., p, \sum_{k=1}^p x_{jk} = 1 }.

We note that the set X_j is compact and convex in ℝ^p.

Behavior strategies

A behavior strategy is defined as a mapping which associates, with the information available to Player j at a decision node where he is making a move, a probability distribution over his set of actions. The difference between mixed and behavior strategies is subtle. In a mixed strategy, the player considers the set of possible strategies and picks one, at random, according to a carefully designed lottery. In a behavior strategy the player designs a strategy that consists in deciding at each decision node, according to a carefully designed lottery, this design being contingent on the information available at this node. In summary we can say that a behavior strategy is a strategy that includes randomness at each decision node. A famous theorem [Kuhn, 1953], that we give without proof, establishes that these two ways of introducing randomness in the choice of actions are equivalent in a large class of games.

Theorem 2.4.1 In an extensive game of perfect recall all mixed strategies can be represented as behavior strategies.
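The construction behind Theorem 2.4.1 can be illustrated on a toy one-player tree (a hypothetical example of ours, not from the text): the player first chooses a1 or a2, and after a1 reaches a second node with choices b1 or b2, so a pure strategy is a pair (first choice, planned second choice). The equivalent behavior strategy uses, at each node, the conditional probability of each action given that the node is reached:

```python
from fractions import Fraction

# A mixed strategy: probabilities over the pure strategies (first action,
# action planned at the node reached after a1).
mixed = {
    ("a1", "b1"): Fraction(1, 2),
    ("a1", "b2"): Fraction(1, 4),
    ("a2", "b1"): Fraction(1, 8),
    ("a2", "b2"): Fraction(1, 8),
}

# Behavior strategy at the root: the marginal probability of playing a1.
p_a1 = sum(p for (a, _), p in mixed.items() if a == "a1")

# Behavior strategy at the second node: probability of b1 conditional on
# the node being reached, i.e. conditional on a1 (perfect recall makes
# this conditioning well defined).
p_b1_given_a1 = sum(p for (a, b), p in mixed.items()
                    if a == "a1" and b == "b1") / p_a1
```

The induced behavior strategy reproduces the path probabilities of the mixed strategy: the probability of playing a1 and then b1 is p_a1 · p_b1_given_a1, the same as under the original lottery.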
Chapter 3

Solution concepts for noncooperative games

3.1 Introduction

In this chapter we consider games described in their normal form and we propose different solution concepts under the assumption that the players are non-cooperating. Recall that an m-person game in normal form is defined by the following data {M, (Γ_j), (V_j), j ∈ M}, where M is the set of players, M = {1, 2, ..., m}, and, for each player j ∈ M, Γ_j is the set of strategies (also called the strategy space) and V_j is the payoff function that assigns a real number V_j(γ) to a strategy vector γ ∈ Γ_1 × Γ_2 × ··· × Γ_m.

In noncooperative games the players do not communicate with each other; they don't enter into negotiation for achieving a common course of action. They know that they are playing a game. They know how the actions, their own and the actions of the other players, will determine the payoffs of every player. However they will not cooperate.

The solution of an m-player game will thus be a set of strategy vectors γ that have attractive properties expressed in terms of the payoffs received by the players. To speak of a solution concept for a game one needs, first of all, to describe the game in its normal form. In this chapter we shall study different classes of games in normal form. The first category consists of the so-called matrix games describing a situation where two players are in a completely antagonistic situation, since what a player gains the other player loses, and where each player has a finite choice of strategies. The second category will consist of two-player games, again with a finite strategy set for each player, but where the payoffs are not zero-sum. These are the nonzero-sum matrix games or bimatrix games. The third category will be the so-called concave games that encompass the previous classes of matrix and bimatrix games and for which we will be able to prove nice existence, uniqueness and stability results for a noncooperative game solution concept called equilibrium.

3.2 Matrix Games

Definition 3.2.1 A game is zero-sum if the sum of the players' payoffs is always zero. Otherwise the game is nonzero-sum.

A two-player zero-sum game is also called a duel.

Definition 3.2.2 A two-player zero-sum game in which each player has only a finite number of actions to choose from is called a matrix game.

Matrix games are also called two-player zero-sum finite games.

Let us explore how matrix games can be "solved". We number the players 1 and 2 respectively. Conventionally, Player 1 is the maximizer and has m (pure) strategies, say i = 1, 2, ..., m, and Player 2 is the minimizer and has n strategies to choose from, say j = 1, 2, ..., n. If Player 1 chooses strategy i while Player 2 picks strategy j, then Player 2 pays Player 1 the amount a_{ij}¹. Negative payments are allowed. The set of all possible payoffs that Player 1 can obtain is represented in the form of the m × n matrix A with entries a_{ij} for i = 1, 2, ..., m and j = 1, 2, ..., n. Now, the element in the i-th row and j-th column of the matrix A corresponds to the amount that Player 2 will pay Player 1 if the latter chooses strategy i and the former chooses strategy j. Thus one can say that in the game under consideration, Player 1 (the maximizer) selects rows of A while Player 2 (the minimizer) selects columns of that matrix, and as the result of the play, Player 2 pays Player 1 the amount of money specified by the element of the matrix in the selected row and column.

¹ We could have said also that Player 1 receives the amount a_{ij} and Player 2 receives the amount −a_{ij}.

Example 3.2.1 Consider a game defined by the following matrix:

    3   1   8
    4  10   0
The question now is what can be considered as the players' best strategies. One possibility is to consider the players' security levels. It is easy to see that if Player 1 chooses the first row, then, whatever Player 2 does, he will get a payoff equal to at least 1 (util²). By choosing the second row, Player 1 risks getting 0. Thus we say that Player 1's security level is 1, which is ensured by the choice of the first row. Similarly, by choosing the first column Player 2 ensures that he will not have to pay more than 4, while the choice of the second or third column may cost him 10 or 8, respectively. Thus Player 2's security level is 4 and it is ensured by the choice of the first column. Notice that

1 = \max_i \min_j a_{ij}    and    4 = \min_j \max_i a_{ij},

which is the reason why the strategy which ensures that Player 1 will get at least the payoff equal to his security level is called his maximin strategy. Similarly, the strategy which ensures that Player 2 will not have to pay more than his security level is called his minimax strategy.

² A util is the utility unit.

Lemma 3.1 The following inequality holds

\max_i \min_j a_{ij} \le \min_j \max_i a_{ij}.   (3.1)

Proof: The proof of this result is based on the remark that, since both security levels are achievable, they necessarily satisfy the inequality (3.1). We give below a more detailed proof. We note the obvious set of inequalities

\min_j a_{kj} \le a_{kl} \le \max_i a_{il}   (3.2)

which holds for all possible k and l. More formally, let (i^*, j^*) and (i_*, j_*) be defined by

a_{i^* j^*} = \max_i \min_j a_{ij}   (3.3)

and

a_{i_* j_*} = \min_j \max_i a_{ij}.   (3.4)

Then, by (3.2) with k = i^* and l = j_*, we get

\max_i \min_j a_{ij} \le a_{i^* j_*} \le \min_j \max_i a_{ij}.

QED.

An important observation is that, in the above example, if Player 1 has to move first and Player 2 acts having seen the move made by Player 1, then the maximin strategy is Player 1's best choice, which leads to the payoff equal to 1. If the situation is reversed and it is Player 2 who moves first, then his best choice will be the minimax strategy and he will have to pay 4. Now the question is what happens if the players move simultaneously. The careful study of the example shows that, when the players move simultaneously, the minimax and maximin strategies are not satisfactory "solutions" to this game. Notice that the players may try to improve their payoffs by anticipating each other's strategy. As a result, we will see a process which in this case will not converge to any solution.

Consider now another example.

Example 3.2.2 Let the matrix game A be given as follows

    10  -15*  20
    20  -30   40
    30  -45   60

Can we find satisfactory strategy pairs? It is easy to see that

\max_i \min_j a_{ij} = \max\{-15, -30, -45\} = -15

and

\min_j \max_i a_{ij} = \min\{30, -15, 60\} = -15,

and that the pair of maximin and minimax strategies is given by (i, j) = (1, 2). That means that Player 1 should choose the first row while Player 2 should select the second column, which will lead to the payoff equal to −15. In this example, we can see that the players' maximin and minimax strategies "solve" the game in the sense that the players will be best off if they use these strategies.
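Both examples can be checked mechanically. A minimal sketch of the security-level computation (rows and columns are indexed from 0 here, rather than from 1 as in the text):

```python
def security_levels(A):
    """Maximin (Player 1's security level) and minimax (Player 2's) of a
    matrix game A, with the 0-based row/column achieving each."""
    row_floors = [min(row) for row in A]        # worst case of each row choice
    col_ceils = [max(col) for col in zip(*A)]   # worst case of each column choice
    maximin, minimax = max(row_floors), min(col_ceils)
    return (maximin, row_floors.index(maximin)), (minimax, col_ceils.index(minimax))

A1 = [[3, 1, 8],
      [4, 10, 0]]          # Example 3.2.1: security levels 1 and 4 differ
A2 = [[10, -15, 20],
      [20, -30, 40],
      [30, -45, 60]]       # Example 3.2.2: both security levels equal -15
```

The sketch also confirms Lemma 3.1 on both matrices: the maximin value never exceeds the minimax value.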
i. j ∗ ) such that.3 If in a matrix game A = [aij ]i=1. . Player 2). . . . . Deﬁnition 3.. n ai∗ j ≥ min ai∗ j = max min aij j i j (3.2. As an immediate consequence. Saddle point strategies provide a solution to the game problem even if the players move simultaneously. the security levels of the two players are equal.2. if the security levels are equal then there exists a saddle point. Lemma 3.1 SaddlePoints Let us explore in more depth this class of strategies that solve the zerosum matrix game. Indeed. i j j i What is less obvious is the fact that. we see that.2. ... in Example 3. max min aij = min max aij = ai∗ j ∗ . MATRIX GAMES 31 3. . i j i Since max min aij = min max aij = ai∗ j ∗ = v i j j i by (3.7) we obtain aij ∗ ≤ ai∗ j ∗ ≤ ai∗ j which is the saddle point condition. in a matrix game. . . . j ∗ ) is a saddle point in pure strategies for the matrix game. QED.5) we say that the pair (i∗ . n aij ∗ ≤ ai∗ j ∗ ≤ ai∗ j (3..2. . Proof: Let i∗ and j ∗ be a strategy pair that yields the security level payoffs v (resp. −v) for Player 1 (resp. if Player 1 expects Player 2 to choose . . . at a saddle point of a zerosum game. . the following holds max min aij = min max aij = v i j j i then the game admits a saddle point in pure strategies.2.7) aij ∗ ≤ max aij ∗ = min max aij . for all i1.. We thus have for all i = 1.2.2 If. m and j = 1. ...3.n. .6) (3.e.. there exists a pair (i∗ .j=1. m and j1.6)(3. .m.
the second column, then the first row will be his optimal choice. On the other hand, if Player 2 expects Player 1 to choose the first row, then it will be optimal for him to choose the second column. In other words, neither player can gain anything by unilaterally deviating from his saddle point strategy. Each strategy constitutes the best reply the player can have to the strategy choice of his opponent. This observation leads to the following definition: we call such strategies an equilibrium.

Remark 3.2.1 Using strategies i^* and j^*, Players 1 and 2 cannot improve their payoff by unilaterally deviating from i^* or j^* respectively. Therefore such a strategy pair, if it exists, provides a solution to a matrix game which is "good" in that rational players are likely to adopt it. Saddle point strategies lead to both an equilibrium and a pair of guaranteed payoffs.

3.2.2 Mixed strategies

We have already indicated in chapter 2 that a player could "mix" his strategies by resorting to a lottery to decide what to play. We have noticed that, as shown in Example 3.2.1, many matrix games do not possess saddle points in the class of pure strategies. However Von Neumann proved that saddle point strategy pairs exist in the class of mixed strategies. A reason to introduce mixed strategies in a matrix game is to enlarge the set of possible choices.

Consider the matrix game defined by an m × n matrix A. (As before, Player 1 has m strategies and Player 2 has n strategies.) A mixed strategy for Player 1 is an m-tuple x = (x_1, x_2, ..., x_m) where the x_i are nonnegative for i = 1, 2, ..., m, and x_1 + x_2 + ··· + x_m = 1. Similarly, a mixed strategy for Player 2 is an n-tuple y = (y_1, y_2, ..., y_n) where the y_j are nonnegative for j = 1, 2, ..., n, and y_1 + y_2 + ··· + y_n = 1. Note that a pure strategy can be considered as a particular mixed strategy with one coordinate equal to one and all others equal to zero.

The set of possible mixed strategies of Player 1 constitutes a simplex in the space ℝ^m. This is illustrated in Figure 3.1 for m = 3. A simplex is, by construction, the smallest closed convex set that contains n + 1 points in ℝ^n. Similarly, the set of mixed strategies of Player 2 is a simplex in ℝ^n.
[Figure 3.1: The simplex of mixed strategies (the set x_1 + x_2 + x_3 = 1, x_i ≥ 0, drawn in the space (x_1, x_2, x_3) for m = 3).]
The interpretation of a mixed strategy, say x, is that Player 1 chooses his pure strategy i with probability x_i, i = 1, 2, ..., m. Since the two lotteries defining the random draws are independent events, the joint probability that the strategy pair (i, j) be selected is given by x_i y_j. Therefore, with each pair of mixed strategies (x, y) we can associate an expected payoff given by the quadratic expression (in x, y)

Σ_{i=1}^m Σ_{j=1}^n x_i y_j a_{ij} = x^T A y,

where the superscript T denotes the transposition operator on a matrix. The first important result of game theory, proved in [Von Neumann 1928], is the following.

Theorem 3.2.1 Any matrix game has a saddle point in the class of mixed strategies, i.e., there exist probability vectors x* and y* such that

max_x min_y x^T A y = min_y max_x x^T A y = (x*)^T A y* = v*,

where v* is called the value of the game.
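To make the expected-payoff formula concrete, here is a minimal Python sketch (the function name is ours, and the matrix and mixed strategies used happen to be those of Example 3.2.3 further below):

```python
def expected_payoff(x, A, y):
    """E[a_ij] = sum_i sum_j x_i * y_j * a_ij, i.e. x^T A y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

A = [[1, 0], [-1, 2]]                  # a 2x2 matrix game
x, y = [0.75, 0.25], [0.5, 0.5]        # a pair of mixed strategies
print(expected_payoff(x, A, y))        # 0.5
```

For a pure strategy pair, the same function simply returns the corresponding matrix entry, e.g. `expected_payoff([1, 0], A, [1, 0])` gives `a_11 = 1`.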
We shall not repeat the complex proof given by von Neumann. Instead we shall relate the search for saddle points to the solution of linear programs; a well-known duality property will then give us the saddle point existence result.
3.2.3 Algorithms for the Computation of Saddle Points
Matrix games can be solved as linear programs. It is easy to show that the following two relations hold:

v* = max_x min_y x^T A y = max_x min_j Σ_{i=1}^m x_i a_{ij}    (3.8)

and

z* = min_y max_x x^T A y = min_y max_i Σ_{j=1}^n y_j a_{ij}.    (3.9)
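Relation (3.8) says that, against a fixed mixed strategy x, Player 2 always has a pure best reply, so the inner minimization can run over columns only (and symmetrically for (3.9)). A short Python sketch (ours, not from the text) checks this numerically for games where one player has two pure strategies, by scanning a fine grid of mixing probabilities:

```python
def lower_value_2xn(A, steps=2000):
    """max over x = (p, 1-p) of the worst column-wise expected payoff,
    for a matrix game A with two rows; approximates relation (3.8)."""
    best = float("-inf")
    for k in range(steps + 1):
        p = k / steps                   # probability of playing row 1
        worst = min(p * A[0][j] + (1 - p) * A[1][j] for j in range(len(A[0])))
        best = max(best, worst)
    return best

def upper_value_mx2(A, steps=2000):
    """min over y = (q, 1-q) of the best row-wise expected payoff,
    for a matrix game A with two columns; approximates relation (3.9)."""
    worst = float("inf")
    for k in range(steps + 1):
        q = k / steps                   # probability of playing column 1
        best = max(q * row[0] + (1 - q) * row[1] for row in A)
        worst = min(worst, best)
    return worst

# Matching pennies: no saddle point in pure strategies, mixed value 0.
A = [[1, -1], [-1, 1]]
print(lower_value_2xn(A), upper_value_mx2(A))   # both close to 0
```

The two values agree, as Theorem 3.2.1 requires; in pure strategies the max-min and min-max for this matrix would be -1 and +1.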
These two relations imply that the value of the matrix game can be obtained by solving either of the following two linear programs:

1. Primal problem

max v
subject to
    v ≤ Σ_{i=1}^m x_i a_{ij},  j = 1, 2, ..., n
    1 = Σ_{i=1}^m x_i
    x_i ≥ 0,  i = 1, 2, ..., m

2. Dual problem

min z
subject to
    z ≥ Σ_{j=1}^n y_j a_{ij},  i = 1, 2, ..., m
    1 = Σ_{j=1}^n y_j
    y_j ≥ 0,  j = 1, 2, ..., n

The following theorem relates the two programs together.

Theorem 3.2.2 (Von Neumann [Von Neumann 1928]): Any finite two-person zero-sum matrix game A has a value.
Proof: The value v* of the zero-sum matrix game A is obtained as the common optimal value of the following pair of dual linear programming problems; the respective optimal programs define the saddle-point mixed strategies.

Primal:  max v  subject to  x^T A ≥ v 1^T,  x^T 1 = 1,  x ≥ 0
Dual:    min z  subject to  A y ≤ z 1,      1^T y = 1,  y ≥ 0

where 1 = (1, ..., 1)^T denotes a vector of appropriate dimension with all components equal to 1. One needs to solve only one of the programs: the primal and dual solutions give a pair of saddle point strategies. •

Remark 3.2.2 Simple n × n games can be solved more easily (see [Owen, 1982]). Suppose A is an n × n matrix game which does not have a saddle point in pure strategies. Then the players' unique saddle point mixed strategies and the game value are given by:

x^T = 1^T A^D / (1^T A^D 1)    (3.10)
y   = A^D 1 / (1^T A^D 1)      (3.11)
v   = det A / (1^T A^D 1)      (3.12)

where A^D is the adjoint matrix of A, det A the determinant of A, and 1 the vector of ones as before.
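For the 2 × 2 case the adjoint is simply A^D = [[a_22, -a_12], [-a_21, a_11]], so formulas (3.10)-(3.12) can be evaluated directly. A small Python sketch (the function name is ours):

```python
def solve_2x2_matrix_game(A):
    """Saddle point of a 2x2 matrix game without a pure saddle point,
    via the adjoint formulas (3.10)-(3.12)."""
    (a11, a12), (a21, a22) = A
    AD = [[a22, -a12], [-a21, a11]]                      # adjoint of A
    ones_AD = [AD[0][0] + AD[1][0], AD[0][1] + AD[1][1]] # 1^T A^D
    AD_ones = [AD[0][0] + AD[0][1], AD[1][0] + AD[1][1]] # A^D 1
    denom = ones_AD[0] + ones_AD[1]                      # 1^T A^D 1
    det = a11 * a22 - a12 * a21
    x = [ones_AD[0] / denom, ones_AD[1] / denom]
    y = [AD_ones[0] / denom, AD_ones[1] / denom]
    return x, y, det / denom

x, y, v = solve_2x2_matrix_game([[1, 0], [-1, 2]])
print(x, y, v)    # [0.75, 0.25] [0.5, 0.5] 0.5
```

The matrix used here is the one solved by hand in Example 3.2.3 below, so the output can be checked against it.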
Let us illustrate the usefulness of the above formulae on the following example.

Example 3.2.3 We want to solve the matrix game

A = |  1  0 |
    | -1  2 |.
The game obviously has no saddle point in pure strategies. The adjoint is

A^D = | 2  0 |
      | 1  1 |

and 1^T A^D = [3 1], A^D 1 = [2 2]^T, 1^T A^D 1 = 4, det A = 2. Hence the best mixed strategies for the players are

x = [3/4, 1/4],  y = [1/2, 1/2],

and the value of the play is v = 1/2. In other words, in the long run Player 1 is supposed to win .5 if he uses the first row 75% of the time and the second row 25% of the time. Player 2's best strategy will be to use the first and the second column 50% of the time each, which ensures him a loss of (only) .5; using other strategies he is supposed to lose more.

3.3 Bimatrix Games

We shall now extend the theory to the case of a nonzero-sum game. A bimatrix game conveniently represents a two-person nonzero-sum game where each player has a finite set of possible pure strategies. In a bimatrix game there are two players, say Player 1 and Player 2, who have m and n pure strategies to choose from, respectively. Now, if the players select a pair of pure strategies, say (i, j), then Player 1 obtains the payoff a_{ij} and Player 2 obtains b_{ij}, where a_{ij} and b_{ij} are some given numbers. The payoffs for the two players corresponding to all possible combinations of pure strategies can be represented by two m × n payoff matrices A and B with entries a_{ij} and b_{ij} respectively (hence the name). Notice that a (zero-sum) matrix game is a bimatrix game where b_{ij} = -a_{ij}. When a_{ij} + b_{ij} = 0, the game is a zero-sum matrix game; otherwise the game is nonzero-sum.

Example 3.3.1 Consider the bimatrix game defined by the following matrices:

A = | 52 44 44 |     B = | 50 44 41 |
    | 42 46 39 |         | 42 49 43 |

It is often convenient to combine the data contained in the two matrices and write it in the form of one matrix whose entries are ordered pairs (a_{ij}, b_{ij}). In this case one obtains

(52, 50)*  (44, 44)  (44, 41)
(42, 42)   (46, 49)* (39, 43)

In the above bimatrix some cells have been indicated with asterisks *. They correspond to outcomes resulting from equilibria, a concept that we shall introduce now. If Player 1 chooses strategy "row 1", the best reply by Player 2 is to choose strategy "column 1", and vice versa: in an equilibrium, no player can improve his payoff by deviating unilaterally from his equilibrium strategy. (As a_{ij} and b_{ij} are the players' payoffs, this conclusion agrees with Definition 3.1.) Therefore we say that the outcome (52, 50) is associated with the equilibrium pair "row 1, column 1". The situation is the same for the outcome (46, 49), which is associated with another equilibrium pair "row 2, column 2". We already see on this simple example that a bimatrix game can have many equilibria. However, there are other examples where no equilibrium can be found in pure strategies. So, as we have already done with (zero-sum) matrix games, let us expand the strategy sets to include mixed strategies.

3.3.1 Nash Equilibria

Assume that the players may use mixed strategies. For zero-sum matrix games we have shown the existence of a saddle point in mixed strategies which exhibits equilibrium properties. We now formulate the Nash equilibrium concept for bimatrix games; the same concept will be defined later on for the more general m-player case.

Definition 3.3.1 A pair of mixed strategies (x*, y*) is said to be a Nash equilibrium of the bimatrix game if

1. (x*)^T A y* ≥ x^T A y* for every mixed strategy x, and
2. (x*)^T B y* ≥ (x*)^T B y for every mixed strategy y.
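A pure-strategy pair is an equilibrium exactly when each strategy is a best reply to the other, which is easy to check by enumeration. A minimal Python sketch (the function name is ours), run on the matrices of Example 3.3.1 with rows and columns indexed from 0:

```python
def pure_nash_equilibria(A, B):
    """All pure pairs (i, j) such that a_ij is maximal in column j of A
    and b_ij is maximal in row i of B."""
    m, n = len(A), len(A[0])
    eqs = []
    for i in range(m):
        for j in range(n):
            best_for_1 = all(A[i][j] >= A[k][j] for k in range(m))
            best_for_2 = all(B[i][j] >= B[i][l] for l in range(n))
            if best_for_1 and best_for_2:
                eqs.append((i, j))
    return eqs

A = [[52, 44, 44], [42, 46, 39]]
B = [[50, 44, 41], [42, 49, 43]]
print(pure_nash_equilibria(A, B))   # [(0, 0), (1, 1)] -- the two starred cells
```

On a zero-sum game without a saddle point, such as matching pennies, the same function returns the empty list.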
Remark 3.3.1 A Nash equilibrium extends to nonzero-sum games the equilibrium property that was observed for the saddle-point solution of a zero-sum matrix game. The big difference with the saddle-point concept is that, in a nonzero-sum context, the equilibrium strategy of a player does not guarantee him that he will receive at least the equilibrium payoff: if his opponent does not play "well", i.e. does not use his equilibrium strategy, the outcome of a player can be anything; there is no guarantee.

Another important step in the development of the theory of games has been the following theorem [Nash, 1951].

Theorem 3.3.1 Every finite bimatrix game has at least one Nash equilibrium in mixed strategies.

Proof: A more general existence proof, including the case of bimatrix games, will be given in the next section. •

3.3.2 Shortcomings of the Nash equilibrium concept

Multiple equilibria

As noticed in Example 3.3.1, a bimatrix game may have several equilibria in pure strategies; there may be additional equilibria in mixed strategies as well. This nonuniqueness of Nash equilibria in bimatrix games is a serious theoretical and practical problem. In Example 3.3.1 one equilibrium strictly dominates the other, i.e. gives both players higher payoffs. Thus it can be argued that, even without any consultations, the players will naturally pick the strategy pair (i, j) = (1, 1). It is easy to define examples where the situation is not so clear.

Example 3.3.2 Consider the following bimatrix game:

(2, 1)*  (0, 0)
(1, 0)   (0, 2)*

It is easy to see that this game has two equilibria in pure strategies, none of which dominates the other. Player 1 will obviously prefer the solution (1, 1), while Player 2 will rather have (2, 2). It is difficult to decide how this game should be played if the players are to arrive at their decisions independently of one another.
The prisoner's dilemma

There is a famous example of a bimatrix game that is used in many contexts to argue that the Nash equilibrium solution is not always a "good" solution to a noncooperative game.

Example 3.3.3 Suppose that two suspects are held on the suspicion of committing a serious crime. Each of them can be convicted only if the other provides evidence against him; otherwise he will be convicted as guilty of a lesser charge. By agreeing to give evidence against the other guy, a suspect can shorten his sentence by half. Of course, the prisoners are held in separate cells and cannot communicate with each other. The situation is as described in Table 3.1, with the entries giving the length of the prison sentence for each suspect; in this case the players are assumed to minimize rather than maximize the outcome of the play.

                         Suspect II:
                         refuses     agrees to testify
Suspect I: refuses       (2, 2)      (10, 1)
agrees to testify        (1, 10)     (5, 5)*

Table 3.1: The Prisoner's Dilemma.

The unique Nash equilibrium of this game is given by the pair of pure strategies (agree-to-testify, agree-to-testify), with the outcome that both suspects will spend five years in prison. This outcome is strictly dominated by the strategy pair (refuse-to-testify, refuse-to-testify), which however is not an equilibrium and thus is not a realistic solution of the problem when the players cannot make binding agreements. This example shows that Nash equilibria can result in outcomes that are very far from efficiency.

3.3.3 Algorithms for the Computation of Nash Equilibria in Bimatrix Games

Linear programming is closely associated with the characterization and computation of saddle points in matrix games. For bimatrix games one has to rely on algorithms solving either quadratic programming or complementarity problems; there are also a few algorithms (see [Aumann, 1989], [Owen, 1982]) which permit us to find an equilibrium of simple bimatrix games. We will show one for a 2 × 2 bimatrix game and then introduce the quadratic programming [Mangasarian and Stone, 1964] and complementarity problem [Lemke & Howson 1964] formulations.
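Before turning to these algorithms, the prisoner's-dilemma claim above is easy to verify mechanically: with the minimization convention, "agree" strictly dominates "refuse" for each suspect, and (agree, agree) is the unique equilibrium. A short Python sketch (strategy 0 = refuse, 1 = agree, sentences as in Table 3.1):

```python
# Sentences to be minimized: S1[i][j] and S2[i][j] for strategies
# i of Suspect I and j of Suspect II; 0 = refuse, 1 = agree to testify.
S1 = [[2, 10], [1, 5]]
S2 = [[2, 1], [10, 5]]

# "Agree" (1) strictly dominates "refuse" (0) for Suspect I ...
dominant_1 = all(S1[1][j] < S1[0][j] for j in range(2))
# ... and for Suspect II.
dominant_2 = all(S2[i][1] < S2[i][0] for i in range(2))

# Equilibria: pairs where each sentence is minimal against the other's choice.
eqs = [(i, j) for i in range(2) for j in range(2)
       if all(S1[i][j] <= S1[k][j] for k in range(2))
       and all(S2[i][j] <= S2[i][l] for l in range(2))]

print(dominant_1, dominant_2, eqs)   # True True [(1, 1)]
```

The dominated pair (refuse, refuse) with sentences (2, 2) is better for both players than the equilibrium outcome (5, 5), which is the point of the example.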
Equilibrium computation in a 2 × 2 bimatrix game

For a simple 2 × 2 bimatrix game one can easily find a mixed strategy equilibrium, as shown in the following example.

Example 3.3.4 Consider the game with the payoff matrix given below:

(1, 0)      (0, 1)
(1/2, 1/3)  (1, 0)

Notice that this game has no pure strategy equilibrium. Let us find a mixed strategy equilibrium. Assume Player 2 chooses his equilibrium strategy y (i.e. 100y% of the time use the first column, 100(1 - y)% of the time use the second column) in such a way that Player 1 (in equilibrium) will get as much payoff using the first row as using the second row, i.e.

1·y + 0·(1 - y) = (1/2)·y + 1·(1 - y).

This is true for y* = 2/3. Symmetrically, assume Player 1 is using a strategy x (i.e. 100x% of the time use the first row, 100(1 - x)% of the time use the second row) such that Player 2 will get as much payoff using the first column as using the second column, i.e.

0·x + (1/3)·(1 - x) = 1·x + 0·(1 - x).

This is true for x* = 1/4. Then the pair of mixed strategies (x*, 1 - x*), (y*, 1 - y*) is an equilibrium in mixed strategies. The players' payoffs will be, respectively, 2/3 and 1/4.
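The two indifference conditions are linear in y and x, so they can be solved in closed form for any 2 × 2 bimatrix game with an interior mixed equilibrium: y* = (a_22 - a_12)/(a_11 - a_12 - a_21 + a_22) from Player 1's indifference, and x* = (b_22 - b_21)/(b_11 - b_12 - b_21 + b_22) from Player 2's. A Python sketch (names ours), checked on the game of Example 3.3.4:

```python
from fractions import Fraction as F

def mixed_equilibrium_2x2(A, B):
    """Interior mixed equilibrium of a 2x2 bimatrix game via the
    indifference conditions; returns the probabilities x*, y* of
    row 1 and column 1, and the two equilibrium payoffs."""
    y = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    x = (B[1][1] - B[1][0]) / (B[0][0] - B[0][1] - B[1][0] + B[1][1])
    payoff1 = A[0][0] * y + A[0][1] * (1 - y)   # Player 1 is indifferent
    payoff2 = B[0][0] * x + B[1][0] * (1 - x)   # Player 2 is indifferent
    return x, y, payoff1, payoff2

A = [[F(1), F(0)], [F(1, 2), F(1)]]     # Player 1's payoffs
B = [[F(0), F(1)], [F(1, 3), F(0)]]     # Player 2's payoffs
print(mixed_equilibrium_2x2(A, B))      # x* = 1/4, y* = 2/3, payoffs 2/3 and 1/4
```

Exact rational arithmetic via `fractions` reproduces the hand computation of the example without rounding.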
Links between quadratic programming and Nash equilibria in bimatrix games

Mangasarian and Stone (1964) have proved the following result that links quadratic programming with the search for equilibria in bimatrix games. Consider a bimatrix game (A, B). We associate with it the quadratic program

max  [x^T A y + x^T B y - v_1 - v_2]    (3.13)
s.t.
     A y ≤ v_1 1_m                      (3.14)
     B^T x ≤ v_2 1_n                    (3.15)
     x, y ≥ 0                           (3.16)
     x^T 1_m = 1                        (3.17)
     y^T 1_n = 1                        (3.18)
     v_1, v_2 ∈ R.                      (3.19)

Lemma 3.3.1 The following two assertions are equivalent:

(i) (x, y) is an equilibrium for the bimatrix game;

(ii) (x, y, v_1, v_2) is a solution to the quadratic programming problem (3.13)-(3.19), with v_1 = x^T A y and v_2 = x^T B y.

Proof: From the constraints it follows that x^T A y ≤ v_1 and x^T B y ≤ v_2 for any feasible (x, y, v_1, v_2); hence the maximum of the program is at most 0. Assume that (x, y) is an equilibrium for the bimatrix game. Then the quadruple (x, y, v_1 = x^T A y, v_2 = x^T B y) is feasible and gives the value 0 to the objective function (3.13). Hence the equilibrium defines a solution to the quadratic programming problem (3.13)-(3.19).

Conversely, let (x*, y*, v*_1, v*_2) be a solution to the quadratic programming problem (3.13)-(3.19). We know that an equilibrium exists for a bimatrix game (Nash theorem) and that it gives the value 0 to the objective function. Hence the optimal program (x*, y*, v*_1, v*_2) must also give the value 0 to the objective function, and thus be such that

x*^T A y* + x*^T B y* = v*_1 + v*_2.    (3.20)

For any x ≥ 0 and y ≥ 0 such that x^T 1_m = 1 and y^T 1_n = 1 we have, by (3.17) and (3.18),

x^T A y* ≤ v*_1
x*^T B y ≤ v*_2.

In particular we must have

x*^T A y* ≤ v*_1
x*^T B y* ≤ v*_2.

These two conditions with (3.20) imply

x*^T A y* = v*_1
x*^T B y* = v*_2.

Therefore we can conclude that, for any x ≥ 0 and y ≥ 0 such that x^T 1_m = 1 and y^T 1_n = 1,

x^T A y* ≤ x*^T A y*
x*^T B y ≤ x*^T B y*,

i.e. (x*, y*) is a Nash equilibrium for the bimatrix game. QED

A Complementarity Problem Formulation

We have seen that the search for equilibria could be done through solving a quadratic programming problem. Here we show that the solution of a bimatrix game can also be obtained as the solution of a complementarity problem. There is no loss of generality if we assume that the payoff matrices are m × n and have only positive entries (A > 0 and B > 0); this is not restrictive, since VNM utilities are defined up to an increasing affine transformation. A strategy for Player 1 is defined as a vector x ∈ R^m that satisfies

x ≥ 0         (3.21)
x^T 1_m = 1   (3.22)

and similarly for Player 2

y ≥ 0         (3.23)
y^T 1_n = 1.  (3.24)

It is easily shown that a pair (x*, y*) satisfying (3.21)-(3.24) is an equilibrium iff

(x*^T A y*) 1_m ≥ A y*      (A > 0)
(x*^T B y*) 1_n ≥ B^T x*    (B > 0),    (3.25)

i.e. iff the equilibrium condition is satisfied for pure strategy alternatives only.

Consider the following set of constraints, with v_1 ∈ R and v_2 ∈ R:

v_1 1_m ≥ A y*
v_2 1_n ≥ B^T x*    (3.26)

and

x*^T (A y* - v_1 1_m) = 0
y*^T (B^T x* - v_2 1_n) = 0.    (3.27)

The relations (3.27) are called complementarity constraints. For mixed strategies (x*, y*) satisfying (3.21)-(3.24), they simplify to x*^T A y* = v_1 and x*^T B y* = v_2. This shows that the system of constraints (3.26)-(3.27) is equivalent to the system (3.21)-(3.25).

Define s_1 = x/v_2 and s_2 = y/v_1, and introduce the slack variables u_1 and u_2. The system (3.26)-(3.27) can be rewritten as

| u_1 |   | 1_m |   | 0    A |   | s_1 |
|     | = |     | - |        | · |     |    (3.28)
| u_2 |   | 1_n |   | B^T  0 |   | s_2 |

0 = (u_1, u_2)^T (s_1, s_2)                 (3.29)
u_1, u_2 ≥ 0,  s_1, s_2 ≥ 0.                (3.30)

Introducing four obvious new variables permits us to rewrite (3.28)-(3.30) in the generic formulation

u = q + M s    (3.31)
0 = u^T s      (3.32)
u ≥ 0          (3.33)
s ≥ 0          (3.34)

of a so-called complementarity problem. A pivoting algorithm ([Lemke & Howson 1964], [Lemke 1965]) has been proposed to solve such problems. This algorithm also applies to quadratic programming, which confirms that solving a bimatrix game is of the same level of difficulty as solving a quadratic programming problem.

Remark 3.3.2 Once we obtain a solution to (3.28)-(3.30), we reconstruct the strategies through the formulae

x = s_1 / (s_1^T 1_m)    (3.35)
y = s_2 / (s_2^T 1_n).   (3.36)
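To make the construction concrete, the following Python sketch (ours) builds s_1 = x/v_2 and s_2 = y/v_1 for a 2 × 2 game with a known interior mixed equilibrium (the game of Example 3.3.4 shifted by +1 so that A > 0 and B > 0, which does not change the equilibrium) and checks that u = q + M s satisfies u ≥ 0 and u^T s = 0:

```python
from fractions import Fraction as F

# Example 3.3.4's game shifted by +1 (VNM utilities are defined up to
# an increasing affine transformation), so that all entries are positive.
A = [[F(2), F(1)], [F(3, 2), F(2)]]
B = [[F(1), F(2)], [F(4, 3), F(1)]]
x, y = [F(1, 4), F(3, 4)], [F(2, 3), F(1, 3)]        # known mixed equilibrium

v1 = sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))  # x^T A y
v2 = sum(x[i] * B[i][j] * y[j] for i in range(2) for j in range(2))  # x^T B y

s = [xi / v2 for xi in x] + [yj / v1 for yj in y]     # s = (s1, s2)
q = [F(1)] * 4                                        # q = (1_m, 1_n)
M = [[0, 0, -A[0][0], -A[0][1]],                      # M = -[[0, A], [B^T, 0]]
     [0, 0, -A[1][0], -A[1][1]],
     [-B[0][0], -B[1][0], 0, 0],
     [-B[0][1], -B[1][1], 0, 0]]

u = [q[k] + sum(M[k][l] * s[l] for l in range(4)) for k in range(4)]

print(u, sum(uk * sk for uk, sk in zip(u, s)))   # u = [0, 0, 0, 0], u^T s = 0
```

Here all four components of u vanish because both pure strategies of each player are in the support of the equilibrium; in general u_k = 0 is only required where s_k > 0.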
3.4 Concave m-Person Games

The nonuniqueness of equilibria in bimatrix games, and a fortiori in m-player matrix games, poses a delicate problem. If there are many equilibria, in a situation where one assumes that the players cannot communicate or enter into preplay negotiations, how will a given player choose among the different strategies corresponding to the different equilibrium candidates? In single-agent optimization theory we know that strict concavity of the (maximized) objective function, together with compactness and convexity of the constraint set, leads to existence and uniqueness of the solution. The following question thus arises: can we generalize the mathematical programming approach to a situation where the optimization criterion is a Nash-Cournot equilibrium, and can we then give sufficient conditions for existence and uniqueness of an equilibrium solution? The answer has been given by Rosen in a seminal paper [Rosen, 1965] dealing with concave m-person games.

A concave m-person game is described in terms of individual strategies represented by vectors in compact subsets of Euclidean spaces (R^{m_j} for Player j) and by payoffs represented, for each player, by a continuous function which is concave w.r.t. his own strategic variable. This is a generalization of the structure of the matrix and bimatrix games introduced in the previous sections, and in particular of the concept of mixed strategies. Indeed, in a matrix or a bimatrix game the mixed strategies of a player are elements of a simplex, hence of a compact convex set, and the payoffs are bilinear or multilinear forms of the strategies, so that for each player the payoff is concave w.r.t. his own strategic variable. This structure is thus generalized in two ways: (i) the strategies can be vectors constrained to lie in a more general compact convex set, and (ii) the payoffs are represented by more general continuous-concave functions.

Let us thus introduce the following game in strategic form:

• Each player j ∈ M = {1, ..., m} controls the action u_j ∈ U_j, a compact convex subset of R^{m_j}, where m_j is a given integer, and gets a payoff ψ_j(u_1, ..., u_m), where ψ_j : U_1 × ... × U_m → R is continuous in each u_i and concave in u_j.

• A coupled constraint is defined as a proper subset U of U_1 × ... × U_m. The constraint is that the joint action u = (u_1, ..., u_m) must be in U.
3.4.1 Existence of Coupled Equilibria

Definition 3.4.1 An equilibrium under the coupled constraint U is defined as a decision m-tuple (u*_1, ..., u*_m) ∈ U such that, for each player j ∈ M,

ψ_j(u*_1, ..., u*_j, ..., u*_m) ≥ ψ_j(u*_1, ..., u_j, ..., u*_m)    (3.37)
for all u_j ∈ U_j such that (u*_1, ..., u_j, ..., u*_m) ∈ U.        (3.38)

At such a point no player can improve his payoff by a unilateral change in his strategy which keeps the combined vector in U.

Remark 3.4.1 The consideration of a coupled constraint is a new feature: now each player's strategy space may depend on the strategies of the other players. This may look awkward in the context of noncooperative games, where the players cannot enter into communication or cannot coordinate their actions. However, the concept is mathematically well defined. We shall see later on that it fits very well some interesting aspects of environmental management; one can think, for example, of a global emission constraint that is imposed on a finite set of firms competing on the same market. This environmental example will be further developed in forthcoming chapters.

Definition 3.4.2 A coupled equilibrium is a vector u* such that

ψ_j(u*) = max_{u_j} { ψ_j(u*_1, ..., u_j, ..., u*_m) : (u*_1, ..., u_j, ..., u*_m) ∈ U }.    (3.39)

Let us first show that an equilibrium is actually defined through a fixed-point condition. For that purpose we introduce a so-called global reaction function θ : U × U → R defined by

θ(u, v, r) = Σ_{j=1}^m r_j ψ_j(u_1, ..., v_j, ..., u_m),    (3.40)

where the r_j > 0, j = 1, ..., m, are given weights. The precise role of this weighting scheme will be explained later; for the moment we could as well take r_j ≡ 1. We call θ a reaction function since the vector v can be interpreted as composed of the reactions of the different players to the given vector u. Notice that, even if u and v are in U, the combined vectors (u_1, ..., v_j, ..., u_m) may be elements of a larger set in U_1 × ... × U_m. This function is continuous in u and concave in v for every fixed u. It is helpful, as shown in the following result.

Lemma 3.4.1 Let u* ∈ U be such that

θ(u*, u*, r) = max_{u ∈ U} θ(u*, u, r).    (3.41)

Then u* is a coupled equilibrium.
Proof: Assume that u* satisfies (3.41) but is not a coupled equilibrium. Then, for one player, say ℓ, there would exist a vector ū = (u*_1, ..., u_ℓ, ..., u*_m) ∈ U such that ψ_ℓ(u*_1, ..., u_ℓ, ..., u*_m) > ψ_ℓ(u*). But then we would also have θ(u*, ū, r) > θ(u*, u*, r), which contradicts (3.41). QED

To make the fixed-point argument more precise, we introduce a coupled reaction mapping.

Definition 3.4.3 The point-to-set mapping

Γ(u, r) = { v : θ(u, v, r) = max_{w ∈ U} θ(u, w, r) }    (3.42)

is called the coupled reaction mapping associated with the positive weighting r. A fixed point of Γ(·, r) is a vector u* such that u* ∈ Γ(u*, r).

By Lemma 3.4.1, a fixed point of Γ(·, r) is a coupled equilibrium.

Theorem 3.4.1 For any positive weighting r there exists a fixed point of Γ(·, r), i.e. a point u* such that u* ∈ Γ(u*, r). Hence a coupled equilibrium exists.

Proof: The proof is based on the Kakutani fixed-point theorem, given in the Appendix of section 3.8; one is required to show that the point-to-set mapping Γ(·, r) is upper semicontinuous. QED

This existence result has two important consequences.

Remark 3.4.2 This existence theorem is very close, in spirit, to the theorem of Nash. It shows that proving the existence of an equilibrium reduces to proving that a fixed point exists for an appropriately defined reaction mapping (u* is the best reply to u* in (3.41)). It uses a fixed-point result which is "topological" and not "constructive", i.e. it does not provide a computational method. However, the definition of a normalized equilibrium, introduced by Rosen, establishes a link between mathematical programming and concave games with coupled constraints: it associates with an equilibrium an implicit maximization problem. We say that this problem is implicit since it is defined in terms of its solution u*.
3.4.2 Normalized Equilibria

Kuhn-Tucker multipliers

Suppose that the coupled constraint (3.39) can be defined by a set of inequalities

h_k(u) ≥ 0,  k = 1, ..., p,    (3.43)

where the h_k : U_1 × ... × U_m → R, k = 1, ..., p, are given concave functions. Let us further assume that the payoff functions ψ_j(·) as well as the constraint functions h_k(·) are continuously differentiable and satisfy the constraint qualification conditions, so that Kuhn-Tucker multipliers exist for each of the implicit single-agent optimization problems defined below.

Assume that all players other than Player j use their equilibrium strategies u*_i, i ≠ j. The equilibrium conditions (3.37)-(3.39) then define for Player j a single-agent optimization problem with concave objective function and convex compact admissible set. Under the assumed constraint qualification there exists a vector of Kuhn-Tucker multipliers λ_j = (λ_{jk})_{k=1,...,p} such that the Lagrangean

L_j([u*^{-j}, u_j], λ_j) = ψ_j([u*^{-j}, u_j]) + Σ_{k=1}^p λ_{jk} h_k([u*^{-j}, u_j])    (3.44)

verifies, at the optimum,

0 = ∂/∂u_j L_j([u*^{-j}, u*_j], λ_j)    (3.45)
0 ≤ λ_j                                 (3.46)
0 = λ_{jk} h_k([u*^{-j}, u*_j]),  k = 1, ..., p.    (3.47)

Definition 3.4.4 We say that the equilibrium is normalized if the different multipliers λ_j, j ∈ M, are colinear with a common vector λ_0, namely

λ_j = (1/r_j) λ_0,    (3.48)

where r_j > 0 is a given weighting of the players.

Actually, this common multiplier λ_0 is associated with the implicit mathematical programming problem

max_{u ∈ U} θ(u*, u, r)    (3.49)

to which we associate the Lagrangean

L_0(u, λ_0) = Σ_{j ∈ M} r_j ψ_j([u*^{-j}, u_j]) + Σ_{k=1}^p λ_{0k} h_k(u)    (3.50)

and the first order necessary conditions

0 = ∂/∂u_j { r_j ψ_j(u*) + Σ_{k=1}^p λ_{0k} h_k(u*) },  j = 1, ..., m    (3.51)
0 ≤ λ_0    (3.52)-(3.53)
0 = λ_{0k} h_k(u*),  k = 1, ..., p.    (3.54)

An economic interpretation

In a mathematical programming framework, the multiplier can be interpreted as a marginal cost associated with the right-hand side of the constraint; more precisely, it indicates the sensitivity of the optimal solution to marginal changes in this right-hand side. The multiplier also permits a price decentralization, in the sense that, through an ad hoc pricing mechanism, the optimizing agent is induced to satisfy the constraints. In a game structure, the shadow-cost interpretation is not so apparent; however, the price decomposition principle is still valid. In a normalized equilibrium, the common multiplier permits an "implicit pricing" of the common constraint so that it remains compatible with the equilibrium structure. Once the common multiplier has been defined, the coupled constraint will be satisfied by equilibrium-seeking players when they use as payoffs the Lagrangeans

L_j([u*^{-j}, u_j], λ_j) = ψ_j([u*^{-j}, u_j]) + (1/r_j) Σ_{k=1}^p λ_{0k} h_k([u*^{-j}, u_j]),  j = 1, ..., m,

with the associated weighting r_j > 0. For this result to be useful, however, one needs uniqueness of the normalized equilibrium associated with a given weighting r_j > 0. In a mathematical programming framework, uniqueness of an optimum results from strict concavity of the objective function to be maximized. In a game structure, uniqueness of the equilibrium will result from a more stringent strict concavity requirement, called by Rosen strict diagonal concavity.

3.4.3 Uniqueness of Equilibrium

Let us consider the so-called pseudo-gradient, defined as the vector

g(u, r) = ( r_1 ∂ψ_1(u)/∂u_1, r_2 ∂ψ_2(u)/∂u_2, ..., r_m ∂ψ_m(u)/∂u_m )^T.    (3.55)
We notice that this expression is composed of the partial gradients of the different payoffs with respect to the decision variables of the corresponding players. We also consider the function

σ(u, r) = Σ_{j=1}^m r_j ψ_j(u),    (3.56)

with r > 0.

Definition 3.4.5 The function σ(u, r) is diagonally strictly concave on U if, for every u^1 and u^2 in U, the following holds:

(u^2 - u^1)^T g(u^1, r) + (u^1 - u^2)^T g(u^2, r) > 0.    (3.57)

A sufficient condition for σ(u, r) to be diagonally strictly concave is that the symmetric matrix [G(u, r) + G(u, r)^T] be negative definite for every u in U, where G(u, r) is the Jacobian of g(u, r) with respect to u.

Theorem 3.4.2 If σ(u, r) is diagonally strictly concave on the convex set U, with the assumptions insuring the existence of Kuhn-Tucker multipliers, then for every r > 0 there exists a unique normalized equilibrium.

Proof: We sketch below the proof given by Rosen [Rosen, 1965]. Assume that for some r > 0 we have two equilibria u^1 and u^2. Then we must have

h(u^1) ≥ 0    (3.58)
h(u^2) ≥ 0    (3.59)

and there exist multipliers λ^1 ≥ 0, λ^2 ≥ 0 such that

λ^{1T} h(u^1) = 0    (3.60)
λ^{2T} h(u^2) = 0    (3.61)

and for which the following holds true for each player j ∈ M:

r_j ∇_{u_j} ψ_j(u^1) + Σ_k λ^1_k ∇_{u_j} h_k(u^1) = 0    (3.62)
r_j ∇_{u_j} ψ_j(u^2) + Σ_k λ^2_k ∇_{u_j} h_k(u^2) = 0.   (3.63)

We multiply (3.62) by (u^2 - u^1)^T and (3.63) by (u^1 - u^2)^T and sum together, to obtain an expression β + γ = 0, where, due to the concavity of the h_k and the conditions (3.58)-(3.61),

γ = Σ_{j ∈ M} Σ_{k=1}^p { λ^1_k (u^2 - u^1)^T ∇_{u_j} h_k(u^1) + λ^2_k (u^1 - u^2)^T ∇_{u_j} h_k(u^2) }
  ≥ λ^{1T} [h(u^2) - h(u^1)] + λ^{2T} [h(u^1) - h(u^2)]
  = λ^{1T} h(u^2) + λ^{2T} h(u^1) ≥ 0    (3.64)
and

β = Σ_{j ∈ M} r_j [ (u^2 - u^1)^T ∇_{u_j} ψ_j(u^1) + (u^1 - u^2)^T ∇_{u_j} ψ_j(u^2) ].    (3.65)

Since σ(u, r) is diagonally strictly concave, we have β > 0, which contradicts β + γ = 0. QED

3.4.4 A numerical technique

The diagonal strict concavity property that yielded the uniqueness result of Theorem 3.4.2 also provides an interesting extension of the gradient method for the computation of the equilibrium. The basic idea is to "project", at each step, the pseudo-gradient g(u, r) on the constraint set U = {u : h(u) ≥ 0} (let us call ḡ(u, r) this projection) and to proceed through the usual steepest ascent step

u^{ℓ+1} = u^ℓ + τ_ℓ ḡ(u^ℓ, r).

Rosen shows that at each step the step size τ_ℓ > 0 can be chosen small enough to obtain a reduction of the norm of the projected gradient. This yields convergence of the procedure toward the unique equilibrium.

3.4.5 A variational inequality formulation

Theorem 3.4.3 Under Assumption 3.5.1 below, the vector q* = (q*_1, ..., q*_m) is a Nash-Cournot equilibrium if and only if it satisfies the variational inequality

(q - q*)^T g(q*) ≤ 0  for all admissible q,    (3.66)

where g(q*) is the pseudo-gradient at q* with weighting 1.

Proof: Apply the first order necessary and sufficient optimality conditions for each player and aggregate them to obtain (3.66). QED

Remark 3.4.3 The diagonal strict concavity assumption is then equivalent to the property of strong monotonicity of the operator g in the parlance of variational inequality theory.
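As an illustration of the numerical technique of section 3.4.4 (our own toy example, not from the text): for a duopoly with payoffs ψ_j(q) = q_j(10 - q_1 - q_2) - q_j, the pseudo-gradient with r = (1, 1) has components g_j(q) = 9 - 2 q_j - q_k, so G + G^T = [[-4, -2], [-2, -4]] is negative definite, the equilibrium q* = (3, 3) is unique, and the projected steepest-ascent iteration converges to it:

```python
def g(q):
    """Pseudo-gradient of the duopoly with inverse demand 10 - Q and
    unit cost 1: g_j(q) = d/dq_j [ q_j * (10 - q_1 - q_2) - q_j ]."""
    Q = q[0] + q[1]
    return [10 - Q - q[0] - 1, 10 - Q - q[1] - 1]

q = [0.0, 0.0]                   # start at the origin
for _ in range(2000):
    grad = g(q)
    # steepest ascent with a small fixed step, projected on q_j >= 0
    q = [max(0.0, qj + 0.05 * gj) for qj, gj in zip(q, grad)]

print(q)   # close to the unique normalized equilibrium (3, 3)
```

Here the constraints q_j ≥ 0 are not coupled, so the "projection" is just a componentwise clipping; a coupled constraint such as q_1 + q_2 ≤ c would require a genuine projection onto U at each step.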
3.5 Cournot equilibrium

The model proposed in 1838 by [Cournot, 1838] is still one of the most frequently used game models in economic theory. It represents the competition between different firms supplying a market for a divisible good.

3.5.1 The static Cournot model

Let us first of all recall the basic Cournot oligopoly model. We consider a single market on which m firms are competing. The market is characterized by its (inverse) demand law p = D(Q), where p is the market clearing price and Q = Σ_{j=1,...,m} q_j is the total supply of goods on the market. Firm j faces a cost of production C_j(q_j). Letting q = (q_1, ..., q_m) represent the production decision vector of the m firms together, the profit of firm j is

π_j(q) = q_j D(Q) - C_j(q_j).

If one assumes that each firm j chooses a supply level q_j ∈ [0, q̄_j], this model satisfies the definition of a concave game à la Rosen (see Exercise 3.7); hence there exists an equilibrium.

The following assumptions are placed on the model.

Assumption 3.5.1 The market demand and the firms' cost functions satisfy the following:

(i) The inverse demand function is finite valued, nonnegative and defined for all Q ∈ [0, ∞). It is also twice continuously differentiable, with D'(Q) < 0 wherever D(Q) > 0. In addition, D(0) > 0.

(ii) C_j(q_j) is defined for all q_j ∈ [0, ∞), nonnegative, convex, twice continuously differentiable, and C'_j(q_j) > 0.

(iii) Q D(Q) is bounded and strictly concave for all Q such that D(Q) > 0.

Let us consider the uniqueness issue in the duopoly case. Consider the pseudo-gradient (there is no need of a weighting (r_1, r_2) since the constraints are not coupled)

g(q_1, q_2) = ( D(Q) + q_1 D'(Q) - C'_1(q_1),  D(Q) + q_2 D'(Q) - C'_2(q_2) )

and the Jacobian matrix

G(q_1, q_2) = | 2D'(Q) + q_1 D''(Q) - C''_1(q_1)    D'(Q) + q_1 D''(Q)               |
              | D'(Q) + q_2 D''(Q)                  2D'(Q) + q_2 D''(Q) - C''_2(q_2) |.

The negative definiteness of the symmetric matrix

(1/2) [G(q_1, q_2) + G(q_1, q_2)^T] = | 2D'(Q) + q_1 D''(Q) - C''_1(q_1)    D'(Q) + (Q/2) D''(Q)               |
                                      | D'(Q) + (Q/2) D''(Q)                2D'(Q) + q_2 D''(Q) - C''_2(q_2)   |

implies uniqueness of the equilibrium.

3.5.2 Formulation of a Cournot equilibrium as a nonlinear complementarity problem

We have seen, when we dealt with Nash equilibria in bimatrix games, that a solution could be found through the solution of a linear complementarity problem. A similar formulation holds for a Cournot equilibrium; it will in general lead to a nonlinear complementarity problem, or NLCP, a class of problems that has recently been the object of considerable developments in the field of Mathematical Programming (see e.g. the book [Ferris & Pang, 1997a] and the survey paper [Ferris & Pang, 1997b]). We can rewrite the equilibrium condition as an NLCP in canonical form:

q_j ≥ 0            (3.67)
f_j(q) ≥ 0         (3.68)
q_j f_j(q) = 0,    (3.69)

where

f_j(q) = -D(Q) - q_j D'(Q) + C'_j(q_j).    (3.70)

The relations (3.67)-(3.69) express that, at a Cournot equilibrium, for each firm j either q_j = 0 and the gradient g_j(q) is ≤ 0, or q_j > 0 and g_j(q) = 0.

There are now optimization softwares that solve these types of problems. In [Rutherford, 1995] an extension of the GAMS (see [Brooke et al., 1992]) modeling software to provide an ability to solve this type of complementarity problem is described. More recently, the modeling language AMPL (see [Fourer et al., 1993]) has also been adapted to handle a problem of this type and to submit it to an efficient NLCP solver like PATH (see [Ferris & Munson, 1994]).

Example 3.5.1 This example comes from the AMPL website http://www.ampl.com/ampl/TRYAMPL/ Consider an oligopoly with n = 10 firms. The production cost function of firm i is given by

c_i(q_i) = c_i q_i + (β_i/(1 + β_i)) L_i^{1/β_i} q_i^{(1+β_i)/β_i}.    (3.71)

The market demand function is defined by

D(Q) = (A/Q)^{1/γ}.    (3.72)

Therefore

D'(Q) = (1/γ) (A/Q)^{1/γ - 1} (-A/Q^2) = -(1/γ) D(Q)/Q.    (3.73)

The Nash-Cournot equilibrium will be the solution of the NLCP (3.67)-(3.69) with

f_i(q) = -(A/Q)^{1/γ} + (1/γ) q_i D(Q)/Q + c_i + (L_i q_i)^{1/β_i}.
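For intuition about conditions (3.67)-(3.69), here is a small self-contained Python sketch (our own toy data, not the 10-firm example): a duopoly with linear inverse demand D(Q) = 10 - Q and costs C_j(q_j) = c_j q_j. Iterating the best-response map converges to the Nash-Cournot equilibrium, at which f_j(q) = -D(Q) - q_j D'(Q) + C'_j(q_j) satisfies the complementarity conditions:

```python
c = [1.0, 2.5]                       # unit costs (illustrative values)

def best_response(q_other, cj):
    """argmax over q >= 0 of q*(10 - q - q_other) - cj*q."""
    return max(0.0, (10.0 - q_other - cj) / 2.0)

q = [1.0, 1.0]
for _ in range(200):
    q = [best_response(q[1], c[0]), best_response(q[0], c[1])]

Q = q[0] + q[1]
f = [-(10.0 - Q) - q[j] * (-1.0) + c[j] for j in range(2)]  # f_j(q), D'(Q) = -1

print(q)                             # approx [3.5, 2.0]
print(f)                             # approx [0.0, 0.0]: q_j > 0 forces f_j = 0
```

Both outputs match the analytic solution q*_j = (10 - 2 c_j + c_k)/3 of the linear duopoly; since both outputs are interior (q_j > 0), complementarity forces f_j(q*) = 0 exactly.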
The Nash-Cournot equilibrium will be the solution of the NLCP (3.67)-(3.69) where

fi(q) = -D(Q) - qi D'(Q) + Ci'(qi)
      = -(A/Q)^{1/gamma} + (qi/gamma) D(Q)/Q + ci + (Li qi)^{1/beta_i}.

AMPL input:

%%%%%%%FILE nash.mod%%%%%%%%%%
# ==> nash.gms
# A noncooperative game example: a Nash equilibrium is sought.
#
# References:
# F.H. Murphy, H.D. Sherali, and A.L. Soyster, "A Mathematical
# Programming Approach for Determining Oligopolistic Market
# Equilibrium", Mathematical Programming 24 (1982), pp. 92-106.
# P.T. Harker, "Accelerating the convergence ...", Mathematical
# Programming 41 (1988), pp. 29-59.

set Rn := 1 .. 10;
set initpoint := 1 .. 4;

param gamma := 1.2;
param c {i in Rn};
param beta {i in Rn};
param L {i in Rn} := 10;
param initval {Rn, initpoint} >= 0;

var q {i in Rn} >= 0;              # production vector
var Q = sum {i in Rn} q[i];
var divQ = (5000/Q)**(1/gamma);

s.t. feas {i in Rn}:
    q[i] >= 0 complements
    0 <= c[i] + (L[i] * q[i])**(1/beta[i]) - divQ
         - q[i] * (1/gamma) * divQ / Q;
%%%%%%%FILE nash.dat%%%%%%%%%%

param c :=
  1 5    2 3    3 8    4 5    5 1
  6 3    7 7    8 4    9 6   10 3 ;

param beta :=
  1 1.2  2 1    3 .9   4 .6   5 1.5
  6 1    7 .7   8 1.1  9 .95 10 .75 ;

param initval :  1   2    3    4 :=
   1             1  10   1.0   7
   2             1  10   1.2   4
   3             1  10   1.4   3
   4             1  10   1.6   1
   5             1  10   1.8  18
   6             1  10   2.1   4
   7             1  10   2.3   1
   8             1  10   2.5   6
   9             1  10   2.7   3
  10             1  10   2.9   2 ;

%%%%%%%FILE nash.run%%%%%%%%%%
model nash.mod;
data nash.dat;

for {point in initpoint}
{   let {i in Rn} q[i] := initval[i, point];
    solve;
    include compchk;
}

3.5.3 Computing the solution of a classical Cournot model

There are many approaches to find a solution to a static Cournot game. We have described above the one based on a solution of an NLCP. We list below a few other alternatives:

(a) one discretizes the decision space and uses the complementarity algorithm proposed in [Lemke & Howson 1964] to compute a solution to the associated bimatrix game;

(b) one exploits the fact that the players are uniquely coupled through the condition that Q = sum_{j=1,...,m} qj, and a sort of primal-dual search technique is used [Murphy, Sherali & Soyster, 1982];

(c) one formulates the Nash-Cournot equilibrium problem as a variational inequality; Assumption 3.5.1 implies Strict Diagonal Concavity (SDC) in the sense of Rosen, i.e. strict monotonicity of the pseudo-gradient operator, hence uniqueness and convergence of various gradient-like algorithms.
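Alternative (b) can be sketched as follows, again on a hypothetical linear-demand duopoly (D(Q) = 100 - Q, Cj(q) = q^2) rather than the text's model: since the firms interact only through total supply, the condition fj(q) = 0 yields qj(Q) = max(0, (100 - Q)/3), and one searches by bisection for the consistent value of Q, i.e. the one with sum_j qj(Q) = Q.

```python
# Sketch of alternative (b) (illustrative model, not from the text): the
# firms are coupled only through total supply Q. For D(Q) = 100 - Q and
# C_j(q) = q^2, firm j's condition f_j = 0 given Q yields
# q_j(Q) = max(0, (100 - Q) / 3); we then bisect on the excess supply.

def firm_supply(Q):
    return max(0.0, (100.0 - Q) / 3.0)

def excess(Q, m=2):
    return m * firm_supply(Q) - Q      # strictly decreasing in Q

def solve_total_supply(lo=0.0, hi=100.0, tol=1e-10):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

Q_total = solve_total_supply()
# Consistency: Q_total = 40 and each firm supplies (100 - 40)/3 = 20,
# the same equilibrium as found by the other methods.
```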
3.6 Correlated equilibria

Aumann [Aumann, 1974] has proposed a mechanism that permits the players to enter into preplay arrangements so that their strategy choices, instead of being independently chosen, could be correlated while keeping an equilibrium property. He called this type of solution a correlated equilibrium.

3.6.1 Example of a game with correlated equilibria

Example 3.6.1 This example has been initially proposed by Aumann [Aumann, 1974]. Consider a simple bimatrix game defined as follows:

         c1      c2
r1     5, 1    0, 0
r2     4, 4    1, 5

This game has two pure strategy equilibria, (r1, c1) and (r2, c2), and a mixed strategy equilibrium where each player puts the same probability 0.5 on each possible pure strategy. The respective outcomes are shown in Table 3.2.

(r1, c1) : 5, 1      (r2, c2) : 1, 5      (0.5-0.5, 0.5-0.5) : 2.5, 2.5

Table 3.2: Outcomes of the three equilibria

Now, if the players agree to play by jointly observing a "coin flip" and playing (r1, c1) if the result is "head" and (r2, c2) if it is "tail", then they expect the outcome 3, 3, which is a convex combination of the two pure equilibrium outcomes. In Figure 3.2 we have represented these different outcomes. The three full circles represent the three Nash equilibrium outcomes. The triangle defined by these three points is the convex hull of the Nash equilibrium outcomes. The empty circle represents the outcome obtained by agreeing to play according to the coin flip mechanism. By deciding to have a coin flip deciding on the equilibrium to play, a new type of equilibrium has been introduced that yields an outcome located in the convex hull of the set of Nash equilibrium outcomes of the bimatrix game.

[Figure 3.2: The convex hull of Nash equilibria: the outcomes (5,1), (1,5) and (2.5,2.5), with the coin-flip outcome (3,3) shown as an empty circle.]
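The equilibrium outcomes listed in Table 3.2 and the coin-flip outcome can be checked mechanically; a minimal sketch:

```python
# Verification sketch for the bimatrix game of Example 3.6.1:
# payoffs indexed by (row, column).
A = [[5, 0], [4, 1]]   # Player 1's payoffs
B = [[1, 0], [4, 5]]   # Player 2's payoffs

def is_pure_nash(r, c):
    """(r, c) is a Nash equilibrium if no unilateral deviation pays."""
    best_row = all(A[r][c] >= A[rr][c] for rr in range(2))
    best_col = all(B[r][c] >= B[r][cc] for cc in range(2))
    return best_row and best_col

pure_eqs = [(r, c) for r in range(2) for c in range(2) if is_pure_nash(r, c)]
# pure_eqs == [(0, 0), (1, 1)], i.e. (r1, c1) and (r2, c2)

# Coin flip: play (r1, c1) on "head" and (r2, c2) on "tail", each with
# probability 1/2; the expected outcome is the midpoint of the two.
coin_flip = (0.5 * (A[0][0] + A[1][1]), 0.5 * (B[0][0] + B[1][1]))
# coin_flip == (3.0, 3.0)
```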
It is easy to see that this way to play the game defines an equilibrium for the extensive game shown in Figure 3.3. This is an expanded version of the initial game where, in a preliminary stage, "Nature" decides randomly the signal that will be observed by the players. The result of the "coin flip" is a "public information", in the sense that it is shared by all the players.

[Figure 3.3: Extensive game formulation with "Nature" playing first. Dotted lines represent information sets of Player 2.]

We now push the example one step further by assuming that the players agree to play according to the following mechanism: a random device selects one cell in the game matrix with the probabilities shown below,

        c1     c2
r1     1/3      0
r2     1/3    1/3        (3.74)

When a cell is selected, then each player is told to play the corresponding pure strategy. The trick is that a player is told what to play but is not told what the recommendation to the other player is. The information received by each player is not "public" anymore. More precisely, the three possible signals are (r1, c1), (r2, c1) and (r2, c2). One can easily check that the correlated equilibrium defined in this way is a Nash equilibrium for the expanded game.
When Player 1 receives the signal "play r1" he knows that with a probability 1 the other player has been told "play c1". When Player 1 receives the signal "play r2" he knows that with a probability 1/2 the other player has been told "play c1" and with a probability 1/2 the other player has been told "play c2". From a game played in this way each player expects

(1/3) x 5 + (1/3) x 1 + (1/3) x 4 = 3 + 1/3.

Consider now what Player 1 can do, if he assumes that the other player plays according to the recommendation. If Player 1 has been told "play r1" and if he plays so he expects 5, whereas if he plays r2 he expects 4; so he is better off with the suggested play. If Player 1 has been told "play r2" and if he plays so he expects

(1/2) x 4 + (1/2) x 1 = 2.5;

if he plays r1 instead he expects

(1/2) x 5 + (1/2) x 0 = 2.5,

so he cannot improve his expected reward. So, for Player 1, obeying to the recommendation is the best reply to Player 2's behavior when Player 2 himself plays according to the suggestion of the signalling scheme.

Now we can repeat the verification for Player 2. If he has been told "play c1" he expects

(1/2) x 1 + (1/2) x 4 = 2.5;

if he plays c2 instead he expects

(1/2) x 0 + (1/2) x 5 = 2.5,

so he cannot improve. If he has been told "play c2" he expects 5, whereas if he plays c1 instead he expects 4, so he is better off with the suggested play. All in all we have checked that an equilibrium property holds for this way of playing the game.

In the above example we have seen that, by expanding the game via the adjunction of a first stage where "Nature" plays and gives information to the players, a new class of equilibria can be reached that dominate, in the outcome space, some of the original Nash equilibria. In summary: if the random device gives an information which is common to all players, then it permits a mixing of the different pure strategy Nash equilibria and the outcome is in the convex hull of the Nash equilibrium outcomes; if the random device gives an information which may be different from one player to the other,
then the correlated equilibrium can have an outcome which lies outside of the convex hull of the Nash equilibrium outcomes. Aumann called this way of playing a "correlated equilibrium". Indeed we can now mix these equilibria and still keep the correlated equilibrium property; we can also construct an expanded game in extensive form for which the correlated equilibrium constructed as above defines a Nash equilibrium (see Exercise 3.6). This is illustrated in Figure 3.4, where, as indicated by the dotted lines, the triangle is the convex hull of the Nash equilibrium outcomes and the black spade shows the expected outcome of this mode of play.
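The obedience checks carried out above are instances of the general inequalities of Definition 3.6.2 below, with deviation maps applied to each player's own recommendation. A small sketch of that general test, applied to the device (3.74):

```python
# Sketch: test of the correlated-equilibrium inequalities for a bimatrix
# game. pi is a distribution over cells; no deviation map applied by a
# player to his own recommendation may yield a positive expected gain.
from itertools import product

def is_correlated_eq(A, B, pi, tol=1e-9):
    n, m = len(A), len(A[0])
    for delta in product(range(n), repeat=n):      # Player 1's deviation maps
        gain = sum(pi[r][c] * (A[delta[r]][c] - A[r][c])
                   for r in range(n) for c in range(m))
        if gain > tol:
            return False
    for delta in product(range(m), repeat=m):      # Player 2's deviation maps
        gain = sum(pi[r][c] * (B[r][delta[c]] - B[r][c])
                   for r in range(n) for c in range(m))
        if gain > tol:
            return False
    return True

A = [[5, 0], [4, 1]]              # Player 1's payoffs in the example
B = [[1, 0], [4, 5]]              # Player 2's payoffs
pi = [[1/3, 0.0], [1/3, 1/3]]     # the random device (3.74)
value = sum(pi[r][c] * A[r][c] for r in range(2) for c in range(2))
# is_correlated_eq(A, B, pi) is True, and value = 10/3 for each player.
```

For contrast, a distribution putting all mass on the non-equilibrium cell (r1, c2) fails the test.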
[Figure 3.4: The dominating correlated equilibrium: the Nash outcomes (5,1), (1,5), (2.5,2.5), the coin-flip outcome (3,3) and the correlated equilibrium outcome (3.33, 3.33) marked by a black spade.]

3.6.2 A general definition of correlated equilibria

Let us give a general definition of a correlated equilibrium in an m-player normal form game. We shall actually give two definitions. The first one describes the construct of an expanded game with a random device distributing some preplay information to the players. The second definition, which is valid for m-matrix games, is much simpler although equivalent.

Nash equilibrium in an expanded game: Assume that a game is described in normal form, with m players j = 1, 2, ..., m, their respective strategy sets Gamma_j and payoffs Vj(gamma_1, ..., gamma_j, ..., gamma_m). This will be called the original normal form game. Assume the players may enter into a phase of preplay communication during which they design a correlation device that will provide randomly a signal called proposed mode of play. Let E = {1, ..., L} be the finite set of the possible modes of play. The correlation device will propose with probability lambda(l) the mode of play l. The device will then give the different players some information about the proposed mode of play. More precisely, let Hj be a class of subsets of E, called the information structure of player j. Player j, when the mode of play l has been selected, receives an information denoted hj(l) in Hj. Now, on the basis of the information received, we associate with each player j a meta-strategy, denoted gammatilde_j : Hj -> Gamma_j, that determines a strategy for the original normal form game. All this construct is summarized by the data

(E, {lambda(l)}_{l in E}, {hj(l) in Hj}_{j in M, l in E}, {gammatilde_j : Hj -> Gamma_j}_{j in M})

that defines an expanded game.

Definition 3.6.1 The data

(E, {lambda(l)}_{l in E}, {hj(l) in Hj}_{j in M, l in E}, {gammatilde*_j : Hj -> Gamma_j}_{j in M})
i. γM −j (hM −j ( ))] ≥ γ∗ ˜∗ ∈E λ( )Vj ([˜j (hj ( )). 3.. corresponding to the two possible . In this section we look at a particular class of games with incomplete information and we explore the class of socalled Bayesian equilibria. their payoff functions. which is much simpler Deﬁnition 3. by playing γj (hj ( )) instead of γj (hj ( )) ˜ ˜∗ when he receives the signal hj ( ) ∈ Hj .. in a twoplayer game. for every player j and any mapping δj : Sj → Sj the following holds π(s)Vj ([sj . . sM −j ]) ≥ s∈S s∈S π(s)Vj ([δ(sj ).e. × Sm such that. sM −j ]). Player 2 may not know exactly what the payoff function of Player 1 is.76) 3. We deﬁne below two matrix games. γ ˜∗ (3. some players do not know exactly what are the characteristics of other players. Example 3.e.2 A correlated equilibrium is a probability distribution π(s) over the set of pure strategies S = S1 × S2 . their strategy sets) etc. ∈E λ( )Vj ([˜j (hj ( )).7. γM −j (hM −j ( ))]. We were dealing with games of complete information. called θ1 and θ2 respectively. .75) An equivalent deﬁnition for mmatrix games In the case of an mmatrix game.60 CHAPTER 3. if no player can improve his expected payoff by changing unilaterally his meta strategy.1 Consider the case where Player 1 could be of two types.6. For example.7 Bayesian equilibrium with incomplete information Up to now we have considered only games where each player knows everything concerning the rules. the players types (i.7. (3. the deﬁnition given above can be replaced with the following one.e.1 Example of a game with unknown type for a player In a game of incomplete information. SOLUTION CONCEPTS FOR NONCOOPERATIVE GAMES deﬁnes a correlated equilibrium of the original normal form game if it is a Nash equilibrium for the expanded game i.
3.7.1 Example of a game with unknown type for a player

Example 3.7.1 Consider the case where Player 1 could be of two types, called theta_1 and theta_2 respectively. We define below two matrix games, corresponding to the two possible types:

theta_1 :     c1        c2
   r1       0, -1      2, 0
   r2       2, 1       3, 0
            Game 1

theta_2 :     c1        c2
   r1      1.5, -1    3.5, 0
   r2       2, 1       3, 0
            Game 2

If Player 1 is of type theta_1, then the bimatrix game 1 is played; if Player 1 is of type theta_2, then the bimatrix game 2 is played. The problem is that Player 2 does not know the type of Player 1; indeed, in that game of simultaneous moves, Player 2 does not observe the action chosen by Player 1 either.

3.7.2 Reformulation as a game with imperfect information

Harsanyi, in [Harsanyi, 1967-68], has proposed a transformation of a game of incomplete information into a game with imperfect information. For Example 3.7.1 this transformation introduces a preliminary chance move, played by Nature, which decides randomly the type theta_i of Player 1. The probability distribution of each type, denoted p1 and p2 respectively and given here in terms of prior probabilities, represents the beliefs of Player 2 about facing a player of type theta_1 or theta_2. One assumes that Player 1 knows also these beliefs; the prior probabilities are thus common knowledge. The information structure in the associated extensive game shown in Figure 3.5 indicates that Player 1 knows his type when deciding, whereas Player 2 observes neither the type nor the action of Player 1; the dotted line in Figure 3.5 represents the information set of Player 2.

We call a^theta_{ij} and b^theta_{ij} the payoffs of Player 1 and Player 2 respectively when the type is theta and the pure strategy pair (r_i, c_j) is played. Call x_i (resp. 1 - x_i) the probability of choosing r1 (resp. r2) by Player 1 when he implements a mixed strategy, knowing that he is of type theta_i, i = 1, 2. Call y (resp. 1 - y) the probability of choosing c1 (resp. c2) by Player 2 when he implements a mixed strategy. We can define the optimal response of Player 2 to the pair of mixed strategies (x_i, 1 - x_i), i = 1, 2, of Player 1 by solving

max_{j=1,2}  p1 (x1 b^{theta_1}_{1j} + (1 - x1) b^{theta_1}_{2j}) + p2 (x2 b^{theta_2}_{1j} + (1 - x2) b^{theta_2}_{2j}).

We can define the optimal response of Player 1 to the mixed strategy (y,
1 - y) of Player 2 by solving, for each possible type theta,

max_{i=1,2}  a^theta_{i1} y + a^theta_{i2} (1 - y).
[Figure 3.5: Extensive game formulation of the game of incomplete information. Nature first draws the type (theta_1 with probability p1, theta_2 with probability p2); the dotted lines represent the information set of Player 2.]

Let us rewrite these conditions with the data of the game illustrated in Figure 3.5. First consider the reaction function of Player 1:

theta_1 => max{0 y + 2(1 - y), 2y + 3(1 - y)},
theta_2 => max{1.5 y + 3.5(1 - y), 2y + 3(1 - y)}.

We draw in Figure 3.6 the lines corresponding to these comparisons between two linear functions. We observe that, if Player 1's type is theta_1, he will always choose action r2, i.e. x1 = 0, whatever y; whereas, if his type is theta_2, he will choose r1 if y is small enough and switch to r2 when y > 0.5. For y = 0.5, the best reply of Player 1 of type theta_2 could be any mixed strategy (x, 1 - x).

Consider now the optimal reply of Player 2. We know that Player 1, if he is of type theta_1, will always choose r2. So the best reply conditions for Player 2 can be written as follows:

max{ p1 (1) + (1 - p1)(x2 (-1) + (1 - x2) 1),  p1 (0) + (1 - p1)(x2 (0) + (1 - x2) 0) },

which boils down to

max{ 1 - 2(1 - p1) x2,  0 }.

We conclude that Player 2 chooses action c1 if x2 < 1/(2(1 - p1)), action c2 if x2 > 1/(2(1 - p1)), and any mixed action with y in [0, 1] if x2 = 1/(2(1 - p1)).
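The type-contingent comparisons above can be checked numerically; this is a sketch using the example's stage payoffs as reconstructed here:

```python
# Sketch: Player 1's reaction to y = Prob(Player 2 plays c1), per type.
def payoff_r1(theta, y):
    return {1: 0.0 * y + 2.0 * (1 - y),
            2: 1.5 * y + 3.5 * (1 - y)}[theta]

def payoff_r2(theta, y):
    return 2.0 * y + 3.0 * (1 - y)

def best_row(theta, y):
    d = payoff_r1(theta, y) - payoff_r2(theta, y)
    return 'r1' if d > 0 else ('r2' if d < 0 else 'either')

# Type theta_1 always prefers r2 (so x1 = 0);
# type theta_2 switches from r1 to r2 at y = 0.5.
```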
We conclude easily from these observations that the equilibria of this game are characterized as follows:

- x1 = 0, whatever the value of p1;
- if p1 <= 0.5: either y = 1 and x2 = 0, or y = 0 and x2 = 1, or y = 0.5 and x2 = 1/(2(1 - p1));
- if p1 > 0.5: y = 1 and x2 = 0.

[Figure 3.6: Optimal reaction of Player 1, of type theta_1 and of type theta_2, to Player 2's mixed strategy (y, 1 - y).]

3.7.3 A general definition of Bayesian equilibria

We can generalize the analysis performed on the previous example and introduce the following definitions. Let M be a set of m players. Each player j in M may be of different possible types. Let Theta_j be the finite set of types for Player j. Whatever his type, Player j has the same set of pure strategies Sj. If theta = (theta_1, ..., theta_m) in Theta = Theta_1 x ... x Theta_m is a type specification for every player, then the normal form of the game is specified by the payoff functions

uj(theta, ., ..., .) : S1 x ... x Sm -> R,  j in M.    (3.77)

A prior probability distribution p(theta_1, ..., theta_m) on Theta is given as common knowledge. We assume that all the marginal probabilities are nonzero, pj(theta_j) > 0, for all j in M. Player j can observe only his own type theta_j in Theta_j. When Player j observes that he is of type theta_j, he can construct his revised conditional probability distribution on the types theta_{M-j} of the other players through the Bayes formula

p(theta_{M-j} | theta_j) = p([theta_j, theta_{M-j}]) / pj(theta_j).    (3.78)

We can now introduce an expanded game where Nature draws randomly, according to the prior probability distribution p(.), a type vector theta in Theta for all players. Then each player j in M picks a strategy in his own strategy set Sj. The outcome is then defined by the payoff functions (3.77).
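The expanded-game construction can be sketched by brute force: enumerate the maps gamma_j : Theta_j -> Sj and keep the profiles satisfying the Nash condition. The tiny two-player game below, with Player 1 privately informed and Player 2 uninformed, uses invented types, prior and payoffs chosen only for illustration:

```python
# Brute-force sketch of a Bayesian (expanded-game) equilibrium on a tiny
# hypothetical game: all names and payoffs here are illustrative assumptions.
from itertools import product

types = {1: ['t1', 't2'], 2: ['s']}          # Player 2 has a single type
actions = {1: [0, 1], 2: [0, 1]}
prior = {('t1', 's'): 0.5, ('t2', 's'): 0.5}

def u(j, theta, a):
    # Player 1 wants to match Player 2's action iff his type is 't1';
    # Player 2 always wants to match Player 1.
    match = (a[0] == a[1])
    if j == 1:
        return 1.0 if match == (theta[0] == 't1') else 0.0
    return 1.0 if match else 0.0

# A pure strategy of player j is a map from his types to actions.
strategies = {j: list(product(actions[j], repeat=len(types[j])))
              for j in (1, 2)}

def V(j, g1, g2):
    total = 0.0
    for (t1, t2), p in prior.items():
        a = (g1[types[1].index(t1)], g2[types[2].index(t2)])
        total += p * u(j, (t1, t2), a)
    return total

bayes_eqs = [(g1, g2) for g1 in strategies[1] for g2 in strategies[2]
             if all(V(1, g1, g2) >= V(1, d, g2) for d in strategies[1])
             and all(V(2, g1, g2) >= V(2, g1, d) for d in strategies[2])]
# Two equilibria: Player 1 separates on his type, Player 2 picks one action.
```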
The expanded game can be described in its normal form as follows. Each player j in M has a strategy set Gamma_j where a strategy is defined as a mapping gamma_j : Theta_j -> Sj. Associated with a strategy profile gamma = (gamma_1, ..., gamma_m), the payoff to player j is given by

Vj(gamma) = sum_{theta_j in Theta_j} pj(theta_j) sum_{theta_{M-j} in Theta_{M-j}} p(theta_{M-j} | theta_j)
            uj([theta_j, theta_{M-j}], gamma_1(theta_1), ..., gamma_j(theta_j), ..., gamma_m(theta_m)),  j in M.    (3.79)

As usual, a Nash equilibrium is a strategy profile gamma* = (gamma*_1, ..., gamma*_m) in Gamma = Gamma_1 x ... x Gamma_m such that

Vj(gamma*) >= Vj(gamma_j, gamma*_{M-j})  for all gamma_j in Gamma_j.    (3.80)

Definition 3.7.1 In a game of incomplete information with m players having respective type sets Theta_j and pure strategy sets Sj, a Bayesian equilibrium is a Nash equilibrium in the "expanded game" in which each player's pure strategy gamma_j is a map from Theta_j to Sj.

It is easy to see that, since each pj(theta_j) is positive, the equilibrium conditions (3.80) lead to the following conditions:

gamma*_j(theta_j) = argmax_{sj in Sj} sum_{theta_{M-j} in Theta_{M-j}} p(theta_{M-j} | theta_j)
                    uj([theta_j, theta_{M-j}], [sj, gamma*_{M-j}(theta_{M-j})]),  j in M.    (3.81)

Remark 3.7.1 Since the sets Theta_j and Sj are finite, the set Gamma_j of mappings from Theta_j to Sj is also finite. Therefore the expanded game is an m-matrix game and, by Nash's theorem, there exists at least one mixed strategy equilibrium.

3.8 Appendix on Kakutani fixed-point theorem

Definition 3.8.1 Let Phi : R^m -> 2^{R^m} be a point-to-set mapping. We say that this mapping is upper semicontinuous if, whenever the sequence {x_k}_{k=1,2,...} converges in R^m toward x^0, with y_k in Phi(x_k), k = 1, 2, ..., then any accumulation point y^0 of the sequence {y_k}_{k=1,2,...} is such that y^0 in Phi(x^0).

Theorem 3.8.1 Let Phi : A -> 2^A be a point-to-set upper semicontinuous mapping, where A is a compact subset (i.e. a closed and bounded subset) of R^m. Then there exists a fixed point for Phi; that is, there is x* in A such that x* in Phi(x*).
3.9 Exercises
Exercise 3.1: Consider the matrix game

3    1    8
4   10    0
Assume that the players are in the simultaneous move information structure. Assume that the players try to guess and counterguess the optimal behavior of the opponent in order to determine their optimal strategy choice. Show that this leads to an unstable process.
Exercise 3.2: Find the value and the saddle-point mixed strategies for the above matrix game.
Exercise 3.3: Define the quadratic programming problem that will find a Nash equilibrium for the bimatrix game
(52, 50)∗   (44, 44)   (44, 41)
(42, 42)   (46, 49)∗   (39, 43)
Verify that the entries marked with a ∗ correspond to a solution of the associated quadratic programming problem.
Exercise 3.4: Do the same as above but now using the complementarity problem formulation.
Exercise 3.5: Consider the two-player game where the strategy sets are the intervals U1 = [0, 100] and U2 = [0, 100] respectively, and the payoffs are psi_1(u) = 25 u1 + 15 u1 u2 - 4 u1^2 and psi_2(u) = 100 u2 - 50 u1 - u1 u2 - 2 u2^2 respectively. Define the best reply mapping. Find an equilibrium point. Is it unique?
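As a sketch of how such an exercise can be explored numerically (assuming an interior solution; this does not replace the analytical argument the exercise asks for): the first-order conditions of psi_1 and psi_2 give linear best replies, and iterating them is a contraction here.

```python
# Numerical sketch for Exercise 3.5 (interior solution assumed): from
# psi_1 = 25 u1 + 15 u1 u2 - 4 u1^2 and psi_2 = 100 u2 - 50 u1 - u1 u2 - 2 u2^2,
# the first-order conditions give the best replies
#   u1 = (25 + 15 u2) / 8   and   u2 = (100 - u1) / 4,
# clipped to [0, 100]; their composition has slope -15/32, a contraction.

def clip(v, lo=0.0, hi=100.0):
    return min(hi, max(lo, v))

def br1(u2):
    return clip((25.0 + 15.0 * u2) / 8.0)

def br2(u1):
    return clip((100.0 - u1) / 4.0)

u1 = u2 = 0.0
for _ in range(200):
    u1 = br1(u2)
    u2 = br2(u1)
# Fixed point: u1 = 1600/47 and u2 = 775/47, both interior to [0, 100].
```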
Exercise 3.6: Consider the two-player game where the strategy sets are the intervals U1 = [10, 20] and U2 = [0, 15] respectively, and the payoffs are psi_1(u) = 40 u1 + 5 u1 u2 - 2 u1^2 and psi_2(u) = 50 u2 - 3 u1 u2 - 2 u2^2 respectively. Define the best reply mapping. Find an equilibrium point. Is it unique?
Exercise 3.7: Consider an oligopoly model. Show that Assumption 3.5.1 defines a concave game in the sense of Rosen.
Exercise 3.6: In Example 3.6.1 a correlated equilibrium has been constructed for the game

         c1      c2
r1     5, 1    0, 0
r2     4, 4    1, 5

Find the associated extensive form game for which the proposed correlated equilibrium corresponds to a Nash equilibrium.
Part II Repeated and sequential Games
Chapter 4

Repeated games and memory strategies

In this chapter we begin our analysis of "dynamic games". To say it in an imprecise but intuitive way, the dynamic structure of a game expresses a change over time of the conditions under which the game is played. In a repeated game this structural change over time is due to the accumulation of information about the "history" of the game. As time unfolds the information at the disposal of the players changes and, since strategies transform this information into actions, the players' strategic choices are affected. If a game is repeated twice, i.e. the same game is played in two successive stages and a reward is obtained at each stage, the players can decide, at the second stage, on the basis of the outcome of the first stage. The situation becomes more and more complex as the number of stages increases, since the players can base their decisions on histories represented by sequences of actions and outcomes observed over increasing numbers of stages. In summary, the important concept to understand here is that, even if it is the "same game" which is repeated over a number of periods, the global "repeated game" becomes a fully dynamic system with a much more complex strategic structure than the one-stage game.

We may also consider games that are played over an infinite number of stages. These infinitely repeated games are particularly interesting: one can easily see that, at each stage, there is always an infinite number of remaining stages over which the game will be played and, therefore, there will not be the so-called "end of horizon effect" where the fact that the game is ending up affects the players' behavior. The evaluation of strategies will then have to be based on the comparison of infinite streams of rewards, and we shall explore different ways to do that.

In this chapter we shall consider repeated bimatrix games and repeated concave m-player games. We shall explore, in particular, the class of equilibria when the number of periods over which the game is repeated (the horizon) becomes infinite. We will show how the class of equilibria in repeated games can be expanded very much if one
allows memory in the information structure. The class of memory strategies allows the players to incorporate threats in their strategic choices. A threat is meant to deter the opponents to enter into a detrimental behavior. In order to be effective a threat must be credible. We shall explore the credibility issue and show how the subgame perfectness property of equilibria introduces a refinement criterion that makes these threats credible.

Notice that, in a repeated game, the players' actions at a given stage have no direct influence on the normal form description of the games that will be played in the forthcoming periods. A more complex dynamic structure would obtain if one assumes that the players' actions at one stage influence the type of game to be played in the next stage; indeed this corresponds to an explicit dynamic structure. This is the case for the class of Markov or sequential games and, a fortiori, for the class of differential games, where the time flows continuously. These dynamic games will be discussed in subsequent chapters of this volume.

4.1 Repeating a game in normal form

Repeated games have also been called games in semi-extensive form in [Friedman 1977]: the game is described not in extensive form, but with a game in normal form defined at each period of time. Indeed the repetition of the same game in normal form permits the players to adapt their strategies to the observed history of the game. The payoff associated with a repeated game is usually defined as the sum of the payoffs obtained at each period. However, when the number of periods tends to infinity, a total payoff that implies an infinite sum may not converge to a finite value. There are several ways of dealing with the comparison of infinite streams of payoffs; we shall discover some of them in the subsequent examples.

4.1.1 Repeated bimatrix games

Figure 4.1 describes a repeated bimatrix game structure. The same matrix game is played repeatedly, over a number T of periods or stages that represent the passing of time,
In the repeated game. 4. . The payoffs are accumulated over time. the global strategy becomes a way to deﬁne the choice of a stage strategy at each period. A threat is meant to deter the opponents to enter into a detrimental behavior. ) is given j by (αk )j=1. A more complex dynamic structure would obtain if one assumes that the players actions at one stage inﬂuence the type of game to be played in the next stage. However. We shall discover some of them in the subsequent examples. where Player 1 has p pure strategies and Player 2 q pure strategies and the onestage payoff pair associated with the startegy pair (k.2 . Indeed this corresponds to an explicit dynamic structure.
We shall denote utilde = (u(t) : t = 0, 1, ..., T - 1) the actions sequence over the sequence of T periods. The total payoff of player j over the T-horizon is then defined by

Vj^T(utilde) = sum_{t=0}^{T-1} psi_j(u(t)).    (4.1)
Open-loop information structure: At period t, each player j knows only the current time t and what he has played in periods tau = 0, 1, ..., t - 1. He does not observe what the other players do. We implicitly assume that the players cannot use the rewards they receive at each period to infer, through e.g. an appropriate filtering, the actions of the other players. The open-loop information structure actually eliminates almost every aspect of the dynamic structure of the repeated game context.

Closed-loop information structure: At period t, each player j knows not only the current time t and what he has played in periods tau = 0, 1, ..., t - 1, but also what the other players have done at previous periods. We call history of the game at period tau the sequence

h(tau) = (u(t) : t = 0, 1, ..., tau - 1).

A closed-loop strategy for Player j is defined as a sequence of mappings gamma_j^tau : h(tau) -> Uj. Due to the repetition over time of the same game, each player j can adjust his choice of one-stage action uj to the history of the game. This permits the consideration of memory strategies where some threats are included in the announced strategies; for example, a player can declare that some particular histories would "trigger" a retaliation from his part. Many authors have studied equilibria obtained in repeated games through the use of trigger strategies (see for example [Radner, 1980], [Radner, 1981], [Friedman 1986]). The description of a trigger strategy will therefore include

- a nominal "mood" of play that contributes to the expected or desired outcomes;
- a retaliation "mood" of play that is used as a threat;
- the set of histories that trigger a switch from the nominal mood of play to the retaliation mood of play.

Infinite horizon games: If the game is repeated over an infinite number of periods, then payoffs are represented by infinite streams of rewards psi_j(u(t)), t = 1, ..., infinity.
4.1. REPEATING A GAME IN NORMAL FORM 73

Therefore the set of strategies with the memory information structure is much more encompassing than the open-loop one. We shall see that the equilibria based on the use of threats constitute a very large class.

Existence of a "trivial" dynamic equilibrium: We know, by Rosen's existence theorem, that a concave "static game" (M, U_j, ψ_j(·), j ∈ M) admits an equilibrium u*. It should be clear (see exercise 4.1) that the sequence ũ* = (u(t) ≡ u* : t = 0, 1, ..., T − 1) is also an equilibrium for the repeated game in both the open-loop and closed-loop (memory) information structures.

Discounted sums: We have assumed that the action sets U_j are compact; therefore, since the functions ψ_j(·) are continuous, the one-stage rewards ψ_j(u(t)) are uniformly bounded. Hence, if the players discount time, we can compare strategies on the basis of the discounted sums of the rewards they generate for each player. The discounted payoffs

V_j(ũ) = Σ_{t=0}^∞ β_j^t ψ_j(u(t)),    (4.2)

where β_j ∈ [0, 1) is the discount factor of Player j, are therefore well defined.

Average rewards per period: If the players do not discount time, the comparison of strategies implies the consideration of infinite streams of rewards that sum up to infinity. A way to circumvent the difficulty is to consider a limit average reward per period, defined in the following way:

g_j(ũ) = lim inf_{T→∞} (1/T) Σ_{t=0}^{T−1} ψ_j(u(t)).    (4.3)

To link this evaluation of infinite streams of rewards with the previous one, based on discounted sums, one may define the equivalent discounted constant reward

g_j^{β_j}(ũ) = (1 − β_j) Σ_{t=0}^∞ β_j^t ψ_j(u(t)).    (4.4)

One expects that, for well-behaved games, the following limiting property holds true:

lim_{β_j→1} g_j^{β_j}(ũ) = g_j(ũ).    (4.5)

In infinite horizon repeated games each player always has enough time to retaliate. He will always have the possibility to implement an announced threat if the opponents are not playing according to some nominal sequence corresponding to an agreement.
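Property (4.5) is easy to check numerically for a reward stream that eventually cycles: the limit average (4.3) is then the mean over one cycle, and the equivalent discounted constant reward (4.4) approaches this mean as the discount factor tends to 1. The short sketch below illustrates this; the cycle of rewards is an assumed toy example, not one from the text.

```python
# Sketch of the link between the limit average reward (4.3) and the
# equivalent discounted constant reward (4.4) for a periodic reward stream.
# The cycle of rewards below is an illustrative assumption.

def limit_average(cycle):
    """Limit average per period of the stream cycle[0], cycle[1], ... repeated."""
    return sum(cycle) / len(cycle)

def equivalent_discounted_constant(cycle, beta, horizon=100000):
    """(1 - beta) * sum_t beta^t * reward_t, truncated at a long horizon."""
    return (1 - beta) * sum(beta ** t * cycle[t % len(cycle)] for t in range(horizon))

cycle = [5.0, 1.0, 3.0]
g_avg = limit_average(cycle)                        # 3.0
g_beta = equivalent_discounted_constant(cycle, 0.999)
# As beta -> 1, g_beta approaches g_avg, illustrating property (4.5).
```

Running this with discount factors closer and closer to 1 shows the equivalent discounted constant reward converging to the cycle mean.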
74 CHAPTER 4. REPEATED GAMES AND MEMORY STRATEGIES

Assume the one-period game is such that a unique equilibrium exists. The open-loop repeated game will then also have a unique equilibrium, whereas the closed-loop repeated game will still have a plethora of equilibria, as we shall see shortly.

4.2 Folk theorem

In this section we present the celebrated Folk theorem. This name is due to the fact that there is no well-defined author of the first version of it. The idea is that an infinitely repeated game should permit the players to design equilibria, supported by threats, with outcomes being Pareto efficient. The strategies will be based on memory, each player remembering the past moves of the game. Indeed, the infinite horizon could give rise to an infinite storage of information, so there is a question of designing strategies with the help of mechanisms that would exploit the history of the game but with a finite information storage capacity. Finite automata provide such systems.

4.2.1 Repeated games played by automata

A finite automaton is a logical system that, when stimulated by an input, generates an output. For example, an automaton could be used by a player to determine the stage-game action he takes, given the information he has received. The automaton is finite if the length of the logical input (a stream of 0–1 bits) it can store is finite. So, in our repeated game context, a finite automaton will not permit a player to determine his stage-game action upon an infinite memory of what has happened before. In our paradigm of a repeated game played by automata, the input element for the automaton a of Player 1 is the stage-game action of Player 2 in the preceding stage, and symmetrically for Player 2. In this way, each player can choose a way to process the information he receives in the course of the repeated plays, in order to decide the next action.

To describe a finite automaton one needs the following:

• A list of all possible stored input configurations; this list is finite and each element is called a state. One element of this list is the initial state.
• A transition function that tells the automaton how to move from one state to another after it has received a new input element.
• An output function that determines the action taken by the automaton, given its current state.
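As a toy illustration of these definitions, the sketch below lets two such automata play a repeated 2×2 game. The game, its payoffs and the two automata are assumptions made for the example, not taken from the text; note how the play settles into a cycle after a finite transient.

```python
# Two finite automata playing a repeated 2x2 game.  An automaton is a triple
# (initial state, transition function, output function); its input at each
# stage is the opponent's action at the preceding stage.  The game and the
# automata below are illustrative assumptions, not examples from the text.

def play(auto1, auto2, steps=50):
    """Return the sequence of action pairs generated by the two automata."""
    s1, t1, o1 = auto1
    s2, t2, o2 = auto2
    history = []
    for _ in range(steps):
        u, v = o1(s1), o2(s2)
        history.append((u, v))
        # each automaton updates its state on the opponent's action
        s1, s2 = t1(s1, v), t2(s2, u)
    return history

# "Tit-for-tat": the stored state is the opponent's last action, which is
# also the action played next (initially 0 = "cooperate").
tit_for_tat = (0, lambda s, opp: opp, lambda s: s)
# One-state automaton that always plays 1 ("defect").
always_defect = (1, lambda s, opp: s, lambda s: s)

# Stage payoffs of Player 1 in a prisoner's-dilemma-like game (assumed numbers).
g1 = {(0, 0): 3, (0, 1): 0, (1, 0): 4, (1, 1): 1}

hist = play(tit_for_tat, always_defect)
# After the first stage the state pairs repeat, so play cycles on (1, 1);
# the limit average payoff of Player 1 is therefore g1[(1, 1)] = 1.
avg = sum(g1[uv] for uv in hist) / len(hist)
```

Because the joint state space of the two automata is finite, the pair of states must eventually repeat, which is exactly the cycling behavior exploited in the next subsection.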
4.2. FOLK THEOREM 75

An important aspect of finite automata is that, when they are used to play a repeated game, they necessarily generate cycles in the successive states, and hence in the actions taken in the successive stage-games.

Let G be a finite two-player game, i.e. a bimatrix game, which defines the one-stage game. The game is repeated indefinitely. Although the set of strategies for an infinitely repeated game is enormously rich, we shall restrict the strategy choices of the players to the class of finite automata. Let U and V be the sets of pure strategies in the game G for Players 1 and 2 respectively. A pure strategy for Player 1 is an automaton a ∈ A which has for input the actions v ∈ V of Player 2. Symmetrically, a pure strategy for Player 2 is an automaton b ∈ B which has for input the actions u ∈ U of Player 1.

Denote by g_j(u, v) the payoff to Player j in the one-stage game when the pure strategy pair (u, v) ∈ U × V has been selected by the two players. We associate with the pair of finite automata (a, b) ∈ A × B a pair of average payoffs per stage defined as follows:

g̃_j(a, b) = (1/N) Σ_{n=1}^N g_j(u_n, v_n),  j = 1, 2,    (4.6)

where N is the length of the cycle associated with the pair of automata (a, b) and (u_n, v_n) is the action pair at the n-th stage in this cycle. One notices that the expression (4.6) is also the limit average reward per period, due to the cycling behavior of the two automata. We call G̃ the game defined by the strategy sets A and B and the payoff functions (4.6).

4.2.2 Minimax point

Assume Player 1 wants to threaten Player 2. An effective threat would be to define his action in the one-stage game through the solution of

m̄_2 = min_u max_v g_2(u, v).

Similarly, if Player 2 wants to threaten Player 1, an effective threat would be to define his action in the one-stage game through the solution of

m̄_1 = min_v max_u g_1(u, v).

We call minimax point the pair m̄ = (m̄_1, m̄_2) in IR².
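For a concrete bimatrix game, the minimax point is just a pair of nested min/max computations over the payoff matrices. A minimal sketch, in which the payoff matrices are illustrative assumptions:

```python
# Minimax point of a bimatrix game in pure strategies:
# m1 = min_v max_u g1(u, v) is the level to which Player 2 can hold Player 1,
# m2 = min_u max_v g2(u, v) is the level to which Player 1 can hold Player 2.

def minimax_point(G1, G2):
    rows, cols = len(G1), len(G1[0])
    m1 = min(max(G1[u][v] for u in range(rows)) for v in range(cols))
    m2 = min(max(G2[u][v] for v in range(cols)) for u in range(rows))
    return (m1, m2)

# Prisoner's-dilemma payoffs (assumed numbers): each player can be held to 1.
G1 = [[3, 0], [4, 1]]
G2 = [[3, 4], [0, 1]]
m = minimax_point(G1, G2)   # (1, 1)
```

Outcomes strictly above this point are exactly the ones that the Folk theorem below shows to be sustainable by threats.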
76 CHAPTER 4. REPEATED GAMES AND MEMORY STRATEGIES

4.2.3 Set of outcomes dominating the minimax point

Theorem 4.2.1 Let P̄_m be the set of outcomes in the bimatrix game that dominate the minimax point m̄. Then the outcomes corresponding to Nash equilibria in pure strategies¹ of the game G̃ are dense in P̄_m.

Proof: Let g_1, g_2, ..., g_K be the pairs of payoffs that appear in the bimatrix game when it is written in vector form, that is, g_k = (g_1(u_k, v_k), g_2(u_k, v_k)), where (u_k, v_k) is a possible pure strategy pair. Let q_1, q_2, ..., q_K be nonnegative rational numbers that sum to 1. Then the convex combination

g* = q_1 g_1 + q_2 g_2 + ... + q_K g_K

is an element of P̄_m if g*_1 ≥ m̄_1 and g*_2 ≥ m̄_2, and the set of these g* is dense in P̄_m. Each q_k, being a rational number, can be written as a ratio of two integers, that is, a fraction. It is possible to write the K fractions with a common denominator, hence q_k = n_k/N, with n_1 + n_2 + ... + n_K = N, since the q_k's must sum to 1.

We construct two automata a* and b* which, together, play (u_1, v_1) during n_1 stages, (u_2, v_2) during n_2 stages, etc. They complete the sequence by playing (u_K, v_K) during n_K stages, and the N-stage cycle begins again. The limit average per period reward of Player j in the game G̃ played by these automata is given by

g̃_j(a*, b*) = (1/N) Σ_{k=1}^K n_k g_j(u_k, v_k) = Σ_{k=1}^K q_k g_j(u_k, v_k) = g*_j.

Therefore these two automata will achieve the payoff pair g*. Now we refine the structure of these automata as follows:

• Let ũ(n) and ṽ(n) be the pure strategies that Player 1 and Player 2, respectively, are supposed to play at stage n, according to the cycle defined above.

¹ A pure strategy in the game G̃ is a deterministic choice of an automaton; this automaton will process the information coming from the repeated use of mixed strategies in the repeated game.
4.3. COLLUSIVE EQUILIBRIUM IN A REPEATED COURNOT GAME 77

• The automaton a* plays ũ(n + 1) at stage n + 1 if automaton b has played ṽ(n) at stage n; otherwise it plays ū, the strategy that is minimaxing Player 2's one-shot payoff, and this strategy will be played forever.

• Symmetrically, the automaton b* plays ṽ(n + 1) at stage n + 1 if automaton a has played ũ(n) at stage n; otherwise it plays v̄, the strategy that is minimaxing Player 1's one-shot payoff, and this strategy will be played forever.

It should be clear that the two automata a* and b* defined as above constitute a Nash equilibrium for the repeated game G̃ whenever g* is dominating m̄, i.e. g* ∈ P̄_m. Indeed, if Player 1 changes unilaterally his automaton to a, then the automaton b* will select, for all periods except a finite number, the minimax action v̄ and, as a consequence, the limit average per period reward obtained by Player 1 is g̃_1(a, b*) ≤ m̄_1 ≤ g̃_1(a*, b*). Similarly for Player 2, a unilateral change to b ∈ B leads to a payoff g̃_2(a*, b) ≤ m̄_2 ≤ g̃_2(a*, b*). Q.E.D.

Remark 4.2.1 This result is known as the Folk theorem because it has always been in the "folklore" of game theory, without knowing to whom it should be attributed. The class of equilibria is more restricted if one imposes the condition that the threats used by the players in their announced strategies have to be credible. This is the issue discussed in one of the following sections.

4.3 Collusive equilibrium in a repeated Cournot game

Consider a Cournot oligopoly game repeated over an infinite sequence of periods. Let t = 1, ..., ∞ be the sequence of time periods. Let q(t) ∈ IR^m be the production vector chosen by the m firms at period t and ψ_j(q(t)) the payoff to Player j at period t. We denote q̃ = (q(1), ..., q(t), ...) the infinite sequence of production decisions of the m firms. Over the infinite time horizon the rewards of the players are defined as

Ṽ_j(q̃) = Σ_{t=0}^∞ β_j^t ψ_j(q(t)),
78 CHAPTER 4. REPEATED GAMES AND MEMORY STRATEGIES

where β_j ∈ [0, 1) is a discount factor used by Player j. We assume that at each period t all players have the same information h(t), which is the history of past productions

h(t) = (q(0), q(1), ..., q(t − 1)).

We consider, for the one-stage game, two possible outcomes:

(i) the Cournot equilibrium, supposed to be uniquely defined by q^c = (q_j^c)_{j=1,...,m}, and

(ii) a Pareto outcome resulting from the production levels q* = (q_j*)_{j=1,...,m} and such that ψ_j(q*) ≥ ψ_j(q^c), j = 1, ..., m.

We say that this Pareto outcome is a collusive outcome dominating the Cournot equilibrium. A Cournot equilibrium for the repeated game is defined by the strategy consisting, for each Player j, in choosing repeatedly the production level q_j(t) ≡ q_j^c. However, there are many other equilibria that can be obtained through the use of memory strategies. Let us define

Ṽ_j* = Σ_{t=0}^∞ β_j^t ψ_j(q*),    (4.7)
Φ_j(q*) = max_{q_j} ψ_j([q*^{(−j)}, q_j]).    (4.8)

The following has been shown in [Friedman 1977]:

Lemma 4.3.1 There exists an equilibrium for the repeated Cournot game that yields the payoffs Ṽ_j* if the following inequality holds:

β_j ≥ (Φ_j(q*) − ψ_j(q*)) / (Φ_j(q*) − ψ_j(q^c)).    (4.9)

This equilibrium is reached through the use of a so-called trigger strategy defined as follows:

q_j(t) = q_j*  if (q(0), q(1), ..., q(t − 1)) = (q*, ..., q*),
q_j(t) = q_j^c  otherwise.    (4.10)

Proof: If the players play according to (4.10), the payoff they obtain is given by

Ṽ_j* = Σ_{t=0}^∞ β_j^t ψ_j(q*) = ψ_j(q*) / (1 − β_j).
4.3. COLLUSIVE EQUILIBRIUM IN A REPEATED COURNOT GAME 79

Assume Player j decides to deviate unilaterally at period 0. He knows that his deviation will be detected at period 1 and that, thereafter, the Cournot equilibrium production levels q^c will be played. So the best he can expect from this deviation is the following payoff, which combines the maximal reward he can get in period 0, when the other firms play q_k*, with the discounted value of an infinite stream of Cournot outcomes:

Ṽ_j(q̃) = Φ_j(q*) + β_j ψ_j(q^c) / (1 − β_j).

This unilateral deviation is not profitable if the following inequality holds:

ψ_j(q*) − Φ_j(q*) + β_j (ψ_j(q*) − ψ_j(q^c)) / (1 − β_j) ≥ 0,

which can be rewritten as

(1 − β_j)(ψ_j(q*) − Φ_j(q*)) + β_j (ψ_j(q*) − ψ_j(q^c)) ≥ 0,

from which we get

ψ_j(q*) − Φ_j(q*) − β_j (ψ_j(q^c) − Φ_j(q*)) ≥ 0,

and, finally, the condition (4.9):

β_j ≥ (Φ_j(q*) − ψ_j(q*)) / (Φ_j(q*) − ψ_j(q^c)).

The same reasoning holds if the deviation occurs at any other period t. Q.E.D.

Remark 4.3.1 According to this strategy, the q* production levels correspond to a cooperative mood of play while the q^c levels correspond to a punitive mood of play. The punitive mood is a threat. As the punitive mood of play consists in playing the Cournot solution, which is indeed an equilibrium, it is a credible threat. It is important to notice the difference between the threats used in the Folk theorem and the ones used in this repeated Cournot game: here the threats constitute themselves an equilibrium for the dynamic game. We say that the memory (trigger strategy) equilibrium is subgame perfect in the sense of Selten [Selten, 1975]. Therefore the players will always choose to play in accordance with the Pareto solution q*, and thus the equilibrium defines a nondominated outcome.

4.3.1 Finite vs infinite horizon

There is an important difference between finitely and infinitely repeated games. If the same game is repeated only a finite number of times then, in a subgame perfect equilibrium, a single game equilibrium will be repeated at each period. This is due to the impossibility to retaliate at the last period.
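For a symmetric linear Cournot duopoly, condition (4.9) can be evaluated explicitly. The sketch below, with assumed demand and cost parameters, computes the best deviation payoff Φ_j(q*) against the collusive output and the resulting minimal discount factor; for this linear specification the threshold works out to 9/17, independent of the parameter values.

```python
# Numerical check of Lemma 4.3.1 / condition (4.9) for a symmetric linear
# Cournot duopoly: inverse demand alpha - rho*Q, constant unit cost gamma.
# The parameter values below are assumptions for the example.

alpha, gamma, rho = 10.0, 1.0, 1.0
delta = alpha - gamma

q_star = delta / (4 * rho)          # collusive (joint-monopoly) output per firm
psi_star = delta ** 2 / (8 * rho)   # collusive one-stage profit per firm
psi_c = delta ** 2 / (9 * rho)      # Cournot one-stage profit per firm

# Best unilateral deviation against the other firm playing q_star:
# maximize (delta - rho*(q_star + q)) * q  ->  q = (delta - rho*q_star)/(2*rho)
q_dev = (delta - rho * q_star) / (2 * rho)
phi_star = (delta - rho * (q_star + q_dev)) * q_dev

beta_min = (phi_star - psi_star) / (phi_star - psi_c)   # right-hand side of (4.9)
# For this linear specification, beta_min = 9/17 whatever alpha, gamma, rho.
```

Any common discount factor above this threshold makes the trigger strategy (4.10) an equilibrium sustaining the collusive payoffs.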
80 CHAPTER 4. REPEATED GAMES AND MEMORY STRATEGIES

Then this situation is translated backward in time until the initial period. When the game is repeated over an infinite time horizon, there is always enough time to retaliate and possibly resume cooperation.

If the players have perfect information with delay, they can observe without error the moves selected by their opponents in previous stages; there might be a delay of several stages before this observation is made. In that case it is easy to show that to any cooperative solution dominating a static Nash equilibrium corresponds a subgame perfect sequential equilibrium for the repeated game, based on the use of trigger strategies. These strategies use as a threat a switch to the noncooperative solution for all remaining periods, as soon as a player has been observed not to play according to the cooperative solution. For example, let us consider a repeated Cournot game without discounting, where the players' payoffs are determined by the long-term average of the one-stage rewards. Since the long-term average only depends on the infinitely repeated one-stage rewards, it is clear that the threat gives rise to the Cournot equilibrium payoff in the long run. This is the subgame perfect version of the Folk theorem [Friedman 1986].

4.3.2 A repeated stochastic Cournot game with discounting and imperfect information

We now give an example of a stochastic repeated game for which the construction of equilibria based on the use of threats is not trivial. This is the stochastic repeated game, based on the Cournot model with imperfect information, studied in [Green & Porter, 1985] and [Porter, 1983]. Let X(t) = Σ_{j=1}^m x_j(t) be the total supply of the m firms at time t and θ(t) be a sequence of i.i.d.² random variables affecting the prices. Each firm j chooses its supplies x_j(t) ≥ 0 in a desire to maximize its total expected discounted profit

V_j(x_1(·), ..., x_m(·)) = E_θ Σ_{t=0}^∞ β^t [D(X(t), θ(t)) x_j(t) − c_j(x_j(t))],  j = 1, ..., m,

where β < 1 is the discount factor. The assumed information structure does not allow each player to observe the opponents' actions used in the past. Only the sequence of past prices, corrupted by the random noise, is available. Therefore the players cannot monitor without error the past actions of their opponent(s). This impossibility to detect without error a breach of cooperation increases considerably the difficulty of building equilibria based on the use of credible threats. In [Green & Porter, 1985] and [Porter, 1983] it is shown that there exist subgame perfect equilibria, based on the use of trigger strategies, which dominate the repeated Cournot-Nash equilibrium. This construct uses the concept of Markov game and will be further discussed in Part 3

² Independent and identically distributed.
4.4. EXERCISES 81

of these Notes. It will be shown that these equilibria are in fact closely related to the so-called correlated and/or communication equilibria discussed at the end of Part 2.

4.4 Exercises

Exercise 5.1. Prove that the repeated one-stage equilibrium is an equilibrium for the repeated game with additive payoffs and a finite number of periods. Extend the result to infinitely repeated games.

Exercise 5.2. Show that condition (4.9) holds if β_j = 1. Adapt Lemma 4.3.1 to the case where the game is evaluated through the long-term average reward criterion.
Chapter 5

Shapley's Zero-Sum Markov Game

5.1 Process and rewards dynamics

The concept of Markov game has been introduced, in a zero-sum game framework, in [Shapley, 1953]. The structure of this game can also be described as a controlled Markov chain with two competing agents. Let S = {1, 2, ..., n} be the set of possible states of a discrete-time stochastic process {x(t) : t = 0, 1, 2, ...} and let U_j = {1, 2, ..., λ_j}, j = 1, 2, be the finite action sets of the two players. The process dynamics is described by the transition probabilities

p_{s,s'}(u) = P[x(t + 1) = s' | x(t) = s, u],  s, s' ∈ S, u ∈ U_1 × U_2,

which satisfy, for all u ∈ U_1 × U_2,

p_{s,s'}(u) ≥ 0,  Σ_{s'∈S} p_{s,s'}(u) = 1,  s ∈ S.

As the transition probabilities depend on the players' actions, we speak of a controlled Markov chain. A transition reward function

r(s, u),  s ∈ S, u ∈ U_1 × U_2,

defines the gain of Player 1 when the process is in state s and the two players take the action pair u. Player 2's reward is given by −r(s, u), since the game is assumed to be zero-sum.
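A controlled Markov chain of this kind is straightforward to simulate: given the current state and the action pair, the next state is drawn from p_{s,s'}(u). The two-state example data below are illustrative assumptions, not taken from the text.

```python
# A controlled Markov chain in the sense of this section: the next state is
# drawn from p_{s,s'}(u), where the action pair u = (u1, u2) is chosen by
# the two players.  The two-state example data are illustrative assumptions.
import random

p = {  # p[(s, u1, u2)] = distribution over next states (state space {0, 1})
    (0, 0, 0): [0.9, 0.1], (0, 0, 1): [0.5, 0.5],
    (0, 1, 0): [0.2, 0.8], (0, 1, 1): [0.6, 0.4],
    (1, 0, 0): [0.3, 0.7], (1, 0, 1): [0.8, 0.2],
    (1, 1, 0): [0.4, 0.6], (1, 1, 1): [0.1, 0.9],
}

def step(s, u1, u2, rng):
    """Sample x(t+1) given x(t) = s and the action pair (u1, u2)."""
    dist = p[(s, u1, u2)]
    return 0 if rng.random() < dist[0] else 1

rng = random.Random(0)
s = 0
trajectory = [s]
for _ in range(10):
    s = step(s, rng.randint(0, 1), rng.randint(0, 1), rng)  # arbitrary actions
    trajectory.append(s)
```

Changing the rule by which the two action components are selected is exactly what the strategies of the next section formalize.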
84 CHAPTER 5. SHAPLEY'S ZERO SUM MARKOV GAME

5.2 Information structure and strategies

5.2.1 The extensive form of the game

The game defined above corresponds to a game in extensive form with an infinite number of moves. We assume an information structure, with perfect recall, where the players choose their actions sequentially and simultaneously. This is illustrated in Figure 5.1.

[Figure 5.1: A Markov game in extensive form — at each state s, Players 1 and 2 simultaneously choose actions u^1 and u^2, after which the process moves to a successor state s' with probability p_{s,s'}(u), and the tree branches again from each successor state.]
5.2. INFORMATION STRUCTURE AND STRATEGIES 85

5.2.2 Strategies

A strategy is a device which transforms the information into action. Since the players may recall the past observations, the general form of a strategy can be quite complex.

Markov strategies: These strategies are also called feedback strategies. The assumed information structure allows the use of strategies defined as mappings

γ_j : S → P(U_j),  j = 1, 2,

where P(U_j) is the class of probability distributions over the action set U_j. Since the strategies are based only on the information conveyed by the current state of the x-process, they are called Markov strategies.

In order to formulate the normal form of the game, it remains to define the payoffs associated with the admissible strategies of the two players. Let β ∈ [0, 1) be a given discount factor. When the two players have chosen their respective strategies, the state process becomes a Markov chain with transition probabilities

P_{γ_1,γ_2}[s, s'] = Σ_{k=1}^{λ_1} Σ_{ℓ=1}^{λ_2} μ_k^{γ_1(s)} ν_ℓ^{γ_2(s)} p_{s,s'}(u_1^k, u_2^ℓ),

where we have denoted μ_k^{γ_1(s)}, k = 1, ..., λ_1, and ν_ℓ^{γ_2(s)}, ℓ = 1, ..., λ_2, the probability distributions induced on U_1 and U_2 by γ_1(s) and γ_2(s) respectively. Player 1's payoff, when the game starts in state s_0, is defined as the discounted sum over the infinite time horizon of the expected transition rewards, i.e.

V(s_0, γ_1, γ_2) = E_{γ_1,γ_2}[ Σ_{t=0}^∞ β^t Σ_{k=1}^{λ_1} Σ_{ℓ=1}^{λ_2} μ_k^{γ_1(x(t))} ν_ℓ^{γ_2(x(t))} r(x(t), u_1^k, u_2^ℓ) | x(0) = s_0 ].

Player 2's payoff is equal to −V(s_0, γ_1, γ_2).

Definition 5.2.1 A pair of strategies (γ_1*, γ_2*) is a saddle point if, for all s ∈ S,

V(s, γ_1, γ_2*) ≤ V(s, γ_1*, γ_2*) ≤ V(s, γ_1*, γ_2)    (5.1)

for all strategies γ_1 and γ_2 of Players 1 and 2 respectively. The number

v*(s) = V(s, γ_1*, γ_2*)

is called the value of the game at state s.
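The induced transition matrix P_{γ1,γ2} is simply the expectation of p_{s,s'} under the product of the two mixed actions at each state. A small sketch, whose two-state, two-action data are assumed for the example:

```python
# Transition matrix of the Markov chain induced by a pair of Markov
# strategies: P[s][s'] = sum_k sum_l mu[s][k] * nu[s][l] * p[s][k][l][s'].
# The two-state, two-action data below are illustrative assumptions.

def induced_chain(p, mu, nu):
    n = len(p)
    return [[sum(mu[s][k] * nu[s][l] * p[s][k][l][sp]
                 for k in range(len(mu[s])) for l in range(len(nu[s])))
             for sp in range(n)] for s in range(n)]

p = [[[[0.5, 0.5], [0.9, 0.1]], [[0.1, 0.9], [0.5, 0.5]]],
     [[[0.8, 0.2], [0.3, 0.7]], [[0.2, 0.8], [0.6, 0.4]]]]
mu = [[0.25, 0.75], [1.0, 0.0]]     # Player 1's mixed action in each state
nu = [[0.5, 0.5], [0.0, 1.0]]       # Player 2's mixed action in each state

P = induced_chain(p, mu, nu)
# Each row of P is again a probability distribution over the states.
```

Once P is available, the discounted payoff V(s_0, γ_1, γ_2) can be computed by standard Markov-chain methods.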
86 CHAPTER 5. SHAPLEY'S ZERO SUM MARKOV GAME

Memory strategies: Since the game is of perfect recall, the information on which a player can base his/her decision at period t is the whole state-history

h(t) = {x(0), x(1), ..., x(t)}.

Notice that we don't assume that the players have a direct access to the actions used in the past by their opponent; they can only observe the state history. However, as in the case of a single-player Markov decision process, it can be shown that optimality does not require more than the use of Markov strategies.

5.3 Shapley's-Denardo operator formalism

In the seminal paper [Shapley, 1953], the existence of optimal stationary Markov strategies has been established via a fixed point argument involving a contracting operator. The solution of the dynamic programming equations for the stochastic game is obtained as a fixed point of an operator acting on the space of value functions. We give here a brief account of the method, taking our inspiration from [Denardo, 1967] and [Filar & Vrieze, 1997].

5.3.1 Dynamic programming operators

We introduce in this section the powerful formalism of dynamic programming operators that has been formally introduced in [Denardo, 1967] but was already implicit in Shapley's work. Let v(·) = (v(s) : s ∈ S) be an arbitrary function with real values, defined over S. Since we assume that S is finite, this is also a vector. Introduce, for any s ∈ S, the so-called local reward functions

h(v(·), s, u_1, u_2) = r(s, u_1, u_2) + β Σ_{s'∈S} p_{s,s'}(u_1, u_2) v(s'),  (u_1, u_2) ∈ U_1 × U_2.

Define now, for each s ∈ S, the zero-sum matrix game with pure strategy sets U_1 and U_2 and with payoffs H(v(·), s) = [h(v(·), s, u_1, u_2)]. Denote the value of each of these games by

T(v(·), s) := val[H(v(·), s)],

and let T(v(·)) = (T(v(·), s) : s ∈ S). This defines a mapping T : IR^n → IR^n, which is also called the dynamic programming operator of the Markov game.
5.3. SHAPLEY'S-DENARDO OPERATOR FORMALISM 87

5.3.2 Existence of sequential saddle points

Lemma 5.3.1 If A and B are two matrices of the same dimensions, then

|val[A] − val[B]| ≤ max_{k,ℓ} |a_{k,ℓ} − b_{k,ℓ}|.    (5.2)

The proof of this lemma is left as exercise 6.1.

Lemma 5.3.2 If v(·), γ_1 and γ_2 are such that, for all s ∈ S,

v(s) ≤ (resp. =, resp. ≥) h(v(·), s, γ_1(s), γ_2(s))
     = r(s, γ_1(s), γ_2(s)) + β Σ_{s'∈S} p_{s,s'}(γ_1(s), γ_2(s)) v(s'),    (5.3)

then

v(s) ≤ (resp. =, resp. ≥) V(s, γ_1, γ_2).    (5.4)

Proof: The proof is relatively straightforward and consists in iterating the inequality (5.3). QED

We can now establish the following result.

Theorem 5.3.1 The mapping T is contracting in the "max" norm

‖v(·)‖ = max_{s∈S} |v(s)|,

and admits the value function introduced in Definition 5.2.1 as its unique fixed point:

v*(·) = T(v*(·)).    (5.5)

Furthermore, the optimal (saddle-point) strategies are defined as the mixed strategies yielding this value, i.e. such that

h(v*(·), s, γ_1*(s), γ_2*(s)) = val[H(v*(·), s)].

Proof: We first establish the contraction property. We use Lemma 5.3.1 and the transition probability properties to establish the following inequalities:

‖T(v(·)) − T(w(·))‖ ≤ max_{s∈S} max_{u_1∈U_1, u_2∈U_2} | r(s, u_1, u_2) + β Σ_{s'∈S} p_{s,s'}(u_1, u_2) v(s') − r(s, u_1, u_2) − β Σ_{s'∈S} p_{s,s'}(u_1, u_2) w(s') |
≤ max_{s∈S} max_{u_1∈U_1, u_2∈U_2} β Σ_{s'∈S} p_{s,s'}(u_1, u_2) |v(s') − w(s')|
≤ max_{s∈S} max_{u_1∈U_1, u_2∈U_2} β Σ_{s'∈S} p_{s,s'}(u_1, u_2) ‖v(·) − w(·)‖ = β ‖v(·) − w(·)‖.    (5.6)

Hence T is a contraction, since 0 ≤ β < 1. By the Banach contraction theorem, this implies that there exists a unique fixed point v*(·) of the operator T.

We now show that there exist stationary Markov strategies γ_1*, γ_2* for which the saddle-point condition holds:

V(s, γ_1, γ_2*) ≤ V(s, γ_1*, γ_2*) ≤ V(s, γ_1*, γ_2).    (5.7)

Let γ_1*(s), γ_2*(s) be the saddle-point strategies of the local matrix game with payoffs h(v*(·), s, u_1, u_2). Then, by definition, for any strategy γ_2 of Player 2 and all s ∈ S,

h(v*(·), s, γ_1*(s), γ_2(s)) ≥ v*(s).    (5.8)

By Lemma 5.3.2, the inequality (5.8) implies that for all s ∈ S

V(s, γ_1*, γ_2) ≥ v*(s).    (5.9)

Similarly we would obtain that, for any strategy γ_1 and for all s ∈ S,

V(s, γ_1, γ_2*) ≤ v*(s),    (5.10)

and

V(s, γ_1*, γ_2*) = v*(s).    (5.11)

This establishes the saddle-point property (5.7). QED

In the single player case (MDP or Markov Decision Process), the contraction and monotonicity of the dynamic programming operator readily yield a converging numerical algorithm for the computation of the optimal strategies [Howard, 1960]. In the two-player zero-sum case some difficulties arise, even if the existence theorem can also be translated into a converging algorithm (see e.g. [Filar & Tolwinski, 1991] for a recently proposed converging algorithm).
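The contraction property of T suggests the obvious successive-approximations scheme v ← T(v), which converges at geometric rate β. The sketch below implements it for an assumed two-state game with 2×2 action sets, using the exact closed-form value of a 2×2 matrix game for the val operator; all numerical data are illustrative, not from the text.

```python
# A minimal sketch of value iteration v <- T(v) for a two-state, two-action
# zero-sum Markov game.  val2x2 computes the exact value of a 2x2 matrix
# game: a pure saddle point if one exists, otherwise the classical
# mixed-strategy formula.  All data below are illustrative assumptions.

def val2x2(A):
    """Value (to the maximizer) of the 2x2 zero-sum matrix game A."""
    lower = max(min(row) for row in A)                                  # maximin
    upper = min(max(A[i][j] for i in range(2)) for j in range(2))       # minimax
    if lower == upper:                                                  # pure saddle point
        return lower
    a, b, c, d = A[0][0], A[0][1], A[1][0], A[1][1]
    return (a * d - b * c) / (a + d - b - c)    # mixed-strategy value

def shapley_iteration(r, p, beta, n_iter=200):
    """Iterate v <- T(v), T(v)(s) = val[ r(s,u) + beta * sum_s' p(s,s',u) v(s') ]."""
    n = len(r)
    v = [0.0] * n
    for _ in range(n_iter):
        v = [val2x2([[r[s][i][j] + beta * sum(p[s][i][j][t] * v[t] for t in range(n))
                      for j in range(2)] for i in range(2)]) for s in range(n)]
    return v

# Illustrative 2-state game: r[s][u1][u2]; p[s][u1][u2] = next-state distribution.
r = [[[3.0, 1.0], [0.0, 2.0]], [[1.0, -1.0], [-1.0, 1.0]]]
p = [[[[0.5, 0.5], [0.9, 0.1]], [[0.1, 0.9], [0.5, 0.5]]],
     [[[0.8, 0.2], [0.3, 0.7]], [[0.2, 0.8], [0.6, 0.4]]]]
v_star = shapley_iteration(r, p, beta=0.9)
```

By the theorem, the iterates approach the value function v*(·), and the mixed strategies solving each local 2×2 game at v* form a stationary saddle point.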
Chapter 6

Nonzero-Sum Markov and Sequential Games

In this chapter we consider the possible extension of the Shapley Markov game formalism to nonzero-sum games. We shall consider two steps in this extension: (i) use the same finite state and finite action formalism as in Shapley's work and introduce different reward functions for the different players; (ii) introduce a more general framework where the states and actions can be continuous variables.

6.1 Sequential Game with Discrete State and Action Sets

We consider a stochastic game played noncooperatively by players that are not in a purely antagonistic situation. Shapley's formalism extends without difficulty to a nonzero-sum setting.

6.1.1 Markov game dynamics

Introduce S = {1, 2, ..., n}, the set of possible states; U_j(s) = {1, 2, ..., λ_j(s)}, j = 1, 2, ..., m, s ∈ S, the finite action sets at state s of the m players; and the transition probabilities

p_{s,s'}(u) = P[x(t + 1) = s' | x(t) = s, u],  s, s' ∈ S, u ∈ U_1(s) × ... × U_m(s).

Define the transition reward of Player j, when the process is in state s and the players take the action vector u, by

r_j(s, u),  u ∈ U_1(s) × ... × U_m(s).
90 CHAPTER 6. NONZERO-SUM MARKOV AND SEQUENTIAL GAMES

Remark 6.1.1 Notice that we permit the action sets of the different players to depend on the current state s of the game. Shapley's theory remains valid in that case too.

6.1.2 Markov strategies

Markov strategies are defined as in the zero-sum case, i.e. as mappings from S into the players' mixed actions. We denote μ_k^{γ_j(x)} the probability given to action u_j^k by Player j when he uses strategy γ_j and the current state is x. Player j's payoff is thus defined as

V_j(s_0, γ_1, ..., γ_m) = E_{γ_1,...,γ_m}[ Σ_{t=0}^∞ β^t Σ_{k=1}^{λ_1} ··· Σ_{ℓ=1}^{λ_m} μ_k^{γ_1(x(t))} ··· μ_ℓ^{γ_m(x(t))} r_j(x(t), u_1^k, ..., u_m^ℓ) | x(0) = s_0 ].

6.1.3 Feedback-Nash equilibrium

Define Markov strategies as above.

Definition 6.1.1 An m-tuple of Markov strategies (γ_1*, ..., γ_m*) is a feedback-Nash equilibrium if, for all s ∈ S and all j ∈ M,

V_j(s, γ_1*, ..., γ_j, ..., γ_m*) ≤ V_j(s, γ_1*, ..., γ_j*, ..., γ_m*)

for all strategies γ_j of Player j. The number

v_j*(s) = V_j(s, γ_1*, ..., γ_j*, ..., γ_m*)

will be called the equilibrium value of the game at state s for Player j.

6.1.4 Sobel-Whitt operator formalism

The first author to extend Shapley's work to a nonzero-sum framework has been Sobel [?]. A more recent treatment of nonzero-sum sequential games can be found in [Whitt, 1980]. We introduce the so-called local reward functions

h_j(s, v_j(·), u) = r_j(s, u) + β Σ_{s'∈S} p_{s,s'}(u) v_j(s'),  j ∈ M,    (6.1)
6.1. SEQUENTIAL GAME WITH DISCRETE STATE AND ACTION SETS 91

where the functions v_j(·) : S → IR are given reward-to-go functionals (in this case they are vectors of dimension n = card(S)), defined for each player j ∈ M. The local reward (6.1) is the sum of the transition reward for Player j and the discounted expected reward-to-go from the new state reached after the transition. For a given s and a given set of reward-to-go functionals v_j(·), j ∈ M, the local rewards (6.1) define a matrix game over the pure strategy sets U_j(s), j ∈ M.

We now define, for any given Markov policy vector γ = {γ_j}_{j∈M}, an operator H_γ, acting on the space of reward-to-go functionals (i.e. n-dimensional vectors) and defined as

(H_γ v(·))(s) = { E_{γ(s)}[h_j(s, v_j(·), u)] }_{j∈M},    (6.2)

where u is the random action vector. We also introduce the operator F_γ, defined as

(F_γ v(·))(s) = { sup_{γ_j} E_{γ̃^{(j)}(s)}[h_j(s, v_j(·), u)] }_{j∈M},    (6.3)

where γ̃^{(j)} is the Markov policy obtained when only Player j adjusts his/her policy γ_j, while the other players keep their γ-policies fixed. In other words, Eq. (6.3) defines the optimal reply of each player j to the Markov strategies chosen by the other players.

6.1.5 Existence of Nash equilibria

The dynamic programming formalism introduced above leads to the following powerful results (see [Whitt, 1980] for a recent proof).

Theorem 6.1.1 Consider the sequential game defined above. Then:

1. The expected payoff vector associated with a stationary Markov policy γ is given by the unique fixed point v_γ(·) of the contracting operator H_γ.

2. The operator F_γ is also contracting and thus admits a unique fixed point f^γ(·).

3. The stationary Markov policy γ* is an equilibrium strategy iff

f^{γ*}(s) = v_{γ*}(s),  ∀s ∈ S.    (6.4)

4. There exists an equilibrium defined by a stationary Markov policy.

Remark 6.1.2 In the nonzero-sum case the existence theorem is not based on a contraction property of the "equilibrium" dynamic programming operator. As a consequence, the existence result does not translate easily into a converging algorithm for the numerical solution of these games (see [?]).
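Point 3 of Theorem 6.1.1 yields a practical equilibrium test: compute v_γ by iterating H_γ, compute f^γ by iterating F_γ, and compare the two fixed points. The sketch below does this for an assumed two-state, two-player game in which rewards are separable in each player's own action and transitions do not depend on the actions, so that the equilibrium is easy to see; all data are illustrative, not from the text.

```python
# Equilibrium test (6.4): gamma* is an equilibrium iff the fixed point of
# H_gamma equals the fixed point of F_gamma.  The tiny two-state game below
# is an assumed example (own-action-separable rewards, action-independent
# transitions), chosen so that the equilibrium is obvious by inspection.
BETA = 0.9
P = [[0.7, 0.3], [0.4, 0.6]]      # transition distribution per state
R1 = [[1.0, 3.0], [2.0, 0.0]]     # r1[s][u1]: Player 1's reward, own action only
R2 = [[0.0, 2.0], [1.0, 4.0]]     # r2[s][u2]

def h(r_own, s, a, v):
    """Local reward: immediate reward plus discounted expected reward-to-go."""
    return r_own[s][a] + BETA * sum(P[s][t] * v[t] for t in range(2))

def value_of_policy(pol1, pol2, n_iter=300):
    """Fixed point of H_gamma for a pure stationary policy pair."""
    v1, v2 = [0.0, 0.0], [0.0, 0.0]
    for _ in range(n_iter):
        v1 = [h(R1, s, pol1[s], v1) for s in range(2)]
        v2 = [h(R2, s, pol2[s], v2) for s in range(2)]
    return v1, v2

def best_reply_value(r_own, n_iter=300):
    """Fixed point of F_gamma for one player (opponent-independent here)."""
    f = [0.0, 0.0]
    for _ in range(n_iter):
        f = [max(h(r_own, s, a, f) for a in range(2)) for s in range(2)]
    return f

pol_star = ([1, 0], [1, 1])       # each player picks its larger own reward
v1, v2 = value_of_policy(*pol_star)
f1, f2 = best_reply_value(R1), best_reply_value(R2)
is_equilibrium = all(abs(x - y) < 1e-6 for x, y in zip(v1 + v2, f1 + f2))
```

For a policy that is not a best reply, the two fixed points differ, and the test of (6.4) fails.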
92 CHAPTER 6. NONZERO-SUM MARKOV AND SEQUENTIAL GAMES

6.2 Sequential Games on Borel Spaces

The theory of noncooperative Markov games has been extended by several authors to the case where the states and the actions are in continuous sets. Since we are dealing with stochastic processes, the apparatus of measure theory is now essential. We refer the reader to [Nowak, 199?] for a more precise statement.

6.2.1 Description of the game

An m-player sequential game is defined by the following objects: (S, Σ), U_j, Γ_j(·), r_j(·, ·), Q(·, ·), β, where:

1. (S, Σ) is a measurable state space with a countably generated σ-algebra Σ of subsets of S.

2. U_j is a compact metric space of actions for player j.

3. Γ_j(·) is a lower measurable map from S into nonempty compact subsets of U_j. For each s, Γ_j(s) represents the set of admissible actions for player j.

4. r_j(·, ·) : S × U → IR is a bounded measurable transition reward function for player j. These functions are assumed to be continuous on U.

5. Q(·, ·) is a product measurable transition probability from S × U to S. It is assumed that Q(·, ·) satisfies some regularity conditions which are too technical to be given here.

6. β ∈ (0, 1) is the discount factor.

A stationary Markov strategy for player j is a measurable map γ_j(·) from S into the set P(U_j) of probability measures on U_j such that γ_j(s) ∈ P(Γ_j(s)) for every s ∈ S.

6.2.2 Dynamic programming formalism

The definition of the local reward functions given in 6.1 in the discrete state case has to be adapted to the continuous state format. For every s ∈ S, it becomes

h_j(s, v_j(·), u) = r_j(s, u) + β ∫_S v_j(t) Q(dt | s, u),  j ∈ M.    (6.5)
6. APPLICATION TO A STOCHASTIC DUOPOLOY MODEL 93 The operators Hγ and Fγ are deﬁned as above. 1985]. The existence of equilibria was obtained only for special cases 1 6.1 Application to a Stochastic Duopoloy Model A stochastic repeated duopoly Consider the stochastic duopoly model deﬁned by the following linear demand equation x(t + 1) = α − ρ[u1 (t) + u2 (t)] + ε(t) which determines the price x(t + 1) of a good at period t + 1 given the total supply u1 (t) + u2 (t) decided at the end of period t by Players 1 and 2. (6. 1980]. then over an inﬁnite time horizon. 2. . 2 see [Nowak. [Nowak. 1993]. therefore an obvious equilibrium solution consists to play repeatedly the (static) Cournot solution uc (t) = j which generates the payoffs Vjc = (α − δ)2 9ρ(1 − β) j = 1. The proﬁts. [Parthasarathy Shina. the existence of εequilibria is proved. This game is repeated.3. 1987]. The existence of equilibria is difﬁcult to establish for this general class of sequential games. using an approximation theory in dynamic programming. In [Whitt. Assume a unit production cost equal to γ.6) A symmetric Pareto (nondominated) solution is given by the repeated actions α−δ . 3ρ j = 1.3. at the end of period t by both players (ﬁrms) are then determined as πj (t) = (x(t + 1) − δ)uj (t).7) α−δ . 4ρ uP (t) = j 1 j = 1. Assume that the two ﬁrms have the same discount rate β. 1989].3 6. [Nowak. the payoff to Player j will be given by ∞ Vj = t=0 β t πj (t). 2 (6.
(6. an indication that at least one player is not respecting the agreement and is maybe trying to take advantage over the other player. We propose below another scheme. denoted v which is used to monitor a cooperative policy that all players have agreed to play and which is deﬁned as φ : v → uj = φ(v). The basic idea consists to extend the state space by introducing a new state variable. The state equation governing the evolution of this state variable is designed as follows v(t + 1) = max{−K. Since it is assumed that the actions of players are not directly observable.94 CHAPTER 6. i. where xe is the expected outcome if both players use the cooperative policy. v(t) + xe (t + 1) − x(t + 1)}. The parameter −K deﬁnes a lower bound for v. xe (t + 1) = α − 2ρφ(v(t)). 2. 6.e. there is a need to proceed to some ﬁltering of the sequence of observed states in order to monitor the possible breaches of agreement. It should be clear that the new state variable v provides a cumulative measure of the positive discrepancies between the expected prices xe and the realized ones x(t). The Pareto outcome dominates the Cournot equilibrium but it does not represent an equilibrium.3. using a multistep memory. 1985] a dominating memory strategy equilibrium is constructed.8) . NONZEROSUM MARKOV AND SEQUENTIAL GAMES and the associated payoffs VjP = where δ = α − γ. that yields an outcome which lies closer to the Pareto solution.e. In [Green & Porter. The question is the following: is it possible to construct a pair of memory strategies which would deﬁne an equilibrium with an outcome dominating the repeated Cournot strategy outcome and which would be as close as possible to the Pareto nondominated solution? (α − δ) 8ρ(1 − β) j = 1. based on a onestep memory scheme.2 A class of trigger strategies based on a monitoring device The random perturbations affecting the price mechanism do not permit a direct extension of the approach described in the deterministic context. 
This bound is introduced to prevent a compensation of positive discrepancies by negative ones; a positive discrepancy could be an indication of oversupply.
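The behaviour of this monitoring statistic under full cooperation can be simulated directly. The sketch below (noise law, K and tolerance are illustrative assumptions) uses the fact that when both players follow the agreed policy, x^e(t + 1) − x(t + 1) = −ε(t), so the state equation reduces to v(t + 1) = max{−K, v(t) − ε(t)}:

```python
import random

# Simulation sketch of the monitoring statistic under honest cooperative
# play: v(t+1) = max(-K, v(t) - eps(t)), with Gaussian demand noise eps.
# All numbers here are illustrative assumptions, not from the text.
random.seed(0)
K = 1.0
T_periods = 10_000

v, path = 0.0, []
for _ in range(T_periods):
    eps = random.gauss(0.0, 1.0)   # demand noise
    v = max(-K, v - eps)           # accumulate positive discrepancies only
    path.append(v)

# how often honest play trips a hypothetical tolerance on the one-step
# increase of v (a "false alarm" of a trigger of the form theta(v) = v + c)
c = 2.5
alarms = sum(1 for t in range(1, T_periods) if path[t] - path[t - 1] >= c)
false_alarm_rate = alarms / (T_periods - 1)
```

Even honest play produces occasional large positive discrepancies, which is why the trigger threshold must tolerate some accumulation.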
i. denoted y.e. This new state variable will be an indicator of the prevailing mood of play. where we have denoted v = max{−K. not to provide a temptation to cheat to any player. This agreement policy is deﬁned by the function φ(v). if y = 0. y ∈ {0.3. both players use an agreed upon policy which determines their respective controls as a function of the state variable v.e. v + ρ(u − φ(v)) − ε} (6. For this agreement to be stable. when y = 0. j = 1. this state equation tells us that. i. This generates the expected payoffs Vjc . when y = 1.9) where the positive valued function θ : v → θ(v) is a design parameter of this monitoring scheme. The dynamic programming equation characterizing an equilibrium is given below W (v) = max{[α − δ − ρ(φ(v) + u)]u u +βP[v ≥ θ(v)]V c +βP[v < θ(v)]E[W (v )v < θ(v)]}. v(t + 1) < θ(v(t)) (6.10) . then y(t ) ≡ 0 for all periods t > t.6. APPLICATION TO A STOCHASTIC DUOPOLOY MODEL 95 If these discrepancies accumulate too fast. If y = 1 then the game is played cooperatively. then the the game is played in a noncooperative manner.e. i. To model the retaliation process we introduce another state variable.e. According to this state equation. both players use as a punishment (or retaliation) the static Cournot solution forever. the evidence of cheating is mounting and thus some retaliation should be expected. i. This state variable is assumed to evolve according to the following state equation y(t + 1) = 1 if y(t) = 1 and 0 otherwise. the cooperative mood of play will be maintained provided the cumulative positive discrepancies do not increase too fast from one period to the next. 1}. Note that the game is now a sequential Markov game with a continuous state space.7). Also. The expected payoff is then a function W (v) of this state variable v. 2 deﬁned in Eq. interpreted as a punitive or retaliatory mood of play. i. which is a binary variable. (6. once y(t) = 0. When the mood of play is cooperative. When the mood of play is noncooperative. 
one imposes that it be an equilibrium. In the models discussed later on we shall relax this assumption of everlasting punishment.e a punitive mood of play lasts forever. Since the two players are identical we shall not use the subscript j anymore.
x(t) This is the case where one does not monitor the cooperative policy through the use of a cumulated discrepancy function but rather on the basis of repeated identical tests. We have also used the following notation s = v + ρ(u − φ(v)). this problem has studied in the case where the (inverse) demand law is subject to a multiplicative noise. 1983].96 CHAPTER 6. φ(v)). (6. This is added to the conditional expected payoffs after the transition to either the punishment mood of play. To summarize the approach we introduce the operator (Tφ W )(v. In [Haurie & Tolwinski. using this approach. 1990] show that one can deﬁne. a subgame perfect equilibrium which dominates the repeated Cournot solution. NONZEROSUM MARKOV AND SEQUENTIAL GAMES the random value of the state variable v after the transition generated by the controls (u.10) we recognize the immediate reward [α −δ −ρ(φ(v) +u)]u of Player 1 when he plays u while the opponent sticks to φ(v).13) W (v) = (Tφ W )(v. The numerical experiments reported in [Haurie & Tolwinski. 1990]. u) = [α − δ − ρ(u + φ(v))] u + β(α − δ)2 +βW (−K)[1 − F (s − K)] θ(v) F (s − θ(v)) 9ρ(1 − β) (6. .12) (6. corresponding to the values y = 0 or the cooperative mood of play corresponding to y = 1. 1990] corresponds to the use of a quadratic function θ(·) and a Gaussian distribution law for ε. Also in Porter’s approach the punishment period is ﬁnite. Then one obtains an existence proof for a dominating equilibrium based on a simple onestep memory scheme where the variable v satisﬁes the following equation v(t + 1) = xe − x(t + 1) . φ(·)) such that W (v) = max Tφ (W )(v. as indicated in [Haurie & Tolwinski. In Eq. The case treated in [Haurie & Tolwinski. An equilibrium solution is a pair of functions (w(·).11) +β −K W (τ )f (s − τ ) dτ where F (·) and f (·) are the cumulative distribution function and the density probability function respectively of the random disturbance ε. φ(v)). 
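For a fixed agreed policy φ and trigger threshold θ, the fixed point W = Tφ W of (6.13) can be approximated by repeated application of the operator on a grid. The sketch below is a plain value-iteration illustration with Gaussian ε; all parameter values and the choices of φ and θ are illustrative assumptions, and the maximization (6.12) is not carried out here:

```python
import math

# Value-iteration sketch for W = T_phi W of (6.13), for a FIXED agreed
# policy phi and threshold theta (illustrative assumptions throughout).
alpha, gamma, rho, beta, K, sigma = 10.0, 1.0, 0.5, 0.9, 1.0, 1.0
delta = alpha - gamma
V_c = delta ** 2 / (9 * rho * (1 - beta))   # everlasting-punishment value

def phi(v):                                  # hypothetical agreed supply policy
    return delta / (4 * rho)

def theta(v):                                # hypothetical trigger threshold
    return v + 2.5

def F(z):                                    # cdf of the Gaussian disturbance eps
    return 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0))))

def f(z):                                    # density of eps
    return math.exp(-z * z / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

step = 0.1
grid = [-K + i * step for i in range(60)]

def interp(W, v):                            # piecewise-linear interpolation, clamped
    v = min(max(v, grid[0]), grid[-1])
    i = min(int((v - grid[0]) / step), len(grid) - 2)
    w = (v - grid[i]) / step
    return (1.0 - w) * W[i] + w * W[i + 1]

def T(W):
    out = []
    for v in grid:
        u = phi(v)                           # evaluate T_phi at the agreed action
        s = v + rho * (u - phi(v))           # reduces to s = v when u = phi(v)
        val = (delta - rho * (u + phi(v))) * u          # expected one-period profit
        val += beta * V_c * F(s - theta(v))             # switch to punishment mood
        val += beta * interp(W, -K) * (1.0 - F(s + K))  # v' floored at -K
        n, a, b = 60, -K, theta(v)           # interior transitions v' in (-K, theta(v))
        h = (b - a) / n
        val += beta * sum(interp(W, a + (j + 0.5) * h) * f(s - (a + (j + 0.5) * h)) * h
                          for j in range(n))
        out.append(val)
    return out

W = [0.0] * len(grid)
for _ in range(80):
    W = T(W)                                 # sup-norm contraction with modulus beta
```

Since Tφ is a β-contraction in the sup norm, the iterates converge geometrically; the actual computations reported in the text rely on policy improvement rather than this plain iteration.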
A solution of these DP equations can be found by solving an associated fixed-point problem. In [Haurie & Tolwinski, 1990] it is shown how an adaptation of the Howard policy improvement algorithm [Howard, 1960] permits the computation of the solution of this sort of fixed-point problem.
In [Haurie & Tolwinski, 1990] it is also shown that the approach can be extended to a full-fledged Markov game, i.e. a sequential game rather than a repeated game. A simple model of fisheries management was used in that work to illustrate this type of sequential game cooperative equilibrium.

6.3.3 Interpretation as a communication device

In our approach we retained a Markov game formalism for an extended game, obtained by extending the state space description (i.e. introducing the new variables v and y), and this has permitted us to use dynamic programming for the characterization of subgame perfect equilibria. The monitoring scheme is a communication device which receives as input the observation of the state of the system and sends as output a public signal suggesting which of two different moods of play to adopt. This is of course reminiscent of the concept of communication device considered in [Forges 1986] for repeated games and discussed in Part 1.

An easy extension of the approach described above would lead to random transitions between the two moods of play, with transition probabilities depending on the monitoring statistic v. A punishment of random duration is also possible in this model. In the next section we illustrate these features when we propose a differential game model with random moods of play.
Part III

Differential games
Chapter 7

Controlled dynamical systems

7.1 A capital accumulation process

A firm is producing a good with some equipment that we call its physical capital, or more simply its capital. At time t we denote by x(t) ∈ ℝ+ the amount of capital accumulated by the firm. In the parlance of dynamical systems, x(t) is the state variable. The firm accumulates capital through the investment process. Denote by u(t) ∈ ℝ+ the investment realised at time t; this corresponds to the control variable of the dynamical system. We also assume that the capital depreciates (wears out) at a fixed rate µ. Let ẋ(t) denote the derivative over time dx(t)/dt. The evolution over time of the accumulated capital is then described by a differential equation, called the state equation of the dynamical system,

    ẋ(t) = u(t) − µ x(t),    (7.1)
    x(t0) = x0,    (7.2)

the initial capital stock x0 ∈ ℝ+ being given. A capital accumulation path over a time interval [t0, tf] is therefore defined as the graph of the function x(·) : [t0, tf] → x(t) ∈ ℝ+; in the parlance of dynamical systems this will also be called a state trajectory. Once the firm has chosen an investment schedule u(·) : [t0, tf] → u(t) ∈ ℝ+, there exists a unique capital accumulation path, determined as the unique solution of the differential equation (7.1) with initial data (7.2). It is easy to check that, since the investment rates are never negative, the capital stock remains nonnegative (x(t) ∈ ℝ+).
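As a small sketch (with illustrative values), the state equation above can be integrated by an explicit Euler scheme and checked against its closed-form solution for a constant investment rate:

```python
import math

# Euler integration of x'(t) = u - mu*x(t) for a constant investment rate u,
# checked against the closed form x(t) = x0*e^{-mu t} + (u/mu)(1 - e^{-mu t}).
# Parameter values are illustrative assumptions.
mu, x0, u = 0.2, 1.0, 0.5
dt, T = 1e-3, 10.0

x = x0
for _ in range(int(T / dt)):
    x += dt * (u - mu * x)         # x: capital stock, u: investment control

x_exact = x0 * math.exp(-mu * T) + (u / mu) * (1 - math.exp(-mu * T))
```

With a nonnegative investment rate the simulated stock stays nonnegative, as noted in the text.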
7.2 State equations for controlled dynamical systems
In a more general setting, a controlled dynamical system is described by the state equation

    ẋ(t) = f(x(t), u(t), t),    (7.3)
    x(t0) = x0,    (7.4)
    u(t) ∈ U ⊂ ℝ^m,    (7.5)
where x ∈ ℝ^n is the state variable, u ∈ ℝ^m is the control variable constrained to remain in the set U ⊂ ℝ^m, f(·, ·, ·) : ℝ^n × ℝ^m × ℝ → ℝ^n is the velocity function and x0 ∈ ℝ^n is the initial state. Under some regularity conditions that we shall describe shortly, if the controller chooses a control function u(·) : [t0, tf] → u(t) ∈ U, there results a unique solution x(·) : [t0, tf] → x(t) ∈ ℝ^n of the differential equation (7.3) with initial condition (7.4). This solution x(·) is called the state trajectory generated by the control u(·) from the initial state x0. Indeed, one must impose some regularity conditions to ensure that the fundamental theorem of existence and uniqueness of solutions of a system of differential equations can be invoked. We give below the simplest set of conditions. There are more general ones that will be needed later on, but for the moment we shall content ourselves with the following.
7.2.1
Regularity conditions
1. The velocity function f(x, u, t) is C¹ in x and continuous in u and t.

2. The control function u(·) is piecewise continuous.
7.2.2
The case of stationary systems
If the velocity function does not depend explicitly on t we say that the system is stationary. The capital accumulation process of section 7.1 is an example of a stationary system.
7.2.3
The case of linear systems
A linear stationary system is described by a linear state equation

    ẋ(t) = A x(t) + B u(t),    (7.6)
    x(t0) = x0,    (7.7)

where A : n × n and B : n × m are given matrices. For linear systems we can give the explicit form of the solution to Eqs. (7.6)-(7.7):

    x(t) = Φ(t, t0) x0 + Φ(t, t0) ∫_{t0}^{t} Φ(s, t0)^{−1} B u(s) ds,    (7.8)

where Φ(t, t0) is the n × n transfer matrix that verifies

    (d/dt) Φ(t, t0) = A Φ(t, t0),    (7.9)
    Φ(t0, t0) = I_n.    (7.10)

Indeed, we can easily check in Eq. (7.8) that x(t0) = x0. Furthermore, differentiating Eq. (7.8) over time one obtains

    ẋ(t) = A Φ(t, t0) x0 + A Φ(t, t0) ∫_{t0}^{t} Φ(s, t0)^{−1} B u(s) ds + Φ(t, t0) Φ(t, t0)^{−1} B u(t)    (7.11)
         = A x(t) + B u(t).    (7.12)

Therefore the function defined by (7.8) satisfies the differential equation (7.6) with initial condition (7.7).
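A minimal numerical sketch of these formulas (a hand-picked 2 × 2 example, pure Python): for a stationary system Φ(t, t0) = e^{A(t−t0)}, and the solution formula (7.8) can be rewritten as x(t) = Φ(t)x0 + ∫_0^t Φ(t − s)Bu(s) ds. We approximate the matrix exponential by a truncated series and compare with a direct Euler simulation of the state equation:

```python
import math

def mul(M, N):  # 2x2 matrix product
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mv(M, v):   # 2x2 matrix times 2-vector
    return [sum(M[i][k] * v[k] for k in range(2)) for i in range(2)]

def expm(M, t, terms=40):
    # truncated series for exp(M t); adequate for the small matrices used here
    S = [[1.0, 0.0], [0.0, 1.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        P = mul(P, M)
        c = t ** k / math.factorial(k)
        S = [[S[i][j] + c * P[i][j] for j in range(2)] for i in range(2)]
    return S

A = [[0.0, 1.0], [-2.0, -3.0]]   # illustrative stable system
Bu = [0.0, 1.0]                   # B u(s) for the constant scalar control u = 1
x0 = [1.0, 0.0]
t = 1.0

# solution formula: x(t) = Phi(t) x0 + integral_0^t Phi(t - s) B u(s) ds
n = 1000
xs = mv(expm(A, t), x0)
for j in range(n):
    s = (j + 0.5) * t / n                   # midpoint quadrature
    term = mv(expm(A, t - s), Bu)
    xs = [xs[i] + (t / n) * term[i] for i in range(2)]

# direct Euler simulation of x' = A x + B u
x, dt = x0[:], 1e-4
for _ in range(int(t / dt)):
    Ax = mv(A, x)
    x = [x[i] + dt * (Ax[i] + Bu[i]) for i in range(2)]
```

The two computations of x(1) agree up to discretization error, and Φ(t0, t0) = I as required by (7.10).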
7.3
Feedback control and the stability issue
An interesting way to implement a control on a dynamical system described by the general state equation (7.3)-(7.4) is to define the control variable u(t) as a function σ(t, x(t)) of the current time t and state x(t). We then say that the control is defined by the feedback law σ(·, ·) : ℝ × ℝ^n → ℝ^m, or more directly that one uses the feedback control σ. The associated trajectory will then be the solution to the differential equation

    ẋ(t) = Fσ(t, x(t)),    (7.13)
    x(t0) = x0,    (7.14)

where we use the notation Fσ(t, x(t)) = f(x(t), σ(t, x(t)), t). We see that, if Fσ(·, ·) is regular enough, the feedback control σ will determine uniquely an associated trajectory emanating from x0.
104
CHAPTER 7. CONTROLLED DYNAMICAL SYSTEMS
7.3.1
Feedback control of stationary linear systems
Consider the linear system defined by the state equation (7.6)-(7.7) and suppose that one uses a linear feedback law

    u(t) = K(t) x(t),    (7.15)

where the matrix K(t) : m × n is called the feedback gain at time t. The controlled system is now described by the differential equation

    ẋ(t) = (A + B K(t)) x(t),    (7.16)
    x(t0) = x0.    (7.17)
7.3.2
Stabilizing a linear system with a feedback control
When a stationary system is observed on an infinite time horizon it may be natural to use a constant feedback gain K. Stability is an important qualitative property of dynamical systems observed over an infinite time horizon, that is, when tf → ∞.

Definition 7.3.1 The dynamical system (7.16)-(7.17) is

- asymptotically stable if

    lim_{t→∞} x(t) = 0;    (7.18)

- globally asymptotically stable if (7.18) holds true for any initial condition x0.
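A small simulation sketch (illustrative 2 × 2 system and gain, our own choices): the open-loop matrix A below is unstable, but a constant feedback gain K chosen so that A + BK has eigenvalues −1 and −2 drives the state to 0, consistent with Definition 7.3.1:

```python
# Stability sketch: A has eigenvalues +/- sqrt(2) (unstable); with the
# gain K below, A + B K = [[0, 1], [-2, -3]], eigenvalues -1 and -2,
# hence asymptotically stable. All numbers are illustrative assumptions.
A = [[0.0, 1.0], [2.0, 0.0]]
B = [0.0, 1.0]
K = [-4.0, -3.0]                    # constant feedback gain, u = K x

x = [1.0, 1.0]
dt = 1e-3
for _ in range(20_000):             # Euler simulation up to t = 20
    u = K[0] * x[0] + K[1] * x[1]
    dx = [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
          A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]
    x = [x[0] + dt * dx[0], x[1] + dt * dx[1]]

final_norm = (x[0] ** 2 + x[1] ** 2) ** 0.5
```

The closed-loop state norm decays essentially like e^{−t}, so after t = 20 it is numerically negligible.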
7.4 Optimal control problems

7.5 A model of optimal capital accumulation
Consider the system described in section 7.1, concerning a firm which accumulates a productive capital. The state equation describing this controlled accumulation process is (7.1)-(7.2),

    ẋ(t) = u(t) − µ x(t),    x(t0) = x0,

where x(t) and u(t) are the capital stock and the investment rate at time t, respectively. Assume that the output y(t) produced by the firm at time t is given by a production function

    y(t) = φ(x(t)).    (7.19)

Assume now that this output, considered as an homogeneous good, can be either consumed, at a rate c(t), or invested, at a rate u(t). The consumption is distributed as a profit to the owner of the firm. Thus one must have at any instant t

    c(t) + u(t) = y(t) = φ(x(t)).    (7.20)

Notice that we permit negative values for c(t); this would correspond to an operation of borrowing to invest more. The owner wants to maximise the total profit accumulated over the time horizon [t0, tf], which is given by the integral payoff

    J(x0; x(·), u(·)) = ∫_{t0}^{tf} c(t) dt = ∫_{t0}^{tf} [φ(x(t)) − u(t)] dt.    (7.21)

We have used the notation J(x0; x(·), u(·)) to emphasize the fact that the payoff depends on the initial capital stock and the investment schedule, which determines the capital accumulation path, that is, the trajectory x(·). The search for the optimal investment schedule can now be formulated as follows:

    max_{x(·), u(·)} ∫_{t0}^{tf} [φ(x(t)) − u(t)] dt    (7.22)
    s.t.
    ẋ(t) = u(t) − µ x(t)    (7.23)
    x(t0) = x0
    u(t) ≥ 0
    t ∈ [t0, tf].    (7.24)

Notice that, in this formulation, the optimization is performed over the set of admissible control functions and state trajectories that must satisfy the constraints given by the state equation, the initial state and the nonnegativity of the investment rates.

7.6 The optimal control paradigm

We consider the dynamical system introduced in section 7.2 with a criterion function

    J(t0, x0; u(·)) = ∫_{t0}^{tf} L(x(t), u(t), t) dt    (7.25)

that is to be maximised through the appropriate choice of an admissible control function. The optimisation problem (7.22)-(7.24) is an instance of the generic optimal control problem that we describe now:

    max_{x(·), u(·)} ∫_{t0}^{tf} L(x(t), u(t), t) dt    (7.26)
    s.t.
    ẋ(t) = f(x(t), u(t), t)    (7.27)
    x(t0) = x0    (7.28)
    u(t) ∈ U ⊂ ℝ^m
    t ∈ [t0, tf].    (7.29)

7.7 The Euler equations and the Maximum principle

We assume here that U = ℝ^m and that the functions f(x, u, t) and L(x, u, t) are both C¹ in x and u. Under these more stringent conditions we can derive necessary conditions for an optimal control by using some simple arguments from the calculus of variations. These conditions are also called Euler equations, as they are equivalent to the celebrated conditions of Euler for the classical variational problems.

Definition 7.7.1 The functions δx(·) and δu(·) are called variations locally compatible with the constraints if they satisfy

    δẋ(t) = fx(x(t), u(t), t) δx(t) + fu(x(t), u(t), t) δu(t)    (7.30)
    δx(t0) = 0.    (7.31)

In order to express that we only consider variations δx(·) and δu(·) that are compatible with the constraints, we say that δu(·) is chosen arbitrarily and δx(·) is computed according to the "variational equations" (7.30)-(7.31).

Lemma 7.7.1 If u*(·) and x*(·) are the optimal control and the associated optimal trajectory, then the differential at (x*(·), u*(·)) of the performance criterion must be equal to 0 for all variations that are locally compatible with the constraints.

The differential at (x*(·), u*(·)) of the integral performance criterion is given by

    δJ = ∫_{t0}^{tf} [Lx(x*(t), u*(t), t) δx(t) + Lu(x*(t), u*(t), t) δu(t)] dt.    (7.32)

Now we use a "trick". We introduce an auxiliary function λ(·) : [t0, tf] → ℝ^n, where λ(t) is of the same dimension as the state variable x(t) and will be called the costate variable. If Eq. (7.30) is verified, we may rewrite the expression (7.32) of δJ as follows:

    δJ = ∫_{t0}^{tf} [Lx δx(t) + Lu δu(t) + λ(t) {−δẋ(t) + fx δx(t) + fu δu(t)}] dt,    (7.33)

where, for brevity, we don't write anymore the arguments (x*(t), u*(t), t). We integrate by parts the term in λ(t) δẋ(t) and obtain

    ∫_{t0}^{tf} λ(t) δẋ(t) dt = λ(tf) δx(tf) − λ(t0) δx(t0) − ∫_{t0}^{tf} λ̇(t) δx(t) dt.    (7.34)

We can thus rewrite Eq. (7.33) as follows:

    δJ = −λ(tf) δx(tf) + λ(t0) δx(t0) + ∫_{t0}^{tf} [{Lx + λ(t) fx + λ̇(t)} δx(t) + {Lu + λ(t) fu} δu(t)] dt.    (7.35)

Let us introduce the Hamiltonian function

    H(λ, x, u, t) = L(x, u, t) + λ f(x, u, t).

We can then rewrite Eq. (7.35) as follows:

    δJ = −λ(tf) δx(tf) + λ(t0) δx(t0) + ∫_{t0}^{tf} [{Hx(λ(t), x*(t), u*(t), t) + λ̇(t)} δx(t) + Hu(λ(t), x*(t), u*(t), t) δu(t)] dt.    (7.36)

As we can choose the function λ(·) as we like, we may impose that it satisfy the following adjoint variational equations:

    λ̇(t) = −Hx(λ(t), x*(t), u*(t), t)    (7.37)
    λ(tf) = 0.    (7.38)

With this particular choice of λ(·) the differential δJ writes simply as

    δJ = λ(t0) δx(t0) + ∫_{t0}^{tf} Hu(λ(t), x*(t), u*(t), t) δu(t) dt.    (7.39)

Since δx(t0) = 0 and δu(t) is arbitrary, δJ = 0 for all variations locally compatible with the constraints only if

    Hu(λ(t), x*(t), u*(t), t) ≡ 0.    (7.40)

We can summarise Eqs. (7.37)-(7.40) in the following theorem.
Theorem 7.7.1 If u*(·) and x*(·) are the optimal control and the associated optimal trajectory, then there exists a function λ(·) : [t0, tf] → ℝ^n which satisfies the adjoint variational equations

    λ̇(t) = −Hx(λ(t), x*(t), u*(t), t),    t ∈ [t0, tf],    (7.41)
    λ(tf) = 0,    (7.42)

and such that

    Hu(λ(t), x*(t), u*(t), t) ≡ 0,    t ∈ [t0, tf],    (7.43)

where the hamiltonian function H is defined by

    H(λ, x, u, t) = L(x, u, t) + λ f(x, u, t).    (7.44)

These necessary optimality conditions, called the Euler equations, have been generalized by Pontryagin [?] to dynamical systems where the velocity f(x, u, t) and the payoff rate L(x, u, t) are C¹ in x, continuous in u and t, and where the control constraint set U is compact. The following theorem is the famous "Maximum principle" of the theory of optimal control.

Theorem 7.7.2 If u*(·) and x*(·) are the optimal control and the associated optimal trajectory, then there exists a function λ(·) : [t0, tf] → ℝ^n which satisfies the adjoint variational equations

    λ̇(t) = −Hx(λ(t), x*(t), u*(t), t),    t ∈ [t0, tf],    (7.45)
    λ(tf) = 0,    (7.46)

and such that

    H(λ(t), x*(t), u*(t), t) = max_{u∈U} H(λ(t), x*(t), u, t),    t ∈ [t0, tf],    (7.47)

where the hamiltonian function H is defined as in Eq. (7.44).
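The maximum principle turns the optimal control problem into a two-point boundary value problem: the state runs forward from x(t0), the costate backward from λ(tf) = 0. The sketch below illustrates this on a toy linear-quadratic example of our own (maximize ∫_0^1 −(x² + u²) dt with ẋ = u, x(0) = 1; then Hu = 0 gives u = λ/2 and λ̇ = −Hx = 2x), solved by shooting on λ(0) with bisection:

```python
import math

# Shooting sketch for a toy problem (an illustrative example, not from the
# text): maximize integral of -(x^2 + u^2) on [0, 1], x' = u, x(0) = 1.
# H = -(x^2 + u^2) + lam*u, so H_u = 0 gives u = lam/2 and lam' = 2x,
# with the terminal condition lam(1) = 0.

def shoot(lam0, n=2000):
    # integrate x' = lam/2, lam' = 2x with Heun's method, x(0) = 1
    x, lam, h = 1.0, lam0, 1.0 / n
    for _ in range(n):
        dx1, dl1 = lam / 2.0, 2.0 * x
        dx2, dl2 = (lam + h * dl1) / 2.0, 2.0 * (x + h * dx1)
        x += h * (dx1 + dx2) / 2.0
        lam += h * (dl1 + dl2) / 2.0
    return lam                       # value of lambda(1) for this guess

lo, hi = -3.0, 0.0                   # shoot() is increasing in lam0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if shoot(mid) < 0.0:
        lo = mid
    else:
        hi = mid
lam0 = (lo + hi) / 2.0               # analytic answer: -2*tanh(1)
```

For this linear problem the boundary value problem can be solved in closed form (λ(0) = −2 tanh 1), which makes the shooting result easy to verify.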
It can be shown that the costate variable λ(t) indicates the sensitivity of the performance criterion to a marginal variation of the state x(t). It can therefore be interpreted as a marginal value of the state variable. Along the optimal trajectory, the Hamiltonian is therefore composed of two "values": the current payoff rate L(x*(t), u*(t), t) and the value λ(t) f(x*(t), u*(t), t) of the current state modification. It is this global value that is maximised with respect to u at any point of the optimal trajectory. The adjoint variational equations express the evolution of the marginal value of the state variables along the trajectory.

7.9 Synthesis of the optimal control

7.10 Dynamic programming and the optimal feedback control

Theorem 7.10.1 Assume that there exist a function V*(·, ·) : ℝ × ℝ^n → ℝ and a feedback law σ*(t, x) that satisfy the following functional equations:

    −(∂/∂t) V*(t, x) = max_{u∈U} H((∂/∂x) V*(t, x), x, u, t), ∀t, x    (7.48)
                     = H((∂/∂x) V*(t, x), x, σ*(t, x), t), ∀t, x    (7.49)
    V*(tf, x) = 0, ∀x.    (7.50)

Then V*(ti, xi) is the optimal value of the performance criterion for the optimal control problem defined with initial data x(ti) = xi, ti ∈ [t0, tf]. Furthermore the solution of the maximisation in the R.H.S. of Eq. (7.48) defines the optimal feedback control σ*(t, x).

Remark 7.10.1 Notice that the partial derivative (∂/∂x) V*(t, x) appearing in the Hamiltonian in the R.H.S. of Eq. (7.48) is consistent with the interpretation of the costate variable λ(t) as the sensitivity of the optimal payoff to marginal state variations.
7.11 Competitive dynamical systems

7.12 Competition through capital accumulation

7.13 Open-loop differential games

    Jj(t0, x0; u1(·), ..., um(·)) = ∫_{t0}^{tf} Lj(x(t), u1(t), ..., um(t), t) dt    (7.51)
    s.t.
    ẋ(t) = f(x(t), u1(t), ..., um(t), t)    (7.52)
    x(t0) = x0    (7.53)
    uj(t) ∈ Uj ⊂ ℝ^{mj}
    t ∈ [t0, tf].    (7.54)

7.13.1 Open-loop information structure

Each player knows the initial data t0, x0 and all the other elements of the dynamical system. Player j selects a control function uj(·) : [t0, tf] → Uj; the game is played as a single simultaneous move.

7.13.2 An equilibrium principle

Theorem 7.13.1 If u*(·) and x*(·) are the Nash equilibrium controls and the associated optimal trajectory, then there exist m functions λj(·) : [t0, tf] → ℝ^n which satisfy the adjoint variational equations

    λ̇j(t) = −Hjx(λj(t), x*(t), u*(t), t),    t ∈ [t0, tf],    (7.55)
    λj(tf) = 0,    (7.56)

and such that

    Hj(λj(t), x*(t), u*(t), t) = max_{uj∈Uj} Hj(λj(t), x*(t), [u*−j, uj], t),    t ∈ [t0, tf],    (7.57)

where the hamiltonian function Hj is defined as follows:

    Hj(λj, x, u, t) = Lj(x, u, t) + λj f(x, u, t).    (7.58)
7.14 Feedback differential games

7.14.1 Feedback information structure

Player j can observe time and state (t, x(t)). The game is played as a single simultaneous move where each player announces the strategy he will use, i.e. he defines his control as the result of a feedback law σj(t, x(t)).

7.14.2 A verification theorem

Theorem 7.14.1 Assume that there exist m functions Vj*(·, ·) : ℝ × ℝ^n → ℝ and m feedback strategies σj*(t, x) that satisfy the following functional equations:

    −(∂/∂t) Vj*(t, x) = max_{uj∈Uj} Hj((∂/∂x) Vj*(t, x), x, [σ*−j(t, x), uj], t),    t ∈ [t0, tf], x ∈ ℝ^n,    (7.59)
    Vj*(tf, x) = 0.    (7.60)

Then Vj*(ti, xi) is the equilibrium value of the performance criterion of Player j for the feedback Nash differential game defined with initial data x(ti) = xi, ti ∈ [t0, tf]. Furthermore the solution of the maximisation in the R.H.S. of Eq. (7.59) defines the equilibrium feedback control of Player j.

7.15 Why are feedback Nash equilibria outcomes different from open-loop Nash outcomes?

7.16 The subgame perfectness issue

7.17 Memory differential games

7.18 Characterizing all the possible equilibria
Part IV

A Differential Game Model
In this last part of our presentation we propose a stochastic differential game model of economic competition through technological innovation (R&D competition), and we show how it is connected to the theory of Markov and/or sequential games discussed in Part 2. In control theory, an interesting class of stochastic systems has been studied, where the random perturbations appear as controlled jump processes. In [?], [?] the relationship between these control models and discrete event dynamic programming models is developed. In [?] a first adaptation of this approach to the case of dynamic stochastic games has been proposed. Below we illustrate this class of games through a model of competition through investment in R&D. This example deals with a stochastic game where the random disturbances are modelled as a controlled jump process.

7.19 A Game of R&D Investment

7.19.1 Dynamics of R&D competition

We propose here a model of competition through R&D which extends earlier formulations which can be found in [?]. Consider m firms competing on a market with differentiated products. Each firm can invest in R&D for the purpose of making a technological advance which could provide it with a competitive advantage. The competitive advantage and product differentiation concepts will be discussed later; we focus here on the description of the R&D dynamics.

Let xj be the level of accumulation of R&D capital by firm j. The state equation describing the capital accumulation process is

    ẋj(t) = uj(t) − µj xj(t),    xj(0) = x0j,    (7.61)

where the control variable uj ∈ Uj gives the investment rate in R&D and µj is the depreciation rate of R&D capital. We represent firm j's policy of R&D investment as a function uj(·) : [0, ∞) → Uj. Denote Uj the class of admissible functions uj(·). With an initial condition xj(0) = x0j and a piecewise continuous control function uj(·) ∈ Uj is associated a unique evolution xj(·) of the capital stock, solution of equation (7.61).

The R&D capital brings innovation through a discrete event stochastic process. Let T be a random stopping time which defines the date at which the advance takes place. Consider first the case with a single firm j. The probability of having a technological advance occurring between times t and t + dt is given by

    P_{uj(·)}[t < T < t + dt] = ωj(xj(t)) e^{−∫_0^t ωj(xj(s)) ds} dt + o(dt),    (7.62)

where o(dt)/dt → 0 as dt → 0, uniformly in xj. The function ωj(xj) represents the controlled intensity of the jump process which describes the innovation. The elementary conditional probability is given by

    P_{uj(·)}[T ∈ (t, t + dt) | T > t] = ωj(xj(t)) dt + o(dt).    (7.63)

Therefore the probability of having the occurrence before time τ is given by

    P_{uj(·)}[T ≤ τ] = ∫_0^τ ωj(xj(t)) e^{−∫_0^t ωj(xj(s)) ds} dt.    (7.64)

Equations (7.62)-(7.64) define the dynamics of innovation occurrences for firm j. Since there are m firms, the combined jump rate will be

    ω(x(t)) = Σ_{j=1}^{m} ωj(xj(t)).

Given that a jump occurs at time τ, the conditional probability that the innovation comes from firm j is given by

    ωj(xj(τ)) / ω(x(τ)).

The impact of innovation on the competitive edge of the firm is now discussed.

7.19.2 Product Differentiation

We model the market with differentiated products as in [?]. We assume that each firm j markets a product characterized by a quality index qj ∈ ℝ and a price pj ∈ ℝ. Consider first a static situation where N consumers are buying this type of good. It is assumed that the (indirect) utility of a consumer buying variant j is

    Ṽj = y − pj + θ qj + εj,    (7.65)

where θ is the valuation of quality and the εj are i.i.d. according to the double exponential distribution, with standard deviation µ (see [?] for details). Then the demand for variant j is given by

    D̃j = N Pj,    (7.66)

where

    Pj = exp[(θqj − pj)/µ] / Σ_{i=1}^{m} exp[(θqi − pi)/µ],    j = 1, ..., m.    (7.67)
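A minimal numerical sketch of the demand system (7.66)-(7.67); the qualities, prices, θ and µ below are illustrative values of our own choosing:

```python
import math

# Multinomial logit demand: shares P_j and demands D_j = N * P_j for
# three hypothetical variants (all numbers are illustrative assumptions).
N, theta, mu = 1000.0, 1.0, 0.5
q = [1.0, 1.2, 0.8]                 # quality indices q_j
p = [2.0, 2.3, 1.8]                 # prices p_j

w = [math.exp((theta * qj - pj) / mu) for qj, pj in zip(q, p)]
P = [wj / sum(w) for wj in w]       # market shares P_j
D = [N * Pj for Pj in P]            # demand rates D_j = N * P_j
```

The shares sum to one by construction, and the variant with the lowest net utility θq − p receives the smallest share.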
(7. ω(x(τ )) The impact of innovation on the competitive hedge of the ﬁrm is now discussed. m. . .19.627.2 Product Differentiation We model the market with differentiated products as in [?]. The probability of having a technological advance occuring between times t and t + dt is given by Puj (·) [t < T < t + dt] = ωj (xj (t))e− t 0 ωj (xj (s))ds dt + o(dt) (7. according to the double exponential distribution (see [?] for details).. Then the demand for variant j is given by ˜ Dj = N Pj (7.62) where limdt→0 o(dt) = 0 uniformly in xj . The elementary conditional probability is given by Puj (·) [T ∈ (t. It is assumed that the (indirect) utility of a consumer buying variant j is ˜ Vj = y − pj + θqj + εj . exp[(θqi − pi )/µ] j = 1. R R Consider ﬁrst a static situation where N consumers are buying this type of goods.63) Therefore the probability of having the occurence before time τ is given by Puj (·) [T ≤ τ ] = τ 0 ωj (xj (t))e− t 0 ωj (xj (s))ds dt. (7. t + dt)T > t] = ωj (xj (t))dt + o(dt).66) where Pj = exp[(θqj − pj )/µ] . We assume that each ﬁrm j markets a product characterized by a quality index qj ∈ I and a price pj ∈ I . .65) where θ is the valuation of quality and the εj are i.116 Consider ﬁrst the case with a single ﬁrm j.i. with standard deviation µ.64) deﬁne the dynamics of innovation occurences for ﬁrm j. . the conditionnal probability that the innovation come from ﬁrm j is given by ωj (xj (τ )) . (7.67) m i=1 . The function ωj (xj ) represents the controlled dt intensity of the jump process which describes the innovation. Since there are m ﬁrms the combined jump rate will be m ω(x(t)) = j=1 ωj (xj (t)). Given that a jump occurs at time τ .d. (7.64) Equations (7. 7.
Quality improvement. When a firm j makes a technological advance, the quality index of its variant increases by some amount. We use a random variable ∆qj(τ) to represent this quality improvement. The probability distribution of this random variable, specified by the cumulative probability function Fj(·, xj), is supposed to also depend on the current amount of know-how, indicated by the capital xj(τ). The higher this stock of capital is, the higher will be the probability given to important quality gains.

Remark 7.19.1 We implicitly assume that the firm which benefits from a technological innovation will use that advantage. We could extend the formalism to the case where a firm stores its technological advance and delays its implementation.

Adjustment cost. Let Φj(xj) be a monotone decreasing function of xj which represents the adjustment cost for taking advantage of the advance. The higher the capital xj at the time of the innovation, the lower will be the adjustment cost.

Production costs. Assume that the marginal production cost cj for firm j is constant and does not vary at the time of an innovation.

R&D operations and investment costs. Let Lj(xj, uj) be a function which defines the cost rate of R&D operations and investment for firm j.
7.20 Information structure

We assume that the firms observe the state of the game at each occurrence of the discrete event which represents a technological advance.

7.20.1 State variables

This system is an instance where two dynamics, a fast one and a slow one, are combined, as shown in the definition of the state variables.

Fast dynamics. The levels of capital accumulation xj(t), j = 1, ..., m define the fast moving state of this system. These levels change according to the differential state equations (7.61).

Slow dynamics. The quality levels qj(t), j = 1, ..., m define a slow moving state. These levels change according to the random jump process defined by the jump rates (7.62): if firm j makes a technological advance at time τ, then the quality index changes abruptly,

    qj(τ) = qj(τ−) + ∆qj(τ).

7.20.2 Piecewise open-loop game

Assume that at each random time τ of technological innovation all firms observe the state of the game, i.e. the vector s(τ) = (x(τ), q(τ)) = (xj(τ), qj(τ))_{j=1,...,m}. Then each firm j selects a price schedule pj(·) : [τ, ∞) → ℝ and an R&D investment schedule uj(·) : [τ, ∞) → ℝ+, which are defined as open-loop controls. These controls will be used until the next discrete event (technological advance) occurs. We are thus in the context of piecewise deterministic games as introduced in [?].

7.20.3 A Sequential Game Reformulation

In [?] it is shown how piecewise deterministic games can be studied as particular instances of sequential games with Borel state and action spaces. The fundamental element of a sequential game is the so-called local reward functional, which permits the definition of the dynamic programming operators.
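The date of the next technological event, driving the slow dynamics, can be sampled by integrating the hazard ωj(xj(t)) along the fast capital trajectory and inverting it against a unit exponential draw. All functional forms and numbers below, including the intensity ω(x) = 0.3x, are illustrative assumptions:

```python
import math
import random

# Sampling sketch for the innovation date T of a single firm:
# P[T <= tau] = 1 - exp(-integral_0^tau omega(x(t)) dt), sampled by
# drawing a unit exponential and advancing until the cumulative hazard
# reaches it. All numbers are illustrative assumptions.
random.seed(1)
mu, u, x0 = 0.2, 0.5, 1.0

def omega(x):                 # hypothetical controlled jump intensity
    return 0.3 * x

def capital(t):
    # closed-form solution of x' = u - mu*x for a constant investment u
    return x0 * math.exp(-mu * t) + (u / mu) * (1.0 - math.exp(-mu * t))

def sample_T(dt=5e-3, t_max=200.0):
    e = -math.log(1.0 - random.random())   # unit exponential draw
    H, t = 0.0, 0.0
    while H < e and t < t_max:
        H += omega(capital(t)) * dt        # cumulative hazard
        t += dt
    return t

draws = [sample_T() for _ in range(1000)]
mean_T = sum(draws) / len(draws)
```

Since more R&D capital raises the intensity, higher investment shortens the expected waiting time to the next advance in this sketch.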
Let vj(s) be the expected reward-to-go for Player j when the system is in initial state s. The local reward functional describes the total expected payoff for firm j when all firms play according to the open-loop controls pj(·) : [τ, ∞) → ℝ and uj(·) : [τ, ∞) → ℝ+ until the next technological event occurs, the subsequent rewards being defined by the vj(s) function. We can write

    hj(s; vj(·), u(·), p(·)) = E_s [ ∫_0^τ e^{−ρt} { D̃j(t)(pj(t) − cj) − Lj(xj(t), uj(t)) } dt + e^{−ρτ} vj(s(τ)) ],    (7.68)

where ρ is the discount rate, τ is the random time of the next technological event and s(τ) is the new random state reached right after the occurrence of the next technological event. From the expression (7.68) we see immediately that:

1. The price schedule pj(t), j = 1, ..., m, t ≥ τ can be taken as a constant (until the next technological event) and is actually given by the solution of the static oligopoly game with differentiated products of qualities qj, j = 1, ..., m. The solution of this oligopoly game is discussed in [?], where it is shown to be uniquely defined.

2. The R&D investment schedule is obtained as a tradeoff between the random (due to the random stopping time τ) transition cost given by ∫_0^τ e^{−ρt} Lj(xj(t), uj(t)) dt and the expected reward-to-go after the next technological event.

3. We are in a sequential game formalism, with continuous state space and functional sets as action spaces (∞-horizon controls).

The sequential game format does not lend itself easily to qualitative analysis via analytical solutions. A numerical analysis seems more appropriate to explore the consequences of these economic assumptions. We shall limit our discussion to this definition of the model and its reformulation as a sequential game in Borel spaces. The reader will find in [?] more information about the numerical approximation of solutions for this type of games.
Bibliography

[Alj & Haurie, 1983] A. Alj and A. Haurie, Dynamic Equilibria in Multigeneration Stochastic Games, IEEE Trans. Autom. Control, Vol. AC-28, 1983, pp. 193-203.

[Aumann, 1974] R.J. Aumann, Subjectivity and correlation in randomized strategies, Journal of Mathematical Economics, Vol. 1, 1974, pp. 67-96.

[Aumann, 1989] R.J. Aumann, Lectures on Game Theory, Westview Press, Boulder etc., 1989.

[Basar, 1989] T. Basar, A Dynamic Games Approach to Controller Design: Disturbance Rejection in Discrete Time, in Proceedings of the 28th Conference on Decision and Control, IEEE, Tampa, Florida, 1989.

[Basar, 1989] T. Basar, Time consistency and robustness of equilibria in noncooperative dynamic games, in F. van der Ploeg and A.J. de Zeeuw eds., Dynamic Policy Games in Economics, North Holland, Amsterdam, 1989.

[Basar & Olsder, 1982] T. Basar and G.J. Olsder, Dynamic Noncooperative Game Theory, Academic Press, New York, 1982.

[Bertsekas, 1987] D. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models, Prentice Hall, Englewood Cliffs, New Jersey, 1987.

[Brooke et al., 1992] A. Brooke, D. Kendrick and A. Meeraus, GAMS, Release 2.25, A User's Guide, Scientific Press/Duxbury Press, 1992.

[Cournot, 1838] A. Cournot, Recherches sur les principes mathématiques de la théorie des richesses, Librairie des sciences politiques et sociales, Paris, 1838.

[Denardo, 1967] E.V. Denardo, Contraction Mappings in the Theory Underlying Dynamic Programming, SIAM Rev., Vol. 9, 1967, pp. 165-177.

[Ferris & Munson, 1994] M.C. Ferris and T.S. Munson, Interfaces to PATH 3.0, to appear in Computational Optimization and Applications.

[Ferris & Pang, 1997a] M.C. Ferris and J.S. Pang, Engineering and economic applications of complementarity problems, SIAM Review, Vol. 39, 1997, pp. 669-713.

[Ferris & Pang, 1997b] M.C. Ferris and J.S. Pang (eds), Complementarity and Variational Problems: State of the Art, SIAM, 1997.

[Filar & Tolwinski, 1991] J.A. Filar and B. Tolwinski, On the Algorithm of Pollatschek and Avi-Itzhak, in T.E.S. Raghavan et al. (eds), Stochastic Games and Related Topics, Kluwer Academic Publishers, 1991.

[Filar & Vrieze, 1997] J.A. Filar and K. Vrieze, Competitive Markov Decision Processes, Springer-Verlag, New York, 1997.

[Forges, 1986] F. Forges, An Approach to Communication Equilibria, Econometrica, Vol. 54, 1986, pp. 1375-1385.

[Fourer et al., 1993] R. Fourer, D.M. Gay and B.W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, Scientific Press/Duxbury Press, 1993.

[Friedman, 1977] J.W. Friedman, Oligopoly and the Theory of Games, North-Holland, Amsterdam, 1977.

[Friedman, 1986] J.W. Friedman, Game Theory with Economic Applications, Oxford University Press, Oxford, 1986.

[Fudenberg & Tirole, 1991] D. Fudenberg and J. Tirole, Game Theory, The MIT Press, Cambridge, Massachusetts, 1991.

[Green & Porter, 1985] E.J. Green and R.H. Porter, Noncooperative collusion under imperfect price information, Econometrica, Vol. 52, 1984, pp. 87-100.

[Harsanyi, 1967-68] J.C. Harsanyi, Games with incomplete information played by Bayesian players, Management Science, Vol. 14, 1967-68, pp. 159-182, 320-334, 486-502.

[Haurie & Tolwinski, 1985a] A. Haurie and B. Tolwinski, Acceptable Equilibria in Dynamic Bargaining Games, Large Scale Systems, Vol. 6, 1984, pp. 73-89.

[Haurie & Tolwinski, 1985b] A. Haurie and B. Tolwinski, Definition and Properties of Cooperative Equilibria in a Two-Player Game of Infinite Duration, Journal of Optimization Theory and Applications, Vol. 46, 1985, pp. 525-534.

[Haurie & Tolwinski, 1990] A. Haurie and B. Tolwinski, Cooperative Equilibria in Discounted Stochastic Sequential Games, Journal of Optimization Theory and Applications, Vol. 64, 1990, pp. 511-535.

[Howard, 1960] R.A. Howard, Dynamic Programming and Markov Processes, MIT Press, Cambridge, Mass., 1960.

[Kuhn, 1953] H.W. Kuhn, Extensive games and the problem of information, in H.W. Kuhn and A.W. Tucker eds., Contributions to the Theory of Games, Annals of Mathematics Studies No. 28, Princeton University Press, Princeton, New Jersey, 1953, pp. 193-216.

[Lemke, 1965] C.E. Lemke, Bimatrix equilibrium points and mathematical programming, Management Sci., Vol. 11, 1965, pp. 681-689.

[Lemke & Howson, 1964] C.E. Lemke and J.T. Howson, Equilibrium points of bimatrix games, J. Soc. Indust. Appl. Math., Vol. 12, 1964, pp. 413-423.

[Luenberger, 1969] D.G. Luenberger, Optimization by Vector Space Methods, Wiley & Sons, New York, 1969.

[Luenberger, 1979] D.G. Luenberger, Introduction to Dynamic Systems: Theory, Models & Applications, Wiley & Sons, New York, 1979.

[Mangasarian and Stone, 1964] O.L. Mangasarian and H. Stone, Two-Person Nonzero-Sum Games and Quadratic Programming, J. Math. Anal. Applic., Vol. 9, 1964, pp. 348-355.

[Merz, 1976] A.W. Merz, The Game of Identical Cars, in: G. Leitmann ed., Multicriteria Decision Making and Differential Games, Plenum Press, New York and London, 1976.

[Moulin & Vial, 1978] H. Moulin and J.-P. Vial, Strategically Zero-Sum Games: The Class of Games whose Completely Mixed Equilibria Cannot be Improved Upon, International Journal of Game Theory, Vol. 7, 1978, pp. 201-221.

[Murphy, Sherali & Soyster, 1982] F.H. Murphy, H.D. Sherali and A.L. Soyster, A mathematical programming approach for determining oligopolistic market equilibrium, Mathematical Programming, Vol. 24, 1982, pp. 92-106.

[Nash, 1951] J.F. Nash, Noncooperative games, Annals of Mathematics, Vol. 54, 1951, pp. 286-295.

[Nowak, 1985] A.S. Nowak, Existence of Equilibrium Stationary Strategies in Discounted Noncooperative Stochastic Games with Uncountable State Space, J. Optim. Theory Appl., Vol. 45, 1985, pp. 591-602.

[Nowak, 1987] A.S. Nowak, Nonrandomized Strategy Equilibria in Noncooperative Stochastic Games with Additive Transition and Reward Structures, J. Optim. Theory Appl., Vol. 52, 1987, pp. 429-441.

[Nowak, 1993] A.S. Nowak, Stationary Equilibria for Nonzero-Sum Average Payoff Ergodic Stochastic Games with General State Space, Annals of the International Society of Dynamic Games, Vol. 1, Birkhauser, 1993.

[Nowak, 199?] A.S. Nowak and T.E.S. Raghavan, Existence of Stationary Correlated Equilibria with Symmetric Information for Discounted Stochastic Games, Mathematics of Operations Research, to appear.

[Owen, 1982] G. Owen, Game Theory, Academic Press, New York, 1982.

[Parthasarathy & Sinha, 1989] T. Parthasarathy and S. Sinha, Existence of Stationary Equilibrium Strategies in Nonzero-Sum Discounted Stochastic Games with Uncountable State Space and State Independent Transitions, International Journal of Game Theory, Vol. 18, 1989, pp. 189-194.

[Petit, 1990] M.L. Petit, Control Theory and Dynamic Games in Economic Policy Analysis, Cambridge University Press, Cambridge etc., 1990.

[Porter, 1983] R.H. Porter, Optimal Cartel Trigger Strategies, Journal of Economic Theory, Vol. 29, 1983, pp. 313-338.

[Radner, 1980] R. Radner, Collusive Behavior in Noncooperative ε-Equilibria of Oligopolies with Long but Finite Lives, Journal of Economic Theory, Vol. 22, 1980, pp. 136-154.

[Radner, 1981] R. Radner, Monitoring Cooperative Agreements in a Repeated Principal-Agent Relationship, Econometrica, Vol. 49, 1981, pp. 1127-1148.

[Radner, 1985] R. Radner, Repeated Principal-Agent Games with Discounting, Econometrica, Vol. 53, 1985, pp. 1173-1198.

[Rosen, 1965] J.B. Rosen, Existence and Uniqueness of Equilibrium Points for Concave n-Person Games, Econometrica, Vol. 33, 1965, pp. 520-534.

[Rutherford, 1995] T.F. Rutherford, Extensions of GAMS for complementarity problems arising in applied economic analysis, Journal of Economic Dynamics and Control, Vol. 19, 1995, pp. 1299-1324.

[Selten, 1975] R. Selten, Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games, International Journal of Game Theory, Vol. 4, 1975, pp. 25-55.

[Shapley, 1953] L.S. Shapley, Stochastic Games, Proceedings of the National Academy of Sciences, Vol. 39, 1953, pp. 1095-1100.

[Shubik, 1975a] M. Shubik, Games for Society, Business and War, Elsevier, New York etc., 1975.

[Shubik, 1975b] M. Shubik, The Uses and Methods of Gaming, Elsevier, New York etc., 1975.

[Simaan & Cruz, 1976a] M. Simaan and J.B. Cruz, On the Stackelberg Strategy in Nonzero-Sum Games, in: G. Leitmann ed., Multicriteria Decision Making and Differential Games, Plenum Press, New York and London, 1976.

[Simaan & Cruz, 1976b] M. Simaan and J.B. Cruz, Additional Aspects of the Stackelberg Strategy in Nonzero-Sum Games, in: G. Leitmann ed., Multicriteria Decision Making and Differential Games, Plenum Press, New York and London, 1976.

[Sobel, 1971] M.J. Sobel, Noncooperative Stochastic Games, Annals of Mathematical Statistics, Vol. 42, 1971, pp. 1930-1935.

[Tolwinski, 1988] B. Tolwinski, Introduction to Sequential Games with Applications, Discussion Paper, Victoria University of Wellington, Wellington, 1988.

[Tolwinski, Haurie & Leitmann, 1986] B. Tolwinski, A. Haurie and G. Leitmann, Cooperative Equilibria in Differential Games, Journal of Mathematical Analysis and Applications, Vol. 119, 1986, pp. 182-202.

[Von Neumann, 1928] J. von Neumann, Zur Theorie der Gesellschaftsspiele, Math. Annalen, Vol. 100, 1928, pp. 295-320.

[Von Neumann & Morgenstern, 1944] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, 1944.

[Whitt, 1980] W. Whitt, Representation and Approximation of Noncooperative Sequential Games, SIAM J. Control, Vol. 18, 1980, pp. 33-48.

[Whittle, 1982] P. Whittle, Optimization Over Time: Dynamic Programming and Stochastic Control, Wiley, Chichester, 1982.