
IO Class Notes: Multiple Agent Dynamics;

An Intro to Markov Perfect Equilibria


(the Early Maskin-Tirole Papers).
Ariel Pakes
November 16, 2009
Introduction.
These three papers explain what an MPE is. To do so in a way that allows them to provide a detailed analysis of behavior, they make a series of restrictions. The restrictions are both on the nature of behavior and on the functional forms of primitives. If the restrictions they use do not hold, we can usually still obtain an MPE (see their 1997 articles), but the analysis of behavior will generally require a more detailed specification of primitives and numerical analysis. In addition to functional form restrictions, they assume:
1. Decisions are not made simultaneously by all agents; rather there are alternating moves. In period one the first agent chooses its strategy and the second agent is constrained to the implications of history. In period two the second agent moves but not the first, etc. The agent's current strategy then will, in addition to having an effect on current net cash flow, determine the state which the competitor will have to deal with in the next period. The control is chosen strategically, with the response of the competitor in mind.
2. Strategies are restricted to be a function only of the current levels of payoff relevant state variables, where payoff relevant state variables are defined as variables which determine either current demand or current cost conditions (of some actor).
Alternating Moves.
An example of an institutional setup that might be described as an alternating move game is a game between a regulator and a regulatee. On the other hand, it is hard to think of alternating move games that describe the nature of the interaction of firms in a single industry.
There are good economic reasons for decisions to imply some form of commitment by the decision maker. However, we generally think of forms of commitment which are much smoother than the alternating move games would imply; one can change the decision, but changes cost something. These kinds of smooth commitments are embodied in models with convex adjustment costs, and simultaneous move games with convex adjustment costs are discussed in Judd (Cournot vs. Bertrand) and in the price commitment version of the Ericson-Pakes model in Nakamura (2006). Both of these (later) papers, however, get very little in the way of analytic results, and have to rely heavily on numerical analysis.
The alternating move assumption does, however, simplify greatly. The simplification is even greater if we assume, in addition, that the games are finite horizon, for then it is straightforward to prove that there is both (i) a unique equilibrium (at least generically), and (ii) an obvious computational algorithm for computing that equilibrium. The finite horizon assumption is not made in these articles.
Strategy Restriction.
It is not difficult to build Markov Perfect games that allow strategies to be determined by more than the variables that determine current demand and cost conditions, and we will generally need such strategies when we move on to analyze models which allow for collusion, or asymmetric information...
Use of payoff relevant strategies is, however, convenient, since the simple static models we studied early in the course deliver profit functions which are functions of these payoff relevant state variables. Thus if the static Nash in quantities or Nash in prices assumptions seem relevant, all we will need to do the dynamic analysis is the actual determinants of demand and cost, and the profit function as a function of these variables. Typically this can be computed pointwise as an output of the static analysis and fed into the dynamic analysis. The dynamic analysis will then determine the investments in developing these state variables.
This nice staging from static to dynamic analysis is used in most empirical work on dynamics. Note that it allows the estimation strategies developed earlier in the course to be used to deliver a crucial input for the dynamic analysis: the profit function as a function of states. As we will see, there are a set of additional parameters we need for the dynamics, and we will come back to provide estimation techniques for those later in the course.
There are good reasons to sometimes allow strategies to depend on other state variables. Examples include:

- models with asymmetric information, where one of the actors may not know the other actor's costs; instead it uses beliefs on what the other actor's costs might be, and those beliefs depend on what that actor has done in the past, or on signals the firm has sent;

- collusion models (with or without asymmetric information), where collusive pricing is made incentive compatible by a policy which punishes deviation from the collusive agreement (then strategies depend not only on the current values of the payoff relevant random variables but also on the strategies the competitors have played in the past);

- models in which there are lags in implementing investment decisions, or adjustment costs, since then current optimal investment depends not only on the current capital stock but on previous investments;

- and many others.
It is easy to generalize the Markov Perfect concept by allowing for more state variables. In our examples the new state variables would include a distribution of beliefs on the costs of competitors, a feature of the past pricing policy distribution, or previous investments or capital stocks. Each of these state variables would have to be formulated in a way that allows us to update it in a Markov fashion; i.e. last period's value plus some information made available between last year and this year produces this year's value. The additional state variables add to the computational burden of the model, but do not necessarily cause any other problems. (They could, however, cause the cardinality of the state space to grow without bound, and then applied analysis becomes problematic for several reasons. This often becomes an issue in dynamic games with asymmetric information.) This is the sense in which the Markov Perfect framework is quite general (see Maskin and Tirole, 1997).
Maskin-Tirole I: A Simple Duopoly Model.
The papers by MT that we cover impose more assumptions than those discussed above. They also impose very rigid restrictions on the choice of primitives (demand and various costs). This is testimony to the fact that these models are quite complicated; one needs very stringent restrictions to either solve the model analytically or to get clean comparative dynamics. The assumptions used here are clearly too restrictive to be used as a basis for either empirical work or policy analysis. On the other hand they do accomplish two things:
- They allow us to gain analytic intuition on what can happen in the rich world of dynamic games; intuition that is a lot of help in interpreting either events which occur in industries, or the results of a numerical exercise designed to analyze the impacts of alternative possible policies or events.
- The basic nature of the solution to the game is easy to generalize to more complex situations; it's just that the analytic results go away. On the other hand, once we have good estimates of the primitives, we no longer need the analytic results. We can simply substitute those estimates into an algorithm designed to compute the equilibrium, and analyze the phenomena numerically (this assumes that there is a unique equilibrium, or at least that we know how to pick between alternative possible equilibria). Of course this is much easier said than done. On the other hand it has been done, and is likely to be done increasingly as computers, data sets, and students get better.
The Setup.
Consider a duopoly in which firms move sequentially and all moves are committed to for two periods. The firms' moves are alternating.
We confine attention to those symmetric, perfect equilibria whose strategies satisfy the payoff relevant restriction described above. In our dynamic terminology the perfect condition translates into two requirements (see Starr and Ho, 1969):

- No matter the current states, each actor chooses policies to maximize the EDV of future net cash flows conditional on its perceptions of current and future market conditions; and

- those perceptions are in fact consistent with reality (the actual distribution of future states and actions of its competitors).

The relevant restrictive words here are symmetric, perfect, and payoff relevant. We have already noted that the payoff relevant restriction on strategies can be done away with.
To an applied person, a model with asymmetric equilibria (equilibria where different actors make different choices when faced with the same set of state variables) for symmetrically placed and informed players is an incomplete model. Somewhere a state variable that is known to the agents (that is, they act on it), not known to us, and distinguishes among agents, is missing. This may well be the case in many models we want to take to data, but probably the best way to deal with it is to include it in the theoretical specification and then try to deal with the omitted state variable with econometric assumptions. So we will come back to this problem when we discuss estimation.
Finally, the restriction to perfect equilibria is a natural starting place. We may think that it requires the actors to have too much computational ability. Indeed this is one of the reasons for the interest in learning and bounded rationality theories in economics. We come back to a bit more on this later in the course.
The Value Functions.
The firm's value is
$$V^i = \sum_t \delta^t\,\pi^i(a^1_t, a^2_t),$$
where $\pi(\cdot)$ is bounded. The $a$'s are chosen to maximize this value subject to the alternating move restriction
$$a^1_{2k+2} = a^1_{2k+1}, \qquad a^2_{2k+1} = a^2_{2k}, \qquad k \in \mathbb{Z}_+;$$
i.e. firm 1 moves in odd periods, and firm 2 moves in even periods.
There is a dynamic reaction function,
$$a^2_{2k} = R^2_{2k}(a^1_{2k-1}) = R^2(a^1_{2k-1}),$$
where $R^2 : A \to A$, and the last equality follows from the time independence of the primitives. In this notation the symmetry restriction is $R^1 = R^2$.
Representation as a Functional Equation.
$$V^1(a^2) = \max_{a^1}\left[\pi^1(a^1, a^2) + \delta\,W^1(a^1)\right],$$
where
$$W^1(a^1) = \pi^1(a^1, R^2(a^1)) + \delta\,V^1(R^2(a^1)).$$
Here $V^1$ is agent 1's value conditional on it being agent 1's choice period, for any value of last period's choice by agent 2 (that choice is the current value of the state vector of the game). $W^1(\cdot)$ provides agent 1's value as a function of the choice it made in the previous period, i.e. when it is the competitor's choice period. $W^1(\cdot)$ is also called the continuation value of agent 1 in agent 1's choice period. Writing everything out as a Bellman equation in the standard dynamic programming framework, we have
$$V^1(a^2) = \max_{a^1}\left[\pi^1(a^1, a^2) + \delta\,\pi^1(a^1, R^2(a^1)) + \delta^2\,V^1(R^2(a^1))\right].$$
For any given reaction function (a given $R^2(a^1)$), Blackwell's theorem implies that this Bellman equation is a contraction mapping (thus there exists a unique solution for the value function, and we can obtain it by iterating ...). Also, given $R^2(\cdot)$, this is a perfect solution; the agent always does the best it can do conditional on what the other agent actually does.

This contraction produces an $R^1(a^2)$, or more precisely an $R^1(R^2)(\cdot)$, for any given $(\pi^1, \delta)$. Similarly, from the analogous expressions for firm 2, we will have an $R^2(R^1)(\cdot)$ for every given $R^1$.
A pair $(R^1, R^2)$ is a Markov Perfect Equilibrium (MPE) iff $R^1$ maximizes $V^1(\cdot\,; R^2)$ and $R^2$ maximizes $V^2(\cdot\,; R^1)$. In this case each agent is doing the best it can do realizing that both will do the best they can do at any node of the game tree.

Note that once we specify an $R^2$, and specify that the firm maximizes the EDV of net cash flows, we have implicitly defined a $(V^1, R^1)$ (and once we have specified an $R^1$ we have defined a $(V^2, R^2)$).
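To make the fixed-$R^2$ contraction concrete, here is a minimal computational sketch (mine, not the papers'): it iterates the Bellman equation above on a discretized action grid, holding an arbitrary rival reaction function fixed. The profit function, discount factor, and the rival's reaction function are all illustrative assumptions.

```python
import numpy as np

delta = 0.9                              # discount factor (assumed)
grid = np.linspace(0.0, 1.0, 101)        # discretized action space A

def pi1(a1, a2):
    # illustrative spot profit for firm 1; not MT's exact primitive
    return a1 * (1.0 - a1 - a2)

def R2(a1):
    # an arbitrary fixed rival reaction function, for illustration only
    return np.maximum(0.0, 0.5 - 0.5 * a1)

# V[i] approximates V^1(grid[i]): firm 1's value when the rival's standing
# action is grid[i] and it is firm 1's turn to move.
V = np.zeros_like(grid)
for _ in range(2000):
    reac = R2(grid)                                          # rival's reaction to each candidate a1
    W = pi1(grid, reac) + delta * np.interp(reac, grid, V)   # W^1(a1)
    # Bellman: V^1(a2) = max_{a1} [ pi^1(a1, a2) + delta * W^1(a1) ]
    V_new = np.array([np.max(pi1(grid, a2) + delta * W) for a2 in grid])
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

# The implied best reply R^1(R^2)(.) falls out of the last maximization:
R1 = grid[[int(np.argmax(pi1(grid, a2) + delta * W)) for a2 in grid]]
```

Because the composed operator discounts twice between a firm's successive moves, the iteration contracts at rate $\delta^2$ and converges quickly; this is the Blackwell argument above, made operational.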
Existence, Computation, and Uniqueness (Alternating
Moves).
Consider a $T$ period game and count time from the terminal date. Say that firm 1 acts in periods 0, 2, ...:
$$V^1_0(a^2_0) = \pi^1_0 = \max_{a^1}\,\pi^1(a^1, a^2_0).$$
This gives us
$$R^1_0(a^2_0) = a^1_0.$$
Then firm 2 acts in period 1:
$$V^2_1(a^1) = \max_{a^2}\left[\pi^2(a^1, a^2) + \delta\,\pi^2(R^1_0(a^2), a^2)\right].$$
This gives us
$$R^2_1(a^1) = a^2_1(a^1).$$
We then go to period 2 with firm 1:
$$V^1_2(a^2) = \max_{a^1}\left[\pi^1(a^1, a^2) + \delta\,W^1_1(a^1)\right],$$
where
$$W^1_1(a^1) = \pi^1\big(a^1, R^2_1(a^1)\big) + \delta\,V^1_0\big(R^2_1(a^1)\big),$$
and where
$$V^1_0(a^2_0) = \max_{a^1}\,\pi^1(a^1, a^2_0).$$
Note that this gives us all the ingredients needed to calculate $R^1_2(a^2)$. Given this we can move on to period 3:
$$V^2_3(a^1) = \max_{a^2}\left[\pi^2(a^1, a^2) + \delta\,W^2_2(a^2)\right],$$
$$W^2_2(a^2) = \pi^2\big(R^1_2(a^2), a^2\big) + \delta\,V^2_1\big(R^1_2(a^2)\big),$$
which gives us all the ingredients to compute $R^2_3(a^1)$, and so on.
Note: Alternating moves ensure a unique solution to every period's problem in the $T$-period game (you are maximizing continuous functions over a compact set), and a simple way of computing it. The only questions remaining are: (i) does the sequence of value functions and policies obtained from the $T$ period game converge to a symmetric equilibrium in the infinite horizon game (this you might be able to check either analytically or numerically for a given parametrization), and (ii) if it does, is this the only equilibrium (generally the answer is no).
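Here is a sketch of this backward-induction algorithm on a discretized grid, exploiting symmetry so that a single sequence of reaction functions is tracked (the profit function and parameter values are illustrative assumptions, not taken from the papers):

```python
import numpy as np

delta = 0.9                                  # discount factor (assumed)
grid = np.linspace(0.0, 1.0, 201)            # discretized action space
n = len(grid)

def pi(a_own, a_other):
    # illustrative spot profit; not the papers' exact primitive
    return a_own * (1.0 - a_own - a_other)

P = pi(grid[:, None], grid[None, :])         # P[i, j] = pi(grid[i], grid[j])
ar = np.arange(n)

# Count time backwards from the terminal date (period 0 holds the last move).
R_idx = P.argmax(axis=0)                     # R_0: static best reply, as grid indices
Vm_lag2 = np.zeros(n)                        # Vm_{-1} := 0 (the game is over)
Vm_lag1 = P[R_idx, ar]                       # Vm_0(s) = max_a pi(a, s)

for t in range(1, 200):
    # W_{t-1}(a): value of having action a on the books while the rival moves:
    # profit against the rival's reaction, then my own move two periods on.
    W = P[ar, R_idx] + delta * Vm_lag2[R_idx]
    payoff = P + delta * W[:, None]          # payoff[i, j]: play grid[i] at state grid[j]
    R_new = payoff.argmax(axis=0)            # R_t
    Vm_new = payoff[R_new, ar]               # Vm_t
    if np.max(np.abs(Vm_new - Vm_lag1)) < 1e-8 and np.all(R_new == R_idx):
        break
    Vm_lag2, Vm_lag1, R_idx = Vm_lag1, Vm_new, R_new

R_limit = grid[R_idx]                        # approximate limit reaction function
```

If the policy and value sequences settle down, the limit is a natural candidate symmetric infinite horizon equilibrium; as the note above says, convergence and uniqueness still have to be checked case by case.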
Quantity Competition with Large Fixed Costs.
Take quantity to be the firm's choice variable in the alternating move game described above. Quantities here should be thought of as capacity. The profit function is therefore a reduced form which gives the outcome of a game in the spot market for current output given the sunk capacity. The capacity choices are sunk for two periods; i.e. every two periods you must sink $F$ dollars if you are active (if $q > 0$), and we let $f = \frac{F}{1+\delta}$, the per period equivalent cost.
For the actual example computed in the paper, they assume that the equilibrium is Nash in capacities, with profit functions
$$\pi^1(q^1, q^2) = \begin{cases} q^1(1 - q^1 - q^2) - c\,q^1 - f & \text{if } q^1 \neq 0, \\ 0 & \text{if } q^1 = 0, \end{cases}$$
and similarly for firm 2. Moreover, they assume that
$$2f > \pi^m > f,$$
where $\pi^m$ is monopoly profits. This latter condition insures that fixed costs are so large that we can never observe two firms earning positive one-period profits (when these are calculated net of the annualized sunk costs).
Note: A Cournot one-shot solution to this game (when there is no sunkness in capacity choices) has three equilibria: two pure strategy equilibria, $(q^m, 0)$ and $(0, q^m)$, and a mixed strategy equilibrium in which a firm sets a positive quantity $\bar{q}$ with probability $\mu$ and $q = 0$ with probability $(1 - \mu)$.

They show that in the dynamic game there exists a unique symmetric equilibrium, and they compare it to the Cournot solution. They do this through a series of lemmata. I reproduce what I think are the important ones.
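The force of the fixed cost condition is easy to see numerically. A minimal sketch with assumed numbers ($c = 0.1$, and $f$ set to 60 percent of monopoly variable profit so that $2f > \pi^m > f$ holds): total variable profit in the market is at most $\pi^m < 2f$, so two active firms can never both cover the annualized sunk cost.

```python
c = 0.1
q_m = (1.0 - c) / 2.0                  # monopoly quantity for demand 1 - q
pi_m = q_m * (1.0 - q_m - c)           # monopoly variable profit = ((1-c)/2)^2
f = 0.6 * pi_m                         # any f in (pi_m/2, pi_m) satisfies 2f > pi_m > f
assert 2 * f > pi_m > f

# Static Cournot duopoly: each firm produces (1-c)/3 and earns ((1-c)/3)^2
# in variable profit, which falls short of the annualized sunk cost f.
q_d = (1.0 - c) / 3.0
pi_d = q_d * (1.0 - 2.0 * q_d - c) - f
print(pi_m, f, pi_d)                   # pi_d < 0: duopolists cannot both cover f
```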
LEMMA 1: Equilibrium dynamic reaction functions are downward sloping (i.e. capacities are strategic substitutes).

What we want to prove is
$$q > \tilde{q} \;\Longrightarrow\; r = R^i(q) \le R^i(\tilde{q}) = \tilde{r}.$$
Proof: Assume to the contrary. Then for some $q > \tilde{q}$ there is an $r > \tilde{r}$ such that
$$\pi^2(q, r) + \delta\,W^2(r) \ge \pi^2(q, \tilde{r}) + \delta\,W^2(\tilde{r}),$$
$$\pi^2(\tilde{q}, \tilde{r}) + \delta\,W^2(\tilde{r}) \ge \pi^2(\tilde{q}, r) + \delta\,W^2(r).$$
(Here we are assuming that $r$ is the best response to $q$ and $\tilde{r}$ is the best response to $\tilde{q}$; hence by choosing $\tilde{r}$ when $q$ is played by the rival we must do worse....)

Adding these two inequalities, the future terms cancel out and
$$0 \le \pi^2(q, r) - \pi^2(q, \tilde{r}) + \pi^2(\tilde{q}, \tilde{r}) - \pi^2(\tilde{q}, r) = \int_{x=\tilde{r}}^{r}\left[\pi^2_2(q, x) - \pi^2_2(\tilde{q}, x)\right]dx = \int_{x=\tilde{r}}^{r}\int_{y=\tilde{q}}^{q} \pi^2_{21}(y, x)\,dy\,dx,$$
which is a contradiction [i.e. it is straightforward to check that $\pi_{21} < 0$ everywhere].
The beauty of this proof is that it gets an answer without ever solving for the value function (which simply drops out of the calculation). What they have shown is that, at least in a two person alternating move game, provided the spot market profit function has a negative cross-partial (everywhere), the reaction functions will be downward sloping, i.e. the policies of the firms are strategic substitutes.
Note three points. First, the result is reminiscent of the result on strategic substitutes for static games; both depend only on the static profit function having a negative cross partial. I.e. we have shown that in this setting capacities, which here are a dynamic control, are strategic substitutes provided the profit function has a negative cross partial.

Second, a similar condition will hold for discrete policies played on a lattice, like setting up plants, provided the profit functions are supermodular in the sense of Topkis (1968). For an application of related results to entry games see Jia (2006).

Finally, the result is reminiscent of Euler's perturbation method for solving for the optimal control in single agent dynamic programming problems, though Euler's technique does not generally work in games. The reason is that we cannot hold competitors' states constant when we perturb our own state. However, in alternating move games with two agents we can do perturbation methods, as they show in the next article reviewed here.
The method of proof of the rest of the lemmas, and indeed their results, are more specific to the particular assumptions made in the paper, so we will simply state those of them that we need for an explanation of the results and give their intuitive basis.

The second lemma we need is the stopping or deterrence $\bar{q}$ level lemma.

LEMMA 4: In a symmetric equilibrium, $\exists\, \bar{q}$ with $\infty > \bar{q} > 0$ such that
$$\forall\, q > \bar{q},\;\; R(q) = 0, \qquad \text{and} \qquad \forall\, q < \bar{q},\;\; R(q) > 0.$$
Structure of proof: Given the prior lemma, what the authors have to show here is that $R(0) > 0$, and that there exists a $q$ large enough such that $R(q) = 0$. They show that because of the constraint $2f > \pi^m > f$ there exists a $q$ large enough that $R(q) = 0$, and that $R(0) > 0$.

Note that this lemma has demonstrated the existence of a deterrence level of capacity (output): if you choose a plant above this capacity level your rival will not enter, and if you choose below it your rival will enter. This gives agents the ability to practice entry deterrence, as in the earlier two-period models of entry deterrence (see the article by Dixit on the reading list). Of course we do not yet know whether it is optimal for the agent to do so.
LEMMA 5: In a symmetric equilibrium, $\forall\, q$ with a positive reaction $r = R(q) > 0$, $R(r) = 0$.

Lemma 5 says that, except for an initial transitory period, if a firm operates at all, it operates at or above the deterrence level. That is, if $r$ is a reaction to any feasible strategy, either $r = 0$ or $r \ge \bar{q}$, where $\bar{q}$ is the deterrence level of output as determined by the previous lemma.
Note that this implies that deterrence is actually practiced. I.e., as we will show, one agent builds more capacity than it would if there were no threat of entry. This makes playing larger quantities credible should an entrant decide to enter thereafter, and by so doing it deters entry. This is just Dixit's logic extended to an infinite horizon game, and is a good example of intuition we can carry over from the simple two period models.
Relatedly, you should note that the result implies that, in equilibrium, a firm either drops out forever or induces its opponent to do so. That is, there can be no sequence of alternating entering and exiting, for from the lemma the reaction to the reaction to zero must be zero.
A couple of other points are also worth noting.

- A Markov Perfect strategy is generally defined as strategies from any initial position in the state space. In a deterministic game such as this one, we go from an initial strategy to a sequence of strategies by applying the operator $R$. In this game we could start at $\bar{q}$. In that case the strategies of the two competitors are constant forever (the competitor starting with $\bar{q}$ stays at $\bar{q}$ and the other stays at zero). That, however, is not a complete solution. We also have to know what would happen if we started elsewhere. Part of the answer is given in the lemma: within two periods we would get to $\bar{q}$, and thereafter we would do the same things as we would have had we started there.
- When we move to stochastic dynamic games, there will be stochastic outcomes to the investment expenditures of the various agents. So instead of proceeding deterministically from state to state there will be a probability distribution of future states given the current state: a Markov process on the state space. The Markov process will generally imply that there will be states that will never be visited in any lasting way (say states with a lot of firms active in a small market, or states with a small number of firms active in a large market). Still, the game is defined from every possible initial condition. Indeed, Markov processes are generally defined by a transition probability distribution from each state (a Markov matrix or transition kernel) and an initial condition.
- Finally, note that an MPE is just what we need for empirical work; the data define the initial condition, and the model predicts what will happen given the initial condition in the data. So provided we have at least two time periods on the relevant variables, we can go straight from the model to estimation, or to policy analysis, and the methodology does not rely on the abstractions of two-period games. This is one reason why MPE of this form are such an attractive concept to empirical people.
We still have not completed the solution. We know that we get to an entry deterrence level of output and stay there, but we do not know what that level of output is, or how we get there from any initial condition. The last lemma works on these issues. It is proven by looking at behavior under different values of the discount factor $\delta$. It works out that behavior is different depending on whether $\delta$ is high, low, or in an intermediate range. The following proposition clarifies what happens under the three alternatives.
Proposition:

Part 1 (High discount factor): $\exists\, \delta_1 \in (0, 1)$ s.t. if $\delta_1 < \delta < 1$,
$$R(q) = \begin{cases} 0 & \text{if } q \ge q^* \\ q^* & \text{if } q < q^* \end{cases}$$
where $q^*$ is the largest root of
$$\pi(q, q) + \frac{\delta}{1-\delta}\,\pi(q, 0) = 0.$$
Interpretation of the condition: Say the other agent was at a $q \ge q^*$. Then the first agent could enter, share the market in the first period, induce the other not to come in in the next (and subsequent) periods, and just earn zero profit. If you could earn more profit than zero, then the rival would re-enter thereafter. If you earned less profit than zero you would not enter.

They show that:

- With a high enough discount factor, $q^* > q^m$. One can show this by direct substitution, noting that
$$\pi(q^m, q^m) + \frac{\delta}{1-\delta}\,\pi(q^m, 0) > 0.$$
So you want to have $q$ as small as possible conditional on it being large enough to deter entry (the smaller it is, the larger is spot profit; otherwise you might actually want a $q$ greater than the one above). So there may be a monopoly, but because it needs to deter entry, output is higher than the monopoly output.
- $q^*$ monotonically increases in the discount factor and decreases in the fixed cost $f$ (solve for the zero of the equation and take derivatives). That is, the deterring output increases the less we discount the future, and the smaller are the sunk costs. In particular, as $\delta \to 1$, the entry deterring quantity gets closer to the competitive (zero profit) level.

- The fact that $q^* > q^m$, which recall is the one-shot equilibrium, is reminiscent of dynamic limit pricing: I keep quantity up (price and profits down), and commit to do so also in the coming period, to prevent entry. The fact that $q^*$ increases with $\delta$ (so that current profits decrease) just says that the opponent would be getting a higher discounted net cash flow at a given $q$ if the discount factor were higher. It decreases in fixed costs because we require higher profits per period to induce entry when fixed costs are higher.
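To see Part 1 in numbers: under the quantity example's functional form, the condition defining $q^*$ is quadratic in $q$, so the largest root is immediate. The parameter values below are assumptions for illustration.

```python
import numpy as np

c, delta = 0.1, 0.9
pi_m = ((1.0 - c) / 2.0) ** 2          # monopoly variable profit
f = 0.6 * pi_m                         # keeps 2f > pi_m > f
q_m = (1.0 - c) / 2.0

k = delta / (1.0 - delta)
# pi(q, q) + k * pi(q, 0) = 0, with pi(q1, q2) = q1(1 - q1 - q2) - c q1 - f,
# collapses to a quadratic in q:
coeffs = [-(2.0 + k), (1.0 - c) * (1.0 + k), -(1.0 + k) * f]
q_star = max(np.roots(coeffs).real)    # largest root = deterrence quantity

print(q_star, q_m, q_star > q_m)       # with delta = 0.9: q* well above q_m
# Raising f (or lowering delta) shrinks q*, matching the comparative statics above.
```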
Part 3 (Low discount factor): $\exists\, \delta_2 \in (0, 1)$ s.t. if $0 < \delta < \delta_2$,
$$R(q) = \begin{cases} 0 & \text{if } q \ge q^* \\ T(q) & \text{if } q < q^* \end{cases}$$
where $q^*$ satisfies
$$\pi[T(q^*), q^*] + \delta\,\pi[T(q^*), 0] + \delta^2\,\frac{\pi^m - f}{1 - \delta} = 0$$
and
$$T(q) = \arg\max_{\tilde{q}}\;\{\pi(\tilde{q}, q) + \delta\,\pi(\tilde{q}, 0)\}.$$
The crucial condition here is that $q^* < q^m$. This implies that the firm can sustain the monopoly output and not induce entry. So now the steady state quantity is the monopoly quantity, though how the firm gets there depends on the initial $q$ in the market.

The firm enters here provided the losses generated during the first two periods are compensated for by the earnings from monopoly profits thereafter. This generates a $T(q)$, which holds for two periods, and then the firm goes to monopoly. I.e. $T(q)$ is the best you can do for the first two periods, after which you move to monopoly.

Because the discount factor is so low, a monopoly output deters entry thereafter (if one were to enter, then in the first period one would get negative profits, and the discount factor is low enough that the future profits do not compensate). In equilibrium, an entrant who follows the strategy $T(q)$ must earn zero profits.
We note that there is an intermediate range, $\delta_2 < \delta < \delta_1$, for which $q^* < q^m$, and hence steady state output is still monopoly output, but we get there in a different way. See the paper.
Summary: There is a unique symmetric MPE in which only one firm produces. If the discount factor is not too low, the firm will operate above the monopoly output level to deter entry (thus extending the intuition from both the entry deterrence and the limit-pricing literatures). Furthermore, as the discount factor approaches one and future profits become increasingly important, the entry deterring quantity gets closer to the competitive (zero-profit) level (contestability; look at the equation for the high discount case, and note that if $\pi(q, 0)$ is positive its contribution will go off to infinity as $\delta \to 1$).
Maskin and Tirole III:
More General Primitives
(and an Introduction to Euler Equations).
We now go back to the more general case where there may be room for more than one firm active at the same point in time, and assume that the profit function need only satisfy the condition $\pi_{12} < 0$. Note that this latter condition ensures that $R(q)$ is decreasing in $q$, i.e. that quantities are strategic substitutes. I.e. the proof that quantities are strategic substitutes did not depend at all on the fixed costs.

What we are after is a more complete characterization of behavior in a dynamic duopoly. To get such a characterization we will look to a variant of Euler equation methods that works in game theoretic contexts. These are necessary conditions for an optimal dynamic programme that are often used to analyze single agent dynamic problems. As we will now show, they can also be used for some alternating move games. However, there are no Euler equations for most simultaneous move games (and we will see why).
When we can get them, they can often be used both

- to characterize equilibrium, and

- as the basis of an estimation algorithm.

In the former context we should note that sometimes we will be able to show that the necessary conditions are also sufficient, i.e. that there is a unique policy that satisfies the Euler equations (and sometimes we will not). In cases where they are sufficient, the Euler equations fully characterize the solution to the game.
In the context of estimation, their usefulness is that they allow estimation of the parameters without computing the value function for each parameter vector to be evaluated (Hansen and Singleton, 1986). They have been used extensively in this context but have several disadvantages:

- they require you to have the timing of decisions exactly correct;

- they do not work for simultaneous move games;

- they have trouble with unobserved sources of heterogeneity among agents.

There are now estimation techniques available for dynamic games which allow one to estimate parameter vectors without computing the solution to the dynamic game, that do work for simultaneous move games, and that are not as sensitive to timing decisions. On the other hand, the only way to handle unobserved heterogeneity is still to calculate the whole value function and analyze its impacts explicitly. We will come back to estimation at a later point in the course.
The Logic of Euler Equations.
Assume that
$$q^1_t \ge k > 0, \quad \text{and} \quad R(q^1_t) \ge k > 0,$$
where $R(q)$ is the reaction function, which is assumed to be differentiable. What we are assuming is that the control, and the reaction to it, are not at a corner of the choice set (correspondence). This is pretty innocuous in the simple setting we are dealing with, but it is of some concern when we think of the typical way Euler equations are used, i.e. to analyze investment policies (we frequently do observe zero investment).
Then a family of feasible alternative policies for the first agent (one for each value of $\epsilon$) is:

Basis of Euler Equations for an Alternating Move Game

  time         t                                              t+1                                                     t+2                             t+3
  alternative  $R(q^2_{t-1}) + \epsilon = q^1_t + \epsilon$   $q^1_t + \epsilon$                                      $q^1_{t+2} = R[R(q^1_t)]$       $q^1_{t+2}$
  reaction     $q^2_{t-1}$                                    $R(q^1_t + \epsilon) = q^2_{t+1}(q^1_t + \epsilon)$     $R(q^1_t + \epsilon)$           $R(q^1_{t+2})$
The increment in value from this perturbation to the optimal program for agent 1 is
$$V^1(q^2_{t-1}; \epsilon) - V^1(q^2_{t-1}) \equiv \Delta(\epsilon),$$
where
$$\Delta(\epsilon) = \pi(q^1_t + \epsilon,\, q^2_{t-1}) - \pi(q^1_t,\, q^2_{t-1})$$
$$\quad +\; \delta\left[\pi\big(q^1_t + \epsilon,\, R(q^1_t + \epsilon)\big) - \pi\big(q^1_t,\, R(q^1_t)\big)\right]$$
$$\quad +\; \delta^2\left[\pi\big(R(R(q^1_t)),\, R(q^1_t + \epsilon)\big) - \pi\big(R(R(q^1_t)),\, R(q^1_t)\big)\right],$$
where the first line gives us the period $t$ change, the next the period $t+1$ change, and the last the period $t+2$ change.
Note that for all years following $t+2$ the change is zero. This is what is useful about Euler perturbations: a change in program generally results in an infinite sequence of changes in outputs, but an Euler perturbation brings you back to the same place after a finite number of periods, so we can choose among programs by considering the implications of those programs for just a finite number of periods. Also, you should now understand the need to be away from the boundaries: for this perturbation to be feasible for all $\{\epsilon : |\epsilon| \le \bar{\epsilon}\}$, for some $\bar{\epsilon} > 0$, we need $q^1_t$ and $R(q^1_t)$ to not be at boundaries. We will need this feasibility condition to get the derivative conditions below.

Since the alternative policy is feasible, but not optimal, it must be the case that $\Delta(\epsilon) \le 0$ for $|\epsilon| \le \bar{\epsilon}$, and $\Delta(0) = 0$. Consequently, provided $\Delta(\cdot)$ is a differentiable function of $\epsilon$, its derivative must be zero at $\epsilon = 0$. For $\Delta(\cdot)$ to be differentiable, $R(\cdot)$ must be differentiable. In finite horizon problems we can prove this by induction; here what we do is simply assume differentiability. Assuming differentiability, and taking the derivative with respect to $\epsilon$, we have
$$\frac{\partial \pi(q^1_t, q^2_{t-1})}{\partial q^1} + \delta\,\frac{\partial \pi(q^1_t, q^2_{t+1})}{\partial q^1} + \left[\delta\,\frac{\partial \pi(q^1_t, q^2_{t+1})}{\partial q^2} + \delta^2\,\frac{\partial \pi(q^1_{t+2}, q^2_{t+1})}{\partial q^2}\right]\frac{\partial R(q^1_t)}{\partial q^1} = 0.$$
This is our Euler equation. By symmetry, there is an analogous equation for firm 2.
Note, however, that the Euler equation contains not only primitives but also the MPE reaction function. This differentiates it from the Euler equations for single agent problems (see Hansen and Singleton, 1986). It implies that the Euler equation cannot be used directly in estimation, since we don't know the functional form of $R(\cdot)$, without either

- solving the problem (which defeats the purpose of Euler equations), or

- using a nonparametric estimator of $R(\cdot)$.

However, we can still use the Euler equation to characterize the solution. Solving for the derivative of the reaction function we have
$$R'(q^1_t) = -\,\frac{\pi^1_1(q^1_t,\, q^2_{t-1}) + \delta\,\pi^1_1(q^1_t,\, q^2_{t+1})}{\delta\,\pi^1_2(q^1_t,\, q^2_{t+1}) + \delta^2\,\pi^1_2(q^1_{t+2},\, q^2_{t+1})},$$
or, more generally,
$$R'(q) = -\,\frac{\pi^1_1(q,\, R^{-1}(q)) + \delta\,\pi^1_1(q,\, R(q))}{\delta\,\pi^1_2(q,\, R(q)) + \delta^2\,\pi^1_2(R(R(q)),\, R(q))}.$$
Again, by symmetry there is an analogous expression for firm 2. These equations provide a system of difference-differential equations that the reaction functions need to satisfy. However, they need not, in general, completely determine those functions. These equations are not sufficient because they are only first order conditions. (Put differently, this is a map from a space of continuously differentiable functions into itself; there may be more than one fixed point for this map.)
This form of the reaction function can, however, be helpful for several reasons. First, any policy has to satisfy this condition, so we can do some analysis of it. Also, one may be able to prove that there is only one solution to these equations for a particular functional form (one way to prove this is to show that for particular functional forms the operator defining the fixed point is a contraction mapping, in which case not only do we know there is a unique solution but we also know how to compute it). In this case we can compute the reaction function from this first order condition, and then the value function, just as we generally compute single agent value functions. This is because once you have $R(\cdot)$, the value function is a simple contraction mapping.

Note that this assumes that throughout both firms are producing positive quantities, there never is a third firm that wants to enter, and neither of these firms wants to exit (you should be able to state why this solution falls apart when any of these conditions are violated). We now go even further and assume the profit function is quadratic.
Linear-Quadratic MPEs.
Assume
$$\pi^i = q^i\,(d - q^i - q^j),$$
where $d > 0$ is the difference between the intercept of the demand curve and marginal cost.
Note that then the FOCs are linear, and one might expect the reaction functions to be linear and symmetric, at least if the reaction is greater than zero. So let's postulate this and see what happens. That is, say
$$R(q) = a - bq, \qquad a, b > 0.$$
Substituting back into the Euler equation, one will find that it is satisfied provided
$$f(b) = \delta^2 b^4 + 2\delta b^2 - 2(1+\delta)\,b + 1 = 0$$
and
$$a = \frac{(1 + b)\,d}{3 - \delta b}.$$
Look at the equation for $b$: $f''(b) > 0$, $f(0) = 1 > 0$, $f(\frac{1}{2}) = \frac{\delta^2}{16} - \frac{\delta}{2} < 0$, $f(1) = \delta^2 - 1 < 0$, and $f(\frac{1}{\delta}) = \frac{1}{\delta^2} - 1 > 0$. So the equation (which determines the slope of the reaction function) has two real roots: one $b_1 \in (0, \frac{1}{2})$, and one $b_2 \in (1, \frac{1}{\delta})$.
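A numerical sketch (with an assumed $\delta$) that solves the quartic, confirms there are exactly two real roots in the stated intervals, and computes the implied intercept $a$ and limit quantity for the stable root:

```python
import numpy as np

delta, d = 0.9, 1.0                    # assumed discount factor and demand intercept
# f(b) = delta^2 b^4 + 2 delta b^2 - 2(1 + delta) b + 1
roots = np.roots([delta**2, 0.0, 2.0 * delta, -2.0 * (1.0 + delta), 1.0])
real = np.sort(roots[np.abs(roots.imag) < 1e-12].real)
b1, b2 = real[0], real[1]
print(b1, b2)                          # b1 ~ 0.31 in (0, 1/2); b2 in (1, 1/delta)

a = (1.0 + b1) * d / (3.0 - delta * b1)
q_e = d / (3.0 - delta * b1)           # steady state implied by the stable root
print(a, q_e, q_e > d / 3.0)           # output above the one-shot Cournot level
```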
Now define the operator
$$R^t(q) = R[R^{t-1}(q)], \quad \text{with } R^1(q) = R(q).$$
$R^t(q)$ tells us output $t$ periods hence if current period output is $q$. It is called the $t$-th iterate of $R$.
Lemma 1: If $R(q) = a - bq$, then
$$R^t(q) = a_t + b_t\,q,$$
with $b_t \equiv (-1)^t b^t$ and $a_t \equiv a\,(1 - b + b^2 - \cdots + (-b)^{t-1})$. That is, if the one period reaction function is linear, then so are all its iterates.
Proof: The proof is by induction. The statement is true by construction for $t = 1$. Now assume it is true for $t = j$; that is, assume
$$R^j(q) = a_j + b_j\,q \quad \text{with } a_j = a\,(1 - b + \cdots + (-b)^{j-1}) \text{ and } b_j = (-1)^j b^j.$$
Then
$$R^{j+1}(q) = R[R^j(q)] = a - b\,(a_j + b_j\,q) = a - b\,a_j - b\,b_j\,q = a\,(1 - b + \cdots + (-b)^{j}) + (-1)^{j+1} b^{j+1}\,q.$$
Now we have $t$-period reactions given by
$$R^t(q) = a\,(1 - b + \cdots + (-b)^{t-1}) + (-1)^t b^t\,q.$$
Graph this reaction function against time for each of the two roots. The larger root is dynamically unstable: it goes through oscillations of increasing magnitude. It therefore implies negative $q$, and therefore cannot be a feasible path.

The smaller root goes through oscillations of decreasing magnitude with cobweb-type convergence (because of the negative slope of the reaction function). Note also that at the smaller root the values of $a$ and $b$ satisfy the second order conditions for Nash behavior, so they indeed could be an equilibrium. Moreover, there were only two couples $(a, b)$ that satisfy the functional equation for $R(q)$ (the other giving an infeasible path). Thus we have the only equilibrium with linear reaction functions.
Also, for the smaller root we have
$$\lim_{t \to \infty} R^t(q) = \frac{a}{1 + b} = \frac{d}{3 - \delta b} = q^e.$$
Note that:

- The limit does not depend on the initial $q$. Very roughly, when the limit as $t \to \infty$ does not depend on the initial condition, we say the model is ergodic.

- $R(q^e) = q^e$, as it should be for a limit. This equation provides another way of solving for $q^e$ (try it and see that you get the same value).
Hence we have a limit rule where each agent plays $q^e$, i.e. $(q^e, q^e)$, which reproduces itself (it is the only element of the ergodic class for this solution). Note also that we have shown that there can be only one linear MPE (there were two possibilities and we ruled one out).
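Iterating the stable reaction function from an arbitrary starting quantity shows the damped, cobweb-style convergence to $q^e$ (this recomputes the stable root from the previous snippet):

```python
import numpy as np

delta, d = 0.9, 1.0
roots = np.roots([delta**2, 0.0, 2.0 * delta, -2.0 * (1.0 + delta), 1.0])
b = min(r.real for r in roots if abs(r.imag) < 1e-12)   # stable (smaller) root
a = (1.0 + b) * d / (3.0 - delta * b)

q, path = 0.9 * d, []
for _ in range(40):
    q = a - b * q                      # each firm's best reply to the last move
    path.append(q)
print(path[:4])                        # oscillations of decreasing magnitude
print(path[-1], d / (3.0 - delta * b)) # both approximately q_e, from any start
```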
Theorem 1: For any $\delta$,

1. There exists a unique linear MPE.

2. The MPE is dynamically stable; i.e. for any initial condition $(q^1_t, q^2_t)$, $\lim_{\tau \to \infty}(q^1_{t+\tau}, q^2_{t+\tau}) = (q^e, q^e)$.

3. If $\delta = 0$, then $q^e = \frac{d}{3}$, which is the Nash-Cournot solution. If $\delta > 0$, then $q^e = \frac{d}{3 - \delta b} > \frac{d}{3}$.
This says that if there are adjustment costs (here, the two-period commitment), firms care about the future ($\delta > 0$), and reaction functions are downward sloping, then in equilibrium there is a strategic motive for investing, and this generates a higher output than in the one-shot game (one way to think of this is as in the entry deterrence literature, but this time with a smooth, rather than a discrete, control). The intuition is that firms know that if they produce slightly more today, the negatively sloped reaction functions will induce their competitors to choose a slightly lower output tomorrow, and they will compensate their lower profits today with higher profits tomorrow.
As for prices, given $q^e = \frac{d}{3 - \delta b}$ we have
$$p^e = d - 2q^e = d - \frac{2d}{3 - \delta b} = \frac{d\,(1 - \delta b)}{3 - \delta b},$$
where, recall, we have set marginal cost to zero.
Here are some extra notes on this model.

- The above equation in itself does not imply that price is declining in $\delta$ (though our economic intuition says that this should be true), because $b = b(\delta)$. They do, however, prove a lemma that $a(\delta)$ and $b(\delta)$ are differentiable functions of $\delta$ with $\frac{\partial}{\partial \delta}[\delta\,b(\delta)] > 0$, which proves our intuition. So as the discount factor grows, output grows, and we get a more competitive looking industry even in an industry where entry is restricted.
- The value function, given $R(q)$, can be written as
$$V^1(q^2) = \max_{q^1}\Big[\pi(q^1, q^2) + \delta\,\pi(q^1, R(q^1)) + \delta^2\,\pi(R^2(q^1), R(q^1)) + \delta^3\,\pi(R^2(q^1), R^3(q^1)) + \cdots\Big].$$
But since
$$\pi(q^1_t, q^2_t) = q^1_t\,(d - q^1_t - q^2_t) = (a_t + b_t\,q^1)\big(d - [a_t + b_t\,q^1] - [a_{t-1} + b_{t-1}\,q^1]\big),$$
each $\pi_t$ is a quadratic function of $q^1$ (given $q^2$). Since the sum of quadratics is a quadratic,
$$V^1(q^2) = \max_{q^1}\Big[\gamma_0 + \gamma_1\,q^1 + \gamma_2\,q^1 q^2 + \gamma_3\,(q^1)^2\Big].$$
This also makes it easy to show uniqueness of a solution given that $R(q)$ is linear (the quadratic has a unique solution to the f.o.c., and this in turn gives a unique value function). Note also that it implies that we need to determine only 3 coefficients to determine the value function and the reaction function. By expressing the reaction function in terms of these coefficients and then substituting back we can solve for them. This is sometimes called the method of undetermined coefficients.
- They also add adjustment costs. Adjustment costs generate a problem with two state variables: the opponent's $q$ and my own $q$ from last period. Thus policies and the value function must be a function of both state variables. If everything is quadratic we still get linear reaction functions, but this time they are a system of interrelated first order difference equations:
$$q^i_{t+1} = \alpha + \beta_1\,q^j_t + \beta_2\,q^i_{t-1}.$$
The effect of costs of adjustment is to move the equilibrium output closer to the Cournot level, for now when I increase my output in a given period it is more costly for my competitor to change his output, so he moves his output down less, and as a result there is less benefit from my action.
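As a purely illustrative simulation of such a system (the coefficients below are made-up stable values, not derived from any particular primitives), the alternating-move dynamics with one period of own memory look like this:

```python
# Alternating moves: each firm resets its quantity every other period as a
# linear function of the rival's standing quantity and its own lagged quantity.
alpha, beta1, beta2 = 0.3, -0.25, 0.15      # hypothetical stable coefficients
q1 = q1_lag = 0.8
q2 = q2_lag = 0.1
for t in range(60):
    if t % 2 == 0:                          # firm 1 moves in even periods
        q1, q1_lag = alpha + beta1 * q2 + beta2 * q1_lag, q1
    else:                                   # firm 2 moves in odd periods
        q2, q2_lag = alpha + beta1 * q1 + beta2 * q2_lag, q2
print(q1, q2)   # with |beta1| + |beta2| < 1 the system settles to a steady state
```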
Maskin and Tirole II:
MPEs with Price Competition.
Spot market: Total profits are
$$\pi(p) = (p - c)\,D(p),$$
and their division is given by
$$\pi^i(p^1_t, p^2_t) = \begin{cases} \pi(p^i_t) & \text{if } p^i_t < p^j_t, \\ \pi(p^i_t)/2 & \text{if } p^i_t = p^j_t, \\ 0 & \text{if } p^i_t > p^j_t. \end{cases}$$
Note: This is Bertrand competition in a homogeneous product market where we have simply assumed that profits are split if both firms charge the same price. Note also that the distinct discontinuities here may seem unreasonable (from an applied point of view). If so, that just says that this is not a model one would want to take to data. That doesn't mean we can't get some intuition from it.
The equivalent value functions are now
$$V^1(\bar{p}) = \max_{p}\left[\pi^1(p, \bar{p}) + \delta\,W^1(p)\right],$$
where
$$W^1(\bar{p}) = E_p\left[\pi^1(\bar{p}, p) + \delta\,V^1(p)\right],$$
where $R^1(\bar{p})$ will be the maximizing choice of $p$, and $E_p$ is taken with respect to the distribution of $R^2(\bar{p})$ if it is a mixed strategy equilibrium. We can write symmetric expressions for firm 2.
Simple Examples of MPE in a Price Space:
Kinked Demand Curves and Edgeworth Cycles.
Kinked demand curves entered the literature in papers by R.L. Hall and C.J. Hitch in the Oxford Economic Papers in 1939, and by Paul Sweezy in the J.P.E. in 1939. The idea has since been used by many, including macroeconomists who are looking for reasons for sticky prices (though once we allow for dynamics there are many ways of generating sticky prices). This Maskin-Tirole article provides a dynamic model which results in such equilibria.
Simple example: We are still in an alternating move game, but now the dynamic control is prices, and we have the following demand curve and cost:
$$D(p) = 36\,(1 - p) \quad \text{and} \quad c = 0, \;\text{ so }\; \pi(p) = 36\,p\,(1 - p).$$
Also, $p$ is chosen from the set $\left\{\frac{i}{6}\right\}$ for $i = 0, 1, \ldots, 6$. Note that the monopoly price is $p^m = p(3) = \frac{1}{2}$. We will verify that the following reaction function generates a symmetric MPE [here $p(i)$ denotes the price $i/6$]:
  p(i)    R[p(i)]
  p(6)    p(3)
  p(5)    p(3)
  p(4)    p(3)
  p(3)    p(3)
  p(2)    p(1)
  p(1)    p(1) with probability $\mu(\delta)$; p(3) with probability $1 - \mu(\delta)$
  p(0)    p(3)

where the mixing probability $\mu(\delta)$ is set so that the same profit is made no matter which of the mixed strategies is played.
We have to verify that each $R(p)$ is in fact optimal. It should be clear that from any $p$ above $p(3)$ we should move to $p(3)$: you make more profits this year, and the opponent will undercut you next year for any $p > p(3)$.

Now ask: if we are both at $p(3)$, ought the player whose turn it is to move undercut, given that $R(p)$ is the reaction of the competitor?
If it plays 3, then we stay at 3 forever and it gets
$$V(3) = \frac{4.5}{1 - \delta}.$$
If, in contrast, it plays 2, then it takes over the market in that period, earns nothing in the period thereafter (the other player plays $p(1)$), and is at $V(1)$ two periods hence:
$$V(\text{play } 2 \mid 3) = \pi(2) + \delta \cdot 0 + \delta^2\,V(1) = 8 + \delta^2\,V(1).$$
But
$$V(1) = \mu(\delta)\,V(1) + (1 - \mu(\delta))\left[0 + \delta\,V(3)\right] \;\Longrightarrow\; V(1) = \delta\,V(3).$$
So
$$V(\text{play } 2 \mid 3) = 8 + \delta^3\,V(3) < V(3) \quad \text{if } \delta > 0.53.$$
Similarly, for $p < p(3)$ you can work through algebra like that above and see that the player will actually do best by choosing $R(p)$.
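The deviation check at $p(3)$ is easy to verify numerically ($\delta = 0.8$ is an arbitrary value above the 0.53 threshold; on the grid, $\pi(p(i)) = 36\,(i/6)(1 - i/6) = i\,(6-i)$):

```python
delta = 0.8
def profit(i):
    return i * (6 - i)                  # pi(p(i)) = 36 * (i/6) * (1 - i/6)

V3 = (profit(3) / 2.0) / (1.0 - delta)  # both stay at p(3) forever, splitting 9
V1 = delta * V3                         # from V(1) = mu V(1) + (1 - mu)(0 + delta V(3))
V_dev = profit(2) + delta * 0.0 + delta**2 * V1   # undercut to p(2)
print(V_dev, V3, V_dev < V3)            # 19.52 < 22.5: undercutting does not pay
```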
Note that no matter where we start, the market price moves to $p^m$ and stays there. This is a kinked demand curve equilibrium. If $p = p(3)$ and firm 1 contemplated a higher price, firm 2 would not follow. If firm 1 contemplated undercutting, it would increase its profits in the next year but induce a price war: the opponent would decrease its price thereafter. The discounted losses due to the price war would outweigh the short-run gains. Note that at $p(1)$ each firm would like the other firm to move up, and we get a war of attrition.
Edgeworth Cycles.
Here is another symmetric MPE:

  p(i)    R[p(i)]
  p(6)    p(4)
  p(5)    p(4)
  p(4)    p(3)
  p(3)    p(2)
  p(2)    p(1)
  p(1)    p(0)
  p(0)    p(0) with probability $\lambda(\delta)$; p(5) with probability $1 - \lambda(\delta)$
Diagram: Figure 1, Maskin-Tirole II (price over time).

Dynamics: If we start at the top, we observe a price war with a gradual fall in price until we hit the bottom. At the bottom there is a war of attrition. In the war of attrition phase each firm would like the other to raise price so that it can just slightly undercut. We note that there have been a number of recent articles on gas stations showing that there are markets where Edgeworth cycles seem to prevail.
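A small simulation of this reaction function traces out the sawtooth price path of the figure (the probability of staying at the bottom is an arbitrary illustrative number):

```python
import random

random.seed(0)
R = {6: 4, 5: 4, 4: 3, 3: 2, 2: 1, 1: 0}   # deterministic part, prices in sixths
stay_prob = 0.7                             # illustrative stand-in for the mixing probability
p, path = 6, []
for _ in range(30):
    if p == 0:                              # war of attrition at the bottom
        p = 0 if random.random() < stay_prob else 5
    else:
        p = R[p]                            # undercut one notch
    path.append(p)
print(path)    # gradual price war down to 0, a wait, then a jump back up to 5
```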
Here are some notes summarizing other aspects of this set of
articles.
- Perhaps the biggest lesson to be learned is just how much can happen in even this very simple setup, and that what can happen depends on particular parameter values. This suggests several things: (i) that we may need more structure, or more knowledge (of industry specific conditions), to choose among alternatives, and (ii) that we want to bring information on the appropriate parameter values to help us choose even among qualitative features of the nature of the solution.
- They introduce some terminology in the price setting article. An MPE is a kinked demand curve equilibrium if it has an ergodic class consisting of a single price. It is an Edgeworth cycle equilibrium if it has an ergodic class that is not a singleton. They then show that all symmetric MPE can be subdivided into two classes: in one the market price converges in finite time to a focal price; in the other, the market price never settles down (the Edgeworth cycle). We come back to the notion of ergodicity in a somewhat more general environment that allows for stochastic outcomes below. There, an ergodic class is a set of points that have the property that once you are in the set you will stay in that set forever with probability one.
- Note that the price model can have a multiplicity of symmetric equilibria while the quantity competition model has a unique symmetric (linear) equilibrium. Partly this is a result of the fact that prices act more like strategic complements in the price model, and that model has discontinuities in the profit function. Note that prices need not be strategic complements even in a differentiated products model that is Nash in prices, and those models typically do not have discontinuous profit functions. So we might hope that the multiplicity problem might not be as important if we allowed for differentiation.
- Also, in this model the combination of the latter fact (strategic complements) and the fact that we commit to prices for two periods implies that an increase in $\delta$ leads to a less competitive solution. To see this, note that if $\delta = 0$ we get the classic Bertrand solution with $p = c$. If $\delta$ is high, one might increase price and lose current profits in the hope that your competitor will follow suit and you will both make more money in the next period. The difference here is again due to the fact that in one model controls are strategic complements and in the other they are strategic substitutes.
- Note also that in the quantity model firms generally like increases in the length of commitment (if commitment were infinitely long the first mover could be a monopolist). In the price model the length of commitment is a negative to firms because it does not let them react to the competitor's move.
We now move to a much more general setup, one that might even be molded to approach realism for some industries. On the other hand, the new setup gives up entirely on the hope of providing analytic answers to questions of interest. So the theory done with this framework requires an idea of parameter values. Ideally one would estimate parameters and then numerically analyze what happens at those parameter values.