
Nash Demand Game & Simulation

Assignment
Nguyen Linh Chi

There are many situations in society that involve bargaining. If the Prisoner's Dilemma is about the inferior outcome that both parties slide into through their attempts to exploit each other's cooperation, then the bargaining game is one in which society as a whole can always reach the Pareto frontier. The problem therefore shifts to individual welfare: at the Pareto frontier, increasing one player's benefit necessarily sacrifices the other's. Once players have learned how to cooperate, the question of how to divide the surplus arises. The search for a solution reveals other interesting aspects of this game: the power among bargainers, the way they perceive time and money, and the way individual strategies and punishment-reward mechanisms happen to synchronise without any deliberate imposition from a higher conscious mind. These works support the general belief of the micro-level fields that problem solving at the individual level can have a substantial impact on the macro surface by creating spontaneous harmony and rhythm. The complexity, however, ultimately remains unpredictable.
This assignment thus sketches the literature with an overview and general information, then provides some results from a simulated evolutionary model of automata playing the one-shot game.

1 Nash Demand Game


a. The game
The simplest version of a bargaining situation is the Nash Demand Game. Two players are given $10 (manna from heaven) to divide. No contribution has been made so far, so no one has an official right to claim any proportion of the prize (or pie). Nevertheless, both of them make a demand. If the sum of their demands is not over 10, each gets what they claim; otherwise, both get nothing.
This hypothetical game can be found in surprisingly real problems. For example, it is reflected in part in the way the human species agrees upon claims to the land, the sea, and the natural resources of the Earth based on a geographical distribution that just happens to be.

b. The matrix
A simple version of the game is represented in the following matrix, where choices are restricted to three cases for simplicity: demanding a high share (say 8), demanding a fair share (which is 5), or demanding a low share (necessarily 2).

         High    Medium   Low
High     0,0     0,0      8,2
Medium   0,0     5,5      5,2
Low      2,8     2,5      2,2
The corresponding payoff is (0,0) whenever the demands sum to more than 10: when both players claim High, or when one claims High and the other Medium. If both claim Medium, they each get 5, and so on. The game also has versions in which the prize is infinitely or finitely divisible.

c. The pure NEs


Using best-response logic, it is easy to spot all three Nash equilibria (NEs) of this symmetric game. If one player plays High, the best response is to play Low. If one plays Medium, the best response is to play Medium, too. The NEs are highlighted in the following matrix:

         High     Medium    Low
High     0,0      0,0      *8,2
Medium   0,0     *5,5       5,2
Low     *2,8      2,5       2,2
Corresponding to the numerical matrix, these payoffs can also be plotted on coordinate axes as a geometrical visualisation. The following payoff space is equivalent to the matrix above. All dots are feasible outcomes, and the red dots are the points furthest to the northeast that both players can ever reach.

The unit of the above graph is the monetary outcome. These red dots are the Pareto points if the monetary graph is equivalent to the graph of utility, meaning that after transforming the monetary outcomes into utility numbers, the utility graph preserves the ratios (the relative distances) among these points. One suitable assumption for this is that both players are risk neutral.
One more statement before ending this part: these red dots are the NEs of the one-shot game. One shot (or stage) means that the game is played only once.

d. The minimax
The set of scattered points above can be considered the feasible payoff space of this Nash Demand Game. According to that picture, both players can clearly get 0 in the worst case. But in the simultaneous framework, using best-response logic, that point will not occur: whatever the other plays (High, Medium or Low), one can always get at least 2 by playing Low. Thus 2 is called the minimax payoff. The minimax is a guaranteed payoff, and it puts the point (0,0) out of the question.

2 Repeated Game
a. The convex hull
Things change when the one-shot game is repeated. Each player gets a stream of payoffs over a time horizon, each payoff belonging to the feasible set of the one-shot game. As a convenient traditional practice, this stream needs to be boiled down to a single number. There is more than one way to do that, but to be sensible and simple, it is better to let it lie somewhere in the range [0,8] of the one-shot game.
As a consequence of repeating the game and then compressing the result, the feasible outcome set expands. Any point in the convex hull of the initial point set can be reached, given that both players agree to make that happen. Each of these points corresponds to an agreed strategy profile detailing who does what over the course of the upcoming time periods.
The following picture illustrates the repeated game when the calculation keeps the relative ratios among points and the absolute values of the payoff space the same as in the stage game:

A convex set is a set satisfying the following condition: if any two points belong to the set, then the line segment connecting them also belongs to the set. The new feasible payoff set of the repeated game is convex in the sense that if two extreme points (corresponding to two pure strategies) can be reached, then any convex combination of that pair can also be reached, by letting the two players alternate the two pure strategies over time. A convex combination is a special linear combination whose factors sum to 1.

b. Two ways to discount


As mentioned above, there is more than one way to process a stream of payoffs in a repeated sequence of stage games. The two most popular methods follow. Both use a linear transformation, in order to keep the shape of the payoff space the same as in the stage game. Further, to keep the payoff space at more or less the same absolute values, one more step can be taken by normalising these numbers.

+ Method 1:
The stage-game payoffs are lined up on the time axis. Each value lying in the future is discounted back to the present value of today with discount factor δ. These numbers are then summed, to compress the whole stream into a single number at the end. Depending on whether the series is finite or infinite, some calculus can help.
Example of a payoff sequence when the payoff in each stage game is 1:

Round           1     2     3     4     5    ...
Payoff          1     1     1     1     1    ...
Present value  δ^0   δ^1   δ^2   δ^3   δ^4   ...

=> Sum of this sequence (present value of the payoff stream) = 1 + δ + δ^2 + ... = 1/(1-δ)

=> Normalisation: [1/(1-δ)] * (1-δ) = 1

Time serves here as an idiosyncratic feature of the human mind. The discount rate makes the value of a future that happens tomorrow matter more than the value of another future too far away to comprehend. The notion of time hence becomes functional, an instrumental and handy way to perceive what happens; time itself does not have an intrinsic meaning. This method will be used again in the coming part on the Rubinstein sequential game.

+ Method 2:
If method 1 puts more weight on the value of today than on tomorrow, method 2 treats all values as equal by taking the simple arithmetic average of the sequence. The weight of each stage is 1/n for all rounds, where n is the number of rounds. The symbol δ inherently changes its meaning in this method: it is now interpreted as the probability of continuing the game after the first round, which is played for sure. (This is an abuse of notation for convenience, I think.) The finiteness of the repeated game is captured by letting δ < 1. This is equivalent to method 1 in the sense that it represents being unsure whether tomorrow will happen; the player thus still needs to weigh the long-term gain against the short-term benefit.
Example of a payoff sequence when the payoff in each stage game is 1:

Round   Payoff   Continue? (with probability δ)
1       1        yes
2       1        yes
3       1        yes
4       1        yes
5       1        no
Sum     5

=> Normalisation: 5 / 5 = 1

To treat the infinite series, this method lets the number n of stages run to infinity, hence δ -> 1. In either method, this is as if the player has infinite patience: either he treats a benefit in the far future just as if it happened today, or he is sure that tomorrow will absolutely happen; the two are equivalent.

+ Normalisation & comments

As in the examples of the two methods, the payoff of one repeated game between two players grows with the number of rounds played. The payoff space of the repeated game is hence inflated, because we are summing up all the stages. Normalisation helps rescale the payoff space back to more or less the same as the stage game. In the infinite game, the normalised payoff space has the same absolute values as in the one-shot game. For method 1, we normalise the result by multiplying by (1-δ). For method 2, we divide the sum by the number of rounds n.
The slight difference between the two methods, however, shows up in the finite game with δ < 1. Using method 2, after normalisation we still have a payoff space with the same absolute values as the one-shot game, but the payoff space of method 1 is indeterminate, because the normalisation factor (1-δ) is no longer accurate. One way to compare them is simply to leave both without normalisation.
A final word of comparison, in favour of method 1: the difference between the two methods also shows in the way they model mental processes, given that the slowly evolved and, for now, limited cognitive ability of humans should be taken into account.
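Both methods can be sketched numerically on a constant unit stream (δ = 0.9 and the horizons are assumptions for the example):

```python
# Both discounting methods applied to a constant unit payoff stream.
# delta = 0.9 and the horizons are assumptions for illustration.
delta = 0.9

# Method 1: discounted sum 1 + delta + delta^2 + ..., then normalise by (1 - delta)
horizon = 10_000                        # long horizon approximates the infinite series
pv = sum(delta**t for t in range(horizon))
normalised_pv = pv * (1 - delta)        # -> 1 for a unit stream

# Method 2: delta reinterpreted as a continuation probability; the realised game
# lasted n rounds, and we take the plain average
n = 5
average = sum(1 for _ in range(n)) / n  # -> 1 as well

print(round(normalised_pv, 6), average)  # 1.0 1.0
```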

c. The minimax space and Folk Theorem in repeated game


Given that the repeated game is just a repetition of the stage game, the minimax logic applies to each single stage and therefore works for the repeated game as well. The minimax stream (comprising the minimax payoff of each stage game) is also boiled down to a value. Though the discount factor or continuation probability may generate payoff spaces with different absolute values, the shape of the payoff space remains, and in the new feasible space the minimax strategy guaranteeing a positive payoff is still playing Low.
This minimax logic therefore defines the individual rationality constraint. In terms of the stage game, it means that no player will play toward the point (0,0) when he can get at least 2 in the worst NE from his point of view, (2,8). This individual rationality cuts the irrelevant part off the feasible space, leaving this:

The Folk Theorem therefore is about the minimax space. It says that after the cut by individual rationality (based on best-response logic), whatever is left of the feasible payoff space can be sustained by repeated-game agreements.
A general comment (regarding the introduction about this minimax space) needs to be made. The minimax space is the space bounded by the pure NEs and the minimax point. In this game, the pure NEs constitute the Pareto-optimal frontier, and these efficient outcomes can be reached naturally. Moreover, the minimax point is not among the pure NEs, so the efficient points can always be reached. This explains the fundamental difference between the bargaining game (Nash Demand Game - NDG) and the cooperation game (Prisoner's Dilemma Game - PDG). In a PD, the minimax is the NE, and the Pareto point lies outside best-response logic in a simultaneous framework; the problem there is to set up a cooperative mechanism to harvest the surplus for society as a whole. The problem here, instead, is that there is more than one NE in the one-shot game, and many more in the repeated game, considering the Pareto frontier alone. Each NE represents a different share ratio for the individuals, and the solution must answer which point to choose and how to get there.

3 Nash Bargaining solution


Up to this point, the problem is one of equilibrium selection, enveloped in the simple Nash Demand Game. Nash himself developed a general and normative solution for it.

a. The problem setup


Consider a payoff space P which is convex and compact (closed and bounded). This ensures that any convex combination of two points in the feasible space is also feasible. The second ingredient is a minimax point d (a disagreement point, or status quo).

b. A list of criteria that the solution needs to satisfy


+ Criterion 1: Utility comparability
As mentioned in the introduction, at the Pareto frontier an increase in one player's utility is necessarily a sacrifice of the other's. So the ability to compare utilities is essential and can no longer be avoided. Utility comparison, however, is not provided in the framework of ordinal utility; some steps need to be taken to make utilities comparable.
As in the following Binmore graph, Quadrant III shows how the money is divided. Map each amount of money for person 1 into Quadrant II to get player 1's utility curve; in the best case, we have a well-behaved curve. Do the same in Quadrant IV for person 2's utility curve. Automatically, a player's minimum utility is reached when he receives $0 and his maximum when he gets it all, say $10. It does not yet matter what the maximum utility of person 1 or person 2 is, nor the absolute value of utility, because utility is treated in relative ratios. The assumption here is that the ranges from minimum to maximum utility of both players are set (or normalised) to be equivalent; the curvatures of the utility curves and the maximum utility values may well differ. It does not matter yet.
One more step is to map each point of these two curves (corresponding to one point of the monetary division) into Quadrant I to obtain a combined utility curve. In the best case we have a well-defined curve, and this curve, without any deliberate imposition yet, represents the exchange rate between the two utilities: it says how much of one player's utility decreases in favour of an increase of the other's. A convenient result of this translation process is that the disagreement points of both players coincide (conventionally at the origin).

Note that any point like the green point in Quadrant III is mapped to a point on the Pareto frontier in Quadrant I: dividing the money without waste is always a Pareto solution. The blue point, however, is clearly not a Pareto solution, because from that point both players can get more without harming anyone.
One consequence of making utility comparable is that the solution does not change if a utility is subjected to an affine transformation (the ratios are preserved). Another consequence of this process is that if we go further and equate the absolute utility values of both players (symmetric utility functions), then the payoff space is symmetric and the solution lies on the bisector (this is illustrated in the solution part).

+ Criterion 2: Independence of irrelevant alternatives

This criterion relaxes the well-behaved requirement on the curves in the graph. As long as the set of feasible outcomes contains the solution and the disagreement point, the other parts of the set can be cut off.

c. The solution
In general, the solution is the point u that maximises the product of the projections, onto the two axes, of the distance from u to the disagreement point d:

max (u1 - d1)*(u2 - d2)

One consequence of describing the solution this way is Pareto optimality: whenever both utilities can be increased without harming anyone's utility, the product increases. Another consequence is that it implies individual rationality: when one distance is negative, the product is negative.

In order to solve the optimisation, the first step is to conventionally set d to 0:

max u1*u2 subject to the constraint g(u1, u2) = 0, where g is the function describing the technical transformation curve between the utilities.

After solving the Lagrangian, the solution is the point satisfying the condition:

u2/u1 = g'(u1)/g'(u2)

i.e. the ratio of the utilities equals the inverse ratio of their rates of change.

The ratio of the rates of change here inherently means how much utility one player has to give up for an increase of the other's. Hence it represents individual need: who needs more gets a larger share of the manna from heaven.

+ Now if the frontier curve of the payoff space P is a quarter of a circle (reference point at the origin), we have:
g(u1, u2) = u1^2 + u2^2 - k, with k the square of the circle's radius

The Lagrangian condition becomes:

u2/u1 = g'(u1)/g'(u2) = 2u1/2u2 = u1/u2

-> u1^2 = u2^2 -> u1 = u2 > 0

This means that the solution lies on the bisector (as in the following picture)

-> Applying this solution to the Nash demand game:

In the Nash demand game the boundary curve is a line, but the feasible payoff space is still convex -> the equilibrium selected is (5,5). A small check: 5*5 = 25, which is larger than the product 2*8 = 16; it is indeed the largest product given that the two shares sum to 10. This solution works completely fine with the infinitely divisible problem, because the Nash product solution is defined for an infinitely divisible prize/pie.
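A quick grid-search check of this selection (a sketch with step 0.01 over the frontier u1 + u2 = 10, disagreement point at the origin):

```python
# Grid-search sketch: maximise the Nash product u1*u2 on the NDG frontier
# u1 + u2 = 10, with the disagreement point set to the origin.
grid = [i / 100 for i in range(1001)]  # u1 from 0 to 10 in steps of 0.01
best = max(((u1, 10 - u1) for u1 in grid), key=lambda p: p[0] * p[1])
print(best)  # (5.0, 5.0)
```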

4 Sequentiality in Rubinstein game


a. Adjust the assumptions
The Nash product solution is for the simultaneous game; Rubinstein makes it sequential, so the game setting changes a bit. Two players A and B have a pie of 1 to divide. A proposes first; if B rejects, the first round ends. They wait for the next day, when B has the right to propose, and so on. Players change roles as the turns alternate, but the one who proposes on the first day also proposes on the last day. The prize is still infinitely divisible. However, waiting incurs a psychological cost that makes the pie seem to get smaller as each day passes. Rubinstein uses δ in [0,1] as the cost of waiting, and in the easy case it is the same for both players.

b. Backward induction
The general line of reasoning in this game is that when it is one player's turn to propose, he offers a part that makes the follower indifferent between accepting today and rejecting to wait for tomorrow. In fact he can offer an extra ε to break the tie, but for a simple calculation we can leave it at indifference.
This game has perfect information, so if we suppose a finite number of rounds, A can put himself at the end of the game and work backward. On the last day, A offers B a little ε ≥ 0.

The following table lists the value each player gets, in time order. For example, on the last day T, A gets 1 and B gets 0, so on the day before, T-1, B will offer δ to A for the following reason: if A rejects B's offer and waits for day T, A gets the whole pie on day T, but the whole pie on day T is only worth δ (psychologically). So A is indifferent between getting δ on day T-1 and the whole pie 1 on day T. The reasoning goes on.

proposer ->     A        B          A               B
A gets          1        δ       1-δ+δ^2       δ-δ^2+δ^3       ...
B gets          0       1-δ       δ-δ^2       1-δ+δ^2-δ^3      ...

(the columns run backward in time from the last day T)
It can stop after a finite number of rounds, but if the number of days goes to infinity, we use the algebra of infinite series:

uA = 1 - δ + δ^2 - ... = (-δ)^0 + (-δ)^1 + (-δ)^2 + ... = 1/(1+δ)
uB = 1 - uA = 1 - 1/(1+δ) = δ/(1+δ)
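The backward induction can also be run as a simple fixed-point iteration (δ = 0.5 is an assumed value): each step back in time, today's proposer leaves the other exactly the discounted value of being tomorrow's proposer.

```python
# Fixed-point sketch of the backward induction (delta = 0.5 assumed): stepping
# back one day at a time, the proposer keeps 1 minus the discounted share that
# the other player would get as tomorrow's proposer.
delta = 0.5

proposer_share = 1.0              # last day: the proposer takes the whole pie
for _ in range(200):              # walk back 200 days
    proposer_share = 1 - delta * proposer_share

print(round(proposer_share, 6))   # 0.666667 = 1/(1+delta)
```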

c. The result interpretation


In one extreme case, δ = 0: $10 (or the whole pie) tomorrow means nothing, and waiting is not worth it. Because the proposer has the bargaining power, he gets almost everything (he can offer a little ε to break the tie, though). It is like the Stackelberg equilibrium, where the powerful player spares just enough for the follower to keep him in the game. In the other extreme, δ = 1: both are infinitely patient, and $10 tomorrow is just like $10 today -> both are ready to reject and wait for their turn to propose tomorrow. With that reasoning, a fair division can be reached immediately: they divide the prize 50:50, as Nash expected in his solution.
In between, when 0 < δ < 1, uB < uA. The first proposer has the advantage, and this bargaining power lets him get more than the follower.

5 Sequentiality in evolutionary Binmore game


a. MESS condition - modified evolutionary stable strategy
This condition solves the Rubinstein sequential game without using backward induction. It manages to derive the solution by evolution, hence avoiding the complicated mental calculation of backward-induction logic (which underlies the subgame-perfect NE of the previous section).

A strategy (demand) x is a MESS if, at any point in the game, whoever is called upon offers x and the offer is accepted immediately (no one wants to wait for his next turn to offer). The psychological cost is also kept.

b. The process
As in the following picture, suppose the blue point is the division that p1 offers.

What p1 gets is x, and p2 gets 1-x if he accepts. But what if p2 rejects and waits for tomorrow, when it is his turn to offer? Say p2 rejects in order to offer the green point:

P2 still offers x, because x is the optimal offer at any round. But rejecting costs him: tomorrow the pie is smaller by the discount rate δ, so p2 gets δx and spares p1 a proportion δ(1-x). The first condition for x to be a MESS ends here.

For the second condition, that the offer is accepted immediately, it is necessary that what p1 offers p2 today be at least as good as what p2 can get by asking tomorrow, and vice versa. Hence we have this system of requirements:
x >= δ(1-x)
1-x >= δx

which ends up as: δ/(1+δ) <= x <= 1/(1+δ)

c. The result interpretation


After relaxing almost all strict requirements, Binmore obtains Rubinstein's result as a special case of his own solution. Rubinstein's result sits at the extreme of Binmore's general solution: when δ = 1, we have 1/2 <= x <= 1/2 -> x = 1/2.
More generally, any x in [δ/(1+δ), 1/(1+δ)] can be evolutionarily stable in the modified sense. There are extensions of this model that let the psychological costs of the two players differ; in a broad sense, the conclusion is that more patience brings more bargaining power.
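A quick check of this interval (δ = 0.5 and the grid are assumptions for the sketch): a demand x satisfies both acceptance conditions exactly when it lies in [δ/(1+δ), 1/(1+δ)].

```python
# Sketch: verify that a demand x is accepted immediately by both sides exactly
# when delta/(1+delta) <= x <= 1/(1+delta). delta = 0.5 is an assumed value.
delta = 0.5

def is_mess(x):
    # each side accepts if today's share is at least the discounted value of
    # demanding x tomorrow: x >= delta*(1-x) and 1-x >= delta*x
    return x >= delta * (1 - x) and 1 - x >= delta * x

lo, hi = delta / (1 + delta), 1 / (1 + delta)
grid = [i / 1000 for i in range(1001)]
print(all(is_mess(x) == (lo <= x <= hi) for x in grid))  # True
```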

6 Evolution of simultaneous game in 2 populations


This section considers another approach to the simultaneous NDG, with some modifications.
a. The set up
Suppose there are 2 populations. Each period we pick one player out of each population to play the simultaneous NDG. The prize is 1 and still infinitely divisible. Each player observes the probability distribution of claims in the other population in the last period and best-responds to that distribution. Mutation is introduced by adding a little chance ε of playing a non-best response instead of the best response.

b. The process
For example, the norm for population A is now to claim x, and for B to claim 1-x. But there is a small chance ε that an A-player claims x+Δ, and that a B-player claims 1-x+Δ. These mutants get into the game, change the others' beliefs about the current situation, and so change their behaviour next time.
As a consequence, the game has the chance to visit, over time, all the equilibria on the Pareto-optimal frontier (which is also the set of NEs). But among these dynamics of norms, some norms prevail longer.

c. The solution
For each norm x there is a critical mutation rate below which the norm resists being pushed up to x+Δ, and another below which it resists being pushed down to x-Δ. The x at which these two critical rates are equal is the norm that persists.

Consider picking a player A out of population A. He observes that in the last period some B claimed 1-x+Δ, with frequency ε. Hence A weighs two options for this period:
- If A claims x-Δ, he gets x-Δ for sure, no matter whom he is dealing with.
- If A claims x:
+ with probability ε, he meets a mutant and gets 0;
+ with probability 1-ε, he meets a normal B claiming 1-x, and gets x.
Using von Neumann expected utility, A keeps claiming x if: (1-ε)·u(x) >= u(x-Δ)
From this we derive the critical ε at equality: ε_A = [u(x) - u(x-Δ)] / u(x)
Applying the same reasoning for B: the norm is to claim 1-x, but B observed that some A claimed x+Δ in the previous period, so he keeps claiming 1-x this period if: (1-ε)·v(1-x) >= v(1-x-Δ)
Hence the critical point for B is: ε_B = [v(1-x) - v(1-x-Δ)] / v(1-x)
The x that persists is found by equating ε_A and ε_B:
[v(1-x) - v(1-x-Δ)] / v(1-x) = [u(x) - u(x-Δ)] / u(x)
In approximation, this says that x persists when the relative rate of change of utility in population B equals that in population A.
Given that these functions are defined over the same domain x in [0,1], it says that the rate of change at 1-x equals the rate of change at x. With symmetry imposed on the two populations - same aggressiveness, same adventurous attitude, etc. - these best-response-based conditions cross at the middle point of 50:50.

And the 50:50 institution is the one that persists.
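The condition can be illustrated numerically. Assuming identical square-root utilities in both populations (my assumption, just to have a concrete concave u = v) and a small perturbation Δ, bisection on the difference of the two critical rates lands on the 50:50 norm:

```python
# Numeric sketch of the persistence condition with u = v = sqrt (an assumed
# example utility) and a small claim perturbation.
import math

u = math.sqrt
step = 0.01                        # the perturbation Delta

def rel_loss(s):
    """Relative utility loss from conceding: [u(s) - u(s - step)] / u(s)."""
    return (u(s) - u(s - step)) / u(s)

def gap(x):
    # difference between the critical rates of populations A and B
    return rel_loss(x) - rel_loss(1 - x)

# bisection: gap is positive at low x, negative at high x, zero at the persistent norm
a, b = 0.02, 0.98
for _ in range(60):
    m = (a + b) / 2
    if gap(a) * gap(m) <= 0:
        b = m
    else:
        a = m
print(round((a + b) / 2, 4))  # 0.5
```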

7 Replicator Dynamic - one shot game


Player 1 takes x, y, z to be the probabilities that player 2 plays High, Medium, Low, and reacts accordingly.
First, this is the dynamic surface without any imposed constraint: the plane described by the function x+y+z = 1.

Given x, y, z as the probabilities that player 2 plays High, Medium and Low, player 1 can calculate his expected utility of playing High, Medium and Low as in the following table:

          x      y       z
          High   Medium  Low    EU of player 1
High      0      0       8      EU(h) = 8z
Medium    0      5       5      EU(m) = 5y+5z
Low       2      2       2      EU(l) = 2x+2y+2z = 2
The expected payoff of playing High is higher than that of playing Medium when:
EU(h) > EU(m) => 8z > 5y + 5z => z > (5/3)y

The plane z = (5/3)y is drawn in pink in the following picture, and the corresponding area where EU(h) > EU(m) is pointed out in the right picture:

EU(h) > EU(m) is the region above the plane. It means that when there is a large enough share of Low in the population, z > (5/3)y, the best response is to play High to take advantage of the people playing Low -> High can easily get into this population.
Accordingly, the expected payoff of playing High is higher than that of playing Low when:
EU(h) > EU(l) => 8z > 2 => 3z > x+y. The plane 3z = x+y is plotted as the green plane in the following picture:

The two areas separated by the green plane have opposite signs when we compare EU(h) and EU(l): EU(h) > EU(l) when the share playing Low rises above a certain proportion of x & y (playing High and Medium). Combining both conditions, the force to play High is strongest in area H (the pink area); it is where the High strategy gets in the most easily:

Now we add the last inequality: EU(m) > EU(l) => 3y+3z > 2x. The plane 2x = 3y+3z is plotted in blue in the following picture. The two areas separated by the blue plane have opposite signs when we compare EU(m) and EU(l).

With these parallel and mixed forces, area M (the green area) is where the urge to play Medium dominates, and area L (the blue area) is where the urge to play Low dominates.

Putting these areas together, the dynamic map looks like this:

Each point in this dynamic corresponds to a distribution over the 3 pure strategies. If the population starts with everyone playing Low (point z = 1), it is dragged toward playing High (point x = 1) rather fast. If we start at point x = 1, it is dragged to point z = 1 at a slower rate. In between there is a mixed-strategy equilibrium. However, with some mutation, the population gets into the green region and is dragged to playing Fair (point y = 1) pretty fast. Playing Fair is another equilibrium. In the evolution of this game, the Fair equilibrium is more stable than the mixed equilibrium: it takes a lot of mutation to get out of the green region, but only very little mutation from the mixed-strategy point to enter the basin of attraction of the Fair equilibrium.
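The forces described above can be sketched with a small replicator computation (the starting points, step size and number of steps are illustrative assumptions):

```python
# Sketch of the one-shot replicator dynamic. x, y, z are the shares playing
# High, Medium, Low; expected payoffs follow the table above.
def fitness(x, y, z):
    return {"High": 8 * z, "Medium": 5 * (y + z), "Low": 2.0}

# near all-Low the best response is High, matching the condition z > (5/3) y ...
f = fitness(0.1, 0.1, 0.8)
print(max(f, key=f.get))              # High

def step(pop, dt=0.01):
    fit = fitness(*pop)
    vals = [fit["High"], fit["Medium"], fit["Low"]]
    avg = sum(s * v for s, v in zip(pop, vals))
    # a share grows when its expected payoff beats the population average
    return [s * (1 + dt * (v - avg)) for s, v in zip(pop, vals)]

# ... while a start inside the Medium-dominant (green) region converges to Fair
pop = [0.1, 0.8, 0.1]
for _ in range(20_000):
    pop = step(pop)
print(round(pop[1], 3))               # 1.0
```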

8 Simulation
a. Automata
The simulation is set up as one population of 100 automata, randomly matched. Each automaton can carry states with different strategies, and they are prepared for a repeated game of any length. For example, one automaton may play High forever no matter what the other plays, while other automata switch states pretty fast based on what the other played in the previous round.
An example of an automaton that starts playing High and keeps playing High whatever the other plays is in the following picture:

Accordingly, here are the ones that play Medium and Low all the time:

Using the colours red, green and blue for the strategies High, Medium and Low, here is one automaton that starts playing Medium and continues by best-responding to the other's previous strategy:

Note that all green arrows point to the green state: if the other played Medium in the previous round, it jumps to play Medium. If the other played High, it jumps to Low, and if the other played Low, it jumps to High.
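A minimal sketch of such an automaton (the encoding and the names are illustrative assumptions, not the simulation's actual code):

```python
# Minimal automaton sketch: a state is the demand to play, and a transition
# table maps the opponent's last demand to the next state. The best-response
# map follows the one-shot matrix.
BEST_RESPONSE = {"High": "Low", "Medium": "Medium", "Low": "High"}

class Automaton:
    def __init__(self, start, transitions):
        self.state = start
        self.transitions = transitions      # state -> (opponent move -> next state)

    def play(self):
        return self.state

    def observe(self, opponent_move):
        self.state = self.transitions[self.state][opponent_move]

# the accommodator starts at Medium and best-responds to the opponent's last move
accommodator = Automaton(
    "Medium",
    {s: {m: BEST_RESPONSE[m] for m in BEST_RESPONSE} for s in BEST_RESPONSE},
)
accommodator.observe("Low")
print(accommodator.play())  # High
```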

b. Repeated game
This is an example of the All-High automaton (playing High all the time) versus the All-Medium automaton over infinite rounds:

The red dots are the strategies that All-High plays across rounds: this automaton always plays the red (High) strategy. Accordingly, the green dots say that All-Medium keeps playing Medium no matter what happened in the previous round. The payoff stream associated with this match is in the following table:

Round        1   2   3   4   5   ...
All-High     0   0   0   0   0   ...
All-Medium   0   0   0   0   0   ...

No matter which method is used to boil down these payoff streams, the result is that both get 0.
This is another example match, Accommodator versus All-Low:

In the first round, Accommodator starts with Medium and All-Low starts with Low. But from the second round on, Accommodator best-responds to All-Low by playing High, and the situation keeps going. The payoff stream associated with this match is in the following table:

Round          1   2   3   4   5   ...
Accommodator   5   8   8   8   8   ...
All-Low        2   2   2   2   2   ...
Using method 1 with δ = 0.5, for instance, we can calculate their final payoffs as follows:

                                                      Sum
Accommodator   5    8     8     8     8    ...
PV             5   8δ   8δ^2  8δ^3  8δ^4   ...   5 + 8δ(1 + δ + δ^2 + ...) = 5 + 8δ/(1-δ) = 5 + 8 = 13
All-Low        2    2     2     2     2    ...
PV             2   2δ   2δ^2  2δ^3  2δ^4   ...   2(1 + δ + δ^2 + ...) = 2/(1-δ) = 4

Hence in this infinitely repeated game, Accommodator gets 13 and All-Low gets 4.
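The match above can be replayed in code. The following sketch (truncating the infinite series at 200 rounds, with the same illustrative automaton names) computes the method-1 present values:

```python
# Sketch of a match between two automata with the method-1 discounted payoff.
# delta = 0.5; the 200-round horizon truncates the infinite series.
delta = 0.5
demands = {"High": 8, "Medium": 5, "Low": 2}
best_response = {"High": "Low", "Medium": "Medium", "Low": "High"}

def round_payoff(a, b):
    ok = demands[a] + demands[b] <= 10
    return (demands[a] if ok else 0, demands[b] if ok else 0)

acc_move, low_move = "Medium", "Low"      # round-1 states
pv_acc = pv_low = 0.0
for t in range(200):
    pa, pb = round_payoff(acc_move, low_move)
    pv_acc += pa * delta**t
    pv_low += pb * delta**t
    acc_move = best_response[low_move]    # Accommodator best-responds next round
    # All-Low never changes state

print(round(pv_acc, 6), round(pv_low, 6))  # 13.0 4.0
```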

c. Regeneration process
After playing the repeated game with discount factor δ in [0,1], the fitness of each type of automaton is measured and used for the regeneration process at the end of each day (or cycle).
For example, one population of 12 comprises 7 Accommodators and 5 All-Lows. They are randomly matched as follows:

population          associated payoff

A-L   A-L           13-4    13-4
L-A   A-A     ->     4-13   10-10   ->  Total payoff: 99
A-A   L-L           10-10    4-4

In this example there are only 2 types of automata, Accommodator and All-Low. We sum up the payoffs of all Accommodators: 13*3 + 10*4 = 39 + 40 = 79. So the share of the Accommodator payoff in the total payoff is 79/99. Accordingly, the share of All-Low in the total payoff is 20/99. These ratios, 79/99 and 20/99, are taken as the fitness of Accommodator and All-Low, respectively. Then, at the end of each cycle, based on this fitness information, we regenerate the population.
The regeneration process happens at a speed λ in [0,1]. Let λ = 0.03: only 3% of the population changes each cycle. In the above example, the changed automata become Accommodator or All-Low with probabilities 79/99 and 20/99.
λ is needed not as a convenient degree of freedom but to slow down the rate of learning in the population, so that a mutation has time to grow under the strong pressure of the incumbent strategy.
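One regeneration step can be sketched as follows. The replacement rate is exaggerated to 25% (my assumption) so that something visibly changes in a 12-automaton example, and the per-type payoff sums follow the worked match values:

```python
# Fitness-proportional regeneration sketch: a fraction of the population redraws
# its type with probability equal to each type's share of total payoff.
# The 25% replacement rate is exaggerated for this tiny example.
import random

random.seed(0)
payoff_sum = {"Accommodator": 13 * 3 + 10 * 4, "All-Low": 4 * 3 + 4 * 2}
total = sum(payoff_sum.values())
fitness = {k: v / total for k, v in payoff_sum.items()}     # payoff shares

population = ["Accommodator"] * 7 + ["All-Low"] * 5
speed = 0.25
n_replace = round(speed * len(population))                   # 3 automata change
for i in random.sample(range(len(population)), n_replace):
    population[i] = random.choices(list(fitness),
                                   weights=list(fitness.values()))[0]

print(len(population))  # 12
```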

d. Mutation
Mutation is modelled as follows. There are four types of mutation: nature can change the strategy in one state of the automaton, add one more state, delete one state, or change the trajectories among the states. At each cycle, one automaton is allowed one mutation, with probability μ.

Let's start with the automaton All-High:

It can go through 2 mutations of adding states, as follows:

Then it can change one trajectory, change the strategy in one state, or delete a state:
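The four mutation types can be sketched on a small encoding (the dictionary representation of an automaton here is a hypothetical choice of mine, not the simulation's actual data structure):

```python
# Sketch of the four mutation types on an automaton encoded as
# {state_id: (demand, {opponent_move: next_state_id})}.
import random

random.seed(1)
MOVES = ("High", "Medium", "Low")

def mutate(auto):
    auto = {s: (d, dict(t)) for s, (d, t) in auto.items()}   # work on a copy
    kind = random.choice(("change_demand", "add_state", "delete_state", "rewire"))
    states = list(auto)
    if kind == "change_demand":
        s = random.choice(states)
        auto[s] = (random.choice(MOVES), auto[s][1])
    elif kind == "add_state":
        new = max(states) + 1
        auto[new] = (random.choice(MOVES), {m: random.choice(states) for m in MOVES})
    elif kind == "delete_state" and len(states) > 1:
        dead = random.choice(states)
        del auto[dead]
        survivors = list(auto)
        for s, (d, t) in auto.items():   # redirect arrows into the deleted state
            auto[s] = (d, {m: (n if n != dead else random.choice(survivors))
                           for m, n in t.items()})
    else:  # rewire one trajectory (also the fallback when deletion is impossible)
        s = random.choice(states)
        d, t = auto[s]
        t[random.choice(MOVES)] = random.choice(states)
    return auto

all_high = {0: ("High", {m: 0 for m in MOVES})}
print(mutate(all_high))
```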

e. The cycle
In general, each simulation run can have many cycles. One cycle will go as follows:

f. Result
Setting δ = 0 (which means all pairs play the one-shot game in each cycle): if we start the population with all automata playing Low, the initial payoff is 2, but some mutation happens and the average payoff of the population rises rather quickly. After 200 days, automata that play High start to appear in the population. After 400 days, the average gets to 4.5 (quite close to the fair-fair point). At this point, the number of automata playing the Fair strategy has risen a lot higher. They also have efficient punishment mechanisms against playing High. For example, there are automata that start with Fair but jump to High if the other plays High, and only jump back to Fair if the other plays Fair; they keep playing High if the other plays High or Low.
Another automaton starts with High, only jumps to Fair if the other plays High, and then only jumps back to High if the other plays Low. In yet another example, an automaton starts with Fair and only jumps to High if the other plays High; it does not even jump to High if the other plays Low. But once it jumps to High it stays there, and only jumps back to Fair if the other plays Low. Note that these automata do not play Low that much. At round 600, the average gets to 5; playing Fair now dominates the population. Then the average fluctuates around, but below, the Fair point - the effect of mutation.
An example of this simulation is as follows:

If we start with a population all playing High, the average payoff is initially 0, and it rises to 1.8, though much more slowly than in the previous case. At this point the population holds a mixture of Medium and Low. At round 700 the average rises pretty fast to 5, and most of the automata play Fair by round 900. A population starting all-High stays at a low payoff for a very long time because there are too many High automata, and whenever they pair up they pull the average payoff down quite hard.
An example of this simulation is as follows:

There can be simulations in which the phase of the population staying around 2 lasts very long.
If we start with a population of all Medium, the average payoff stays around 4.2-4.9, up to 5. There are Fair and Low citizens in the population, and sometimes they pave the way for the High strategy, but both High and Low are kept at very low levels. These small deviations are just the mutation effect.
Here is an example simulation:

In general, the evolutionary simulation aligns with the replicator dynamic of the one-shot game (developed in the previous part).

Things get much more complicated when the automata play repeated games (with a positive discount factor). They can develop quite complex strategy plans, and the payoff space of a match changes shape accordingly. Hence the evolution of the repeated game becomes complicated at an exponential rate, because matches may have different payoff spaces due to different values of the discount factor and different lengths of the repeated game.
