
Don’t Hate the Player, Hate the Game: An Analysis of Game Theory

Game theory is a field of economics that concerns the mathematical aspect of decision

making and strategic interaction. The Stanford Encyclopedia of Philosophy defines game theory

as the study of the ways in which interacting choices of economic agents produce outcomes with

respect to the preferences (or utilities) of those agents. Game theory focuses on the analysis of

games of strategy, in contrast with games of chance, like UNO, or games of skill, like basketball.

What differentiates games of strategy from the latter two is the fact that each player’s actions

affect other players’ decisions. In a strategic interaction, one must consider both what one’s opponents have done in the past and what they can be expected to do in the future. This notion is what

gave birth to the field of analysis known as game theory. Game theory was studied only informally in past centuries before being formally defined and introduced as a discipline in the 1900’s by the

mathematician John Von Neumann and economist Oskar Morgenstern. It has been applied to a

variety of topics, including sports, card games, and historical battles. In this paper, I intend to

analyze the successes and failures of game theory in modelling human decision making, explore

several interesting insights and lessons that the discipline provides, and examine alternative

perspectives proposed by social scientists.

Introduction

A game is defined as having two or more players, a set of strategies for each player, and a

set of outcomes, each assigning payoffs, or utilities, to the players. A game can be visualized in two

different ways. A matrix shows each outcome as an intersection of the players’ choices; it is used

in normal form games where both players make decisions simultaneously, such as Rock-Paper-

Scissors.

1/2 Rock Paper Scissors

Rock (0,0) (0,1) (1,0)

Paper (1,0) (0,0) (0,1)

Scissors (0,1) (1,0) (0,0)
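Small games like this are easy to encode and analyze directly. Below is a minimal Python sketch (my own illustration, using the matrix above) that reads off Player 1's best response to each of Player 2's moves; the cycle it prints is exactly why Rock-Paper-Scissors has no stable pure-strategy outcome.

```python
# A minimal sketch: the Rock-Paper-Scissors payoff matrix above as a Python
# dict, plus a helper that reads off Player 1's best response to each move.

payoffs = {  # (player 1 move, player 2 move) -> (payoff 1, payoff 2)
    ('Rock', 'Rock'): (0, 0), ('Rock', 'Paper'): (0, 1), ('Rock', 'Scissors'): (1, 0),
    ('Paper', 'Rock'): (1, 0), ('Paper', 'Paper'): (0, 0), ('Paper', 'Scissors'): (0, 1),
    ('Scissors', 'Rock'): (0, 1), ('Scissors', 'Paper'): (1, 0), ('Scissors', 'Scissors'): (0, 0),
}
MOVES = ('Rock', 'Paper', 'Scissors')

def best_response(opponent_move):
    """Player 1's payoff-maximizing reply to a fixed Player 2 move."""
    return max(MOVES, key=lambda m: payoffs[(m, opponent_move)][0])

for move in MOVES:
    print(move, '->', best_response(move))
# Rock -> Paper, Paper -> Scissors, Scissors -> Rock: the best responses
# cycle, so no single pair of choices is stable.
```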

In contrast, an extensive form game in which decisions are taken sequentially uses a

game tree, where each ending node represents an outcome. For example, in a game of poker, the

players take turns to choose their action, and each player’s choice is based on what previous

players have chosen.

(Figure: poker game tree, from “Artificial Intelligence, Poker, and Regret”)

Rationality and Reputation


Conventional economics assumes that humans act rationally when making choices; in

other words, humans follow “rational self-interest”, always making choices in a way that

maximizes their own utility. Economists gave the name homo economicus to describe a

hypothetical person that makes decisions with perfect rationality; this type of person has

“complete information about the options available for choice, perfect foresight of the

consequences from choosing those options, and the wherewithal to solve an optimization

problem that identifies an option which maximizes the agent’s personal utility” (Stanford

Encyclopedia of Philosophy). This stipulation of rationality is the basis for countless theories

about how choices are made and how economic systems function. Prominent economist Thomas

Sowell wrote that people “respond rationally to the incentives and constraints of the system in

which they work. Under any economic or political system, people can make their choices only

among the alternatives actually available” (121). However, in recent years a rival discipline,

known as behavioral economics, has been breaking ground among academics. Behavioral

economists, such as Dan Ariely and Daniel Kahneman, combine psychology and economics to

suggest that humans are irrational, and their decision making is frequently impaired by external

and internal biases. Classical game theory agrees with orthodox economics, in that it stipulates

that players in strategic interactions will always behave rationally. Because rational behavior is the end goal -- that is, we want to get rid of biases and make decisions optimally -- game theoretical analysis works well as a model for everyday choices. However, the biases and psychological heuristics that are part of human nature must be factored into this decision-making model.

Nonetheless, actual choices made by humans frequently differ from those that rational agents would make in theory. After all, most humans are not constantly keeping track of the order of their preferences and optimizing every decision the way a homo economicus would. Thomas Schelling, an economist who utilized game theory to analyze deterrence during

the Cold War, argued that while “the premise of ‘rational behavior’ is a potent one for the

production of theory”, the resulting theory might not adequately explain actual behavior:

“If we confine our study to the theory of strategy, we seriously restrict ourselves by the

assumption of rational behavior -- not just of intelligent behavior, but of behavior motivated by a conscious

calculation of advantages, a calculation that in turn is based on an explicit and internally

consistent value system. We thus limit the applicability of any results we reach.” (4)

Schelling took this idea one step further with the rationality paradox, which stipulates that

(1) irrationality can be a strategic advantage and (2) rationality can be a strategic disadvantage.

This paradox rests on the basis that a player’s reputation determines how credible he is and how

opponents will respond to threats he makes. A parent who has punished a child adequately in the

past will not need to shout or threaten the child to the same extent in the future, since the child

understands the consequences for bad behavior. A mob leader who has brutally killed off

informants can worry less about potential traitors in the organization, as his reputation warns

them against betrayal. The paradox of rationality lies in the fact that certain irrational traits -- an

inconsistent value system, a frequent randomization of actions, an inability to communicate --

can change the opponents’ actions in favorable ways. For example, asylum inmates that

deliberately cultivate irrational behavior, such as threatening to harm themselves or acting as if

they are unable to speak or understand others, develop reputations that make them immune to

threats. At the other end of the spectrum, attributes belonging to a rational agent -- the ability to

communicate effectively, a uniform decision-making system, sound judgement -- can at times

become a weakness and should accordingly be suspended. Schelling gives the example of a man

threatened with extortion breaking his own hand to suspend his ability to sign checks. Likewise,
a woman kidnapped and forced to call her family for ransom money is put at a disadvantage by

indicating to the kidnappers that she has sound judgement and rational will to live.

A great example of the benefits of an “irrational” reputation is the chain store paradox.

The paradox involves a monopolist who controls 20 different markets around the world. A

competitor arrives at each market, seeking to break the monopolist’s control. One by one, the 20

competitors choose whether to stay out of the market or enter. After each competitor makes a

choice, the monopolist then decides whether to cooperate and let the competitor gain some

market share, or to be aggressive and fight to drive the competitor out. A fight will involve the

monopolist adopting the costly action of predatory pricing, hurting all firms in the market.

Theoretical payoffs of the game are shown below:

Monopolist/Competitor Stay Out Enter

Cooperate (5,1) (2,2)

Fight (5,1) (0,0)

(If the competitor stays out, the monopolist makes no move, so his payoff is the same in both rows.)

If all players played rationally, all 20 competitors would enter their respective markets.

This is deduced using backward induction, the process of starting at the end of the game and

moving to the beginning. If the monopolist fights, it is not only to keep control of a market, but

also to scare off successive competitors. At the end of the game, after the monopolist has dealt

with the first nineteen markets, he has no reason to fight with the twentieth competitor because

there are no more competitors to scare off. Because he knows he will avoid a fight with the

twentieth competitor, he has no reason to fight with the nineteenth competitor, and because he

knows he will avoid a fight with the nineteenth competitor, he has no reason to fight the
eighteenth competitor. Using this line of reasoning and working all the way to the beginning of

the game, the monopolist would always choose to cooperate, allowing all twenty competitors a

piece of the market.
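The backward-induction argument can be made concrete with a short sketch (my own illustration, using the per-market payoffs from the table above). Because payoffs simply add across markets and rational play ignores history, the same one-market analysis repeats twenty times.

```python
# A backward-induction sketch of the chain store game, using the per-market
# payoffs from the table above: (monopolist, competitor) payoffs are
# (5, 1) if the competitor stays out, (2, 2) if it enters and the monopolist
# cooperates, and (0, 0) if it enters and the monopolist fights.

STAY_OUT, COOPERATE, FIGHT = (5, 1), (2, 2), (0, 0)

def solve_one_market():
    """Equilibrium of a single market, ignoring history, as rationality demands."""
    # If the competitor enters, the monopolist compares cooperating with fighting.
    response = COOPERATE if COOPERATE[0] > FIGHT[0] else FIGHT
    # Anticipating that response, the competitor decides whether to enter.
    return response if response[1] > STAY_OUT[1] else STAY_OUT

total = 0
for market in range(20, 0, -1):      # reason backward from market 20 to market 1
    total += solve_one_market()[0]   # every market resolves the same way

print(total)  # 40: the monopolist cooperates in all 20 markets
```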

The Chain Store paradox has several major implications. Firstly, reputation has strategic

value. Reinhard Selten proposed that if the monopolist deviates from the equilibrium strategy and

instead fights early competitors, the competitors that follow will be much less likely to enter the

market. An established reputation for aggression increases the credibility of the threat of

predatory pricing. Furthermore, rationality can be a great disadvantage. If the monopolist

follows the purely rational strategy prescribed by backward induction, he only gets a payoff of

40. Yet, if he behaves extremely aggressively or irrationally towards the first several

competitors, he can deter those that follow. It does not matter whether the

monopolist is actually irrational or not, only that the monopolist appears irrational to the other

players. A player that changes the way he or she is perceived by other players is essentially

sending a strategic message, influencing the behavior of opponents. Stanford professors Paul

Milgrom and John Roberts have argued that if the game is one of asymmetric information, where

competitors are unsure of the monopolist’s payoffs, motivations, and strategy, predatory pricing

would be a rational response to a rival firm entering the market. In their view, it is the “lack of

complete information that gives rise to reputation possibilities.” When competitors are in doubt

and must base their beliefs on the results of previous interactions, predation will

emerge as an equilibrium strategy. Generalizing their conclusions, Milgrom and Roberts explain

that “in any situation where individuals are unsure about one another’s options or motivation and

where they deal with each other repeatedly in related circumstances, we would expect to see

reputations develop” (303).


Political scientist James E. Alt and his colleagues constructed a model in which a

hegemon -- a global power -- benefits by incurring costs to establish a reputation for coercion

early on in the game rather than cooperating with its two smaller allies. The model involves uncertainty

and asymmetric information, as the allies are unaware of whether each instance of coercion will

be costly or cheap to the hegemon. Whether or not the hegemon punishes the first ally will

influence the second ally’s decision to either challenge the hegemon or obey. Alt illustrates his model with the 1980’s oil glut, in which Saudi Arabia deliberately overproduced oil in response to a six-year decline in oil prices and frequent violations of OPEC’s quota system by other countries. Because Saudi Arabia’s low costs of production allowed it to withstand periods of price decline, the country was able to establish a

reputation for toughness and deter other oil producers, especially non-OPEC nations. Even

recently, in 2020, Saudi Arabia was able to win a two-month price war with Russia because of its

low costs of coercion. Thus, cooperation is not always beneficial; if it is not costly to be

aggressive, such aggression can pay off in the long run.

Ultimately, we can adjust the meaning of “rationality” to argue that humans actually do

behave rationally. Examples in this paper show that rational behavior can change drastically

when going from a simultaneous game to a sequential game or when going from a single shot

game to a repeated game. Behavioral economists mostly focus on cognitive issues, such as

humans’ difficulty with statistical reasoning (e.g., Bayes’ rule) and inconsistent decision making

when problems are presented in different ways. These issues mostly deal with the individual and

his or her perspective of the world. Yet, when it comes to strategic interaction and the study of

two or more agents, it could be said that human beings act in an optimal way. For one thing,

humans are able to learn from their mistakes and alter strategies in a way that converges to
equilibrium. Experimental economists have demonstrated this convergence in studies of repeated

games. In other situations, such as sports, players intuitively know the probabilities required to

achieve the optimal results. Moreover, a large aspect of strategy involves communication:

accurately predicting others’ actions and expectations, while getting one’s own intentions across

when necessary. On the whole, human beings show skill in making these predictions. In one

study, Schelling asked participants questions that required mutual coordination, such as:

● Name “heads” or “tails”. If you and your partner choose the same option, you win a

prize.

● Write a positive number. If you and your partner write the same number, you win.

For the first question, 6/7 of participants chose heads. For the second, 40% of participants agreed

on the number 1. If that percentage seems a bit small, just think about how many positive

numbers there are in total. It is amazing that even 1% of participants could coordinate their

responses, let alone 40%. Humans’ incredible ability to coordinate without explicit

communication has been shown not only in experiments, but throughout history. Following

World War 1, where poison gas injured about 500,000 soldiers, the combatants of World War 2

had an unspoken agreement not to use any chemical weapons. In China, the two warring governments, the Communists and the Nationalists, had no official armistice or peace treaty but still agreed that the Formosa Strait was the de facto border between their jurisdictions. Even in school, when a teacher tells students to form pairs, one can see pairs of kids immediately look at each other, each knowing that the other wants them as a partner. People’s uncanny

ability to coordinate expectations can be seen everywhere, from the classroom to the battlefield.

It is important to distinguish between zero-sum games and non zero-sum games in terms

of optimal communication. In zero-sum games, where one player’s gain is another’s loss, there is
no incentive to communicate. In fact, players must do the opposite and hide their strategies; one

way of doing this is through randomization, or mixed strategies. A randomized strategy is

“dramatically anti-communicative” (Schelling 105). If a soccer player flips a coin to determine whether he shoots right or left during a penalty kick, the goalie will never be able to accurately

predict the direction. By contrast, in mixed-motive games or pure coordination games, players

have an incentive to communicate, either explicitly or tacitly if open communication is

unavailable. Schelling gives a great example of this concept. He explains that “in chess [a zero-sum game] it does not matter what the players know about each other or whether they speak the

same language or have a common culture; nor does it matter who played the game previously or

how it came out” (106). Yet, we can change the game of chess to simulate real life war, where

each player has a unique value system and all players seek to avoid mutual destruction. In this

new version of chess, players are rewarded for three different things: the pieces they capture, the

pieces they are left with at the game’s end, and the squares these pieces occupy. Chess 2.0 forces

the players into negotiation, as each is unsure of which squares are valuable to the opponent, but

both want to minimize the capture of pieces. In any conflict like Chess 2.0, when players must

coordinate but communication is limited, they must rely on other ways to form an idea of their

opponent’s intentions: patterns of behavior, traditions, historical precedents, etc. This is why

setting precedents is so important for a new leader or nation state. By setting a precedent or

following a tradition, an agent is sending a message to the other players about his strategy and in

turn influencing their decisions.

Reputation is a way for agents to communicate their intentions and influence other

players. Yet is it better to cultivate a reputation of unpredictability and irrationality, like that of

the Chain Store paradox, or one of consistency, based on previous experiences? In the 1970’s,
Richard Nixon proposed the “madman theory” of foreign policy; Nixon believed that if

communist nations thought the American president was impulsive and irrational, they would

be reluctant to provoke the United States. Nixon used this strategy to try to bring an end to

the Vietnam War, while many global experts saw Trump’s policy against North Korea as another

example of madman theory (Coll). While this theory may seem constructive, it has been

criticized by both political scientists and economists. When a leader behaves like a madman, he

creates two major problems. Firstly, like in the case of Nixon and Brezhnev, the leader’s

adversaries might not understand what intention he is trying to communicate. Secondly, the

madman will not influence his opponents’ behavior, as any strategic move he makes will be

perceived as lacking credibility by other nations. Sandeep Baliga, a professor at Northwestern University,

explains this complication well: “If you are a truly mad leader, why would anyone change their

behavior as a function of what you do? If they know you might do something crazy whether they

do something you like or not, they might just say ‘the hell with it, I’ll do whatever I want.’ The

‘madman’ actually has to be clever, doing something crazy if you don’t do what he wants and

being accommodating if you do. In that case, well, he’s no longer mad” (Calvert). The world of

diplomacy is not a zero-sum game; communication is key. While a monopolist may find it

beneficial to be a “madman”, a global leader must cultivate a reputation which best achieves

communication: a reputation driven by predictability.

Utility

Within economics, utility has a special meaning. It is a measure of satisfaction that a

person gets from a good or service. Within game theory, utility is used to determine the value of

outcomes. A cardinal utility function measures outcomes on a common numerical scale; an outcome whose utility is 3 is valued three times as much as an outcome with a utility of 1. By contrast, an ordinal utility function merely places outcomes in order of the player’s preference. For example, an outcome of utility 2 is not worth twice as much as an outcome of utility 1; the player simply prefers it more.

Utility plays a big part in decisions such as lotteries, where there is high risk and high

uncertainty. Economists consider the weighted average value of all possible outcomes as the

“expected value”. The mathematical definition is E = Σi xi·P(xi), where xi is the value of outcome i and P(xi) is its probability. A gamble in which a person has a 50% chance of winning a dollar and a 50% chance of losing a dollar has an expected value of 0 (0.5*1 + 0.5*(-1) = 0). If a person equates his or her preferences to monetary values, then expected utility is the same

as expected value. Value is a concrete, unchangeable statistic, while utility measures the weight

different agents give to specific outcomes. Thus, while expected value is the same for all players,

expected utility may be different. Expected utility theory states that rational agents will make

decisions that maximize their expected utility.

The mathematician Daniel Bernoulli was the first to study the implications of marginal

utility on a decision-making model. He proposed that the accumulation of wealth was affected by

diminishing marginal utility; in other words, if a person started at nothing and gained a dollar a

day for a year, the first dollar gained would provide greater utility than the last. Money gained

would mean more for a poor man than a rich man, as they have different starting points.

Bernoulli’s utility function factors in risk aversion, as the function is concave down rather than

linear. Each unit of wealth gained provides less utility than the previous one. Bernoulli described

the function as logarithmic, because money must be multiplied by the same proportion for

utility to increase by the same value. If 2 dollars gives the player a utility of 1 and 4 dollars a

utility of 2, then 8 dollars gives the player a utility of 3, 16 dollars a utility of 4, and so on. Von
Neumann and Morgenstern expanded on this utility function to describe the behavior of risk-

neutral, risk-averse, and risk-prone players.

● For a risk-neutral player, expected utility is directly proportional to expected payoff.

● For a risk-averse player, utility is a concave function of the payoff, such as its square root.

● For a risk-prone player, utility is a convex function of the payoff, such as its square (see the sketch below).
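Consider a 50/50 gamble between $0 and $100, whose expected payoff is $50, under one example utility function of each type. This is my own numerical illustration, not von Neumann and Morgenstern's.

```python
# Expected utility of a 50/50 gamble between $0 and $100 under the three
# risk attitudes above; the specific utility functions are illustrative.
import math

gamble = [(0.5, 0.0), (0.5, 100.0)]  # (probability, payoff) pairs

def expected_utility(u):
    return sum(p * u(x) for p, x in gamble)

risk_neutral = lambda x: x             # utility proportional to payoff
risk_averse = lambda x: math.sqrt(x)   # concave, e.g. square root
risk_prone = lambda x: x ** 2          # convex, e.g. a power above one

# Compare each gamble's expected utility with the utility of a sure $50:
print(expected_utility(risk_neutral), risk_neutral(50.0))  # 50.0 vs 50.0: indifferent
print(expected_utility(risk_averse), risk_averse(50.0))    # 5.0 vs ~7.07: takes the sure thing
print(expected_utility(risk_prone), risk_prone(50.0))      # 5000.0 vs 2500.0: takes the gamble
```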

In 1979, Israeli psychologists Daniel Kahneman and Amos Tversky launched a strong

criticism of expected utility theory and proposed a new concept, called prospect theory, to

explain how humans value choices. The pair suggested that while expected utility theory may

explain how rational agents, such as the hypothetical homo economicus, weight decisions, the

theory is frequently violated in real life and does not sufficiently explain actual behavior. In

expected utility theory, decisions are weighted exactly by their probability. In a gamble decided

by a coin flip, each choice will be weighted 50%. However, Kahneman and Tversky suggest that

human cognition involves a certainty effect, where “people overweight outcomes that are certain,

relative to outcomes that are merely probable.” Kahneman and Tversky supported their critique by asking research subjects to state their choices in response to several pairs of hypothetical gambles, such as the following (note that the monetary units are Israeli pounds): “Choose

between (A) an 80% chance of gaining 4000 or (B) a 100% chance of gaining 3000.” 80% of

participants chose the latter option. The substitution axiom of expected utility theory states that if

an individual prefers choice A over choice B, then he will prefer some probability p of choice A

over the same probability p of choice B. However, when the question posed above was

manipulated, and each probability divided by 4 so that the choices became either (A) a 20%

chance of gaining 4000 or (B) a 25% chance of gaining 3000, participants switched their

response, with 65% of participants choosing A. Repeated violations of the substitution axiom

illustrate this certainty effect; humans prefer the certain to the merely probable, yet when all choices have a low probability, humans prefer the higher monetary gain. The certainty effect implies that high-probability outcomes are underweighted, and conversely the possibility effect implies that low-probability outcomes are overweighted. Both principles are supported by empirical estimates of decision weights (Kahneman 2013).

Kahneman and Tversky also pointed out the reflection effect, referring to the fact that

when the researchers changed the questions by replacing gains with losses, the participants’

preferences switched. For example, when the question posed above was changed, and the choices

became either (A) an 80% chance of losing 4000 or (B) a 100% chance of losing 3000, 92% of

participants chose A. This effect was shown multiple times throughout the experiment.

Kahneman and Tversky used the certainty, possibility, and reflection effects to draw several

major conclusions which form the basis for prospect theory. Humans are strongly risk averse to

maintain gains, but strongly risk seeking to avoid losses. Moreover, the utility lost from a
decrease in assets is much greater than the utility gained from an equal increase in assets.

Ultimately, expected utility theory fails in that it only focuses on final states of wealth and

ignores whether an individual gained or lost to reach this final state. In reality, human decision

making revolves around gains and losses; “the carriers of value or utility are changes of wealth,

rather than final asset positions that include current wealth” (10). Kahneman and Tversky

propose that unlike the value function in expected value theory, which takes monetary assets as

an input, the value function for actual human behavior is “(i) defined on deviations from the

reference point; (ii) generally concave for gains and commonly convex for losses; (iii) steeper for

losses than for gains.” The result is an S-shaped curve around the reference point, concave above it and convex below it.
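A value function with these three properties is easy to sketch. The version below uses the parameter estimates (alpha = beta = 0.88, lambda = 2.25) from Tversky and Kahneman's later 1992 work on cumulative prospect theory; the numbers are used purely for illustration, not as part of the 1979 paper discussed here.

```python
# A sketch of a prospect theory value function: defined on changes relative
# to a reference point at 0, concave for gains, convex and steeper for losses.
# Parameters are the Tversky-Kahneman (1992) estimates, used illustratively.

def value(x, alpha=0.88, beta=0.88, lam=2.25):
    if x >= 0:
        return x ** alpha             # concave over gains
    return -lam * ((-x) ** beta)      # convex over losses, scaled up by loss aversion

print(value(100.0))   # ~57.5
print(value(-100.0))  # ~-129.4: the loss looms more than twice as large
```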

There is an obvious flaw in mathematical models of utility; they do not hold up in

situations where payoffs are not numerical and players take into account abstract concepts like

friendship, pride, or social customs. “Any relationship between value and utility only makes

sense if the pay-off is numerical – which usually means monetary” (Kelly). Pride especially is a

big factor in duels and games of attrition, where players remain in the game even when the cost

of playing has far exceeded the payoff from a chance of winning. “One cannot proceed to make
any inference about the empirical status of GT [game theory] without figuring out first what

individuals care about. If they care about fairness, for example, it is still entirely possible that

they are rational maximisers of utility – once the latter has been defined appropriately to capture

fairness considerations. The definition of the utility function as ranging exclusively over one’s

own monetary outcomes is unnecessarily restrictive and in principle can be relaxed to allow for

more varied (and interesting) individual preferences” (Guala 11).

Take the ultimatum game, for example. In this famous game, there are two players and a

fixed sum of money, or “pie”. Player A makes the first choice of how to divide the pie; he can

choose an amount to give to Player B, and the rest he can take for himself. Then, Player B has a

choice of either accepting or rejecting Player A’s offer. Obviously, when utility is measured in

terms of the amount of money, any rational player with the role of Player B would accept any

offer greater than 0 from Player A. Correspondingly, a rational player with the role of A would

offer the smallest possible amount to Player B to maximize his own utility. However, the game

works differently in real life. On average, 40% of the pie is offered to Player B. In a study of 75

different results taken from ultimatum game experiments, the percentage of the pie offered

ranged from 26% to 58%. Researchers have found that fairness is a priority for those receiving

the offer, who would rather gain nothing than be given pennies compared to Player A; in other

words, utility is a combination of different factors, including abstract values like fairness and

concrete measurements like monetary gain. At the same time, when the sum of money becomes bigger, most

receivers put their pride away in favor of the money, and rejection rates decrease substantially.
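One standard way to formalize a utility that mixes money with fairness is Fehr and Schmidt's (1999) inequity-aversion model, sketched below. Note that this model, its parameter values, and the $10 pie are my illustration, not something proposed in the ultimatum game studies cited here.

```python
# Fehr-Schmidt inequity aversion: utility is money minus penalties for
# disadvantageous (alpha) and advantageous (beta) inequality. The parameter
# values and the $10 pie are hypothetical.

def responder_utility(own, other, alpha=2.0, beta=0.5):
    return own - alpha * max(other - own, 0.0) - beta * max(own - other, 0.0)

pie = 10.0
print(responder_utility(1.0, pie - 1.0))  # offer $1: 1 - 2*8 = -15, so rejecting (utility 0) wins
print(responder_utility(4.0, pie - 4.0))  # offer $4: 4 - 2*2 = 0, roughly the point of indifference
```

With alpha = 2, this responder rejects any offer below 40% of the pie, which happens to sit close to the average offer observed experimentally.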

The ultimatum game is useful for another key reason; it is an easy way to highlight the

differences in decision making between populations and important cultural values within

communities. Research subjects in the United States and Yugoslavia offered much more money
to their partners than those in Israel did (Roth). In the 75-experiment meta-analysis conducted by

Oosterbeek, people in Asian countries had a much higher rejection rate than Americans. This

could be attributed to the dichotomy of collectivism -- common in the Asian countries -- versus

individualism which forms a cornerstone of American culture. However, Oosterbeek and his

colleagues found no significant correlation between a country’s experiment results and its degree

of individualism. Still, in countries with high respect for authority, players in the role of Player A offered a smaller percentage of the pie to their partners. Cultural values can have an enormous impact on a

nation’s economy; countries where strangers are not trusted, and business is conducted within

families or tribes, have lower GDPs than countries with more open societies (Sowell 966).

Simple game theoretical experiments like the ultimatum game can pinpoint the differences in

how various cultures and societies measure utility.

The Role of Choice and Strategic Moves

When Hernan Cortes landed on the coast of Mexico, he burned his own ships; this

decision was crucial, as it prevented his soldiers from turning back and left them with only one

choice. This “point of no return” commits players to one decision by removing all other

alternatives. Cortes was not the first to do so; the legendary commander Alexander the Great

and the Chinese general Sun Tzu both advocated for the strategy of burning the boats. Not only

military history, but history as a whole, has been driven by the presence (or absence) of alternate

choices and dominant strategies.

After the Black Plague, peasants in Western Europe had new mobility from a scarcity of

labor, which led to economic freedom. Their counterparts in Eastern Europe were still under

the tyrannical control of feudal masters even long after the Black Plague; the absence of alternatives

allowed the feudal system to remain stable. Before, the peasants were forced to work and were
unable to leave their land without the permission of the lord. But after the plague, the peasants

had new strategies with new payoffs and could even change certain existing payoffs. The

peasants could push for better conditions and higher wages or threaten to leave. By contrast,

peasants in the East were tied to the land and were forced to work. The elites in Eastern Europe

exploited the serfs by taking away any alternative strategies available to them. The serfs could not

even choose between any two actions; the only option was to work. It is this removal of

strategies which caused Eastern Europe to remain a feudal society for the next several centuries.

Similarly, Spanish colonists in Latin America removed alternate choices of workers by tying

them to the land in a system of debt peonage known as encomienda. In contrast, American

settlers had substantial freedom from the British government, because settlers had too many

alternatives to be controlled (Acemoglu). Throughout history, there is a pattern of the removal of

choice to control populations.

More generally, it is advantageous not only to remove an opponent’s choices if able to,

but also to remove one’s own choices. A player can do this through commitment, the strategy of

forcing oneself to choose a certain action regardless of the other players’ choices. By committing

to an action, a player is not just removing all alternative choices, but he is also forcing the

decision onto his opponent. Thus, commitment is often a useful tool in bargaining. When a

player is able to commit and his opponent is not, the player has an advantage, as his offer is final.

If both players are able to commit, each player has an incentive to commit to an action first. If an

object with a value of $10 is being sold and the buyer proposes a final offer of $7, the seller is

unable to make his own offer and his actions are restricted. Likewise, if the seller commits first

and makes a final offer of $12, the buyer is unable to negotiate and has only two choices: take it or leave it. Thus,

whichever player is the first to make a commitment has a strategic advantage.


While commitments are advantageous for Player 1, threats favor Player 2 and serve as a

countermeasure to commitments. A commitment removes a player’s alternatives, and a threat

does the same contingent on a specific action made by an opponent. The more likely that a player

would carry out a threat, the more credible the player is and the less likely that he or she will

actually have to fulfill the threat. Yet if the player has no incentive to fulfill the threat, then he

must convince his opponent that he would actually fulfill it. Thus, the reputation of a player is

extremely crucial in the responses of his opponents. Moreover, the player must effectively

communicate threats to the opponent. A hostage who only speaks a foreign language will be

unable to do the kidnapper’s bidding. Foreign troops brought in to put down a rebellion are

unable to communicate with the natives and thus immune to threats.

Strategic moves -- commitments, threats, and promises -- are frequently used by global

leaders in order to influence other nations. Examples of this abound within the history of U.S.

foreign policy, especially during the Cold War. The Truman Doctrine was an American

commitment to aid democratic countries threatened by Soviet aggression. The Formosa

Resolution was a threat designed to deter the Communists in China from invading the

Nationalists in Taiwan. Deterrence has been a cornerstone of United States policy not only in the

late twentieth century, but also in the early 1900’s in response to European expansionism and in

the twenty-first century in response to terrorist threats. In 2007, amid American operations in

Iraq and Afghanistan, Roger Myerson, a professor from the University of Chicago, wrote that “a

successful deterrent strategy requires a balance between resolve and restraint, and this balance

must be recognized and understood by our adversaries”. Myerson, expanding on Schelling’s

main ideas, argued that it is useless for the U.S. to only use the threat of military action as a

deterrent. Such a threat alone, like Bush’s threat to nations that sponsor terrorism, would not stop
enemies but encourage them further, as the enemies have no guarantee that cooperation would be

beneficial to them. Bombing a hostile country without presenting an option for peace only

increases the enemy’s conviction to defend itself. Building up one’s military convinces an enemy

that you will attack them, leading to the build up of their own military to defend against a

possible attack. “Retaliatory actions and threats that lack clearly defined limits can raise fears of

deep invasions and thus can motivate people on the other side to seek militant leadership that

may be better able to defend them” (15). Instead, the U.S. should combine the credible threat of

retaliation in response to aggression -- resolve -- with the credible promise of cooperation in

response to cooperation -- restraint. Myerson makes the point that whenever self-interest doesn’t

support either action, reputation is necessary to back it up. For example, if it is more profitable

for the United States to be aggressive to cooperative adversaries, and if these groups believe the

U.S. will be aggressive no matter what they do, a reputation for cooperation will convince

adversaries to cooperate. Likewise, in situations where it is believed that aggression would be

costly for the U.S., and that the U.S. will acquiesce to hostile enemies, the U.S. must rely on a

reputation of aggression to credibly threaten these enemies with punishment.

In one study, mathematician Mark Kilgour and political scientist Frank Zagare compared

deterrence under complete and incomplete information. In game theory, credibility is associated

with rationality; assuming all players are rational, a credible threat would only be one which is a

rational choice for a player. Thus, in a perfect world where all nations are aware of each other’s

payoffs, there are only three possible outcomes in a conflict between two nations. Either both

nations find it profitable to engage in conflict when challenged, only one does, or neither

does. These three cases are called “Prisoner’s Dilemma”, “Called Bluff”, and “Chicken”,

respectively. Mutual deterrence is only possible in the Prisoner’s Dilemma case, when both
nations have a credible threat to retaliate. In the Called Bluff case, the country with the credible

threat has an advantage, as its opponent will prefer to surrender when challenged rather than

retaliate. The last case, Chicken, is much more unpredictable, since neither country has a credible

threat and thus both will find it optimal to not cooperate, risking mutual destruction. However, in

the real world, neither nation is aware of the other’s payoffs, and neither knows whether the

other possesses a credible threat. The real world is characterized by “nuance, ambiguity,

equivocation, duplicity, and ultimately uncertainty” (312).

In Kilgour and Zagare’s model of deterrence under uncertainty, each nation has some

probability (pA, pB) of being a “Prisoner’s Dilemma” player, which prefers conflict to capitulation, and some probability (1 - pA, 1 - pB) of being a “Chicken” player, which prefers capitulation to

conflict. While each nation knows its own preferences and both pA and pB , neither nation knows

the other’s preferences. Thus, each nation may correctly or incorrectly guess whether the other

possesses a credible threat. This model has several major implications for sustaining deterrence.

Firstly, neither side needs to correctly assess what type of player the other is in order to bring

about a deterrence equilibrium. Suppose both players are “Chicken”, yet both believe the other is

a “Prisoner’s Dilemma” type; both players, preferring cooperation to capitulation, will cooperate.

“Accurate assessments of the strategic environment are neither necessary nor sufficient for the

success of mutual deterrence” (317). Secondly, there is some threshold of pA and pB such that

when both probabilities pass the threshold, both sides will cooperate regardless of what type of

player they are. To decrease the threshold so that the chance for deterrence is more likely, there

are several strategies available. One such strategy is to increase the payoffs associated with

mutual cooperation, or the status quo. The more satisfied a nation is with the existing state of

affairs, the less likely it is to engage in conflict. Another strategy is to decrease the payoffs
associated with mutual conflict in a “Prisoner’s Dilemma” situation. Even if a nation prefers

conflict to capitulation, the higher the cost of this conflict is, the greater the chance for peace.

Ultimately, the model gives mathematical support for policies that can help maintain peace in the

nuclear era. Kilgour and Zagare advise global powers to “never behave aggressively, but threaten

with high credibility a harsh retaliatory strike” (328).

There is an interesting paradox regarding the concept of commitment within game theory.

Just as in some situations it is rational for an agent to behave irrationally, in similar ones it is

strategically advantageous to be weak. The word “weak” refers not to a player’s physical or

mental strength, but rather the collapse of alternate possibilities which stems from the inability of

the player to make a choice. Schelling notes that “when a person has lost the ability to help

himself, or the power to avert mutual damage, the other interested party has no choice but to

assume the cost or responsibility” (37). A driver whose high speed prevents him from avoiding a

collision forces the other driver to assume sole responsibility for avoiding an accident. When two people are
dropped in some area at different locations and must find each other to escape, they will most

likely agree to meet up at a halfway point so that each one would have to walk some distance.

Yet, if one person was unable to communicate with the other, he would have an advantage. If he

knows that his partner knows that he is unable to communicate, then all he has to do is sit in the

same location, forcing his partner to come to him. By making the decision of denying oneself a

choice, agents play optimally. Relinquishing the initiative and imposing it on the opponent is

often the most effective strategy for a disadvantaged agent, which is why nonviolent acts like the

restaurant sit-ins during the Civil Rights Movement were so powerful.

In game theory, a strategy that is always better than a different one is known as a strictly

dominant strategy. A strategy s is strictly dominated if there is another strategy s’ where no

matter what action the other player chooses, the payoff from playing s’ will always be greater

than the payoff from playing s. A strategy that is always at least as good as a different one is

known as a weakly dominant strategy. A strategy s is weakly dominated if there is another

strategy s’ where no matter what action the other player chooses, the payoff from playing s’ will

always be at least as much as the payoff from playing s. The concept of dominant strategies is

especially useful in a game of complete information where each player knows the structure of the

game as well as the other players’ possible actions and payoffs. A rational player would never

play a dominated strategy, one that always has a better alternative, and therefore would

completely disregard any such strategy. Therefore, if player A knows that player B has a

dominated or a dominant strategy, he can easily use that to his advantage to maximize his own

payoff.

Iterated removal of strictly dominated strategies refers to the process of removing all

strategies that will never be played to decrease the number of outcomes in the game. Game
theorists use this concept to their advantage to break large games into smaller ones and locate

equilibrium points. Here is an example where each player has 3 actions. Player 1 can choose

between A, B, and C, while Player 2 has actions D, E, and F.

1/2 D E F

A (2,1) (4,0) (1,4)

B (3,4) (3,2) (2,3)

C (1,0) (2,5) (0,2)

At first glance, it seems like multiple outcomes are possible. However, if we look closely at the

payoffs, we can see that C is a strictly dominated strategy. Whether Player 2 plays D, E, or F

doesn’t matter, because Player 1 will always do better by switching from C to a different

strategy. This means that Player 1 will never choose C and we can eliminate C from the payoff

matrix.

1/2 D E F

A (2,1) (4,0) (1,4)

B (3,4) (3,2) (2,3)

Now we see that E is a strictly dominated strategy. We can repeat the process.

1/2 D F

A (2,1) (1,4)
B (3,4) (2,3)

A is a strictly dominated strategy.

1/2 D F

B (3,4) (2,3)

Now there is only one action left. Player 2 will choose a payoff of 4 from D over a payoff of 3

from F. By iterative removal, we have reduced a complicated game of 9 possible outcomes to a

simple subgame where only one outcome is possible: (B,D).

1/2 D

B (3,4)
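This elimination procedure is mechanical enough to automate. The sketch below (my own code, not from a cited source) runs iterated removal of strictly dominated strategies on the exact matrix from this example.

```python
# Iterated removal of strictly dominated strategies, run on the matrix above.

payoffs = {  # (row, col) -> (Player 1 payoff, Player 2 payoff)
    ('A', 'D'): (2, 1), ('A', 'E'): (4, 0), ('A', 'F'): (1, 4),
    ('B', 'D'): (3, 4), ('B', 'E'): (3, 2), ('B', 'F'): (2, 3),
    ('C', 'D'): (1, 0), ('C', 'E'): (2, 5), ('C', 'F'): (0, 2),
}
rows, cols = ['A', 'B', 'C'], ['D', 'E', 'F']

def dominated(s, own, others, player):
    """True if some alternative beats s against every remaining opposing choice."""
    def pay(mine, theirs):
        return payoffs[(mine, theirs)][0] if player == 0 else payoffs[(theirs, mine)][1]
    return any(all(pay(alt, o) > pay(s, o) for o in others)
               for alt in own if alt != s)

changed = True
while changed:
    changed = False
    for s in list(rows):
        if dominated(s, rows, cols, player=0):
            rows.remove(s); changed = True
    for s in list(cols):
        if dominated(s, cols, rows, player=1):
            cols.remove(s); changed = True

print(rows, cols)  # ['B'] ['D']: the game reduces to the single outcome (B, D)
```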

In real situations, knowing the behavior of our opponents and which strategies are

dominated (which choices they are least likely to make) can narrow down our viewpoint from a

variety of outcomes to only a few possible scenarios. The Battle of the Bismarck Sea was a

World War 2 battle between the Allies (the U.S. and Australia) and Japan. The Japanese were

sending a convoy on a three-day journey from Rabaul, New Britain, to Lae, New Guinea. The

Japanese could choose to send the convoy along the north of New Britain or along the south of

New Britain. Likewise, the Allies could choose to fly air reconnaissance units to the north or to
the south. At the time, there was heavy rain in the north which would decrease visibility for

reconnaissance units. Thus, there were four different possible outcomes:

● If both sides chose north, low visibility would prevent the Allies from spotting the

convoy on the 1st day, leaving only 2 days of bombing.

● If both sides chose south, the Allies would spot the convoy immediately and get all 3

days for bombing.

● If the Allies chose north and the Japanese chose south, lack of reconnaissance would

prevent the Allies from spotting the convoy on the 1st day, leaving only 2 days of

bombing.

● If the Allies chose south and the Japanese chose north, lack of reconnaissance and low

visibility from the rain would prevent the Allies from spotting the convoy on the first 2

days, leaving only one day of bombing.

Allies/Japanese North South

North (2,-2) (2,-2)

South (1,-1) (3,-3)

The northern route is a weakly dominant strategy for the Japanese: from a purely tactical standpoint, it could never expose them to more bombing than the southern route, and if the Allies searched south it would spare them two days of it. However, the Americans knew this and accordingly placed reconnaissance along the

northern route, guaranteeing 2 days of bombing. (North, North) is a Nash Equilibrium and is also

how the battle turned out historically. By identifying and eliminating the Japanese side’s dominated strategy, the Allies could predict the convoy’s route and win the battle decisively.
Mixed Strategies and Probability

In certain games, it is not advisable to select one action and play it consistently; this type

of behavior is called a pure strategy. Penalty kicks in soccer illustrate a perfect example of this

concept. If the kicker kept kicking left, the goalie would figure this out and move to the kicker’s

left. The kicker, seeing the goalie’s behavior, now changes his pure strategy to kick right, but the

goalie recognizes this and starts moving to the kicker’s right, and so the cycle continues on and

on.

A far better strategy for such situations is a mixed strategy, where an agent alternates

between two or more actions with certain probabilities. In a game of heads or tails, a player

chooses each action with a ½ probability. In a game of rock paper scissors, it would be a ⅓

probability. These probabilities are optimal; if, in the latter example, a player played rock with a

greater probability than the other two options, the opponent could easily exploit this by playing

paper more often. “The essence of randomization in a two-person zero-sum game is to preclude

the adversary’s gaining intelligence about one’s own mode of play -- to prevent his deductive

anticipation about how one may make up one’s own mind, and to protect oneself from tell-tale

regularities of behavior that an adversary might discern or from inadvertent bias in one’s choice

that an adversary might anticipate” (Schelling 175). By randomizing one’s strategy, a player

protects himself by minimizing the maximum possible loss he could receive.

In one study, Ignacio Palacios-Huerta recorded the statistics of more than 1000 different

professional soccer games to determine how well players make use of the minimax strategy. The

strategy rests on the idea that a player seeks to make the opponent indifferent by making his

expected value from both actions the same. Equalizing these values minimizes the opponent’s

maximum possible gain, hence the name “minimax” strategy. Where L denotes a kicker’s nonnatural side and R denotes his natural side, the chances of scoring can be seen in the payoff

matrix below:

Kicker/Goalie GL GR

KL 0.5830 0.9497

KR 0.9291 0.6992

Using these numbers, we can mathematically derive optimal probabilities in accordance

with the indifference theorem.

For the goalie to make the kicker indifferent:

0.583GL + 0.9497(1 - GL) = 0.9291GL + 0.6992(1 - GL)

GL = 0.4199 ; GR = 0.5801

For the kicker to make the goalie indifferent:

(1 - 0.583)KL + (1 - 0.9291)(1 - KL) = (1 - 0.9497)KL + (1 - 0.6992)(1 - KL)

KL = 0.3854 ; KR = 0.6146
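Each indifference condition is linear in a single unknown, so it can be solved in closed form. The short sketch below (my own check, not part of the study) reproduces the equilibrium probabilities derived above.

```python
# Solving the two indifference equations above in closed form.
# score[(kick, dive)] is the scoring probability from the payoff matrix.

score = {('L', 'L'): 0.5830, ('L', 'R'): 0.9497,
         ('R', 'L'): 0.9291, ('R', 'R'): 0.6992}

denom = (score[('L', 'L')] - score[('L', 'R')]
         - score[('R', 'L')] + score[('R', 'R')])

# Goalie mixes so the kicker scores equally often kicking L or R:
gl = (score[('R', 'R')] - score[('L', 'R')]) / denom
# Kicker mixes so the goalie concedes equally often diving L or R:
kl = (score[('R', 'R')] - score[('R', 'L')]) / denom

print(round(gl, 4), round(kl, 4))  # 0.4199 0.3854, matching the derivation
```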

To play optimally, the kicker should kick to his nonnatural side about 39% of the time and to his

natural side about 61%, while the goalie should move to the kicker’s nonnatural side 42% of the

time and his natural side 58%. The real values that the researchers found, rounded to the nearest

whole number, were 40%, 60%, 42%, and 58% respectively. It is almost as if the players

intuitively knew the probabilities required to play rationally. This finding is just another example

of the powerful real world applications of game theory and how it can be used as an accurate

model for behavior.

To better understand the concept of minimax and the indifference theorem, one can represent it graphically by plotting the kicker’s utility as a function of GL, the probability that the goalie moves to the kicker’s nonnatural side. One line shows the kicker’s utility when shooting towards his natural side, and a second line his utility when shooting towards his nonnatural side. Let’s call the

kicker’s utility at optimal GL (GL = 0.4199) UK’. When the goalie moves to the kicker’s

nonnatural side with a probability less than optimal, the kicker can do better than UK’ by

shooting to his nonnatural side. Conversely, when the goalie moves to the kicker’s nonnatural

side with a probability greater than optimal, the kicker can do better than UK’ by shooting to his

natural side. Only at the optimal probability, where the two lines meet, does the goalie guarantee

that the kicker can do no better than UK’ no matter what direction he shoots. UK’ is the point at

which the kicker’s maximum possible gain is at a minimum. A corresponding plot of the goalie’s utility as a function of KL would show the exact same thing, with the goalie’s two lines of utility intersecting at KL = 0.3854.

The benefits of randomization do not only apply to zero-sum games, but also to strategic

moves like threats and promises in a more complex mixed-motive game. Schelling notes that

randomization is a way to make indivisible objects divisible and to scale down both threats and
promises. If the only threat available is something massive, like dropping a nuclear bomb, a

player does not want to threaten with certainty because there is a higher chance of failure

associated with bigger threats. Moreover, between nations, extreme threats force the enemy to choose between two extremes, which is not what a country wants to do if an enemy has only

committed a small act of aggression. One way a player can scale down a massive threat is

through randomization. Instead of threatening to drop a nuclear bomb with 100% certainty, a

country can threaten to drop it 25% of the time, 50%, or any other optimal probability the

situation calls for. Using a “fractional” threat, a threat carried out with some probability less than

1, is beneficial when there is some chance that the threat will fail.

We can illustrate this concept using an example from Schelling:

1/2 C D

A (1,0) (0,1)

B (0,0) (-X,-Y)

Because Player 1 wants the outcome (A, C), he will threaten to play B with some probability p

such that Player 2 is induced to select C instead. But what if there is some probability P that the

threat fails? Because in this situation, like many others in real life, the cost of a threat failing is

too high, Player 1 will want to pursue a fractional threat. An effective threat will satisfy 2

requirements:

1. Player 2 actually has an incentive to give in to the threat; Player 2’s utility from ignoring

the threat must be lower than Player 2’s utility from capitulating. Thus, there is a lower

bound to p such that:


1(1 - p) - Y(p) < 0 or

p > 1/(1+Y)

2. Player 1’s expected value from a threat with success rate (1 - P) must be greater than his

expected value from not making the threat at all. Thus, there is an upper bound to p such

that:

(1 - P)(1) + (P)(0(1 - p) - X(p)) > 0 or

p < (1 - P)/(PX)

We now have calculated the optimal range of probability that Player 1 can threaten with:

1/(1+Y) < p < (1 - P)/(PX)
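To see the bounds in action, here is a quick numeric sketch with hypothetical values: X = 2, Y = 3, and a 50% chance that the threat fails.

```python
# The fractional-threat bounds above, evaluated for illustrative values.

X, Y, P = 2.0, 3.0, 0.5      # punishment costs and probability the threat fails

lower = 1.0 / (1.0 + Y)      # Player 2 must prefer capitulating: p > 1/(1+Y)
upper = (1.0 - P) / (P * X)  # Player 1 must prefer threatening: p < (1-P)/(PX)

print(lower, upper)  # 0.25 0.5: any p between 0.25 and 0.5 works, but p = 1 does not
```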

Note that in this case, a threat with 100% certainty is not optimal because Player 1 would be

better off not using a threat at all. Moreover, Player 1 has incentive to threaten with as little

probability as possible, as the lower p is the higher his expected gain. Therefore, certain

situations, especially those with a high chance of failure, call for fractional threats as an optimal

solution. Fractional threats might have been a powerful strategy had either the United States or the

Soviet Union used them during the Cold War. Because of the United States’ adoption of a policy

of massive retaliation, it was unable to deter Soviet aggression in Eastern Europe. One striking

example was the Soviet invasion of Hungary. The U.S. threatened massive retaliation, but this

approach failed as the threat was much too big for the situation at hand. The Soviets did not find

the threat credible enough for something in Eastern Europe, far from American strategic

interests. Had the U.S. followed Schelling’s advice and committed to a fractional threat, it might

have been much more successful in deterring a Soviet invasion.

Equilibrium, Convergence, and Repeated Games


Equilibrium is a term used in many different fields like physics, chemistry, and

economics. Within game theory, there is the famous Nash equilibrium proposed by John Nash.

A Nash equilibrium is simply an outcome of the game in which, given the strategies chosen by all the other players, no player has an incentive to deviate from his own strategy. The

equilibrium is stable because the payoff structures of the game incentivize it. Furthermore, we

can see that in many different games, players alter their strategies until the equilibrium outcome

is achieved. This “convergence” to equilibrium can be seen in many different contexts, from

competitive markets (Smith) to cooperation experiments (Selten).

One famous example is Cournot competition. Cournot was a French economist who

studied competition between rival firms in the 19th century. In Cournot’s example, two firms

with the exact same product are unable to set their own prices. Prices are set by the market and

are a function of the total quantity of the good produced. Therefore, the two firms compete by

choosing different quantities of production in response to the other’s quantity. Price is given by

the following equation, where a and b are two positive parameters:

P = a - b(q1 + q2)

The firms’ profits can be calculated by subtracting the cost, c, from revenue. It is assumed

marginal cost is constant, and it costs both firms the same amount to produce the good.

W1 = q1(a - b(q1 + q2)) - q1c

W2 = q2(a - b(q1 + q2)) - q2c

We can differentiate each profit function with respect to that firm’s quantity and set the derivative to 0 to find the profit-maximizing quantity for each firm.

W1’ = a - 2bq1 - bq2 - c; q1 = ((a-c)/2b) - (q2/2)

W2’ = a - bq1 - 2bq2 - c; q2 = ((a-c)/2b) - (q1/2)


Each firm’s quantity is simply a function of the other firm’s quantity, so we can use these

equations to graph the model.
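The best-response functions also show convergence directly. Below is a sketch of best-response dynamics under illustrative parameters (a = 100, b = 1, c = 10, my choices rather than Cournot's), in which each firm repeatedly best-responds to the other's latest quantity.

```python
# Best-response dynamics in the Cournot model: each firm plays
# q_i = (a - c)/(2b) - q_j/2 against the other's most recent quantity.
# Parameter values are illustrative.

a, b, c = 100.0, 1.0, 10.0
q1 = q2 = 0.0
for _ in range(50):
    q1 = (a - c) / (2 * b) - q2 / 2
    q2 = (a - c) / (2 * b) - q1 / 2

print(q1, q2, (a - c) / (3 * b))  # both quantities converge to the Cournot level, 30
```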

The Cournot model has several downsides. Strict requirements, like a homogeneous good

and the same marginal cost between firms, limit the model’s practical applications. Cournot

assumed that both firms were acting simultaneously to decide quantities, yet competition

between firms rarely involves such clear cut choices. Nonetheless, Cournot’s proposal has

important implications for the field of game theory. If both firms choose actions that maximize

their profits — their “best responses” — eventually the two firms will converge at a single state

of production. This phenomenon of convergence happens not only with quantity, but also with

price. The French mathematician Joseph Bertrand later proposed that if two rival firms produced the exact

same good with infinite price-elasticity, each firm would undercut the other until prices

converged to marginal cost.


German economist Heinrich von Stackelberg highlighted the power of commitment by

changing Cournot’s model from a simultaneous game, where both players choose quantity at the

same time, to a sequential game, where Player 1, the leader, chooses a quantity and Player 2, the

follower, responds. In such a situation, the leader knows that for any quantity he chooses, the

follower will respond accordingly using the best response function:

W2’ = a - bq1 - 2bq2 - c; q2 = ((a-c)/2b) - (q1/2)

The leader simply substitutes this q2 into his own profit maximizing function:

W1 = q1(a - b(q1 + q2)) - q1c

W1 = aq1 - bq1² - bq1q2 - q1c

W1 = aq1 - bq1² - bq1[((a-c)/2b) - (q1/2)] - q1c

W1’ = ((a - c )/2) - bq1

q1 = (a-c)/2b ; q2 = (a-c)/4b
Note that Cournot’s equilibrium has both firms producing at q = (a-c)/3b. Thus, when the

game changes from simultaneous to sequential, the leader produces more and the follower

produces less. Because the leader is able to commit first, he reaps higher profits by producing

more of the good before the follower has a chance to produce. The follower’s best option is to

produce less of the good, resulting in a first-mover advantage and an uneven share of the market.
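Continuing the illustrative parameters from the Cournot sketch above (a = 100, b = 1, c = 10), the first-mover advantage is easy to quantify:

```python
# Stackelberg versus Cournot quantities under the same illustrative parameters.

a, b, c = 100.0, 1.0, 10.0

leader = (a - c) / (2 * b)    # 45: the committed first mover
follower = (a - c) / (4 * b)  # 22.5: the best response to the leader
cournot = (a - c) / (3 * b)   # 30 each in the simultaneous game

print(leader, follower, cournot)
```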

Of course, in line with Schelling’s doctrine of strategic moves, the follower can counter this

commitment advantage by threatening to produce an excessive amount such that the good will

sell at dirt-cheap prices, hurting both the leader and the follower. As previously mentioned, this

strategy is what oil hegemon Saudi Arabia used to deter other countries from increasing their oil

production.
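Plugging the derived quantities back into the profit functions makes the first-mover advantage concrete. A quick check in Python, using the same illustrative parameter values as the sketch above:

    a, b, c = 100.0, 1.0, 10.0

    def profit(q_own, q_other):
        # W = q(a - b(q1 + q2)) - qc, the profit function from above
        return q_own * (a - b * (q_own + q_other)) - q_own * c

    q_cournot = (a - c) / (3 * b)   # 30.0: each firm in the simultaneous game
    q_leader = (a - c) / (2 * b)    # 45.0: Stackelberg leader
    q_follower = (a - c) / (4 * b)  # 22.5: Stackelberg follower

    print(profit(q_cournot, q_cournot))   # 900.0 per Cournot firm
    print(profit(q_leader, q_follower))   # 1012.5 for the leader
    print(profit(q_follower, q_leader))   # 506.25 for the follower

The leader’s commitment raises his profit above the Cournot level while pushing the follower’s below it, which is exactly the uneven split described above.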

Convergence to equilibrium can be demonstrated more practically using repeated games.

Experimental research regarding repeated games has provided valuable insights into how

humans learn and change their choices over time. In fact, numerous studies show that over time,

agents behave more and more optimally, with their behavior converging to the equilibrium point.
One powerful tool that applies to repeated games is the Folk

Theorem, which states that “the set of Nash equilibrium outcomes of

the repeated game G∗ is precisely the set of feasible and

individually rational outcomes of the one-shot game G” (Hart 4).

Basically, an equilibrium in a repeated game will be any outcome that (1) can be obtained from some combination of actions in the one-shot game G and (2) gives each player a payoff greater than or equal to his minmax payoff. The minmax payoff is the best payoff a player can guarantee himself when the other players are trying to hold his payoff down; regardless of the other players’ decisions, a player can always secure at least this amount. If an outcome is not individually rational for a player, he is receiving a payoff less than his minmax payoff and thus has an incentive to deviate. This concept sets the stage for

equilibrium in repeated games. If there is an outcome a that is both feasible

and individually rational, then all players agree to play a combination of actions that will result

in a. If any player deviates from this plan, another player can threaten to switch to a minimax

strategy, reducing the first player’s payoffs from a to r, r being the highest payoff the first player

can get no matter what the other players do. In this way, any feasible and individually rational outcome a will be an

equilibrium outcome in the repeated game due to the threat of

punishment. The Folk Theorem is important because it “relates

cooperative behavior in the game G to non-cooperative behavior in

its supergame G∗” (Aumann). It provides a rational explanation to


cooperative and altruistic actions, which are normally considered

“irrational”; through cooperation, players are actually increasing

their individual payoffs in repeated games. Ultimately, the Folk

Theorem shows us that it is rational to sacrifice short term payoffs

for long term gains. However, real humans do not value future

payoffs as much as they value payoffs in the present. This disparity

is shown by the discount factor, a coefficient between 0 and 1. If the

discount factor is 0.9, the payoff in a certain period will be worth 90%

of the same payoff received in the previous period. Mathematicians

have proven that there exists some threshold, such that when the

discount factor falls below this threshold, the Folk Theorem’s

repeated game equilibrium will break down since players do not care

enough about future payoffs relative to present payoffs.
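The threshold can be made concrete with a simple calculation. The sketch below uses a repeated prisoner’s dilemma with assumed payoffs (R for mutual cooperation, T for the one-shot temptation to defect, P for the mutual-defection punishment phase) and a grim-trigger punishment; all of the numbers are illustrative.

    # Critical discount factor for sustaining cooperation under grim trigger.
    # Payoffs are illustrative assumptions: R = reward, T = temptation, P = punishment.
    R, T, P = 3.0, 5.0, 1.0

    def cooperation_sustainable(delta):
        # Cooperating forever (R every period) must beat defecting once (T)
        # and then being punished forever (P every period afterward).
        return R / (1 - delta) >= T + delta * P / (1 - delta)

    threshold = (T - R) / (T - P)       # the same condition, solved for delta
    print(threshold)                    # 0.5 with these payoffs
    print(cooperation_sustainable(0.6)) # True: patient players sustain cooperation
    print(cooperation_sustainable(0.4)) # False: the equilibrium breaks down

Below the threshold, the one-time gain from defecting outweighs the discounted stream of future cooperation, and the repeated-game equilibrium unravels just as described above.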

How does game theory explain learning -- the way rational agents change their behavior

over time as a result of experience? Economics frequently relates to the other social sciences, and

accordingly the two major models of learning in game theory are each connected to a different

discipline. Fictitious play is the first major model, and it is connected to statistics. Fictitious play

states that players learn by tracking the observed frequencies with which their opponents play strategies and best responding to those beliefs. For example, suppose I am playing the third round of Rock, Paper,

Scissors with a friend. In the previous two rounds, I observed my friend play Rock, so I now

believe that he is playing Rock with 100% probability. I accordingly choose Paper, while my

friend chooses Scissors. Now, my observations show that my friend has played Rock twice and

Scissors once, so based on this distribution, my expected utility from any action is 0.66R +
0.33S, where R represents some payoff I receive when my friend plays Rock, and S is the payoff

I receive when my friend plays Scissors. This utility formula is maximized by Paper, so I choose

Paper, while my friend chooses Scissors. Since my friend has played Rock twice and Scissors

twice, I update my own utility formula to 0.5R + 0.5S and choose the action which maximizes

this -- in this case, I am indifferent between Paper and Rock. Simultaneously, my friend is

tracking my actions and updating his own utility formula. This process continues until the end of

the game, or in the case of an infinite game, forever. When players use fictitious play, they are

constantly updating their beliefs about the opponent based on past observations. Players believe

their opponent’s distribution of past plays represents their strategy. According to fictitious play,

if a goalie observes that the kicker has kicked left 3 times and right 2 times in the past, the goalie

believes the kicker is playing with the strategy of 60% Left and 40% Right.
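The updating process is mechanical enough to simulate. Here is a minimal Python sketch of fictitious play in Rock-Paper-Scissors; it assumes the standard zero-sum payoffs win = 1, tie = 0, lose = -1 (the example above instead pays 0 for ties and losses) and a one-observation prior for each action, both assumptions of this sketch.

    # Fictitious play in Rock-Paper-Scissors: each player best responds to the
    # empirical frequencies of the opponent's past actions.
    ACTIONS = ["R", "P", "S"]
    BEATS = {"R": "S", "P": "R", "S": "P"}  # key beats value

    def payoff(mine, theirs):
        if mine == theirs:
            return 0
        return 1 if BEATS[mine] == theirs else -1

    def best_response(opponent_counts):
        total = sum(opponent_counts.values())
        def expected(action):
            return sum(payoff(action, o) * n / total
                       for o, n in opponent_counts.items())
        return max(ACTIONS, key=expected)

    counts1 = {a: 1 for a in ACTIONS}  # history of player 1's actions, with a prior
    counts2 = {a: 1 for a in ACTIONS}  # history of player 2's actions, with a prior
    for _ in range(999):
        a1, a2 = best_response(counts2), best_response(counts1)
        counts1[a1] += 1
        counts2[a2] += 1

    print(counts1, counts2)  # the empirical frequencies each player has recorded

With these zero-sum payoffs the long-run frequencies approach the ⅓-⅓-⅓ equilibrium mix, yet the actual sequence of play keeps cycling through the actions, hinting at the problem discussed next.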

While fictitious play may seem reasonable, it has serious problems. Past observations

may not provide adequate information with which to judge an opponent’s strategy. Furthermore,

because players are hopping between different combinations of outcomes and not actually

judging the success of their actions, fictitious play often leads to cycles of outcomes that never

converge to equilibrium. In fact, American mathematician Lloyd Shapley proved that if both players use fictitious play in a version of Rock Paper Scissors with modified payoffs, play cycles forever and never converges to the equilibrium outcome of each action being played with ⅓ probability.

The second major model of game theoretic learning is called reinforcement learning,

which is based on the behavioral psychology pioneered by B. F. Skinner and John B. Watson.

Edward Thorndike’s law of effect states that actions that are rewarded will be more likely to be repeated,

while actions that are punished will be less likely to be repeated. In a similar vein, game theory’s

reinforcement learning proposes that players will be more likely to choose actions that have been
successful in the past and less likely to choose actions that have been unsuccessful in the past.

Economists Alvin Roth and Ido Erev modeled this concept with an updating rule that, in its basic form, can be written as follows: if player n plays strategy j in period t and receives payoff x, then

qnj(t+1) = qnj(t) + R(x)

where qnj(t) is the propensity of player n to play strategy j during period t, and R(x) is the reinforcement function x - xmin -- the difference between the player’s payoff and his smallest

possible payoff. In plain terms, this equation states that if a player played some strategy in the

previous period, his propensity to play it again will increase based on its success. Note that in

Roth and Erev’s original model, R(x) is never negative, so the player is not punished -- that is, he

will never lose the propensity to play a certain strategy. Instead, the player starts out with equal

propensities for all available strategies, and each propensity will either increase or stay the same

based on experience. However, different models of reinforcement have negative reinforcement

functions that punish players.
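A minimal sketch of this propensity-based process in Python, using a hypothetical two-action game (the action names, payoffs, and starting propensities are assumptions for illustration, not from Roth and Erev’s experiments):

    import random

    def choose(propensities):
        # Play each strategy with probability proportional to its propensity.
        strategies = list(propensities)
        weights = [propensities[s] for s in strategies]
        return random.choices(strategies, weights=weights)[0]

    def reinforce(propensities, played, payoff, min_payoff=0.0):
        # Basic Roth-Erev update: R(x) = x - x_min, which is never negative.
        propensities[played] += payoff - min_payoff

    propensities = {"safe": 1.0, "risky": 1.0}  # equal starting propensities
    for _ in range(500):
        s = choose(propensities)
        x = 1.0 if s == "safe" else random.choice([3.0, 0.0])  # risky pays 3 or 0
        reinforce(propensities, s, x)

    print(propensities)  # propensities only grow; better-paying actions grow faster

Because the risky action pays more on average here, its propensity, and hence its probability of being chosen, grows faster over time, mirroring the law of effect.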

Both fictitious play and reinforcement learning have issues, yet they both model real life

decision making remarkably well. When Roth and Erev ran simulations based on both learning

models for a variety of experimental games, the simulations were much more accurate in

predicting outcomes than standard mixed strategy equilibrium predictions. Quite simply, while

the concept of equilibrium describes what players should do, learning models predict what

players will do. Because most real world situations will be of incomplete information, where

players do not know the payoff structure of the game, it will be difficult for them to deduce the

equilibrium strategy. “The justification of a Nash Equilibrium requires the existence of a

commonly known prior distribution of the uncertain parameters in the game” (Kalai 4). If players
do not know each other’s payoffs, reinforcement learning models are especially applicable in

place of equilibrium predictions.

In a famous 1991 study of equilibrium convergence, Roth and his colleagues had research

subjects play consecutive rounds of 2 different games: the ultimatum game and the market game.

The ultimatum game, discussed above, is also known as the bargaining game and involves 2

players sharing a sum of money. The market game is described as follows:

“Multiple buyers (nine in most sessions) each submit an offer to a single seller to buy an

indivisible object worth the same amount to each buyer (and nothing to the seller). The seller has

the opportunity to accept or reject the highest price offered. If the seller accepts, then the seller

earns the highest price offered, the buyer who made the highest offer (or, in case of ties, a buyer

selected by lottery from among those who made the highest offer) receives the difference

between the object’s value and the price he offered, and all other buyers receive zero. If the seller

rejects, then all players receive zero.” (Roth et al. 1991)

Note that in both environments, the equilibrium point involves an unequal distribution of

money, with one player earning the majority of the sum. The bargaining game involves the

concept of subgame perfect equilibrium. A subgame is simply a smaller part of a bigger game,

and the requirement of subgame perfect equilibrium means that any equilibrium for the entire

game would also have to be an equilibrium for its subgames. In Roth’s experiment, the sum of

money is 10 dollars, and the smallest unit that can be offered is 5 cents. We can see that in any

subgame with some money offered, Player 2 would behave optimally by accepting. In other

words, Player 2 would never go through with a “threat” to reject a low offer because such an

action would not maximize his utility. Knowing this, it is in Player 1’s best interest to offer

Player 2 the lowest possible amount: in this case, 5 cents. It is safer for Player 1 to offer Player 2 at least some money rather than nothing at all; Player 2 is indifferent between accepting and rejecting an offer of 0, and Player 1 would rather take a definite $9.95 than a chance at $10. Thus, there is a subgame perfect equilibrium when

Player 1 chooses the lowest possible offer and Player 2 chooses to accept. There is also an

equilibrium when Player 1 chooses to offer nothing and Player 2 still accepts. Roth notes that

“these two equilibria become one as the smallest unit of transaction goes to 0.”
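This backward-induction argument can be written out directly. The sketch below assumes, as in the reasoning above, that Player 2 accepts any strictly positive offer and rejects an offer of nothing:

    # Backward induction in the $10 ultimatum game with 5-cent increments.
    PIE = 1000   # ten dollars, in cents
    UNIT = 5     # smallest unit that can be offered

    def responder_accepts(offer):
        # Rejecting pays 0, so any positive offer is weakly better to accept;
        # at an offer of 0 we assume the responder rejects.
        return offer > 0

    best_offer, best_payoff = 0, -1
    for offer in range(0, PIE + 1, UNIT):
        proposer_payoff = PIE - offer if responder_accepts(offer) else 0
        if proposer_payoff > best_payoff:
            best_offer, best_payoff = offer, proposer_payoff

    print(best_offer, best_payoff)  # 5 and 995: offer a nickel, keep $9.95

As Roth notes, shrinking the smallest unit toward 0 pushes the optimal offer toward 0 as well, collapsing the two equilibria into one.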

Similarly, in the market environment, the requirement of subgame perfect equilibrium

means that a seller would never reject a positive offer. Because there are so many buyers, there are

many different equilibria, but all equilibria involve a selling price of either $9.95 or $10.00, as

any other price would mean that a buyer could do better by bidding more. If both games are

played by rational agents, the majority of the money will always go to one player.

When played in real life by real people across the world (U.S., Japan, Slovenia, and

Israel), the results of these games were startling. In the market game, transactions quickly

converged to equilibrium; a price of either $9.95 or $10.00 was always achieved between round

3 and round 7. The great differences in distribution between countries in early rounds became

smaller and smaller as all 4 countries reached an equilibrium price by later rounds. By contrast,
in the bargaining game, offers usually ranged from 30% to 60% of the 10 dollars, far from the equilibrium offer of 5 cents. Instead of converging to the equilibrium, offers converged to 40% or 50% of the pie, and differences between countries increased from round to round.

Roth’s experiment is just one of the hundreds conducted in hopes of finding new insights

into how humans make decisions. Experiments like these pose important questions for

understanding human nature. Why were the outcomes of the bargaining and market

environments so different? More specifically, when do people prioritize fairness over self-interest, and when the reverse? Can fairness be considered a rational motive, or does

rationality only constitute self-interest? Economists Ernst Fehr and Klaus Schmidt attempted to

answer these questions in their model of fairness. According to the model, certain individuals are

“inequity averse”, meaning that they are dissatisfied with outcomes perceived as inequitable.

Inequity aversion stems from the psychological tendency to make relative comparisons. Humans are

inclined to compare themselves to others; the saying, “the grass is always greener on the other

side”, describes this inclination that is a fundamental part of our thought process. Even if a man

has a substantial amount of money, he will become unhappy upon seeing someone richer. Thus,

Fehr and Schmidt theorize that individuals lose utility in the case of an inequitable outcome.

Inequity averse individuals are not only dissatisfied when they are worse off than someone, but

also when they are better off. However, the case of being worse off has a greater impact on

utility than the case of being better off. This asymmetry draws heavily on the loss aversion that constitutes Kahneman and Tversky’s prospect theory.
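For two players, Fehr and Schmidt’s utility function can be written as a simple formula, sketched below in Python, with alpha weighting the pain of being worse off and beta the smaller pain of being better off; the numerical parameter values in the example are illustrative assumptions.

    def fehr_schmidt_utility(x_i, x_j, alpha, beta):
        # U_i = x_i - alpha * max(x_j - x_i, 0) - beta * max(x_i - x_j, 0),
        # where the model assumes beta <= alpha and 0 <= beta < 1.
        envy = max(x_j - x_i, 0.0)   # disadvantageous inequity
        guilt = max(x_i - x_j, 0.0)  # advantageous inequity
        return x_i - alpha * envy - beta * guilt

    # Receiving $2 of a $10 pie, with alpha = 1 and beta = 0.5 (illustrative):
    print(fehr_schmidt_utility(2.0, 8.0, alpha=1.0, beta=0.5))  # 2 - 6 = -4.0

With these values, a sufficiently lopsided split leaves the disadvantaged player with negative utility, which is why an inequity averse responder may rationally reject a low ultimatum offer.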

Fehr and Schmidt explain that the difference between the market and bargaining environments is

caused by competition. They propose that inequity aversion is made irrelevant by competition.

Because any inequity averse individual in the market environment is unable to enforce equity,
the consideration of fairness does not matter. For example, even if one buyer wanted a fair

outcome of the buyer and seller getting equal amounts, any other buyer willing to offer a higher

amount immediately makes this desire for equity irrelevant. However, when equity is actually

enforceable, inequity averse agents can have a substantial impact on outcomes. For example,

workers’ unions frequently force wages above their competitive level. In

game theoretic experiments, free riding -- a player’s enjoyment of a public good without any

contribution of his own -- came to an end after inequity averse players gained the ability to

punish those who exhibited greater selfishness.

Conclusion

In this paper, I explored several major themes that play a large role in game theory

analysis, such as rationality, reputation, and equilibrium convergence. I also examined differing

perspectives on topics like utility, decision weighting, and learning within game theory. Despite

the large amount of research conducted, some scholars do not see any usefulness in the

discipline. In a scathing critique, Lars Pålsson Syll, a professor at Malmö University, writes,

“Reductionist and atomistic models of social interaction – such as the ones mainstream

economics and game theory are founded on – will never deliver sustainable building blocks for a
realist and relevant social science. That is also the reason why game theory never will be

anything but a footnote in the history of social science” (18). To some extent, Syll is correct. Not

every situation can be explained with mathematical proofs. Not every interaction can be divided

into smaller components — players, payoffs, and actions. Game theory is much too rigid in its

requirements; its assumption of rationality often fails to carry over to the real world, as shown by

the field of experimental economics. Game theorists frequently limit their analyses to unrealistic

situations, such as those of complete information: “The reason that traditional game theory

focuses so much attention on the special case when players have complete information about

these things is that equilibrium predictions are easier to motivate and derive in the complete

information case, and often have little empirical content in the incomplete information case”

(Erev and Roth 29). Social scientists must find a way to reconcile theory — precise, predictable,

analytical — with real life, which is clouded with uncertainty, irrationality, and instability.

Nonetheless, I believe game theory has positively impacted the world of economics and society

as a whole. Game theoretic analyses have supported criminal system reform, monopoly

legislation, and deterrence policies, and have contributed to vast improvements in the medical

field. Moreover, the field has frequently intersected with other disciplines, such as computer

science. Game theory’s strength lies in its versatility, and while it may not provide all the

solutions, it is bound to see further advances in the future.

Works Cited

Acemoglu, Daron, and James A. Robinson. Why Nations Fail. Crown Publishing Group, 2012.

Alt, James E., et al. “Reputation and Hegemonic Stability: A Game-Theoretic Analysis.” The

American Political Science Review, vol. 82, no. 2, 1988, pp. 445–466.
Aumann, R. J. “Survey of Repeated Games”. Essays in Game Theory and Mathematical

Economics in Honor of Oskar Morgenstern, Vol. 4, 1981, pp. 11–42.

Calvert, Drew. “Is an Unpredictable Leader Good for National Security?” Kellogg Insight,

Northwestern University, 19 June 2017, insight.kellogg.northwestern.edu/article/is-an-

unpredictable-leader-good-for-national-security.

Coll, Steve. “The Madman Theory of North Korea.” The New Yorker, 2 Oct. 2017, www.newyorker.com/magazine/2017/10/02/the-madman-theory-of-north-korea.

Erev, Ido, and Alvin E. Roth. “Predicting How People Play Games: Reinforcement Learning in

Experimental Games with Unique, Mixed Strategy Equilibria.” The American Economic

Review, vol. 88, no. 4, 1998, pp. 848–881. JSTOR, www.jstor.org/stable/117009.

Accessed 7 May 2021.

Fehr, Ernst, and Klaus M. Schmidt. “A Theory of Fairness, Competition, and Cooperation.” The Quarterly Journal of Economics, vol. 114, no. 3, 1999, pp. 817–868.

Gallego, Lope. “Stackelberg Duopoly.” Policonomics, 2017, policonomics.com/stackelberg-

duopoly-model/.

Guala, Francisco. “Has Game Theory Been Refuted?” Journal of Philosophy, vol. 103, no. 5,

2006, pp. 239-263.

Hart, Sergiu. “Robert Aumann's Game and Economic Theory.” The Scandinavian Journal of

Economics, vol. 108, no. 2, 2006, pp. 185–211.


Haywood, O. G. “Military Decision and Game Theory.” Journal of the Operations Research

Society of America, vol. 2, no. 4, 1954, pp. 365–385. JSTOR,

www.jstor.org/stable/166693.

Kahneman, Daniel, and Amos Tversky. “Prospect Theory: An Analysis of Decision under Risk.” Econometrica, vol. 47, no. 2, Mar. 1979, pp. 263–291.

Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2013.

Kalai, Ehud, and Ehud Lehrer. “Rational Learning Leads to Nash Equilibrium.”

Econometrica, vol. 61, no. 5, 1993, pp. 1019–1045. JSTOR,

www.jstor.org/stable/2951492. Accessed 7 May 2021.

Kilgour, D. Marc, and Frank C. Zagare. “Credibility, Uncertainty, and Deterrence.” American

Journal of Political Science, vol. 35, no. 2, 1991, pp. 305–334.

Lafayette, Lev. “Cournot Competition.” Lev Lafayette, 24 Apr. 2019, levlafayette.com/node/623.

Milgrom, Paul, and John Roberts. “Predation, Reputation, and Entry Deterrence.” Journal of Economic Theory, vol. 27, no. 2, Aug. 1982, pp. 280–312.

Myerson, Roger B. “Force and Restraint in Strategic Deterrence: A Game-Theorist’s

Perspective.” University of Chicago, 2007.

Oosterbeek, Hessel, et al. “Cultural Differences in Ultimatum Game Experiments: Evidence

from a Meta-Analysis.” Experimental Economics, vol. 7, 2004, pp. 171-188.

Palacios-Huerta, Ignacio. “Professors Play Minimax.” Review of Economic Studies, vol. 70, no.

2, 2003, pp. 395-415.


Remi AI. “Artificial Intelligence, Poker and Regret. Part 3.” Medium, 25 Sept. 2019, medium.com/@RemiStudios/artificial-intelligence-poker-and-regret-part-3-bb3210c79211.

Ross, Don. “Game Theory.” The Stanford Encyclopedia of Philosophy (Winter 2019 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/win2019/entries/game-theory/.

Roth, Alvin E., et al. “Bargaining and Market Behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An Experimental Study.” The American Economic Review, vol. 81, no. 5, Dec. 1991, pp. 1068–1095.

Schelling, Thomas C. The Strategy of Conflict. Harvard University Press, 1997.

Selten, Reinhard and Rolf Stoecker. “End Behavior in Sequences of Finite Prisoner’s Dilemma

Games: A Learning Theory Approach.” Journal of Economic Behavior and

Organization, vol. 7, 1986, pp. 47-70.

Selten, Reinhard. “The Chain Store Paradox.” Theory and Decision, vol. 9, 1978, pp. 127–159.

Smith, Vernon L. “An Experimental Study of Competitive Market Behavior.” Journal of

Political Economy, vol. 70, no. 2, 1962, pp. 111-137.

Sowell, Thomas. Basic Economics: A Common Sense Guide to the Economy. Basic Books, 2015.

Syll, Lars Pålsson. “Why Game Theory Never Will Be Anything But a Footnote in the History of

Social Science.” Real-World Economics Review, no. 83, 2018, pp. 1-20.

Warren, Marian. “Chapter 2 Prospect Theory and Expected Utility Theory.” SlidePlayer, slideplayer.com/slide/8475017/.
Wheeler, Gregory. “Bounded Rationality.” The Stanford Encyclopedia of Philosophy (Fall 2020 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/fall2020/entries/bounded-rationality/.
