
Don’t Hate the Player, Hate the Game: An Analysis of Game Theory

Game theory is a field of economics that concerns the mathematical aspect of decision

making and strategic interaction. The Stanford Encyclopedia of Philosophy defines game theory

as the study of the ways in which interacting choices of economic agents produce outcomes with

respect to the preferences (or utilities) of those agents. Game theory focuses on the analysis of

games of strategy, in contrast with games of chance, like UNO, or games of skill, like basketball.

What differentiates games of strategy from the latter two is the fact that each player’s actions

affect other players’ decisions. In a strategic interaction, one must consider both what one’s opponents have done in the past and what they can be expected to do in the future. This notion is what

gave birth to the field of analysis known as game theory. Game theory was studied only informally in past centuries before being formally defined and introduced as a discipline in the 1900’s by the

mathematician John Von Neumann and economist Oskar Morgenstern. It has been applied to a

variety of topics, including sports, card games, and historical battles. In this paper, I intend to

analyze the successes and failures of game theory in modelling human decision making, explore

several interesting insights and lessons that the discipline provides, and examine alternative

perspectives proposed by social scientists.

Introduction

A game is defined as having two or more players, a set of strategies for each player, and a

set of outcomes, each assigning payoffs, or utilities, to the players. A game can be visualized in two

different ways. A matrix shows each outcome as an intersection of the players’ choices; it is used

in normal form games where both players make decisions simultaneously, such as Rock-Paper-

Scissors.

1/2 Rock Paper Scissors

Rock (0,0) (0,1) (1,0)

Paper (1,0) (0,0) (0,1)

Scissors (0,1) (1,0) (0,0)
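Small games like this are easy to encode and analyze directly. Below is a minimal Python sketch (my own illustration, using the matrix above) that reads off Player 1's best response to each of Player 2's moves; the cycle it prints is exactly why Rock-Paper-Scissors has no stable pure-strategy outcome.

```python
# A minimal sketch: the Rock-Paper-Scissors payoff matrix above as a Python
# dict, plus a helper that reads off Player 1's best response to each move.

payoffs = {  # (player 1 move, player 2 move) -> (payoff 1, payoff 2)
    ('Rock', 'Rock'): (0, 0), ('Rock', 'Paper'): (0, 1), ('Rock', 'Scissors'): (1, 0),
    ('Paper', 'Rock'): (1, 0), ('Paper', 'Paper'): (0, 0), ('Paper', 'Scissors'): (0, 1),
    ('Scissors', 'Rock'): (0, 1), ('Scissors', 'Paper'): (1, 0), ('Scissors', 'Scissors'): (0, 0),
}
MOVES = ('Rock', 'Paper', 'Scissors')

def best_response(opponent_move):
    """Player 1's payoff-maximizing reply to a fixed Player 2 move."""
    return max(MOVES, key=lambda m: payoffs[(m, opponent_move)][0])

for move in MOVES:
    print(move, '->', best_response(move))
# Rock -> Paper, Paper -> Scissors, Scissors -> Rock: the best responses
# cycle, so no single pair of choices is stable.
```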

In contrast, an extensive form game in which decisions are taken sequentially uses a

game tree, where each ending node represents an outcome. For example, in a game of poker, the

players take turns to choose their action, and each player’s choice is based on what previous

players have chosen.

(Figure: poker game tree, from “Artificial Intelligence, Poker, and Regret”)

Rationality and Reputation


Conventional economics assumes that humans act rationally when making choices; in

other words, humans follow “rational self-interest”, always making choices in a way that

maximizes their own utility. Economists gave the name homo economicus to describe a

hypothetical person that makes decisions with perfect rationality; this type of person has

“complete information about the options available for choice, perfect foresight of the

consequences from choosing those options, and the wherewithal to solve an optimization

problem that identifies an option which maximizes the agent’s personal utility” (Stanford

Encyclopedia of Philosophy). This stipulation of rationality is the basis for countless theories

about how choices are made and how economic systems function. Prominent economist Thomas

Sowell wrote that people “respond rationally to the incentives and constraints of the system in

which they work. Under any economic or political system, people can make their choices only

among the alternatives actually available” (121). However, in recent years a rival discipline,

known as behavioral economics, has been breaking ground among academics. Behavioral

economists, such as Dan Ariely and Daniel Kahneman, combine psychology and economics to

suggest that humans are irrational, and their decision making is frequently impaired by external

and internal biases. Classical game theory agrees with orthodox economics, in that it stipulates

that players in strategic interactions will always behave rationally. Because rational behavior is the end goal -- that is, we want to get rid of biases and make decisions optimally -- game theoretical analysis works well as a model for everyday choices. However, the biases and psychological heuristics that are part of human nature must be factored into this decision-making model.

Nonetheless, actual choices made by humans frequently differ from those that rational agents would make in theory. After all, most humans are not constantly keeping track of the order of their preferences and optimizing every decision the way a homo economicus would. Thomas Schelling, an economist who utilized game theory to analyze deterrence during

the Cold War, argued that while “the premise of ‘rational behavior’ is a potent one for the

production of theory”, the resulting theory might not adequately explain actual behavior:

“If we confine our study to the theory of strategy, we seriously restrict ourselves by the

assumption of rational behavior -- not just of intelligent behavior, but of behavior motivated by a conscious

calculation of advantages, a calculation that in turn is based on an explicit and internally

consistent value system. We thus limit the applicability of any results we reach.” (4)

Schelling took this idea one step further with the rationality paradox, which stipulates that

(1) irrationality can be a strategic advantage and (2) rationality can be a strategic disadvantage.

This paradox rests on the basis that a player’s reputation determines how credible he is and how

opponents will respond to threats he makes. A parent who has punished a child adequately in the

past will not need to shout or threaten the child to the same extent in the future, since the child

understands the consequences for bad behavior. A mob leader who has brutally killed off

informants can worry less about potential traitors in the organization, as his reputation warns

them against betrayal. The paradox of rationality lies in the fact that certain irrational traits -- an

inconsistent value system, a frequent randomization of actions, an inability to communicate --

can change the opponents’ actions in favorable ways. For example, asylum inmates that

deliberately cultivate irrational behavior, such as threatening to harm themselves or acting as if

they are unable to speak or understand others, develop reputations that make them immune to

threats. At the other end of the spectrum, attributes belonging to a rational agent -- the ability to

communicate effectively, a uniform decision-making system, sound judgement -- can at times

become a weakness and should accordingly be suspended. Schelling gives the example of a man

threatened with extortion breaking his own hand to suspend his ability to sign checks. Likewise,
a woman kidnapped and forced to call her family for ransom money is put at a disadvantage by

indicating to the kidnappers that she has sound judgement and rational will to live.

A great example of the benefits of an “irrational” reputation is the chain store paradox.

The paradox involves a monopolist who controls 20 different markets around the world. A

competitor arrives at each market, seeking to break the monopolist’s control. One by one, the 20

competitors choose whether to stay out of the market or enter. After each competitor makes a

choice, the monopolist then decides whether to cooperate and let the competitor gain some

market share, or to be aggressive and fight to drive the competitor out. A fight will involve the

monopolist adopting the costly action of predatory pricing, hurting all firms in the market.

Theoretical payoffs of the game are shown below:

Monopolist/Competitor Stay Out Enter

Cooperate (5,1) (2,2)

Fight (5,1) (0,0)

(If the competitor stays out, the monopolist makes no move, so his payoff is the same in both rows.)

If all players played rationally, all 20 competitors would enter their respective markets.

This is deduced using backward induction, the process of starting at the end of the game and

moving to the beginning. If the monopolist fights, it is not only to keep control of a market, but

also to scare off successive competitors. At the end of the game, after the monopolist has dealt

with the first nineteen markets, he has no reason to fight with the twentieth competitor because

there are no more competitors to scare off. Because he knows he will avoid a fight with the

twentieth competitor, he has no reason to fight with the nineteenth competitor, and because he

knows he will avoid a fight with the nineteenth competitor, he has no reason to fight the
eighteenth competitor. Using this line of reasoning and working all the way to the beginning of

the game, the monopolist would always choose to cooperate, allowing all twenty competitors a

piece of the market.
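The backward-induction argument can be made concrete with a short sketch (my own illustration, using the per-market payoffs from the table above). Because payoffs simply add across markets and rational play ignores history, the same one-market analysis repeats twenty times.

```python
# A backward-induction sketch of the chain store game, using the per-market
# payoffs from the table above: (monopolist, competitor) payoffs are
# (5, 1) if the competitor stays out, (2, 2) if it enters and the monopolist
# cooperates, and (0, 0) if it enters and the monopolist fights.

STAY_OUT, COOPERATE, FIGHT = (5, 1), (2, 2), (0, 0)

def solve_one_market():
    """Equilibrium of a single market, ignoring history, as rationality demands."""
    # If the competitor enters, the monopolist compares cooperating with fighting.
    response = COOPERATE if COOPERATE[0] > FIGHT[0] else FIGHT
    # Anticipating that response, the competitor decides whether to enter.
    return response if response[1] > STAY_OUT[1] else STAY_OUT

total = 0
for market in range(20, 0, -1):      # reason backward from market 20 to market 1
    total += solve_one_market()[0]   # every market resolves the same way

print(total)  # 40: the monopolist cooperates in all 20 markets
```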

The Chain Store paradox has several major implications. Firstly, reputation has strategic

value. Reinhard Selten proposed that if the monopolist deviates from the equilibrium strategy and

instead fights early competitors, the competitors that follow will be much less likely to enter the

market. An established reputation for aggression increases the credibility of the threat of

predatory pricing. Furthermore, rationality can be a great disadvantage. If the monopolist

follows the purely rational strategy prescribed by backward induction, he only gets a payoff of

40. Yet, if he behaves extremely aggressively or irrationally towards the first several

competitors, he can deter those that follow. It does not matter whether the

monopolist is actually irrational or not, only that the monopolist appears irrational to the other

players. A player that changes the way he or she is perceived by other players is essentially

sending a strategic message, influencing the behavior of opponents. Stanford professors Paul

Milgrom and John Roberts have argued that if the game is one of asymmetric information, where

competitors are unsure of the monopolist’s payoffs, motivations, and strategy, predatory pricing

would be a rational response to a rival firm entering the market. In their view, it is the “lack of

complete information that gives rise to reputation possibilities.” When competitors are in doubt

and must base their beliefs on the results of previous interactions, predation will

emerge as an equilibrium strategy. Generalizing their conclusions, Milgrom and Roberts explain

that “in any situation where individuals are unsure about one another’s options or motivation and

where they deal with each other repeatedly in related circumstances, we would expect to see

reputations develop” (303).


Political scientist James E. Alt and his colleagues constructed a model in which a

hegemon -- a global power -- benefits by incurring costs to establish a reputation for coercion

early on in the game rather than cooperating with its two smaller allies. The model involves uncertainty

and asymmetric information, as the allies are unaware of whether each instance of coercion will

be costly or cheap to the hegemon. Whether or not the hegemon punishes the first ally will

influence the second ally’s decision to either challenge the hegemon or obey. Alt illustrates his model with the 1980’s oil glut, in which Saudi Arabia deliberately overproduced oil in response to a six-year decline in oil prices and frequent violations of OPEC’s quota system by other countries. Because Saudi Arabia’s low costs of production allowed it to withstand periods of price decline, the country was able to establish a

reputation for toughness and deter other oil producers, especially non-OPEC nations. Even

recently, in 2020, Saudi Arabia was able to win a two-month price war with Russia because of its

low costs of coercion. Thus, cooperation is not always beneficial; if it is not costly to be

aggressive, such aggression can pay off in the long run.

Ultimately, we can adjust the meaning of “rationality” to argue that humans actually do

behave rationally. Examples in this paper show that rational behavior can change drastically

when going from a simultaneous game to a sequential game or when going from a single shot

game to a repeated game. Behavioral economists mostly focus on cognitive issues, such as

humans’ difficulty with statistical reasoning (e.g., Bayes’ rule) and inconsistent decision making

when problems are presented in different ways. These issues mostly deal with the individual and

his or her perspective of the world. Yet, when it comes to strategic interaction and the study of

two or more agents, it could be said that human beings act in an optimal way. For one thing,

humans are able to learn from their mistakes and alter strategies in a way that converges to
equilibrium. Experimental economists have demonstrated this convergence in studies of repeated

games. In other situations, such as sports, players intuitively know the probabilities required to

achieve the optimal results. Moreover, a large aspect of strategy involves communication:

accurately predicting others’ actions and expectations, while getting one’s own intentions across

when necessary. On the whole, human beings show skill in making these predictions. In one

study, Schelling asked participants questions that required mutual coordination, such as:

● Name “heads” or “tails”. If you and your partner choose the same option, you win a

prize.

● Write a positive number. If you and your partner write the same number, you win.

For the first question, 6/7 of participants chose heads. For the second, 40% of participants agreed

on the number 1. If that percentage seems a bit small, just think about how many positive

numbers there are in total. It is amazing that even 1% of participants could coordinate their

responses, let alone 40%. Humans’ incredible ability to coordinate without explicit

communication has been shown not only in experiments, but throughout history. Following

World War 1, where poison gas injured about 500,000 soldiers, the combatants of World War 2

had an unspoken agreement not to use any chemical weapons. In China, the two warring governments, the Communists and the Nationalists, had no official armistice or peace treaty but still agreed that the Formosa Strait was the de facto border between their jurisdictions. Even in school, when a teacher tells students to form pairs, one can see pairs of kids immediately look at each other, each knowing that the other wants them as a partner. People’s uncanny

ability to coordinate expectations can be seen everywhere, from the classroom to the battlefield.

It is important to distinguish between zero-sum games and non zero-sum games in terms

of optimal communication. In zero-sum games, where one player’s gain is another’s loss, there is
no incentive to communicate. In fact, players must do the opposite and hide their strategies; one

way of doing this is through randomization, or mixed strategies. A randomized strategy is

“dramatically anti-communicative” (Schelling 105). If a soccer player flips a coin to determine whether he shoots right or left during a penalty kick, the goalie will never be able to accurately

predict the direction. By contrast, in mixed-motive games or pure coordination games, players

have an incentive to communicate, either explicitly or tacitly if open communication is

unavailable. Schelling gives a great example of this concept. He explains that “in chess [a zero-sum game] it does not matter what the players know about each other or whether they speak the

same language or have a common culture; nor does it matter who played the game previously or

how it came out” (106). Yet, we can change the game of chess to simulate real life war, where

each player has a unique value system and all players seek to avoid mutual destruction. In this

new version of chess, players are rewarded for three different things: the pieces they capture, the

pieces they are left with at the game’s end, and the squares these pieces occupy. Chess 2.0 forces

the players into negotiation, as each is unsure of which squares are valuable to the opponent, but

both want to minimize the capture of pieces. In any conflict like Chess 2.0, when players must

coordinate but communication is limited, they must rely on other ways to form an idea of their

opponent’s intentions: patterns of behavior, traditions, historical precedents, etc. This is why

setting precedents is so important for a new leader or nation state. By setting a precedent or

following a tradition, an agent is sending a message to the other players about his strategy and in

turn influencing their decisions.

Reputation is a way for agents to communicate their intentions and influence other

players. Yet is it better to cultivate a reputation of unpredictability and irrationality, like that of

the Chain Store paradox, or one of consistency, based on previous experiences? In the 1970’s,
Richard Nixon proposed the “madman theory” of foreign policy; Nixon believed that if

communist nations thought the American president was impulsive and irrational, they would

be reluctant to provoke the United States. Nixon used this strategy to try to bring an end to

the Vietnam War, while many global experts saw Trump’s policy against North Korea as another

example of madman theory (Coll). While this theory may seem constructive, it has been

criticized by both political scientists and economists. When a leader behaves like a madman, he

creates two major problems. Firstly, like in the case of Nixon and Brezhnev, the leader’s

adversaries might not understand what intention he is trying to communicate. Secondly, the

madman will not influence his opponents’ behavior, as any strategic move he makes will be

perceived as lacking credibility by other nations. Sandeep Baliga, a professor at Northwestern University,

explains this complication well: “If you are a truly mad leader, why would anyone change their

behavior as a function of what you do? If they know you might do something crazy whether they

do something you like or not, they might just say ‘the hell with it, I’ll do whatever I want.’ The

‘madman’ actually has to be clever, doing something crazy if you don’t do what he wants and

being accommodating if you do. In that case, well, he’s no longer mad” (Calvert). The world of

diplomacy is not a zero-sum game; communication is key. While a monopolist may find it

beneficial to be a “madman”, a global leader must cultivate a reputation which best achieves

communication: a reputation driven by predictability.

Utility

Within economics, utility has a special meaning. It is a measure of satisfaction that a

person gets from a good or service. Within game theory, utility is used to determine the value of

outcomes. A cardinal utility function measures outcomes on a common numerical scale; an outcome whose utility is 3 is valued three times as much as an outcome with a utility of 1. By contrast, an ordinal utility function merely places outcomes in order of the player’s preference. For example, an outcome of utility 2 is not worth twice as much as an outcome of utility 1; the player simply prefers it more.

Utility plays a big part in decisions such as lotteries, where there is high risk and high

uncertainty. Economists consider the weighted average value of all possible outcomes as the

“expected value”. The mathematical definition is E = Σi xi·P(xi), where xi is the value of outcome i and P(xi) is its probability. A gamble in which a person has a 50% chance of winning a dollar and a 50% chance of losing a dollar has an expected value of 0 (0.5*1 + 0.5*(-1) = 0). If a person equates his or her preferences to monetary values, then expected utility is the same

as expected value. Value is a concrete, unchangeable statistic, while utility measures the weight

different agents give to specific outcomes. Thus, while expected value is the same for all players,

expected utility may be different. Expected utility theory states that rational agents will make

decisions that maximize their expected utility.

The mathematician Daniel Bernoulli was the first to study the implications of marginal

utility on a decision-making model. He proposed that the accumulation of wealth was affected by

diminishing marginal utility; in other words, if a person started at nothing and gained a dollar a

day for a year, the first dollar gained would provide greater utility than the last. Money gained

would mean more for a poor man than a rich man, as they have different starting points.

Bernoulli’s utility function factors in risk aversion, as the function is concave down rather than

linear. Each unit of wealth gained provides less utility than the previous one. Bernoulli described

the function as logarithmic, because money must be multiplied by the same proportion for

utility to increase by the same value. If 2 dollars gives the player a utility of 1 and 4 dollars a

utility of 2, then 8 dollars gives the player a utility of 3, 16 dollars a utility of 4, and so on. Von
Neumann and Morgenstern expanded on this utility function to describe the behavior of risk-

neutral, risk-averse, and risk-prone players.

● For a risk-neutral player, expected utility is directly proportional to expected payoff.

● For a risk-averse player, utility is a concave function of the payoff, such as its square root.

● For a risk-prone player, utility is a convex function of the payoff, such as its square (see the sketch below).
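Consider a 50/50 gamble between $0 and $100, whose expected payoff is $50, under one example utility function of each type. This is my own numerical illustration, not von Neumann and Morgenstern's.

```python
# Expected utility of a 50/50 gamble between $0 and $100 under the three
# risk attitudes above; the specific utility functions are illustrative.
import math

gamble = [(0.5, 0.0), (0.5, 100.0)]  # (probability, payoff) pairs

def expected_utility(u):
    return sum(p * u(x) for p, x in gamble)

risk_neutral = lambda x: x             # utility proportional to payoff
risk_averse = lambda x: math.sqrt(x)   # concave, e.g. square root
risk_prone = lambda x: x ** 2          # convex, e.g. a power above one

# Compare each gamble's expected utility with the utility of a sure $50:
print(expected_utility(risk_neutral), risk_neutral(50.0))  # 50.0 vs 50.0: indifferent
print(expected_utility(risk_averse), risk_averse(50.0))    # 5.0 vs ~7.07: takes the sure thing
print(expected_utility(risk_prone), risk_prone(50.0))      # 5000.0 vs 2500.0: takes the gamble
```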

In 1979, Israeli psychologists Daniel Kahneman and Amos Tversky launched a strong

criticism of expected utility theory and proposed a new concept, called prospect theory, to

explain how humans value choices. The pair suggested that while expected utility theory may

explain how rational agents, such as the hypothetical homo economicus, weight decisions, the

theory is frequently violated in real life and does not sufficiently explain actual behavior. In

expected utility theory, decisions are weighted exactly by their probability. In a gamble decided

by a coin flip, each choice will be weighted 50%. However, Kahneman and Tversky suggest that

human cognition involves a certainty effect, where “people overweight outcomes that are certain,

relative to outcomes that are merely probable.” Kahneman and Tversky supported their critique by asking research subjects to state their choices in response to several pairs of hypothetical gambles, such as the following (note that the monetary units are Israeli pounds): “Choose

between (A) an 80% chance of gaining 4000 or (B) a 100% chance of gaining 3000.” 80% of

participants chose the latter option. The substitution axiom of expected utility theory states that if

an individual prefers choice A over choice B, then he will prefer some probability p of choice A

over the same probability p of choice B. However, when the question posed above was

manipulated, and each probability divided by 4 so that the choices became either (A) a 20%

chance of gaining 4000 or (B) a 25% chance of gaining 3000, participants switched their

response, with 65% of participants choosing A. Repeated violations of the substitution axiom

illustrate this certainty effect; humans prefer the certain to the merely probable, yet when all choices have a low probability, humans prefer the higher monetary gain. The certainty effect implies that high-probability outcomes are underweighted, and conversely the possibility effect implies that low-probability outcomes are overweighted. Both principles are supported by empirical estimates of decision weights (Kahneman 2013).

Kahneman and Tversky also pointed out the reflection effect, referring to the fact that

when the researchers changed the questions by replacing gains with losses, the participants’

preferences switched. For example, when the question posed above was changed, and the choices

became either (A) an 80% chance of losing 4000 or (B) a 100% chance of losing 3000, 92% of

participants chose A. This effect was shown multiple times throughout the experiment.

Kahneman and Tversky used the certainty, possibility, and reflection effects to draw several

major conclusions which form the basis for prospect theory. Humans are strongly risk averse to

maintain gains, but strongly risk seeking to avoid losses. Moreover, the utility lost from a
decrease in assets is much greater than the utility gained from an equal increase in assets.

Ultimately, expected utility theory fails in that it only focuses on final states of wealth and

ignores whether an individual gained or lost to reach this final state. In reality, human decision

making revolves around gains and losses; “the carriers of value or utility are changes of wealth,

rather than final asset positions that include current wealth” (10). Kahneman and Tversky

propose that unlike the value function in expected value theory, which takes monetary assets as

an input, the value function for actual human behavior is “(i) defined on deviations from the

reference point; (ii) generally concave for gains and commonly convex for losses; (iii) steeper for

losses than for gains.” The result is an S-shaped curve around the reference point, concave above it and convex below it.
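A value function with these three properties is easy to sketch. The version below uses the parameter estimates (alpha = beta = 0.88, lambda = 2.25) from Tversky and Kahneman's later 1992 work on cumulative prospect theory; the numbers are used purely for illustration, not as part of the 1979 paper discussed here.

```python
# A sketch of a prospect theory value function: defined on changes relative
# to a reference point at 0, concave for gains, convex and steeper for losses.
# Parameters are the Tversky-Kahneman (1992) estimates, used illustratively.

def value(x, alpha=0.88, beta=0.88, lam=2.25):
    if x >= 0:
        return x ** alpha             # concave over gains
    return -lam * ((-x) ** beta)      # convex over losses, scaled up by loss aversion

print(value(100.0))   # ~57.5
print(value(-100.0))  # ~-129.4: the loss looms more than twice as large
```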

There is an obvious flaw in mathematical models of utility; they do not hold up in

situations where payoffs are not numerical and players take into account abstract concepts like

friendship, pride, or social customs. “Any relationship between value and utility only makes

sense if the pay-off is numerical – which usually means monetary” (Kelly). Pride especially is a

big factor in duels and games of attrition, where players remain in the game even when the cost

of playing has far exceeded the payoff from a chance of winning. “One cannot proceed to make
any inference about the empirical status of GT [game theory] without figuring out first what

individuals care about. If they care about fairness, for example, it is still entirely possible that

they are rational maximisers of utility – once the latter has been defined appropriately to capture

fairness considerations. The definition of the utility function as ranging exclusively over one’s

own monetary outcomes is unnecessarily restrictive and in principle can be relaxed to allow for

more varied (and interesting) individual preferences” (Guala 11).

Take the ultimatum game, for example. In this famous game, there are two players and a

fixed sum of money, or “pie”. Player A makes the first choice of how to divide the pie; he can

choose an amount to give to Player B, and the rest he can take for himself. Then, Player B has a

choice of either accepting or rejecting Player A’s offer. Obviously, when utility is measured in

terms of the amount of money, any rational player with the role of Player B would accept any

offer greater than 0 from Player A. Correspondingly, a rational player with the role of A would

offer the smallest possible amount to Player B to maximize his own utility. However, the game

works differently in real life. On average, 40% of the pie is offered to Player B. In a study of 75

different results taken from ultimatum game experiments, the percentage of the pie offered

ranged from 26% to 58%. Researchers have found that fairness is a priority for those receiving

the offer, who would rather gain nothing than be given pennies compared to Player A; in other

words, utility is a combination of different factors, including abstract values like fairness and

concrete measurements like monetary gain. At the same time, when the sum of money becomes bigger, most

receivers put their pride away in favor of the money, and rejection rates decrease substantially.
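One standard way to formalize a utility that mixes money with fairness is Fehr and Schmidt's (1999) inequity-aversion model, sketched below. Note that this model, its parameter values, and the $10 pie are my illustration, not something proposed in the ultimatum game studies cited here.

```python
# Fehr-Schmidt inequity aversion: utility is money minus penalties for
# disadvantageous (alpha) and advantageous (beta) inequality. The parameter
# values and the $10 pie are hypothetical.

def responder_utility(own, other, alpha=2.0, beta=0.5):
    return own - alpha * max(other - own, 0.0) - beta * max(own - other, 0.0)

pie = 10.0
print(responder_utility(1.0, pie - 1.0))  # offer $1: 1 - 2*8 = -15, so rejecting (utility 0) wins
print(responder_utility(4.0, pie - 4.0))  # offer $4: 4 - 2*2 = 0, roughly the point of indifference
```

With alpha = 2, this responder rejects any offer below 40% of the pie, which happens to sit close to the average offer observed experimentally.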

The ultimatum game is useful for another key reason; it is an easy way to highlight the

differences in decision making between populations and important cultural values within

communities. Research subjects in the United States and Yugoslavia offered much more money
to their partners than those in Israel did (Roth). In the 75-experiment meta-analysis conducted by

Oosterbeek, people in Asian countries had a much higher rejection rate than Americans. This

could be attributed to the dichotomy of collectivism -- common in the Asian countries -- versus

individualism which forms a cornerstone of American culture. However, Oosterbeek and his

colleagues found no significant correlation between a country’s experiment results and its degree

of individualism. Still, in countries with high respect for authority, players in the role of Player A offered a smaller percentage of the pie to their partners. Cultural values can have an enormous impact on a

nation’s economy; countries where strangers are not trusted, and business is conducted within

families or tribes, have lower GDPs than countries with more open societies (Sowell 966).

Simple game theoretical experiments like the ultimatum game can pinpoint the differences in

how various cultures and societies measure utility.

The Role of Choice and Strategic Moves

When Hernan Cortes landed on the coast of Mexico, he burned his own ships; this

decision was crucial, as it prevented his soldiers from turning back and left them with only one

choice. This “point of no return” commits players to one decision by removing all other

alternatives. Cortes was not the first to do so; the legendary commander Alexander the Great

and the Chinese general Sun Tzu both advocated for the strategy of burning the boats. Not only

military history, but history as a whole, has been driven by the presence (or absence) of alternate

choices and dominant strategies.

After the Black Plague, peasants in Western Europe had new mobility from a scarcity of

labor, which led to economic freedom. Their counterparts in Eastern Europe were still under

the tyrannical control of feudal masters even long after the Black Plague; the absence of alternatives

allowed the feudal system to remain stable. Before, the peasants were forced to work and were
unable to leave their land without the permission of the lord. But after the plague, the peasants

had new strategies with new payoffs and could even change certain existing payoffs. The

peasants could push for better conditions and higher wages or threaten to leave. By contrast,

peasants in the East were tied to the land and were forced to work. The elites in Eastern Europe

exploited the serfs by taking away any alternative strategies available to them. The serfs could not

even choose between any two actions; the only option was to work. It is this removal of

strategies which caused Eastern Europe to remain a feudal society for the next several centuries.

Similarly, Spanish colonists in Latin America removed alternate choices of workers by tying

them to the land in a system of debt peonage known as encomienda. In contrast, American

settlers had substantial freedom from the British government, because settlers had too many

alternatives to be controlled (Acemoglu). Throughout history, there is a pattern of the removal of

choice to control populations.

More generally, it is advantageous not only to remove an opponent’s choices if able to,

but also to remove one’s own choices. A player can do this through commitment, the strategy of

forcing oneself to choose a certain action regardless of the other players’ choices. By committing

to an action, a player is not just removing all alternative choices, but he is also forcing the

decision onto his opponent. Thus, commitment is often a useful tool in bargaining. When a

player is able to commit and his opponent is not, the player has an advantage, as his offer is final.

If both players are able to commit, each player has an incentive to commit to an action first. If an

object with a value of $10 is being sold and the buyer proposes a final offer of $7, the seller is

unable to make his own offer and his actions are restricted. Likewise, if the seller commits first

and makes a final offer of $12, the buyer is unable to negotiate and has only two choices: take it or leave it. Thus,

whichever player is the first to make a commitment has a strategic advantage.


While commitments are advantageous for Player 1, threats favor Player 2 and serve as a

countermeasure to commitments. A commitment removes a player’s alternatives, and a threat

does the same contingent on a specific action made by an opponent. The more likely that a player

would carry out a threat, the more credible the player is and the less likely that he or she will

actually have to fulfill the threat. Yet if the player has no incentive to fulfill the threat, then he

must convince his opponent that he would actually fulfill it. Thus, the reputation of a player is

extremely crucial in the responses of his opponents. Moreover, the player must effectively

communicate threats to the opponent. A hostage who only speaks a foreign language will be

unable to do the kidnapper’s bidding. Foreign troops brought in to put down a rebellion are

unable to communicate with the natives and thus immune to threats.

Strategic moves -- commitments, threats, and promises -- are frequently used by global

leaders in order to influence other nations. Examples of this abound within the history of U.S.

foreign policy, especially during the Cold War. The Truman Doctrine was an American

commitment to aid democratic countries threatened by Soviet aggression. The Formosa

Resolution was a threat designed to deter the Communists in China from invading the

Nationalists in Taiwan. Deterrence has been a cornerstone of United States policy not only in the

late twentieth century, but also in the early 1900’s in response to European expansionism and in

the twenty-first century in response to terrorist threats. In 2007, amid American operations in

Iraq and Afghanistan, Roger Myerson, a professor from the University of Chicago, wrote that “a

successful deterrent strategy requires a balance between resolve and restraint, and this balance

must be recognized and understood by our adversaries”. Myerson, expanding on Schelling’s

main ideas, argued that it is useless for the U.S. to only use the threat of military action as a

deterrent. Such a threat alone, like Bush’s threat to nations that sponsor terrorism, would not stop
enemies but encourage them further, as the enemies have no guarantee that cooperation would be

beneficial to them. Bombing a hostile country without presenting an option for peace only

increases the enemy’s conviction to defend itself. Building up one’s military convinces an enemy

that you will attack them, leading to the build up of their own military to defend against a

possible attack. “Retaliatory actions and threats that lack clearly defined limits can raise fears of

deep invasions and thus can motivate people on the other side to seek militant leadership that

may be better able to defend them” (15). Instead, the U.S. should combine the credible threat of

retaliation in response to aggression -- resolve -- with the credible promise of cooperation in

response to cooperation -- restraint. Myerson makes the point that whenever self-interest doesn’t

support either action, reputation is necessary to back it up. For example, if it is more profitable

for the United States to be aggressive to cooperative adversaries, and if these groups believe the

U.S. will be aggressive no matter what they do, a reputation for cooperation will convince

adversaries to cooperate. Likewise, in situations where it is believed that aggression would be

costly for the U.S., and that the U.S. will acquiesce to hostile enemies, the U.S. must rely on a

reputation of aggression to credibly threaten these enemies with punishment.

In one study, mathematician Mark Kilgour and political scientist Frank Zagare compared

deterrence under complete and incomplete information. In game theory, credibility is associated

with rationality; assuming all players are rational, a credible threat would only be one which is a

rational choice for a player. Thus, in a perfect world where all nations are aware of each other’s

payoffs, there are only three possible outcomes in a conflict between two nations. Either both

nations find it profitable to engage in conflict when challenged, only one does, or neither

does. These three cases are called “Prisoner’s Dilemma”, “Called Bluff”, and “Chicken”,

respectively. Mutual deterrence is only possible in the Prisoner’s Dilemma case, when both
nations have a credible threat to retaliate. In the Called Bluff case, the country with the credible

threat has an advantage, as its opponent will prefer to surrender when challenged rather than

retaliate. The last case, Chicken, is much more unpredictable, since neither country has a credible

threat and thus both will find it optimal to not cooperate, risking mutual destruction. However, in

the real world, neither nation is aware of the other’s payoffs, and neither knows whether the

other possesses a credible threat. The real world is characterized by “nuance, ambiguity,

equivocation, duplicity, and ultimately uncertainty” (312).

In Kilgour and Zagare’s model of deterrence under uncertainty, each nation has some

probability (pA, pB) of being a “Prisoner’s Dilemma” player, which prefers conflict to capitulation, and some probability (1 - pA, 1 - pB) of being a “Chicken” player, which prefers capitulation to

conflict. While each nation knows its own preferences and both pA and pB , neither nation knows

the other’s preferences. Thus, each nation may correctly or incorrectly guess whether the other

possesses a credible threat. This model has several major implications for sustaining deterrence.

Firstly, neither side needs to correctly assess what type of player the other is in order to bring

about a deterrence equilibrium. Suppose both players are “Chicken”, yet both believe the other is

a “Prisoner’s Dilemma” type; both players, preferring cooperation to capitulation, will cooperate.

“Accurate assessments of the strategic environment are neither necessary nor sufficient for the

success of mutual deterrence” (317). Secondly, there is some threshold of pA and pB such that

when both probabilities pass the threshold, both sides will cooperate regardless of what type of

player they are. To decrease the threshold so that the chance for deterrence is more likely, there

are several strategies available. One such strategy is to increase the payoffs associated with

mutual cooperation, or the status quo. The more satisfied a nation is with the existing state of

affairs, the less likely it is to engage in conflict. Another strategy is to decrease the payoffs
associated with mutual conflict in a “Prisoner’s Dilemma” situation. Even if a nation prefers

conflict to capitulation, the higher the cost of this conflict is, the greater the chance for peace.

Ultimately, the model gives mathematical support for policies that can help maintain peace in the

nuclear era. Kilgour and Zagare advise global powers to “never behave aggressively, but threaten

with high credibility a harsh retaliatory strike” (328).

There is an interesting paradox regarding the concept of commitment within game theory.

Just as in some situations it is rational for an agent to behave irrationally, in similar ones it is

strategically advantageous to be weak. The word “weak” refers not to a player’s physical or

mental strength, but rather the collapse of alternate possibilities which stems from the inability of

the player to make a choice. Schelling notes that “when a person has lost the ability to help

himself, or the power to avert mutual damage, the other interested party has no choice but to

assume the cost or responsibility” (37). A driver whose high speed prevents him from avoiding a

collision forces the other driver to assume sole responsibility for avoiding an accident. When two people are
dropped in some area at different locations and must find each other to escape, they will most

likely agree to meet up at a halfway point so that each one would have to walk some distance.

Yet, if one person was unable to communicate with the other, he would have an advantage. If he

knows that his partner knows that he is unable to communicate, then all he has to do is sit in the

same location, forcing his partner to come to him. By making the decision of denying oneself a

choice, agents play optimally. Relinquishing the initiative and imposing it on the opponent is

often the most effective strategy for a disadvantaged agent, which is why nonviolent acts like the

restaurant sit-ins during the Civil Rights Movement were so powerful.

In game theory, a strategy that is always better than a different one is known as a strictly

dominant strategy. A strategy s is strictly dominated if there is another strategy s’ where no

matter what action the other player chooses, the payoff from playing s’ will always be greater

than the payoff from playing s. A strategy that is always at least as good as a different one is

known as a weakly dominant strategy. A strategy s is weakly dominated if there is another

strategy s’ where no matter what action the other player chooses, the payoff from playing s’ will

always be at least as much as the payoff from playing s. The concept of dominant strategies is

especially useful in a game of complete information where each player knows the structure of the

game as well as the other players’ possible actions and payoffs. A rational player would never

play a dominated strategy, one that always has a better alternative, and therefore would

completely disregard any such strategy. Therefore, if player A knows that player B has a

dominated or a dominant strategy, he can easily use that to his advantage to maximize his own

payoff.

Iterated removal of strictly dominated strategies refers to the process of removing all

strategies that will never be played to decrease the number of outcomes in the game. Game
theorists use this concept to their advantage to break large games into smaller ones and locate

equilibrium points. Here is an example where each player has 3 actions. Player 1 can choose

between A, B, and C, while Player 2 has actions D, E, and F.

1/2 D E F

A (2,1) (4,0) (1,4)

B (3,4) (3,2) (2,3)

C (1,0) (2,5) (0,2)

At first glance, it seems like multiple outcomes are possible. However, if we look closely at the

payoffs, we can see that C is a strictly dominated strategy. Whether Player 2 plays D, E, or F

doesn’t matter, because Player 1 will always do better by switching from C to a different

strategy. This means that Player 1 will never choose C and we can eliminate C from the payoff

matrix.

1/2 D E F

A (2,1) (4,0) (1,4)

B (3,4) (3,2) (2,3)

Now we see that E is a strictly dominated strategy. We can repeat the process.

1/2 D F

A (2,1) (1,4)
B (3,4) (2,3)

A is a strictly dominated strategy.

1/2 D F

B (3,4) (2,3)

Now there is only one action left. Player 2 will choose a payoff of 4 from D over a payoff of 3

from F. By iterative removal, we have reduced a complicated game of 9 possible outcomes to a

simple subgame where only one outcome is possible: (B,D).

1/2 D

B (3,4)
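This elimination procedure is mechanical enough to automate. The sketch below (my own code, not from a cited source) runs iterated removal of strictly dominated strategies on the exact matrix from this example.

```python
# Iterated removal of strictly dominated strategies, run on the matrix above.

payoffs = {  # (row, col) -> (Player 1 payoff, Player 2 payoff)
    ('A', 'D'): (2, 1), ('A', 'E'): (4, 0), ('A', 'F'): (1, 4),
    ('B', 'D'): (3, 4), ('B', 'E'): (3, 2), ('B', 'F'): (2, 3),
    ('C', 'D'): (1, 0), ('C', 'E'): (2, 5), ('C', 'F'): (0, 2),
}
rows, cols = ['A', 'B', 'C'], ['D', 'E', 'F']

def dominated(s, own, others, player):
    """True if some alternative beats s against every remaining opposing choice."""
    def pay(mine, theirs):
        return payoffs[(mine, theirs)][0] if player == 0 else payoffs[(theirs, mine)][1]
    return any(all(pay(alt, o) > pay(s, o) for o in others)
               for alt in own if alt != s)

changed = True
while changed:
    changed = False
    for s in list(rows):
        if dominated(s, rows, cols, player=0):
            rows.remove(s); changed = True
    for s in list(cols):
        if dominated(s, cols, rows, player=1):
            cols.remove(s); changed = True

print(rows, cols)  # ['B'] ['D']: the game reduces to the single outcome (B, D)
```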

In real situations, knowing the behavior of our opponents and which strategies are

dominated (which choices they are least likely to make) can narrow down our viewpoint from a

variety of outcomes to only a few possible scenarios. The Battle of the Bismarck Sea was a

World War 2 battle between the Allies (the U.S. and Australia) and Japan. The Japanese were

sending a convoy on a three-day journey from Rabaul, New Britain, to Lae, New Guinea. The

Japanese could choose to send the convoy along the north of New Britain or along the south of

New Britain. Likewise, the Allies could choose to fly air reconnaissance units to the north or to
the south. At the time, there was heavy rain in the north which would decrease visibility for

reconnaissance units. Thus, there were four different possible outcomes:

● If both sides chose north, low visibility would prevent the Allies from spotting the

convoy on the 1st day, leaving only 2 days of bombing.

● If both sides chose south, the Allies would spot the convoy immediately and get all 3

days for bombing.

● If the Allies chose north and the Japanese chose south, lack of reconnaissance would

prevent the Allies from spotting the convoy on the 1st day, leaving only 2 days of

bombing.

● If the Allies chose south and the Japanese chose north, lack of reconnaissance and low

visibility from the rain would prevent the Allies from spotting the convoy on the first 2

days, leaving only one day of bombing.

Allies/Japanese North South

North (2,-2) (2,-2)

South (1,-1) (3,-3)

The northern route is a weakly dominant strategy for the Japanese: from a purely tactical standpoint, it could never expose them to more bombing than the southern route, and if the Allies searched south it would spare them two days of it. However, the Americans knew this and accordingly placed reconnaissance along the

northern route, guaranteeing 2 days of bombing. (North, North) is a Nash Equilibrium and is also

how the battle turned out historically. By identifying and eliminating the Japanese side’s dominated strategy, the Allies could predict the convoy’s route and win the battle decisively.
Mixed Strategies and Probability

In certain games, it is not advisable to select one action and play it consistently; this type

of behavior is called a pure strategy. Penalty kicks in soccer illustrate a perfect example of this

concept. If the kicker kept kicking left, the goalie would figure this out and move to the kicker’s

left. The kicker, seeing the goalie’s behavior, now changes his pure strategy to kick right, but the

goalie recognizes this and starts moving to the kicker’s right, and so the cycle continues on and

on.

A far better strategy for such situations is a mixed strategy, where an agent alternates

between two or more actions with certain probabilities. In a game of heads or tails, a player

chooses each action with a ½ probability. In a game of rock paper scissors, it would be a ⅓

probability. These probabilities are optimal; if, in the latter example, a player played rock with a

greater probability than the other two options, the opponent could easily exploit this by playing

paper more often. “The essence of randomization in a two-person zero-sum game is to preclude

the adversary’s gaining intelligence about one’s own mode of play -- to prevent his deductive

anticipation about how one may make up one’s own mind, and to protect oneself from tell-tale

regularities of behavior that an adversary might discern or from inadvertent bias in one’s choice

that an adversary might anticipate” (Schelling 175). By randomizing one’s strategy, a player

protects himself by minimizing the maximum possible loss he could receive.

In one study, Ignacio Palacios-Huerta recorded the statistics of more than 1000 different

professional soccer games to determine how well players make use of the minimax strategy. The

strategy rests on the idea that a player seeks to make the opponent indifferent by making his

expected value from both actions the same. Equalizing these values minimizes the opponent’s

maximum possible gain, hence the name “minimax” strategy. Where L denotes a kicker’s nonnatural side and R denotes his natural side, the chances of scoring can be seen in the payoff

matrix below:

Kicker/Goalie GL GR

KL 0.5830 0.9497

KR 0.9291 0.6992

Using these numbers, we can mathematically derive optimal probabilities in accordance

with the indifference theorem.

For the goalie to make the kicker indifferent:

0.583GL + 0.9497(1 - GL) = 0.9291GL + 0.6992(1 - GL)

GL = 0.4199 ; GR = 0.5801

For the kicker to make the goalie indifferent:

(1 - 0.583)KL + (1 - 0.9291)(1 - KL) = (1 - 0.9497)KL + (1 - 0.6992)(1 - KL)

KL = 0.3854 ; KR = 0.6146
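Each indifference condition is linear in a single unknown, so it can be solved in closed form. The short sketch below (my own check, not part of the study) reproduces the equilibrium probabilities derived above.

```python
# Solving the two indifference equations above in closed form.
# score[(kick, dive)] is the scoring probability from the payoff matrix.

score = {('L', 'L'): 0.5830, ('L', 'R'): 0.9497,
         ('R', 'L'): 0.9291, ('R', 'R'): 0.6992}

denom = (score[('L', 'L')] - score[('L', 'R')]
         - score[('R', 'L')] + score[('R', 'R')])

# Goalie mixes so the kicker scores equally often kicking L or R:
gl = (score[('R', 'R')] - score[('L', 'R')]) / denom
# Kicker mixes so the goalie concedes equally often diving L or R:
kl = (score[('R', 'R')] - score[('R', 'L')]) / denom

print(round(gl, 4), round(kl, 4))  # 0.4199 0.3854, matching the derivation
```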

To play optimally, the kicker should kick to his nonnatural side about 39% of the time and to his

natural side about 61%, while the goalie should move to the kicker’s nonnatural side 42% of the

time and his natural side 58%. The real values that the researchers found, rounded to the nearest

whole number, were 40%, 60%, 42%, and 58% respectively. It is almost as if the players

intuitively knew the probabilities required to play rationally. This finding is just another example

of the powerful real world applications of game theory and how it can be used as an accurate

model for behavior.

To better understand the concept of minimax and the indifference theorem, one can represent it graphically by plotting the kicker’s utility as a function of GL, the probability that the goalie moves to the kicker’s nonnatural side. One line shows the kicker’s utility when shooting towards his natural side, and a second line his utility when shooting towards his nonnatural side. Let’s call the

kicker’s utility at optimal GL (GL = 0.4199) UK’. When the goalie moves to the kicker’s

nonnatural side with a probability less than optimal, the kicker can do better than UK’ by

shooting to his nonnatural side. Conversely, when the goalie moves to the kicker’s nonnatural

side with a probability greater than optimal, the kicker can do better than UK’ by shooting to his

natural side. Only at the optimal probability, where the two lines meet, does the goalie guarantee

that the kicker can do no better than UK’ no matter what direction he shoots. UK’ is the point at

which the kicker’s maximum possible gain is at a minimum. A corresponding plot of the goalie’s utility as a function of KL would show the exact same thing, with the goalie’s two lines of utility intersecting at KL = 0.3854.

The benefits of randomization do not only apply to zero-sum games, but also to strategic

moves like threats and promises in a more complex mixed-motive game. Schelling notes that

randomization is a way to make indivisible objects divisible and to scale down both threats and
promises. If the only threat available is something massive, like dropping a nuclear bomb, a

player does not want to threaten with certainty because there is a higher chance of failure

associated with bigger threats. Moreover, between nations, extreme threats force the enemy to choose between two extremes, which is not what a country wants to do if an enemy has only

committed a small act of aggression. One way a player can scale down a massive threat is

through randomization. Instead of threatening to drop a nuclear bomb with 100% certainty, a

country can threaten to drop it 25% of the time, 50%, or any other optimal probability the

situation calls for. Using a “fractional” threat, a threat carried out with some probability less than

1, is beneficial when there is some chance that the threat will fail.

We can illustrate this concept using an example from Schelling:

1/2 C D

A (1,0) (0,1)

B (0,0) (-X,-Y)

Because Player 1 wants the outcome (A, C), he will threaten to play B with some probability p

such that Player 2 is induced to select C instead. But what if there is some probability P that the

threat fails? Because in this situation, like many others in real life, the cost of a threat failing is

too high, Player 1 will want to pursue a fractional threat. An effective threat will satisfy 2

requirements:

1. Player 2 actually has an incentive to give in to the threat; Player 2’s utility from ignoring

the threat must be lower than Player 2’s utility from capitulating. Thus, there is a lower

bound to p such that:


1(1 - p) - Y(p) < 0 or

p > 1/(1+Y)

2. Player 1’s expected value from a threat with success rate (1 - P) must be greater than his

expected value from not making the threat at all. Thus, there is an upper bound to p such

that:

(1 - P)(1) + (P)(0(1 - p) - X(p)) > 0 or

p < (1 - P)/(PX)

We now have calculated the optimal range of probability that Player 1 can threaten with:

1/(1+Y) < p < (1 - P)/(PX)
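To see the bounds in action, here is a quick numeric sketch with hypothetical values: X = 2, Y = 3, and a 50% chance that the threat fails.

```python
# The fractional-threat bounds above, evaluated for illustrative values.

X, Y, P = 2.0, 3.0, 0.5      # punishment costs and probability the threat fails

lower = 1.0 / (1.0 + Y)      # Player 2 must prefer capitulating: p > 1/(1+Y)
upper = (1.0 - P) / (P * X)  # Player 1 must prefer threatening: p < (1-P)/(PX)

print(lower, upper)  # 0.25 0.5: any p between 0.25 and 0.5 works, but p = 1 does not
```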

Note that in this case, a threat with 100% certainty is not optimal because Player 1 would be

better off not using a threat at all. Moreover, Player 1 has incentive to threaten with as little

probability as possible, as the lower p is the higher his expected gain. Therefore, certain

situations, especially those with a high chance of failure, call for fractional threats as an optimal

solution. Fractional threats might have been a powerful strategy had either the United States or the

Soviet Union used them during the Cold War. Because of the United States’ adoption of a policy

of massive retaliation, it was unable to deter Soviet aggression in Eastern Europe. One striking

example was the Soviet invasion of Hungary. The U.S. threatened massive retaliation, but this

approach failed as the threat was much too big for the situation at hand. The Soviets did not find

the threat credible enough for something in Eastern Europe, far from American strategic

interests. Had the U.S. followed Schelling’s advice and committed to a fractional threat, it might

have been much more successful in deterring a Soviet invasion.

Equilibrium, Convergence, and Repeated Games


Equilibrium is a term used in many different fields like physics, chemistry, and

economics. Within game theory, there is the famous Nash equilibrium proposed by John Nash.

A Nash equilibrium is simply an outcome of the game in which, given the strategies chosen by all the other players, no player has an incentive to deviate from his own strategy. The

equilibrium is stable because the payoff structures of the game incentivize it. Furthermore, we

can see that in many different games, players alter their strategies until the equilibrium outcome

is achieved. This “convergence” to equilibrium can be seen in many different contexts, from

competitive markets (Smith) to cooperation experiments (Selten).

One famous example is Cournot competition. Cournot was a French economist who

studied competition between rival firms in the 19th century. In Cournot’s example, two firms

with the exact same product are unable to set their own prices. Prices are set by the market and

are a function of the total quantity of the good produced. Therefore, the two firms compete by

choosing different quantities of production in response to the other’s quantity. Price is given by

the following equation, where a and b are two positive parameters:

P = a - b(q1 + q2)

The firms’ profits can be calculated by subtracting the cost, c, from revenue. It is assumed

marginal cost is constant, and it costs both firms the same amount to produce the good.

W1 = q1(a - b(q1 + q2)) - q1c

W2 = q2(a - b(q1 + q2)) - q2c

We can differentiate each profit function with respect to that firm’s quantity and set the derivative to 0 to find the profit-maximizing quantity for each firm.

W1’ = a - 2bq1 - bq2 - c; q1 = ((a-c)/2b) - (q2/2)

W2’ = a - bq1 - 2bq2 - c; q2 = ((a-c)/2b) - (q1/2)


Each firm’s quantity is simply a function of the other firm’s quantity, so we can use these

equations to graph the model.
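The best-response functions also show convergence directly. Below is a sketch of best-response dynamics under illustrative parameters (a = 100, b = 1, c = 10, my choices rather than Cournot's), in which each firm repeatedly best-responds to the other's latest quantity.

```python
# Best-response dynamics in the Cournot model: each firm plays
# q_i = (a - c)/(2b) - q_j/2 against the other's most recent quantity.
# Parameter values are illustrative.

a, b, c = 100.0, 1.0, 10.0
q1 = q2 = 0.0
for _ in range(50):
    q1 = (a - c) / (2 * b) - q2 / 2
    q2 = (a - c) / (2 * b) - q1 / 2

print(q1, q2, (a - c) / (3 * b))  # both quantities converge to the Cournot level, 30
```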

The Cournot model has several downsides. Strict requirements, like a homogeneous good

and the same marginal cost between firms, limit the model’s practical applications. Cournot

assumed that both firms were acting simultaneously to decide quantities, yet competition

between firms rarely involves such clear cut choices. Nonetheless, Cournot’s proposal has

important implications for the field of game theory. If both firms choose actions that maximize

their profits — their “best responses” — eventually the two firms will converge at a single state

of production. This phenomenon of convergence happens not only with quantity, but also with

price. The French mathematician Joseph Bertrand later proposed that if two rival firms produced the exact

same good with infinite price-elasticity, each firm would undercut the other until prices

converged to marginal cost.


German economist Heinrich von Stackelberg highlighted the power of commitment by

changing Cournot’s model from a simultaneous game, where both players choose quantity at the

same time, to a sequential game, where Player 1, the leader, chooses a quantity and Player 2, the

follower, responds. In such a situation, the leader knows that for any quantity he chooses, the

follower will respond accordingly using the best response function:

W2’ = a - bq1 - 2bq2 - c; q2 = ((a-c)/2b) - (q1/2)

The leader simply substitutes this q2 into his own profit maximizing function:

W1 = q1(a - b(q1 + q2)) - q1c

W1 = aq1 - bq1² - bq1q2 - q1c

W1 = aq1 - bq1² - bq1[((a-c)/2b) - (q1/2)] - q1c

W1’ = ((a - c )/2) - bq1

q1 = (a-c)/2b ; q2 = (a-c)/4b
Note that Cournot’s equilibrium has both firms producing at q = (a-c)/3b. Thus, when the

game changes from simultaneous to sequential, the leader produces more and the follower

produces less. Because the leader is able to commit first, he reaps higher profits by producing

more of the good before the follower has a chance to produce. The follower’s best option is to

produce less of the good, resulting in a first-mover advantage and an uneven share of the market.
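Continuing the illustrative parameters from the Cournot sketch above (a = 100, b = 1, c = 10), the first-mover advantage is easy to quantify:

```python
# Stackelberg versus Cournot quantities under the same illustrative parameters.

a, b, c = 100.0, 1.0, 10.0

leader = (a - c) / (2 * b)    # 45: the committed first mover
follower = (a - c) / (4 * b)  # 22.5: the best response to the leader
cournot = (a - c) / (3 * b)   # 30 each in the simultaneous game

print(leader, follower, cournot)
```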

Of course, in line with Schelling’s doctrine of strategic moves, the follower can counter this

commitment advantage by threatening to produce an excessive amount such that the good will

sell at dirt-cheap prices, hurting both the leader and the follower. As previously mentioned, this

strategy is what oil hegemon Saudi Arabia used to deter other countries from increasing their oil

production.
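Plugging the derived quantities back into the profit functions makes the first-mover advantage concrete. A quick check in Python, using the same illustrative parameter values as the sketch above:

    a, b, c = 100.0, 1.0, 10.0

    def profit(q_own, q_other):
        # W = q(a - b(q1 + q2)) - qc, the profit function from above
        return q_own * (a - b * (q_own + q_other)) - q_own * c

    q_cournot = (a - c) / (3 * b)   # 30.0: each firm in the simultaneous game
    q_leader = (a - c) / (2 * b)    # 45.0: Stackelberg leader
    q_follower = (a - c) / (4 * b)  # 22.5: Stackelberg follower

    print(profit(q_cournot, q_cournot))   # 900.0 per Cournot firm
    print(profit(q_leader, q_follower))   # 1012.5 for the leader
    print(profit(q_follower, q_leader))   # 506.25 for the follower

The leader’s commitment raises his profit above the Cournot level while pushing the follower’s below it, which is exactly the uneven split described above.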

Convergence to equilibrium can be demonstrated more practically using repeated games.

Experimental research regarding repeated games has provided valuable insights into how

humans learn and change their choices over time. In fact, numerous studies show that over time,

agents behave more and more optimally, with their behavior converging to the equilibrium point.
One powerful tool that applies to repeated games is the Folk

Theorem, which states that “the set of Nash equilibrium outcomes of

the repeated game G∗ is precisely the set of feasible and

individually rational outcomes of the one-shot game G” (Hart 4).

Basically, an equilibrium in a repeated game will be any outcome that (1) can be obtained from some combination of actions in the one-shot game G and (2) gives each player a payoff greater than or equal to his minmax payoff. The minmax payoff is the best payoff a player can guarantee himself when the other players are trying to hold his payoff down; regardless of the other players’ decisions, a player can always secure at least this amount. If an outcome is not individually rational for a player, he is receiving a payoff less than his minmax payoff and thus has an incentive to deviate. This concept sets the stage for

equilibrium in repeated games. If there is an outcome a that is both feasible

and individually rational, then all players agree to play a combination of actions that will result

in a. If any player deviates from this plan, another player can threaten to switch to a minimax

strategy, reducing the first player’s payoffs from a to r, r being the highest payoff the first player

can get no matter what the other players do. In this way, any feasible and individually rational outcome a will be an

equilibrium outcome in the repeated game due to the threat of

punishment. The Folk Theorem is important because it “relates

cooperative behavior in the game G to non-cooperative behavior in

its supergame G∗” (Aumann). It provides a rational explanation to


cooperative and altruistic actions, which are normally considered

“irrational”; through cooperation, players are actually increasing

their individual payoffs in repeated games. Ultimately, the Folk

Theorem shows us that it is rational to sacrifice short term payoffs

for long term gains. However, real humans do not value future

payoffs as much as they value payoffs in the present. This disparity

is shown by the discount factor, a coefficient between 0 and 1. If the

discount factor is 0.9, the payoff in a certain period will be worth 90%

of the same payoff received in the previous period. Mathematicians

have proven that there exists some threshold, such that when the

discount factor falls below this threshold, the Folk Theorem’s

repeated game equilibrium will break down since players do not care

enough about future payoffs relative to present payoffs.
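The threshold can be made concrete with a simple calculation. The sketch below uses a repeated prisoner’s dilemma with assumed payoffs (R for mutual cooperation, T for the one-shot temptation to defect, P for the mutual-defection punishment phase) and a grim-trigger punishment; all of the numbers are illustrative.

    # Critical discount factor for sustaining cooperation under grim trigger.
    # Payoffs are illustrative assumptions: R = reward, T = temptation, P = punishment.
    R, T, P = 3.0, 5.0, 1.0

    def cooperation_sustainable(delta):
        # Cooperating forever (R every period) must beat defecting once (T)
        # and then being punished forever (P every period afterward).
        return R / (1 - delta) >= T + delta * P / (1 - delta)

    threshold = (T - R) / (T - P)       # the same condition, solved for delta
    print(threshold)                    # 0.5 with these payoffs
    print(cooperation_sustainable(0.6)) # True: patient players sustain cooperation
    print(cooperation_sustainable(0.4)) # False: the equilibrium breaks down

Below the threshold, the one-time gain from defecting outweighs the discounted stream of future cooperation, and the repeated-game equilibrium unravels just as described above.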

How does game theory explain learning -- the way rational agents change their behavior

over time as a result of experience? Economics frequently relates to the other social sciences, and

accordingly the two major models of learning in game theory are each connected to a different

discipline. Fictitious play is the first major model, and it is connected to statistics. Fictitious play

states that players learn by tracking the observed frequencies with which their opponents play strategies and best responding to those beliefs. For example, suppose I am playing the third round of Rock, Paper,

Scissors with a friend. In the previous two rounds, I observed my friend play Rock, so I now

believe that he is playing Rock with 100% probability. I accordingly choose Paper, while my

friend chooses Scissors. Now, my observations show that my friend has played Rock twice and

Scissors once, so based on this distribution, my expected utility from any action is 0.66R +
0.33S, where R represents some payoff I receive when my friend plays Rock, and S is the payoff

I receive when my friend plays Scissors. This utility formula is maximized by Paper, so I choose

Paper, while my friend chooses Scissors. Since my friend has played Rock twice and Scissors

twice, I update my own utility formula to 0.5R + 0.5S and choose the action which maximizes

this -- in this case, I am indifferent between Paper and Rock. Simultaneously, my friend is

tracking my actions and updating his own utility formula. This process continues until the end of

the game, or in the case of an infinite game, forever. When players use fictitious play, they are

constantly updating their beliefs about the opponent based on past observations. Players believe

their opponent’s distribution of past plays represents their strategy. According to fictitious play,

if a goalie observes that the kicker has kicked left 3 times and right 2 times in the past, the goalie

believes the kicker is playing with the strategy of 60% Left and 40% Right.
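The updating process is mechanical enough to simulate. Here is a minimal Python sketch of fictitious play in Rock-Paper-Scissors; it assumes the standard zero-sum payoffs win = 1, tie = 0, lose = -1 (the example above instead pays 0 for ties and losses) and a one-observation prior for each action, both assumptions of this sketch.

    # Fictitious play in Rock-Paper-Scissors: each player best responds to the
    # empirical frequencies of the opponent's past actions.
    ACTIONS = ["R", "P", "S"]
    BEATS = {"R": "S", "P": "R", "S": "P"}  # key beats value

    def payoff(mine, theirs):
        if mine == theirs:
            return 0
        return 1 if BEATS[mine] == theirs else -1

    def best_response(opponent_counts):
        total = sum(opponent_counts.values())
        def expected(action):
            return sum(payoff(action, o) * n / total
                       for o, n in opponent_counts.items())
        return max(ACTIONS, key=expected)

    counts1 = {a: 1 for a in ACTIONS}  # history of player 1's actions, with a prior
    counts2 = {a: 1 for a in ACTIONS}  # history of player 2's actions, with a prior
    for _ in range(999):
        a1, a2 = best_response(counts2), best_response(counts1)
        counts1[a1] += 1
        counts2[a2] += 1

    print(counts1, counts2)  # the empirical frequencies each player has recorded

With these zero-sum payoffs the long-run frequencies approach the ⅓-⅓-⅓ equilibrium mix, yet the actual sequence of play keeps cycling through the actions, hinting at the problem discussed next.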

While fictitious play may seem reasonable, it has serious problems. Past observations

may not provide adequate information with which to judge an opponent’s strategy. Furthermore,

because players are hopping between different combinations of outcomes and not actually

judging the success of their actions, fictitious play often leads to cycles of outcomes that never

converge to equilibrium. In fact, American mathematician Lloyd Shapley proved that if both players use fictitious play in a version of Rock Paper Scissors with modified payoffs, play cycles forever and never converges to the equilibrium outcome of each action being played with ⅓ probability.

The second major model of game theoretic learning is called reinforcement learning,

which is based on the behavioral psychology pioneered by B. F. Skinner and John B. Watson.

Edward Thorndike’s law of effect states that actions that are rewarded will be more likely to be repeated,

while actions that are punished will be less likely to be repeated. In a similar vein, game theory’s

reinforcement learning proposes that players will be more likely to choose actions that have been
successful in the past and less likely to choose actions that have been unsuccessful in the past.

Economists Alvin Roth and Ido Erev modeled this concept with an updating rule that, in its basic form, can be written as follows: if player n plays strategy j in period t and receives payoff x, then

qnj(t+1) = qnj(t) + R(x)

where qnj(t) is the propensity of player n to play strategy j during period t, and R(x) is the reinforcement function x - xmin -- the difference between the player’s payoff and his smallest

possible payoff. In plain terms, this equation states that if a player played some strategy in the

previous period, his propensity to play it again will increase based on its success. Note that in

Roth and Erev’s original model, R(x) is never negative, so the player is not punished -- that is, he

will never lose the propensity to play a certain strategy. Instead, the player starts out with equal

propensities for all available strategies, and each propensity will either increase or stay the same

based on experience. However, different models of reinforcement have negative reinforcement

functions that punish players.
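A minimal sketch of this propensity-based process in Python, using a hypothetical two-action game (the action names, payoffs, and starting propensities are assumptions for illustration, not from Roth and Erev’s experiments):

    import random

    def choose(propensities):
        # Play each strategy with probability proportional to its propensity.
        strategies = list(propensities)
        weights = [propensities[s] for s in strategies]
        return random.choices(strategies, weights=weights)[0]

    def reinforce(propensities, played, payoff, min_payoff=0.0):
        # Basic Roth-Erev update: R(x) = x - x_min, which is never negative.
        propensities[played] += payoff - min_payoff

    propensities = {"safe": 1.0, "risky": 1.0}  # equal starting propensities
    for _ in range(500):
        s = choose(propensities)
        x = 1.0 if s == "safe" else random.choice([3.0, 0.0])  # risky pays 3 or 0
        reinforce(propensities, s, x)

    print(propensities)  # propensities only grow; better-paying actions grow faster

Because the risky action pays more on average here, its propensity, and hence its probability of being chosen, grows faster over time, mirroring the law of effect.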

Both fictitious play and reinforcement learning have issues, yet they both model real life

decision making remarkably well. When Roth and Erev ran simulations based on both learning

models for a variety of experimental games, the simulations were much more accurate in

predicting outcomes than standard mixed strategy equilibrium predictions. Quite simply, while

the concept of equilibrium describes what players should do, learning models predict what

players will do. Because most real world situations will be of incomplete information, where

players do not know the payoff structure of the game, it will be difficult for them to deduce the

equilibrium strategy. “The justification of a Nash Equilibrium requires the existence of a

commonly known prior distribution of the uncertain parameters in the game” (Kalai 4). If players
do not know each other’s payoffs, reinforcement learning models are especially applicable in

place of equilibrium predictions.

In a famous 1991 study of equilibrium convergence, Roth and his colleagues had research

subjects play consecutive rounds of 2 different games: the ultimatum game and the market game.

The ultimatum game, discussed above, is also known as the bargaining game and involves 2

players sharing a sum of money. The market game is described as follows:

“Multiple buyers (nine in most sessions) each submit an offer to a single seller to buy an

indivisible object worth the same amount to each buyer (and nothing to the seller). The seller has

the opportunity to accept or reject the highest price offered. If the seller accepts, then the seller

earns the highest price offered, the buyer who made the highest offer (or, in case of ties, a buyer

selected by lottery from among those who made the highest offer) receives the difference

between the object’s value and the price he offered, and all other buyers receive zero. If the seller

rejects, then all players receive zero.” (Roth et al. 1991)

Note that in both environments, the equilibrium point involves an unequal distribution of

money, with one player earning the majority of the sum. The bargaining game involves the

concept of subgame perfect equilibrium. A subgame is simply a smaller part of a bigger game,

and the requirement of subgame perfect equilibrium means that any equilibrium for the entire

game would also have to be an equilibrium for its subgames. In Roth’s experiment, the sum of

money is 10 dollars, and the smallest unit that can be offered is 5 cents. We can see that in any

subgame with some money offered, Player 2 would behave optimally by accepting. In other

words, Player 2 would never go through with a “threat” to reject a low offer because such an

action would not maximize his utility. Knowing this, it is in Player 1’s best interest to offer

Player 2 the lowest possible amount: in this case, 5 cents. It is safer for Player 1 to offer Player 2 at least some money rather than nothing at all; Player 2 is indifferent between accepting and rejecting an offer of 0, and Player 1 would rather take a definite $9.95 than a chance at $10. Thus, there is a subgame perfect equilibrium when

Player 1 chooses the lowest possible offer and Player 2 chooses to accept. There is also an

equilibrium when Player 1 chooses to offer nothing and Player 2 still accepts. Roth notes that

“these two equilibria become one as the smallest unit of transaction goes to 0.”
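This backward-induction argument can be written out directly. The sketch below assumes, as in the reasoning above, that Player 2 accepts any strictly positive offer and rejects an offer of nothing:

    # Backward induction in the $10 ultimatum game with 5-cent increments.
    PIE = 1000   # ten dollars, in cents
    UNIT = 5     # smallest unit that can be offered

    def responder_accepts(offer):
        # Rejecting pays 0, so any positive offer is weakly better to accept;
        # at an offer of 0 we assume the responder rejects.
        return offer > 0

    best_offer, best_payoff = 0, -1
    for offer in range(0, PIE + 1, UNIT):
        proposer_payoff = PIE - offer if responder_accepts(offer) else 0
        if proposer_payoff > best_payoff:
            best_offer, best_payoff = offer, proposer_payoff

    print(best_offer, best_payoff)  # 5 and 995: offer a nickel, keep $9.95

As Roth notes, shrinking the smallest unit toward 0 pushes the optimal offer toward 0 as well, collapsing the two equilibria into one.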

Similarly, in the market environment, the requirement of subgame perfect equilibrium

means that a seller would never reject a positive offer. Because there are so many buyers, there are

many different equilibria, but all equilibria involve a selling price of either $9.95 or $10.00, as

any other price would mean that a buyer could do better by bidding more. If both games are

played by rational agents, the majority of the money will always go to one player.

When played in real life by real people across the world (U.S., Japan, Slovenia, and

Israel), the results of these games were startling. In the market game, transactions quickly

converged to equilibrium; a price of either $9.95 or $10.00 was always achieved between round

3 and round 7. The great differences in distribution between countries in early rounds became

smaller and smaller as all 4 countries reached an equilibrium price by later rounds. By contrast,
in the bargaining game, offers usually ranged from 30% to 60% of the 10 dollars, far from the equilibrium offer of 5 cents. Instead of converging to the equilibrium, offers converged to 40% or 50% of the pie, and differences between countries increased from round to round.

Roth’s experiment is just one of the hundreds conducted in hopes of finding new insights

into how humans make decisions. Experiments like these pose important questions for

understanding human nature. Why were the outcomes of the bargaining and market

environments so different? More specifically, when do people prioritize fairness over self-interest, and when the reverse? Can fairness be considered a rational motive, or does

rationality only constitute self-interest? Economists Ernst Fehr and Klaus Schmidt attempted to

answer these questions in their model of fairness. According to the model, certain individuals are

“inequity averse”, meaning that they are dissatisfied with outcomes perceived as inequitable.

Inequity aversion stems from the psychological tendency to make relative comparisons. Humans are

inclined to compare themselves to others; the saying, “the grass is always greener on the other

side”, describes this inclination that is a fundamental part of our thought process. Even if a man

has a substantial amount of money, he will become unhappy upon seeing someone richer. Thus,

Fehr and Schmidt theorize that individuals lose utility in the case of an inequitable outcome.

Inequity averse individuals are not only dissatisfied when they are worse off than someone, but

also when they are better off. However, the case of being worse off has a greater impact on

utility than the case of being better off. This asymmetry draws heavily on the loss aversion that constitutes Kahneman and Tversky’s prospect theory.
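For two players, Fehr and Schmidt’s utility function can be written as a simple formula, sketched below in Python, with alpha weighting the pain of being worse off and beta the smaller pain of being better off; the numerical parameter values in the example are illustrative assumptions.

    def fehr_schmidt_utility(x_i, x_j, alpha, beta):
        # U_i = x_i - alpha * max(x_j - x_i, 0) - beta * max(x_i - x_j, 0),
        # where the model assumes beta <= alpha and 0 <= beta < 1.
        envy = max(x_j - x_i, 0.0)   # disadvantageous inequity
        guilt = max(x_i - x_j, 0.0)  # advantageous inequity
        return x_i - alpha * envy - beta * guilt

    # Receiving $2 of a $10 pie, with alpha = 1 and beta = 0.5 (illustrative):
    print(fehr_schmidt_utility(2.0, 8.0, alpha=1.0, beta=0.5))  # 2 - 6 = -4.0

With these values, a sufficiently lopsided split leaves the disadvantaged player with negative utility, which is why an inequity averse responder may rationally reject a low ultimatum offer.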

Fehr and Schmidt explain that the difference between the market and bargaining environments is

caused by competition. They propose that inequity aversion is made irrelevant by competition.

Because any inequity averse individual in the market environment is unable to enforce equity,
the consideration of fairness does not matter. For example, even if one buyer wanted a fair

outcome of the buyer and seller getting equal amounts, any other buyer willing to offer a higher

amount immediately makes this desire for equity irrelevant. However, when equity is actually

enforceable, inequity averse agents can have a substantial impact on outcomes. For example,

workers’ unions frequently force wages above their competitive level. In

game theoretic experiments, free riding -- a player’s enjoyment of a public good without any

contribution of his own -- came to an end after inequity averse players gained the ability to

punish those who exhibited greater selfishness.

Conclusion

In this paper, I explored several major themes that play a large role in game theory

analysis, such as rationality, reputation, and equilibrium convergence. I also examined differing

perspectives on topics like utility, decision weighting, and learning within game theory. Despite

the large amount of research conducted, some scholars do not see any usefulness in the

discipline. In a scathing critique, Lars Pålsson Syll, a professor at Malmö University, writes,

“Reductionist and atomistic models of social interaction – such as the ones mainstream

economics and game theory are founded on – will never deliver sustainable building blocks for a
realist and relevant social science. That is also the reason why game theory never will be

anything but a footnote in the history of social science” (18). To some extent, Syll is correct. Not

every situation can be explained with mathematical proofs. Not every interaction can be divided

into smaller components — players, payoffs, and actions. Game theory is much too rigid in its

requirements; its assumption of rationality often fails to carry over to the real world, as shown by

the field of experimental economics. Game theorists frequently limit their analyses to unrealistic

situations, such as those of complete information: “The reason that traditional game theory

focuses so much attention on the special case when players have complete information about

these things is that equilibrium predictions are easier to motivate and derive in the complete

information case, and often have little empirical content in the incomplete information case”

(Erev and Roth 29). Social scientists must find a way to reconcile theory — precise, predictable,

analytical — with real life, which is clouded with uncertainty, irrationality, and instability.

Nonetheless, I believe game theory has positively impacted the world of economics and society

as a whole. Game theoretic analyses have supported criminal system reform, monopoly

legislation, and deterrence policies, and have contributed to vast improvements in the medical

field. Moreover, the field has frequently intersected with other disciplines, such as computer

science. Game theory’s strength lies in its versatility, and while it may not provide all the

solutions, it is bound to see further advances in the future.

Works Cited

Acemoglu, Daron, and James A. Robinson. Why Nations Fail. Crown Publishing Group, 2012.

Alt, James E., et al. “Reputation and Hegemonic Stability: A Game-Theoretic Analysis.” The

American Political Science Review, vol. 82, no. 2, 1988, pp. 445–466.
Aumann, R. J. “Survey of Repeated Games”. Essays in Game Theory and Mathematical

Economics in Honor of Oskar Morgenstern, Vol. 4, 1981, pp. 11–42.

Calvert, Drew. “Is an Unpredictable Leader Good for National Security?” Kellogg Insight,

Northwestern University, 19 June 2017, insight.kellogg.northwestern.edu/article/is-an-

unpredictable-leader-good-for-national-security.

Coll, Steve. “The Madman Theory of North Korea.” The New Yorker, 2 Oct. 2017, www.newyorker.com/magazine/2017/10/02/the-madman-theory-of-north-korea.

Erev, Ido, and Alvin E. Roth. “Predicting How People Play Games: Reinforcement Learning in

Experimental Games with Unique, Mixed Strategy Equilibria.” The American Economic

Review, vol. 88, no. 4, 1998, pp. 848–881. JSTOR, www.jstor.org/stable/117009.

Accessed 7 May 2021.

Fehr, Ernst, and Klaus M. Schmidt. “A Theory of Fairness, Competition, and Cooperation.” The Quarterly Journal of Economics, vol. 114, no. 3, 1999, pp. 817–868.

Gallego, Lope. “Stackelberg Duopoly.” Policonomics, 2017, policonomics.com/stackelberg-

duopoly-model/.

Guala, Francisco. “Has Game Theory Been Refuted?” Journal of Philosophy, vol. 103, no. 5,

2006, pp. 239-263.

Hart, Sergiu. “Robert Aumann's Game and Economic Theory.” The Scandinavian Journal of

Economics, vol. 108, no. 2, 2006, pp. 185–211.


Haywood, O. G. “Military Decision and Game Theory.” Journal of the Operations Research

Society of America, vol. 2, no. 4, 1954, pp. 365–385. JSTOR,

www.jstor.org/stable/166693.

Kahneman, Daniel, and Amos Tversky. “Prospect Theory: An Analysis of Decision under Risk.” Econometrica, vol. 47, no. 2, Mar. 1979, pp. 263–291.

Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2013.

Kalai, Ehud, and Ehud Lehrer. “Rational Learning Leads to Nash Equilibrium.”

Econometrica, vol. 61, no. 5, 1993, pp. 1019–1045. JSTOR,

www.jstor.org/stable/2951492. Accessed 7 May 2021.

Kilgour, D. Marc, and Frank C. Zagare. “Credibility, Uncertainty, and Deterrence.” American

Journal of Political Science, vol. 35, no. 2, 1991, pp. 305–334.

Lafayette, Lev. “Cournot Competition.” Lev Lafayette, 24 Apr. 2019, levlafayette.com/node/623.

Milgrom, Paul, and John Roberts. “Predation, Reputation, and Entry Deterrence.” Journal of Economic Theory, vol. 27, no. 2, Aug. 1982, pp. 280–312.

Myerson, Roger B. “Force and Restraint in Strategic Deterrence: A Game-Theorist’s

Perspective.” University of Chicago, 2007.

Oosterbeek, Hessel, et al. “Cultural Differences in Ultimatum Game Experiments: Evidence

from a Meta-Analysis.” Experimental Economics, vol. 7, 2004, pp. 171-188.

Palacios-Huerta, Ignacio. “Professors Play Minimax.” Review of Economic Studies, vol. 70, no.

2, 2003, pp. 395-415.


Remi AI. “Artificial Intelligence, Poker and Regret. Part 3.” Medium, 25 Sept. 2019, medium.com/@RemiStudios/artificial-intelligence-poker-and-regret-part-3-bb3210c79211.

Ross, Don. “Game Theory.” The Stanford Encyclopedia of Philosophy (Winter 2019 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/win2019/entries/game-theory/.

Roth, Alvin E., et al. “Bargaining and Market Behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An Experimental Study.” The American Economic Review, vol. 81, no. 5, Dec. 1991, pp. 1068–1095.

Schelling, Thomas C. The Strategy of Conflict. Harvard University Press, 1997.

Selten, Reinhard and Rolf Stoecker. “End Behavior in Sequences of Finite Prisoner’s Dilemma

Games: A Learning Theory Approach.” Journal of Economic Behavior and

Organization, vol. 7, 1986, pp. 47-70.

Selten, Reinhard. “The Chain Store Paradox.” Theory and Decision, vol. 9, 1978, pp. 127–159.

Smith, Vernon L. “An Experimental Study of Competitive Market Behavior.” Journal of

Political Economy, vol. 70, no. 2, 1962, pp. 111-137.

Sowell, Thomas. Basic Economics: A Common Sense Guide to the Economy. Basic Books, 2015.

Syll, Lars Pålsson. “Why Game Theory Never Will Be Anything But a Footnote in the History of

Social Science.” Real-World Economics Review, no. 83, 2018, pp. 1-20.

Warren, Marian. “Chapter 2 Prospect Theory and Expected Utility Theory.” SlidePlayer, slideplayer.com/slide/8475017/.
Wheeler, Gregory. “Bounded Rationality.” The Stanford Encyclopedia of Philosophy (Fall 2020 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/fall2020/entries/bounded-rationality/.
