
iii) Loss

6.4 Game Theory

Game theory does not prescribe a way or say how to play a game. Game theory is a set of ideas and techniques for analyzing conflict situations between two or more parties. The outcomes are determined by their decisions.

General Game Theorem

In every two-player, zero-sum, non-random, perfect-knowledge game, there exists a perfect strategy guaranteed to at least result in a tie game.
The frequently used terms in game theory :
The term "game" means a sort of conflict in which n individuals or groups (known as players) participate.
A list of "rules" stipulates the conditions under which the game begins.
A game is said to have "perfect information" if all moves are known to each of the players involved.
A "strategy" is a list of the optimal choices for each player at every stage of a given game.
A "move" is the way in which the game progresses from one stage to another, beginning with an initial state of the game to the final state.
The total number of moves constitutes the entirety of the game.
The payoff, or outcome, refers to what happens at the end of a game.
Minimax : The least good of all good outcomes.
Maximin : The least bad of all bad outcomes.
The important and basic game theory theorem is the mini-max theorem. The theorem says,
"If the minimax of one player corresponds to a maximin of the other player, then that outcome is the best that both players can hope for."

6.6 Relevance of Game Theory and Game Playing

How relevant game theory is to mathematics, computer science and economics is shown in the figure below :

[Figure : three overlapping fields (Mathematics, Computer science and Economics), with mathematical game theory and AI game theory / game playing at their intersections.]

Fig. 6.6.1 Game theory and its relevance to other fields


TECHNICAL PUBLICATIONS - An up thrust for knowledge

Artificial Intelligence - Adversarial Search and Games in AI
6.7 Game Playing

Characteristics of Game Playing :
1. There is always an "unpredictable" opponent
The opponent introduces uncertainty.
The opponent also wants to win.
2. Time limit
Games are always played in a time-constrained environment. Therefore, one must handle time efficiently.

6.8 Types of Games


1. Based on chance

i) Deterministic (not involving chance)


For example -

Chess, Checkers, Tic-tac-toe


ii) Non-deterministic (can involve chance)
For example -

Backgammon, Monopoly.
2. Based on information

i) Perfect information
Here all moves of all players are known to everyone.
For example-
Chess, Checkers, Tic-tac-toe.
ii) Imperfect information -

Here all moves are not known to everyone.


For example -

Bridge, Poker, Scrabble.


3. General zero-sum games
Players must choose their strategies simultaneously, neither knowing what the
other player is going to do.

For example -
If you play a single game of chess with someone, one person will lose and one person will win. The win (+1) added to the loss (-1) equals zero.


4. Constant-sum game
Here the algebraic sum of the outcomes is always constant, though not necessarily zero.
It is strategically equivalent to zero-sum games.
5. Non-zero-sum game
Here the algebraic sum of the outcomes is not constant. In these games, the sums of the payoffs are not the same for all outcomes.
They are not always completely solvable but provide insights into important areas of interdependent choice.
In these games, one player's losses do not always equal another player's gains.
The non-zero-sum games are of two types :
i) Negative-sum games (Competitive) -
Here nobody really wins, rather everybody loses.
Example - A war or a strike.
ii) Positive-sum games (Co-operative) -
Here all players have one goal that they contribute together.
Example - An educational game, building blocks, or a science exhibit.

6. N-person game
It involves more than two players.
Analysis of such games is more complex than zero-sum games.
Conflicts of interest are less obvious.

6.9 Formal Representation of a Game as a Problem

6.9.1 A Game is Essentially a Kind of a Search Problem

A game is formally defined with the following four components :
1. The initial state, which includes the board position and identifies the player to move.
2. A successor function, which returns a list of (move, state) pairs, each indicating a legal move and the resulting state.
3. A terminal test, which determines when the game is over. States where the game has ended are called terminal states.
4. A utility function (also called an objective function or payoff function), which gives a numeric value for the terminal states. In chess, the outcome is a win, loss or draw, with values +1, -1, or 0. Some games have a wider variety of possible outcomes; the payoffs in backgammon range from +192 to -192.
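As a sketch, the four components can be written down directly for tic-tac-toe; the helper names below are ours, not from the text:

```python
# Formal components of tic-tac-toe as a search problem (illustrative sketch).
INITIAL_STATE = (("-",) * 9, "X")   # 1. initial state: empty board, X to move

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def successors(state):
    """2. Successor function: list of (move, resulting state) pairs."""
    board, player = state
    other = "O" if player == "X" else "X"
    return [(i, (board[:i] + (player,) + board[i + 1:], other))
            for i in range(9) if board[i] == "-"]

def _winner(board):
    for a, b, c in LINES:
        if board[a] != "-" and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal_test(state):
    """3. Terminal test: a win for either side, or a full board."""
    board, _ = state
    return _winner(board) is not None or "-" not in board

def utility(state):
    """4. Utility of a terminal state for MAX (X): +1 win, -1 loss, 0 draw."""
    return {"X": +1, "O": -1, None: 0}[_winner(state[0])]
```

From the empty board there are nine legal moves, and the utility values match the chess example above (+1, -1, 0).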



6.9.2 Game Playing Strategies

A player's strategy in a game is a complete plan of action for whatever situation might arise. It is a complete algorithm for playing the game, telling a player what to play for every possible situation throughout the game.
A pure strategy provides a complete definition of how a player will play a game. In particular, it determines the move a player will make for any situation they could face.
A player's strategy set is the set of pure strategies available to that player.
A mixed strategy is an assignment of a probability to each pure strategy. This allows for a player to randomly select a pure strategy. Since probabilities are continuous, there are infinitely many mixed strategies available to a player, even if their strategy set is finite.
A mixed strategy for a player is a probability distribution on the set of his pure strategies.
Games can be classified as either single-person playing or multi-person playing.
For example, Rubik's cube and the 8-tile puzzle are single-person games. For solving such problems, strategies like best-first or the A* algorithm can be used. These strategies help in identifying paths in a clear fashion.
While for solving a problem where two persons play a game like chess or checkers etc., it cannot be solved by best-first search or A* algorithms, as here each player tries to outsmart the opponent. Each has their own way of evaluating the situation.
The basic characteristic of the strategy must be look-ahead in nature, i.e., explore the tree for two or more levels downwards and choose the optimal one. The basic methods available for game playing are :
i) Minimax strategy
ii) Minimax strategy with alpha-beta cutoffs.
A Two Player Strategy Table

Players' strategies     Player A Strategy 1   Player A Strategy 2   Player A Strategy 3   etc.
Player B strategy 1     Tie                   A wins                B wins
Player B strategy 2     B wins                Tie                   A wins
Player B strategy 3     A wins                B wins                Tie
etc.
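For illustration, the outcome table can be encoded as a simple lookup keyed by the two chosen strategy numbers (a hypothetical encoding, not from the text):

```python
# Outcome of the two-player strategy table, keyed by
# (Player A's strategy number, Player B's strategy number).
OUTCOME = {
    (1, 1): "Tie",    (2, 1): "A wins", (3, 1): "B wins",
    (1, 2): "B wins", (2, 2): "Tie",    (3, 2): "A wins",
    (1, 3): "A wins", (2, 3): "B wins", (3, 3): "Tie",
}

# e.g. A plays strategy 2 against B's strategy 1:
result = OUTCOME[(2, 1)]
```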
It is a technique which always leads to a solution superior to any other strategy, assuming the opponent is playing in a perfect manner.
Roughly speaking, an optimal strategy leads to outcomes that are at least as good as any other strategy when one is playing an infallible opponent.
Mini-Max Value

The minimax value for a given game tree is determined by the optimal strategy, by evaluating the minimax value of each node.
Mini-Max Theorem

Players adopt those strategies which will maximize their gains, while minimizing
their losses. Therefore the solution is, the best each player can do for him/herself in the

face of opposition of the other player.


Determining Optimal Strategy
1. Given a game tree the optimal strategy can be determined by examining the
minimax value of each node.

2. The minimax value of a node is the utility (for player called MAX) of being in the

corresponding state, assuming that both players play optimally from this stage to

the end of the game.


3. The minimax value of a terminal state is just its utility.
4. Given a choice, MAX will prefer to move to a state of maximum value, whereas MIN prefers a state of minimum value.
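The four points above can be sketched as a recursive minimax computation over an explicit game tree; the tree encoding below is ours:

```python
# Minimax value of a game tree. A leaf holds a terminal utility; an
# internal node is a list of child subtrees. MAX and MIN alternate by level.
def minimax_value(node, maximizing=True):
    if isinstance(node, (int, float)):           # terminal state: its utility
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# A depth-2 tree: MAX chooses among three MIN nodes with the leaf
# utilities shown; the MIN values are 3, 2 and 2, so MAX's value is 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```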

6.9.3 The Game Tree


The initial state and the legal moves for each side, define the game tree for the game.

Description of the game tree [Refer Fig. 6.9.1]


1. Root node -
Represents the board configuration and the decision required as to what is the best single next move.
If it is my turn to move, then the root is labeled a MAX node indicating it is my turn; otherwise it is labeled a MIN node to indicate it is my opponent's turn.
2. Arcs -
Represent the possible legal moves for the player that the arcs emanate from.
3. At each level, the tree has nodes that are all MAX or all MIN.

4. Since moves alternate, the nodes at level i are of opposite kind from those at level i + 1.

[Figure : a partial tic-tac-toe game tree, with MAX and MIN moves alternating from the empty board down to terminal states with utilities -1, 0 and +1.]

Fig. 6.9.1 Example of game tree

1. Above is a partial tree for the game of tic-tac-toe.
2. The top node (root) is the initial state and MAX (player 1) moves first, placing an X in an empty square.
3. The rest of the search tree shows alternate moves for MIN (player 2) and MAX.
4. Terminal states are assigned utilities according to the rules of the game.
Definition - Stochastic game

A stochastic game is a collection of normal-form games that the agents play repeatedly. The particular game played at any time depends probabilistically on the previous game played and the actions of the agents in that game. It is like a probabilistic finite state machine in which the states are the games and the transition labels are joint action-payoff pairs.
A stochastic game, also called a Markov game, is defined by :
• a finite set Q of states (games)
• a set N = {1, ..., n} of agents
• for each agent i, a finite set Ai of possible actions
• a set of strategies Si(x) for each player for each state x ∈ Q
• a transition probability function P : Q × A1 × ... × An × Q → [0, 1], where
  P(q, a1, ..., an, q') = probability of transitioning to state q' if the action profile (a1, ..., an) is used in state q
• a set of rewards dependent on the state and the actions of the other players, ui(x, s1, s2)
• for each agent i, a real-valued payoff function ri : Q × A1 × ... × An → R (the set of real numbers)
Each stage game is played at a set of discrete times t.
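Under these definitions, a tiny Markov game can be written down explicitly. The sketch below (names and numbers are ours) simply stores the tuple (Q, N, Ai, P, ri):

```python
from dataclasses import dataclass

@dataclass
class StochasticGame:
    states: list        # Q : finite set of states (games)
    agents: list        # N = {1, ..., n}
    actions: dict       # agent i -> finite action set Ai
    P: dict             # (q, action profile, q') -> transition probability
    r: dict             # (agent i, q, action profile) -> real-valued payoff

# A two-state, two-agent zero-sum example with deterministic transitions.
game = StochasticGame(
    states=["q0", "q1"],
    agents=[1, 2],
    actions={1: ["a", "b"], 2: ["a", "b"]},
    P={("q0", ("a", "a"), "q1"): 1.0,
       ("q0", ("a", "b"), "q0"): 1.0},
    r={(1, "q0", ("a", "a")): 1.0,
       (2, "q0", ("a", "a")): -1.0},
)
```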

In the current discussion, for game solving purposes, the following assumptions are made with respect to stochastic games :
1. The length of the game is not known (infinite horizon); hence discounting is used.
2. The rewards and transition probabilities are not time dependent.
3. For solving the game, only Markov strategies are considered.
A history of length t in a stochastic game is the sequence of states visited in the first t stages, as well as the actions that the players played in the first t stages. A strategy of a player is a prescription how to play the game; that is, a function that assigns to every finite history an action to play should that history occur. A behavior strategy of a player is a function that assigns to every finite history a lottery over the set of available actions.
Earlier, a history was just a sequence of actions. But now there are action profiles rather than individual actions, and each profile has several possible outcomes. Thus a history is a sequence ht = (q0, a0, q1, a1, ..., qt), where t is the number of stages. As before, the two most common methods to aggregate payoffs into an overall payoff are average reward and future discounted reward.
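The two aggregation methods can be computed for a finite reward stream as follows (a finite-horizon sketch; gamma denotes the discount factor):

```python
# Aggregating a stream of stage rewards into a single overall payoff.
def average_reward(rewards):
    """Average reward over the stages played."""
    return sum(rewards) / len(rewards)

def discounted_reward(rewards, gamma):
    """Future discounted reward: sum of gamma**t * r_t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

stage_rewards = [1, 1, 1, 1]
```

With gamma = 0.5 the discounted reward of this stream is 1 + 0.5 + 0.25 + 0.125 = 1.875, while the average reward is 1.0.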
Stochastic games and MDPs

Note that stochastic games generalize both Markov Decision Processes (MDPs) and repeated games. An MDP is a stochastic game with only 1 player. A repeated game is a stochastic game with only 1 state. For example, Iterated Prisoner's Dilemma, Roshambo, Iterated Battle of the Sexes.

Strategies For Solving Stochastic Games

• For agent i, a deterministic strategy specifies a choice of action for i at every stage of every possible history.
• A mixed strategy is a probability distribution over deterministic strategies.
• Several restricted classes of strategies :
• As in extensive-form games, a behavioral strategy is a mixed strategy in which the mixing takes place at each history independently.
• A Markov strategy is a behavioral strategy such that for each time t, the distribution over actions depends only on the current state. But the distribution may be different at time t than at time t'.
• A stationary strategy is a Markov strategy in which the distribution over actions depends only on the current state (not on the time t).
A zero-sum game is defined to have a "value" v if :
i) player 1 has a strategy (which is then said to be "optimal"), which ensures that his expected overall payoff over time does not fall below v, no matter what strategy is followed by player 2, and
ii) the symmetric property holds when exchanging the roles of the two players.
Shapley proved the existence of a value.
Because the parameters that define the game are independent of time, the situation that the players face if today the play is in a certain state is the same situation they would face tomorrow if tomorrow the play is in that state. In particular, one expects to have optimal strategies that are stationary Markov, that is, they depend only on the current state of the game. Shapley proved that indeed such optimal strategies exist, and characterized the value as the unique fixed point of a nonlinear functional operator, a two-player version of the dynamic programming principle.
The existence of stationary Markov optimal strategies implies that, to play well, a player needs to know only the current state. In particular, the value of the game does not change if players receive partial information on each other's actions, and/or if they forget previously visited states.

Solving Stochastic games

Nash Equilibrium in game theory

Nash equilibrium is a concept within game theory where the optimal outcome of a game is where there is no incentive to deviate from the initial strategy. More specifically, the Nash equilibrium is a concept of game theory where the optimal outcome of a game is one where no player has an incentive to deviate from his chosen strategy after considering an opponent's choice. Overall, an individual can receive no incremental benefit from changing actions, assuming other players remain constant in their strategies. A game may have multiple Nash equilibria or none at all.

The Nash equilibrium is the solution to a game in which two or more players have a strategy, and with each participant considering an opponent's choice, he has no incentive, nothing to gain, by switching his strategy. In the Nash equilibrium, each player's strategy is optimal when considering the decisions of other players. Every player wins because everyone gets the outcome they desire. To quickly test if the Nash equilibrium exists, reveal each player's strategy to the other players. If no one changes his strategy, then the Nash equilibrium is proven.
For example, imagine a game between Anil and Sunil. In this simple game, both players can choose strategy A, to receive 1, or strategy B, to lose 1. Logically, both players choose strategy A and receive a payoff of 1. If Sunil's strategy is revealed, one can see that no player deviates from the original choice : Anil doesn't change, and vice versa. Knowing the other player's move means little and doesn't change either player's behavior. The outcome A, A represents a Nash equilibrium.

Prisoner's Dilemma

The prisoner's dilemma is a common situation analyzed in game theory that can employ the Nash equilibrium. In this game, two criminals are arrested and each is held in solitary confinement with no means of communicating with the other. The prosecutors do not have the evidence to convict the pair, so they offer each prisoner the opportunity to either betray the other by testifying that the other committed the crime, or cooperate with each other by remaining silent. If both prisoners betray each other, each serves five years in prison. If A betrays B but B remains silent, prisoner A is set free and prisoner B serves 10 years in prison, or vice versa. If each remains silent, then each serves just one year in prison.
The Nash equilibrium in this example is for both players to betray each other. Even though mutual cooperation leads to a better outcome, if one prisoner chooses mutual cooperation and the other does not, one prisoner's outcome is worse.
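The equilibrium test described above (reveal each player's strategy and check that no one switches) can be sketched for the prisoner's dilemma. Payoffs below are prison years written as negatives, so larger is better:

```python
# Pure-strategy Nash equilibrium check for the prisoner's dilemma.
# "S" = stay silent, "B" = betray; payoffs are negated years in prison.
PAYOFF = {  # (A's action, B's action) -> (A's payoff, B's payoff)
    ("S", "S"): (-1, -1),
    ("S", "B"): (-10, 0),
    ("B", "S"): (0, -10),
    ("B", "B"): (-5, -5),
}
ACTIONS = ("S", "B")

def is_nash(a, b):
    """True if neither player gains by a unilateral deviation."""
    ua, ub = PAYOFF[(a, b)]
    return (all(PAYOFF[(d, b)][0] <= ua for d in ACTIONS) and
            all(PAYOFF[(a, d)][1] <= ub for d in ACTIONS))
```

Here is_nash("B", "B") holds while is_nash("S", "S") does not: mutual betrayal is the unique pure equilibrium even though mutual silence is better for both.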

Irreducible stochastic game

A stochastic game is said to be irreducible if every game can be reached with positive probability regardless of the strategy adopted.
Theorem : Every 2-player, general-sum, average reward, irreducible stochastic game has a Nash equilibrium. A payoff profile is feasible if it is a convex combination of the outcomes in a game, where the coefficients are rational numbers.
There's a folk theorem similar to the one for repeated games : If (p1, p2) is a feasible pair of payoffs such that each pi is at least as big as agent i's minimax value, then (p1, p2) can be achieved in equilibrium through the use of enforcement.
Backgammon as an example of a Two-Player Zero-Sum Stochastic Game

• For two-player zero-sum stochastic games, the folk theorem still applies, but it becomes vacuous (empty).
• The situation is similar to what happened in repeated games. The only feasible pair of payoffs is the minimax payoffs.
• One example of a two-player zero-sum stochastic game is Backgammon.
• Two agents take turns. Before his/her move, an agent must roll the dice. The set of available moves depends on the results of the dice roll.
• Mapping Backgammon into a Markov game is straightforward, but slightly awkward. The basic idea is to give each move a stochastic outcome, by combining it with the dice roll that comes after it.
• Every state is a pair : (current board, current dice configuration)
• Initial set of states = {initial board} × {all possible results of agent 1's first dice roll}
• Set of possible states after agent 1's move = {the board produced by agent 1's move} × {all possible results of agent 2's dice roll}
• Vice versa for agent 2's move. One can extend the minimax algorithm to deal with this.
• But it's easier if one doesn't try to combine the moves and the dice rolls. These two events need to be kept separate.

Using Expectiminimax Algorithm for solving stochastic game

This algorithm is used to solve a two-player zero-sum game in which each agent's move has a deterministic outcome. In addition to the two agents' moves, there are chance moves. The algorithm gives optimal play (highest expected utility). The algorithm is as follows :

function EXPECTIMINIMAX(s) returns an expected utility
    if s is a terminal state then return Max's payoff at s
    if s is a "chance" node then
        return Σs' P(s' | s) EXPECTIMINIMAX(s')
    else if it is Max's move at s then
        return max{EXPECTIMINIMAX(result(a, s)) : a is applicable to s}
    else return min{EXPECTIMINIMAX(result(a, s)) : a is applicable to s}
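The pseudocode above translates directly to Python; the tagged-tuple node encoding below is ours:

```python
# EXPECTIMINIMAX over a tree with MAX, MIN, chance and terminal nodes.
# Node encodings (illustrative): ("leaf", payoff), ("max", children),
# ("min", children), ("chance", [(probability, child), ...]).
def expectiminimax(node):
    kind, data = node
    if kind == "leaf":                       # terminal state: Max's payoff
        return data
    if kind == "chance":                     # sum of P(s'|s) * value(s')
        return sum(p * expectiminimax(child) for p, child in data)
    values = [expectiminimax(child) for child in data]
    return max(values) if kind == "max" else min(values)

# MAX picks between two chance nodes with expected utilities 2 and 3.
tree = ("max", [
    ("chance", [(0.5, ("leaf", 4)), (0.5, ("leaf", 0))]),
    ("chance", [(0.5, ("leaf", 3)), (0.5, ("leaf", 3))]),
])
```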
While finding a solution for a two-player stochastic game, the following points are to be noted :
1. Dice rolls increase the branching factor. 21 possible rolls are there with 2 dice.
Given the dice roll, there are 20 legal moves on an average. For some dice rolls, it can be much higher.
At depth 4 : 20 × (21 × 20)^3 ≈ 1.2 × 10^9
am
2. As depth increases, the probability of reaching a given node shrinks and the value of lookahead is diminished. In such situations α-β pruning is less effective.
3. There is the TDGammon algorithm, that uses depth-2 search and it has a very good evaluation function. It can achieve world-champion level.
4. The evaluation function was created automatically using a machine learning technique called Temporal Difference learning. Hence in the algorithm name 'TDGammon', 'TD' stands for Temporal Difference learning.
Discounted Stochastic Games

Shapley's model is equivalent to one in which players discount their future payoffs according to a discount factor that depends on the current state and on the players' actions. The game is called "discounted" if all stopping probabilities equal the same constant, and one minus this constant is called the "discount factor." Models of discounted stochastic games are prevalent in economics, where the discount factor has a clear economic interpretation.
In non-zero-sum games, a collection of strategies, one for each player, is a "(Nash) equilibrium" if no player can profit by deviating from his strategy, assuming all other players follow their prescribed strategies.
