Game theory does not prescribe a way or say how to play a game. Game theory is a set of ideas and techniques for analyzing conflict situations between two or more parties; the outcomes of a given game are determined by their decisions.
A "move" is the way in which a game progresses from one stage to another, beginning with an initial state of the game and ending at the final state. The total number of moves constitutes the entirety of the game. The payoff, or outcome, refers to what happens at the end of a game.
Minimax : The least good of all good outcomes.
Maximin : The least bad of all bad outcomes.
The important and basic game theory theorem is the minimax theorem. The theorem says, "If the minimax of one player corresponds to the maximin of the other player, then that outcome is the solution of the game."
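The minimax theorem can be illustrated on a small payoff matrix. The matrix below is a made-up example (not taken from this text); it happens to have a saddle point, so the pure-strategy maximin and minimax coincide:

```python
# Checking the minimax theorem on a small, invented payoff matrix.
# Rows belong to the maximizing player, columns to the minimizing player.

payoff = [
    [4, 1, 3],
    [2, 0, 1],
    [5, 2, 4],
]

# Maximin: the row player picks the row whose worst (minimum) entry is best.
maximin = max(min(row) for row in payoff)

# Minimax: the column player picks the column whose best (maximum) entry is worst.
cols = list(zip(*payoff))
minimax = min(max(col) for col in cols)

print(maximin, minimax)  # both are 2: a saddle point, so the game has a value
```

When the two quantities differ, no pure-strategy saddle point exists and the players must turn to mixed strategies.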
6.6 Relevance of Game Theory and Game Playing
How relevant game theory is to mathematics, computer science and economics is shown in the figure below :
[Figure: three overlapping circles for Mathematics, Computer science and Economics; mathematical game theory, AI game playing and game theory lie in their intersections.]
1. Time limit
Games are always played in a time-constrained environment; therefore, the player must handle time efficiently.
For example - Backgammon, Monopoly.
2. Based on information
i) Perfect information -
Here all moves of all players are known to everyone.
For example - Chess, Checkers, Tic-tac-toe.
ii) Imperfect information -
Here players do not know all the moves of the other players; some information remains hidden.
For example - most card games.
3. Zero-sum game
If you play a single game of chess with someone, one person will lose and one person will win. The win (+1) added to the loss (-1) equals zero.
4. Constant-sum game
Here the algebraic sum of the outcomes is always constant, though not necessarily zero. It is strategically equivalent to zero-sum games.
5. Non-zero-sum game
Here the algebraic sum of the outcomes is not constant. In these games, the players may all gain or all lose.
6. N-person game
It involves more than two players. Analysis of such games is more complex than that of zero-sum games, and conflicts of interest are less obvious.
6.9 Formal Representation of a Game as a Problem
6.9.1 A Game is Essentially a Kind of a Search Problem
Each game has a payoff (utility) function that assigns a numeric value to every terminal state; in chess the outcome is a win, loss or draw, while payoffs in backgammon range from +192 to -192.
A pure strategy provides a complete definition of how a player will play a game. In particular, it determines the move a player will make for any situation they could face. A player's strategy set is the set of pure strategies available to that player.
A mixed strategy is an assignment of a probability to each pure strategy. This allows for a player to randomly select a pure strategy. Since probabilities are continuous, there are infinitely many mixed strategies available to a player, even if their strategy set is finite.
In other words, a mixed strategy for a player is a probability distribution on the set of his pure strategies.
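A mixed strategy can be sketched directly as a distribution that is sampled each time the game is played. The strategy names and probabilities below are illustrative only (a uniform mix over three hypothetical pure strategies):

```python
import random

# A mixed strategy: a probability distribution over a finite set of
# pure strategies, sampled to pick one pure strategy per play.

pure_strategies = ["rock", "paper", "scissors"]
mixed_strategy = [1/3, 1/3, 1/3]   # probabilities must sum to 1

def play(strategies, probabilities):
    """Randomly select a pure strategy according to the mixed strategy."""
    return random.choices(strategies, weights=probabilities, k=1)[0]

choice = play(pure_strategies, mixed_strategy)
print(choice)  # one of "rock", "paper", "scissors"
```

Varying the three probabilities continuously shows why a finite strategy set still yields infinitely many mixed strategies.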
Games can be classified as either single-person or multiperson. For example, Rubik's cube and the 8-tile puzzle are single-person games. For solving such problems, strategies like best-first or A* algorithms can be used. These strategies help in identifying paths in a clear fashion.
Problems where two persons play a game like chess or checkers, however, cannot be solved by best-first search or A* algorithms, as here each player tries to outsmart the opponent, and each has their own way of evaluating the situation.
The basic characteristic of the strategy must be look-ahead in nature, i.e., explore the tree for two or more levels downwards and choose the optimal one. The basic methods available for game playing are :
i) Minimax strategy
ii) Minimax strategy with alpha-beta cutoffs, etc.
[Table: a two-player strategy table - rows list one player's strategies, columns list Player A's Strategy 1, Strategy 2 and Strategy 3.]
Roughly speaking, an optimal strategy leads to outcomes that are at least as good as any other strategy when one is playing an infallible opponent.
Mini-Max Value
Players adopt those strategies which will maximize their gains, while minimizing their losses. Therefore the solution is the best each player can do for him/herself in the given game. The minimax value of a node is the utility (for the player called MAX) of being in the corresponding state, assuming that both players play optimally from that state to the end of the game.
1. Nodes
If it is my turn to move, then the root is labeled a MAX node, indicating it is my turn; otherwise it is labeled a MIN node to indicate it is my opponent's turn.
2. Arcs
Arcs represent the possible legal moves for the player that the arcs emanate from.
3. At each level, the tree has nodes that are all MAX or all MIN.
[Figure: a partial game tree for tic-tac-toe - MAX and MIN moves alternate from the empty board down to terminal boards with utilities -1, 0 and +1.]
Fig. 6.9.1 Example of game tree
1. Above is a partial tree for the game of tic-tac-toe.
2. The top node (root) is the initial state, and MAX (player 1) moves first, placing an X in an empty square.
3. The rest of the search tree shows alternate moves for MIN (player 2) and MAX.
4. Terminal states are assigned utilities according to the rules of the game.
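The two methods listed earlier - minimax and minimax with alpha-beta cutoffs - can be sketched in one routine. The tree below is a small invented example (not the full tic-tac-toe tree); leaves carry the utilities -1, 0, +1 used above:

```python
import math

# Minimax with alpha-beta cutoffs on an explicit game tree.
# A tree node is either a terminal utility (a number) or a list of subtrees.

def minimax(node, maximizing, alpha=-math.inf, beta=math.inf):
    if isinstance(node, (int, float)):      # terminal state: return its utility
        return node
    if maximizing:                          # MAX node: take the best child value
        value = -math.inf
        for child in node:
            value = max(value, minimax(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:               # beta cutoff: MIN would never allow this
                break
        return value
    else:                                   # MIN node: take the worst child value
        value = math.inf
        for child in node:
            value = min(value, minimax(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:               # alpha cutoff
                break
        return value

# Root is MAX; each subtree is a MIN node over terminal utilities.
tree = [[+1, 0], [0, -1], [-1, -1]]
print(minimax(tree, True))  # 0: the best MAX can guarantee against optimal MIN
```

Pure minimax is the same recursion without the two cutoff tests; the cutoffs only prune branches that cannot change the result.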
Definition - Stochastic game
A stochastic game is a collection of normal-form games that the agents play repeatedly. The particular game played at any time depends probabilistically on the previous game played and the actions of the agents in that game. For each agent i, there is a real-valued payoff function u_i defined on the strategy profiles (s_1, s_2, ...).
In the current discussion, for game-solving purposes, the following assumptions are made with respect to stochastic games :
1. The length of the game is not known (infinite horizon); hence discounting is used.
2. The rewards and transition probabilities are not time-dependent.
3. For solving the game, only Markov strategies are considered.
A history of length t in a stochastic game is the sequence of states visited in the first t stages, as well as the actions the players played in the first t stages. A strategy of a player is a prescription how to play the game; that is, a function that assigns to every finite history an action to play should that history occur. A behavior strategy of a player is a function that assigns to every finite history a lottery over the set of available actions.
Earlier, a history was just a sequence of individual actions. But now there are action profiles rather than individual actions, and each profile has several possible outcomes. Thus a history is a sequence h_t = (q^0, a^0, q^1, a^1, ..., q^t), where t is the number of stages. As before, the two most common methods to aggregate payoffs into an overall payoff are average reward and future discounted reward.
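The two aggregation methods are easy to state concretely. The reward sequence and discount factor below are invented for illustration:

```python
# Two common ways to aggregate a stream of stage payoffs into one number.

rewards = [1.0, 0.0, 2.0, 1.0]   # payoffs received at stages 0..3 (illustrative)
beta = 0.9                        # discount factor, 0 < beta < 1

# Average reward: the mean of the stage payoffs.
average_reward = sum(rewards) / len(rewards)

# Future discounted reward: sum over stages t of beta**t * r_t.
discounted_reward = sum(beta**t * r for t, r in enumerate(rewards))

print(average_reward)     # 1.0
print(discounted_reward)  # 1.0 + 0 + 0.81*2 + 0.729*1 = 3.349
```

Discounting weighs early payoffs more heavily, which is what makes infinite-horizon sums finite under assumption 1 above.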
Stochastic games and MDPs
In a two-player zero-sum stochastic game, player 1 has a strategy (called "optimal") which ensures that his expected overall payoff over time does not fall below some value v, no matter what strategy is followed by player 2, and the symmetric property holds when exchanging the roles of the two players. Shapley proved the existence of such a value.
Because the parameters that define the game are independent of time, the situation that the players face if today the play is in a certain state is the same situation they would face tomorrow if tomorrow the play is in that state. In particular, one expects to have optimal strategies that are stationary Markov, that is, depend only on the current state of the game. Shapley proved that indeed such optimal strategies exist, and characterized the value by a dynamic programming principle.
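Shapley's dynamic programming principle can be sketched by value iteration: repeatedly set V(s) to the value of the matrix game whose (i, j) entry is r(s, i, j) + beta * E[V(next state)]. The 2-state game below is entirely invented, and for simplicity the stage-game value is computed by pure maximin, which is exact only because each matrix here has a saddle point; in general the stage games must be solved in mixed strategies (e.g. by linear programming):

```python
# Value iteration for a discounted zero-sum stochastic game (toy example).

beta = 0.5   # discount factor

# rewards[s][i][j]: stage payoff to player 1 in state s for actions (i, j)
rewards = {
    0: [[3.0, 1.0], [2.0, 0.0]],
    1: [[0.0, 0.0], [0.0, 0.0]],   # state 1 is absorbing with zero payoff
}
# trans[s][i][j]: probability distribution over next states
trans = {
    0: [[{1: 1.0}, {0: 1.0}], [{1: 1.0}, {1: 1.0}]],
    1: [[{1: 1.0}, {1: 1.0}], [{1: 1.0}, {1: 1.0}]],
}

def pure_maximin(matrix):
    """Value of a matrix game, valid only when a pure saddle point exists."""
    return max(min(row) for row in matrix)

V = {0: 0.0, 1: 0.0}
for _ in range(100):                       # iterate toward the fixed point
    Q = {
        s: [[rewards[s][i][j]
             + beta * sum(p * V[s2] for s2, p in trans[s][i][j].items())
             for j in range(2)]
            for i in range(2)]
        for s in V
    }
    V = {s: pure_maximin(Q[s]) for s in V}

print(V)  # -> {0: 2.0, 1: 0.0}
```

The resulting V is stationary: it depends only on the current state, exactly as the text above describes.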
The existence of stationary Markov optimal strategies implies that, to play well, a player needs to know only the current state. In particular, the value of the game does not change if players receive partial information on each other's actions, and/or if they forget previously visited states.
The Nash equilibrium is the solution to a game in which two or more players have a strategy, and with each participant considering an opponent's choice, he has no incentive, nothing to gain, by switching his strategy. In the Nash equilibrium, each player's strategy is optimal when considering the decisions of other players. Every player wins because everyone gets the outcome they desire. To quickly test if the Nash equilibrium exists, reveal each player's strategy to the other players. If no one changes his strategy, then the Nash equilibrium is proven.
For example, imagine a game between Anil and Sunil. In this simple game, both players can choose strategy A, to receive 1, or strategy B, to lose 1. Logically, both players choose strategy A and receive a payoff of 1. If you revealed Sunil's strategy to Anil and vice versa, you would see that no player deviates from the original choice. Knowing the other player's move means little and doesn't change either player's behavior. The outcome A, A represents a Nash equilibrium.
Prisoner's Dilemma
The prisoner's dilemma is a common situation analyzed in game theory that can employ the Nash equilibrium. In this game, two criminals are arrested and each is held in solitary confinement with no means of communicating with the other. The prosecutors do not have the evidence to convict the pair, so they offer each prisoner the opportunity to either betray the other by testifying that the other committed the crime, or cooperate with the other by remaining silent. If both prisoners betray each other, each serves five years in prison. If A betrays B but B remains silent, prisoner A is set free and prisoner B serves 10 years in prison, or vice versa. If each remains silent, then each serves just one year in prison. The Nash equilibrium in this example is for both players to betray each other: even though mutual silence gives a better joint outcome, neither prisoner can improve his own outcome by unilaterally changing strategy.
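The best-response test for a Nash equilibrium can be run mechanically on the prisoner's dilemma payoffs above (years in prison, negated so that higher is better):

```python
from itertools import product

# Finding pure-strategy Nash equilibria by best-response checking.
# Payoffs follow the prisoner's dilemma: betray/betray -> 5 years each,
# betray/silent -> 0 and 10 years, silent/silent -> 1 year each.

strategies = ["betray", "silent"]
payoff = {  # (A's strategy, B's strategy) -> (A's payoff, B's payoff)
    ("betray", "betray"): (-5, -5),
    ("betray", "silent"): (0, -10),
    ("silent", "betray"): (-10, 0),
    ("silent", "silent"): (-1, -1),
}

def is_nash(a, b):
    """True if no player can improve by unilaterally switching strategy."""
    ua, ub = payoff[(a, b)]
    best_a = all(payoff[(a2, b)][0] <= ua for a2 in strategies)
    best_b = all(payoff[(a, b2)][1] <= ub for b2 in strategies)
    return best_a and best_b

equilibria = [(a, b) for a, b in product(strategies, strategies) if is_nash(a, b)]
print(equilibria)  # [('betray', 'betray')]: mutual betrayal is the only equilibrium
```

Note that (silent, silent) fails the test even though it is jointly better: each prisoner gains by deviating to betrayal, which is exactly the dilemma.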
In a game such as backgammon, one could represent each agent's move by combining it with the dice roll that comes after it :
- Every state is a pair : (current board, current dice configuration)
- Initial set of states = {initial board} x {all possible results of agent 1's first dice roll}
- Set of possible states after agent 1's move = {the board produced by agent 1's move} x {all possible results of agent 2's dice roll}
- and vice versa for agent 2's move. One can extend the minimax algorithm to deal with this.
But it is easier if one doesn't try to combine the moves and the dice rolls; these two events need to be kept separate.
This algorithm is used to solve two-player zero-sum games in which each agent's move has a deterministic outcome. In addition to the two agents' moves, there are chance moves. The algorithm gives optimal play (highest expected utility). The algorithm is as follows :
function EXPECTIMINIMAX(s) returns an expected utility
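A minimal sketch of the expectiminimax recursion on an explicit tree. The node encoding and the example tree are invented for illustration: a terminal is a number, and an interior node is ("max", children), ("min", children) or ("chance", [(probability, child), ...]):

```python
# EXPECTIMINIMAX: minimax extended with chance nodes that average
# their children's values by probability.

def expectiminimax(node):
    """Return the expected utility of a node under optimal play."""
    if isinstance(node, (int, float)):           # terminal state
        return node
    kind, children = node
    if kind == "max":                            # our move: maximize
        return max(expectiminimax(c) for c in children)
    if kind == "min":                            # opponent's move: minimize
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                         # dice roll: expected value
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(kind)

# A MAX choice between two chance nodes (e.g. two possible dice situations).
tree = ("max", [
    ("chance", [(0.5, ("min", [3, 5])), (0.5, ("min", [1, 4]))]),
    ("chance", [(0.9, 2), (0.1, 10)]),
])
print(expectiminimax(tree))  # max(0.5*3 + 0.5*1, 0.9*2 + 0.1*10) = 2.8
```

Keeping moves and dice rolls as separate node types, as suggested above, is what makes the recursion this simple.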
It can achieve world-champion level play with a very good evaluation function.
4. The evaluation function is learned using a technique called Temporal Difference learning; hence, in the algorithm name 'TD-Gammon', 'TD' stands for Temporal Difference learning.
Discounted Stochastic Games