Types of Games
Game = task environment with > 1 agent
Axes:
Deterministic or stochastic?
Perfect information (fully observable)?
One, two, or more players?
Turn-taking or simultaneous?
Zero sum?
Want algorithms for calculating a contingent plan (a.k.a. strategy or policy) which
recommends a move for every possible eventuality
“Standard” Games
Standard games are deterministic, observable, two-player, turn-taking, zero-sum
Game formulation:
Initial state: s0
Players: Player(s) indicates whose move it is
Actions: Actions(s) for player on move
Transition model: Result(s,a)
Terminal test: Terminal-Test(s)
Terminal values: Utility(s,p) for player p
Or just Utility(s) for player making the decision at root
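The formulation above can be sketched as a minimal Python interface. The method names mirror the slide's functions; the class itself is an assumption of this sketch, and any concrete game (tic-tac-toe, chess, ...) would supply the bodies.

```python
# A minimal sketch of the standard-game formulation as a Python interface.
# Method names mirror the slide's functions; concrete games fill in the bodies.
from abc import ABC, abstractmethod

class Game(ABC):
    @abstractmethod
    def initial_state(self):        # s0
        ...

    @abstractmethod
    def player(self, s):            # Player(s): whose move it is
        ...

    @abstractmethod
    def actions(self, s):           # Actions(s): legal moves in s
        ...

    @abstractmethod
    def result(self, s, a):         # Result(s, a): transition model
        ...

    @abstractmethod
    def terminal_test(self, s):     # Terminal-Test(s)
        ...

    @abstractmethod
    def utility(self, s, p):        # Utility(s, p): terminal value for player p
        ...
```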
Zero-Sum Games
Minimax
A type of adversarial-search algorithm, Minimax represents the winning conditions as -1 for one side and +1 for the other. Play is then driven by these values: the minimizing side tries to reach the lowest score, and the maximizer the highest.
Representing a Tic-Tac-Toe AI
S₀: Initial state (in our case, an empty 3×3 board)
Players(s): a function that, given a state s, returns which player’s turn it is (X or O).
Actions(s): a function that, given a state s, returns all the legal moves in this state (which spots on the board are free).
Result(s, a): a function that, given a state s and action a, returns a new state.
This is the board that resulted from performing the action a on state s (making
a move in the game).
Terminal(s): a function that, given a state s, checks whether this is the last step
in the game, i.e. if someone won or there is a tie. Returns True if the game has
ended, False otherwise.
Utility(s): a function that, given a terminal state s, returns the utility value of the state: -1, 0, or 1.
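The six functions above can be sketched directly in Python. The board representation here (a tuple of nine cells, row-major, holding "X", "O", or None) is an assumption of this sketch, not part of the slide.

```python
# Sketch of the tic-tac-toe formulation; the board is a tuple of 9 cells
# ("X", "O", or None), indexed row-major. This representation is assumed.
EMPTY = None
S0 = (EMPTY,) * 9  # initial state: empty 3x3 board

# All eight winning lines: rows, columns, diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def player(s):
    """Players(s): X moves first, so X is on move when counts are equal."""
    return "X" if s.count("X") <= s.count("O") else "O"

def actions(s):
    """Actions(s): indices of the free squares."""
    return [i for i, cell in enumerate(s) if cell is EMPTY]

def result(s, a):
    """Result(s, a): new board after the player on move fills square a."""
    board = list(s)
    board[a] = player(s)
    return tuple(board)

def winner(s):
    """Helper: the player holding a complete line, or None."""
    for i, j, k in LINES:
        if s[i] is not EMPTY and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal(s):
    """Terminal(s): True if someone has won or the board is full."""
    return winner(s) is not None or EMPTY not in s

def utility(s):
    """Utility(s): +1 if X has won, -1 if O has won, 0 for a tie."""
    w = winner(s)
    return 1 if w == "X" else -1 if w == "O" else 0
```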
Tic-Tac-Toe Game
We define the complete game tree as a search tree that follows every sequence of moves all the way to a terminal state.
Figure shows part of the game tree for tic-tac-toe (noughts and crosses).
From the initial state, MAX has nine possible moves.
Play alternates between MAX’s placing an X and MIN’s placing an O until we reach leaf
nodes corresponding to terminal states such that one player has three squares in a row or
all the squares are filled.
The number on each leaf node indicates the utility value of the terminal state from the
point of view of MAX; high values are good for MAX and bad for MIN (which is how the
players get their names).
For tic-tac-toe the game tree is relatively small—fewer than 9! = 362,880 terminal nodes (with only 5,478 distinct states).
But for chess there are over 10⁴⁰ nodes.
Tic-Tac-Toe Game Tree
Minimax Algorithm
Implementation
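As a minimal implementation sketch, here is minimax over an explicit game tree, where a leaf is a terminal utility (a number), an internal node is a list of children, and MAX and MIN alternate by level. The tree shown is the one from the worked example slides.

```python
# Minimax over an explicit tree: a leaf is a terminal utility (a number),
# an internal node is a list of children. MAX and MIN alternate by level.
def minimax(node, maximizing=True):
    if not isinstance(node, list):      # leaf: return the terminal utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The example tree: the MIN nodes take values 3, 2, and 2, so MAX picks 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # 3
```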
Minimax Example
[Figure: a two-ply minimax tree built up leaf by leaf. The three MIN nodes see leaves (3, 12, 8), (2, 4, 6), and (14, 5, 2) and take values 3, 2, and 2; the MAX root therefore takes value 3.]
Alpha-Beta Pruning
Minimax search is depth-first, so at any one time we only have to consider the nodes along a single path in the tree.
The number of game states is exponential in the depth of the tree.
No algorithm can completely eliminate the exponent, but we can sometimes cut it in half,
computing the correct minimax decision without examining every state by pruning large parts of
the tree that make no difference to the outcome.
The particular technique we examine is called alpha–beta pruning.
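The technique can be sketched by adding the α and β bounds to the tree-based minimax above; a branch is cut off as soon as the bounds cross. This is a sketch over the same explicit-tree representation (a leaf is a number, an internal node a list of children).

```python
import math

# Minimax with alpha-beta pruning over an explicit tree.
# alpha: best value MAX can guarantee so far; beta: best value for MIN.
def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    if not isinstance(node, list):      # leaf: terminal utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:           # MIN will never let play reach here
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:               # MAX already has a better option
            break
    return value
```

On the example tree `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` this returns the same minimax value, 3, while skipping the leaves 4 and 6.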
Alpha-Beta Example
α = best option so far from any MAX node on this path
[Figure: the example tree searched with α–β. After the first MIN node yields 3, α = 3 at the root; the second MIN node is pruned as soon as its first leaf (2) is seen, so leaves 4 and 6 are never generated.]
We can identify the minimax decision without ever evaluating two of the leaf nodes. The order of generation matters: more pruning is possible if good moves come first.
Alpha-Beta Example
(a) The first leaf below B has the value 3. Hence B, which is a MIN node, has a value of at most 3.
(b) The second leaf below B has a value of 12; MIN would avoid this move, so the value of B is still at most 3.
Alpha-Beta Example
(e) The first leaf below D has the value 14, so D is worth at most 14. This is still higher than MAX’s best alternative (i.e., 3), so we need to keep exploring D’s successor states. Notice also that we now have bounds on all of the successors of the root, so the root’s value is also at most 14.
(f) The second successor of D is worth 5, so again we need to keep exploring. The third successor is worth 2, so now D is worth exactly 2. MAX’s decision at the root is to move to B, giving a value of 3.
α–β Pruning
α: lower bound on the minimax value
β: upper bound on the minimax value
Alpha-Beta: Another Example
Alpha-Beta Example
A maximizing player knows that, at the next step, the minimizing player will try to
achieve the lowest score. Suppose the maximizing player has three possible
actions, and the first one is valued at 4. Then the player starts generating the
value for the next action. To do this, the player generates the values of the
minimizer’s actions if the current player makes this action, knowing that the
minimizer will choose the lowest one. However, before finishing the computation
for all the possible actions of the minimizer, the player sees that one of the
options has a value of three. This means that there is no reason to keep on
exploring the other possible actions for the minimizing player. The value of the
not-yet-valued action doesn’t matter, be it 10 or -10. If the value is 10, the minimizer will choose the lowest option, 3, which is already worse than the established 4. If the not-yet-valued action turned out to be -10, the minimizer would choose that option, -10, which is even more unfavorable to the maximizer. Therefore, computing the minimizer’s remaining actions at this point is irrelevant to the maximizer, who already has an unequivocally better choice worth 4.
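The story above can be reproduced in a few lines. The tree here (a first MIN node worth 4, then a MIN node with leaves 3 and 10) is illustrative, not from the slide; recording every leaf we evaluate shows that the leaf worth 10 is never examined.

```python
import math

evaluated = []  # every leaf value we actually compute

def ab(node, alpha, beta, maximizing):
    """Alpha-beta over an explicit tree, recording each leaf it evaluates."""
    if not isinstance(node, list):
        evaluated.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, ab(child, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:
                break
        return v
    v = math.inf
    for child in node:
        v = min(v, ab(child, alpha, beta, True))
        beta = min(beta, v)
        if alpha >= beta:   # cutoff: MAX already has something better
            break
    return v

# MAX root: first action leads to a MIN node worth 4; the second MIN node
# has leaves 3 and 10. After seeing the 3, the 10 is pruned.
root = [[4], [3, 10]]
value = ab(root, -math.inf, math.inf, True)
print(value)        # 4
print(evaluated)    # [4, 3] -- the leaf worth 10 was never evaluated
```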
Alpha-Beta Pruning
General case (pruning children of a MIN node):
We’re computing the MIN-VALUE at some node n
We’re looping over n’s children
n’s estimate of the children’s min is dropping
Who cares about n’s value? MAX
Let α be the best value that MAX can get so far at any choice point along the current path from the root
If n becomes worse than α, MAX will avoid it, so we can prune n’s other children (it’s already bad enough that it won’t be played)
Good child ordering improves the effectiveness of pruning
Iterative deepening helps with this
Resource Limits
Problem: in realistic games, we cannot search to the leaves!
Solution 1: Bounded lookahead
Search only to a preset depth limit or horizon
Use an evaluation function for non-terminal positions
Example:
Suppose we have 100 seconds and can explore 10K nodes/sec
So we can check 1M nodes per move
Chess with alpha-beta: 35^(8/2) ≈ 1M nodes, so depth 8 is feasible
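Bounded lookahead can be sketched by adding a depth counter to the tree-based minimax and calling an evaluation function at the horizon. The evaluation function below (score a cut-off node by its first leaf) is a deliberately crude stand-in, not a real evaluator.

```python
# Depth-limited minimax: at depth 0 we stop expanding and call an
# evaluation function on the non-terminal node instead.
def dl_minimax(node, depth, maximizing=True, evaluate=None):
    if not isinstance(node, list):      # true terminal state: exact utility
        return node
    if depth == 0:                      # horizon reached: estimate, don't expand
        return evaluate(node)
    values = [dl_minimax(c, depth - 1, not maximizing, evaluate)
              for c in node]
    return max(values) if maximizing else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
# Deep enough to reach the leaves: the exact minimax value.
print(dl_minimax(tree, 2))                              # 3
# Horizon at depth 1 with a crude heuristic ("first leaf's value"):
# the estimate 14 for the third move distorts the decision.
print(dl_minimax(tree, 1, evaluate=lambda n: n[0]))     # 14
```

The second call shows why the quality of the evaluation function matters: a bad heuristic at the horizon can change the move chosen at the root.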