
Adversarial Search

Instructor: Dr. Ayesha Kashif


Riphah School of Computing and Innovation
Riphah International University
Types of Games
 Game = task environment with > 1 agent

 Axes:
 Deterministic or stochastic?
 Perfect information (fully observable)?
 One, two, or more players?
 Turn-taking or simultaneous?
 Zero sum?

 Want algorithms for calculating a contingent plan (a.k.a. strategy or policy) which
recommends a move for every possible eventuality
“Standard” Games
 Standard games are deterministic, observable, two-player, turn-taking, zero-sum
 Game formulation:
 Initial state: s0
 Players: Player(s) indicates whose move it is
 Actions: Actions(s) for player on move
 Transition model: Result(s,a)
 Terminal test: Terminal-Test(s)
 Terminal values: Utility(s,p) for player p
 Or just Utility(s) for player making the decision at root
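
To make the formulation concrete, here is a minimal sketch of it as a Python interface; the class and method names are ours, chosen to mirror the slide's functions, not taken from any particular library:

from abc import ABC, abstractmethod

class Game(ABC):
    """Hypothetical interface mirroring the 'standard' game formulation."""

    @abstractmethod
    def initial_state(self):        # s0
        ...

    @abstractmethod
    def player(self, s):            # whose move it is in state s
        ...

    @abstractmethod
    def actions(self, s):           # legal actions for the player on move
        ...

    @abstractmethod
    def result(self, s, a):         # transition model: state after action a
        ...

    @abstractmethod
    def terminal_test(self, s):     # True if s ends the game
        ...

    @abstractmethod
    def utility(self, s, p):        # terminal value of s for player p
        ...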
Zero-Sum Games

 Zero-Sum Games
 Agents have opposite utilities
 Pure competition: one maximizes, the other minimizes

 General Games
 Agents have independent utilities
 Cooperation, indifference, competition, shifting alliances, and more are all possible
Adversarial Search
• Whereas previously we discussed algorithms that need to find an answer to a question, in adversarial search the algorithm faces an opponent that tries to achieve the opposite goal.
• AI that uses adversarial search is often encountered in games, such as tic-tac-toe.

Minimax
Minimax, a type of algorithm in adversarial search, represents winning conditions as -1 for one side and +1 for the other. Subsequent actions are driven by these values, with the minimizing side trying to get the lowest score and the maximizer trying to get the highest score.
Representing a Tic-Tac-Toe AI
 S₀: Initial state (in our case, an empty 3×3 board)
 Players(s): a function that, given a state s, returns which player’s turn it is (X or O).
 Actions(s): a function that, given a state s, returns all the legal moves in this state (which spots are free on the board).
 Result(s, a): a function that, given a state s and action a, returns a new state: the board that results from performing action a on state s (making a move in the game).
 Terminal(s): a function that, given a state s, checks whether this is the last step in the game, i.e. whether someone has won or there is a tie. Returns True if the game has ended, False otherwise.
 Utility(s): a function that, given a terminal state s, returns the utility value of the state: -1, 0, or 1.

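A minimal sketch of these functions for tic-tac-toe, assuming the board is a 9-element Python list holding "X", "O", or None per cell (the representation and the winner() helper are our own choices):

WIN_LINES = [(0,1,2), (3,4,5), (6,7,8),      # rows
             (0,3,6), (1,4,7), (2,5,8),      # columns
             (0,4,8), (2,4,6)]               # diagonals

def player(s):
    # X moves first, so X is on move whenever the mark counts are equal
    return "X" if s.count("X") == s.count("O") else "O"

def actions(s):
    return [i for i in range(9) if s[i] is None]   # free cells

def result(s, a):
    t = list(s)
    t[a] = player(s)
    return t

def winner(s):
    for i, j, k in WIN_LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal(s):
    return winner(s) is not None or all(c is not None for c in s)

def utility(s):
    # +1 if X won, -1 if O won, 0 for a tie (called on terminal states only)
    return {"X": 1, "O": -1, None: 0}[winner(s)]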
Tic-Tac-Toe Game
 We define the complete game tree as a search tree that follows every sequence of moves all the way to a terminal state.
 The figure shows part of the game tree for tic-tac-toe (noughts and crosses).
 From the initial state, MAX has nine possible moves.
 Play alternates between MAX placing an X and MIN placing an O until we reach leaf nodes corresponding to terminal states: one player has three squares in a row, or all the squares are filled.
 The number on each leaf node indicates the utility value of the terminal state from the point of view of MAX; high values are good for MAX and bad for MIN (which is how the players get their names).
 For tic-tac-toe the game tree is relatively small: fewer than 9! = 362,880 terminal nodes (with only 5,478 distinct states).
 But for chess there are over 10^40 nodes.

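The 5,478 figure can be checked directly with the functions sketched above, by enumerating every distinct board reachable from the empty one (the traversal code is ours):

def count_reachable_states():
    start = (None,) * 9
    seen = {start}
    frontier = [start]
    while frontier:
        s = frontier.pop()
        if terminal(s):
            continue                    # terminal boards have no successors
        for a in actions(s):
            t = tuple(result(s, a))
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return len(seen)

print(count_reachable_states())         # -> 5478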
Tic-Tac-Toe Game Tree

[Figure] A (partial) game tree for the game of tic-tac-toe. We show part of the tree, giving alternating moves by MIN (O) and MAX (X), until we eventually reach terminal states, which can be assigned utilities according to the rules of the game.
Optimal Decisions in Games
Minimax algorithm
 Choose action leading to state with best minimax value
 Assumes all future moves will be optimal
 => rational against a rational player

Minimax Algorithm

Implementation

function Minimax-Decision(s) returns an action
    return the action a in Actions(s) with the highest Minimax-Value(Result(s, a))

function Minimax-Value(s) returns a value
    if Terminal-Test(s) then return Utility(s)
    if Player(s) = MAX then return max_{a ∈ Actions(s)} Minimax-Value(Result(s, a))
    if Player(s) = MIN then return min_{a ∈ Actions(s)} Minimax-Value(Result(s, a))
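
A direct Python translation of this pseudocode, assuming the tic-tac-toe functions sketched earlier (player, actions, result, terminal, utility) with "X" playing MAX; the translation choices are ours:

def minimax_value(s):
    if terminal(s):
        return utility(s)
    values = [minimax_value(result(s, a)) for a in actions(s)]
    return max(values) if player(s) == "X" else min(values)

def minimax_decision(s):
    # MAX picks the action whose resulting state has the highest minimax value
    return max(actions(s), key=lambda a: minimax_value(result(s, a)))

For example, minimax_decision([None] * 9) searches the full game tree and returns an optimal opening move for X (in tic-tac-toe every opening is optimal, since perfect play from both sides is a draw).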
Minimax Efficiency

 How efficient is minimax?


 Just like (exhaustive) DFS
 Time: O(b^m)
 Space: O(bm)

 Example: For chess, b ≈ 35, m ≈ 100


 Exact solution is completely infeasible
 Humans can’t do this either, so how do we
play chess?
Game Tree Pruning
Minimax Example

[Worked example built up across several slides: a MAX root with three MIN children whose leaves are (3, 12, 8), (2, 4, 6), and (14, 5, 2). The MIN nodes evaluate to 3, 2, and 2 respectively, so the MAX root's minimax value is max(3, 2, 2) = 3.]
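
The walkthrough can be checked in a couple of lines of Python; the nested-list encoding of the tree is our own:

leaves = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # leaves under the three MIN nodes
print(max(min(group) for group in leaves))     # -> 3, i.e. max(3, 2, 2)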
Alpha-Beta Pruning
 Minimax search is depth-first, so at any one time we just have to consider the nodes along a single path in the tree.
 The number of game states is exponential in the depth of the tree.
 No algorithm can completely eliminate the exponent, but we can sometimes cut it in half,
computing the correct minimax decision without examining every state by pruning large parts of
the tree that make no difference to the outcome.
 The particular technique we examine is called alpha–beta pruning.

Alpha-Beta Example
α = best option so far from any MAX node on this path

[Figure: the example tree searched left to right with alpha–beta. After the leftmost MIN node is fully evaluated (value 3, setting α = 3), a leaf worth 2 under the middle MIN node allows its remaining children to be skipped; the rightmost MIN node's leaves 14, 5, 2 drive its value down to 2. The evaluated leaves are 3, 12, 8, 2, 14, 5, 2.]

 We can identify the minimax decision without ever evaluating two of the leaf nodes.
 The order of generation matters: more pruning is possible if good moves come first.
Alpha-Beta Example

(a) The first leaf below B has the value 3. Hence, B, which is a MIN node, has a value of at most 3.
(b) The second leaf below B has a value of 12; MIN would avoid this move, so the value of B is still at most 3.

Alpha-Beta Example

(c) The third leaf below B has a value of 8; we have seen all B’s successor states, so the value of B is exactly 3. Now we can infer that the value of the root is at least 3, because MAX has a choice worth 3 at the root.
(d) The first leaf below C has the value 2. Hence, C, which is a MIN node, has a value of at most 2. But we know that B is worth 3, so MAX would never choose C. Therefore, there is no point in looking at the other successor states of C. This is an example of alpha–beta pruning.

Alpha-Beta Example

(e) The first leaf below D has the value 14, so D is worth at most 14. This is still higher than MAX’s best alternative (i.e., 3), so we need to keep exploring D’s successor states. Notice also that we now have bounds on all of the successors of the root, so the root’s value is also at most 14.
(f) The second successor of D is worth 5, so again we need to keep exploring. The third successor is worth 2, so now D is worth exactly 2. MAX’s decision at the root is to move to B, giving a value of 3.

α–β Pruning

 α: a lower bound on the minimax value
 β: an upper bound on the minimax value
Alpha-Beta Another Example

[Figure: an additional alpha–beta worked example; the diagram is not reproduced here.]
Alpha-Beta Example
 A maximizing player knows that, at the next step, the minimizing player will try to achieve the lowest score. Suppose the maximizing player has three possible actions, and the first one is valued at 4. Then the player starts generating the value for the next action. To do this, the player generates the values of the minimizer’s actions if the current player takes this action, knowing that the minimizer will choose the lowest one. However, before finishing the computation for all the possible actions of the minimizer, the player sees that one of the options has a value of 3. This means that there is no reason to keep exploring the other possible actions of the minimizing player. The value of the not-yet-valued action doesn’t matter, be it 10 or -10. If the value is 10, the minimizer will choose the lowest option, 3, which is already worse than the pre-established 4. If the not-yet-valued action turned out to be -10, the minimizer would choose this option, -10, which is even more unfavorable to the maximizer. Either way, computing additional possible actions for the minimizer at this point is irrelevant to the maximizer, because the maximizing player already has an unequivocally better choice, whose value is 4.

Alpha-Beta Pruning
 General case (pruning children of a MIN node)
 We’re computing the MIN-VALUE at some node n
 We’re looping over n’s children
 n’s estimate of the children’s min is dropping
 Who cares about n’s value? MAX
 Let α be the best value that MAX can get so far at any choice point along the current path from the root
 If n becomes worse than α, MAX will avoid it, so we can prune n’s other children (it’s already bad enough that it won’t be played)

 Pruning children of a MAX node is symmetric
 Let β be the best value that MIN can get so far at any choice point along the current path from the root
Alpha-Beta Implementation

α: MAX’s best option on path to root
β: MIN’s best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v
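
Filling in the dispatching value() helper turns this into a runnable Python sketch; the nested-list tree encoding and the is_max flag are our assumptions, not part of the slide’s pseudocode:

import math

def value(node, alpha, beta, is_max):
    if isinstance(node, (int, float)):           # leaf: return its utility
        return node
    return (max_value if is_max else min_value)(node, alpha, beta)

def max_value(node, alpha, beta):
    v = -math.inf
    for child in node:
        v = max(v, value(child, alpha, beta, is_max=False))
        if v >= beta:
            return v                             # prune remaining children
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    v = math.inf
    for child in node:
        v = min(v, value(child, alpha, beta, is_max=True))
        if v <= alpha:
            return v                             # prune remaining children
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]       # the example tree from above
print(max_value(tree, -math.inf, math.inf))      # -> 3; two leaves under the middle MIN node are pruned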
Alpha-Beta Pruning Properties
 Theorem: This pruning has no effect on the minimax value computed for the root!

 Good child ordering improves the effectiveness of pruning
 Iterative deepening helps with this

 With “perfect ordering”:
 Time complexity drops to O(b^(m/2))
 Doubles the solvable depth!

 This is a simple example of metareasoning (reasoning about reasoning)

 For chess: only 35^50 instead of 35^100!! Yaaay!!!!!


Iterative Deepening Search
 If we only care about the game value, or the optimal action at a specific game state, can we do better?
 Trick 3: Iterative Deepening Search
 Minimax algorithm with a varying depth limit 𝑑
 Can be integrated with α–β pruning: use the result of a small 𝑑 to decide the move ordering when searching with a larger 𝑑 (see the sketch below)
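A sketch of that integration, assuming a depth-limited alphabeta(state, depth) evaluator and the game functions from earlier (the wiring and names are ours):

def iterative_deepening_decision(state, max_depth):
    ordered = list(actions(state))
    for d in range(1, max_depth + 1):
        # score each move with a depth-(d-1) search below it...
        scored = [(alphabeta(result(state, a), d - 1), a) for a in ordered]
        # ...then put the best moves first, so the next, deeper
        # iteration prunes more aggressively
        scored.sort(key=lambda pair: pair[0], reverse=True)
        ordered = [a for _, a in scored]
    return ordered[0]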
Resource Limits
 Problem: In realistic games, cannot search to leaves!

 Solution 1: Bounded lookahead
 Search only to a preset depth limit or horizon
 Use an evaluation function for non-terminal positions (see the sketch below)

 Guarantee of optimal play is gone

 More plies make a BIG difference

 Example:
 Suppose we have 100 seconds and can explore 10K nodes/sec
 So we can check 1M nodes per move
 Chess with alpha-beta: 35^(8/2) ≈ 1M, so depth 8 is reachable
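
A minimal sketch of bounded lookahead, assuming a caller-supplied heuristic eval_fn(state) and the game functions from earlier (the depth-limit wiring is ours):

def depth_limited_value(s, depth, eval_fn):
    if terminal(s):
        return utility(s)                 # true value at real leaves
    if depth == 0:
        return eval_fn(s)                 # heuristic estimate at the horizon
    vals = [depth_limited_value(result(s, a), depth - 1, eval_fn)
            for a in actions(s)]
    return max(vals) if player(s) == "X" else min(vals)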
