
Adversarial Search

SE-805 Advanced AI – Lecture

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 1


Adversarial Search
• Adversarial search problems ≡ games
• They occur in multiagent competitive environments
• There is an opponent we can’t control who is planning against us!
• Game vs. search: optimal solution is not a sequence of actions
but a strategy (policy)
– If opponent does a, agent does b, else if opponent does c, agent does
d, etc.
• Tedious and fragile if hard-coded (i.e., implemented with rules)
• Good news: Games are modeled as search problems and use
heuristic evaluation functions.



Games: hard topic
• Games are a big deal in AI
• Games are interesting to AI because they are hard to solve
• Chess has an average branching factor of about 35, giving roughly 35^100 ≈
10^154 nodes
• Need to make some decision even when the optimal decision is
infeasible



Adversarial Search
• Checkers:
– Chinook ended 40-year-reign of human world champion Marion
Tinsley in 1994
– Used an endgame database defining perfect play for all positions
involving 8 or fewer pieces on the board, a total of 443,748,401,247
positions



Adversarial Search
• Chess:
– IBM Deep Blue (using alpha-beta search) defeated human world
champion Garry Kasparov in a six-game match in 1997
– Chess engine Stockfish uses multiple evaluations + searches and is
almost unbeatable



Adversarial Search
• Go: 𝑏 > 300
– Google DeepMind project AlphaGo, in 2016, beat 18-time world
champion Lee Sedol
– AlphaGo combines deep neural networks, trained with reinforcement
learning through self-play, with Monte Carlo Tree Search

https://www.youtube.com/watch?v=WXuK6gekU1Y
Types of Games
                        deterministic                   chance
perfect information     chess, checkers, go…            backgammon, monopoly…
imperfect information   battleships, blind tictactoe…   bridge, poker, scrabble…

We are mostly interested in deterministic, zero-sum games in fully observable
environments, where two agents act alternately



Zero-sum Games
• Adversarial: Pure competition
• One person’s gain is equivalent to another’s loss
– Net change in wealth or benefit is zero
• E.g. Poker and gambling
– the sum of the amounts won by some players equals the combined
losses of the others



Game Formalization
• Game
– 𝑆0 : initial state
– 𝑃𝑙𝑎𝑦𝑒𝑟(𝑠) : returns which player to move in state 𝑠
– 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠) : returns legal moves in state 𝑠
– 𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎) : returns state after action 𝑎 taken in state 𝑠
– 𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠) : checks if state 𝑠 is a terminal state
– 𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠) : final numerical value for terminal state 𝑠
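
The six components above can be sketched for tic-tac-toe roughly as follows. This is only an illustration: the board encoding (a tuple of nine cells, `None` for empty), the helper `winner`, and the lowercase function names are assumptions made for the example, not part of the lecture.

```python
from typing import List, Optional, Tuple

Board = Tuple[Optional[str], ...]  # 9 cells, row-major; None marks an empty cell
X, O = "X", "O"

# The eight winning lines, as index triples into the 9-cell tuple.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def initial_state() -> Board:
    """S0: the empty board."""
    return (None,) * 9

def player(s: Board) -> str:
    """Player(s): X moves first, so X is to move when the counts are equal."""
    return X if s.count(X) == s.count(O) else O

def actions(s: Board) -> List[int]:
    """Actions(s): indices of the empty cells."""
    return [i for i, cell in enumerate(s) if cell is None]

def result(s: Board, a: int) -> Board:
    """Result(s, a): the board after the player to move marks cell a."""
    return s[:a] + (player(s),) + s[a + 1:]

def winner(s: Board) -> Optional[str]:
    """Hypothetical helper: the mark that completed a line, if any."""
    for i, j, k in LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal(s: Board) -> bool:
    """Terminal(s): someone has won or the board is full."""
    return winner(s) is not None or None not in s

def utility(s: Board) -> int:
    """Utility(s): +1 if X won, -1 if O won, 0 for a draw."""
    w = winner(s)
    return 1 if w == X else -1 if w == O else 0
```

For example, `player(initial_state())` gives `"X"`, and after `result(initial_state(), 4)` it is O's turn.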



Game Formalization
• Initial state



Game Formalization
• 𝑃𝑙𝑎𝑦𝑒𝑟(𝑠)

Player(empty board) = X

Player(board containing a single X) = O



Game Formalization
• 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠)

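Actions(s) = { the two moves that place an O in one of the empty squares }
[Figure: a tic-tac-toe board s with two empty squares, O to move]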



Game Formalization
• 𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎)

Result(s, a) = the board s with an O placed in the square chosen by action a
[Figure: a board s, a move a for O, and the resulting board]



Game Formalization
• 𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠)

Terminal(board with empty squares and no three in a row) = false

Terminal(board where X has three in a row) = true



Game Formalization
• 𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠)

Utility(board where X has three in a row) = 1

Utility(board where O has three in a row) = -1



Single Player

• In case of 1 player, nothing will prevent Max from winning


• Unless there is another player Min who will do everything to make Max lose
Minimax
• Two players: Max and Min
• Players alternate turns
– Max moves first
• Max maximizes results & Min minimizes results
• Compute each node’s minimax value
– The best achievable utility against an optimal adversary

Minimax value ≡ best achievable payoff against best play

Minimax

[Figure: three terminal tic-tac-toe boards with utilities -1 (O wins), 0 (draw) and 1 (X wins)]

• Max(X) aims to maximize score


• Min(O) aims to minimize score
Minimax
• Player(s) = O
[Figure: game tree rooted at a Min node with Min-Value 0. Its two children are Max
nodes with Max-Values 1 and 0; each leads to a single terminal board with Value 1
and Value 0 respectively.]
Minimax
• Player(s) = X
[Figure: game tree rooted at a Max node with Max-Value 1. Its three children are a
Min node with Min-Value 0, a Min node with Min-Value -1, and a terminal board with
Value 1. The first Min node has Max children with values 1 and 0; the second has a
terminal child with Value -1 and a Max child with value 0. The remaining leaves are
terminal boards with Values 1, 0 and 0.]



Minimax
• Find the optimal strategy for Max:
– Depth-first search of the game tree
– An optimal leaf node could appear at any depth of the tree
• Minimax principle:
– compute the utility of being in a state assuming both players play
optimally from there until the end of the game
– Propagate minimax values up the tree once terminal nodes are
discovered

Minimax
• Optimal (if the opponent also plays optimally) and complete (if the game tree is finite)
• DFS time: O(b^m), with branching factor b and maximum depth m
• DFS space: O(bm)

• Tic-Tac-Toe
– ≈ 5 legal moves on average, total of 9 moves (9 plies)
– 5^9 = 1,953,125
– at most 9! = 362,880 terminal nodes
• Chess
– b ≈ 35 (average branching factor)
– d ≈ 100 (depth of game tree for a typical game)
– b^d ≈ 35^100 ≈ 10^154 nodes
• Go branching factor starts at 361 (19x19 board)
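
The figures above can be checked with a few lines of arithmetic. The variable names are chosen for this example only.

```python
import math

b_ttt, plies = 5, 9
rough_nodes = b_ttt ** plies             # 5^9, the rough tic-tac-toe estimate
move_orders = math.factorial(plies)      # 9!, upper bound on terminal nodes
chess_exponent = 100 * math.log10(35)    # log10 of 35^100

print(rough_nodes, move_orders, round(chess_exponent, 1))
# prints: 1953125 362880 154.4
```

The last value shows that 35^100 has an exponent of about 154 in base 10, which is where the 10^154 estimate comes from.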
Minimax
• Given a state s :
– Max picks action a in Actions(s) that produces highest value of
Min-Value(Result(s, a))
– Min picks action a in Actions(s) that produces smallest value of
Max-Value(Result(s, a))



Minimax
function Max-Value(state):
    if Terminal(state):
        return Utility(state)
    v = -∞
    for action in Actions(state):
        v = Max(v, Min-Value(Result(state, action)))
    return v

function Min-Value(state):
    if Terminal(state):
        return Utility(state)
    v = ∞
    for action in Actions(state):
        v = Min(v, Max-Value(Result(state, action)))
    return v
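
A runnable version of the two functions, applied to tic-tac-toe, might look like the sketch below. The 9-character string encoding of the board (`" "` for an empty cell), the helper `winner`, the function `minimax_decision`, and the use of `lru_cache` to avoid recomputing repeated positions are assumptions added for the example; the slide's pseudocode does not include them.

```python
import math
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def player(s):
    return "X" if s.count("X") == s.count("O") else "O"

def actions(s):
    return [i for i, c in enumerate(s) if c == " "]

def result(s, a):
    return s[:a] + player(s) + s[a + 1:]

def winner(s):
    for i, j, k in LINES:
        if s[i] != " " and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal(s):
    return winner(s) is not None or " " not in s

def utility(s):
    return {"X": 1, "O": -1, None: 0}[winner(s)]

@lru_cache(maxsize=None)          # memoization: an optimization, not on the slide
def max_value(s):
    if terminal(s):
        return utility(s)
    v = -math.inf
    for a in actions(s):
        v = max(v, min_value(result(s, a)))
    return v

@lru_cache(maxsize=None)
def min_value(s):
    if terminal(s):
        return utility(s)
    v = math.inf
    for a in actions(s):
        v = min(v, max_value(result(s, a)))
    return v

def minimax_decision(s):
    """Max (X) picks the action with the highest Min-Value, Min (O) the lowest Max-Value."""
    if player(s) == "X":
        return max(actions(s), key=lambda a: min_value(result(s, a)))
    return min(actions(s), key=lambda a: max_value(result(s, a)))

print(max_value(" " * 9))   # 0: tic-tac-toe is a draw under optimal play
```

Without the cache the code still works; it just revisits positions that can be reached by different move orders.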



Case of limited resources
• Problem: In real games, we are limited in time, so we can’t
search all the way to the leaves
– To be practical and run in a reasonable amount of time, minimax can
only search to some depth
▪ More plies (moves) make a big difference

• Solution:
1. Replace terminal utilities with an evaluation function for non-terminal
positions
2. Use Iterative Deepening Search (IDS)
3. Use pruning: eliminate large parts of the tree
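
Iterative deepening can be sketched on a toy game. Everything specific here is an assumption made for the example: the take-away game (remove 1 to 3 stones, whoever takes the last stone wins), the cutoff value 0 standing in for an evaluation function, and a node budget standing in for a clock.

```python
def depth_limited(stones, maximizing, depth, counter):
    """Depth-limited minimax: +1 / -1 for a decided position,
    0 when the depth cutoff is reached first."""
    counter[0] += 1                       # count visited nodes
    if stones == 0:                       # the previous player took the last stone
        return -1 if maximizing else 1
    if depth == 0:                        # cutoff: outcome unknown at this depth
        return 0
    values = [depth_limited(stones - k, not maximizing, depth - 1, counter)
              for k in (1, 2, 3) if k <= stones]
    return max(values) if maximizing else min(values)

def iterative_deepening(stones, node_budget):
    """Search for Max one ply deeper each round; return (move, value, depth)
    of the last completed round. The budget is only checked between rounds."""
    counter = [0]
    move, value, depth = None, 0, 0
    while counter[0] < node_budget and depth < stones and value == 0:
        depth += 1
        scored = [(depth_limited(stones - k, False, depth - 1, counter), k)
                  for k in (1, 2, 3) if k <= stones]
        value, move = max(scored, key=lambda t: t[0])  # first best move on ties
    return move, value, depth

print(iterative_deepening(5, node_budget=1000))   # (1, 1, 3)
```

With 5 stones the search needs depth 3 to see that taking one stone wins. With a tiny budget it stops after the first round and still returns a legal move, which is the point of the technique.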



α – β pruning

• A two-ply game tree

[Figure: the two-ply game tree. The three Min nodes have values 3, 2 and 2, so the
Max root has value 3. Two leaves are labelled X and Y.]

• Which values are necessary?



α – β pruning
• Strategy: Just like minimax, it performs a DFS
• Parameters: Keep track of two bounds
– α : largest value for Max across seen children (current lower bound on Max’s
outcome)
– β : lowest value for Min across seen children (current upper bound on Min’s
outcome)
• Initialization: α = -∞, β = ∞
• Propagation: Send α, β values down during the search to be used for
pruning
– Update α, β values as the values of terminal nodes propagate upwards
– Update α only at Max nodes and update β only at Min nodes



α – β pruning

• If α is better than a for Max, then Max will avoid it, that is, prune that branch
• If β is better than b for Min, then Min will avoid it, that is, prune that branch
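
A minimal sketch of α – β pruning on an explicit tree is shown below. The tree representation (nested Python lists with numeric leaves), the `visited` list used to record which leaves are evaluated, and the leaf values are assumptions for illustration; the values were picked so that the Min nodes evaluate to 3, 2 and 2.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of a nested-list tree, skipping branches that cannot matter."""
    if not isinstance(node, list):        # leaf: return its utility
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:                 # Min above will never allow this: prune
                return v
            alpha = max(alpha, v)         # alpha is updated only at Max nodes
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:                    # Max above already has something better: prune
            return v
        beta = min(beta, v)               # beta is updated only at Min nodes
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # Max root, three Min children
seen = []
print(alphabeta(tree, visited=seen))          # 3
print(seen)                                   # [3, 12, 8, 2, 14, 5, 2]
```

In the second Min node the leaves 4 and 6 are never evaluated: once the leaf 2 is seen, that node can be worth at most 2, which is worse for Max than the 3 already available.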
Move Ordering

• Move ordering matters because it affects the effectiveness of α – β pruning


• Example: We could not prune any successor of D because the worst
successors for Min were generated first
– If the third one (leaf 2) was generated first we would have pruned the two
others (14 and 5)
• Idea of ordering: examine first successors that are likely best
Move ordering
• Worst ordering:
– no pruning happens (best moves are on the right of the game tree)
– Complexity O(b^m)
• Ideal ordering:
– lots of pruning happens (best moves are on the left of the game tree)
– This searches a tree twice as deep as minimax in the same amount of time
– Complexity O(b^(m/2))
• How to find a good ordering?
– Remember the best moves from the shallowest nodes
– Order the nodes so that the best are checked first
– Use domain knowledge: e.g., for chess, try this order: captures first, then threats, then
forward moves, then backward moves
– Keep a record of visited states, since they may repeat!
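
The effect of ordering can be measured by counting evaluated leaves. The sketch below assumes a nested-list tree and a hypothetical helper `alphabeta_count`; the leaf values are illustrative, with the last Min node holding 14, 5 and 2 as in the example of node D.

```python
import math

def alphabeta_count(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Return (value, number_of_leaves_evaluated) for a nested-list game tree."""
    if not isinstance(node, list):
        return node, 1
    best = -math.inf if maximizing else math.inf
    leaves = 0
    for child in node:
        value, n = alphabeta_count(child, alpha, beta, not maximizing)
        leaves += n
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:          # remaining siblings cannot change the result
            break
    return best, leaves

as_generated = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
reordered = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]   # Min's best reply in D comes first

print(alphabeta_count(as_generated))   # (3, 7)
print(alphabeta_count(reordered))      # (3, 5)
```

Both orderings give the same root value, but generating leaf 2 first in the last Min node lets the search skip 5 and 14.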



Real-time decisions
• Minimax: generates the entire game search space
• α – β algorithm: prune large chunks of the trees
– BUT still has to go all the way to the leaves (impractical in real-time)
• Solution:
– bound the depth of search (cut off search) and
– replace utility(s) with eval(s) to estimate the value of the current board configuration
▪ An ideal evaluation function would rank terminal states in the same way as the true
utility function, but it must also be fast to compute
▪ Use domain knowledge to craft useful features and make the function a linearly
weighted sum of these features

$$\mathrm{eval}(s) = \sum_{i=1}^{n} w_i f_i(s)$$
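
As an illustration of a cut-off search with a linear evaluation function, consider the tic-tac-toe sketch below. The two features (lines still open for X, lines still open for O), the weights 0.1 and -0.1, the string board encoding, and the names `h_minimax` and `best_move` are all assumptions made for this example. The weights are kept small so that eval(s) stays strictly between the true utilities -1 and 1.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def player(s):
    return "X" if s.count("X") == s.count("O") else "O"

def actions(s):
    return [i for i, c in enumerate(s) if c == " "]

def result(s, a):
    return s[:a] + player(s) + s[a + 1:]

def winner(s):
    for i, j, k in LINES:
        if s[i] != " " and s[i] == s[j] == s[k]:
            return s[i]
    return None

def features(s):
    """f1: lines X can still complete; f2: lines O can still complete."""
    open_x = sum(1 for line in LINES if all(s[i] != "O" for i in line))
    open_o = sum(1 for line in LINES if all(s[i] != "X" for i in line))
    return [open_x, open_o]

WEIGHTS = [0.1, -0.1]   # hypothetical weights w_i

def evaluate(s):
    """eval(s) = sum_i w_i * f_i(s)."""
    return sum(w * f for w, f in zip(WEIGHTS, features(s)))

def h_minimax(s, depth):
    """Minimax that uses the true utility at terminal states and eval(s) at the cutoff."""
    w = winner(s)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in s:
        return 0
    if depth == 0:
        return evaluate(s)
    values = [h_minimax(result(s, a), depth - 1) for a in actions(s)]
    return max(values) if player(s) == "X" else min(values)

def best_move(s, depth):
    pick = max if player(s) == "X" else min
    return pick(actions(s), key=lambda a: h_minimax(result(s, a), depth - 1))

print(best_move(" " * 9, 1))   # 4: the centre blocks the most lines for O
```

With a one-ply search the evaluation alone decides the move; deeper searches rely on it less and on real terminal utilities more.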



Stochastic games
• Include a random element (e.g. throwing a die)
→ chance nodes

• Backgammon: old board game combining skill and chance
– The goal is for each player to move all of their pieces off the board before the
opponent does

Stochastic games
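• Algorithm Expectiminimax – generalizes Minimax to handle
chance nodes

$$
\mathrm{Expectiminimax}(s) =
\begin{cases}
\mathrm{Utility}(s) & \text{if } \mathrm{Terminal\text{-}test}(s) \\
\max_{a \in \mathrm{Actions}(s)} \mathrm{Expectiminimax}(\mathrm{Result}(s, a)) & \text{if } \mathrm{Player}(s) = \mathrm{Max} \\
\min_{a \in \mathrm{Actions}(s)} \mathrm{Expectiminimax}(\mathrm{Result}(s, a)) & \text{if } \mathrm{Player}(s) = \mathrm{Min} \\
\sum_{r} P(r)\, \mathrm{Expectiminimax}(\mathrm{Result}(s, r)) & \text{if } \mathrm{Player}(s) = \mathrm{Chance}
\end{cases}
$$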
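
The four cases can be sketched on a small explicit tree. The node encoding (`("max", children)`, `("min", children)`, `("chance", [(probability, child), ...])`, numbers as terminal utilities) and the leaf values are assumptions for this example, which models Max choosing between two fair coin flips, each followed by a reply from Min.

```python
def expectiminimax(node):
    """Value of a node: utility at leaves, max/min at player nodes,
    probability-weighted average at chance nodes."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")

tree = ("max", [
    # left coin: heads -> min(2, 4) = 2, tails -> min(7, 4) = 4, expected value 3.0
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))]),
    # right coin: heads -> min(6, 0) = 0, tails -> min(5, -2) = -2, expected value -1.0
    ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, -2]))]),
])

print(expectiminimax(tree))   # 3.0
```

Max prefers the left coin because its expected value (3.0) is higher, even though the right branch contains the larger leaves 6 and 5.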



Stochastic games
• Example with coin-flipping

Credit
• Artificial Intelligence: A Modern Approach, 3rd Edition by Stuart
J. Russell and Peter Norvig
– Chapter 5
