Lec4 - Adversarial Search and Games

Adversarial Search
SE-805 Advance AI – Lecture
Khawir Mahmood SE-805 AAI - Adversarial Search and Games 1

Adversarial Search
• Adversarial search problems ≡ games
• They occur in multiagent competitive environments
• There is an opponent we can’t control planning again us!
• Game vs. search: optimal solution is not a sequence of actions
but a strategy (policy)
– If opponent does a, agent does b, else if opponent does c, agent does
d, etc.
• Tedious and fragile if hard-coded (i.e., implemented with rules)
• Good news: Games are modeled as search problems and use
heuristic evaluation functions.

Games: hard topic
• Games are a big deal in AI
• Games are interesting to AI because they are hard to solve
• Chess has an average branching factor of 35, with 35100 nodes ≈
10154
• Need to make some decision even when the optimal decision is
infeasible

Adversarial Search
• Checkers:
– Chinook ended 40-year-reign of human world champion Marion
Tinsley in 1994
– Used an endgame database defining perfect play for all positions
involving 8 or fewer pieces on the board, a total of 443,748,401,247
positions

Adversarial Search
• Chess:
– IBM Deep Blue (using alpha beta search algo) defeated human world
champion Gary Kasparov in a six-game match in 1997
– Chess engine Stockfish uses multiple evaluations + searches and is
almost unbeatable

Adversarial Search
• Go: 𝑏 > 300
– Google DeepMind project AlphaGo, in 2016, beat 18-time world
champion Lee Sedol
– AlphaGo uses deep reinforcement learning for self training using
Monte Carlo Tree Search
https://www.youtube.com/watch?v=WXuK6gekU1Y
Types of Games
Deterministic chance
perfect information chess, backgammon,
checkers, go… monopoly…
imperfect information battleships, bridge, poker,
blind tictactoe… scrabble…
We are mostly interested in deterministic games, fully observable

environments, zero-sum, where two agents act alternately

Zero-sum Games
• Adversarial: Pure competition
• One person’s gain is equivalent to another’s loss
– Net change in wealth or benefit is zero
• E.g. Poker and gambling
– the sum of the amounts won by some players equals the combined
losses of the others

Game Formalization
• Game
– 𝑆0 : initial state
– 𝑃𝑙𝑎𝑦𝑒𝑟(𝑠) : returns which player to move in state 𝑠
– 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠) : returns legal moves in state 𝑠
– 𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎) : returns state after action 𝑎 taken in state 𝑠
– 𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠) : checks if state 𝑠 is a terminal state
– 𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠) : final numerical value for terminal state 𝑠

Game Formalization
• Initial state

Game Formalization
• 𝑃𝑙𝑎𝑦𝑒𝑟(𝑠)
Player( ) = X
Player( X ) = O

Game Formalization
• 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠)
X O O
Actions( O X X ) = { , }
X O O

Game Formalization
• 𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎)
X O O O X O
Result( O X X , ) = O X X
X O X O

Game Formalization
• 𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠)
O
Terminal( O X ) = false
X O X
O X
Terminal( O X ) = true
X O X

Game Formalization
• 𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠)
O X
Utility( O X ) = 1
X O X
O X X
Utility( O O ) = -1
O X O

Single Player
• In case of 1 player, nothing will prevent Max from winning

• Unless there is another player Min who will do everything to make Max lose
Minimax
• Two players: Max and Min
• Players alternate turns
– Max moves first
• Max maximizes results & Min minimizes results
• Compute each node’s minimax values
– The best achievable utility against an optimal adversary
𝑀𝑖𝑛𝑖𝑚𝑎𝑥 𝑣𝑎𝑙𝑢𝑒 ≡ 𝑏𝑒𝑠𝑡 𝑎𝑐ℎ𝑖𝑒𝑣𝑎𝑏𝑙𝑒 𝑝𝑙𝑎𝑦𝑜𝑓𝑓 𝑎𝑔𝑎𝑖𝑛𝑠𝑡 𝑏𝑒𝑠𝑡 𝑝𝑙𝑎𝑦

Minimax

Minimax
O X X X O X O X X
O O O O X X O
O X X X X O X O X
-1 0 1
• Max(X) aims to maximize score

• Min(O) aims to minimize score
Minimax
• 𝑃𝑙𝑎𝑦𝑒𝑟 𝑠 = O X O
Min-Value:
0 O X X
X O
O X O X O
Max-Value: Max-Value:
O X X O X X
1 0
X O X O O
O X X X X O
Value: Value:
O X X O X X
1 0
X X O X O O
Minimax
• 𝑃𝑙𝑎𝑦𝑒𝑟 𝑠 = X Max-Value: X O
1 O X
X O
Min-Value: X O Min-Value: X X O X O
Value:
0 O X X -1 O X O X
1
X O X O X X O
O X O X O O X O X X O
Max-Value: O X X Max-Value: O X X Value: O X O Max-Value: O X
1 0 -1 X O
0 X O O
X O X O O
O X X X X O X X O
Value: O X X Value: O X X Value: O X X
1 X X O
0 X O O
0 X O O

Minimax
• Find the optimal strategy for Max:
– Depth-first search of the game tree
– An optimal leaf node could appear at any depth of the tree
• Minimax principle:
– compute the utility of being in a state assuming both players play
optimally from there until the end of the game
– Propagate minimax values up the tree once terminal nodes are
discovered

Minimax

Minimax

Minimax

Minimax

Minimax

Minimax
• Optimal (opponent plays optimally) and complete (finite tree)
• DFS time: O(bm)
• DFS space: O(bm)
• Tic-Tac-Toe
– ≈ 5 legal moves on average, total of 9 moves (9 plies)
– 59 = 1, 953, 125
– 9! = 362, 880 terminal nodes
• Chess
– b ≈ 35 (average branching factor)
– d ≈ 100 (depth of game tree for a typical game)
– bd ≈ 35100 ≈ 10154 nodes
• Go branching factor starts at 361 (19x19 board)
Minimax
• Given a state s :
– Max picks action a in Actions(s) that produces highest value of
Min-Value(Result(s, a))
– Min picks action a in Actions(s) that produces smallest value of
Max-Value(Result(s, a))

Minimax
function Max-Value(state): function Min-Value(state):
if Terminal(state): if Terminal(state):
return Utility(state) return Utility(state)
v = -∞ v=∞
for action in Actions(state): for action in Actions(state):
v = Max(v, Min-Value( v = Min(v, Max-Value(
Result(state, action))) Result(state, action)))
return v return v

Case of limited resources
• Problem: In real games, we are limited in time, so we can’t
search the leaves
– To be practical and run in a reasonable amount of time, minimax can
only search to some depth
▪ More plies (moves) make a big difference
• Solution:
1. Replace terminal utilities with an evaluation function for non-terminal
positions
2. Use Iterative Deepening Search (IDS)
3. Use pruning: eliminate large parts of the tree

α – β pruning
• A two-ply game tree

α – β pruning
3 2 2

α – β pruning
3
3 2 2

α – β pruning
3
3 2 2
X Y
• Which values are necessary?

α – β pruning
• Strategy: Just like minimax, it performs a DFS
• Parameters: Keep track of two bounds
– α : largest value for Max across seen children (current lower bound on MAX’s
outcome)
– β : lowest value for MIN across seen children (current upper bound on MIN’s
outcome)
• Initialization: α = -∞, β = ∞
• Propagation: Send α, β values down during the search to be used for
pruning
– Update α, β values by propagating upwards values of terminal nodes
– Update α only at Max nodes and update β only at Min nodes

α – β pruning
• If α is better than a for Max, then • If β is better than b for Min, then
Max will avoid it, that is prune that Min will avoid it, that is prune that
branch branch
Move Ordering
• It does matter as it affects the effectiveness of α – β pruning

• Example: We could not prune any successor of D because the worst
successors for Min were generated first
– If the third one (leaf 2) was generated first we would have pruned the two
others (14 and 5)
• Idea of ordering: examine first successors that are likely best
Move ordering
• Worst ordering:
– no pruning happens (best moves are on the right of the game tree)
– Complexity O(bm)
• Ideal ordering:
– lots of pruning happens (best moves are on the left of the game tree)
– This solves tree twice as deep as minimax in the same amount of time
– Complexity O(bm/2)
• How to find a good ordering?
– Remember the best moves from shallowest nodes
– Order the nodes so as the best are checked first
– Use domain knowledge: e.g., for chess, try order: captures first, then threats, then
forward moves, backward moves
– Bookkeep the states, they may repeat!

Real-time decisions
• Minimax: generates the entire game search space
• α – β algorithm: prune large chunks of the trees
– BUT still has to go all the way to the leaves (impractical in real-time)
• Solution:
– bound the depth of search (cut off search) and
– replace utiliy(s) with eval(s) to estimate value of current board configurations
▪ An ideal evaluation function would rank terminal states in the same way as the true
utility function; but must be fast
▪ Use domain knowledge to craft useful features and make the function a linearly
weighted sum of these features
𝑛
𝑒𝑣𝑎𝑙 𝑠 = ෍ 𝑤𝑖 𝑓𝑖 (𝑠)
𝑖=1

Stochastic games
• Include a random element (e.g. throwing a die)
→ chance nodes
• Backgammon: old board game combining skills and chance

– The goal is that each player tries to move all of his pieces off the board before his
opponent does

Stochastic games

Stochastic games
• Algorithm Expectiminimax – generalized Minimax to handle
chance nodes
𝑈𝑡𝑖𝑙𝑖𝑡𝑦 𝑠 𝑖𝑓 𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙 − 𝑡𝑒𝑠𝑡(𝑠)

𝑚𝑎𝑥𝑎∈𝐴𝑐𝑡𝑖𝑜𝑛𝑠 𝑠 𝐸𝑥𝑝𝑒𝑐𝑡𝑖𝑚𝑖𝑛𝑖𝑚𝑎𝑥 𝑅𝑒𝑠𝑢𝑙𝑡 𝑠, 𝑎 𝑖𝑓 𝑝𝑙𝑎𝑦𝑒𝑟 𝑠 = 𝑀𝑎𝑥
Expectiminimax(s) = 𝑚𝑖𝑛𝑎∈𝐴𝑐𝑡𝑖𝑜𝑛𝑠 𝑠 𝐸𝑥𝑝𝑒𝑐𝑡𝑖𝑚𝑖𝑛𝑖𝑚𝑎𝑥 𝑅𝑒𝑠𝑢𝑙𝑡 𝑠, 𝑎 𝑖𝑓 𝑝𝑙𝑎𝑦𝑒𝑟 𝑠 = 𝑀𝑖𝑛
෍ 𝑃 𝑟 𝐸𝑥𝑝𝑒𝑐𝑡𝑖𝑚𝑖𝑛𝑖𝑚𝑎𝑥 𝑅𝑒𝑠𝑢𝑙𝑡 𝑠, 𝑟 𝑖𝑓 𝑃𝑙𝑎𝑦𝑒𝑟 𝑠 = 𝐶ℎ𝑎𝑛𝑐𝑒

𝑟

Stochastic games
• Example with coin-flipping

Credit
• Artificial Intelligence: A Modern Approach, 3rd Edition by Stuart
J. Russel and Peter Norvig
– Chapter 5

Lec4 - Adversarial Search and Games

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lec4 - Adversarial Search and Games

Uploaded by

Copyright:

Available Formats

Adversarial Search

SE-805 Advance AI – Lecture

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 1

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 2

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 3

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 4

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 5

We are mostly interested in deterministic games, fully observable

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 7

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 8

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 9

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 10

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 11

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 12

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 13

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 14

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 15

• In case of 1 player, nothing will prevent Max from winning

𝑀𝑖𝑛𝑖𝑚𝑎𝑥 𝑣𝑎𝑙𝑢𝑒 ≡ 𝑏𝑒𝑠𝑡 𝑎𝑐ℎ𝑖𝑒𝑣𝑎𝑏𝑙𝑒 𝑝𝑙𝑎𝑦𝑜𝑓𝑓 𝑎𝑔𝑎𝑖𝑛𝑠𝑡 𝑏𝑒𝑠𝑡 𝑝𝑙𝑎𝑦

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 17

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 18

• Max(X) aims to maximize score

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 21

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 22

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 23

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 24

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 25

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 26

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 27

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 29

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 30

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 31

• A two-ply game tree

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 32

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 33

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 34

• Which values are necessary?

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 35

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 36

• It does matter as it affects the effectiveness of α – β pruning

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 39

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 40

• Backgammon: old board game combining skills and chance

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 41

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 42

𝑈𝑡𝑖𝑙𝑖𝑡𝑦 𝑠 𝑖𝑓 𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙 − 𝑡𝑒𝑠𝑡(𝑠)

෍ 𝑃 𝑟 𝐸𝑥𝑝𝑒𝑐𝑡𝑖𝑚𝑖𝑛𝑖𝑚𝑎𝑥 𝑅𝑒𝑠𝑢𝑙𝑡 𝑠, 𝑟 𝑖𝑓 𝑃𝑙𝑎𝑦𝑒𝑟 𝑠 = 𝐶ℎ𝑎𝑛𝑐𝑒

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 43

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 44

Khawir Mahmood SE-805 AAI - Adversarial Search and Games 46

You might also like