Adversarial search
Artificial Intelligence
Chapter 6, AIMA
This presentation owes a lot to V. Pavlovic @ Rutgers, who borrowed from J. D. Skrentny, who in turn borrowed from C. Dyer,...

Adversarial search
• At least two agents and a competitive environment: games, economies.
• Games and AI:
– Generally considered to require intelligence (to win)
– Have to evolve in real time
– Well-defined and limited environment

Board games © Thierry Dichtenmuller

Games & AI

                 Deterministic                  Chance
perfect info     Checkers, Chess, Go, Othello   Backgammon, Monopoly
imperfect info   –                              Bridge, Poker, Scrabble

Games and search
Traditional search: a single agent searches for its own well-being, unobstructed.
Games: search against an opponent.
Example: two-player board game (chess, checkers, tic-tac-toe,…)
Board configuration: unique arrangement of "pieces"

Representing board games as a goal-directed search problem (states = board configurations):
– Initial state: current board configuration
– Successor function: legal moves
– Goal state: winning/terminal board configuration
– Utility function: numeric value of a terminal board configuration for the player

Example: Tic-tac-toe
• Initial state: 3×3 empty table.
• Successor function: players take turns marking X or O in the table cells.
• Goal state: when all the table cells are filled, or when either player has three symbols in a row.
• Utility function: +1 for three in a row, -1 if the opponent has three in a row, 0 if the table is filled and no one has three symbols in a row.
[Figure: example boards: the initial state (empty table) and three goal states with Utility = 0, Utility = +1 and Utility = -1]
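The tic-tac-toe formulation above can be sketched in Python as follows. This is a minimal sketch under our own assumptions: a board is a tuple of nine cells read row by row, MAX plays X and moves first, and the names `INITIAL_STATE`, `successors`, `is_terminal` and `utility` are illustrative. They are not taken from the slides.

```python
from typing import List, Optional, Tuple

Board = Tuple[str, ...]  # 9 cells, row by row: "X", "O" or " " (empty)

# The eight lines that count as "three in a row".
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

INITIAL_STATE: Board = (" ",) * 9           # 3x3 empty table


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def to_move(board: Board) -> str:
    """Assumption: X (MAX) moves first, so X is to move when the counts are equal."""
    return "X" if board.count("X") == board.count("O") else "O"


def is_terminal(board: Board) -> bool:
    """Goal state: someone has three in a row, or the table is full."""
    return winner(board) is not None or " " not in board


def successors(board: Board) -> List[Board]:
    """Legal moves: the player to move marks any empty cell."""
    if is_terminal(board):
        return []
    player = to_move(board)
    return [board[:i] + (player,) + board[i + 1:]
            for i, cell in enumerate(board) if cell == " "]


def utility(board: Board) -> int:
    """+1 if X (MAX) has three in a row, -1 if O (MIN) has, 0 otherwise."""
    return {"X": 1, "O": -1, None: 0}[winner(board)]


print(len(successors(INITIAL_STATE)))      # 9
print(utility(tuple("XXXOO    ")))         # 1
```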

The minimax principle
Assume the opponent plays to win and always makes the best possible move.

The minimax value for a node = the utility for you of being in that state, assuming that both players (you and the opponent) play optimally from there on to the end.

Terminology:
MAX = you, MIN = the opponent.

Example: Tic-tac-toe
[Figure: a tic-tac-toe position with three empty cells; it is your (MAX) move, playing X]
Assignment: Expand this tree to the end of the game.

Example: Tic-tac-toe
[Figure: the game tree below the position above, with the levels: your (MAX) move, opponent (MIN) move, your (MAX) move. The six final boards have Utility = +1, 0, 0, 0, 0, +1, and these utilities become the minimax values of the nodes at the last MAX level.]

Example: Tic-tac-toe
[Figure: the same tree with the minimax values backed up. Each opponent (MIN) node takes the minimum of its children (0, 0, 0), and the root (your MAX move) takes the maximum, so its minimax value is 0.]

The minimax value

$$
\text{Minimax}(n) =
\begin{cases}
\text{Utility}(n) & \text{if } n \text{ is a terminal node} \\
\max_{s \in \text{Successors}(n)} \text{Minimax}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{Successors}(n)} \text{Minimax}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
$$

High utility favours you (MAX), therefore choose the move with the highest utility.
Low utility favours the opponent (MIN), therefore choose the move with the lowest utility.

The minimax algorithm
1. Start with the utilities of the terminal nodes.
2. Propagate them back to the root node by choosing the minimax strategy.
[Figure: MAX root A (minimax value 1) with MIN children B, C, D, E (minimax values -5, -6, 0, 1) and terminal nodes F–O (utilities 7, -5, 3, 9, -6, 0, 2, 1, 3, 2)]
Figure borrowed from V. Pavlovic
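The recursive definition above translates almost line by line into code. The sketch below applies it to the full tic-tac-toe game (the assignment on the earlier slide), assuming that MAX plays X and moves first and that a board is a 9-character string. The memo cache (`functools.lru_cache`) is our addition and is not on the slides. It only avoids re-solving boards that are reached by different move orders and does not change any minimax value.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)  # memoize: many move orders reach the same board
def minimax_value(board):
    """Minimax value of a board (9-character string) from MAX's (X's) point of view."""
    w = winner(board)
    if w is not None:                                 # terminal: three in a row
        return 1 if w == "X" else -1
    if " " not in board:                              # terminal: table filled, no winner
        return 0
    is_max = board.count("X") == board.count("O")     # X (MAX) moves first
    player = "X" if is_max else "O"
    values = [minimax_value(board[:i] + player + board[i + 1:])
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if is_max else min(values)     # MAX node vs MIN node


def best_move(board):
    """Cell index MAX (X) should mark: the successor with the highest minimax value."""
    moves = [i for i, cell in enumerate(board) if cell == " "]
    return max(moves, key=lambda i: minimax_value(board[:i] + "X" + board[i + 1:]))


print(minimax_value(" " * 9))      # 0: perfect play from the empty table is a draw
print(best_move("XX OO    "))      # 2: completes the top row
```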


Complexity of minimax algorithm
• A depth-first search (b = branching factor, d = depth of the game tree)
– Time complexity O(b^d)
– Space complexity O(b·d)
• The time complexity makes a complete search impossible in real games (with time constraints), except in very simple games (e.g. tic-tac-toe)

Strategies to improve minimax
1. Remove redundant search paths
- symmetries
2. Remove uninteresting search paths
- alpha-beta pruning
3. Cut the search short before the goal
- evaluation functions
4. Book moves
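To see why the time complexity rules out a complete search, plug in rough numbers. The estimate below assumes the figures commonly quoted in AIMA for chess: an average branching factor of about 35, and games of about 50 moves per player, so d ≈ 100 plies.

$$
b^{d} \approx 35^{100} = 10^{100 \log_{10} 35} \approx 10^{154.4} \approx 2.6 \times 10^{154}
$$

For tic-tac-toe, by contrast, b ≤ 9 and d ≤ 9. There are therefore at most 9! = 362,880 complete move sequences, which is small enough to search exhaustively.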

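1. Remove redundant paths
Tic-tac-toe has mirror symmetries and rotational symmetries, so boards that are rotations or reflections of each other are equivalent and only need to be searched once.
[Figure: four boards shown as equal (=) under rotation/reflection]
[Figure: first three levels of the tic-tac-toe state space reduced by symmetry: 3 states after the first move (instead of 9), 12 states after the second move (instead of 8·9 = 72). Image from G. F. Luger, "Artificial Intelligence", 2002]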
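The symmetry reduction can be checked with a short sketch that maps every board to a canonical representative: the smallest of its 8 rotated/mirrored variants. The board encoding (a tuple of 9 cells with "." for empty) and the helper names are our own assumptions. Counting distinct canonical boards reproduces the 3 and 12 states in the figure.

```python
def rotate(board):
    """Rotate a 3x3 board (tuple of 9 cells, row by row) 90 degrees clockwise."""
    return tuple(board[3 * (2 - c) + r] for r in range(3) for c in range(3))


def mirror(board):
    """Reflect the board left to right."""
    return tuple(board[3 * r + (2 - c)] for r in range(3) for c in range(3))


def canonical(board):
    """Smallest of the 8 symmetric variants: 4 rotations, each also mirrored."""
    variants = []
    for _ in range(4):
        variants.extend([board, mirror(board)])
        board = rotate(board)
    return min(variants)


def place(board, i, mark):
    """Return a copy of the board with `mark` in cell i."""
    return board[:i] + (mark,) + board[i + 1:]


EMPTY = (".",) * 9
level1 = {canonical(place(EMPTY, i, "X")) for i in range(9)}
level2 = {canonical(place(place(EMPTY, i, "X"), j, "O"))
          for i in range(9) for j in range(9) if j != i}
print(len(level1), len(level2))  # 3 12
```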

2. Remove uninteresting paths
If the player has a better choice m at n's parent node, or at any node further up, then node n will never be reached.

Prune the entire path below node m's parent node (except for the path that m belongs to, and paths that are equal to this path).

Minimax is depth-first → keep track of the highest (α) and lowest (β) values so far.

This is called alpha-beta pruning.

Alpha-Beta Example
minimax(A,0,4)
minimax(node, level, depth limit)
[Figure: example tree, expanded left to right. Levels alternate max (A), min (B–E), max (F–M), min (N–V); W and X are at level 4, the depth limit, and are treated as terminal states.
A → B, C, D = 0, E
B → F, G = -5;  C → H = -10, I = 8, J;  E → K, L = 2, M
F → N = 4, O;  J → P = 9, Q = -6, R = 0;  K → S = 3, T = 5;  M → U = -7, V = -9
O → W = -3, X = -5
Call stack: A. At A: α=?]
Slide adapted from V. Pavlovic
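The walkthrough that follows can be reproduced with a small alpha-beta sketch. The tree is our reconstruction from the slide figures, and the names `CHILDREN`, `VALUE` and `visited` are illustrative. Two further assumptions apply. First, the sketch cuts off on ties (α ≥ β), a common variant of the slides' strict rule that gives the same root value. Second, it only stores values for the tree's leaves, whereas a real program would call an evaluation function at the depth limit.

```python
import math

# Example tree (children left to right). MAX moves at even levels, MIN at odd levels.
CHILDREN = {
    "A": ["B", "C", "D", "E"],
    "B": ["F", "G"], "C": ["H", "I", "J"], "E": ["K", "L", "M"],
    "F": ["N", "O"], "J": ["P", "Q", "R"], "K": ["S", "T"], "M": ["U", "V"],
    "O": ["W", "X"],
}
# Values of the terminal states; W and X sit at the depth limit (level 4).
VALUE = {"D": 0, "G": -5, "H": -10, "I": 8, "L": 2, "N": 4, "P": 9, "Q": -6,
         "R": 0, "S": 3, "T": 5, "U": -7, "V": -9, "W": -3, "X": -5}


def minimax(node, level, depth_limit, alpha=-math.inf, beta=math.inf, visited=None):
    """Depth-limited minimax with alpha-beta pruning: minimax(node, level, depth limit)."""
    if visited is not None:
        visited.append(node)                  # record the order in which nodes are expanded
    if node not in CHILDREN or level == depth_limit:
        return VALUE[node]                    # terminal state or depth limit reached
    is_max = level % 2 == 0
    best = -math.inf if is_max else math.inf
    for child in CHILDREN[node]:
        value = minimax(child, level + 1, depth_limit, alpha, beta, visited)
        if is_max:
            best = max(best, value)
            alpha = max(alpha, best)          # highest value MAX is assured of so far
        else:
            best = min(best, value)
            beta = min(beta, best)            # lowest value MIN is assured of so far
        if alpha >= beta:                     # cut-off: stop expanding this node
            break
    return best


visited = []
print(minimax("A", 0, 4, visited=visited))   # 0
print(visited)  # X is cut off below O; I and J (with P, Q, R) are cut off below C
```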

Alpha-Beta Example
minimax(B,1,4)
[Figure: call stack A, B; B (min): β=?]

Alpha-Beta Example
minimax(F,2,4)
[Figure: call stack A, B, F; F (max): α=?]

Alpha-Beta Example
minimax(N,3,4)
[Figure: call stack A, B, F, N; N is a terminal state with value 4]

Alpha-Beta Example
minimax(F,2,4) is returned to
alpha = 4, maximum seen so far
[Figure: call stack A, B, F; F: α=4]

Alpha-Beta Example
minimax(O,3,4)
[Figure: call stack A, B, F, O; O (min): β=?]

Alpha-Beta Example
minimax(W,4,4)
[Figure: call stack A, B, F, O, W; W is a terminal state (depth limit) with value -3]

Alpha-Beta Example
minimax(O,3,4) is returned to
beta = -3, minimum seen so far
[Figure: call stack A, B, F, O; O: β=-3]

Alpha-Beta Example
O's beta (-3) < F's alpha (4): Stop expanding O (alpha cut-off)
[Figure: X is not expanded]

Alpha-Beta Example
Why?
A smart opponent selects W or worse → O's upper bound is -3.
So MAX shouldn't select O (worth -3 at most), since N (worth 4) is better.

Alpha-Beta Example
minimax(F,2,4) is returned to
alpha not changed (maximizing)
[Figure: call stack A, B, F; F: α=4]

Alpha-Beta Example
minimax(B,1,4) is returned to
beta = 4, minimum seen so far
[Figure: call stack A, B; B: β=4]

Alpha-Beta Example
minimax(G,2,4)
[Figure: call stack A, B, G; G is a terminal state with value -5]

Alpha-Beta Example
minimax(B,1,4) is returned to
beta = -5, minimum seen so far
[Figure: call stack A, B; B: β=-5]

Alpha-Beta Example
minimax(A,0,4) is returned to
alpha = -5, maximum seen so far
[Figure: call stack A; A: α=-5]

Alpha-Beta Example
minimax(C,1,4)
[Figure: call stack A, C; A: α=-5; C (min): β=?]

Alpha-Beta Example
minimax(H,2,4)
[Figure: call stack A, C, H; H is a terminal state with value -10]

Alpha-Beta Example
minimax(C,1,4) is returned to
beta = -10, minimum seen so far
[Figure: call stack A, C; C: β=-10]

Alpha-Beta Example
C's beta (-10) < A's alpha (-5): Stop expanding C (alpha cut-off)
[Figure: I and J (with J's children P, Q, R) are not expanded]

Alpha-Beta Example
minimax(D,1,4)
[Figure: call stack A, D; D is a terminal state with value 0]

Alpha-Beta Example
minimax(A,0,4) is returned to
alpha = 0, maximum seen so far
[Figure: call stack A; A: α=0]

Alpha-Beta Example
minimax(A,0,4) is returned to
Which nodes will be expanded?
[Figure: A: α=0; E and its subtree (K, L, M and their children) remain to be searched, left to right]

Alpha-Beta Example
All of them.
[Figure: K: α=5; M: α=-7; E: β=-7. No cut-off occurs below E, and A keeps α=0]

Alpha-Beta Example
minimax(A,0,4) is returned to
What if we expand from right to left?
[Figure: A: α=0; E's subtree is searched in the order M, L, K]

Alpha-Beta Example
Only 4 nodes are expanded.
[Figure: M: α=-7; E: β=-7 < A's α=0, so L and K (with S and T) are not expanded]
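The effect of expansion order on E's subtree can be checked with a small sketch. It assumes that A has already secured α = 0 when E is searched, and uses our reconstruction of E's subtree from the figures. The `reverse` flag is an illustrative parameter, not something from the slides.

```python
import math

# E's subtree from the example (children listed left to right).
CHILDREN = {"E": ["K", "L", "M"], "K": ["S", "T"], "M": ["U", "V"]}
VALUE = {"L": 2, "S": 3, "T": 5, "U": -7, "V": -9}


def alphabeta(node, is_max, alpha, beta, expanded, reverse=False):
    """Alpha-beta search that records every node it expands, in order."""
    expanded.append(node)
    if node not in CHILDREN:
        return VALUE[node]
    children = CHILDREN[node][::-1] if reverse else CHILDREN[node]
    best = -math.inf if is_max else math.inf
    for child in children:
        value = alphabeta(child, not is_max, alpha, beta, expanded, reverse)
        if is_max:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # cut-off: the remaining children cannot change the choice at A
    return best


for reverse in (False, True):
    expanded = []
    # E is a MIN node; alpha = 0 is what A has secured through D.
    value = alphabeta("E", False, 0, math.inf, expanded, reverse)
    print("right-to-left" if reverse else "left-to-right", value, expanded)
# left-to-right -7 ['E', 'K', 'S', 'T', 'L', 'M', 'U', 'V']
# right-to-left -7 ['E', 'M', 'V', 'U']
```

Both orders give E the same value, but searching the strongest MIN reply (M) first triggers the cut-off at once. This is why good move ordering makes alpha-beta much more effective.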

Alpha-Beta pruning rule
Stop expanding
– a max node n if α(n) > β higher in the tree
– a min node n if β(n) < α higher in the tree
[Figure: exercise tree, shown twice with the α and β values filled in step by step]

Which nodes will not be expanded when expanding from left to right?

Alpha-Beta pruning rule
[Figure: the exercise tree with all α and β values for left-to-right expansion]
Which nodes will not be expanded when expanding from left to right?

Alpha-Beta pruning rule
[Figure: the same exercise tree, to be expanded from right to left]
Which nodes will not be expanded when expanding from right to left?

Alpha-Beta pruning rule
[Figure: the exercise tree with the α and β values obtained when expanding from right to left]
Which nodes will not be expanded when expanding from right to left?

3. Cut the search short
• Use a depth limit and estimate the utility of non-terminal nodes (evaluation function)
– Static board evaluation (SBE)
– Must be easy to compute

Example, chess:

$$
\text{SBE} = \alpha \cdot \text{Material Balance} + \beta \cdot \text{Center Control} + \gamma \cdot \ldots
$$

Material balance = value of white pieces – value of black pieces, where
pawn = +1, knight & bishop = +3, rook = +5, queen = +9, king = ?

The parameters (α, β, γ, ...) are weights, unrelated to the α and β of alpha-beta pruning. They can be learned (adjusted) from experience.

http://en.wikipedia.org/wiki/Computer_chess
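The SBE formula above is a weighted sum of features, and material balance is the simplest of them. The sketch below assumes the pieces are given as letters in FEN style (uppercase for white, lowercase for black). It leaves the king at 0, since the slide leaves its value open. The "center_control" feature value and all the weights are hypothetical placeholders for the learned parameters (α, β, ...).

```python
# Piece values from the slide; the king is left out ("king = ?").
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_balance(pieces):
    """Value of white pieces minus value of black pieces.

    Assumption: `pieces` lists the pieces on the board as FEN-style letters,
    uppercase for white and lowercase for black (e.g. "KQkr").
    """
    balance = 0
    for letter in pieces:
        value = PIECE_VALUES.get(letter.lower(), 0)   # kings count as 0 here
        balance += value if letter.isupper() else -value
    return balance


def sbe(features, weights):
    """Static board evaluation: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())


start = "RNBQKBNRPPPPPPPP" + "rnbqkbnrpppppppp"
print(material_balance(start))                  # 0
print(material_balance("KQkr"))                 # 4  (queen 9 vs rook 5)

# Hypothetical feature values and weights (playing the role of α, β, ...).
features = {"material": material_balance("KQkr"), "center_control": 2}
weights = {"material": 10, "center_control": 1}
print(sbe(features, weights))                   # 42
```

In a depth-limited search, a function like `sbe` would be called wherever the depth limit is reached, in place of the true utility.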

Leaf evaluation
For most chess positions, computers cannot look ahead to all final possible positions. Instead, they must look ahead a few plies and then evaluate the final board position. The algorithm that evaluates final board positions is termed the "evaluation function", and these algorithms often differ vastly between chess programs.

Nearly all evaluation functions evaluate positions in units and at the least consider material value. Thus, they count up the amount of material on the board for each side (where a pawn is worth exactly 1 point, a knight 3 points, a bishop 3 points, a rook 5 points and a queen 9 points). The king is impossible to value since its loss causes the loss of the game. For the purposes of programming chess computers, however, it is often assigned a value of approximately 200 points.

Evaluation functions also take many other factors into account, such as pawn structure, the fact that a pair of bishops is usually worth more, that centralized pieces are worth more, and so on. The protection of kings is usually considered, as well as the phase of the game (opening, middlegame or endgame).

4. Book moves
• Build a database (look-up table) of endgames, openings, etc.
• Use this instead of minimax when possible.
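Book moves amount to a table look-up placed in front of the search. The sketch below is hypothetical throughout. The book is a tiny dict keyed on the moves played so far (standard algebraic notation), and `search` stands in for whatever depth-limited alpha-beta search the program uses. Keying on the position itself instead would also recognise transpositions.

```python
# Hypothetical opening book: moves played so far -> book reply.
OPENING_BOOK = {
    (): "e4",
    ("e4", "e5"): "Nf3",
    ("e4", "c5"): "Nf3",
}


def choose_move(history, search):
    """Return a book move when the position is in the table, otherwise search."""
    book_move = OPENING_BOOK.get(tuple(history))
    if book_move is not None:
        return book_move          # look-up instead of minimax
    return search(history)        # fall back to (depth-limited) alpha-beta search


print(choose_move([], search=lambda h: "searched"))            # e4
print(choose_move(["d4", "d5"], search=lambda h: "searched"))  # searched
```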

http://en.wikipedia.org/wiki/Computer_chess

Using endgame databases
Nalimov Endgame Tablebases, which use state-of-the-art compression techniques, require 7.05 GB of hard disk space for all five-piece endings. It is estimated that covering all the six-piece endings will require at least 1 terabyte. Seven-piece tablebases are currently a long way off.

While Nalimov Endgame Tablebases handle en passant positions, they assume that castling is not possible. This is a minor flaw that is probably of interest only to endgame specialists.

More importantly, they do not take into account the fifty-move rule. Nalimov Endgame Tablebases only store the shortest possible mate (by ply) for each position. However, in certain rare positions, the stronger side cannot force a win before running into the fifty-move rule. A chess program that searches the database and obtains a value of mate in 85 will not be able to know in advance whether such a position is actually a draw according to the fifty-move rule, or a win because there will be a piece exchange or pawn move along the way. Various solutions, including the addition of a "distance to zero" (DTZ) counter, have been proposed to handle this problem, but none have been implemented yet.

Games with chance
• Dice games, card games,...
• Extend the minimax tree with chance layers.
– Compute the expected value over outcomes.
– Select the move with the highest expected value.
[Figure: MAX root A (α=4) over two chance nodes. Each chance node has two 50/50 outcomes (probability .5 each) leading to MIN nodes: B (β=2) and C (β=6) under the first, D (β=0) and E (β=-4) under the second, over the terminal values 7, 2 | 9, 6 | 5, 0 | 8, -4. The chance nodes get the expected values 4 and -2.]

Animation adapted from V. Pavlovic
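Extending minimax with chance layers (often called expectiminimax) only adds one case to the recursion: a chance node returns the probability-weighted average of its outcomes. The sketch below encodes the example tree as nested tuples, a representation we chose for illustration, and reproduces the values in the figure.

```python
# Example tree from the slide: MAX root over two chance nodes (50/50),
# each outcome leading to a MIN node with two terminal children.
TREE = ("max", [
    ("chance", [(0.5, ("min", [7, 2])), (0.5, ("min", [9, 6]))]),
    ("chance", [(0.5, ("min", [5, 0])), (0.5, ("min", [8, -4]))]),
])


def expectiminimax(node):
    """Minimax extended with chance nodes, which take the expected value."""
    if isinstance(node, (int, float)):          # terminal utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average over the outcomes
    return sum(p * expectiminimax(c) for p, c in children)


print([expectiminimax(c) for c in TREE[1]])     # [4.0, -2.0]
print(expectiminimax(TREE))                     # 4.0
```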
