
Artificial Intelligence:

Adversarial Search

Instructor: Nguyễn Văn Vinh, UET - Hanoi VNU


2022

Warm Up

How would you search for moves in Tic Tac Toe?





Warm Up

How is Tic Tac Toe different from maze search?

 Tic Tac Toe: multi-agent, adversarial, zero-sum
 Maze search: single-agent


Single-Agent Trees

[Figure: a single-agent search tree; terminal utilities include 2, 0, …, 2, 6, …, 4, 6]
Value of a State

The value of a state, V(s), is the best achievable outcome (utility) from that state.

Non-terminal states:
  V(s) = max_{s' ∈ successors(s)} V(s')

Terminal states:
  V(s) = known

[Figure: the single-agent tree again, with known terminal utilities 2, 0, …, 2, 6, …, 4, 6]
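As a minimal sketch in Python, the recursion reads directly off this definition (successors, is_terminal, and utility are assumed helpers, not defined in the slides):

def value(s):
    # Terminal states have a known utility
    if is_terminal(s):
        return utility(s)
    # Otherwise: the best achievable outcome among successors
    return max(value(s2) for s2 in successors(s))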
Multi-Agent Applications

 Football: adversarial between teams, collaborative within a team
 Collaborative maze solving
 Competitions: adversarial


How could we model multi-agent collaborative problems?



How could we model multi-agent problems?

Simplest idea: each agent plans their own actions separately from
others.



Many Single-Agent Trees

Choose the best action for each agent independently.

Non-terminal states:
  V(s) = max_{s' ∈ successors(s)} V(s')

[Figure: several single-agent trees, one per agent, with terminal utilities 2, 0, …, 2, 6, …, 4, 6]
Idea 2: Joint State/Action Spaces

 Combine the states and actions of the N agents: the joint start state is S_0 = (S_0^A, S_0^B), and in general S_K = (S_K^A, S_K^B)
 Search looks through all combinations of all agents' states and actions
 Think of one brain controlling many agents
 What is the size of the state space?
 What is the size of the action space?
 What is the size of the search tree? (see the estimate below)
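A rough way to answer these questions (assuming each of the N agents has |S| local states and b individual actions, with search horizon m):

  joint state space: |S|^N
  joint action space: b^N (one joint action per combination of individual actions)
  search tree: branching factor b^N over m levels, so about (b^N)^m = b^(N·m) nodes

For example, two agents with b = 10 actions each already face 10^2 = 100 joint actions at every step.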


Idea 3: Centralized Decision Making

Each agent proposes its actions and a central computer confirms the joint plan.
Example: autonomous driving through intersections

https://www.youtube.com/watch?v=4pbAI40dK0A



Idea 4: Alternate Searching One Agent at a Time

 Search one agent's actions from a state, search the next agent's actions from those resulting states, etc.
 Choose the best cascading combination of actions; tree levels alternate Agent 1, Agent 2, Agent 1, …
 Non-terminal states: V(s) = max_{s' ∈ successors(s)} V(s')
 What is the size of the state space?
 What is the size of the action space?
 What is the size of the search tree? (compare with Idea 2 in the estimate below)
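Under the same assumptions as before (N agents, b individual actions each, horizon m), one way to compare with Idea 2:

  state space: |S|^N joint configurations, plus a marker for whose turn it is
  action space: b at each node (only one agent moves per level)
  search tree: depth grows to N·m levels, so about b^(N·m) nodes in total

The total work is the same as Idea 2, but the branching factor per level drops from b^N to b, which matters for pruning and for bounded lookahead.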




Games



Why study games?

 Games are a traditional hallmark of intelligence
 Games are easy to formalize
 Games can be a good model of real-world competitive or cooperative activities
  Military confrontations, negotiation, auctions, etc.
 Games have been a key driver of new techniques in CS and AI

Types of Games

 Deterministic or stochastic?
 Perfect information (fully observable)?
 One, two, or more players?
 Turn-taking or simultaneous?
 Zero sum?



A Special Case

 Two-Player Sequential Zero-Sum Perfect-Information Games

 Two-Player
 Two players involved
 Sequential
 Players take turns to move (take actions)
 Act alternately
 Zero-sum
 Utility values of the two players at the end of the game have equal absolute value
and opposite sign
 Perfect Information
 Each player can fully observe what actions other agents have taken



Tic-Tac-Toe

 Two players, X and O, take turns marking the spaces in a 3×3 grid
 The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row wins the game

https://en.wikipedia.org/wiki/Tic-tac-toe
Chinese Chess

 https://www.chessprogramming.org/Pham_Hong_Nguyen
 https://vnexpress.net/giai-co-tuong-may-tinh-lan-dau-tien-o-viet-
nam-1773345.html



Chess

https://en.wikipedia.org/wiki/Chess
Chess

Garry Kasparov vs Deep Blue (1996)

Result, from Deep Blue's side: win-loss-draw-draw-loss-loss
(Kasparov won the match 4–2; Deep Blue played white in the odd-numbered games)




Go

 DeepMind promotional video before the match with Lee Sedol:
https://www.youtube.com/watch?v=SUbqykXVx0A
Go

AlphaGo vs Lee Sedol (3/2016)
  Result: win-win-win-loss-win

AlphaGo Zero vs AlphaGo (2017)
  Result: 100-0
  https://deepmind.com/blog/alphago-zero-learning-scratch/

AlphaGo paper: https://www.nature.com/articles/nature16961.pdf
AlphaGo Zero paper: www.nature.com/articles/nature24270.pdf
Zero-Sum Games

Zero-Sum Games
 Agents have opposite utilities
 Pure competition: one maximizes, the other minimizes

General Games
 Agents have independent utilities
 Cooperation, indifference, competition, shifting alliances, and more are all possible


Game Trees

Search one agent's actions from a state, search the competitor's actions from those resulting states, etc.


Tic-Tac-Toe Game Tree



A subtree

[Figure: a subtree of the Tic-Tac-Toe game tree, expanded from a midgame position down to terminal boards labeled win, lose, or draw]
What is a good move?

[Figure: the same Tic-Tac-Toe subtree, with terminal boards labeled win, lose, or draw]
Tic-Tac-Toe Game Tree

This is a zero-sum game: the best action for X is the worst action for O, and vice versa.

How do we define best and worst?


Tic-Tac-Toe Game Tree

Instead of taking the max utility at every level, alternate max and min.


Tic-Tac-Toe Minimax

MAX nodes (under Agent's control):
  V(s) = max_{s' ∈ successors(s)} V(s')

MIN nodes (under Opponent's control):
  V(s) = min_{s' ∈ successors(s)} V(s')


Small Pacman Example

MAX nodes (under Agent's control): V(s) = max_{s' ∈ successors(s)} V(s')
MIN nodes (under Opponent's control): V(s) = min_{s' ∈ successors(s)} V(s')
Terminal states: V(s) = known

[Figure: a small Pacman game tree; terminal utilities -8, -5, -10, +8; the MIN nodes back up -8 and -10; the MAX root takes -8]
Minimax Algorithm

function minimax-decision(s) returns action
  return the action a in Actions(s) with the highest min-value(Result(s,a))

function max-value(s) returns value
  if Terminal-Test(s) then return Utility(s)
  initialize v = -∞
  for each a in Actions(s):
    v = max(v, min-value(Result(s,a)))
  return v

function min-value(s) returns value
  if Terminal-Test(s) then return Utility(s)
  initialize v = +∞
  for each a in Actions(s):
    v = min(v, max-value(Result(s,a)))
  return v
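The same algorithm as a small runnable Python sketch; the game object with actions, result, is_terminal, and utility is an assumed wrapper, not something defined in the slides:

import math

def minimax_decision(game, s):
    # Choose the action leading to the successor with the highest MIN-value
    return max(game.actions(s),
               key=lambda a: min_value(game, game.result(s, a)))

def max_value(game, s):
    if game.is_terminal(s):
        return game.utility(s)
    v = -math.inf
    for a in game.actions(s):
        v = max(v, min_value(game, game.result(s, a)))
    return v

def min_value(game, s):
    if game.is_terminal(s):
        return game.utility(s)
    v = math.inf
    for a in game.actions(s):
        v = min(v, max_value(game, game.result(s, a)))
    return v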
Minimax Example

[Figure: a depth-2 minimax tree; leaf utilities 3, 12, 8 | 2, 4, 6 | 14, 5, 2; the three MIN nodes evaluate to 3, 2, 2; the MAX root takes 3]


Poll

What kind of search is Minimax Search?

A) BFS
B) DFS
C) UCS
D) A*



Minimax is Depth-First Search

MAX nodes (under Agent's control): V(s) = max_{s' ∈ successors(s)} V(s')
MIN nodes (under Opponent's control): V(s) = min_{s' ∈ successors(s)} V(s')
Terminal states: V(s) = known

[Figure: the same small Pacman tree as before (terminal utilities -8, -5, -10, +8), explored depth-first]
Minimax Efficiency

 How efficient is minimax?
 Just like (exhaustive) DFS
 Time: O(b^m)
 Space: O(bm)
 Example: for chess, b ≈ 35, m ≈ 100
 Exact solution is completely infeasible
 Humans can't do this either, so how do we play chess?
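For a feel of the numbers: with b ≈ 35 and m ≈ 100, the tree has roughly 35^100 = 10^(100·log10 35) ≈ 10^154 nodes, astronomically more than could ever be examined.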


Generalized minimax

 What if the game is not zero-sum, or has multiple players?
 Generalization of minimax:
 Terminals have utility tuples
 Node values are also utility tuples
 Each player maximizes its own component
 Can give rise to cooperation and competition dynamically…

[Figure: a three-player game tree; leaf utility tuples include (1,1,6), (0,0,7), (9,9,0), (8,8,1), (7,7,2), (0,0,8); the backed-up root value is (8,8,1)]
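A sketch of this generalization in Python, under an assumed interface where to_move(s) gives the index of the player to move and utility(s) returns a tuple with one entry per player:

def maxn_value(game, s):
    # Returns a utility tuple; the player to move picks the child
    # whose tuple is best in its own component
    if game.is_terminal(s):
        return game.utility(s)
    p = game.to_move(s)
    best = None
    for a in game.actions(s):
        v = maxn_value(game, game.result(s, a))
        if best is None or v[p] > best[p]:
            best = v
    return best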


Resource Limits



Resource Limits

 Problem: in realistic games, we cannot search to the leaves!
 Solution 1: bounded lookahead
 Search only to a preset depth limit or horizon
 Use an evaluation function for non-terminal positions
 Guarantee of optimal play is gone
 More plies make a BIG difference
 Example:
 Suppose we have 100 seconds and can explore 10K nodes/sec
 So we can check 1M nodes per move
 For chess, b ≈ 35, so this reaches only about depth 4 – not so good

[Figure: a depth-limited tree with estimated leaf values -1, -2, 4, 9; backed-up MIN values -2 and 4; MAX root value 4; deeper subtrees cut off and marked ?]
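A sketch of bounded lookahead in Python; eval_fn is an assumed heuristic evaluation, and the game interface is the same assumed wrapper as in the earlier minimax sketch:

def depth_limited_value(game, s, depth, maximizing, eval_fn):
    # True utilities at terminals; heuristic estimates at the horizon
    if game.is_terminal(s):
        return game.utility(s)
    if depth == 0:
        return eval_fn(s)
    values = [depth_limited_value(game, game.result(s, a),
                                  depth - 1, not maximizing, eval_fn)
              for a in game.actions(s)]
    return max(values) if maximizing else min(values)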


Depth Matters

 Evaluation functions are always imperfect
 Deeper search => better play (usually)
 Or, deeper search gives the same quality of play with a less accurate evaluation function
 An important example of the tradeoff between complexity of features and complexity of computation
Evaluation Functions



Evaluation Functions

 Evaluation functions score non-terminals in depth-limited search
 Ideal function: returns the actual minimax value of the position
 In practice: typically a weighted linear sum of features:
  EVAL(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
  E.g., w1 = 9, f1(s) = (num white queens – num black queens), etc.
 Terminate search only in quiescent positions, i.e., positions where no major changes in feature values are expected
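As an illustration, a tiny chess-style linear evaluation in Python; the weights and feature functions here are made-up examples, not from the slides:

def linear_eval(s, features, weights):
    # EVAL(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
    return sum(w * f(s) for w, f in zip(weights, features))

# Hypothetical usage: material difference with classic piece values,
# where each feature counts (white pieces - black pieces) of one type
# features = [queen_diff, rook_diff, bishop_diff, knight_diff, pawn_diff]
# weights  = [9, 5, 3, 3, 1]
# score = linear_eval(state, features, weights)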
Evaluation for Pacman





Solution 2: Game Tree Pruning



Pruning - Motivation

 Q1. Why would “Queen to G5” be a bad move for Black?


 Q2. How many White "replies" did you need to consider in answering?
  Once we have seen one reply scary enough to convince us the move is really bad, we can abandon this move and continue searching elsewhere.


Alpha-beta Pruning

 We can improve on the performance of the minimax algorithm through alpha-beta pruning
 Basic idea: "If you have an idea that is surely bad, don't take the time to see how truly awful it is." -- Pat Winston
 We don't need to compute the value at this node
 No matter what it is, it can't affect the value of the root node


Intuition: prune the branches that can’t be chosen

[Figure: the same depth-2 minimax tree (leaves 3, 12, 8 | 2, 4, 6 | 14, 5, 2; MIN values 3, 2, 2; root 3); branches that cannot affect the root's value can be skipped]


Alpha-Beta Pruning Example

α = best option so far from any MAX node on this path

[Figure: the depth-2 tree searched left to right; after the first MIN node returns 3, α = 3 is passed to the later MIN nodes; remaining leaves 2, then 14, 5, 2]

 We can prune when a MIN node cannot end up higher than 2 while its parent MAX node has already seen something larger in another branch
 The order of generation matters: more pruning is possible if good moves come first
Why "α-β"?

 α is the value of the best choice for the MAX player found so far at any choice point above node n
 We want to compute the MIN-value at n
 As we loop over n's children, the MIN-value decreases
 If it drops below α, MAX will never choose n, so we can ignore n's remaining children (prune them)
 Analogously, β is the value of the lowest-utility choice found so far for the MIN player
Alpha-Beta Algorithm

α: MAX's best option on path to root
β: MIN's best option on path to root

def max-value(state, α, β):
  initialize v = -∞
  for each successor of state:
    v = max(v, value(successor, α, β))
    if v ≥ β: return v
    α = max(α, v)
  return v

def min-value(state, α, β):
  initialize v = +∞
  for each successor of state:
    v = min(v, value(successor, α, β))
    if v ≤ α: return v
    β = min(β, v)
  return v
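The pseudocode as a runnable Python sketch (same assumed game interface as the minimax sketch; a maximizing flag plays the role of the max-value/min-value split):

import math

def ab_value(game, s, alpha, beta, maximizing):
    if game.is_terminal(s):
        return game.utility(s)
    if maximizing:
        v = -math.inf
        for a in game.actions(s):
            v = max(v, ab_value(game, game.result(s, a), alpha, beta, False))
            if v >= beta:
                return v            # MIN above won't allow this: prune
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for a in game.actions(s):
            v = min(v, ab_value(game, game.result(s, a), alpha, beta, True))
            if v <= alpha:
                return v            # MAX above won't choose this: prune
            beta = min(beta, v)
        return v

# Root call: ab_value(game, start_state, -math.inf, math.inf, True)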
Quiz: Minimax Example

What is the value of the blue triangle?


A) 10
B) 8
C) 4
D) 50



Quiz: Minimax Example

What is the value of the blue triangle?

A) 10
B) 8
C) 4
D) 50

Answer: B) 8 (the two MIN children evaluate to 8 and 4, and the MAX root takes the larger).


Alpha-Beta Small Example

Trace the algorithm on the quiz tree (MAX root; left MIN child with leaves 10 and 8; right MIN child with leaves 4 and 50), tracking α, β, and v:

 Root (MAX) starts with α = −∞, β = ∞, v = −∞
 Left MIN node: called with α = −∞, β = ∞; v = +∞
  First leaf is 10: v = 10, β = 10
  Second leaf is 8: v = 8, β = 8; no children left, so return 8
 Back at the root: v = 8, α = 8
 Right MIN node: called with α = 8, β = ∞; v = +∞
  First leaf is 4: v = 4 ≤ α, so return 4 immediately, pruning the remaining leaf (50)
 Root: v = max(8, 4) = 8, so the value of the root is 8


Minimax Quiz

What is the value of the top node?

A) 10
B) 100
C) 2
D) 4


Alpha-Beta Quiz

Which branches are pruned?

A) e, l
B) g, l
C) g, k, l
D) g, n


Alpha-Beta Quiz 2

[Figure: an alpha-beta tree with some α, β, and v annotations filled in (e.g., α = 10, β = 10, β = 2, v = 2); work out the entries marked ?]
Alpha-Beta Pruning Properties

 Theorem: this pruning has no effect on the minimax value computed for the root!
 Good child ordering improves the effectiveness of pruning
 Iterative deepening helps with this
 With "perfect ordering":
 Time complexity drops to O(b^(m/2))
 Doubles solvable depth!
 1M nodes/move => depth = 8, respectable
 This is a simple example of metareasoning (computing about what to compute)

[Figure: a small MAX-over-MIN tree with leaves 10, 10, 0]


Advanced Techniques

 Transposition table to store previously expanded states
 Forward pruning to avoid considering all possible moves
 Lookup tables for opening moves and endgames
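A sketch of the transposition-table idea layered on minimax (game.key(s), a hashable encoding of the position, is an assumed helper):

def tt_value(game, s, maximizing, table):
    # Memoize values so transpositions (repeated positions) are not re-searched
    key = (game.key(s), maximizing)
    if key in table:
        return table[key]
    if game.is_terminal(s):
        v = game.utility(s)
    else:
        vals = [tt_value(game, game.result(s, a), not maximizing, table)
                for a in game.actions(s)]
        v = max(vals) if maximizing else min(vals)
    table[key] = v
    return v

# Usage: tt_value(game, start_state, True, {})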


Chess Systems

 Baseline system: 200 million node evaluations per move (3 min), minimax with a good evaluation function and quiescence search
 5-ply ≈ human novice
 Add alpha-beta pruning
 10-ply ≈ typical PC, experienced player
 Deep Blue: 30 billion evaluations per move, singular extensions, evaluation function with 8000 features, large databases of opening and endgame moves
 14-ply ≈ Garry Kasparov
 More recent state of the art (Hydra, ca. 2006): 36 billion evaluations per second, advanced pruning techniques
 18-ply ≈ better than any human alive?
Summary

 Multi-agent problems can require more space or deeper trees to search
 Games require decisions when optimality is impossible
 Bounded-depth search and approximate evaluation functions
 Games force efficient use of computation
 Alpha-beta pruning
 Game playing has produced important research ideas
 Reinforcement learning (checkers)
 Iterative deepening (chess)
 Rational metareasoning (Othello)
 Monte Carlo tree search (Go)
 Solution methods for partial-information games in economics (poker)
 Video games present much greater challenges – lots to do!
 b = 10^500, |S| = 10^4000, m = 10,000


References

 Slides adapted from CMU and UC Berkeley AI course materials.
 Russell & Norvig, Artificial Intelligence: A Modern Approach, Chapter 6.
