2022 Slide5 AdversarialSearch Eng

Artificial Intelligence:
Adversarial Search
Instructor: Nguyễn Văn Vinh, UET - Hanoi VNU

2022
1 4/18/2022
Warm Up
How would you search for moves in Tic Tac Toe?
2 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Warm Up
How is Tic Tac Toe different from maze search?

Warm Up
How is Tic Tac Toe different from maze search?
Multi-Agent, Adversarial, Zero Sum

Single Agent

Single-Agent Trees
2 0 … 2 6 … 4 6
Value of a State
Non-Terminal States:
Value of a state: V(s) = max V(s’)
s’  successors(s)
The best achievable
outcome (utility)
from that state
Terminal States:
2 0 … 2 6 … 4 6
V(s) = known
Multi-Agent Applications
(Football)
Adversarial Team: Collaborative

Collaborative Maze Solving Competition: Adversarial

How could we model multi-agent collaborative problems?

How could we model multi-agent problems?
Simplest idea: each agent plans their own actions separately from
others.

Many Single-Agent Trees
Choose the best Non-Terminal States:

action for each V(s) = max V(s’)
agent
independently
2 0 … 2 6 … 4 6
Idea 2: Joint State/Action Spaces
Combine the states and actions of the N agents
𝑆0 = (𝑆0𝐴 , 𝑆0𝐵 )

Combine the states and actions of the N agents
𝑆𝐾 = (𝑆𝐾𝐴 , 𝑆𝐾𝐵 )

Search looks through all combinations of all agents’ states and

actions
Think of one brain controlling many agents
𝑆𝐾 = (𝑆𝐾𝐴 , 𝑆𝐾𝐵 )

Search looks through all combinations of all agents’ states and

actions
Think of one brain controlling many agents
What is the size of
the state space?
What is the size of

the action space?
What is the size of

the search tree?

Idea 3: Centralized Decision Making
Each agent proposes their actions and computer confirms the joint
plan
Example: Autonomous driving through intersections
https://www.youtube.com/watch?v=4pbAI40dK0A

Idea 4: Alternate Searching One Agent at a Time
Search one agent’s actions from a state, search the next agent’s
actions from those resulting states , etc…
Choose the best Non-Terminal States:

Agent 1 V(s) = max V(s’)
cascading s’  successors(s)
combination of
actions
Agent 2
Agent 1

Idea 4: Alternate Searching One Agent at a Time
Search one agent’s actions from a state, search the next agent’s
What is the size of

the state space?
What is the size of

the action space?
What is the size of

the search tree?

Multi-Agent Applications
(Football)
Adversarial Team: Collaborative

Collaborative Maze Solving Competition: Adversarial

Games

Why study games?
 Games are a traditional hallmark of intelligence

 Games are easy to formalize
 Games can be a good model of real-world competitive or
cooperative activities
 Military confrontations (đối đầu quân sự), negotiation (đàm phán),
auctions (đấu giá), etc.
 Games have been a key driver of new techniques in CS and
AI.
Types of Games
 Deterministic or stochastic?
 Perfect information (fully observable)?
 One, two, or more players?
 Turn-taking or ?
 Zero sum?

A Special Case
 Two-Player Sequential Zero-Sum Complete-Information Games
 Two-Player
 Two players involved
 Sequential
 Players take turns to move (take actions)
 Act alternately
 Zero-sum
 Utility values of the two players at the end of the game have equal absolute value
and opposite sign
 Perfect Information
 Each player can fully observe what actions other agents have taken
22 Nguyen Van Vinh - UET, VNU Hanoi 4/18/2022

Tic-Tac-Toe
 Two players, X and O, take turns marking the

spaces in a 3×3 grid
 The player who succeeds in placing three of their

marks in a horizontal, vertical, or diagonal row
wins the game
https://en.wikipedia.org/wiki/Tic-tac-toe
Chinese Chess
 https://www.chessprogramming.org/Pham_Hong_Nguyen
 https://vnexpress.net/giai-co-tuong-may-tinh-lan-dau-tien-o-viet-
nam-1773345.html

Chess
https://en.wikipedia.org/wiki/Chess
Chess
Garry Kasparov vs Deep Blue (1996)
Result: Win-loss-draw-draw-draw-loss
(In even-numbered games, Deep Blue
played white)

Chess

Go
 DeepMind promotion video before the game with Lee Sedol
https://www.youtube.com/watch?v=SUbqykXVx0A
Go
AlphaGo vs Lee Sedol (3/2016) AlphaZero vs AlphaGo (2017)
https://deepmind.com/blog/alphago-zero-learning-
scratch/
Result: win-win-win-loss- Result: 100-
win 0
AlphaGo: https://www.nature.com/articles/nature16961.pdf
AlphaZero: www.nature.com/articles/nature24270.pdf
Zero-Sum Games
Zero-Sum Games General Games

 Agents have opposite utilities • Agents have independent utilities
 Pure competition: • Cooperation, indifference, competition,
shifting alliances, and more are all
 One maximizes, the other minimizes
possible

Game Trees
Search one agent’s actions from a state, search the competitor’s


Tic-Tac-Toe Game Tree

A subtree
x o x
ox
win
o
lose
x o x x o x x o x draw
x ox ox ox
o x o x o
x o x x o x x o x x o x x o x x o x
x ox x ox o ox ox o ox o x
o o oo x o x oo x o o x o
x o x x o x x o x x o x
x ox o ox o ox x ox
o x o x xo x x o o x o
33
What is a good move?
x o x
ox
o
win
x o x x o x x o x lose
x ox ox ox
draw
o x o x o
x o x x o x x o x x o x x o x x o x
x ox x ox o ox ox o ox o x
o o oo x o x oo x o o x o
x o x x o x x o x x o x
x ox o ox o o x x o x
o x o x xo x x o o x o
34
This is a zero-sum
game, the best
action for X is the
worst action for O
and vice versa
How do we define
best and worst?

Instead of taking the
max utility at every
level, alternate max
and min

Tic-Tac-Toe Minimax
MAX nodes: under Agent’s control

V(s) = max V(s’)
MIN nodes: under Opponent’s control
V(s) = min V(s’)

Small Pacman Example
MAX nodes: under Agent’s control MIN nodes: under Opponent’s control
V(s) = max V(s’) V(s) = min V(s’)
s’  successors(s) s’  successors(s)
-8
-8 -10
-8 -5 -10 +8
Terminal States:
V(s) = known
Minimax Algorithm
function minimax-decision(s) returns action

return the action a in Actions(s) with the highest Result(s,a)s’
min-value(Result(s,a))
function max-value(s) returns value function min-value(s) returns value

if Terminal-Test(s) then return Utility(s) if Terminal-Test(s) then return Utility(s)
initialize v = -∞ initialize v = +∞
for each a in Actions(s): for each a in Actions(state):
v = max(v, min-value(Result(s,a))) v = min(v, max-value(Result(s,a))
return v return v
Minimax Example
3 2 2
3 12 8 2 4 6 14 5 2

Poll
What kind of search is Minimax Search?
A) BFS
B) DFS
C) UCS
D) A*

Minimax is Depth-First Search
MAX nodes: under Agent’s control MIN nodes: under Opponent’s control
-8
-8 -10
-8 -5 -10 +8
Terminal States:
V(s) = known
Minimax Efficiency
 How efficient is minimax?

 Just like (exhaustive) DFS
 Time: O(bm)
 Space: O(bm)
 Example: For chess, b  35, m 

100
 Exact solution is completely
infeasible
 Humans can’t do this either, so how
do we play chess?

Generalized minimax
 What if the game is not zero-sum, or has multiple players?
 Generalization of minimax: 8,8,1

 Terminals have utility tuples
 Node values are also utility tuples
 Each player maximizes its own component 8,8,1 7,7,2
 Can give rise to cooperation and
competition dynamically…
0,0,7 8,8,1 7,7,2 0,0,8
1,1,6 0,0,7 9,9,0 8,8,1 9,9,0 7,7,2 0,0,8 0,0,7

Resource Limits

Resource Limits
 Problem: In realistic games, cannot search to leaves! 4 max
 Solution 1: Bounded lookahead -2 4 min

 Search only to a preset depth limit or horizon
-1 -2 4 9
 Use an evaluation function for non-terminal positions
 Guarantee of optimal play is gone
 More plies make a BIG difference
 Example:
 Suppose we have 100 seconds, can explore 10K nodes /
sec
 So can check 1M nodes per move
 For chess, b=~35 so reaches about depth 4 – not so good
? ? ? ?

Depth Matters
 Evaluation functions are always

imperfect
 Deeper search => better play
(usually)
 Or, deeper search gives same
quality of play with a less
accurate evaluation function
 An important example of the
tradeoff between complexity of
features and complexity of
computation

[Demo: depth limited (L6D4, L6D5)]
Evaluation Functions

Evaluation Functions
 Evaluation functions score non-terminals in depth-limited search
 Ideal function: returns the actual minimax value of the position

 In practice: typically weighted linear sum of features:
 EVAL(s) = w1 f1(s) + w2 f2(s) + …. + wn fn(s)
 E.g., w1 = 9, f1(s) = (num white queens – num black queens), etc.
 Terminate search only in quiescent positions, i.e., no major
changes expected in feature values
Evaluation for Pacman

Resource Limits
 Problem: In realistic games, cannot search to leaves! 4 max

-2 4 min
 Solution 1: Bounded lookahead
 Search only to a preset depth limit or horizon -1 -2 4 9
 Use an evaluation function for non-terminal positions
 Guarantee of optimal play is gone
 More plies make a BIG difference
 Example:
 Suppose we have 100 seconds, can explore 10K nodes /
sec
 So can check 1M nodes per move
 For chess, b=~35 so reaches about depth 4 – not so good ? ? ? ?

Solution 2: Game Tree Pruning

Pruning - Motivation
 Q1. Why would “Queen to G5” be a bad move for Black?

 Q2. How many White “replies” did you need to consider in answering?
Once we have seen one reply scary enough to convince us the move is really bad, we can
abandon this move and continue searching elsewhere.
53 Phạm Bảo Sơn & Nguyễn Văn Vinh 4/18/2022

Alpha-beta Pruning
 We can improve on the performance of the minimax algorithm

through alpha-beta pruning
 Basic idea: “If you have an idea that is surely bad, don't take the
time to see how truly awful it is.” -- Pat Winston
• We don’t need to compute the value

at this node.
• No matter what it is, it can’t affect the
value of the root node.

Intuition: prune the branches that can’t be chosen
3 2 2
3 12 8 2 4 6 14 5 2

Alpha-Beta Pruning Example
α = best option so far from any

MAX node on this path
α =3 α =3
3
3 12 8 2 14 5 2
We can prune when: min node won’t The order of generation matters: more pruning
be higher than 2, while parent max is possible if good moves come first
has seen something larger in another
branch
Why is α-β?
 α is the value of the best choice for

the MAX player found so far
at any choice point above node n
 We want to compute the
MIN-value at n
 As we loop over n’s children,
the MIN-value decreases
 If it drops below α, MAX will never
choose n, so we can ignore n’s
remaining children
 Prunning nodes
 Analogously, β is the value of the
lowest-utility choice found so far for
the MIN player
Alpha-Beta Algorithm
α: MAX’s best option on path to root

β: MIN’s best option on path to root
def max-value(state, α, β): def min-value(state , α, β):

initialize v = -∞ initialize v = +∞
for each successor of state: for each successor of state:
v = max(v, value(successor, α, β)) v = min(v, value(successor, α, β))
if v ≥ β if v ≤ α
return v return v
α = max(α, v) β = min(β, v)
return v return v
Quiz: Minimax Example
What is the value of the blue triangle?

A) 10
B) 8
C) 4
D) 50

Quiz: Minimax Example
What is the value of the blue triangle?

8 A) 10
B) 8
C) 4
D) 50
8 4

Alpha-Beta Small Example
𝛼 = −∞ def max-value(state, α, β):

𝛽= ∞ initialize v = -∞
𝑣 = −∞ for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β
return v
α = max(α, v)
return v
def min-value(state , α, β):

initialize v = +∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛼 = −∞ return v
𝛽= ∞
𝑣=∞
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽= ∞
𝑣=∞
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽= ∞
𝑣 = 10
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽 = 10
𝑣 = 10
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽 = 10
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


𝑣=8 for each successor of state:
if v ≥ β
return v
α = max(α, v)
𝛽=8 8
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v

𝛼=8 def max-value(state, α, β):

if v ≥ β
return v
α = max(α, v)
𝛽=8 8
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛼 = −∞ 𝛼=8 return v
𝛽=8 8 𝛽= ∞
𝑣=8 𝑣=∞
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞ 4
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞ 4
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞ 4
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


𝑣=8 8 for each successor of state:
if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞ 4
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v

Minimax Quiz
What is the value of the top

node?
A) 10
B) 100
C) 2
D) 4

Alpha Beta Quiz
Which branches are pruned?

A) e, l
B) g, l
C) g, k, l
D) g, n

Alpha-Beta Quiz 2
α = 10
β = 10 ?
β=2
? ?
α= 10 α=100 v= 2
1
? ? ?
Alpha-Beta Pruning Properties
 Theorem: This pruning has no effect on minimax value computed for the root!
 Good child ordering improves effectiveness of pruning

 Iterative deepening helps with this
max
 With “perfect ordering”:
 Time complexity drops to O(bm/2)
min
 Doubles solvable depth!
 1M nodes/move => depth=8, respectable
10 10 0
 This is a simple example of metareasoning (computing about what to
compute)

Advanced Techiques
 Transposition table to store previously expanded

states
 Forward pruning to avoid considering all possible
moves
 Lookup tables for opening moves and endgames

Chess Systems
 Baseline system: 200 million node evalutions per move

(3 min), minimax with a good evaluation function and quiescence
search
 5-ply ≈ human novice
 Add alpha-beta pruning
 10-ply ≈ typical PC, experienced player
 Deep Blue: 30 billion evaluations per move, singular extensions,
evaluation function with 8000 features,
large databases of opening and endgame moves
 14-ply ≈ Garry Kasparov
 More recent state of the art (Hydra, ca. 2006): 36 billion evaluations
per second, advanced pruning techniques
 18-ply ≈ better than any human alive?
Summary
 Multi-agent problems can require more space or deeper trees to search

 Games require decisions when optimality is impossible
 Bounded-depth search and approximate evaluation functions
 Games force efficient use of computation
 Alpha-beta pruning
 Game playing has produced important research ideas
 Reinforcement learning (checkers)
 Iterative deepening (chess)
 Rational metareasoning (Othello)
 Monte Carlo tree search (Go)
 Solution methods for partial-information games in economics (poker)
 Video games present much greater challenges – lots to do!
 b = 10500, |S| = 104000, m = 10,000

References
 Slides from CMU, Berkeley University.

 Artificial Intelligence: A modern approach. Chapter 6.

2022 Slide5 AdversarialSearch Eng

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2022 Slide5 AdversarialSearch Eng

Uploaded by

Copyright:

Available Formats

Artificial Intelligence:

Instructor: Nguyễn Văn Vinh, UET - Hanoi VNU

How would you search for moves in Tic Tac Toe?

2 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

How is Tic Tac Toe different from maze search?

3 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

How is Tic Tac Toe different from maze search?

Multi-Agent, Adversarial, Zero Sum

4 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Adversarial Team: Collaborative

7 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

8 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

9 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Choose the best Non-Terminal States:

Combine the states and actions of the N agents

11 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Combine the states and actions of the N agents

12 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Search looks through all combinations of all agents’ states and

13 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Search looks through all combinations of all agents’ states and

What is the size of

What is the size of

14 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

15 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Choose the best Non-Terminal States:

16 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

What is the size of

What is the size of

What is the size of

17 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Adversarial Team: Collaborative

18 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

19 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

 Games are a traditional hallmark of intelligence

21 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

 Two-Player Sequential Zero-Sum Complete-Information Games

22 Nguyen Van Vinh - UET, VNU Hanoi 4/18/2022

 Two players, X and O, take turns marking the

 The player who succeeds in placing three of their

24 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Garry Kasparov vs Deep Blue (1996)

26 Nguyen Van Vinh - UET, VNU Hanoi 4/18/2022

27 Nguyen Van Vinh - UET, VNU Hanoi 4/18/2022

 DeepMind promotion video before the game with Lee Sedol

AlphaGo vs Lee Sedol (3/2016) AlphaZero vs AlphaGo (2017)

Zero-Sum Games General Games

30 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

Search one agent’s actions from a state, search the competitor’s

31 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

32 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

35 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

36 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

MAX nodes: under Agent’s control

37 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

function minimax-decision(s) returns action

function max-value(s) returns value function min-value(s) returns value

40 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

What kind of search is Minimax Search?

41 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

 How efficient is minimax?

 Example: For chess, b  35, m 

43 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022

 What if the game is not zero-sum, or has multiple players?