2020 - Adversarial Search - Eng

Artificial Intelligence:
Adversarial Search
INT3401 21
Instructor: Nguyễn Văn Vinh, UET - Hanoi VNU
2020
1 4/19/2020
Warm Up
How would you search for moves in Tic Tac Toe?
2 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

Warm Up
How is Tic Tac Toe different from maze search?

Warm Up
How is Tic Tac Toe different from maze search?
Multi-Agent, Adversarial, Zero Sum

Single Agent

Single-Agent Trees
2 0 … 2 6 … 4 6
Value of a State
Non-Terminal States:
Value of a state: V(s) = max V(s’)
s’  successors(s)
The best achievable
outcome (utility)
from that state
Terminal States:
2 0 … 2 6 … 4 6
V(s) = known
Multi-Agent Applications
(Football)
Adversarial Team: Collaborative

Collaborative Maze Solving Competition: Adversarial

How could we model multi-agent collaborative problems?

How could we model multi-agent problems?
Simplest idea: each agent plans their own actions separately from others.

Many Single-Agent Trees
Choose the best Non-Terminal States:

action for each V(s) = max V(s’)
agent independently
2 0 … 2 6 … 4 6
Idea 2: Joint State/Action Spaces
Combine the states and actions of the N agents
𝑆0 = (𝑆0𝐴 , 𝑆0𝐵 )

Combine the states and actions of the N agents
𝑆𝐾 = (𝑆𝐾𝐴 , 𝑆𝐾𝐵 )

Search looks through all combinations of all agents’ states and actions
Think of one brain controlling many agents
𝑆𝐾 = (𝑆𝐾𝐴 , 𝑆𝐾𝐵 )

Search looks through all combinations of all agents’ states and actions
Think of one brain controlling many agents
What is the size of

the state space?
What is the size of

the action space?
What is the size of

the search tree?

Idea 3: Centralized Decision Making
Each agent proposes their actions and computer confirms the joint plan
Example: Autonomous driving through intersections
https://www.youtube.com/watch?v=4pbAI40dK0A

Idea 4: Alternate Searching One Agent at a Time
Search one agent’s actions from a state, search the next agent’s actions
from those resulting states , etc…
Choose the best Non-Terminal States:

Agent 1 V(s) = max V(s’)
cascading combination s’  successors(s)
of actions
Agent 2
Agent 1

Idea 4: Alternate Searching One Agent at a Time
Search one agent’s actions from a state, search the next agent’s actions
What is the size of

the state space?
What is the size of

the action space?
What is the size of

the search tree?

Multi-Agent Applications
(Football)
Adversarial Team: Collaborative

Collaborative Maze Solving Competition: Adversarial

Games

Why study games?
 Games are a traditional hallmark of intelligence

 Games are easy to formalize
 Games can be a good model of real-world competitive or
cooperative activities
 Military confrontations (đối đầu quân sự), negotiation (đàm phán), auctions
(đấu giá), etc.
 Games have been a key driver of new techniques in CS and AI.
Types of Games
 Deterministic or stochastic?
 Perfect information (fully observable)?
 One, two, or more players?
 Turn-taking or simultaneous?
 Zero sum?

A Special Case
 Two-Player Sequential Zero-Sum Complete-Information Games
 Two-Player
 Two players involved
 Sequential
 Players take turns to move (take actions)
 Act alternately
 Zero-sum
 Utility values of the two players at the end of the game have equal absolute value and
opposite sign
 Perfect Information
 Each player can fully observe what actions other agents have taken
22 Nguyen Van Vinh - UET, VNU Hanoi 4/19/2020

Tic-Tac-Toe
 Two players, X and O, take turns marking the spaces in a

3×3 grid
 The player who succeeds in placing three of their

marks in a horizontal, vertical, or diagonal row wins the
game
https://en.wikipedia.org/wiki/Tic-tac-toe
Chinese Chess
 https://www.chessprogramming.org/Pham_Hong_Nguyen
 https://vnexpress.net/giai-co-tuong-may-tinh-lan-dau-tien-o-viet-nam-
1773345.html

Chess
https://en.wikipedia.org/wiki/Chess
Chess
Garry Kasparov vs Deep Blue (1996)
Result: Win-loss-draw-draw-draw-loss
(In even-numbered games, Deep Blue played
white)

Chess

Go
 DeepMind promotion video before the game with Lee Sedol
https://www.youtube.com/watch?v=SUbqykXVx0A
Go
AlphaGo vs Lee Sedol (3/2016) AlphaZero vs AlphaGo (2017)
https://deepmind.com/blog/alphago-zero-learning-scratch/
Result: win-win-win-loss-win Result: 100-0
AlphaGo: https://www.nature.com/articles/nature16961.pdf
AlphaZero: www.nature.com/articles/nature24270.pdf
Zero-Sum Games
Zero-Sum Games General Games

 Agents have opposite utilities • Agents have independent utilities
 Pure competition: • Cooperation, indifference, competition,
shifting alliances, and more are all possible
 One maximizes, the other minimizes

Game Trees
Search one agent’s actions from a state, search the competitor’s actions

Tic-Tac-Toe Game Tree

This is a zero-sum
game, the best action
for X is the worst
action for O and vice
versa
How do we define
best and worst?

Instead of taking the
max utility at every
level, alternate max and
min

Tic-Tac-Toe Minimax
MAX nodes: under Agent’s control

V(s) = max V(s’)
MIN nodes: under Opponent’s control
V(s) = min V(s’)

Small Pacman Example
MAX nodes: under Agent’s control MIN nodes: under Opponent’s control
V(s) = max V(s’) V(s) = min V(s’)
s’  successors(s) s’  successors(s)
-8
-8 -10
-8 -5 -10 +8
Terminal States:
V(s) = known
Minimax Algorithm
function minimax-decision(s) returns action

return the action a in Actions(s) with the highest Result(s,a)s’
min-value(Result(s,a))
function max-value(s) returns value function min-value(s) returns value

if Terminal-Test(s) then return Utility(s) if Terminal-Test(s) then return Utility(s)
initialize v = -∞ initialize v = +∞
for each a in Actions(s): for each a in Actions(state):
v = max(v, min-value(Result(s,a))) v = min(v, max-value(Result(s,a))
return v return v
Minimax Example
3 2 2
3 12 8 2 4 6 14 5 2

Poll
What kind of search is Minimax Search?
A) BFS
B) DFS
C) UCS
D) A*

Minimax is Depth-First Search
MAX nodes: under Agent’s control MIN nodes: under Opponent’s control
-8
-8 -10
-8 -5 -10 +8
Terminal States:
V(s) = known
Minimax Efficiency
 How efficient is minimax?

 Just like (exhaustive) DFS
 Time: O(bm)
 Space: O(bm)
 Example: For chess, b  35, m  100

 Exact solution is completely infeasible
 Humans can’t do this either, so how do
we play chess?

Generalized minimax
 What if the game is not zero-sum, or has multiple players?
 Generalization of minimax: 8,8,1

 Terminals have utility tuples
 Node values are also utility tuples
 Each player maximizes its own component 8,8,1 7,7,2
 Can give rise to cooperation and
competition dynamically…
0,0,7 8,8,1 7,7,2 0,0,8
1,1,6 0,0,7 9,9,0 8,8,1 9,9,0 7,7,2 0,0,8 0,0,7

Resource Limits

Resource Limits
 Problem: In realistic games, cannot search to leaves! 4 max
 Solution 1: Bounded lookahead -2 4 min

 Search only to a preset depth limit or horizon -1 -2 4 9
 Use an evaluation function for non-terminal positions
 Guarantee of optimal play is gone

 More plies make a BIG difference
 Example:
 Suppose we have 100 seconds, can explore 10K nodes / sec
 So can check 1M nodes per move
 For chess, b=~35 so reaches about depth 4 – not so good
? ? ? ?

Depth Matters
 Evaluation functions are always

imperfect
 Deeper search => better play
(usually)
 Or, deeper search gives same
quality of play with a less accurate
evaluation function
 An important example of the
tradeoff between complexity of
features and complexity of
computation

[Demo: depth limited (L6D4, L6D5)]
Evaluation Functions

Evaluation Functions
 Evaluation functions score non-terminals in depth-limited search
 Ideal function: returns the actual minimax value of the position

 In practice: typically weighted linear sum of features:
 EVAL(s) = w1 f1(s) + w2 f2(s) + …. + wn fn(s)
 E.g., w1 = 9, f1(s) = (num white queens – num black queens), etc.
 Terminate search only in quiescent positions, i.e., no major
changes expected in feature values
Evaluation for Pacman

Resource Limits
 Problem: In realistic games, cannot search to leaves! 4 max

-2 4 min
 Solution 1: Bounded lookahead
 Search only to a preset depth limit or horizon -1 -2 4 9
 Use an evaluation function for non-terminal positions
 Guarantee of optimal play is gone

 More plies make a BIG difference
 Example:
 Suppose we have 100 seconds, can explore 10K nodes / sec
 So can check 1M nodes per move
 For chess, b=~35 so reaches about depth 4 – not so good ? ? ? ?

Solution 2: Game Tree Pruning

Pruning - Motivation
 Q1. Why would “Queen to G5” be a bad move for Black?

 Q2. How many White “replies” did you need to consider in answering?
Once we have seen one reply scary enough to convince us the move is really bad, we can abandon this
move and continue searching elsewhere.
51 Phạm Bảo Sơn & Nguyễn Văn Vinh 4/19/2020

Alpha-beta Pruning
 We can improve on the performance of the minimax algorithm through

alpha-beta pruning
 Basic idea: “If you have an idea that is surely bad, don't take the time to see
how truly awful it is.” -- Pat Winston
• We don’t need to compute the value at

this node.
• No matter what it is, it can’t affect the
value of the root node.

Intuition: prune the branches that can’t be chosen
3 2 2
3 12 8 2 4 6 14 5 2

Alpha-Beta Pruning Example
α = best option so far from any

MAX node on this path
α =3 α =3
3
3 12 8 2 14 5 2
We can prune when: min node won’t be The order of generation matters: more pruning
higher than 2, while parent max has seen is possible if good moves come first
something larger in another branch

Why is α-β?
 α is the value of the best choice for the

MAX player found so far
at any choice point above node n
 We want to compute the
MIN-value at n
 As we loop over n’s children,
the MIN-value decreases
 If it drops below α, MAX will never
choose n, so we can ignore n’s remaining
children
 Prunning nodes
 Analogously, β is the value of the lowest-
utility choice found so far for the MIN
player
Alpha-Beta Algorithm
α: MAX’s best option on path to root

β: MIN’s best option on path to root
def max-value(state, α, β): def min-value(state , α, β):

initialize v = -∞ initialize v = +∞
for each successor of state: for each successor of state:
v = max(v, value(successor, α, β)) v = min(v, value(successor, α, β))
if v ≥ β if v ≤ α
return v return v
α = max(α, v) β = min(β, v)
return v return v
Quiz: Minimax Example
What is the value of the blue triangle?

A) 10
B) 8
C) 4
D) 50

Quiz: Minimax Example
What is the value of the blue triangle?

8 A) 10
B) 8
C) 4
D) 50
8 4

Alpha-Beta Small Example
𝛼 = −∞ def max-value(state, α, β):

𝛽= ∞ initialize v = -∞
𝑣 = −∞ for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β
return v
α = max(α, v)
return v
def min-value(state , α, β):

initialize v = +∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛼 = −∞ return v
𝛽= ∞
𝑣=∞
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽= ∞
𝑣=∞
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽= ∞
𝑣 = 10
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽 = 10
𝑣 = 10
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽 = 10
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


𝑣=8 for each successor of state:
if v ≥ β
return v
α = max(α, v)
𝛽=8 8
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v

𝛼=8 def max-value(state, α, β):

if v ≥ β
return v
α = max(α, v)
𝛽=8 8
𝑣=8
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛼 = −∞ 𝛼=8 return v
𝛽=8 8 𝛽= ∞
𝑣=8 𝑣=∞
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞ 4
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞ 4
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞ 4
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v


𝑣=8 8 for each successor of state:
if v ≥ β
return v
α = max(α, v)
𝛽=8 8 𝛽= ∞ 4
𝑣=8 𝑣=4
initialize v = +∞
if v ≤ α
return v
β = min(β, v)
return v

Minimax Quiz
What is the value of the top node?

A) 10
B) 100
C) 2
D) 4

Alpha Beta Quiz
Which branches are pruned?

A) e, l
B) g, l
C) g, k, l
D) g, n

Alpha-Beta Quiz 2
α = 10
β = 10 ?
β=2
? ?
α= 10 α=100 v= 2
1
? ? ?
Alpha-Beta Pruning Properties
 Theorem: This pruning has no effect on minimax value computed for the root!
 Good child ordering improves effectiveness of pruning

 Iterative deepening helps with this max
 With “perfect ordering”:

 Time complexity drops to O(bm/2) min
 Doubles solvable depth!
 1M nodes/move => depth=8, respectable
10 10 0
 This is a simple example of metareasoning (computing about what to compute)

Advanced Techiques
 Transposition table to store previously expanded

states
 Forward pruning to avoid considering all possible
moves
 Lookup tables for opening moves and endgames

Chess Systems
 Baseline system: 200 million node evalutions per move

(3 min), minimax with a decent evaluation function and quiescence search
 5-ply ≈ human novice
 Add alpha-beta pruning
 10-ply ≈ typical PC, experienced player
 Deep Blue: 30 billion evaluations per move, singular extensions, evaluation
function with 8000 features,
large databases of opening and endgame moves
 14-ply ≈ Garry Kasparov
 More recent state of the art (Hydra, ca. 2006): 36 billion evaluations per
second, advanced pruning techniques
 18-ply ≈ better than any human alive?
Summary
 Multi-agent problems can require more space or deeper trees to search

 Games require decisions when optimality is impossible
 Bounded-depth search and approximate evaluation functions
 Games force efficient use of computation
 Alpha-beta pruning
 Game playing has produced important research ideas
 Reinforcement learning (checkers)
 Iterative deepening (chess)
 Rational metareasoning (Othello)
 Monte Carlo tree search (Go)
 Solution methods for partial-information games in economics (poker)
 Video games present much greater challenges – lots to do!
 b = 10500, |S| = 104000, m = 10,000

References
 Slides from CMU, Berkeley University.

 Artificial Intelligence: A modern approach. Chapter 6.

2020 - Adversarial Search - Eng

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2020 - Adversarial Search - Eng

Uploaded by

Copyright:

Available Formats

Artificial Intelligence:

How would you search for moves in Tic Tac Toe?

2 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

How is Tic Tac Toe different from maze search?

3 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

How is Tic Tac Toe different from maze search?

Multi-Agent, Adversarial, Zero Sum

4 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

Adversarial Team: Collaborative

7 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

8 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

9 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

Choose the best Non-Terminal States:

Combine the states and actions of the N agents

11 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

Combine the states and actions of the N agents

12 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

13 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

What is the size of

What is the size of

What is the size of

14 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

15 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

Choose the best Non-Terminal States:

16 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

What is the size of

What is the size of

What is the size of

17 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

Adversarial Team: Collaborative

18 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

19 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

 Games are a traditional hallmark of intelligence

21 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

 Two-Player Sequential Zero-Sum Complete-Information Games

22 Nguyen Van Vinh - UET, VNU Hanoi 4/19/2020

 Two players, X and O, take turns marking the spaces in a

 The player who succeeds in placing three of their

24 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

Garry Kasparov vs Deep Blue (1996)

26 Nguyen Van Vinh - UET, VNU Hanoi 4/19/2020

27 Nguyen Van Vinh - UET, VNU Hanoi 4/19/2020

 DeepMind promotion video before the game with Lee Sedol

AlphaGo vs Lee Sedol (3/2016) AlphaZero vs AlphaGo (2017)

Result: win-win-win-loss-win Result: 100-0

Zero-Sum Games General Games

30 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

31 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

32 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

33 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

34 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

MAX nodes: under Agent’s control

35 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

function minimax-decision(s) returns action

function max-value(s) returns value function min-value(s) returns value

38 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

What kind of search is Minimax Search?

39 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

 How efficient is minimax?

 Example: For chess, b  35, m  100

41 Nguyen Van Vinh – UET, VNU Hanoi 4/19/2020

 What if the game is not zero-sum, or has multiple players?