Professional Documents
Culture Documents
2022 Slide5 AdversarialSearch Eng
2022 Slide5 AdversarialSearch Eng
Adversarial Search
1 4/18/2022
Warm Up
2 0 … 2 6 … 4 6
5 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022
Value of a State
Non-Terminal States:
Value of a state: V(s) = max V(s’)
s’ successors(s)
The best achievable
outcome (utility)
from that state
Terminal States:
2 0 … 2 6 … 4 6
V(s) = known
6 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022
Multi-Agent Applications
(Football)
Simplest idea: each agent plans their own actions separately from
others.
2 0 … 2 6 … 4 6
10 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022
Idea 2: Joint State/Action Spaces
𝑆0 = (𝑆0𝐴 , 𝑆0𝐵 )
𝑆𝐾 = (𝑆𝐾𝐴 , 𝑆𝐾𝐵 )
𝑆𝐾 = (𝑆𝐾𝐴 , 𝑆𝐾𝐵 )
Each agent proposes their actions and computer confirms the joint
plan
Example: Autonomous driving through intersections
https://www.youtube.com/watch?v=4pbAI40dK0A
Search one agent’s actions from a state, search the next agent’s
actions from those resulting states , etc…
Agent 1
Search one agent’s actions from a state, search the next agent’s
actions from those resulting states , etc…
(Football)
Deterministic or stochastic?
Perfect information (fully observable)?
One, two, or more players?
Turn-taking or ?
Zero sum?
Two-Player
Two players involved
Sequential
Players take turns to move (take actions)
Act alternately
Zero-sum
Utility values of the two players at the end of the game have equal absolute value
and opposite sign
Perfect Information
Each player can fully observe what actions other agents have taken
https://en.wikipedia.org/wiki/Tic-tac-toe
23 Nguyen Van Vinh - UET, VNU Hanoi 4/18/2022
Chinese Chess
https://www.chessprogramming.org/Pham_Hong_Nguyen
https://vnexpress.net/giai-co-tuong-may-tinh-lan-dau-tien-o-viet-
nam-1773345.html
https://en.wikipedia.org/wiki/Chess
25 Nguyen Van Vinh - UET, VNU Hanoi 4/18/2022
Chess
Result: Win-loss-draw-draw-draw-loss
(In even-numbered games, Deep Blue
played white)
https://www.youtube.com/watch?v=SUbqykXVx0A
28 Nguyen Van Vinh - UET, VNU Hanoi 4/18/2022
Go
https://deepmind.com/blog/alphago-zero-learning-
scratch/
Result: win-win-win-loss- Result: 100-
win 0
AlphaGo: https://www.nature.com/articles/nature16961.pdf
AlphaZero: www.nature.com/articles/nature24270.pdf
29 Nguyen Van Vinh - UET, VNU Hanoi 4/18/2022
Zero-Sum Games
x o x
ox
win
o
lose
x o x x o x x o x draw
x ox ox ox
o x o x o
x o x x o x x o x x o x x o x x o x
x ox x ox o ox ox o ox o x
o o oo x o x oo x o o x o
x o x x o x x o x x o x
x ox o ox o ox x ox
o x o x xo x x o o x o
33
What is a good move?
x o x
ox
o
win
x o x x o x x o x lose
x ox ox ox
draw
o x o x o
x o x x o x x o x x o x x o x x o x
x ox x ox o ox ox o ox o x
o o oo x o x oo x o o x o
x o x x o x x o x x o x
x ox o ox o o x x o x
o x o x xo x x o o x o
34
Tic-Tac-Toe Game Tree
This is a zero-sum
game, the best
action for X is the
worst action for O
and vice versa
How do we define
best and worst?
MAX nodes: under Agent’s control MIN nodes: under Opponent’s control
V(s) = max V(s’) V(s) = min V(s’)
s’ successors(s) s’ successors(s)
-8
-8 -10
-8 -5 -10 +8
Terminal States:
V(s) = known
38 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022
Minimax Algorithm
3 2 2
3 12 8 2 4 6 14 5 2
A) BFS
B) DFS
C) UCS
D) A*
MAX nodes: under Agent’s control MIN nodes: under Opponent’s control
V(s) = max V(s’) V(s) = min V(s’)
s’ successors(s) s’ successors(s)
-8
-8 -10
-8 -5 -10 +8
Terminal States:
V(s) = known
42 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022
Minimax Efficiency
Example:
Suppose we have 100 seconds, can explore 10K nodes /
sec
So can check 1M nodes per move
For chess, b=~35 so reaches about depth 4 – not so good
? ? ? ?
Example:
Suppose we have 100 seconds, can explore 10K nodes /
sec
So can check 1M nodes per move
For chess, b=~35 so reaches about depth 4 – not so good ? ? ? ?
3 2 2
3 12 8 2 4 6 14 5 2
α =3 α =3
3
3 12 8 2 14 5 2
We can prune when: min node won’t The order of generation matters: more pruning
be higher than 2, while parent max is possible if good moves come first
has seen something larger in another
branch
56 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022
Why is α-β?
α = 10
β = 10 ?
β=2
? ?
α= 10 α=100 v= 2
1
? ? ?
79 Nguyen Van Vinh – UET, VNU Hanoi 4/18/2022
Alpha-Beta Pruning Properties
Theorem: This pruning has no effect on minimax value computed for the root!
10 10 0
This is a simple example of metareasoning (computing about what to
compute)