
Artificial Intelligence
Computational Intelligence

Adversarial Games
AI for Games
Game Playing State-of-the-Art
• Checkers:
– 1950: First computer player.
– 1994: First computer champion: Chinook ends the 40-year reign of human champion Marion Tinsley.
– 2007: Checkers solved!

• Chess:
– 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. It used very sophisticated evaluation functions.
– Current programs are even better, if less historic.

• Go:
– 2010: human champions start taking an interest in AI opponents; in Go, the branching factor is b > 300.
– 2016: AlphaGo defeats the human champion. It uses Monte Carlo Tree Search and a learned evaluation function.

PACMAN!

Types of games

• There are many different kinds of games


• Properties:
– Deterministic or stochastic?
– One, two, or more players?
– Zero sum?
– Perfect information (can you see the state)?

• Goal:
– We need algorithms for calculating a strategy (policy) that recommends a move in each state

Deterministic games

• Many possible formalizations, one is:


– States: S (start at s0)
– Players: P = {1…N} (usually take turns)
– Actions: A (may depend on player / state)
– Transition function: S × A → S
– Terminal test: S → {t, f}
– Terminal utilities: S × P → ℝ

• A solution for a player is a policy: S → A (a sketch of this formalization follows below)
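A minimal sketch of this formalization as a Python interface; the class and method names are illustrative assumptions, not from the slides:

from typing import Iterable

class Game:
    """Abstract deterministic game, mirroring the formalization above."""
    def initial_state(self): ...                    # s0
    def players(self) -> Iterable[int]: ...         # P = {1..N}
    def actions(self, state, player): ...           # A (may depend on player/state)
    def result(self, state, action): ...            # transition: S x A -> S
    def is_terminal(self, state) -> bool: ...       # terminal test: S -> {t, f}
    def utility(self, state, player) -> float: ...  # terminal utilities: S x P -> R

# A policy (the solution for a player) is then any function S -> A.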

Zero-sum games

• Zero-sum games:
– Agents have opposite utilities (values on outcomes)
– Lets us think of a single value that one player maximizes and the other minimizes
– Adversarial, pure competition

• General games:
– Agents have independent utilities (values on outcomes)
– Cooperation, indifference, competition, and more are all possible
Adversarial Search
Single-Agent Trees

[Figure: single-agent search tree with terminal values 2, 0, …, 2, 6, …, 4, 6]
Value of a state

The value of a state: the best achievable outcome (utility) from that state.

Non-terminal states: V(s) = max over successors s' of V(s')
Terminal states: V(s) = known utility

[Figure: the same tree with terminal values 2, 0, …, 2, 6, …, 4, 6]
Adversarial Game Trees

[Figure: adversarial game tree with terminal values -20, -8, …, -18, -5, …, -10, +4, -20, +8]
Minimax Values

States under the Agent's (MAX) control: V(s) = max over successors s' of V(s')
States under the Opponent's (MIN) control: V(s) = min over successors s' of V(s')
Terminal states: V(s) = known utility

[Figure: minimax tree with intermediate values -8, -5, -10, +8]
Tic-Tac-Toe

Adversarial Search (Minimax)

• Deterministic, zero-sum games:
– Tic-tac-toe, chess, checkers
– One player maximizes the result
– The other minimizes the result

• Minimax search:
– A state-space search tree
– Players alternate turns
– Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
– Minimax values are computed recursively; terminal values are part of the game definition

[Example tree: MAX root with minimax value 5; MIN children with values 2 and 5; terminal values (8, 2) and (5, 6)]
Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v
Minimax Implementation

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v
Minimax Example

[Example tree: MAX root with value 3; MIN nodes with values 3, 2, 2; terminal values (3, 12, 8), (2, 4, 6), (14, 5, 2)]
Minimax Properties
Minimax is optimal against a perfect player. What if the opponent is not perfect?

[Example tree: MAX over two MIN nodes with terminal values (10, 10) and (9, 100); minimax picks the left branch (value 10 vs. 9), even though against an imperfect opponent the right branch could be worth up to 100]
Minimax efficiency

• How efficient is minimax?
– Just like (exhaustive) DFS
– Time: O(b^m)
– Space: O(bm)

• Example, chess:
– b ≈ 35, m ≈ 100
– Computing the exact solution is completely infeasible
– But… do we really need it?

Game Tree Pruning

Minimax Example (revisited)

[The same example tree: MAX root 3; MIN values 3, 2, 2; terminals (3, 12, 8), (2, 4, 6), (14, 5, 2)]
Pruning Minimax Example

[The same tree with alpha-beta pruning: MIN values 3, ≤2, 2; once the middle MIN node has seen the leaf 2, its value is ≤ 2 < 3, so its remaining leaves (4 and 6) are pruned; terminals examined: (3, 12, 8), (2), (14, 5, 2)]
Alpha-Beta Implementation

α: MAX's best option on the path to the root
β: MIN's best option on the path to the root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v
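A runnable sketch of alpha-beta on the same explicit tree encoding as before (numbers are terminal utilities, lists are internal nodes); the encoding is an illustrative assumption:

import math

def ab_value(state, alpha, beta, is_max):
    if isinstance(state, (int, float)):
        return state
    if is_max:
        v = -math.inf
        for successor in state:
            v = max(v, ab_value(successor, alpha, beta, not is_max))
            if v >= beta:         # MIN above would never allow this value:
                return v          # prune the remaining successors
            alpha = max(alpha, v)
        return v
    v = math.inf
    for successor in state:
        v = min(v, ab_value(successor, alpha, beta, not is_max))
        if v <= alpha:            # MAX above would never allow this value:
            return v              # prune the remaining successors
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(ab_value(tree, -math.inf, math.inf, True))  # -> 3; leaves 4 and 6 are never visited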
Alpha-Beta Pruning Properties

• Pruning has no effect on the minimax value at the root!
– The intermediate values may not be the final ones, but this has no effect on the top value

• In the best case (the most favorable move ordering):
– Time complexity goes down to O(b^(m/2))
– This doubles the effective search depth
– But still… it is too much for chess

Only compute what is needed!

Quiz 1
Minimax value?
Which branches will alpha-beta prune?

Quiz 2
Minimax value?
Which branches will alpha-beta prune?

Resource Limits

• Problem: In real games, we cannot search all the way down to the terminal utilities!

• Solution: limit the depth (a sketch follows after this slide)
– Search only to a certain depth
– At that depth, the utility is replaced by an evaluation function that rates the state (something similar to a heuristic)

• We lose optimality
• The deeper we look, the better the solution
• This is a replanning agent

[Example tree: MAX value 4; MIN values -2 and 4; depth-limited leaves -1, -2, 4, 9; deeper nodes (?) left unexplored]

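A minimal runnable sketch of depth-limited minimax on the explicit-tree encoding used above; the depth cutoff and the toy evaluate function are illustrative assumptions:

def dl_value(state, depth, is_max, evaluate):
    # A state is a terminal utility (number) or a list of successors.
    if isinstance(state, (int, float)):
        return state
    if depth == 0:
        return evaluate(state)  # heuristic score instead of deeper search
    values = [dl_value(s, depth - 1, not is_max, evaluate) for s in state]
    return max(values) if is_max else min(values)

# Toy evaluation: guess a subtree's value by averaging its leaves.
def evaluate(state):
    leaves, stack = [], [state]
    while stack:
        s = stack.pop()
        if isinstance(s, (int, float)):
            leaves.append(s)
        else:
            stack.extend(s)
    return sum(leaves) / len(leaves)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(dl_value(tree, 1, True, evaluate))  # cuts off below the MIN nodes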
Replanning problems

[Figures: a depth-limited Pacman agent replans at every step; successive states show evaluation scores such as +8, +8; then +8, -2, +8; then +8, +1, -2, +8]
Evaluation Functions
• Evaluation functions give a score to non-terminal states.
• Ideal function: the real minimax value of the state.
• Reality: typically a weighted sum of different features:

Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

Example (chess):
f1(s) = (white pieces - black pieces)
f2(s) = (white queens - black queens)
f3(s) = (white pieces in danger - black pieces in danger)

(A sketch follows below.)
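A runnable sketch of such a weighted-feature evaluation, assuming a toy state that just stores piece counts; feature names and weights are illustrative, not tuned values from the slides:

def evaluate(state, weights=(1.0, 9.0, -0.5)):
    # Weighted sum of hand-engineered features: Eval(s) = sum_i w_i * f_i(s).
    f1 = state["white_pieces"] - state["black_pieces"]
    f2 = state["white_queens"] - state["black_queens"]
    f3 = state["white_in_danger"] - state["black_in_danger"]
    return sum(w * f for w, f in zip(weights, (f1, f2, f3)))

s = {"white_pieces": 12, "black_pieces": 11,
     "white_queens": 1, "black_queens": 1,
     "white_in_danger": 2, "black_in_danger": 0}
print(evaluate(s))  # 1.0*1 + 9.0*0 + (-0.5)*2 = 0.0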

Evaluation for Pacman

Depth is critical

[Demo: Pacman with search depth 2 vs. depth 10]

Importance of depth

• Evaluation functions are always non-optimal

• The deeper we look into the tree, the better the results the evaluation function gives us

• The more time you spend computing the evaluation function, the less time you have left to look into the tree
Summary
• We defined search in games
• For zero-sum deterministic games:
– Minimax (against a perfect player)
• Exploration can be pruned:
– Alpha-beta pruning
• Even with pruning, trees are too big:
– Limit the depth
• Limiting the depth has problems (hungry Pacman):
– Evaluation function
• And what if the game is not deterministic?

Uncertainty
Worst-Case vs. Average Case

[Figure: a max node whose children are min nodes in the worst-case view and chance nodes in the average-case view; terminal values (10, 10) and (9, 100)]

Idea: Uncertain outcomes controlled by chance, not an adversary!

Expectimax Search

Why wouldn't we know what the result of an action will be?
• Explicit randomness: rolling dice
• Unpredictable opponents: the ghosts respond randomly
• Actions can fail: when moving a robot, the wheels might slip

Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes.

Expectimax: compute the average score under optimal play
• Max nodes, as in minimax search
• Chance nodes are like min nodes, but the outcome is uncertain
• Take the weighted average (expectation) of the children

[Figure: max node over chance nodes with terminal values 10, 10, 9, 100]
Expectimax

Expectimax pseudocode

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v
Expectimax Pseudocode

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

[Figure: a chance node with successors of value 8, 24, -12 and probabilities 1/2, 1/3, 1/6]

v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 4 + 8 - 2 = 10
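A runnable sketch of expectimax on an explicit tree; the encoding (terminal numbers, ("max", children) nodes, and ("exp", [(probability, child), …]) nodes) is an illustrative assumption:

def value(node):
    # A node is a terminal utility, ("max", children), or
    # ("exp", [(probability, child), ...]).
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(value(child) for child in children)
    # Chance node: probability-weighted average of the child values.
    return sum(p * value(child) for p, child in children)

# The worked example above, as a single chance node.
chance = ("exp", [(1/2, 8), (1/3, 24), (1/6, -12)])
print(value(chance))  # -> 10.0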
Quiz

What is the max value at the top?

[Figure: expectimax tree with terminal values 3, 12, 9, 2, 4, 6, 15, 6, 0]
Expectimax pruning?

[Figure: a chance node with terminal values 3, 12, 9, 2]

To compute an average, ALL the children must be examined!

Depth-Limited Expectimax

[Figure: depth-limited expectimax; estimated values 400, 300, … stand in for the true expectimax values 492, 362, …]
Expected value

The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes:
• probability1 × value1 + probability2 × value2 + …

Example: How long to get to the airport?

Time:         20 min   30 min   60 min
Probability:   0.25     0.50     0.25

Expected time: 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
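The same computation as a short Python check, using the numbers from the example above:

times = [20, 30, 60]        # minutes
probs = [0.25, 0.50, 0.25]  # must sum to 1
print(sum(p * t for p, t in zip(probs, times)))  # -> 35.0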

The Dangers of Optimism and Pessimism

Dangerous optimism: assuming chance behavior when the world is adversarial.
Dangerous pessimism: assuming the worst case when it's not likely.

Assumptions vs. Reality

                     Adversarial Ghost         Random Ghost
Minimax Pacman       Won 5/5, avg. score 483   Won 5/5, avg. score 493
Expectimax Pacman    Won 1/5, avg. score -303  Won 5/5, avg. score 503

Pacman used depth-4 search with an evaluation function that avoids trouble.
The ghost used depth-2 search with an evaluation function that seeks Pacman.

Multi-Agent Utilities

• What if the game is not zero-sum, or has multiple players?

• Generalization of minimax:
– Terminals have utility tuples
– Node values are also utility tuples
– Each player maximizes its own component
– Can give rise to cooperation and competition dynamically… (a sketch follows below)

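A runnable sketch of this generalization (sometimes called max-n search): node values are utility tuples and each player maximizes its own component. The tree encoding (terminals are utility tuples, internal nodes are [player_index, children] lists) is an illustrative assumption:

def value(node):
    # A terminal is a utility tuple; an internal node is [player, children].
    if isinstance(node, tuple):
        return node
    player, children = node
    # The player to move picks the child whose tuple is best for *them*.
    return max((value(c) for c in children), key=lambda u: u[player])

# Two players (0 and 1); player 0 moves at the root.
tree = [0, [
    [1, [(1, 2), (4, 1)]],  # player 1 picks (1, 2)
    [1, [(3, 0), (2, 5)]],  # player 1 picks (2, 5)
]]
print(value(tree))  # player 0 compares (1, 2) vs (2, 5) -> (2, 5)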
Problems with search

• In any search, we have seen that exploring the tree is very expensive (time, memory)
• Example, Go: branching factor ≈ 300
– depth 1: 300
– depth 2: 300² = 90,000
– depth 3: 300³ = 27 million
– depth 4: 300⁴ = 8,100 million
– depth 5: 300⁵ ≈ 2.43 × 10¹²
• Even at depth 5 this is infeasible (even with pruning)
• Depth-limited search needs an evaluation function
– Mainly engineered by humans
– Unique to each problem
• Can we do better? A more generic AI?

Monte Carlo Tree Search

• Monte Carlo Tree Search (MCTS) is an important algorithm behind many major successes of recent AI applications, such as AlphaGo.

Monte Carlo Tree Search

MCTS combines two important ideas:
• Evaluation by rollouts: play multiple games to termination from a state (using a simple, fast rollout policy) and count wins and losses
• Selective search: explore the parts of the tree that will help improve the decision at the root, regardless of depth

Rollouts:
• Repeat until a terminal state:
– Play a move according to a "cheap" policy
• Record the result
• The fraction of wins correlates with the value of the position
MCTS Version 0

• Do N rollouts from each child of the root; record the fraction of wins
• Pick the move that gives the best outcome by this metric (a sketch follows below)

[Figures: win fractions at the root's children over successive rollouts]
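A minimal runnable sketch of "MCTS Version 0". The game interface (actions / result / is_terminal / winner) is an assumption, mirroring the formalization earlier in the slides:

import random

def rollout(game, state, player):
    # Play to termination with a cheap (here: random) rollout policy;
    # return 1 if `player` wins, else 0.
    while not game.is_terminal(state):
        state = game.result(state, random.choice(game.actions(state)))
    return 1 if game.winner(state) == player else 0

def mcts_v0(game, root, player, n=100):
    # Do n rollouts from each child of the root and pick the move
    # with the best fraction of wins.
    def win_fraction(action):
        child = game.result(root, action)
        return sum(rollout(game, child, player) for _ in range(n)) / n
    return max(game.actions(root), key=win_fraction)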
MCTS Version 0.9

Allocate rollouts to more promising nodes.

[Figures: rollouts concentrated on the children with higher win fractions]
MCTS Version 1.0

• Allocate rollouts to more promising nodes
• Allocate rollouts to more uncertain nodes

UCB Heuristics

How do we select the next rollout?
• The UCB formula combines "promising" and "uncertain" (see below):
– N(n) = number of rollouts from node n
– U(n) = total utility of rollouts (wins) for the player of Parent(n)
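The standard UCB1 rule used in MCTS combines these two quantities (this exact form is an assumption based on the usual presentation; the exploration constant C is a tunable parameter):

UCB1(n) = U(n) / N(n) + C × sqrt( log N(Parent(n)) / N(n) )

The first term favors promising nodes (high average utility); the second favors uncertain nodes (few rollouts relative to their parent).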

MCTS properties

• The "value" of a node, U(n)/N(n), is a weighted average of its children's values
• Idea: as N → ∞, the vast majority of rollouts are concentrated in the best children
• Theorem: as N → ∞, MCTS selects the minimax move
– but N is never infinity!

Thank you

Ekhi Zugasti
ezugasti@mondragon.edu

Loramendi, 4. Apartado 23
20500 Arrasate – Mondragon
T. 943 71 21 85
info@mondragon.edu
