
Artificial Intelligence
Computational Intelligence

Adversarial Games
AI for Games
Game Playing State-of-the-Art
• Checkers:
– 1950: First computer player.
– 1994: First computer champion: Chinook ends the 40-year reign of human champion Marion Tinsley.
– 2007: Checkers solved!

• Chess:
– 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. It used very sophisticated evaluation functions.
– Current programs are even better, if less historic.

• Go:
– 2010: human champions start taking an interest in AI opponents; in Go, the branching factor is b > 300.
– 2016: AlphaGo defeats the human champion. It uses Monte Carlo Tree Search and a learned evaluation function.

PACMAN!

Types of games

• There are many different kinds of games


• Properties:
– Deterministic or stochastic?
– One, two, or more players?
– Zero sum?
– Perfect information (can you see the state)?

• Goal:
– We need algorithms for calculating a strategy (policy) that recommends a move in each state

Deterministic games

• Many possible formalizations, one is:


– States: S (start at s0)
– Players: P = {1…N} (usually take turns)
– Actions: A (may depend on player / state)
– Transition function: S × A → S
– Terminal test: S → {t, f}
– Terminal utilities: S × P → ℝ

• A solution for a player is a policy: S → A (a sketch of this formalization follows below)
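A minimal sketch of this formalization as a Python interface; the class and method names are illustrative assumptions, not from the slides:

from typing import Iterable

class Game:
    """Abstract deterministic game, mirroring the formalization above."""
    def initial_state(self): ...                    # s0
    def players(self) -> Iterable[int]: ...         # P = {1..N}
    def actions(self, state, player): ...           # A (may depend on player/state)
    def result(self, state, action): ...            # transition: S x A -> S
    def is_terminal(self, state) -> bool: ...       # terminal test: S -> {t, f}
    def utility(self, state, player) -> float: ...  # terminal utilities: S x P -> R

# A policy (the solution for a player) is then any function S -> A.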

Zero-sum games

• Zero-sum games:
– Agents have opposite utilities (values on outcomes)
– Lets us think of a single value that one player maximizes and the other minimizes
– Adversarial, pure competition

• General games:
– Agents have independent utilities (values on outcomes)
– Cooperation, indifference, competition, and more are all possible
Adversarial Search
Single-Agent Trees

[Figure: single-agent search tree with terminal values 2, 0, …, 2, 6, …, 4, 6]
Value of a state

The value of a state: the best achievable outcome (utility) from that state.

Non-terminal states: V(s) = max over successors s' of V(s')
Terminal states: V(s) = known utility

[Figure: the same tree with terminal values 2, 0, …, 2, 6, …, 4, 6]
Adversarial Game Trees

[Figure: adversarial game tree with terminal values -20, -8, …, -18, -5, …, -10, +4, -20, +8]
Minimax Values

States under the Agent's (MAX) control: V(s) = max over successors s' of V(s')
States under the Opponent's (MIN) control: V(s) = min over successors s' of V(s')
Terminal states: V(s) = known utility

[Figure: minimax tree with intermediate values -8, -5, -10, +8]
Tic-Tac-Toe

Adversarial Search (Minimax)

• Deterministic, zero-sum games:
– Tic-tac-toe, chess, checkers
– One player maximizes the result
– The other minimizes the result

• Minimax search:
– A state-space search tree
– Players alternate turns
– Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
– Minimax values are computed recursively; terminal values are part of the game definition

[Example tree: MAX root with minimax value 5; MIN children with values 2 and 5; terminal values (8, 2) and (5, 6)]
Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v
Minimax Implementation

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v
Minimax Example

[Example tree: MAX root with value 3; MIN nodes with values 3, 2, 2; terminal values (3, 12, 8), (2, 4, 6), (14, 5, 2)]
Minimax Properties
Minimax is optimal against a perfect player. What if the opponent is not perfect?

[Example tree: MAX over two MIN nodes with terminal values (10, 10) and (9, 100); minimax picks the left branch (value 10 vs. 9), even though against an imperfect opponent the right branch could be worth up to 100]
Minimax efficiency

• How efficient is minimax?
– Just like (exhaustive) DFS
– Time: O(b^m)
– Space: O(bm)

• Example, chess:
– b ≈ 35, m ≈ 100
– Computing the exact solution is completely infeasible
– But… do we really need it?

Game Tree Pruning

Minimax Example (revisited)

[The same example tree: MAX root 3; MIN values 3, 2, 2; terminals (3, 12, 8), (2, 4, 6), (14, 5, 2)]
Pruning Minimax Example

[The same tree with alpha-beta pruning: MIN values 3, ≤2, 2; once the middle MIN node has seen the leaf 2, its value is ≤ 2 < 3, so its remaining leaves (4 and 6) are pruned; terminals examined: (3, 12, 8), (2), (14, 5, 2)]
Alpha-Beta Implementation

α: MAX's best option on the path to the root
β: MIN's best option on the path to the root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v
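A runnable sketch of alpha-beta on the same explicit tree encoding as before (numbers are terminal utilities, lists are internal nodes); the encoding is an illustrative assumption:

import math

def ab_value(state, alpha, beta, is_max):
    if isinstance(state, (int, float)):
        return state
    if is_max:
        v = -math.inf
        for successor in state:
            v = max(v, ab_value(successor, alpha, beta, not is_max))
            if v >= beta:         # MIN above would never allow this value:
                return v          # prune the remaining successors
            alpha = max(alpha, v)
        return v
    v = math.inf
    for successor in state:
        v = min(v, ab_value(successor, alpha, beta, not is_max))
        if v <= alpha:            # MAX above would never allow this value:
            return v              # prune the remaining successors
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(ab_value(tree, -math.inf, math.inf, True))  # -> 3; leaves 4 and 6 are never visited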
Alpha-Beta Pruning Properties

• Pruning has no effect on the minimax value at the root!
– The intermediate values may not be the final ones, but this has no effect on the top value

• In the best case (the most favorable move ordering):
– Time complexity goes down to O(b^(m/2))
– This doubles the effective search depth
– But still… it is too much for chess

Only compute what is needed!

Quiz 1
Minimax value?
Which branches will alpha-beta prune?

Quiz 2
Minimax value?
Which branches will alpha-beta prune?

Resource Limits

• Problem: In real games, we cannot search all the way down to the terminal utilities!

• Solution: limit the depth (a sketch follows after this slide)
– Search only to a certain depth
– At that depth, the utility is replaced by an evaluation function that rates the state (something similar to a heuristic)

• We lose optimality
• The deeper we look, the better the solution
• This is a replanning agent

[Example tree: MAX value 4; MIN values -2 and 4; depth-limited leaves -1, -2, 4, 9; deeper nodes (?) left unexplored]

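A minimal runnable sketch of depth-limited minimax on the explicit-tree encoding used above; the depth cutoff and the toy evaluate function are illustrative assumptions:

def dl_value(state, depth, is_max, evaluate):
    # A state is a terminal utility (number) or a list of successors.
    if isinstance(state, (int, float)):
        return state
    if depth == 0:
        return evaluate(state)  # heuristic score instead of deeper search
    values = [dl_value(s, depth - 1, not is_max, evaluate) for s in state]
    return max(values) if is_max else min(values)

# Toy evaluation: guess a subtree's value by averaging its leaves.
def evaluate(state):
    leaves, stack = [], [state]
    while stack:
        s = stack.pop()
        if isinstance(s, (int, float)):
            leaves.append(s)
        else:
            stack.extend(s)
    return sum(leaves) / len(leaves)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(dl_value(tree, 1, True, evaluate))  # cuts off below the MIN nodes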
Replanning problems

[Figures: a depth-limited Pacman agent replans at every step; successive states show evaluation scores such as +8, +8; then +8, -2, +8; then +8, +1, -2, +8]
Evaluation Functions
• Evaluation functions give a score to non-terminal states.
• Ideal function: the real minimax value of the state.
• Reality: typically a weighted sum of different features:

Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

Example (chess):
f1(s) = (white pieces - black pieces)
f2(s) = (white queens - black queens)
f3(s) = (white pieces in danger - black pieces in danger)

(A sketch follows below.)
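A runnable sketch of such a weighted-feature evaluation, assuming a toy state that just stores piece counts; feature names and weights are illustrative, not tuned values from the slides:

def evaluate(state, weights=(1.0, 9.0, -0.5)):
    # Weighted sum of hand-engineered features: Eval(s) = sum_i w_i * f_i(s).
    f1 = state["white_pieces"] - state["black_pieces"]
    f2 = state["white_queens"] - state["black_queens"]
    f3 = state["white_in_danger"] - state["black_in_danger"]
    return sum(w * f for w, f in zip(weights, (f1, f2, f3)))

s = {"white_pieces": 12, "black_pieces": 11,
     "white_queens": 1, "black_queens": 1,
     "white_in_danger": 2, "black_in_danger": 0}
print(evaluate(s))  # 1.0*1 + 9.0*0 + (-0.5)*2 = 0.0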

Evaluation for Pacman

Depth is critical

[Demo: Pacman with search depth 2 vs. depth 10]

Importance of depth

• Evaluation functions are always non-optimal

• The deeper we look into the tree, the better the results the evaluation function gives us

• The more time you spend computing the evaluation function, the less time you have left to look into the tree
Summary
• We defined search in games
• For zero-sum deterministic games:
– Minimax (against a perfect player)
• Exploration can be pruned:
– Alpha-beta pruning
• Even with pruning, trees are too big:
– Limit the depth
• Limiting the depth has problems (hungry Pacman):
– Evaluation function
• And what if the game is not deterministic?

Uncertainty
Worst-Case vs. Average Case

[Figure: a max node whose children are min nodes in the worst-case view and chance nodes in the average-case view; terminal values (10, 10) and (9, 100)]

Idea: Uncertain outcomes controlled by chance, not an adversary!

Expectimax Search

Why wouldn't we know what the result of an action will be?
• Explicit randomness: rolling dice
• Unpredictable opponents: the ghosts respond randomly
• Actions can fail: when moving a robot, the wheels might slip

Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes.

Expectimax: compute the average score under optimal play
• Max nodes, as in minimax search
• Chance nodes are like min nodes, but the outcome is uncertain
• Take the weighted average (expectation) of the children

[Figure: max node over chance nodes with terminal values 10, 10, 9, 100]
Expectimax

Expectimax pseudocode

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v
Expectimax Pseudocode

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

[Figure: a chance node with successors of value 8, 24, -12 and probabilities 1/2, 1/3, 1/6]

v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 4 + 8 - 2 = 10
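A runnable sketch of expectimax on an explicit tree; the encoding (terminal numbers, ("max", children) nodes, and ("exp", [(probability, child), …]) nodes) is an illustrative assumption:

def value(node):
    # A node is a terminal utility, ("max", children), or
    # ("exp", [(probability, child), ...]).
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(value(child) for child in children)
    # Chance node: probability-weighted average of the child values.
    return sum(p * value(child) for p, child in children)

# The worked example above, as a single chance node.
chance = ("exp", [(1/2, 8), (1/3, 24), (1/6, -12)])
print(value(chance))  # -> 10.0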
Quiz

What is the max value at the top?

[Figure: expectimax tree with terminal values 3, 12, 9, 2, 4, 6, 15, 6, 0]
Expectimax pruning?

[Figure: a chance node with terminal values 3, 12, 9, 2]

To compute an average, ALL the children must be examined!

Depth-Limited Expectimax

[Figure: depth-limited expectimax; estimated values 400, 300, … stand in for the true expectimax values 492, 362, …]
Expected value

The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes:
• probability1 × value1 + probability2 × value2 + …

Example: How long to get to the airport?

Time:         20 min   30 min   60 min
Probability:   0.25     0.50     0.25

Expected time: 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
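The same computation as a short Python check, using the numbers from the example above:

times = [20, 30, 60]        # minutes
probs = [0.25, 0.50, 0.25]  # must sum to 1
print(sum(p * t for p, t in zip(probs, times)))  # -> 35.0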

The Dangers of Optimism and Pessimism

Dangerous optimism: assuming chance behavior when the world is adversarial.
Dangerous pessimism: assuming the worst case when it's not likely.

Assumptions vs. Reality

                     Adversarial Ghost         Random Ghost
Minimax Pacman       Won 5/5, avg. score 483   Won 5/5, avg. score 493
Expectimax Pacman    Won 1/5, avg. score -303  Won 5/5, avg. score 503

Pacman used depth-4 search with an evaluation function that avoids trouble.
The ghost used depth-2 search with an evaluation function that seeks Pacman.

Multi-Agent Utilities

• What if the game is not zero-sum, or has multiple players?

• Generalization of minimax:
– Terminals have utility tuples
– Node values are also utility tuples
– Each player maximizes its own component
– Can give rise to cooperation and competition dynamically… (a sketch follows below)

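A runnable sketch of this generalization (sometimes called max-n search): node values are utility tuples and each player maximizes its own component. The tree encoding (terminals are utility tuples, internal nodes are [player_index, children] lists) is an illustrative assumption:

def value(node):
    # A terminal is a utility tuple; an internal node is [player, children].
    if isinstance(node, tuple):
        return node
    player, children = node
    # The player to move picks the child whose tuple is best for *them*.
    return max((value(c) for c in children), key=lambda u: u[player])

# Two players (0 and 1); player 0 moves at the root.
tree = [0, [
    [1, [(1, 2), (4, 1)]],  # player 1 picks (1, 2)
    [1, [(3, 0), (2, 5)]],  # player 1 picks (2, 5)
]]
print(value(tree))  # player 0 compares (1, 2) vs (2, 5) -> (2, 5)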
Problems with search

• In any search, we have seen that exploring the tree is very expensive (time, memory)
• Example, Go: branching factor ≈ 300
– depth 1: 300
– depth 2: 300² = 90,000
– depth 3: 300³ = 27 million
– depth 4: 300⁴ = 8,100 million
– depth 5: 300⁵ ≈ 2.43 × 10¹²
• Even at depth 5 this is infeasible (even with pruning)
• Depth-limited search needs an evaluation function
– Mainly engineered by humans
– Unique to each problem
• Can we do better? A more generic AI?

Monte Carlo Tree Search

• Monte Carlo Tree Search (MCTS) is an important algorithm behind many major successes of recent AI applications, such as AlphaGo.

Monte Carlo Tree Search

MCTS combines two important ideas:
• Evaluation by rollouts: play multiple games to termination from a state (using a simple, fast rollout policy) and count wins and losses
• Selective search: explore the parts of the tree that will help improve the decision at the root, regardless of depth

Rollouts:
• Repeat until a terminal state:
– Play a move according to a "cheap" policy
• Record the result
• The fraction of wins correlates with the value of the position
MCTS Version 0

• Do N rollouts from each child of the root; record the fraction of wins
• Pick the move that gives the best outcome by this metric (a sketch follows below)

[Figures: win fractions at the root's children over successive rollouts]
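A minimal runnable sketch of "MCTS Version 0". The game interface (actions / result / is_terminal / winner) is an assumption, mirroring the formalization earlier in the slides:

import random

def rollout(game, state, player):
    # Play to termination with a cheap (here: random) rollout policy;
    # return 1 if `player` wins, else 0.
    while not game.is_terminal(state):
        state = game.result(state, random.choice(game.actions(state)))
    return 1 if game.winner(state) == player else 0

def mcts_v0(game, root, player, n=100):
    # Do n rollouts from each child of the root and pick the move
    # with the best fraction of wins.
    def win_fraction(action):
        child = game.result(root, action)
        return sum(rollout(game, child, player) for _ in range(n)) / n
    return max(game.actions(root), key=win_fraction)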
MCTS Version 0.9

Allocate rollouts to more promising nodes.

[Figures: rollouts concentrated on the children with higher win fractions]
MCTS Version 1.0

• Allocate rollouts to more promising nodes
• Allocate rollouts to more uncertain nodes

UCB Heuristics

How do we select the next rollout?
• The UCB formula combines "promising" and "uncertain" (see below):
– N(n) = number of rollouts from node n
– U(n) = total utility of rollouts (wins) for the player of Parent(n)
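The standard UCB1 rule used in MCTS combines these two quantities (this exact form is an assumption based on the usual presentation; the exploration constant C is a tunable parameter):

UCB1(n) = U(n) / N(n) + C × sqrt( log N(Parent(n)) / N(n) )

The first term favors promising nodes (high average utility); the second favors uncertain nodes (few rollouts relative to their parent).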

MCTS properties

• The "value" of a node, U(n)/N(n), is a weighted average of its children's values
• Idea: as N → ∞, the vast majority of rollouts are concentrated in the best children
• Theorem: as N → ∞, MCTS selects the minimax move
– but N is never infinity!

Thank you

Ekhi Zugasti
ezugasti@mondragon.edu

Loramendi, 4. Apartado 23
20500 Arrasate – Mondragon
T. 943 71 21 85
info@mondragon.edu
