Local Search
Shang-Tse Chen
Department of Computer Science
& Information Engineering
National Taiwan University
o Local search: improve a single option until you can’t make it better (no fringe!)
o Generally much faster and more memory efficient (but incomplete and suboptimal)
Hill Climbing
o Simple, general idea:
o Start wherever
o Repeat: move to the best neighboring state
o If no neighbors better than current, quit
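The hill-climbing loop above can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's own code: the names `hill_climb`, `neighbors`, and `score` are hypothetical, and the toy objective is chosen only so the result is easy to check.

```python
def hill_climb(start, neighbors, score):
    """Greedy local search: keep moving to the best neighbor until none is better."""
    current = start
    while True:
        candidates = neighbors(current)
        if not candidates:
            return current
        best = max(candidates, key=score)
        # Quit when no neighbor is strictly better than the current state.
        if score(best) <= score(current):
            return current
        current = best


# Toy problem (an assumption for illustration): maximize -(x - 3)^2 over the integers.
def toy_score(x):
    return -(x - 3) ** 2


def toy_neighbors(x):
    return [x - 1, x + 1]


result = hill_climb(0, toy_neighbors, toy_score)
```

On this single-peaked objective the search reaches the global maximum at x = 3. On an objective with several peaks the same loop can stop at a local maximum, which is the "incomplete and suboptimal" caveat.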
Types of Games
o Axes:
o Deterministic or stochastic?
o One, two, or more players?
o Zero sum?
o Perfect information (can you see the state)?
o Pacman
Deterministic Games with Terminal Utilities
o Many possible formalizations, one is:
o States: S (start at s0)
o Players: P={1...N} (usually take turns)
o Actions: A (may depend on player / state)
o Transition Function: S x A → S
o Terminal Test: S → {t, f}
o Terminal Utilities: S x P → R
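[Figure: search tree with terminal values 2, 0, …, 2, 6, …, 4, 6]
Value of a State
o Value of a state: the best achievable outcome (utility) from that state
o Non-terminal states: the value is the best value among the state's children
o Terminal states: the value is the known terminal utility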
Adversarial Game Trees
[Figure: game tree with terminal-state values -8, -5, -10, +8]
Tic-Tac-Toe Game Tree
Adversarial Search (Minimax)
o Deterministic, zero-sum games:
o Tic-tac-toe, chess, checkers
o One player maximizes result
o The other minimizes result
o Minimax search:
o A state-space search tree
o Players alternate turns
o Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
[Figure: minimax values are computed recursively; terminal values 8, 2, 5, 6 are part of the game; the two min nodes have values 2 and 5, and the max root has value 5]
Minimax Implementation
[Figure: example tree with terminal values 3 12 8, 2 4 6, 14 5 2; the three min nodes have values 3, 2, 2]
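A minimal sketch of recursive minimax, run on the example tree whose leaves appear above. The nested-list tree encoding and the name `minimax` are assumptions made for illustration; the slide's own pseudocode is not recoverable from this text.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a tree given as nested lists with numeric leaves."""
    if isinstance(node, (int, float)):
        return node  # terminal state: its utility is known
    # Players alternate turns, so the children are evaluated for the other player.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# MAX root over three MIN nodes, leaves taken from the figure.
example_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
min_values = [minimax(subtree, maximizing=False) for subtree in example_tree]
root_value = minimax(example_tree)
```

The MIN nodes evaluate to 3, 2, 2 and the root to 3, matching the values in the figure.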
Minimax Properties
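[Figure: max root over two min nodes with terminal values 10, 10 and 9, 100]
[Figure: pruning example with terminal values 3 12 8, 2, 14 5 2; the middle min node is bounded by <=2 after its first child, so its remaining children are not explored]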
Alpha-Beta Pruning
o General configuration (MIN version)
o We're computing the MIN-VALUE at some node n
o We're looping over n's children
o n's estimate of the children's min is dropping
o Who cares about n's value? MAX
o Let α be the best value that MAX can get at any choice point along the current path from the root
o If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children (it's already bad enough that it won't be played)
o Good child ordering improves effectiveness of pruning
[Figure: alternating MAX and MIN layers with node n at a MIN level and α at a MAX ancestor; node labels 10, <=2, >=100, 2, 10]
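The pruning rule can be sketched as follows. This is an illustrative sketch under assumptions: trees are nested lists with numeric leaves, and the `visited` list is a hypothetical addition used only to show which leaves get evaluated.

```python
import math


def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value with alpha-beta pruning on a nested-list tree."""
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)  # record every leaf that is actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:
                return value  # MIN above already has an option at least this good
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        if value <= alpha:
            return value  # MAX above will avoid this node; skip its other children
        beta = min(beta, value)
    return value


example_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
evaluated_leaves = []
pruned_value = alphabeta(example_tree, visited=evaluated_leaves)
```

On this tree the root value is still 3, but only 7 of the 9 leaves are evaluated: after seeing 2 under the middle MIN node, the leaves 4 and 6 are skipped. Reordering children changes how much gets pruned, not the root value.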
Resource Limits
o Problem: In realistic games, cannot search to leaves!
o Example:
o Suppose we have 100 seconds, can explore 10K nodes / sec
o So can check 1M nodes per move
o α-β reaches about depth 8: a decent chess program
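The budget arithmetic above can be checked with a short calculation. Two things here are assumptions not stated on the slide: a chess branching factor of about 35, and the best-case estimate that alpha-beta with good ordering examines roughly b^(d/2) nodes instead of b^d.

```python
import math

seconds_per_move = 100
nodes_per_second = 10_000
node_budget = seconds_per_move * nodes_per_second  # nodes we can check per move

branching_factor = 35  # assumed typical value for chess


def reachable_depth(budget, b, best_case_alphabeta=False):
    """Solve b**d = budget (plain minimax) or b**(d/2) = budget (alpha-beta best case) for d."""
    depth = math.log(budget) / math.log(b)
    return 2 * depth if best_case_alphabeta else depth


plain_depth = reachable_depth(node_budget, branching_factor)
alphabeta_depth = reachable_depth(node_budget, branching_factor, best_case_alphabeta=True)
```

Under these assumptions plain minimax reaches a depth of roughly 4 and alpha-beta roughly 8, which is consistent with the "about depth 8" figure.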
[Demo: thrashing d=2, thrashing d=2 (fixed evaluation function), smart ghosts coordinate (L6D6,7,8,10)]
Video of Demo Thrashing (d=2)
Why Pacman Starves
[Figure: max root over two min nodes with terminal values 10, 10 and 9, 100]
def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

[Figure: chance node with outcome probabilities 1/2, 1/3, 1/6; values shown: 5, 8, 24, 7, -12]
[Figure: example tree with terminal values 3 12 9, 2 4 6, 15 6 0]
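A runnable version of the pseudocode might look like the sketch below. The tree encoding is an assumption: a MAX node is a list of chance nodes, and a chance node is a list of (probability, child) pairs. The uniform probabilities on the example tree are also an assumption, since the figure's probabilities are not recoverable here.

```python
def expectimax(node, agent="MAX"):
    """Expectimax value: MAX nodes take the maximum, EXP nodes take a weighted average."""
    if isinstance(node, (int, float)):
        return node  # terminal state: return its utility
    if agent == "MAX":
        return max(expectimax(child, "EXP") for child in node)
    # EXP node: probability-weighted average of the successor values.
    v = 0.0
    for p, successor in node:
        v += p * expectimax(successor, "MAX")
    return v


def uniform(values):
    """Hypothetical helper: chance node where every outcome is equally likely."""
    return [(1 / len(values), value) for value in values]


example_tree = [uniform([3, 12, 9]), uniform([2, 4, 6]), uniform([15, 6, 0])]
chance_values = [expectimax(chance, "EXP") for chance in example_tree]
root_value = expectimax(example_tree)
```

With uniform outcomes the three chance nodes average to 8, 4, and 7, so the MAX root picks 8.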
Expectimax Pruning?
[Figure: partially explored tree with terminal values 3, 12, 9, 2]
Depth-Limited Expectimax
[Figure: depth-limited tree; the values 400 and 300 at the depth limit are estimates of the true expectimax value (which would require a lot of work to compute); deeper nodes show 492, 362, …]
Probabilities
Reminder: Probabilities
o A random variable represents an event whose outcome is unknown
o A probability distribution is an assignment of weights to outcomes
o Example: Traffic on freeway
o Random variable: T = whether there’s traffic
o Outcomes: T in {none, light, heavy}
o Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
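The traffic example can be written down directly as a mapping from outcomes to weights. This is a small sketch; the names `traffic`, `is_distribution`, and `expectation` are hypothetical, and the delay numbers are invented purely to show an expected value.

```python
traffic = {"none": 0.25, "light": 0.50, "heavy": 0.25}


def is_distribution(dist, tol=1e-9):
    """Check that the weights are non-negative and sum to one."""
    return all(p >= 0 for p in dist.values()) and abs(sum(dist.values()) - 1.0) < tol


def expectation(dist, payoff):
    """Probability-weighted average of a payoff defined on each outcome."""
    return sum(p * payoff[outcome] for outcome, p in dist.items())


# Assumed delays in minutes for each outcome (illustrative only).
delay_minutes = {"none": 0, "light": 10, "heavy": 30}
expected_delay = expectation(traffic, delay_minutes)
```

Here the expected delay is 0.25 * 0 + 0.50 * 10 + 0.25 * 30 = 12.5 minutes, the same weighted average an expectimax chance node computes.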
Assumptions vs. Reality
Pacman used depth 4 search with an eval function that avoids trouble
Ghost used depth 2 search with an eval function that seeks Pacman
[Demos: world assumptions (L7D3,4,5,6)]
Video of Demo World Assumptions
Random Ghost – Expectimax Pacman
Video of Demo World Assumptions
Adversarial Ghost – Minimax Pacman
Video of Demo World Assumptions
Adversarial Ghost – Expectimax Pacman
Video of Demo World Assumptions
Random Ghost – Minimax Pacman
Why not minimax?
o Worst case reasoning is too conservative
o Need average case reasoning
Mixed Layer Types
o E.g. Backgammon
o Expectiminimax
o Environment is an extra "random agent" player that moves after each min/max agent
o Each node computes the appropriate combination of its children
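One way to sketch expectiminimax is to tag each node with its layer type and combine children accordingly. The tuple encoding `(kind, children)` and the small example game are assumptions made for illustration only.

```python
def expectiminimax(node):
    """Value of a tree that mixes max, min, and chance layers."""
    if isinstance(node, (int, float)):
        return node  # terminal utility
    kind, children = node
    if kind == "max":
        return max(expectiminimax(child) for child in children)
    if kind == "min":
        return min(expectiminimax(child) for child in children)
    if kind == "chance":
        # Chance children are (probability, child) pairs.
        return sum(p * expectiminimax(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# MAX moves, then a fair coin flip, then MIN moves (toy values).
game = ("max", [
    ("chance", [(0.5, ("min", [3, 5])), (0.5, ("min", [1, 9]))]),
    ("chance", [(0.5, ("min", [4, 4])), (0.5, ("min", [2, 8]))]),
])
game_value = expectiminimax(game)
```

The first option is worth 0.5 * 3 + 0.5 * 1 = 2 and the second 0.5 * 4 + 0.5 * 2 = 3, so MAX chooses the second.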
Example: Backgammon
o Dice rolls increase b: 21 possible rolls with 2 dice
o Backgammon ≈ 20 legal moves
o Depth 2 = 20 x (21 x 20)^3 ≈ 1.5 x 10^9
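The node count can be reproduced with a few lines of arithmetic. This sketch takes the 20 legal moves per position as given and counts the 21 dice rolls as unordered pairs of faces. Evaluating the expression 20 x (21 x 20)^3 exactly gives about 1.5 x 10^9.

```python
from itertools import combinations_with_replacement

# Two dice, order ignored: (1,1), (1,2), ..., (6,6).
distinct_rolls = len(list(combinations_with_replacement(range(1, 7), 2)))

legal_moves = 20  # approximate figure for backgammon
# One initial move, then three (dice roll, move) layers.
nodes = legal_moves * (distinct_rolls * legal_moves) ** 3
```

Each extra move layer multiplies the tree size by about 420, so the reachable depth shrinks quickly as chance layers are added.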
Maximum Expected Utility
o Questions:
o Where do utilities come from?
o How do we know such utilities even exist?
o How do we know that averaging even makes sense?
o What if our behavior (preferences) can't be described by utilities?
What Utilities to Use?