
ADVERSARIAL SEARCH & GAMES
The games most commonly studied within AI are games such as chess, checkers,
and Go.
Types of games:
• Deterministic, e.g. Tic-tac-toe
• One-player, two-player (turn-taking), or multi-player
• Zero-sum – no win-win situation
• Fully observable games

For games we often use the term move as a synonym for “action” and
position as a synonym for “state.”
Deterministic Games
• A game is said to be deterministic if the resolution of player actions
leads to a theoretically predictable outcome.
• One formalization of deterministic games is:
 States: S (start at S0)
 Players: P = {1…N} (usually taking turns)
 Actions: A (may depend on player/state)
 Transition function: S × A → S
 Terminal test: S → {t, f}
 Terminal utilities: S × P → ℝ
General games can be formally defined with the following elements:
• S0: The initial state, which specifies how the game is set up at the start.
• TO-MOVE(s): The player whose turn it is to move in state s.
• ACTIONS(s): The set of legal moves in state s.
• RESULT(s, a): The transition model, which defines the state resulting
from taking action a in state s.
• IS-TERMINAL(s): A terminal test, which is true when the game is over
and false otherwise. States where the game has ended are called terminal
states.
• UTILITY(s, p): A utility function defines the final numeric value to player
p when the game ends in terminal state s.
In chess, the outcome is a win, loss, or draw, with utility values 1, 0, or 1/2.
The ACTIONS and RESULT functions define the state space graph—a graph
where the vertices are states, the edges are moves, and a state
might be reached by multiple paths.

We define the complete game tree as a search tree that
follows every sequence of moves all the way to a terminal state. The
game tree may be infinite if the state space itself is unbounded or if
the rules of the game allow infinitely repeating positions.
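As a concrete illustration, the six functions of the formalization above can be implemented for a tiny two-player game. The pile-of-stones rules below are an invented example, not part of the course material:

```python
class PileGame:
    """Toy deterministic, zero-sum, two-player game (an invented example):
    players 0 and 1 alternately remove 1 or 2 stones from a pile;
    whoever takes the last stone wins (utility 1, loser gets 0)."""

    def initial_state(self):          # S0
        return (5, 0)                 # (stones remaining, player to move)

    def to_move(self, s):             # TO-MOVE(s)
        return s[1]

    def actions(self, s):             # ACTIONS(s)
        return [n for n in (1, 2) if n <= s[0]]

    def result(self, s, a):           # RESULT(s, a): transition model
        return (s[0] - a, 1 - s[1])

    def is_terminal(self, s):         # IS-TERMINAL(s)
        return s[0] == 0

    def utility(self, s, p):          # UTILITY(s, p)
        winner = 1 - s[1]             # the player who just moved took the last stone
        return 1 if p == winner else 0
```

Each method corresponds one-to-one to a function in the list above.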
Adversarial search
• Consider games where our agents have one or more adversaries who
attempt to keep them from reaching their goal(s).
• The easiest way to think about such games is as being defined by a
single variable value, which one team or agent tries to maximize and
the opposing team or agent tries to minimize, effectively putting them
in direct competition.
• In Pacman, this variable is your score, which you try to maximize by
eating pellets quickly and efficiently while ghosts try to minimize by
eating you first. Many common household games also fall under this
class, e.g. chess, checkers, and Go.
To solve zero-sum games, we’ll use a
set of approaches called adversarial
search.
This is where you, as an agent, have to think about the value of a
state as a function of both what you will do and what the other
agent, who is working against you, will do.
A (partial) game tree for the game of tic-tac-toe. The top node is the initial state,
and MAX moves first, placing an X in an empty square. We show part of the tree,
giving alternating moves by MIN (O) and MAX (X), until we eventually reach
terminal states, which can be assigned utilities according to the rules of the game

• MAX wants to find a sequence of actions leading to a win, but MIN has something to say about it.
Single Agent Trees
Assume that Pacman starts with 10 points and loses 1 point per move
until he eats the pellet at which point the game arrives at a terminal state
and ends.
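In a single-agent tree there is no adversary, so the value of a node is simply the maximum over its children's values. A minimal sketch, where a leaf is a number holding the final score and the tree literal is an invented instance of the scoring just described:

```python
def value(node):
    # A leaf is a number (the final score); an internal node is a
    # list of child nodes. With no adversary, just take the max.
    if isinstance(node, (int, float)):
        return node
    return max(value(child) for child in node)

# Invented example: pellet eaten after 2 moves on one branch
# (10 - 2 = 8) and after 3 moves on the other (10 - 3 = 7).
tree = [8, [7]]
```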
MINIMAX ALGORITHM

• The minimax algorithm runs under the motivating assumption that
the opponent we face behaves optimally, and will always perform the
move that is worst for us. To introduce this algorithm, we must first
formalize the notions of terminal utility and state value. The value of
a state is the optimal score attainable by the agent that controls
that state.
• Terminal utility: This is the value of the terminal state and it is always
some deterministic known value and an inherent game property.
• State value: The best achievable outcome (utility) for an agent from a
specific state.
Minimax Algorithm: Pacman vs Ghost
• Pacman tries to maximize score
• Ghost wants to minimize score by eating Pacman
• Assume alternating moves
• For each sequence of alternating actions, at some point the game
ends in a terminal state with some associated utility
The rules of the game dictate that the two agents take turns making moves, leading to a game
tree where the two agents switch off on layers of the tree that they "control". Blue nodes
correspond to nodes that Pacman controls and can decide what action to take, while
red nodes correspond to ghost-controlled nodes.
Minimax values

∀ agent-controlled states: V(s) = max_{s′ ∈ successors(s)} V(s′)

∀ opponent-controlled states: V(s) = min_{s′ ∈ successors(s)} V(s′)


In implementation, minimax behaves similarly to depth-first search,
computing values of nodes in the same order as DFS would,
starting with the leftmost terminal node and iteratively working
its way rightwards. It performs a postorder traversal of the game tree.
Minimax properties
• State-space search tree
• Players alternate turns
• To compute minimax values at each node we work from the bottom of the tree to the top
• Minimax search is depth-first search
Minimax implementation
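A minimal recursive sketch of minimax, over trees where a leaf is a number (MAX's terminal utility) and an internal node is a list of children; the tree literal is an invented two-ply example:

```python
def minimax(node, maximizing):
    """Minimax over a leaf-number / child-list tree. Layers alternate
    between MAX (maximizing=True) and MIN (maximizing=False)."""
    if isinstance(node, (int, float)):
        return node                  # terminal utility, known exactly
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Invented two-ply tree: MAX moves at the root, MIN at the next layer.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

MIN reduces the three subtrees to 3, 2, and 2, so MAX's value at the root is 3; the recursion visits leaves left to right, matching the DFS/postorder behavior described earlier.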
Minimax example
A two-ply game tree. The △ nodes are “MAX nodes,” in which it is
MAX’s turn to move, and the ▽ nodes are “MIN nodes.” The terminal
nodes show the utility values for MAX
Minimax efficiency
How efficient is minimax?
• Just like DFS
• Time: O(b^m)
• Space: O(bm)
Where b is the branching factor and m is the depth of the tree
Alpha-beta pruning
• Recalling that b is the branching factor and m is the approximate tree
depth at which terminal nodes can be found, this yields far too great a
runtime for many games
• For example, chess has a branching factor b ≈ 35 and tree depth m ≈
100.
• Exact solution is completely infeasible
• But, do we need to explore the whole tree?
Alpha-beta pruning example
Alpha-beta pruning
• We’re computing the MIN-VALUE at some node n
• We’re looping over n’s children
• n’s estimate of the children’s min is dropping
• Who cares about n’s value? MAX
• Let α be the best value that MAX can get at any
choice point along the current path from the root
• If n becomes worse than α, MAX will avoid it, so
we can stop considering n’s other children (it’s
already bad enough that it won’t be played)
Alpha-beta implementation
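A minimal alpha-beta sketch over trees where a leaf is a number and an internal node is a list of children (a standard formulation, with an invented example tree, not code from the course):

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    # alpha: best value MAX can guarantee on the path from the root;
    # beta:  best value MIN can guarantee on that path.
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:      # MIN above will never let play reach here
                break              # prune the remaining children
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta))
        beta = min(beta, v)
        if alpha >= beta:          # MAX above already has a better option
            break                  # prune the remaining children
    return v

# Invented tree: pruning skips leaves that cannot change the root value.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

The root value (3) is identical to plain minimax; in this tree the 4 and 6 leaves are pruned, because once MIN finds the 2 in the second subtree, that branch can no longer beat MAX's guaranteed 3.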
Alpha-beta pruning properties
• This pruning has no effect on the minimax value computed for the root
• Values of intermediate nodes might be wrong
• With “perfect ordering”:
Time complexity drops to O(b^(m/2)), a significant reduction in
computation from regular minimax
Doubles the solvable depth – reduces the effective branching factor and
thus allows deeper exploration within the same computational
resources.
Alpha-beta pruning example
Evaluation Functions
• Evaluation functions try to estimate how good the position is for
one side without making any more moves.
• They are functions that take in a state and output an estimate of the
true minimax value of that node. Typically, this means a good evaluation
function assigns "better" states higher values than "worse" states.
• In alpha-beta pruning, an evaluation function can provide guidance for
expanding the most promising successor nodes first, thereby increasing
the amount of pruning you get.
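One common way to use an evaluation function is in depth-limited minimax: when the depth budget runs out before reaching a terminal state, the heuristic estimate stands in for the true value. A sketch, where the `evaluate` argument and the tree literal are invented for illustration:

```python
def depth_limited(node, maximizing, depth, evaluate):
    # A leaf is a number (the true terminal utility); an internal node
    # is a list of children. When depth hits 0 at an internal node,
    # fall back on the caller-supplied evaluation function.
    if isinstance(node, (int, float)):
        return node
    if depth == 0:
        return evaluate(node)
    values = [depth_limited(c, not maximizing, depth - 1, evaluate)
              for c in node]
    return max(values) if maximizing else min(values)

# Invented two-ply tree for trying out the depth cutoff.
tree = [[3, 12, 8], [2, 4, 6]]
```

With a large enough depth this reduces to exact minimax; at depth 0 it returns whatever the heuristic says, so the quality of play degrades gracefully with the quality of the evaluation function.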
