INTELLIGENT AGENTS
• An agent is anything that can be viewed as
– Perceiving its environment through sensors
– Acting upon that environment through actuators.
• If the next state of the environment is completely determined by the current state and the action executed, then we say the environment is deterministic; otherwise, it is nondeterministic. If the environment is partially observable, however, then it could appear to be nondeterministic.
• Searching in partially observable environments
  – Solving partially observable problems
  – An agent for partially observable environments
• Search with Nondeterministic Actions
  – The erratic vacuum world
  – AND–OR search trees
6/10/2023
Hill Climbing
• Keeps track of one current state and on
each iteration moves to the neighboring
state with highest value
– Heads in the direction that provides the
steepest ascent
– Terminates when it reaches a “peak” where
no neighbor has a higher value
– Hill climbing does not look ahead beyond
the immediate neighbors of the current
state
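The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not from the slides: the helper names and the toy objective f(x) = -(x - 3)^2 are assumptions made for the example.

```python
def hill_climb(initial, neighbors, value):
    """Steepest-ascent hill climbing: on each iteration move to the
    neighbor with the highest value; stop at a 'peak' where no
    neighbor is higher. No lookahead beyond immediate neighbors."""
    current = initial
    while True:
        best = max(neighbors(current), key=value)
        if value(best) <= value(current):
            return current  # peak reached: possibly only a local maximum
        current = best

# Toy example: maximize f(x) = -(x - 3)**2 over the integers.
f = lambda x: -(x - 3) ** 2
print(hill_climb(0, lambda x: [x - 1, x + 1], f))  # climbs 0 -> 1 -> 2 -> 3
```

Note that the same function started at x = 0 on a multimodal objective would stop at whichever local peak it reaches first, which is exactly the weakness the next slides discuss.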
Contd.
• Local maxima: A local maximum is a peak that is higher than each of its neighboring states but lower than the global maximum. Hill-climbing algorithms that reach the vicinity of a local maximum will be drawn upward toward the peak but will then be stuck with nowhere else to go.
• Ridges: Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.
• Plateaus: A plateau is a flat area of the state-space landscape. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which progress is possible.
• A hill-climbing search can get lost wandering on the plateau.
• Solution: Keep going when we reach a plateau, allowing a sideways move in the hope that the plateau is really a shoulder.
• But if we are actually on a flat local maximum, then this approach will wander on the plateau forever.
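The sideways-move idea can be sketched as follows. The budget parameter `max_sideways`, the random tie-breaking, and the tiny 1-D landscape are illustrative assumptions, not from the slides.

```python
import random

def hill_climb_sideways(initial, neighbors, value, max_sideways=100):
    """Hill climbing that tolerates up to max_sideways consecutive
    sideways (equal-value) moves, hoping the plateau is a shoulder."""
    current, sideways = initial, 0
    while True:
        nbrs = neighbors(current)
        best_val = max(value(n) for n in nbrs)
        if best_val < value(current):
            return current                  # strict peak: no equal or better neighbor
        if best_val == value(current):
            sideways += 1                   # flat move: maybe a shoulder, maybe not
            if sideways > max_sideways:
                return current              # give up: treat as flat local maximum
        else:
            sideways = 0                    # genuine uphill step resets the budget
        current = random.choice([n for n in nbrs if value(n) == best_val])

# A tiny 1-D landscape with a plateau (indices 1..3) that is really a shoulder:
landscape = [0, 1, 1, 1, 2]
nbrs = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
random.seed(0)
print(hill_climb_sideways(0, nbrs, lambda i: landscape[i]))  # crosses the plateau to index 4
```

On a genuinely flat local maximum the same walk would burn through the whole `max_sideways` budget and stop, which is the trade-off the slide describes.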
Contd.
• Instead of picking the best move, however, simulated annealing picks a random move. If the move improves the situation, it is always accepted.
• Otherwise, the algorithm accepts the move with some probability less than 1.
• The probability decreases exponentially with the “badness” of the move, i.e., the amount ∆E by which the evaluation is worsened.
• If the schedule lowers T to 0 slowly enough, then a property of the Boltzmann distribution, e^(−∆E/T), is that all the probability is concentrated on the global maxima, which the algorithm will find with probability approaching 1.
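The acceptance rule above can be sketched as follows. The neighbor function and the particular cooling schedule are illustrative assumptions; only the accept-with-probability-e^(−∆E/T) rule comes from the slides.

```python
import math, random

def simulated_annealing(initial, neighbor, value, schedule):
    """Pick a *random* move; always accept improvements; accept a
    worsening move with probability e^(delta/T), where delta <= 0 is
    the change in value and T is the current 'temperature'."""
    current, t = initial, 0
    while True:
        t += 1
        T = schedule(t)
        if T <= 0:
            return current
        nxt = neighbor(current)
        delta = value(nxt) - value(current)
        if delta > 0 or random.random() < math.exp(delta / T):
            current = nxt

# Toy run: maximize f(x) = -(x - 3)**2 with a geometric cooling schedule.
f = lambda x: -(x - 3) ** 2
random.seed(0)
best = simulated_annealing(0, lambda x: x + random.choice((-1, 1)), f,
                           lambda t: 10 * 0.95 ** t if t < 500 else 0)
print(best)  # should end at (or very near) the global maximum x = 3
```

Early on, T is large and almost any move is accepted (a random walk); as T shrinks, the search behaves more and more like hill climbing, which is the point of the schedule.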
Contd.
• At first sight, a local beam search with k states might seem to be nothing more than running k random restarts in parallel instead of in sequence.
• In a random-restart search, each search process runs independently of the others.
• In a local beam search, useful information is passed among the parallel search threads.
• In effect, the states that generate the best successors say to the others, “Come over here, the grass is greener!”
• The algorithm quickly abandons unfruitful searches and moves its resources to where the most progress is being made.
• Local beam search can suffer from a lack of diversity among the k states: they can become clustered in a small region of the state space, making the search little more than a k-times-slower version of hill climbing.
• A variant called stochastic beam search, analogous to stochastic hill climbing, helps alleviate this problem.
• Instead of choosing the top k successors, stochastic beam search chooses successors with probability proportional to the successor’s value, thus increasing diversity.
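A minimal sketch of local beam search, in which all successors of all k states compete in one shared pool (so resources flow to the most promising region). The toy objective and fixed step count are illustrative assumptions.

```python
import heapq

def local_beam_search(k, initial_states, neighbors, value, steps=100):
    """Keep the k best states overall on each iteration; successors of
    all k states compete, so information passes between the searches."""
    beam = list(initial_states)
    for _ in range(steps):
        pool = {s for state in beam for s in neighbors(state)} | set(beam)
        beam = heapq.nlargest(k, pool, key=value)  # the k best survive
    return max(beam, key=value)

f = lambda x: -(x - 3) ** 2
result = local_beam_search(3, [-10, 0, 10], lambda x: [x - 1, x + 1], f)
print(result)  # the beam converges on the global peak at x = 3
```

A stochastic beam search variant would replace `heapq.nlargest` with sampling k successors with probability proportional to their value, trading some greed for diversity.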
Recombination
• The number of parents ρ:
  – ρ = 1 (asexual reproduction)
  – ρ = 2 (the common case)
  – ρ > 2 (rare in nature, but easy to do in computers)
Contd.
• Selection:
  – The process for selecting the individuals who will become the parents of the next generation: one possibility is to select from all individuals with probability proportional to their fitness score. Another possibility is to randomly select n individuals (n > ρ), and then select the ρ most fit ones as parents.
• The recombination procedure:
  – One common approach (assuming ρ = 2) is to randomly select a crossover point to split each of the parent strings, and recombine the parts to form two children.
• The mutation rate:
  – Determines how often offspring have random mutations to their representation. Once an offspring has been generated, every bit in its composition is flipped with probability equal to the mutation rate.
• Elitism:
  – The makeup of the next generation. This can be just the newly formed offspring, or it can include a few top-scoring parents from the previous generation (a practice called elitism, which guarantees that overall fitness will never decrease over time).
  – The practice of culling, in which all individuals below a given threshold are discarded, can lead to a speedup.
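The ingredients above (fitness-proportional selection, a crossover point, a per-bit mutation rate, elitism) can be combined into a toy genetic algorithm. The OneMax fitness function (count the 1 bits) and every parameter value below are illustrative assumptions, not from the slides.

```python
import random

def genetic_algorithm(pop_size=20, length=12, generations=60,
                      mutation_rate=0.05, elite=2):
    """Toy GA for OneMax: evolve bit strings toward all ones."""
    fitness = lambda ind: sum(ind)
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) + 1 for ind in pop]   # +1 avoids zero weights
        # Elitism: carry the top individuals into the next generation.
        next_gen = sorted(pop, key=fitness, reverse=True)[:elite]
        while len(next_gen) < pop_size:
            # Selection: probability proportional to fitness.
            mom, dad = random.choices(pop, weights=weights, k=2)
            # Recombination: split both parents at a random crossover point.
            cut = random.randrange(1, length)
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with probability mutation_rate.
            child = [b ^ 1 if random.random() < mutation_rate else b
                     for b in child]
            next_gen.append(child)
        pop = next_gen
    return max(pop, key=fitness)

random.seed(0)
best = genetic_algorithm()
print(sum(best))  # fitness of the best individual found (max possible is 12)
```

Culling could be added by filtering `pop` against a fitness threshold before selection; it is omitted here to keep the sketch short.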
Contd.
• Watson and Crick (1953) identified the
structure of the DNA molecule and its
alphabet, AGTC (adenine, guanine,
thymine, cytosine)
• Variation between generations occurs both by
point mutations in the letter sequence and
by “crossover” (in which the DNA of an
offspring is generated by combining long
sections of DNA from each parent).
Game Theory
• In this topic we cover competitive environments, in which two or more agents have conflicting goals, giving rise to adversarial search problems.
• General Games
  – Agents have independent utilities (values on outcomes)
  – Cooperation, indifference, competition, and more are all possible
  – We don’t make AI to act in isolation; it should a) work around people and b) help people
  – That means that every AI agent needs to solve a game
• Zero-Sum Games
  – Agents have opposite utilities (values on outcomes)
  – Lets us think of a single value that one maximizes and the other minimizes
  – Adversarial, pure competition
• Go: Human champions are now starting to be challenged by machines, though the best humans still beat the best machines. In Go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods.
• Game theory studies how agents can rationally form beliefs over what other agents will do, and (hence) how agents should act.
  – Useful for acting as well as predicting behavior of others
Zero-Sum Games
• “Zero-sum” means that what is good for one player is just as bad for the other: there is no “win-win” outcome.
• For games we often use the term
  – move as a synonym for “action”
  – position as a synonym for “state”
• The game can be viewed as a graph where the vertices are states, the edges are moves, and a state might be reached by multiple paths.
Contd.
• We can superimpose a search tree over part of that graph to determine what move to make.
• We define the complete game tree as a
search tree that follows every sequence of
moves all the way to a terminal state.
• The game tree may be infinite if the state
space itself is unbounded or if the rules of the
game allow for infinitely repeating positions
Contd.
• Given a game tree, the optimal strategy can be determined by
working out the minimax value of each state in the tree, which
we write as MINIMAX(s)
• The minimax value is the utility (for MAX) of being in that state,
assuming that both players play optimally from there to the end
of the game
• The minimax value of a terminal state is just its utility
• In a nonterminal state, MAX prefers to move to a state of maximum value when it is MAX’s turn to move, and MIN prefers a state of minimum value (that is, minimum value for MAX and thus maximum value for MIN).
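The MINIMAX(s) definition above transcribes almost directly into code. The `TreeGame` wrapper and the explicit two-ply game tree below are hypothetical scaffolding added for illustration.

```python
def minimax_value(state, game, maximizing=True):
    """MINIMAX(s): the utility of a terminal state; otherwise the max
    (on MAX's turn) or min (on MIN's turn) of the successors' values."""
    if game.is_terminal(state):
        return game.utility(state)
    values = [minimax_value(s, game, not maximizing)
              for s in game.successors(state)]
    return max(values) if maximizing else min(values)

# Hypothetical two-ply game as an explicit tree: internal nodes map to
# child lists, leaves map to utilities for MAX.
class TreeGame:
    def __init__(self, tree):
        self.tree = tree
    def is_terminal(self, s):
        return not isinstance(self.tree[s], list)
    def utility(self, s):
        return self.tree[s]
    def successors(self, s):
        return self.tree[s]

game = TreeGame({"A": ["B", "C"], "B": ["b1", "b2"], "C": ["c1", "c2"],
                 "b1": 3, "b2": 12, "c1": 2, "c2": 8})
print(minimax_value("A", game))  # MAX compares min(3,12)=3 and min(2,8)=2, picks 3
```

Both players are assumed to play optimally: MIN's values at B and C are the minima of their leaves, and MAX then takes the larger of those.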
Alpha–Beta Pruning
• Pruning = cutting off parts of the search tree (because you realize you don’t need to look at them)
  – When we considered A* we also pruned large parts of the search tree
• No algorithm can completely eliminate the exponential tree, but we can sometimes cut it in half
• Alpha–beta pruning computes the correct minimax decision without examining every state
• Maintain alpha = value of the best option for player 1 (MAX) along the path so far
• Beta = value of the best option for player 2 (MIN) along the path so far
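A sketch of minimax with alpha–beta pruning as described above. The dict-encoded game tree is an illustrative assumption; note in the example that one leaf is never examined.

```python
import math

# Hypothetical game tree: internal nodes map to child lists, leaves to
# utilities for the maximizing player.
TREE = {"A": ["B", "C"], "B": ["b1", "b2"], "C": ["c1", "c2"],
        "b1": 3, "b2": 12, "c1": 2, "c2": 8}

def is_terminal(s): return not isinstance(TREE[s], list)
def utility(s): return TREE[s]
def successors(s): return TREE[s]

def alphabeta(state, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning: alpha/beta are the best values
    MAX/MIN can already guarantee on the path; a subtree is cut off as
    soon as it can no longer affect the decision at the root."""
    if is_terminal(state):
        return utility(state)
    if maximizing:
        v = -math.inf
        for s in successors(state):
            v = max(v, alphabeta(s, False, alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:
                break              # cutoff: MIN will never allow this line
        return v
    v = math.inf
    for s in successors(state):
        v = min(v, alphabeta(s, True, alpha, beta))
        beta = min(beta, v)
        if alpha >= beta:
            break                  # cutoff: MAX already has a better option
    return v

print(alphabeta("A"))  # 3, and the leaf c2 is never examined
```

After B returns 3, alpha = 3 at the root; inside C the first leaf c1 = 2 drives beta down to 2, so alpha >= beta and c2 is pruned, yet the minimax decision is unchanged.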
Monte Carlo Tree Search (MCTS)
• Two major weaknesses of heuristic alpha–beta tree search:
  – alpha–beta search would be limited to only 4 or 5 ply because of the branching factor
  – it is difficult to define a good evaluation function, because material value is not a strong indicator and most positions are in flux until the endgame
• The basic MCTS strategy does not use a heuristic evaluation function.
• Instead, the value of a state is estimated as the average utility over a number of simulations of complete games starting from the state.
• A simulation (also called a playout or rollout) chooses moves first for one player, then for the other, repeating until a terminal position is reached.
• At that point the rules of the game (not fallible heuristics) determine who has won or lost, and by what score.
Contd.
• What is the best move if both players play randomly?
• For some simple games, that happens to be the same answer as “what is the best move if both players play well?”, but for most games it is not.
• To get useful information from the playout we need a playout policy that biases the moves towards good ones.
• For Go and other games, playout policies have been successfully learned from self-play by using neural networks. Sometimes game-specific heuristics are used, such as “consider capture moves” in chess or “take the corner square” in Othello.
• Given a playout policy, we next need to decide two things:
  – From what positions do we start the playouts?
  – How many playouts do we allocate to each position?
• Pure Monte Carlo search is to do N simulations starting from the current state of the game, and track which of the possible moves from the current position has the highest win percentage.
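Pure Monte Carlo search can be sketched on a hypothetical Nim-like toy game (players alternately take 1-3 counters; whoever takes the last counter wins). The game, the playout count, and the random playout policy are all illustrative assumptions.

```python
import random

def random_playout(n, my_turn):
    """Random playout of the toy game. my_turn says whether the player
    to move is 'us'. Returns 1 if we end up taking the last counter."""
    while True:
        n -= random.randint(1, min(3, n))
        if n == 0:
            return 1 if my_turn else 0  # the player who just moved wins
        my_turn = not my_turn

def pure_monte_carlo(n, playouts=2000):
    """Run N random playouts after each legal move from the current
    state; return the move with the highest win percentage."""
    best_move, best_rate = None, -1.0
    for take in range(1, min(3, n) + 1):
        if n - take == 0:
            rate = 1.0          # taking the last counter wins outright
        else:
            rate = sum(random_playout(n - take, my_turn=False)
                       for _ in range(playouts)) / playouts
        if rate > best_rate:
            best_move, best_rate = take, rate
    return best_move

random.seed(1)
print(pure_monte_carlo(5))  # taking 1 leaves a 'losing' pile of 4 for the opponent
```

Even under purely random playouts, leaving the opponent a pile of 4 yields the highest win rate here, and in this toy game that also happens to be the optimal move; as the slide notes, for most games the two answers differ.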
Contd.
• A selection policy selectively focuses the computational resources on the important parts of the game tree.
• It balances exploration of states that have had few playouts against exploitation of states that have done well in past playouts, to get a more accurate estimate of their value.
• Selection: Starting at the root of the search tree, we choose a move (guided by the selection policy), leading to a successor node, and repeat that process, moving down the tree to a leaf.
  – Selecting the move with the best win percentage is an example of exploitation.
  – Selecting a node with a lower win percentage is exploration.
Contd.
• Expansion: We grow the search tree by generating a new child of the selected node.
• Simulation: We perform a playout from the newly generated child node, choosing moves for both players according to the playout policy.
• Back-propagation: We now use the result of the simulation to update all the search tree nodes going up to the root.
  – Since black won the playout, black nodes are incremented in both the number of wins and the number of playouts, so 27/35 becomes 28/36 and 60/79 becomes 61/80. Since white lost, the white nodes are incremented in the number of playouts only, so 16/53 becomes 16/54 and the root 37/100 becomes 37/101.
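The four phases (selection, expansion, simulation, back-propagation) can be sketched end to end. The Nim-like toy game (take 1-3 counters, taking the last one wins), the `Node` fields, the iteration count, and the exploration constant c = 1.4 are all illustrative assumptions.

```python
import math, random

class Node:
    def __init__(self, counters, to_move, parent=None, move=None):
        self.counters, self.to_move = counters, to_move
        self.parent, self.move = parent, move
        self.children, self.wins, self.playouts = [], 0, 0

def moves(n):
    return range(1, min(3, n) + 1)

def ucb1(node, c=1.4):
    if node.playouts == 0:
        return math.inf
    return (node.wins / node.playouts
            + c * math.sqrt(math.log(node.parent.playouts) / node.playouts))

def mcts(counters, iterations=2000):
    root = Node(counters, to_move=0)
    for _ in range(iterations):
        # Selection: descend via UCB1 while the node is fully expanded.
        node = root
        while node.children and len(node.children) == len(moves(node.counters)):
            node = max(node.children, key=ucb1)
        # Expansion: add one untried move as a new child.
        if node.counters > 0:
            tried = {ch.move for ch in node.children}
            m = random.choice([mv for mv in moves(node.counters)
                               if mv not in tried])
            child = Node(node.counters - m, 1 - node.to_move, node, m)
            node.children.append(child)
            node = child
        # Simulation: random playout from the new node to a terminal state.
        n, player = node.counters, node.to_move
        if n == 0:
            winner = 1 - node.to_move   # the previous mover took the last counter
        else:
            winner = None
            while winner is None:
                n -= random.randint(1, min(3, n))
                if n == 0:
                    winner = player
                else:
                    player = 1 - player
        # Back-propagation: update playouts (and wins) up to the root.
        while node is not None:
            node.playouts += 1
            if winner != node.to_move:  # credit the player who moved INTO this node
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.playouts).move

random.seed(2)
print(mcts(5))  # should settle on taking 1, leaving a losing pile of 4
```

As in the slide's example, a win increments both counters for the winner's nodes and only the playout counter for the loser's nodes; the final move is the child with the most playouts.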
Contd.
• One very effective selection policy is called “Upper Confidence bounds applied to trees,” or UCT. The policy ranks each possible move based on an upper confidence bound formula called UCB1.
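In the form commonly given for UCT, UCB1(n) = U(n)/N(n) + C * sqrt(log N(Parent(n)) / N(n)): average utility (exploitation) plus an exploration bonus that shrinks as a node accumulates playouts. The constant C (often around sqrt(2)) and the win/playout counts below, reused from the back-propagation example, are illustrative.

```python
import math

def ucb1(node_wins, node_playouts, parent_playouts, c=1.4):
    """UCB1 score: exploitation term (win rate) plus an exploration
    bonus proportional to sqrt(log(parent playouts) / node playouts)."""
    if node_playouts == 0:
        return math.inf  # unvisited nodes are always tried first
    return (node_wins / node_playouts
            + c * math.sqrt(math.log(parent_playouts) / node_playouts))

# Scores for the two updated nodes 28/36 and 16/54 under a 101-playout parent:
print(round(ucb1(28, 36, 101), 2))  # higher: better win rate, fewer visits
print(round(ucb1(16, 54, 101), 2))
```

With these counts the 28/36 node scores higher on both terms, so selection would descend into it; a rarely visited node can still win the comparison through the exploration bonus alone.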
Advantages
• Monte Carlo search has an advantage over alpha–beta for games like Go where the branching factor is very high (and thus alpha–beta can’t search deep enough), or when it is difficult to define a good evaluation function.
• What alpha–beta does is choose the path to a node that has the highest achievable evaluation function score, given that the opponent will be trying to minimize the score.
• A miscalculation on a single node can lead alpha–beta to erroneously choose (or avoid) a path to that node. Monte Carlo search, by contrast, relies on the aggregate of many playouts, and thus is not as vulnerable to a single error.

Disadvantages
• Monte Carlo search has a disadvantage when it is likely that a single move can change the course of the game, because the stochastic nature of Monte Carlo search means it might fail to consider that move: a vital line of play might not be explored at all.
• It cannot easily detect game states that are “obviously” a win for one side or the other (according to human knowledge and to an evaluation function), but where it will still take many moves in a playout to verify the winner.