
Search in Complex Environments
Dr. B. Narendra Kumar Rao

INTELLIGENT AGENTS
• An agent is anything that can be viewed as
– Perceiving its environment through sensors
– Acting upon that environment through actuators.
• A robotic agent might have cameras and infrared range finders for sensors and various motors for actuators.
• A software agent
– receives file contents, network packets, and human input (keyboard/mouse/touchscreen/voice) as sensory inputs
– Acts on the environment by writing files, sending network packets, and displaying information or generating sounds.

INTELLIGENT AGENTS
(Figure: Agents interacting with environments through sensors and actuators)
• Percept refers to the content an agent’s sensors are perceiving.
• Percept sequence is the complete history of everything the agent has ever perceived.
• An agent’s choice of action depends on
– built-in knowledge and
– the entire percept sequence observed to date,
– but not on anything it hasn’t perceived.
• The agent function maps any given percept sequence to an action.

Specifying the Task Environment
• Task Environments are essentially the “problems” to which rational agents are the “solutions.”
• Task environment example: for the simple vacuum-cleaner agent, we had to specify the performance measure, the environment, and the agent’s actuators and sensors.
• PEAS (Performance, Environment, Actuators, Sensors) description

Properties of Task Environments
• Discrete (4.1):
– The discrete/continuous distinction applies to the state of the environment, to the way time is handled, and to the percepts and actions of the agent.
– Chess has a discrete set of percepts and actions.
– Input from digital cameras is discrete.
– Local Search and Optimization Problems
• Hill-climbing search
• Simulated annealing
• Local beam search
• Evolutionary algorithms

Properties of Task Environments
• Continuous (4.2):
– Taxi driving is a continuous-state and continuous-time problem: the speed and location of the taxi and of the other vehicles sweep through a range of continuous values and do so smoothly over time. Taxi-driving actions are also continuous.
– Local Search in Continuous Spaces
• Determinism (4.3):
– If the next state of the environment is completely determined by the current state and the action executed by the agent(s), then we say the environment is deterministic; otherwise, it is nondeterministic. If the environment is partially observable, however, then it could appear to be nondeterministic.
– Search with Nondeterministic Actions
• The erratic vacuum world
• AND–OR search trees

Properties of Task Environments
• Observability (4.4):
– If an agent’s sensors give it access to the complete state of the environment at each point in time, then we say that the task environment is fully observable. An environment might be partially observable because of noisy and inaccurate sensors or because parts of the state are simply missing from the sensor data. If the agent has no sensors at all, then the environment is unobservable.
– Search in Partially Observable Environments
• Searching with no observation
• Searching in partially observable environments
• Solving partially observable problems
• An agent for partially observable environments

Local Search and Optimization Problems
• Local search algorithms operate by searching from a start state to neighboring states, without keeping track of the paths or the set of states that have been reached.
• Not systematic—they might never explore a portion of the search space where a solution actually resides.
• Advantages:
– (1) they use very little memory
– (2) they can often find reasonable solutions in large or infinite state spaces

Contd.
• Local search algorithms can also solve optimization problems, like finding the best state according to an objective function.
• Aim of local search:
– Each point (state) in the landscape has an “elevation,” defined by the value of the objective function.
– If elevation corresponds to an objective function, then the aim is to find the highest peak—a global maximum (hill climbing).
– If elevation corresponds to cost, then the aim is to find the lowest valley—a global minimum (gradient descent).

Hill Climbing
• Keeps track of one current state and on
each iteration moves to the neighboring
state with highest value
– Heads in the direction that provides the
steepest ascent
– Terminates when it reaches a “peak” where
no neighbor has a higher value
– Hill climbing does not look ahead beyond
the immediate neighbors of the current
state
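As a concrete reference, here is a minimal Python sketch of the steepest-ascent hill climbing just described. The neighbors and value callables are illustrative placeholders for a problem-specific successor generator and objective function; they are not names used in the slides.

def hill_climbing(start, neighbors, value):
    # Steepest-ascent hill climbing: repeatedly move to the highest-valued
    # neighbor; stop when no neighbor is better than the current state.
    current = start
    while True:
        candidates = neighbors(current)
        if not candidates:
            return current
        best = max(candidates, key=value)
        if value(best) <= value(current):
            return current  # peak (local or global maximum), plateau, or ridge
        current = best

# Toy usage: maximize f(x) = -(x - 3)^2 over integer states.
f = lambda x: -(x - 3) ** 2
nbrs = lambda x: [x - 1, x + 1]
print(hill_climbing(0, nbrs, f))  # prints 3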

Contd.
• Local maxima: A local maximum is a peak that is higher than each of its neighboring states but lower than the global maximum. Hill-climbing algorithms that reach the vicinity of a local maximum will be drawn upward toward the peak but will then be stuck with nowhere else to go.
• Ridges: Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.

Contd.
• Plateaus: A plateau is a flat area of the state-space landscape. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which progress is possible.
• A hill-climbing search can get lost wandering on the plateau.
• Solution: Keep going when we reach a plateau—allow a sideways move in the hope that the plateau is really a shoulder.
• But if we are actually on a flat local maximum, then this approach will wander on the plateau forever.

Variants of Hill Climbing
• Stochastic hill climbing chooses at random from among the uphill moves; the probability of selection can vary with the steepness of the uphill move.
• First-choice hill climbing implements stochastic hill climbing by generating successors randomly until one is generated that is better than the current state.
• Random-restart hill climbing adopts the adage “If at first you don’t succeed, try, try again.” It conducts a series of hill-climbing searches from randomly generated initial states, until a goal is found.
– The success of hill climbing depends very much on the shape of the state-space landscape: if there are few local maxima and plateaus, random-restart hill climbing will find a good solution very quickly.

Simulated Annealing
• Strategy: combine hill climbing with a random walk in a way that yields both efficiency and completeness.
• In metallurgy, annealing is the process used to temper or harden metals and glass by heating them to a high temperature and then gradually cooling them, thus allowing the material to reach a low-energy crystalline state.
• The simulated-annealing solution is to start by shaking hard (i.e., at a high temperature) and then gradually reduce the intensity of the shaking (i.e., lower the temperature).

Contd.
• Instead of picking the best move, however, it picks a random move. If the move improves the situation, it is always accepted.
• Otherwise, the algorithm accepts the move with some probability less than 1; the probability decreases exponentially with the “badness” of the move—the amount ∆E by which the evaluation is worsened.
• If the schedule lowers T to 0 slowly enough, then a property of the Boltzmann distribution, e^(−∆E/T), is that all the probability is concentrated on the global maxima, which the algorithm will find with probability approaching 1.
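A minimal Python sketch of this loop, assuming a maximization objective and a user-supplied cooling schedule; the function and parameter names are illustrative. Here delta_e is value(next) − value(current), so it is negative for a worsening move and exp(delta_e / T) is the acceptance probability, matching e^(−∆E/T) above.

import math, random

def simulated_annealing(start, neighbors, value, schedule):
    # Pick a random neighbor each step; always accept an improvement,
    # otherwise accept with probability exp(delta_e / T), where delta_e < 0.
    current = start
    for t in range(1, 10**9):
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random.choice(neighbors(current))
        delta_e = value(nxt) - value(current)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt

# Toy usage: same objective as the hill-climbing sketch, exponential cooling.
f = lambda x: -(x - 3) ** 2
nbrs = lambda x: [x - 1, x + 1]
cooling = lambda t: 10 * (0.95 ** t) if t < 500 else 0
print(simulated_annealing(0, nbrs, f, cooling))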

Local beam search algorithm


• The local beam search algorithm keeps track of k states
rather than just one.
• It begins with k randomly generated states.
• At each step, all the successors of all k states are
generated.
– If any one is a goal, the algorithm halts.
– Otherwise, it selects the k best successors from the complete list
and repeats.
• Applications: VLSI layout problems, factory scheduling, and other large-scale optimization tasks.
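A short sketch of the loop described above, under the assumption of a maximization objective; random_state, neighbors, value, and is_goal are hypothetical problem-specific callables, not names from the slides.

import heapq, random

def local_beam_search(k, random_state, neighbors, value, is_goal, max_steps=1000):
    # Keep the k best states; at each step pool all successors of all k states
    # and keep the k best of the pool. Halt as soon as a goal state appears.
    states = [random_state() for _ in range(k)]
    for _ in range(max_steps):
        goals = [s for s in states if is_goal(s)]
        if goals:
            return goals[0]
        pool = [n for s in states for n in neighbors(s)]
        if not pool:
            break
        states = heapq.nlargest(k, pool, key=value)
    return max(states, key=value)  # best state seen if no goal was reached

# Toy usage: search for the state 42 on the integer line.
print(local_beam_search(
    k=3,
    random_state=lambda: random.randint(0, 100),
    neighbors=lambda x: [x - 1, x + 1],
    value=lambda x: -abs(x - 42),
    is_goal=lambda x: x == 42))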

Contd.
• At first sight, a local beam search with k states might seem to be nothing more than running k random restarts in parallel instead of in sequence.
• In a random-restart search, each search process runs independently of the others.
• In a local beam search, useful information is passed among the parallel search threads.
• In effect, the states that generate the best successors say to the others, “Come over here, the grass is greener!”
• The algorithm quickly abandons unfruitful searches and moves its resources to where the most progress is being made.

Contd.
• Local beam search can suffer from a lack of diversity among the k states—they can become clustered in a small region of the state space, making the search little more than a k-times-slower version of hill climbing.
• A variant called stochastic beam search, analogous to stochastic hill climbing, helps alleviate this problem.
• Instead of choosing the top k successors, stochastic beam search chooses successors with probability proportional to the successor’s value, thus increasing diversity.

Evolutionary algorithms
• Genetic Algorithms:
– There is a population of individuals (states), in which the fittest (highest value) individuals produce offspring (successor states) that populate the next generation, a process called recombination.
• Evolutionary algorithms, variations:
– Size of the population
– Representation of each individual: in genetic algorithms an individual is a string over a finite alphabet (just as DNA is a string over the alphabet ACGT); in evolution strategies an individual is a sequence of real numbers; and in genetic programming an individual is a computer program.
– ρ, the number of parents that come together to form offspring:
• ρ = 1 (asexual)
• ρ = 2, or ρ > 2 (straightforward to simulate computationally)

Contd.
• Selection:
– The process for selecting the individuals who will become the parents of the next generation: one possibility is to select from all individuals with probability proportional to their fitness score. Another possibility is to randomly select n individuals (n > ρ), and then select the ρ most fit ones as parents.
• The recombination procedure:
– One common approach (assuming ρ = 2) is to randomly select a crossover point to split each of the parent strings, and recombine the parts to form two children.

Contd.
• The mutation rate:
– Determines how often offspring have random mutations to their representation. Once an offspring has been generated, every bit in its composition is flipped with probability equal to the mutation rate.
• Elitism:
– The makeup of the next generation: this can be just the newly formed offspring, or it can include a few top-scoring parents from the previous generation (a practice called elitism, which guarantees that overall fitness will never decrease over time).
– The practice of culling, in which all individuals below a given threshold are discarded, can lead to a speedup.
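Putting these pieces together, a compact genetic-algorithm sketch with fitness-proportional selection, single-point crossover (ρ = 2), per-position mutation, and elitism. All names and default parameter values are illustrative, and the fitness function is assumed to be non-negative (higher is better).

import random

def genetic_algorithm(fitness, alphabet, length, pop_size=50,
                      generations=100, mutation_rate=0.05, elite=2):
    # Fitness-proportional selection, single-point crossover (rho = 2),
    # per-position mutation, and elitism.
    def random_individual():
        return [random.choice(alphabet) for _ in range(length)]

    def crossover(a, b):
        c = random.randint(1, length - 1)  # crossover point
        return a[:c] + b[c:], b[:c] + a[c:]

    def mutate(ind):
        return [random.choice(alphabet) if random.random() < mutation_rate else g
                for g in ind]

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = sorted(population, key=fitness, reverse=True)[:elite]  # elitism
        # Assumes non-negative fitness; the small epsilon avoids all-zero weights.
        weights = [fitness(ind) + 1e-9 for ind in population]
        while len(next_gen) < pop_size:
            p1, p2 = random.choices(population, weights=weights, k=2)
            for child in crossover(p1, p2):
                next_gen.append(mutate(child))
        population = next_gen[:pop_size]
    return max(population, key=fitness)

# Toy usage: evolve a 20-bit string of all 1s ("one-max").
best = genetic_algorithm(fitness=sum, alphabet=[0, 1], length=20)
print(best, sum(best))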

Evolution and Search


• The theory of evolution was developed by
Charles Darwin in On the Origin of
Species by Means of Natural Selection
(1859)
• Central idea: variations occur in
reproduction and will be preserved in
successive generations approximately in
proportion to their effect on reproductive
fitness

Contd.
• Watson and Crick (1953) identified the
structure of the DNA molecule and its
alphabet, AGTC (adenine, guanine,
thymine, cytosine)
• Variation across generations occurs both by
point mutations in the letter sequence and
by “crossover” (in which the DNA of an
offspring is generated by combining long
sections of DNA from each parent).

Game Theory
• In this topic we cover competitive environments, in which two or more agents have conflicting goals, giving rise to adversarial search problems.
• We don’t make AI to act in isolation; it should a) work around people and b) help people.
• That means that every AI agent needs to solve a game.

o General Games
o Agents have independent utilities (values on outcomes)
o Cooperation, indifference, competition, and more are all possible
o Zero-Sum Games
o Agents have opposite utilities (values on outcomes)
o Lets us think of a single value that one maximizes and the other minimizes
o Adversarial, pure competition

Games
• Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended the 40-year reign of human champion Marion Tinsley using a complete 8-piece endgame database. 2007: Checkers solved!
• Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
• Go: Human champions are now starting to be challenged by machines, though the best humans still beat the best machines. In Go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods.

Game Theory
• Game theory studies settings where multiple parties (agents) each have
– different preferences (utility functions),
– different actions that they can take
• Each agent’s utility (potentially) depends on all agents’ actions
– What is optimal for one agent depends on what other agents do
• Very circular!
• Game theory studies how agents can rationally form beliefs over what other agents will do, and (hence) how agents should act
– Useful for acting as well as predicting behavior of others

Zero-Sum Game
• “Zero-sum” means that what is good for one player is just as bad for the other: there is no “win-win” outcome.
• For games we often use the term
– move as a synonym for “action”
– position as a synonym for “state.”
• The state space forms a graph where the vertices are states, the edges are moves, and a state might be reached by multiple paths.

Contd.
• we can superimpose a search tree over part
of that graph to determine what move to make.
• We define the complete game tree as a
search tree that follows every sequence of
moves all the way to a terminal state.
• The game tree may be infinite if the state
space itself is unbounded or if the rules of the
game allow for infinitely repeating positions

Optimal Decisions in Games


• MAX wants to find a sequence of actions
leading to a win
• This means that MAX’s strategy must be a
conditional plan—a contingent strategy
specifying a response to each of MIN’s
possible moves.
• For games with multiple outcome scores, we
need a slightly more general algorithm called
minimax search

Contd.
• Given a game tree, the optimal strategy can be determined by
working out the minimax value of each state in the tree, which
we write as MINIMAX(s)
• The minimax value is the utility (for MAX) of being in that state,
assuming that both players play optimally from there to the end
of the game
• The minimax value of a terminal state is just its utility
• In a nonterminal state, MAX prefers to move to a state of maximum value when it is MAX’s turn to move, and MIN prefers a state of minimum value (that is, minimum value for MAX and thus maximum value for MIN).

The minimax search algorithm
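The pseudocode figure for this slide does not survive in this text version. Below is a minimal Python sketch of the standard minimax recursion it refers to, assuming a game object that exposes to_move, actions, result, is_terminal, and utility; this interface is a hypothetical convention for the sketch, not something defined in the slides.

def minimax_search(game, state):
    # Return the move with the highest minimax value for the player to move,
    # assuming both players play optimally to the end of the game.
    player = game.to_move(state)

    def max_value(s):
        if game.is_terminal(s):
            return game.utility(s, player), None
        v, move = float("-inf"), None
        for a in game.actions(s):
            v2, _ = min_value(game.result(s, a))
            if v2 > v:
                v, move = v2, a
        return v, move

    def min_value(s):
        if game.is_terminal(s):
            return game.utility(s, player), None
        v, move = float("inf"), None
        for a in game.actions(s):
            v2, _ = max_value(game.result(s, a))
            if v2 < v:
                v, move = v2, a
        return v, move

    return max_value(state)[1]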



Optimal decisions in multiplayer games
• First, we need to replace the single value for each node with a vector of values.
• In a three-player game with players A, B, and C, a vector (vA, vB, vC) is associated with each node.
• For terminal states, this vector gives the utility of the state from each player’s viewpoint.
• The simplest way to implement this is to have the UTILITY function return a vector of utilities.

Alpha–Beta Pruning
• No algorithm can completely eliminate the exponential tree
• But we can sometimes cut it in half
• Computing the correct minimax decision without examining every state, by pruning

Alpha-beta pruning
• Pruning = cutting off parts of the search tree (because you realize you don’t need to look at them)
– When we considered A* we also pruned large parts of the search tree
• Maintain alpha = value of the best option for player 1 along the path so far
• Beta = value of the best option for player 2 along the path so far

• The general principle is this: consider a node n somewhere in the tree, such that Player has a choice of moving to n.
• If Player has a better choice either at the same level (m′ in the figure) or at any point higher up in the tree (e.g., m in the figure), then Player will never move to n.
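A sketch of minimax with alpha–beta pruning based on this principle, reusing the same hypothetical game interface as the minimax sketch above; the cutoff tests implement “Player will never move to n” for each side.

def alphabeta_search(game, state):
    # Alpha is the best value found so far for MAX on the current path,
    # beta the best for MIN; a branch is abandoned as soon as its value
    # falls outside the (alpha, beta) window.
    player = game.to_move(state)

    def max_value(s, alpha, beta):
        if game.is_terminal(s):
            return game.utility(s, player), None
        v, move = float("-inf"), None
        for a in game.actions(s):
            v2, _ = min_value(game.result(s, a), alpha, beta)
            if v2 > v:
                v, move = v2, a
                alpha = max(alpha, v)
            if v >= beta:       # MIN already has a better option higher up
                return v, move
        return v, move

    def min_value(s, alpha, beta):
        if game.is_terminal(s):
            return game.utility(s, player), None
        v, move = float("inf"), None
        for a in game.actions(s):
            v2, _ = max_value(game.result(s, a), alpha, beta)
            if v2 < v:
                v, move = v2, a
                beta = min(beta, v)
            if v <= alpha:      # MAX already has a better option higher up
                return v, move
        return v, move

    return max_value(state, float("-inf"), float("inf"))[1]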

Benefits of alpha-beta pruning
• Without pruning, need to examine O(b^m) nodes
• With pruning, depends on which nodes we consider first
• If we choose a random successor, need to examine O(b^(3m/4)) nodes
• If we manage to choose the best successor first, need to examine O(b^(m/2)) nodes
– Practical heuristics for choosing the next successor to consider get quite close to this
• Can effectively look twice as deep!
– Difference between reasonable and expert play

Heuristic Alpha–Beta Tree Search

Contd.
• Two major weaknesses of heuristic alpha–beta tree search:
– alpha–beta search would be limited to only 4 or 5 ply because of the branching factor
– it is difficult to define a good evaluation function because material value is not a strong indicator and most positions are in flux until the endgame

Monte Carlo Tree Search (MCTS)
• The basic MCTS strategy does not use a heuristic evaluation function.
• Instead, the value of a state is estimated as the average utility over a number of simulations of complete games starting from the state.
• A simulation (also called a playout or rollout) chooses moves first for one player, then for the other, repeating until a terminal position is reached.
• At that point the rules of the game (not fallible heuristics) determine who has won or lost, and by what score.

Contd.
• What is the best move if both players play randomly?
• For some simple games, that happens to be the same answer as “what is the best move if both players play well?,” but for most games it is not.
• To get useful information from the playout we need a playout policy that biases the moves towards good ones.
• For Go and other games, playout policies have been successfully learned from self-play by using neural networks. Sometimes game-specific heuristics are used, such as “consider capture moves” in chess or “take the corner square” in Othello.

Contd.
• Given a playout policy, we next need to decide two things:
– From what positions do we start the playouts?
– How many playouts do we allocate to each position?
• Pure Monte Carlo search does N simulations starting from the current state of the game, and tracks which of the possible moves from the current position has the highest win percentage.
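A minimal sketch of pure Monte Carlo search as just described: N random playouts per candidate move, picking the move with the best average result. The game interface is the same hypothetical one used in the earlier sketches, and a purely random playout policy is assumed.

import random

def pure_monte_carlo_move(game, state, n_playouts=100):
    # For each legal move, run n_playouts random playouts from the resulting
    # position and return the move with the highest average utility for the
    # player to move in `state`.
    player = game.to_move(state)

    def playout(s):
        # Play random moves until a terminal state, then score it for `player`.
        while not game.is_terminal(s):
            s = game.result(s, random.choice(game.actions(s)))
        return game.utility(s, player)

    def average_result(move):
        start = game.result(state, move)
        return sum(playout(start) for _ in range(n_playouts)) / n_playouts

    return max(game.actions(state), key=average_result)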

Contd.
• Selection policy: selectively focuses the computational resources on the important parts of the game tree.
• Balances Exploration of states that have had few playouts against Exploitation of states that have done well in past playouts, to get a more accurate estimate of their value.

Contd.
• Selection: Starting at the root of the search tree, we choose a move (guided by the selection policy), leading to a successor node, and repeat that process, moving down the tree to a leaf.
• Selecting the move with the best win percentage is an example of exploitation.
• Selecting a node with a lower win percentage is exploration.

Contd.
• Expansion: We grow the search tree by generating a new child of the selected node.
• Simulation: We perform a playout from the newly generated child node, choosing moves for both players according to the playout policy.
• Back-propagation: We now use the result of the simulation to update all the search tree nodes going up to the root.
• In the slide’s example tree, black won the playout, so black nodes are incremented in both the number of wins and the number of playouts: 27/35 becomes 28/36 and 60/79 becomes 61/80. Since white lost, the white nodes are incremented in the number of playouts only: 16/53 becomes 16/54 and the root 37/100 becomes 37/101.

Contd.
• One very effective selection policy is called “Upper Confidence bounds applied to Trees,” or UCT. The policy ranks each possible move based on an upper confidence bound formula called UCB1.

Algorithm
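The algorithm figure from the slides is not reproduced here. As a sketch of the selection step only: UCB1(n) = U(n)/N(n) + C * sqrt(ln N(Parent(n)) / N(n)), where U(n) is the total playout utility through node n, N(n) its playout count, and C an exploration constant (commonly around sqrt(2)). The node attributes below (U, N, parent, children) are illustrative assumptions, not an interface defined in the slides.

import math

def ucb1(node, C=1.4):
    # UCB1(n) = U(n)/N(n) + C * sqrt(ln N(Parent(n)) / N(n)).
    # First term rewards exploitation, second term rewards exploration.
    if node.N == 0:
        return float("inf")        # unvisited children are tried first
    return node.U / node.N + C * math.sqrt(math.log(node.parent.N) / node.N)

def uct_select(node):
    # Selection step of MCTS: repeatedly descend to the child with the best
    # UCB1 score until a leaf of the search tree is reached.
    while node.children:
        node = max(node.children, key=ucb1)
    return node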

Advantages
• Monte Carlo search has an advantage over alpha–beta for games like Go where the branching factor is very high (and thus alpha–beta can’t search deep enough), or when it is difficult to define a good evaluation function.
• What alpha–beta does is choose the path to a node that has the highest achievable evaluation function score, given that the opponent will be trying to minimize the score.
• A miscalculation on a single node can lead alpha–beta to erroneously choose (or avoid) a path to that node. But Monte Carlo search relies on the aggregate of many playouts, and thus is not as vulnerable to a single error.

Disadvantages
• Monte Carlo search has a disadvantage when it is likely that a single move can change the course of the game, because the stochastic nature of Monte Carlo search means it might fail to consider that move; a vital line of play might not be explored at all.
• Inability to detect game states that are “obviously” a win for one side or the other (according to human knowledge and to an evaluation function), but where it will still take many moves in a playout to verify the winner.
