# 1

INTRODUCTION


# 2

INTELLIGENT AGENTS

function TABLE-DRIVEN-AGENT(percept) returns an action
  persistent: percepts, a sequence, initially empty
              table, a table of actions, indexed by percept sequences, initially fully specified

  append percept to the end of percepts
  action ← LOOKUP(percepts, table)
  return action

Figure 2.3  The TABLE-DRIVEN-AGENT program is invoked for each new percept and returns an action each time. It retains the complete percept sequence in memory.

function REFLEX-VACUUM-AGENT([location, status]) returns an action

  if status = Dirty then return Suck
  else if location = A then return Right
  else if location = B then return Left

Figure 2.4  The agent program for a simple reflex agent in the two-state vacuum environment. This program implements the agent function tabulated in Figure ??.
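The two-state vacuum agent is small enough to transcribe directly. Below is a minimal Python sketch; the function name and the string labels "A", "B", and "Dirty" are our own encoding choices, not part of the pseudocode:

```python
def reflex_vacuum_agent(location, status):
    """Return an action for the two-square vacuum world (Figure 2.4)."""
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    else:  # location == "B"
        return "Left"
```

Because the agent consults only the current percept, it needs no internal state at all.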

function SIMPLE-REFLEX-AGENT(percept) returns an action
  persistent: rules, a set of condition–action rules

  state ← INTERPRET-INPUT(percept)
  rule ← RULE-MATCH(state, rules)
  action ← rule.ACTION
  return action

Figure 2.6  A simple reflex agent. It acts according to a rule whose condition matches the current state, as defined by the percept.



function MODEL-BASED-REFLEX-AGENT(percept) returns an action
  persistent: state, the agent's current conception of the world state
              model, a description of how the next state depends on current state and action
              rules, a set of condition–action rules
              action, the most recent action, initially none

  state ← UPDATE-STATE(state, action, percept, model)
  rule ← RULE-MATCH(state, rules)
  action ← rule.ACTION
  return action

Figure 2.8  A model-based reflex agent. It keeps track of the current state of the world, using an internal model. It then chooses an action in the same way as the reflex agent.

# 3

SOLVING PROBLEMS BY SEARCHING

function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action
  persistent: seq, an action sequence, initially empty
              state, some description of the current world state
              goal, a goal, initially null
              problem, a problem formulation

  state ← UPDATE-STATE(state, percept)
  if seq is empty then
    goal ← FORMULATE-GOAL(state)
    problem ← FORMULATE-PROBLEM(state, goal)
    seq ← SEARCH(problem)
    if seq = failure then return a null action
  action ← FIRST(seq)
  seq ← REST(seq)
  return action

Figure 3.1  A simple problem-solving agent. It first formulates a goal and a problem, searches for a sequence of actions that would solve the problem, and then executes the actions one at a time. When this is complete, it formulates another goal and starts over.



function TREE-SEARCH(problem) returns a solution, or failure
  initialize the frontier using the initial state of problem
  loop do
    if the frontier is empty then return failure
    choose a leaf node and remove it from the frontier
    if the node contains a goal state then return the corresponding solution
    expand the chosen node, adding the resulting nodes to the frontier

function GRAPH-SEARCH(problem) returns a solution, or failure
  initialize the frontier using the initial state of problem
  initialize the explored set to be empty
  loop do
    if the frontier is empty then return failure
    choose a leaf node and remove it from the frontier
    if the node contains a goal state then return the corresponding solution
    add the node to the explored set
    expand the chosen node, adding the resulting nodes to the frontier
      only if not in the frontier or explored set

Figure 3.7  An informal description of the general tree-search and graph-search algorithms. The parts of GRAPH-SEARCH marked in bold italic are the additions needed to handle repeated states.

function BREADTH-FIRST-SEARCH(problem) returns a solution, or failure
  node ← a node with STATE = problem.INITIAL-STATE, PATH-COST = 0
  if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
  frontier ← a FIFO queue with node as the only element
  explored ← an empty set
  loop do
    if EMPTY?(frontier) then return failure
    node ← POP(frontier)   /* chooses the shallowest node in frontier */
    add node.STATE to explored
    for each action in problem.ACTIONS(node.STATE) do
      child ← CHILD-NODE(problem, node, action)
      if child.STATE is not in explored or frontier then
        if problem.GOAL-TEST(child.STATE) then return SOLUTION(child)
        frontier ← INSERT(child, frontier)

Figure 3.11  Breadth-first search on a graph.
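As a concrete illustration, here is a minimal Python sketch of breadth-first graph search over an explicit adjacency dict. Representing states as strings and returning the path of states (rather than a node object) is our simplification; as in the figure, the goal test is applied when a child is generated:

```python
from collections import deque

def breadth_first_search(start, goal, neighbors):
    """Sketch of BREADTH-FIRST-SEARCH (Figure 3.11).

    neighbors maps each state to a list of successor states; the
    returned "solution" is the list of states along the path.
    """
    if start == goal:
        return [start]
    frontier = deque([[start]])        # FIFO queue of paths
    reached = {start}                  # states in explored or frontier
    while frontier:
        path = frontier.popleft()      # shallowest node first
        for child in neighbors[path[-1]]:
            if child not in reached:
                if child == goal:      # goal test on generation
                    return path + [child]
                reached.add(child)
                frontier.append(path + [child])
    return None
```

Folding the figure's separate explored-set and frontier-membership tests into one `reached` set is a common implementation shortcut.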

function UNIFORM-COST-SEARCH(problem) returns a solution, or failure
  node ← a node with STATE = problem.INITIAL-STATE, PATH-COST = 0
  frontier ← a priority queue ordered by PATH-COST, with node as the only element
  explored ← an empty set
  loop do
    if EMPTY?(frontier) then return failure
    node ← POP(frontier)   /* chooses the lowest-cost node in frontier */
    if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
    add node.STATE to explored
    for each action in problem.ACTIONS(node.STATE) do
      child ← CHILD-NODE(problem, node, action)
      if child.STATE is not in explored or frontier then
        frontier ← INSERT(child, frontier)
      else if child.STATE is in frontier with higher PATH-COST then
        replace that frontier node with child

Figure 3.13  Uniform-cost search on a graph. The algorithm is identical to the general graph-search algorithm in Figure ??, except for the use of a priority queue and the addition of an extra check in case a shorter path to a frontier state is discovered. The data structure for frontier needs to support efficient membership testing, so it should combine the capabilities of a priority queue and a hash table.

function DEPTH-LIMITED-SEARCH(problem, limit) returns a solution, or failure/cutoff
  return RECURSIVE-DLS(MAKE-NODE(problem.INITIAL-STATE), problem, limit)

function RECURSIVE-DLS(node, problem, limit) returns a solution, or failure/cutoff
  if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
  else if limit = 0 then return cutoff
  else
    cutoff-occurred? ← false
    for each action in problem.ACTIONS(node.STATE) do
      child ← CHILD-NODE(problem, node, action)
      result ← RECURSIVE-DLS(child, problem, limit − 1)
      if result = cutoff then cutoff-occurred? ← true
      else if result ≠ failure then return result
    if cutoff-occurred? then return cutoff else return failure

Figure 3.16  A recursive implementation of depth-limited tree search.
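A hedged Python sketch of uniform-cost search follows. One deliberate deviation from the figure: instead of replacing a frontier entry when a cheaper path is found, this version uses lazy deletion, allowing duplicate entries in the heap and skipping stale pops. That preserves correctness (the first pop of a state is always along a cheapest path) while avoiding the priority-queue-plus-hash-table machinery the caption describes:

```python
import heapq

def uniform_cost_search(start, goal, edges):
    """Sketch of UNIFORM-COST-SEARCH (Figure 3.13).

    edges maps a state to a list of (successor, step_cost) pairs.
    Returns (cost, path) or None if the goal is unreachable.
    """
    frontier = [(0, start, [start])]   # priority queue ordered by path cost
    explored = set()
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state in explored:
            continue                   # stale entry: a cheaper path was already expanded
        if state == goal:              # goal test on expansion preserves optimality
            return cost, path
        explored.add(state)
        for succ, step in edges.get(state, []):
            if succ not in explored:
                heapq.heappush(frontier, (cost + step, succ, path + [succ]))
    return None
```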

function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure
  for depth = 0 to ∞ do
    result ← DEPTH-LIMITED-SEARCH(problem, depth)
    if result ≠ cutoff then return result

Figure 3.17  The iterative deepening search algorithm, which repeatedly applies depth-limited search with increasing limits. It terminates when a solution is found or if the depth-limited search returns failure, meaning that no solution exists.

function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution, or failure
  return RBFS(problem, MAKE-NODE(problem.INITIAL-STATE), ∞)

function RBFS(problem, node, f-limit) returns a solution, or failure and a new f-cost limit
  if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
  successors ← [ ]
  for each action in problem.ACTIONS(node.STATE) do
    add CHILD-NODE(problem, node, action) into successors
  if successors is empty then return failure, ∞
  for each s in successors do   /* update f with value from previous search, if any */
    s.f ← max(s.g + s.h, node.f)
  loop do
    best ← the lowest f-value node in successors
    if best.f > f-limit then return failure, best.f
    alternative ← the second-lowest f-value among successors
    result, best.f ← RBFS(problem, best, min(f-limit, alternative))
    if result ≠ failure then return result

Figure 3.24  The algorithm for recursive best-first search.
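Iterative deepening is easy to sketch in Python on top of a recursive depth-limited search. The adjacency-dict representation and the string sentinel `"cutoff"` are our own choices; a depth cap replaces the figure's unbounded loop so the sketch always terminates:

```python
def depth_limited_search(state, goal, neighbors, limit):
    """Recursive DLS (Figure 3.16): returns a path, "cutoff", or None (failure)."""
    if state == goal:
        return [state]
    if limit == 0:
        return "cutoff"
    cutoff_occurred = False
    for child in neighbors.get(state, []):
        result = depth_limited_search(child, goal, neighbors, limit - 1)
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return [state] + result
    return "cutoff" if cutoff_occurred else None

def iterative_deepening_search(start, goal, neighbors, max_depth=50):
    """IDS (Figure 3.17): deepen the limit until the result is not a cutoff."""
    for depth in range(max_depth):
        result = depth_limited_search(start, goal, neighbors, depth)
        if result != "cutoff":
            return result
    return None
```

Note this is tree-style search: on graphs with cycles reachable within the limit it can re-expand states.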

# 4

BEYOND CLASSICAL SEARCH

function HILL-CLIMBING(problem) returns a state that is a local maximum
  current ← MAKE-NODE(problem.INITIAL-STATE)
  loop do
    neighbor ← a highest-valued successor of current
    if neighbor.VALUE ≤ current.VALUE then return current.STATE
    current ← neighbor

Figure 4.2  The hill-climbing search algorithm, which is the most basic local search technique. At each step the current node is replaced by the best neighbor; in this version, that means the neighbor with the highest VALUE, but if a heuristic cost estimate h is used, we would find the neighbor with the lowest h.

function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to "temperature"

  current ← MAKE-NODE(problem.INITIAL-STATE)
  for t = 1 to ∞ do
    T ← schedule(t)
    if T = 0 then return current
    next ← a randomly selected successor of current
    ΔE ← next.VALUE − current.VALUE
    if ΔE > 0 then current ← next
    else current ← next only with probability e^(ΔE/T)

Figure 4.5  The simulated annealing algorithm, a version of stochastic hill climbing where some downhill moves are allowed. Downhill moves are accepted readily early in the annealing schedule and then less often as time goes on. The schedule input determines the value of the temperature T as a function of time.
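Steepest-ascent hill climbing fits in a few lines of Python. In this hedged sketch, the state space is supplied as two callables of our own devising: `neighbors(s)` yields successor states and `value(s)` scores them (higher is better):

```python
def hill_climbing(initial, neighbors, value):
    """Sketch of HILL-CLIMBING (Figure 4.2): climb until no neighbor improves."""
    current = initial
    while True:
        succs = list(neighbors(current))
        if not succs:
            return current
        best = max(succs, key=value)
        if value(best) <= value(current):
            return current             # local maximum (or plateau) reached
        current = best
```

For example, maximizing -(x - 3)^2 over the integers with neighbors x ± 1 climbs from 0 to the peak at 3.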

function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual
  inputs: population, a set of individuals
          FITNESS-FN, a function that measures the fitness of an individual

  repeat
    new-population ← empty set
    for i = 1 to SIZE(population) do
      x ← RANDOM-SELECTION(population, FITNESS-FN)
      y ← RANDOM-SELECTION(population, FITNESS-FN)
      child ← REPRODUCE(x, y)
      if (small random probability) then child ← MUTATE(child)
      add child to new-population
    population ← new-population
  until some individual is fit enough, or enough time has elapsed
  return the best individual in population, according to FITNESS-FN

function REPRODUCE(x, y) returns an individual
  inputs: x, y, parent individuals

  n ← LENGTH(x); c ← random number from 1 to n
  return APPEND(SUBSTRING(x, 1, c), SUBSTRING(y, c + 1, n))

Figure 4.8  A genetic algorithm. The algorithm is the same as the one diagrammed in Figure ??, with one variation: in this more popular version, each mating of two parents produces only one offspring, not two.

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
  OR-SEARCH(problem.INITIAL-STATE, problem, [ ])

function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
  if problem.GOAL-TEST(state) then return the empty plan
  if state is on path then return failure
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
    if plan ≠ failure then return [action | plan]
  return failure

function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
  for each si in states do
    plan_i ← OR-SEARCH(si, problem, path)
    if plan_i = failure then return failure
  return [if s1 then plan_1 else if s2 then plan_2 else . . . if sn−1 then plan_n−1 else plan_n]

Figure 4.11  An algorithm for searching AND–OR graphs generated by nondeterministic environments. It returns a conditional plan that reaches a goal state in all circumstances. (The notation [x | l] refers to the list formed by adding object x to the front of list l.)
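The REPRODUCE step of the genetic algorithm translates directly into Python, with 0-based slicing replacing the pseudocode's 1-based SUBSTRING; accepting an explicit random generator (our addition) makes the sketch reproducible:

```python
import random

def reproduce(x, y, rng=random):
    """Single-point crossover from Figure 4.8.

    x and y are equal-length sequences (e.g. strings); the crossover
    point c is drawn uniformly so each parent contributes a nonempty part.
    """
    n = len(x)
    c = rng.randrange(1, n)        # split point between 1 and n - 1
    return x[:c] + y[c:]
```

A seeded `random.Random` instance makes runs deterministic, which is useful when testing a full GA loop built on top of this.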

function ONLINE-DFS-AGENT(s′) returns an action
  inputs: s′, a percept that identifies the current state
  persistent: result, a table, indexed by state and action, initially empty
              untried, a table that lists, for each state, the actions not yet tried
              unbacktracked, a table that lists, for each state, the backtracks not yet tried
              s, a, the previous state and action, initially null

  if GOAL-TEST(s′) then return stop
  if s′ is a new state (not in untried) then untried[s′] ← ACTIONS(s′)
  if s is not null then
    result[s, a] ← s′
    add s to the front of unbacktracked[s′]
  if untried[s′] is empty then
    if unbacktracked[s′] is empty then return stop
    else a ← an action b such that result[s′, b] = POP(unbacktracked[s′])
  else a ← POP(untried[s′])
  s ← s′
  return a

Figure 4.21  An online search agent that uses depth-first exploration. The agent is applicable only in state spaces in which every action can be "undone" by some other action.

function LRTA*-AGENT(s′) returns an action
  inputs: s′, a percept that identifies the current state
  persistent: result, a table, indexed by state and action, initially empty
              H, a table of cost estimates indexed by state, initially empty
              s, a, the previous state and action, initially null

  if GOAL-TEST(s′) then return stop
  if s′ is a new state (not in H) then H[s′] ← h(s′)
  if s is not null then
    result[s, a] ← s′
    H[s] ← min over b ∈ ACTIONS(s) of LRTA*-COST(s, b, result[s, b], H)
  a ← an action b in ACTIONS(s′) that minimizes LRTA*-COST(s′, b, result[s′, b], H)
  s ← s′
  return a

function LRTA*-COST(s, a, s′, H) returns a cost estimate
  if s′ is undefined then return h(s)
  else return c(s, a, s′) + H[s′]

Figure 4.24  LRTA*-AGENT selects an action according to the values of neighboring states, which are updated as the agent moves about the state space.

# 5

ADVERSARIAL SEARCH

function MINIMAX-DECISION(state) returns an action
  return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← ∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v

Figure 5.3  An algorithm for calculating minimax decisions. It returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the opponent plays to minimize utility. The functions MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state. The notation argmax_{a ∈ S} f(a) computes the element a of set S that has the maximum value of f(a).
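Here is a hedged Python sketch of minimax decision making. The game is passed in as five callables (our own interface, not the book's): `actions`, `result`, `terminal`, and `utility` mirror the pseudocode's primitives:

```python
def minimax_decision(state, actions, result, terminal, utility):
    """Sketch of MINIMAX-DECISION (Figure 5.3) for two-player zero-sum games."""
    def max_value(s):
        if terminal(s):
            return utility(s)
        return max(min_value(result(s, a)) for a in actions(s))

    def min_value(s):
        if terminal(s):
            return utility(s)
        return min(max_value(result(s, a)) for a in actions(s))

    # argmax over root actions by their backed-up minimax value
    return max(actions(state), key=lambda a: min_value(result(state, a)))
```

The test below encodes a two-ply game tree as a dict: MAX prefers the left branch because its worst case (3) beats the right branch's worst case (2).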

function ALPHA-BETA-SEARCH(state) returns an action
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v

Figure 5.7  The alpha–beta search algorithm. Notice that these routines are the same as the MINIMAX functions in Figure ??, except for the two lines in each of MIN-VALUE and MAX-VALUE that maintain α and β (and the bookkeeping to pass these parameters along).
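A corresponding Python sketch of alpha-beta follows, using the same callable-based game interface as the minimax sketch above. One simplification relative to the figure: the root loop evaluates each action with a fresh (−∞, +∞) window rather than threading α across root children, which returns the same decision but prunes somewhat less:

```python
def alpha_beta_search(state, actions, result, terminal, utility):
    """Sketch of ALPHA-BETA-SEARCH (Figure 5.7) with cutoffs on alpha/beta."""
    def max_value(s, alpha, beta):
        if terminal(s):
            return utility(s)
        v = float("-inf")
        for a in actions(s):
            v = max(v, min_value(result(s, a), alpha, beta))
            if v >= beta:
                return v               # beta cutoff: MIN will avoid this branch
            alpha = max(alpha, v)
        return v

    def min_value(s, alpha, beta):
        if terminal(s):
            return utility(s)
        v = float("inf")
        for a in actions(s):
            v = min(v, max_value(result(s, a), alpha, beta))
            if v <= alpha:
                return v               # alpha cutoff: MAX will avoid this branch
            beta = min(beta, v)
        return v

    return max(actions(state),
               key=lambda a: min_value(result(state, a),
                                       float("-inf"), float("inf")))
```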

# 6

CONSTRAINT SATISFACTION PROBLEMS

function AC-3(csp) returns false if an inconsistency is found and true otherwise
  inputs: csp, a binary CSP with components (X, D, C)
  local variables: queue, a queue of arcs, initially all the arcs in csp

  while queue is not empty do
    (Xi, Xj) ← REMOVE-FIRST(queue)
    if REVISE(csp, Xi, Xj) then
      if size of Di = 0 then return false
      for each Xk in Xi.NEIGHBORS − {Xj} do
        add (Xk, Xi) to queue
  return true

function REVISE(csp, Xi, Xj) returns true iff we revise the domain of Xi
  revised ← false
  for each x in Di do
    if no value y in Dj allows (x, y) to satisfy the constraint between Xi and Xj then
      delete x from Di
      revised ← true
  return revised

Figure 6.3  The arc-consistency algorithm AC-3. After applying AC-3, either every arc is arc-consistent, or some variable has an empty domain, indicating that the CSP cannot be solved. The name "AC-3" was used by the algorithm's inventor (?) because it's the third version developed in the paper.
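A hedged Python sketch of AC-3 is below. To keep it short, we assume every binary constraint is the same relation `constraint(x, y)` (for example, inequality in graph coloring); domains are mutable sets and neighbors a dict of sets, all our own representation choices:

```python
from collections import deque

def revise(domains, xi, xj, constraint):
    """Delete values of xi that have no supporting value in xj's domain."""
    revised = False
    for x in set(domains[xi]):         # iterate over a copy while mutating
        if not any(constraint(x, y) for y in domains[xj]):
            domains[xi].discard(x)
            revised = True
    return revised

def ac3(domains, neighbors, constraint):
    """Sketch of AC-3 (Figure 6.3); returns False on a wiped-out domain."""
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, constraint):
            if not domains[xi]:
                return False           # empty domain: CSP is inconsistent
            for xk in neighbors[xi] - {xj}:
                queue.append((xk, xi))
    return True
```

In the test, Y is pinned to 2 and the inequality constraint propagates, pruning 2 from X's domain.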

function BACKTRACKING-SEARCH(csp) returns a solution, or failure
  return BACKTRACK({ }, csp)

function BACKTRACK(assignment, csp) returns a solution, or failure
  if assignment is complete then return assignment
  var ← SELECT-UNASSIGNED-VARIABLE(csp)
  for each value in ORDER-DOMAIN-VALUES(var, assignment, csp) do
    if value is consistent with assignment then
      add {var = value} to assignment
      inferences ← INFERENCE(csp, var, value)
      if inferences ≠ failure then
        add inferences to assignment
        result ← BACKTRACK(assignment, csp)
        if result ≠ failure then return result
      remove {var = value} and inferences from assignment
  return failure

Figure 6.5  A simple backtracking algorithm for constraint satisfaction problems. The algorithm is modeled on the recursive depth-first search of Chapter ??. By varying the functions SELECT-UNASSIGNED-VARIABLE and ORDER-DOMAIN-VALUES, we can implement the general-purpose heuristics discussed in the text. The function INFERENCE can optionally be used to impose arc-, path-, or k-consistency, as desired. If a value choice leads to failure (noticed either by INFERENCE or by BACKTRACK), then value assignments (including those made by INFERENCE) are removed from the current assignment and a new value is tried.

function MIN-CONFLICTS(csp, max-steps) returns a solution or failure
  inputs: csp, a constraint satisfaction problem
          max-steps, the number of steps allowed before giving up

  current ← an initial complete assignment for csp
  for i = 1 to max-steps do
    if current is a solution for csp then return current
    var ← a randomly chosen conflicted variable from csp.VARIABLES
    value ← the value v for var that minimizes CONFLICTS(var, v, current, csp)
    set var = value in current
  return failure

Figure 6.8  The MIN-CONFLICTS algorithm for solving CSPs by local search. The initial state may be chosen randomly or by a greedy assignment process that chooses a minimal-conflict value for each variable in turn. The CONFLICTS function counts the number of constraints violated by a particular value, given the rest of the current assignment.
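MIN-CONFLICTS also sketches compactly in Python. Here the CSP is reduced to a user-supplied `conflicts(var, val, assignment)` counter (our interface, not the book's), and the initial complete assignment is chosen randomly:

```python
import random

def min_conflicts(variables, domains, conflicts, max_steps=10000, rng=random):
    """Sketch of MIN-CONFLICTS (Figure 6.8) for local-search CSP solving."""
    current = {v: rng.choice(list(domains[v])) for v in variables}
    for _ in range(max_steps):
        conflicted = [v for v in variables
                      if conflicts(v, current[v], current) > 0]
        if not conflicted:
            return current             # no variable violates a constraint
        var = rng.choice(conflicted)
        current[var] = min(domains[var],
                           key=lambda val: conflicts(var, val, current))
    return None
```

Because moves are random, runs differ; passing a seeded `random.Random` makes behavior reproducible for experiments.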

function TREE-CSP-SOLVER(csp) returns a solution, or failure
  inputs: csp, a CSP with components X, D, C

  n ← number of variables in X
  assignment ← an empty assignment
  root ← any variable in X
  X ← TOPOLOGICALSORT(X, root)
  for j = n down to 2 do
    MAKE-ARC-CONSISTENT(PARENT(Xj), Xj)
    if it cannot be made consistent then return failure
  for i = 1 to n do
    assignment[Xi] ← any consistent value from Di
    if there is no consistent value then return failure
  return assignment

Figure 6.11  The TREE-CSP-SOLVER algorithm for solving tree-structured CSPs. If the CSP has a solution, we will find it in linear time; if not, we will detect a contradiction.

# 7

LOGICAL AGENTS

function KB-AGENT(percept) returns an action
  persistent: KB, a knowledge base
              t, a counter, initially 0, indicating time

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  action ← ASK(KB, MAKE-ACTION-QUERY(t))
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action

Figure 7.1  A generic knowledge-based agent. Given a percept, the agent adds the percept to its knowledge base, asks the knowledge base for the best action, and tells the knowledge base that it has in fact taken that action.

function TT-ENTAILS?(KB, α) returns true or false
  inputs: KB, the knowledge base, a sentence in propositional logic
          α, the query, a sentence in propositional logic

  symbols ← a list of the proposition symbols in KB and α
  return TT-CHECK-ALL(KB, α, symbols, { })

function TT-CHECK-ALL(KB, α, symbols, model) returns true or false
  if EMPTY?(symbols) then
    if PL-TRUE?(KB, model) then return PL-TRUE?(α, model)
    else return true   // when KB is false, always return true
  else do
    P ← FIRST(symbols)
    rest ← REST(symbols)
    return (TT-CHECK-ALL(KB, α, rest, model ∪ {P = true})
            and
            TT-CHECK-ALL(KB, α, rest, model ∪ {P = false}))

Figure 7.8  A truth-table enumeration algorithm for deciding propositional entailment. (TT stands for truth table.) PL-TRUE? returns true if a sentence holds within a model. The variable model represents a partial model: an assignment to some of the symbols. The keyword "and" is used here as a logical operation on its two arguments, returning true or false.

function PL-RESOLUTION(KB, α) returns true or false
  inputs: KB, the knowledge base, a sentence in propositional logic
          α, the query, a sentence in propositional logic

  clauses ← the set of clauses in the CNF representation of KB ∧ ¬α
  new ← { }
  loop do
    for each pair of clauses Ci, Cj in clauses do
      resolvents ← PL-RESOLVE(Ci, Cj)
      if resolvents contains the empty clause then return true
      new ← new ∪ resolvents
    if new ⊆ clauses then return false
    clauses ← clauses ∪ new

Figure 7.9  A simple resolution algorithm for propositional logic. The function PL-RESOLVE returns the set of all possible clauses obtained by resolving its two inputs.
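Truth-table entailment can be sketched in Python by brute-force enumeration. Here KB and α are represented as Python predicates over a model dict (our encoding; the pseudocode's recursive symbol-splitting is replaced by `itertools.product`, which enumerates the same 2^n assignments):

```python
from itertools import product

def tt_entails(kb, alpha, symbols):
    """Sketch of TT-ENTAILS? (Figure 7.8).

    kb and alpha are functions from a model (dict symbol -> bool) to bool.
    KB entails alpha iff alpha holds in every model where KB holds.
    """
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False               # a model of KB where alpha fails
    return True
```

The test checks a classic case: from (P ∨ Q) ∧ ¬P we can conclude Q but not P.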

function PL-FC-ENTAILS?(KB, q) returns true or false
  inputs: KB, the knowledge base, a set of propositional definite clauses
          q, the query, a proposition symbol

  count ← a table, where count[c] is the number of symbols in c's premise
  inferred ← a table, where inferred[s] is initially false for all symbols
  agenda ← a queue of symbols, initially symbols known to be true in KB

  while agenda is not empty do
    p ← POP(agenda)
    if p = q then return true
    if inferred[p] = false then
      inferred[p] ← true
      for each clause c in KB where p is in c.PREMISE do
        decrement count[c]
        if count[c] = 0 then add c.CONCLUSION to agenda
  return false

Figure 7.12  The forward-chaining algorithm for propositional logic. The agenda keeps track of symbols known to be true but not yet "processed." The count table keeps track of how many premises of each implication are as yet unknown. Whenever a new symbol p from the agenda is processed, the count is reduced by one for each implication in whose premise p appears (easily identified in constant time with appropriate indexing). If a count reaches zero, all the premises of the implication are known, so its conclusion can be added to the agenda. Finally, we need to keep track of which symbols have been processed; a symbol that is already in the set of inferred symbols need not be added to the agenda again. This avoids redundant work and prevents loops caused by implications such as P ⇒ Q and Q ⇒ P.
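Forward chaining over definite clauses is a short Python exercise. In this hedged sketch, a clause is a `(premises, conclusion)` pair where premises is a set of symbols (our representation; a production version would precompute a symbol-to-clause index for the constant-time lookup the caption mentions):

```python
from collections import deque

def pl_fc_entails(clauses, facts, q):
    """Sketch of PL-FC-ENTAILS? (Figure 7.12).

    clauses: list of (premises, conclusion) definite clauses.
    facts: symbols known true in the KB.  q: the query symbol.
    """
    count = {i: len(prem) for i, (prem, _) in enumerate(clauses)}
    inferred = set()
    agenda = deque(facts)
    while agenda:
        p = agenda.popleft()
        if p == q:
            return True
        if p not in inferred:
            inferred.add(p)
            for i, (prem, concl) in enumerate(clauses):
                if p in prem:
                    count[i] -= 1
                    if count[i] == 0:      # all premises known: fire the rule
                        agenda.append(concl)
    return False
```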

function DPLL-SATISFIABLE?(s) returns true or false
  inputs: s, a sentence in propositional logic

  clauses ← the set of clauses in the CNF representation of s
  symbols ← a list of the proposition symbols in s
  return DPLL(clauses, symbols, { })

function DPLL(clauses, symbols, model) returns true or false
  if every clause in clauses is true in model then return true
  if some clause in clauses is false in model then return false
  P, value ← FIND-PURE-SYMBOL(symbols, clauses, model)
  if P is non-null then return DPLL(clauses, symbols − P, model ∪ {P = value})
  P, value ← FIND-UNIT-CLAUSE(clauses, model)
  if P is non-null then return DPLL(clauses, symbols − P, model ∪ {P = value})
  P ← FIRST(symbols); rest ← REST(symbols)
  return DPLL(clauses, rest, model ∪ {P = true}) or
         DPLL(clauses, rest, model ∪ {P = false})

Figure 7.14  The DPLL algorithm for checking satisfiability of a sentence in propositional logic. The ideas behind FIND-PURE-SYMBOL and FIND-UNIT-CLAUSE are described in the text; each returns a symbol (or null) and the truth value to assign to that symbol. Like TT-ENTAILS?, DPLL operates over partial models.

function WALKSAT(clauses, p, max-flips) returns a satisfying model or failure
  inputs: clauses, a set of clauses in propositional logic
          p, the probability of choosing to do a "random walk" move, typically around 0.5
          max-flips, number of flips allowed before giving up

  model ← a random assignment of true/false to the symbols in clauses
  for i = 1 to max-flips do
    if model satisfies clauses then return model
    clause ← a randomly selected clause from clauses that is false in model
    with probability p flip the value in model of a randomly selected symbol from clause
    else flip whichever symbol in clause maximizes the number of satisfied clauses
  return failure

Figure 7.15  The WALKSAT algorithm for checking satisfiability by randomly flipping the values of variables. Many versions of the algorithm exist.
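The DPLL backtracking core can be sketched in Python without the pure-symbol and unit-clause heuristics (which only speed it up, not change its answer). A literal here is a `(symbol, polarity)` pair and a clause a frozenset of literals, all our own representation choices:

```python
def dpll(clauses, model=None):
    """Sketch of the DPLL splitting rule (Figure 7.14), heuristics omitted."""
    model = model or {}

    def clause_value(clause):
        vals = [model[s] == pol for s, pol in clause if s in model]
        if any(vals):
            return True                # some literal already satisfied
        if len(vals) == len(clause):
            return False               # every literal assigned and false
        return None                    # undetermined under the partial model

    states = [clause_value(c) for c in clauses]
    if all(v is True for v in states):
        return True
    if any(v is False for v in states):
        return False
    # split on any unassigned symbol and try both truth values
    symbol = next(s for c in clauses for s, _ in c if s not in model)
    return (dpll(clauses, {**model, symbol: True})
            or dpll(clauses, {**model, symbol: False}))
```

Evaluating clauses against a partial model, exactly as the caption notes, is what lets whole subtrees be pruned before every symbol is assigned.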

function HYBRID-WUMPUS-AGENT(percept) returns an action
  inputs: percept, a list, [stench, breeze, glitter, bump, scream]
  persistent: KB, a knowledge base, initially the atemporal "wumpus physics"
              t, a counter, initially 0, indicating time
              plan, an action sequence, initially empty

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  TELL the KB the temporal "physics" sentences for time t
  safe ← {[x, y] : ASK(KB, OK^t_{x,y}) = true}
  if ASK(KB, Glitter^t) = true then
    plan ← [Grab] + PLAN-ROUTE(current, {[1,1]}, safe) + [Climb]
  if plan is empty then
    unvisited ← {[x, y] : ASK(KB, L^{t′}_{x,y}) = false for all t′ ≤ t}
    plan ← PLAN-ROUTE(current, unvisited ∩ safe, safe)
  if plan is empty and ASK(KB, HaveArrow^t) = true then
    possible-wumpus ← {[x, y] : ASK(KB, ¬W_{x,y}) = false}
    plan ← PLAN-SHOT(current, possible-wumpus, safe)
  if plan is empty then   // no choice but to take a risk
    not-unsafe ← {[x, y] : ASK(KB, ¬OK^t_{x,y}) = false}
    plan ← PLAN-ROUTE(current, unvisited ∩ not-unsafe, safe)
  if plan is empty then
    plan ← PLAN-ROUTE(current, {[1,1]}, safe) + [Climb]
  action ← POP(plan)
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action

function PLAN-ROUTE(current, goals, allowed) returns an action sequence
  inputs: current, the agent's current position
          goals, a set of squares; try to plan a route to one of them
          allowed, a set of squares that can form part of the route

  problem ← ROUTE-PROBLEM(current, goals, allowed)
  return A*-GRAPH-SEARCH(problem)

Figure 7.17  A hybrid agent program for the wumpus world. It uses a propositional knowledge base to infer the state of the world, and a combination of problem-solving search and domain-specific code to decide what actions to take.

function SATPLAN(init, transition, goal, T_max) returns solution or failure
  inputs: init, transition, goal, constitute a description of the problem
          T_max, an upper limit for plan length

  for t = 0 to T_max do
    cnf ← TRANSLATE-TO-SAT(init, transition, goal, t)
    model ← SAT-SOLVER(cnf)
    if model is not null then return EXTRACT-SOLUTION(model)
  return failure

Figure 7.19  The SATPLAN algorithm. The planning problem is translated into a CNF sentence in which the goal is asserted to hold at a fixed time step t and axioms are included for each time step up to t. If the satisfiability algorithm finds a model, then a plan is extracted by looking at those proposition symbols that refer to actions and are assigned true in the model. If no model exists, then the process is repeated with the goal moved one step later.

# 8

FIRST-ORDER LOGIC

# 9

INFERENCE IN FIRST-ORDER LOGIC

function UNIFY(x, y, θ) returns a substitution to make x and y identical
  inputs: x, a variable, constant, list, or compound expression
          y, a variable, constant, list, or compound expression
          θ, the substitution built up so far (optional, defaults to empty)

  if θ = failure then return failure
  else if x = y then return θ
  else if VARIABLE?(x) then return UNIFY-VAR(x, y, θ)
  else if VARIABLE?(y) then return UNIFY-VAR(y, x, θ)
  else if COMPOUND?(x) and COMPOUND?(y) then
    return UNIFY(x.ARGS, y.ARGS, UNIFY(x.OP, y.OP, θ))
  else if LIST?(x) and LIST?(y) then
    return UNIFY(x.REST, y.REST, UNIFY(x.FIRST, y.FIRST, θ))
  else return failure

function UNIFY-VAR(var, x, θ) returns a substitution
  if {var/val} ∈ θ then return UNIFY(val, x, θ)
  else if {x/val} ∈ θ then return UNIFY(var, val, θ)
  else if OCCUR-CHECK?(var, x) then return failure
  else return add {var/x} to θ

Figure 9.1  The unification algorithm. The algorithm works by comparing the structures of the inputs, element by element. The substitution θ that is the argument to UNIFY is built up along the way and is used to make sure that later comparisons are consistent with bindings that were established earlier. In a compound expression such as F(A, B), the OP field picks out the function symbol F and the ARGS field picks out the argument list (A, B).
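A hedged Python sketch of unification follows. Terms are encoded (our convention, not the book's) as strings, where a lowercase first letter marks a variable and uppercase a constant, or as tuples `(op, arg1, ...)` for compounds; the occur check is omitted to keep the sketch short:

```python
def unify(x, y, theta=None):
    """Sketch of UNIFY (Figure 9.1); returns a substitution dict or False."""
    if theta is None:
        theta = {}
    if x == y:
        return theta
    if isinstance(x, str) and x[0].islower():
        return unify_var(x, y, theta)
    if isinstance(y, str) and y[0].islower():
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):       # unify op and args element by element
            theta = unify(xi, yi, theta)
            if theta is False:
                return False
        return theta
    return False

def unify_var(var, x, theta):
    if var in theta:
        return unify(theta[var], x, theta)
    if isinstance(x, str) and x in theta:
        return unify(var, theta[x], theta)
    # NOTE: the occur check is omitted in this sketch
    return {**theta, var: x}
```

For example, unifying Knows(John, x) with Knows(John, Jane) should bind x to Jane.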

function FOL-FC-ASK(KB, α) returns a substitution or false
  inputs: KB, the knowledge base, a set of first-order definite clauses
          α, the query, an atomic sentence
  local variables: new, the new sentences inferred on each iteration

  repeat until new is empty
    new ← { }
    for each rule in KB do
      (p1 ∧ . . . ∧ pn ⇒ q) ← STANDARDIZE-VARIABLES(rule)
      for each θ such that SUBST(θ, p1 ∧ . . . ∧ pn) = SUBST(θ, p′1 ∧ . . . ∧ p′n)
              for some p′1, . . . , p′n in KB
        q′ ← SUBST(θ, q)
        if q′ does not unify with some sentence already in KB or new then
          add q′ to new
          φ ← UNIFY(q′, α)
          if φ is not fail then return φ
    add new to KB
  return false

Figure 9.3  A conceptually straightforward, but very inefficient, forward-chaining algorithm. On each iteration, it adds to KB all the atomic sentences that can be inferred in one step from the implication sentences and the atomic sentences already in KB. The function STANDARDIZE-VARIABLES replaces all variables in its arguments with new ones that have not been used before.

function FOL-BC-ASK(KB, query) returns a generator of substitutions
  return FOL-BC-OR(KB, query, { })

generator FOL-BC-OR(KB, goal, θ) yields a substitution
  for each rule (lhs ⇒ rhs) in FETCH-RULES-FOR-GOAL(KB, goal) do
    (lhs, rhs) ← STANDARDIZE-VARIABLES((lhs, rhs))
    for each θ′ in FOL-BC-AND(KB, lhs, UNIFY(rhs, goal, θ)) do
      yield θ′

generator FOL-BC-AND(KB, goals, θ) yields a substitution
  if θ = failure then return
  else if LENGTH(goals) = 0 then yield θ
  else do
    first, rest ← FIRST(goals), REST(goals)
    for each θ′ in FOL-BC-OR(KB, SUBST(θ, first), θ) do
      for each θ′′ in FOL-BC-AND(KB, rest, θ′) do
        yield θ′′

Figure 9.6  A simple backward-chaining algorithm for first-order knowledge bases.

procedure APPEND(ax, y, az, continuation)

  trail ← GLOBAL-TRAIL-POINTER()
  if ax = [ ] and UNIFY(y, az) then CALL(continuation)
  RESET-TRAIL(trail)
  a, x, z ← NEW-VARIABLE(), NEW-VARIABLE(), NEW-VARIABLE()
  if UNIFY(ax, [a | x]) and UNIFY(az, [a | z]) then APPEND(x, y, z, continuation)

Figure 9.8  Pseudocode representing the result of compiling the Append predicate. The function NEW-VARIABLE returns a new variable, distinct from all other variables used so far. The procedure CALL(continuation) continues execution with the specified continuation.

# 10

CLASSICAL PLANNING

Init(At(C1, SFO) ∧ At(C2, JFK) ∧ At(P1, SFO) ∧ At(P2, JFK)
    ∧ Cargo(C1) ∧ Cargo(C2) ∧ Plane(P1) ∧ Plane(P2)
    ∧ Airport(JFK) ∧ Airport(SFO))
Goal(At(C1, JFK) ∧ At(C2, SFO))
Action(Load(c, p, a),
  PRECOND: At(c, a) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
  EFFECT: ¬At(c, a) ∧ In(c, p))
Action(Unload(c, p, a),
  PRECOND: In(c, p) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
  EFFECT: At(c, a) ∧ ¬In(c, p))
Action(Fly(p, from, to),
  PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
  EFFECT: ¬At(p, from) ∧ At(p, to))

Figure 10.1  A PDDL description of an air cargo transportation planning problem.

Init(Tire(Flat) ∧ Tire(Spare) ∧ At(Flat, Axle) ∧ At(Spare, Trunk))
Goal(At(Spare, Axle))
Action(Remove(obj, loc),
  PRECOND: At(obj, loc)
  EFFECT: ¬At(obj, loc) ∧ At(obj, Ground))
Action(PutOn(t, Axle),
  PRECOND: Tire(t) ∧ At(t, Ground) ∧ ¬At(Flat, Axle)
  EFFECT: ¬At(t, Ground) ∧ At(t, Axle))
Action(LeaveOvernight,
  PRECOND:
  EFFECT: ¬At(Spare, Ground) ∧ ¬At(Spare, Axle) ∧ ¬At(Spare, Trunk)
          ∧ ¬At(Flat, Ground) ∧ ¬At(Flat, Axle) ∧ ¬At(Flat, Trunk))

Figure 10.2  The simple spare tire problem.

Init(On(A, Table) ∧ On(B, Table) ∧ On(C, A)
     ∧ Block(A) ∧ Block(B) ∧ Block(C) ∧ Clear(B) ∧ Clear(C))
Goal(On(A, B) ∧ On(B, C))
Action(Move(b, x, y),
  PRECOND: On(b, x) ∧ Clear(b) ∧ Clear(y) ∧ Block(b) ∧ Block(y) ∧
           (b≠x) ∧ (b≠y) ∧ (x≠y),
  EFFECT: On(b, y) ∧ Clear(x) ∧ ¬On(b, x) ∧ ¬Clear(y))
Action(MoveToTable(b, x),
  PRECOND: On(b, x) ∧ Clear(b) ∧ Block(b) ∧ (b≠x),
  EFFECT: On(b, Table) ∧ Clear(x) ∧ ¬On(b, x))

Figure 10.3  A planning problem in the blocks world: building a three-block tower. One solution is the sequence [MoveToTable(C, A), Move(B, Table, C), Move(A, Table, B)].

Init(Have(Cake))
Goal(Have(Cake) ∧ Eaten(Cake))
Action(Eat(Cake)
  PRECOND: Have(Cake)
  EFFECT: ¬Have(Cake) ∧ Eaten(Cake))
Action(Bake(Cake)
  PRECOND: ¬Have(Cake)
  EFFECT: Have(Cake))

Figure 10.7  The "have cake and eat cake too" problem.

function GRAPHPLAN(problem) returns solution or failure
  graph ← INITIAL-PLANNING-GRAPH(problem)
  goals ← CONJUNCTS(problem.GOAL)
  nogoods ← an empty hash table
  for tl = 0 to ∞ do
    if goals all non-mutex in St of graph then
      solution ← EXTRACT-SOLUTION(graph, goals, NUMLEVELS(graph), nogoods)
      if solution ≠ failure then return solution
    if graph and nogoods have both leveled off then return failure
    graph ← EXPAND-GRAPH(graph, problem)

Figure 10.9  The GRAPHPLAN algorithm. GRAPHPLAN calls EXPAND-GRAPH to add a level until either a solution is found by EXTRACT-SOLUTION, or no solution is possible.

11  PLANNING AND ACTING IN THE REAL WORLD

Jobs({AddEngine1 ≺ AddWheels1 ≺ Inspect1},
     {AddEngine2 ≺ AddWheels2 ≺ Inspect2})
Resources(EngineHoists(1), WheelStations(1), Inspectors(2), LugNuts(500))
Action(AddEngine1, DURATION: 30, USE: EngineHoists(1))
Action(AddEngine2, DURATION: 60, USE: EngineHoists(1))
Action(AddWheels1, DURATION: 30, CONSUME: LugNuts(20), USE: WheelStations(1))
Action(AddWheels2, DURATION: 15, CONSUME: LugNuts(20), USE: WheelStations(1))
Action(Inspecti, DURATION: 10, USE: Inspectors(1))

Figure 11.1  A job-shop scheduling problem for assembling two cars, with resource constraints. The notation A ≺ B means that action A must precede action B.

Refinement(Go(Home, SFO),
  STEPS: [Drive(Home, SFOLongTermParking),
          Shuttle(SFOLongTermParking, SFO)])
Refinement(Go(Home, SFO),
  STEPS: [Taxi(Home, SFO)])

Refinement(Navigate([a, b], [x, y]),
  PRECOND: a = x ∧ b = y
  STEPS: [ ])
Refinement(Navigate([a, b], [x, y]),
  PRECOND: Connected([a, b], [a − 1, b])
  STEPS: [Left, Navigate([a − 1, b], [x, y])])
Refinement(Navigate([a, b], [x, y]),
  PRECOND: Connected([a, b], [a + 1, b])
  STEPS: [Right, Navigate([a + 1, b], [x, y])])
...

Figure 11.4  Definitions of possible refinements for two high-level actions: going to San Francisco airport and navigating in the vacuum world. In the latter case, note the recursive nature of the refinements and the use of preconditions.

function HIERARCHICAL-SEARCH(problem, hierarchy) returns a solution, or failure
  frontier ← a FIFO queue with [Act] as the only element
  loop do
    if EMPTY?(frontier) then return failure
    plan ← POP(frontier) /* chooses the shallowest plan in frontier */
    hla ← the first HLA in plan, or null if none
    prefix, suffix ← the action subsequences before and after hla in plan
    outcome ← RESULT(problem.INITIAL-STATE, prefix)
    if hla is null then /* so plan is primitive and outcome is its result */
      if outcome satisfies problem.GOAL then return plan
    else for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
      frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)

Figure 11.5  A breadth-first implementation of hierarchical forward planning search. The initial plan supplied to the algorithm is [Act]. The REFINEMENTS function returns a set of action sequences, one for each refinement of the HLA whose preconditions are satisfied by the specified state, outcome.

function ANGELIC-SEARCH(problem, hierarchy, initialPlan) returns solution or fail
  frontier ← a FIFO queue with initialPlan as the only element
  loop do
    if EMPTY?(frontier) then return fail
    plan ← POP(frontier) /* chooses the shallowest node in frontier */
    if REACH+(problem.INITIAL-STATE, plan) intersects problem.GOAL then
      if plan is primitive then return plan /* REACH+ is exact for primitive plans */
      guaranteed ← REACH−(problem.INITIAL-STATE, plan) ∩ problem.GOAL
      if guaranteed ≠ { } and MAKING-PROGRESS(plan, initialPlan) then
        finalState ← any element of guaranteed
        return DECOMPOSE(hierarchy, problem.INITIAL-STATE, plan, finalState)
    hla ← some HLA in plan
    prefix, suffix ← the action subsequences before and after hla in plan
    for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
      frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)

function DECOMPOSE(hierarchy, s0, plan, sf) returns a solution
  solution ← an empty plan
  while plan is not empty do
    action ← REMOVE-LAST(plan)
    si ← a state in REACH−(s0, plan) such that sf ∈ REACH−(si, action)
    problem ← a problem with INITIAL-STATE = si and GOAL = sf
    solution ← APPEND(ANGELIC-SEARCH(problem, hierarchy, action), solution)
    sf ← si
  return solution

Figure 11.8  A hierarchical planning algorithm that uses angelic semantics to identify and commit to high-level plans that work while avoiding high-level plans that don't. The predicate MAKING-PROGRESS checks to make sure that we aren't stuck in an infinite regression of refinements. At top level, call ANGELIC-SEARCH with [Act] as the initialPlan.

Actors(A, B)
Init(At(A, LeftBaseline) ∧ At(B, RightNet)
     ∧ Approaching(Ball, RightBaseline) ∧ Partner(A, B) ∧ Partner(B, A))
Goal(Returned(Ball) ∧ (At(a, RightNet) ∨ At(a, LeftNet)))
Action(Hit(actor, Ball),
  PRECOND: Approaching(Ball, loc) ∧ At(actor, loc)
  EFFECT: Returned(Ball))
Action(Go(actor, to),
  PRECOND: At(actor, loc) ∧ to ≠ loc,
  EFFECT: At(actor, to) ∧ ¬At(actor, loc))

Figure 11.10  The doubles tennis problem. Two actors A and B are playing together and can be in one of four locations: LeftBaseline, RightBaseline, LeftNet, and RightNet. The ball can be returned only if a player is in the right place. Note that each action must include the actor as an argument.

12  KNOWLEDGE REPRESENTATION

13  QUANTIFYING UNCERTAINTY

function DT-AGENT(percept) returns an action
  persistent: belief_state, probabilistic beliefs about the current state of the world
              action, the agent's action

  update belief_state based on action and percept
  calculate outcome probabilities for actions,
      given action descriptions and current belief_state
  select action with highest expected utility
      given probabilities of outcomes and utility information
  return action

Figure 13.1  A decision-theoretic agent that selects rational actions.

14  PROBABILISTIC REASONING

function ENUMERATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayes net with variables {X} ∪ E ∪ Y /* Y = hidden variables */

  Q(X) ← a distribution over X, initially empty
  for each value xi of X do
    Q(xi) ← ENUMERATE-ALL(bn.VARS, exi)
        where exi is e extended with X = xi
  return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
  if EMPTY?(vars) then return 1.0
  Y ← FIRST(vars)
  if Y has value y in e
    then return P(y | parents(Y)) × ENUMERATE-ALL(REST(vars), e)
    else return Σy P(y | parents(Y)) × ENUMERATE-ALL(REST(vars), ey)
        where ey is e extended with Y = y

Figure 14.9  The enumeration algorithm for answering queries on Bayesian networks.

function ELIMINATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network specifying joint distribution P(X1, ..., Xn)

  factors ← [ ]
  for each var in ORDER(bn.VARS) do
    factors ← [MAKE-FACTOR(var, e) | factors]
    if var is a hidden variable then factors ← SUM-OUT(var, factors)
  return NORMALIZE(POINTWISE-PRODUCT(factors))

Figure 14.10  The variable elimination algorithm for inference in Bayesian networks.
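The enumeration algorithm can be sketched in Python for Boolean networks. The network encoding below (a topologically ordered list of (variable, parents, CPT) triples, with each CPT mapping a tuple of parent values to P(var = True)) and the tiny two-node Rain/WetGrass net are our own illustrative choices, not from the text.

```python
def enumeration_ask(X, e, bn):
    """Posterior distribution over Boolean query variable X given evidence e."""
    Q = {}
    for xi in (True, False):
        Q[xi] = enumerate_all([v for v, _, _ in bn], {**e, X: xi}, bn)
    total = Q[True] + Q[False]
    return {k: v / total for k, v in Q.items()}

def enumerate_all(variables, e, bn):
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    _, parents, cpt = next(entry for entry in bn if entry[0] == Y)
    # Parents precede Y in topological order, so they are all assigned in e.
    p_true = cpt[tuple(e[p] for p in parents)]
    if Y in e:
        p = p_true if e[Y] else 1 - p_true
        return p * enumerate_all(rest, e, bn)
    # Hidden variable: sum over both values.
    return sum((p_true if y else 1 - p_true) * enumerate_all(rest, {**e, Y: y}, bn)
               for y in (True, False))

# Hypothetical two-node net: Rain -> WetGrass.
sprinkler_net = [
    ("Rain", [], {(): 0.2}),
    ("WetGrass", ["Rain"], {(True,): 0.9, (False,): 0.1}),
]
```

With this net, P(Rain | WetGrass = true) = 0.2·0.9 / (0.2·0.9 + 0.8·0.1) ≈ 0.692.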

function PRIOR-SAMPLE(bn) returns an event sampled from the prior specified by bn
  inputs: bn, a Bayesian network specifying joint distribution P(X1, ..., Xn)

  x ← an event with n elements
  foreach variable Xi in X1, ..., Xn do
    x[i] ← a random sample from P(Xi | parents(Xi))
  return x

Figure 14.12  A sampling algorithm that generates events from a Bayesian network. Each variable is sampled according to the conditional distribution given the values already sampled for the variable's parents.

function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X|e)
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network
          N, the total number of samples to be generated
  local variables: N, a vector of counts for each value of X, initially zero

  for j = 1 to N do
    x ← PRIOR-SAMPLE(bn)
    if x is consistent with e then
      N[x] ← N[x] + 1 where x is the value of X in x
  return NORMALIZE(N)

Figure 14.13  The rejection-sampling algorithm for answering queries given evidence in a Bayesian network.
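A direct Python sketch of both procedures, under our own assumptions: Boolean variables, the same (variable, parents, CPT) network encoding used above, and a toy Rain/WetGrass net for illustration.

```python
import random

def prior_sample(bn, rng):
    """Sample one complete event from a Boolean network given in topological order."""
    x = {}
    for var, parents, cpt in bn:
        p = cpt[tuple(x[pa] for pa in parents)]
        x[var] = rng.random() < p
    return x

def rejection_sampling(X, e, bn, n, rng):
    """Estimate P(X | e) by discarding samples inconsistent with the evidence."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        x = prior_sample(bn, rng)
        if all(x[var] == val for var, val in e.items()):
            counts[x[X]] += 1
    total = counts[True] + counts[False]
    return {k: v / total for k, v in counts.items()}

# Hypothetical net: Rain -> WetGrass.
net = [
    ("Rain", [], {(): 0.2}),
    ("WetGrass", ["Rain"], {(True,): 0.9, (False,): 0.1}),
]
```

The estimate converges to the exact posterior (≈ 0.692 for P(Rain | WetGrass = true)), but note how wasteful it is: only about 26% of samples are consistent with the evidence here, and the acceptance rate collapses as evidence gets more specific.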

function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X|e)
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network specifying joint distribution P(X1, ..., Xn)
          N, the total number of samples to be generated
  local variables: W, a vector of weighted counts for each value of X, initially zero

  for j = 1 to N do
    x, w ← WEIGHTED-SAMPLE(bn, e)
    W[x] ← W[x] + w where x is the value of X in x
  return NORMALIZE(W)

function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
  w ← 1; x ← an event with n elements initialized from e
  foreach variable Xi in X1, ..., Xn do
    if Xi is an evidence variable with value xi in e
      then w ← w × P(Xi = xi | parents(Xi))
      else x[i] ← a random sample from P(Xi | parents(Xi))
  return x, w

Figure 14.14  The likelihood-weighting algorithm for inference in Bayesian networks. In WEIGHTED-SAMPLE, each nonevidence variable is sampled according to the conditional distribution given the values already sampled for the variable's parents, while a weight is accumulated based on the likelihood for each evidence variable.

function GIBBS-ASK(X, e, bn, N) returns an estimate of P(X|e)
  local variables: N, a vector of counts for each value of X, initially zero
                   Z, the nonevidence variables in bn
                   x, the current state of the network, initially copied from e

  initialize x with random values for the variables in Z
  for j = 1 to N do
    for each Zi in Z do
      set the value of Zi in x by sampling from P(Zi|mb(Zi))
      N[x] ← N[x] + 1 where x is the value of X in x
  return NORMALIZE(N)

Figure 14.15  The Gibbs sampling algorithm for approximate inference in Bayesian networks; this version cycles through the variables, but choosing variables at random also works.
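Likelihood weighting avoids rejection by fixing the evidence variables and weighting each sample. A minimal Python sketch, again using our own Boolean-network encoding and toy Rain/WetGrass net rather than anything from the text:

```python
import random

def weighted_sample(bn, e, rng):
    """Return an event consistent with evidence e and its likelihood weight."""
    w, x = 1.0, {}
    for var, parents, cpt in bn:
        p = cpt[tuple(x[pa] for pa in parents)]
        if var in e:
            x[var] = e[var]
            w *= p if e[var] else 1 - p   # weight by evidence likelihood
        else:
            x[var] = rng.random() < p     # sample nonevidence variables
    return x, w

def likelihood_weighting(X, e, bn, n, rng):
    W = {True: 0.0, False: 0.0}
    for _ in range(n):
        x, w = weighted_sample(bn, e, rng)
        W[x[X]] += w
    total = W[True] + W[False]
    return {k: v / total for k, v in W.items()}

net = [
    ("Rain", [], {(): 0.2}),
    ("WetGrass", ["Rain"], {(True,): 0.9, (False,): 0.1}),
]
```

Every sample contributes here, unlike rejection sampling, so the estimator typically has lower variance for the same number of samples.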

15  PROBABILISTIC REASONING OVER TIME

function FORWARD-BACKWARD(ev, prior) returns a vector of probability distributions
  inputs: ev, a vector of evidence values for steps 1, ..., t
          prior, the prior distribution on the initial state, P(X0)
  local variables: fv, a vector of forward messages for steps 0, ..., t
                   b, a representation of the backward message, initially all 1s
                   sv, a vector of smoothed estimates for steps 1, ..., t

  fv[0] ← prior
  for i = 1 to t do
    fv[i] ← FORWARD(fv[i − 1], ev[i])
  for i = t downto 1 do
    sv[i] ← NORMALIZE(fv[i] × b)
    b ← BACKWARD(b, ev[i])
  return sv

Figure 15.4  The forward–backward algorithm for smoothing: computing posterior probabilities of a sequence of states given a sequence of observations. The FORWARD and BACKWARD operators are defined by Equations (??) and (??), respectively.
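As a concrete instance, here is the forward–backward recursion in Python on the standard two-state umbrella HMM (state 0 = rain, state 1 = no rain; the transition and sensor numbers are the usual textbook values, chosen here for illustration):

```python
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def forward(T, O, f, ev):
    # One filtering step: f'(i) ∝ P(ev | i) * Σ_j T[j][i] f(j)
    n = len(f)
    return normalize([O[ev][i] * sum(T[j][i] * f[j] for j in range(n))
                      for i in range(n)])

def backward(T, O, b, ev):
    # One backward step: b'(i) = Σ_j T[i][j] P(ev | j) b(j)
    n = len(b)
    return [sum(T[i][j] * O[ev][j] * b[j] for j in range(n)) for i in range(n)]

def forward_backward(ev, prior, T, O):
    t = len(ev)
    fv = [prior]
    for i in range(t):
        fv.append(forward(T, O, fv[-1], ev[i]))
    b = [1.0] * len(prior)
    sv = [None] * t
    for i in range(t - 1, -1, -1):
        sv[i] = normalize([fv[i + 1][j] * b[j] for j in range(len(b))])
        b = backward(T, O, b, ev[i])
    return sv

# Umbrella world: T[i][j] = P(X_t = j | X_{t-1} = i); O[e] = P(e | state).
T = [[0.7, 0.3], [0.3, 0.7]]
O = {True: [0.9, 0.2], False: [0.1, 0.8]}
```

With two umbrella observations, smoothing raises the day-1 rain estimate from the filtered 0.818 to about 0.883, because the second observation also supports rain.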

function FIXED-LAG-SMOOTHING(et, hmm, d) returns a distribution over Xt−d
  inputs: et, the current evidence for time step t
          hmm, a hidden Markov model with S × S transition matrix T
          d, the length of the lag for smoothing
  persistent: t, the current time, initially 1
              f, the forward message P(Xt|e1:t), initially hmm.PRIOR
              B, the d-step backward transformation matrix, initially the identity matrix
              et−d:t, double-ended list of evidence from t − d to t, initially empty
  local variables: Ot−d, Ot, diagonal matrices containing the sensor model information

  add et to the end of et−d:t
  Ot ← diagonal matrix containing P(et|Xt)
  if t > d then
    f ← FORWARD(f, et)
    remove et−d−1 from the beginning of et−d:t
    Ot−d ← diagonal matrix containing P(et−d|Xt−d)
    B ← O⁻¹t−d T⁻¹ B T Ot
  else B ← B T Ot
  t ← t + 1
  if t > d then return NORMALIZE(f × B1) else return null

Figure 15.6  An algorithm for smoothing with a fixed time lag of d steps, implemented as an online algorithm that outputs the new smoothed estimate given the observation for a new time step. Notice that the final output NORMALIZE(f × B1) is just α f × b, by Equation (??).

function PARTICLE-FILTERING(e, N, dbn) returns a set of samples for the next time step
  inputs: e, the new incoming evidence
          N, the number of samples to be maintained
          dbn, a DBN with prior P(X0), transition model P(X1|X0), sensor model P(E1|X1)
  persistent: S, a vector of samples of size N, initially generated from P(X0)
  local variables: W, a vector of weights of size N

  for i = 1 to N do
    S[i] ← sample from P(X1 | X0 = S[i]) /* step 1 */
    W[i] ← P(e | X1 = S[i]) /* step 2 */
  S ← WEIGHTED-SAMPLE-WITH-REPLACEMENT(N, S, W) /* step 3 */
  return S

Figure 15.17  The particle filtering algorithm implemented as a recursive update operation with state (the set of samples). Each of the sampling operations involves sampling the relevant slice variables in topological order, much as in PRIOR-SAMPLE. The WEIGHTED-SAMPLE-WITH-REPLACEMENT operation can be implemented to run in O(N) expected time. The step numbers refer to the description in the text.
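The three steps of the particle-filter update translate almost line for line into Python. This is a sketch under our own assumptions: the caller supplies `transition` and `likelihood` functions, weights are assumed not all zero, and the umbrella-world model used below is an illustrative toy.

```python
import random

def particle_filter(S, ev, transition, likelihood, rng):
    """One particle-filter update: propagate, weight, resample."""
    N = len(S)
    S = [transition(s, rng) for s in S]        # step 1: sample next state
    W = [likelihood(ev, s) for s in S]         # step 2: weight by evidence
    S = rng.choices(S, weights=W, k=N)         # step 3: resample w/ replacement
    return S

# Toy umbrella-world model: state is a bool (raining or not).
def transition(rain, rng):
    return rng.random() < (0.7 if rain else 0.3)

def likelihood(umbrella, rain):
    if umbrella:
        return 0.9 if rain else 0.2
    return 0.1 if rain else 0.8
```

`random.choices` performs the weighted sampling with replacement; with enough particles, the fraction of "rain" particles after two umbrella observations approaches the exact filtered probability (≈ 0.883).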

16  MAKING SIMPLE DECISIONS

function INFORMATION-GATHERING-AGENT(percept) returns an action
  persistent: D, a decision network

  integrate percept into D
  j ← the value that maximizes VPI(Ej) / Cost(Ej)
  if VPI(Ej) > Cost(Ej)
    then return REQUEST(Ej)
    else return the best action from D

Figure 16.9  Design of a simple information-gathering agent. The agent works by repeatedly selecting the observation with the highest information value, until the cost of the next observation is greater than its expected benefit.

17  MAKING COMPLEX DECISIONS

function VALUE-ITERATION(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a),
              rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U′, vectors of utilities for states in S, initially zero
                   δ, the maximum change in the utility of any state in an iteration

  repeat
    U ← U′; δ ← 0
    for each state s in S do
      U′[s] ← R(s) + γ max(a ∈ A(s)) Σs′ P(s′ | s, a) U[s′]
      if |U′[s] − U[s]| > δ then δ ← |U′[s] − U[s]|
  until δ < ε(1 − γ)/γ
  return U

Figure 17.4  The value iteration algorithm for calculating utilities of states. The termination condition is from Equation (??).
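The algorithm is short enough to run directly. The Python sketch below uses our own encoding (transitions as lists of (probability, next-state) pairs) and a hypothetical two-state MDP, not an example from the text: action 'b' moves from state 0 into the absorbing, rewarding state 1, so the exact utilities with γ = 0.9 are U(1) = 1/(1 − γ) = 10 and U(0) = γ·U(1) = 9.

```python
def value_iteration(states, A, P, R, gamma, eps):
    """Value iteration. P[s][a] is a list of (prob, next_state) pairs;
    R maps each state to its reward; A(s) gives the actions available in s."""
    U1 = {s: 0.0 for s in states}
    while True:
        U = U1.copy()
        delta = 0.0
        for s in states:
            U1[s] = R[s] + gamma * max(
                sum(p * U[s2] for p, s2 in P[s][a]) for a in A(s))
            delta = max(delta, abs(U1[s] - U[s]))
        if delta < eps * (1 - gamma) / gamma:   # termination condition
            return U1

# Hypothetical two-state MDP: 'a' stays put, 'b' moves 0 -> 1; 1 is absorbing.
states = [0, 1]

def A(s):
    return ['a', 'b']

P = {0: {'a': [(1.0, 0)], 'b': [(1.0, 1)]},
     1: {'a': [(1.0, 1)], 'b': [(1.0, 1)]}}
R = {0: 0.0, 1: 1.0}
```

Running it with ε = 1e-6 recovers U(0) ≈ 9 and U(1) ≈ 10 to within the error bound.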

function POLICY-ITERATION(mdp) returns a policy
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a)
  local variables: U, a vector of utilities for states in S, initially zero
                   π, a policy vector indexed by state, initially random

  repeat
    U ← POLICY-EVALUATION(π, U, mdp)
    unchanged? ← true
    for each state s in S do
      if max(a ∈ A(s)) Σs′ P(s′ | s, a) U[s′] > Σs′ P(s′ | s, π[s]) U[s′] then do
        π[s] ← argmax(a ∈ A(s)) Σs′ P(s′ | s, a) U[s′]
        unchanged? ← false
  until unchanged?
  return π

Figure 17.7  The policy iteration algorithm for calculating an optimal policy.

function POMDP-VALUE-ITERATION(pomdp, ε) returns a utility function
  inputs: pomdp, a POMDP with states S, actions A(s), transition model P(s′ | s, a),
              sensor model P(e | s), rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U′, sets of plans p with associated utility vectors αp

  U′ ← a set containing just the empty plan [ ], with α[ ](s) = R(s)
  repeat
    U ← U′
    U′ ← the set of all plans consisting of an action and, for each possible next
         percept, a plan in U with utility vectors computed according to Equation (??)
    U′ ← REMOVE-DOMINATED-PLANS(U′)
  until MAX-DIFFERENCE(U, U′) < ε(1 − γ)/γ
  return U

Figure 17.9  A high-level sketch of the value iteration algorithm for POMDPs. The REMOVE-DOMINATED-PLANS step and MAX-DIFFERENCE test are typically implemented as linear programs.

18  LEARNING FROM EXAMPLES

function DECISION-TREE-LEARNING(examples, attributes, parent_examples) returns a tree
  if examples is empty then return PLURALITY-VALUE(parent_examples)
  else if all examples have the same classification then return the classification
  else if attributes is empty then return PLURALITY-VALUE(examples)
  else
    A ← argmax(a ∈ attributes) IMPORTANCE(a, examples)
    tree ← a new decision tree with root test A
    for each value vk of A do
      exs ← {e : e ∈ examples and e.A = vk}
      subtree ← DECISION-TREE-LEARNING(exs, attributes − A, examples)
      add a branch to tree with label (A = vk) and subtree subtree
    return tree

Figure 18.4  The decision-tree learning algorithm. The function IMPORTANCE is described in Section ??. The function PLURALITY-VALUE selects the most common output value among a set of examples, breaking ties randomly.
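A compact Python sketch of the same recursion, using information gain as the IMPORTANCE function. The representation (examples as dicts with a "label" key, trees as dicts keyed by (attribute, value) pairs) and the tiny weather dataset are our own illustrative choices.

```python
import math
from collections import Counter

def plurality_value(examples):
    return Counter(e["label"] for e in examples).most_common(1)[0][0]

def entropy(examples):
    counts = Counter(e["label"] for e in examples)
    n = len(examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def importance(a, examples):
    """Information gain of splitting on attribute a."""
    remainder = sum(
        len(exs) / len(examples) * entropy(exs)
        for v in {e[a] for e in examples}
        for exs in [[e for e in examples if e[a] == v]])
    return entropy(examples) - remainder

def decision_tree_learning(examples, attributes, parent_examples=()):
    if not examples:
        return plurality_value(parent_examples)
    if len({e["label"] for e in examples}) == 1:
        return examples[0]["label"]
    if not attributes:
        return plurality_value(examples)
    A = max(attributes, key=lambda a: importance(a, examples))
    return {(A, v): decision_tree_learning(
                [e for e in examples if e[A] == v],
                [a for a in attributes if a != A], examples)
            for v in {e[A] for e in examples}}

def classify(tree, example):
    if not isinstance(tree, dict):
        return tree           # leaf: a class label
    A = next(iter(tree))[0]
    return classify(tree[(A, example[A])], example)
```

On a toy dataset where "raining" fully determines the label and "day" is noise, the learner correctly splits on "raining" and ignores "day".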

function CROSS-VALIDATION-WRAPPER(Learner, k, examples) returns a hypothesis
  local variables: errT, an array, indexed by size, storing training-set error rates
                   errV, an array, indexed by size, storing validation-set error rates

  for size = 1 to ∞ do
    errT[size], errV[size] ← CROSS-VALIDATION(Learner, size, k, examples)
    if errT has converged then do
      best_size ← the value of size with minimum errV[size]
      return Learner(best_size, examples)

function CROSS-VALIDATION(Learner, size, k, examples) returns two values:
        average training set error rate, average validation set error rate

  fold_errT ← 0; fold_errV ← 0
  for fold = 1 to k do
    training_set, validation_set ← PARTITION(examples, fold, k)
    h ← Learner(size, training_set)
    fold_errT ← fold_errT + ERROR-RATE(h, training_set)
    fold_errV ← fold_errV + ERROR-RATE(h, validation_set)
  return fold_errT/k, fold_errV/k

Figure 18.7  An algorithm to select the model that has the lowest error rate on validation data by building models of increasing complexity, and choosing the one with the best empirical error rate on validation data. Here errT means error rate on the training data, and errV means error rate on the validation data. Learner(size, examples) returns a hypothesis whose complexity is set by the parameter size, and which is trained on the examples. PARTITION(examples, fold, k) splits examples into two subsets: a validation set of size N/k and a training set with all the other examples. The split is different for each value of fold.

function DECISION-LIST-LEARNING(examples) returns a decision list, or failure
  if examples is empty then return the trivial decision list No
  t ← a test that matches a nonempty subset examples_t of examples
      such that the members of examples_t are all positive or all negative
  if there is no such t then return failure
  if the examples in examples_t are positive then o ← Yes else o ← No
  return a decision list with initial test t and outcome o and remaining tests
      given by DECISION-LIST-LEARNING(examples − examples_t)

Figure 18.10  An algorithm for learning decision lists.

function BACK-PROP-LEARNING(examples, network) returns a neural network
  inputs: examples, a set of examples, each with input vector x and output vector y
          network, a multilayer network with L layers, weights wi,j, activation function g
  local variables: Δ, a vector of errors, indexed by network node

  for each weight wi,j in network do
    wi,j ← a small random number
  repeat
    for each example (x, y) in examples do
      /* Propagate the inputs forward to compute the outputs */
      for each node i in the input layer do
        ai ← xi
      for ℓ = 2 to L do
        for each node j in layer ℓ do
          inj ← Σi wi,j ai
          aj ← g(inj)
      /* Propagate deltas backward from output layer to input layer */
      for each node j in the output layer do
        Δ[j] ← g′(inj) × (yj − aj)
      for ℓ = L − 1 downto 1 do
        for each node i in layer ℓ do
          Δ[i] ← g′(ini) Σj wi,j Δ[j]
      /* Update every weight in network using deltas */
      for each weight wi,j in network do
        wi,j ← wi,j + α × ai × Δ[j]
  until some stopping criterion is satisfied
  return network

Figure 18.23  The back-propagation algorithm for learning in multilayer networks. (The weights are initialized once, before the training loop; re-randomizing them on every pass would discard all learning.)

function ADABOOST(examples, L, K) returns a weighted-majority hypothesis
  inputs: examples, set of N labeled examples (x1, y1), ..., (xN, yN)
          L, a learning algorithm
          K, the number of hypotheses in the ensemble
  local variables: w, a vector of N example weights, initially 1/N
                   h, a vector of K hypotheses
                   z, a vector of K hypothesis weights

  for k = 1 to K do
    h[k] ← L(examples, w)
    error ← 0
    for j = 1 to N do
      if h[k](xj) ≠ yj then error ← error + w[j]
    for j = 1 to N do
      if h[k](xj) = yj then w[j] ← w[j] · error/(1 − error)
    w ← NORMALIZE(w)
    z[k] ← log (1 − error)/error
  return WEIGHTED-MAJORITY(h, z)

Figure 18.33  The ADABOOST variant of the boosting method for ensemble learning. The algorithm generates hypotheses by successively reweighting the training examples. The function WEIGHTED-MAJORITY generates a hypothesis that returns the output value with the highest vote from the hypotheses in h, with votes weighted by z.

19  KNOWLEDGE IN LEARNING

function CURRENT-BEST-LEARNING(examples, h) returns a hypothesis or fail
  if examples is empty then return h
  e ← FIRST(examples)
  if e is consistent with h then
    return CURRENT-BEST-LEARNING(REST(examples), h)
  else if e is a false positive for h then
    for each h′ in specializations of h consistent with examples seen so far do
      h′′ ← CURRENT-BEST-LEARNING(REST(examples), h′)
      if h′′ ≠ fail then return h′′
  else if e is a false negative for h then
    for each h′ in generalizations of h consistent with examples seen so far do
      h′′ ← CURRENT-BEST-LEARNING(REST(examples), h′)
      if h′′ ≠ fail then return h′′
  return fail

Figure 19.2  The current-best-hypothesis learning algorithm. It searches for a consistent hypothesis that fits all the examples and backtracks when no consistent specialization/generalization can be found. To start the algorithm, any hypothesis can be passed in; it will be specialized or generalized as needed.

function VERSION-SPACE-LEARNING(examples) returns a version space
  local variables: V, the version space: the set of all hypotheses

  V ← the set of all hypotheses
  for each example e in examples do
    if V is not empty then V ← VERSION-SPACE-UPDATE(V, e)
  return V

function VERSION-SPACE-UPDATE(V, e) returns an updated version space
  V ← {h ∈ V : h is consistent with e}

Figure 19.3  The version space learning algorithm. It finds a subset of V that is consistent with all the examples.

function MINIMAL-CONSISTENT-DET(E, A) returns a set of attributes
  inputs: E, a set of examples
          A, a set of attributes, of size n

  for i = 0 to n do
    for each subset Ai of A of size i do
      if CONSISTENT-DET?(Ai, E) then return Ai

function CONSISTENT-DET?(A, E) returns a truth value
  inputs: A, a set of attributes
          E, a set of examples
  local variables: H, a hash table

  for each example e in E do
    if some example in H has the same values as e for the attributes A
        but a different classification then return false
    store the class of e in H, indexed by the values for attributes A of the example e
  return true

Figure 19.8  An algorithm for finding a minimal consistent determination.

function FOIL(examples, target) returns a set of Horn clauses
  inputs: examples, set of examples
          target, a literal for the goal predicate
  local variables: clauses, set of clauses, initially empty

  while examples contains positive examples do
    clause ← NEW-CLAUSE(examples, target)
    remove positive examples covered by clause from examples
    add clause to clauses
  return clauses

function NEW-CLAUSE(examples, target) returns a Horn clause
  local variables: clause, a clause with target as head and an empty body
                   l, a literal to be added to the clause
                   extended_examples, a set of examples with values for new variables

  extended_examples ← examples
  while extended_examples contains negative examples do
    l ← CHOOSE-LITERAL(NEW-LITERALS(clause), extended_examples)
    append l to the body of clause
    extended_examples ← set of examples created by applying EXTEND-EXAMPLE
        to each example in extended_examples
  return clause

function EXTEND-EXAMPLE(example, literal) returns a set of examples
  if example satisfies literal
    then return the set of examples created by extending example with each
        possible constant value for each new variable in literal
    else return the empty set

Figure 19.12  Sketch of the FOIL algorithm for learning sets of first-order Horn clauses from examples. NEW-LITERALS and CHOOSE-LITERAL are explained in the text.

20  LEARNING PROBABILISTIC MODELS

21  REINFORCEMENT LEARNING

function PASSIVE-ADP-AGENT(percept) returns an action
  inputs: percept, a percept indicating the current state s′ and reward signal r′
  persistent: π, a fixed policy
              mdp, an MDP with model P, rewards R, discount γ
              U, a table of utilities, initially empty
              Nsa, a table of frequencies for state–action pairs, initially zero
              Ns′|sa, a table of outcome frequencies given state–action pairs, initially zero
              s, a, the previous state and action, initially null

  if s′ is new then U[s′] ← r′; R[s′] ← r′
  if s is not null then
    increment Nsa[s, a] and Ns′|sa[s′, s, a]
    for each t such that Ns′|sa[t, s, a] is nonzero do
      P(t | s, a) ← Ns′|sa[t, s, a] / Nsa[s, a]
  U ← POLICY-EVALUATION(π, U, mdp)
  if s′.TERMINAL? then s, a ← null else s, a ← s′, π[s′]
  return a

Figure 21.2  A passive reinforcement learning agent based on adaptive dynamic programming. The POLICY-EVALUATION function solves the fixed-policy Bellman equations, as described on page ??.

function PASSIVE-TD-AGENT(percept) returns an action
  inputs: percept, a percept indicating the current state s′ and reward signal r′
  persistent: π, a fixed policy
              U, a table of utilities, initially empty
              Ns, a table of frequencies for states, initially zero
              s, a, r, the previous state, action, and reward, initially null

  if s′ is new then U[s′] ← r′
  if s is not null then
    increment Ns[s]
    U[s] ← U[s] + α(Ns[s])(r + γ U[s′] − U[s])
  if s′.TERMINAL? then s, a, r ← null else s, a, r ← s′, π[s′], r′
  return a

Figure 21.4  A passive reinforcement learning agent that learns utility estimates using temporal differences. The step-size function α(n) is chosen to ensure convergence, as described in the text.

function Q-LEARNING-AGENT(percept) returns an action
  inputs: percept, a percept indicating the current state s′ and reward signal r′
  persistent: Q, a table of action values indexed by state and action, initially zero
              Nsa, a table of frequencies for state–action pairs, initially zero
              s, a, r, the previous state, action, and reward, initially null

  if TERMINAL?(s) then Q[s, None] ← r′
  if s is not null then
    increment Nsa[s, a]
    Q[s, a] ← Q[s, a] + α(Nsa[s, a])(r + γ maxa′ Q[s′, a′] − Q[s, a])
  s, a, r ← s′, argmaxa′ f(Q[s′, a′], Nsa[s′, a′]), r′
  return a

Figure 21.8  An exploratory Q-learning agent. It is an active learner that learns the value Q(s, a) of each action in each situation. It uses the same exploration function f as the exploratory ADP agent, but avoids having to learn the transition model because the Q-value of a state can be related directly to those of its neighbors.
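The Q-learning update can be sketched in Python. This is not the agent above verbatim: we fold the percept-at-a-time loop into an episodic driver, use a fixed step size and plain ε-greedy exploration in place of the exploration function f, and test it on a hypothetical three-state corridor of our own devising.

```python
import random

def q_learning(episodes, actions, step, start,
               alpha=0.5, gamma=0.9, epsilon=0.1, rng=None):
    """Tabular Q-learning with epsilon-greedy exploration.
    step(s, a) must return (next_state, reward, done)."""
    rng = rng or random.Random(0)
    Q = {}
    for _ in range(episodes):
        s, done = start, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(actions)                      # explore
            else:
                a = max(actions, key=lambda b: Q.get((s, b), 0.0))  # exploit
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(Q.get((s2, b), 0.0) for b in actions)
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + gamma * best_next - q)
            s = s2
    return Q

def step(s, a):
    # Toy deterministic corridor: states 0, 1, 2; reaching 2 yields reward 1.
    s2 = min(s + 1, 2) if a == "right" else max(s - 1, 0)
    if s2 == 2:
        return s2, 1.0, True
    return s2, 0.0, False
```

In this environment the optimal Q-values are Q(1, right) = 1 and Q(0, right) = γ·1 = 0.9, which the table converges to after a few hundred episodes.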

22  NATURAL LANGUAGE PROCESSING

function HITS(query) returns pages with hub and authority numbers
  pages ← EXPAND-PAGES(RELEVANT-PAGES(query))
  for each p in pages do
    p.AUTHORITY ← 1
    p.HUB ← 1
  repeat until convergence do
    for each p in pages do
      p.AUTHORITY ← Σi INLINKi(p).HUB
      p.HUB ← Σi OUTLINKi(p).AUTHORITY
    NORMALIZE(pages)
  return pages

Figure 22.1  The HITS algorithm for computing hubs and authorities with respect to a query. RELEVANT-PAGES fetches the pages that match the query, and EXPAND-PAGES adds in every page that links to or is linked from one of the relevant pages. NORMALIZE divides each page's score by the sum of the squares of all pages' scores (separately for both the authority and hub scores).
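The iteration itself is easy to run on an explicit link graph. The sketch below skips the query-dependent steps (RELEVANT-PAGES, EXPAND-PAGES) and assumes the caller supplies the graph directly as a dict from page name to outgoing links; it also assumes at least one page has a nonzero score, so normalization never divides by zero. It updates all authorities from the previous hubs, then all hubs from the new authorities, a common variant of the in-place update in the figure.

```python
def normalize_scores(pages, attr):
    # Divide by the Euclidean norm of the scores (sum-of-squares normalization).
    s = sum(p[attr] ** 2 for p in pages.values()) ** 0.5
    for p in pages.values():
        p[attr] /= s

def hits(links, iterations=50):
    """links maps page -> list of pages it links to."""
    pages = {name: {"hub": 1.0, "auth": 1.0} for name in links}
    for _ in range(iterations):
        for name, p in pages.items():
            # Authority: sum of hub scores of pages linking in.
            p["auth"] = sum(pages[q]["hub"] for q in links if name in links[q])
        for name, p in pages.items():
            # Hub: sum of authority scores of pages linked to.
            p["hub"] = sum(pages[q]["auth"] for q in links[name])
        normalize_scores(pages, "auth")
        normalize_scores(pages, "hub")
    return pages
```

On a three-page graph where a and b both link to c, the iteration converges immediately: c is the sole authority, and a and b are equal hubs.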

23  NATURAL LANGUAGE FOR COMMUNICATION

function CYK-PARSE(words, grammar) returns P, a table of probabilities
  N ← LENGTH(words)
  M ← the number of nonterminal symbols in grammar
  P ← an array of size [M, N, N], initially all 0
  /* Insert lexical rules for each word */
  for i = 1 to N do
    for each rule of form (X → wordsi [p]) do
      P[X, i, 1] ← p
  /* Combine first and second parts of right-hand sides of rules, from short to long */
  for length = 2 to N do
    for start = 1 to N − length + 1 do
      for len1 = 1 to length − 1 do
        len2 ← length − len1
        for each rule of the form (X → Y Z [p]) do
          P[X, start, length] ← MAX(P[X, start, length],
                                    P[Y, start, len1] × P[Z, start + len1, len2] × p)
  return P

Figure 23.4  The CYK algorithm for parsing. Given a sequence of words, it finds the most probable derivation for the whole sequence and for each subsequence. It returns the whole table, in which an entry P[X, start, len] is the probability of the most probable X of length len starting at position start. If there is no X of that size at that location, the probability is 0.
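A direct Python transcription, keeping the figure's 1-based positions. The sparse table representation (a dict keyed by (X, start, length)) and the two-word toy grammar are our own; the grammar is assumed to be in Chomsky normal form, as CYK requires.

```python
def cyk_parse(words, lexicon, grammar):
    """Probabilistic CYK. lexicon: word -> list of (X, p) lexical rules;
    grammar: list of (X, Y, Z, p) binary rules X -> Y Z with probability p.
    Returns a dict P[(X, start, length)] = probability of the best X
    spanning `length` words beginning at 1-based position `start`."""
    N = len(words)
    P = {}
    # Insert lexical rules for each word.
    for i, word in enumerate(words, start=1):
        for X, p in lexicon.get(word, []):
            P[(X, i, 1)] = max(P.get((X, i, 1), 0.0), p)
    # Combine subspans, from short to long.
    for length in range(2, N + 1):
        for start in range(1, N - length + 2):
            for len1 in range(1, length):
                len2 = length - len1
                for X, Y, Z, p in grammar:
                    cand = (P.get((Y, start, len1), 0.0)
                            * P.get((Z, start + len1, len2), 0.0) * p)
                    if cand > P.get((X, start, length), 0.0):
                        P[(X, start, length)] = cand
    return P
```

With lexical probabilities 0.5 and 0.4 and a rule probability 0.9, the best S spanning both words has probability 0.5 × 0.4 × 0.9 = 0.18.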

[ [S [NP-SBJ-2 Her eyes] [VP were [VP glazed [NP *-2] [SBAR-ADV as if [S [NP-SBJ she] [VP did n't [VP [VP hear [NP *-1]] or [VP [ADVP even] see [NP *-1]] [NP-1 him]]]]]]]] .]

Figure 23.5  Annotated tree for the sentence "Her eyes were glazed as if she didn't hear or even see him," from the Penn Treebank. Note that in this grammar there is a distinction between an object noun phrase (NP) and a subject noun phrase (NP-SBJ). Note also a grammatical phenomenon we have not covered yet: the movement of a phrase from one part of the tree to another. This tree analyzes the phrase "hear or even see him" as consisting of two constituent VPs, [VP hear [NP *-1]] and [VP [ADVP even] see [NP *-1]], both of which have a missing object, denoted *-1, which refers to the NP labeled elsewhere in the tree as [NP-1 him].

24  PERCEPTION

25  ROBOTICS

function MONTE-CARLO-LOCALIZATION(a, z, N, P(X′|X, v, ω), P(z|z∗), m) returns
        a set of samples for the next time step
  inputs: a, robot velocities v and ω
          z, range scan z1, ..., zM
          P(X′|X, v, ω), motion model
          P(z|z∗), range sensor noise model
          m, 2D map of the environment
  persistent: S, a vector of samples of size N
  local variables: W, a vector of weights of size N
                   S′, a temporary vector of particles of size N
                   W′, a vector of weights of size N

  if S is empty then /* initialization phase */
    for i = 1 to N do
      S[i] ← sample from P(X0)
  for i = 1 to N do /* update cycle */
    S′[i] ← sample from P(X′|X = S[i], v, ω)
    W′[i] ← 1
    for j = 1 to M do
      z∗ ← RAYCAST(j, X = S′[i], m)
      W′[i] ← W′[i] · P(zj | z∗)
  S ← WEIGHTED-SAMPLE-WITH-REPLACEMENT(N, S′, W′)
  return S

Figure 25.9  A Monte Carlo localization algorithm using a range-scan sensor model with independent noise.

26  PHILOSOPHICAL FOUNDATIONS

27  AI: THE PRESENT AND FUTURE

28  MATHEMATICAL BACKGROUND

29  NOTES ON LANGUAGES AND ALGORITHMS

generator POWERS-OF-2() yields ints
  i ← 1
  while true do
    yield i
    i ← 2 × i

for p in POWERS-OF-2() do
  PRINT(p)

Figure 29.1  Example of a generator function and its invocation within a loop.
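This pseudocode generator translates almost verbatim into a Python generator function; the only change below is taking a finite prefix with `next` instead of looping forever.

```python
def powers_of_2():
    """Yield 1, 2, 4, 8, ... indefinitely."""
    i = 1
    while True:
        yield i
        i = 2 * i

# Take the first few values rather than iterating forever.
gen = powers_of_2()
first_five = [next(gen) for _ in range(5)]
```

The generator suspends at each `yield` and resumes where it left off, so the infinite loop is harmless as long as the consumer stops asking for values.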