
**Utility-Based Agent**

[Figure: the utility-based agent architecture; the agent perceives the environment through sensors, evaluates how happy it will be in the resulting state, and acts on the environment through actuators.]

**Non-deterministic vs. Probabilistic Uncertainty**

- Non-deterministic model: an action's outcome is one of the set {a, b, c}; the agent makes the decision that is best for the worst case (~ adversarial search).
- Probabilistic model: an action's outcome is drawn from {a (p_a), b (p_b), c (p_c)}; the agent makes the decision that maximizes the expected utility value.

**Expected Utility**

- Random variable X with n values x_1, …, x_n and distribution (p_1, …, p_n). E.g.: X is the state reached after doing an action A under uncertainty.
- Function U of X. E.g.: U is the utility of a state.
- The expected utility of A is EU[A] = Σ_{i=1..n} p(x_i | A) U(x_i).

**One State / One Action Example**

- From s0, action A1 leads to s1 with probability 0.2 (U = 100), to s2 with probability 0.7 (U = 50), and to s3 with probability 0.1 (U = 70).
- EU(A1) = 100 × 0.2 + 50 × 0.7 + 70 × 0.1 = 20 + 35 + 7 = 62
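As a quick check of this arithmetic, here is a minimal Python sketch (illustrative code, not part of the original slides):

```python
# Expected utility of an action: EU[A] = sum_i p(x_i | A) * U(x_i)
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs, one per reachable state."""
    return sum(p * u for p, u in outcomes)

# Action A1 from s0: s1 (p=0.2, U=100), s2 (p=0.7, U=50), s3 (p=0.1, U=70)
print(expected_utility([(0.2, 100), (0.7, 50), (0.1, 70)]))  # ~62
```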

**One State / Two Actions Example**

- A1 as before: EU(A1) = 62.
- A2 leads to s2 with probability 0.2 (U = 50) and to s4 with probability 0.8 (U = 80): EU(A2) = 50 × 0.2 + 80 × 0.8 = 74.
- EU(s0) = max{EU(A1), EU(A2)} = 74

**Introducing Action Costs**

- Taking an action now costs something: A1 costs 5 and A2 costs 25.
- EU(A1) = 62 - 5 = 57
- EU(A2) = 74 - 25 = 49
- EU(s0) = max{EU(A1), EU(A2)} = 57
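The same idea extends to choosing among costly actions. The sketch below is again illustrative (the function name best_action is my own); it nets each action's cost out of its expected utility:

```python
# Choose the action with maximal expected utility net of its cost.
def best_action(actions):
    """actions: dict name -> (cost, [(probability, utility), ...])."""
    def net_eu(name):
        cost, outcomes = actions[name]
        return sum(p * u for p, u in outcomes) - cost
    best = max(actions, key=net_eu)
    return best, net_eu(best)

actions = {
    "A1": (5,  [(0.2, 100), (0.7, 50), (0.1, 70)]),  # EU = 62, net 57
    "A2": (25, [(0.2, 50),  (0.8, 80)]),             # EU = 74, net 49
}
print(best_action(actions))  # ('A1', ~57)
```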

**MEU Principle**

- A rational agent should choose the action that maximizes the agent's expected utility.
- This is the basis of the field of decision theory.
- It is a normative criterion for rational choice of action.

**Not Quite…**

- Must have a complete model of: actions, utilities, states.
- Even if you have a complete model, decision making will be computationally intractable.
- In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality).
- Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before.

**We'll Look At: Decision-Theoretic Planning**

- Simple decision making (ch. 16)
- Sequential decision making (ch. 17)

**Decision Networks**

- Extend BNs to handle actions and utilities
- Also called influence diagrams
- Make use of BN inference
- Can do Value of Information calculations

**Decision Networks (cont.)**

- Chance nodes: random variables, as in BNs
- Decision nodes: actions that the decision maker can take
- Utility/value nodes: the utility of the outcome state

**R&N Example**

**Prenatal Testing Example**

**Umbrella Network**

- Decision: take / don't take the umbrella
- P(rain) = 0.4
- Umbrella: P(umb | take) = 1.0, P(~umb | ~take) = 1.0
- Happiness: U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25

**Evaluating Decision Networks**

- Set the evidence variables for the current state.
- For each possible value of the decision node:
  - Set the decision node to that value.
  - Calculate the posterior probability of the parent nodes of the utility node, using BN inference.
  - Calculate the resulting utility for the action.
- Return the action with the highest utility.

**Umbrella Network: EU(take)**

Step #1: compute P(umb, rain | take) by BN inference. Step #2: compute EU(take).

| umb | rain | P(umb, rain \| take) | U(umb, rain) |
|-----|------|----------------------|--------------|
| 0 | 0 | 0 | 100 |
| 0 | 1 | 0 | -100 |
| 1 | 0 | 0.6 | 0 |
| 1 | 1 | 0.4 | -25 |

EU(take) = 0.6 × 0 + 0.4 × (-25) = -10

**Umbrella Network: EU(~take)**

Step #1: compute P(umb, rain | ~take). Step #2: compute EU(~take).

| umb | rain | P(umb, rain \| ~take) | U(umb, rain) |
|-----|------|-----------------------|--------------|
| 0 | 0 | 0.6 | 100 |
| 0 | 1 | 0.4 | -100 |
| 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | -25 |

EU(~take) = 0.6 × 100 + 0.4 × (-100) = 20, so without further evidence the best action is ~take, with EU = 20.
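The evaluation procedure above is easy to mirror in code. A minimal Python sketch (not from the slides; since the umbrella deterministically follows the decision, the BN inference step reduces to a weighted sum over rain):

```python
# Evaluate the umbrella network by enumerating the decision's values.
P_RAIN = 0.4
U = {(0, 0): 100, (0, 1): -100, (1, 0): 0, (1, 1): -25}  # U[(umb, rain)]

def eu(take, p_rain=P_RAIN):
    """P(umb|take) = P(~umb|~take) = 1, so umb is determined by the decision."""
    umb = 1 if take else 0
    return (1 - p_rain) * U[(umb, 0)] + p_rain * U[(umb, 1)]

for take in (True, False):
    print(take, eu(take))  # take: EU ~ -10; don't take: EU ~ 20
```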

**Value of Information (VOI)**

Suppose the agent's current knowledge is E. The value of the current best action α is

  EU(α | E) = max_A Σ_i U(Result_i(A)) P(Result_i(A) | E, Do(A))

The value of the new best action (after new evidence E' is obtained) is

  EU(α' | E, E') = max_A Σ_i U(Result_i(A)) P(Result_i(A) | E, E', Do(A))

The value of information for E' is

  VOI(E') = Σ_k P(e_k | E) EU(α_{e_k} | E, e_k) - EU(α | E)

**Umbrella Network with a Forecast**

The forecast F is a chance node that depends on rain R (model otherwise as above):

| R | F | P(F \| R) |
|---|---|-----------|
| 0 | 0 | 0.8 |
| 0 | 1 | 0.2 |
| 1 | 0 | 0.3 |
| 1 | 1 | 0.7 |

**VOI of the Forecast**

VOI(forecast) = P(rainy) EU(α_rainy) + P(~rainy) EU(α_~rainy) - EU(α)

**Umbrella Network: Posterior of Rain Given the Forecast**

P(F = rainy) = 0.4

| F | R | P(R \| F) |
|---|---|-----------|
| 0 | 0 | 0.8 |
| 0 | 1 | 0.2 |
| 1 | 0 | 0.3 |
| 1 | 1 | 0.7 |

**Conditional Expected Utilities**

For each forecast value, compute P(umb, rain | decision, forecast) (steps #1 through #4) and the resulting expected utilities:

- #1: EU(take | rainy) = 0.3 × 0 + 0.7 × (-25) = -17.5
- #2: EU(~take | rainy) = 0.3 × 100 + 0.7 × (-100) = -40
- #3: EU(take | ~rainy) = 0.8 × 0 + 0.2 × (-25) = -5
- #4: EU(~take | ~rainy) = 0.8 × 100 + 0.2 × (-100) = 60
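Putting the pieces together, here is an illustrative Python sketch of the VOI computation; it reuses U and eu from the previous sketch, and all numbers come from the model above:

```python
# VOI(forecast) = sum_f P(f) * max_d EU(d | f)  -  max_d EU(d)
P_F = {"rainy": 0.4, "~rainy": 0.6}    # P(forecast)
P_R_F = {"rainy": 0.7, "~rainy": 0.2}  # P(rain | forecast)

eu_now = max(eu(t) for t in (True, False))                       # 20
eu_forecast = sum(P_F[f] * max(eu(t, P_R_F[f]) for t in (True, False))
                  for f in P_F)                                  # 0.4*(-17.5) + 0.6*60 = 29
print(eu_forecast - eu_now)                                      # VOI ~ 9
```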

**VOI of the Forecast: Result**

VOI(forecast) = P(rainy) EU(α_rainy) + P(~rainy) EU(α_~rainy) - EU(α)
             = 0.4 × (-17.5) + 0.6 × 60 - 20 = 9

Since the VOI is positive, it is worth consulting the forecast before deciding.

**Sequential Decision Making**

- Finite horizon
- Infinite horizon

**Simple Robot Navigation Problem**

In each state, the possible actions are U, D, R, and L.

**Probabilistic Transition Model**

In each state, the possible actions are U, D, R, and L. The effect of U is as follows (transition model):

- With probability 0.8 the robot moves up one square (if the robot is already in the top row, then it does not move).
- With probability 0.1 the robot moves right one square (if the robot is already in the rightmost column, then it does not move).
- With probability 0.1 the robot moves left one square (if the robot is already in the leftmost column, then it does not move).
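A small Python sketch of this transition model (illustrative, not from the slides; states are (column, row) pairs on the 4×3 grid used in the following examples, and obstacle squares are ignored for brevity):

```python
# Transition model for action U on a 4x3 grid; states are (col, row),
# cols 1..4, rows 1..3. Returns {next_state: probability}.
def transitions_up(state, cols=4, rows=3):
    col, row = state
    up    = (col, min(row + 1, rows))   # stays put if already in the top row
    right = (min(col + 1, cols), row)   # stays put if in the rightmost column
    left  = (max(col - 1, 1), row)      # stays put if in the leftmost column
    dist = {}
    for s, p in ((up, 0.8), (right, 0.1), (left, 0.1)):
        dist[s] = dist.get(s, 0.0) + p
    return dist

print(transitions_up((3, 2)))  # {(3, 3): 0.8, (4, 2): 0.1, (2, 2): 0.1}
```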

**Markov Property**

The transition properties depend only on the current state, not on previous history (how that state was reached).

**Sequence of Actions**

- The robot starts at [3,2] on the 4×3 grid; the planned sequence of actions is (U, R).
- After U is executed, the robot may be in [3,2], [3,3], or [4,2]; R is then executed from whichever state was reached.

**Histories**

With the plan (U, R), there are 9 possible sequences of states, called histories, and 6 possible final states for the robot: [3,1], [3,2], [3,3], [4,1], [4,2], [4,3].

**Probability of Reaching the Goal**

P([4,3] | (U,R), [3,2])
  = P([4,3] | R, [3,3]) × P([3,3] | U, [3,2]) + P([4,3] | R, [4,2]) × P([4,2] | U, [3,2])
  = 0.8 × 0.8 + 0.1 × 0.1 = 0.65

Note the importance of the Markov property in this derivation.
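The same number falls out of code. This illustrative sketch reuses transitions_up from above and assumes, by symmetry, that R moves right with probability 0.8 and drifts up or down with probability 0.1 each:

```python
# P(goal | (U,R), start) = sum_s P(goal | R, s) * P(s | U, start)
def transitions_right(state, cols=4, rows=3):
    col, row = state
    right = (min(col + 1, cols), row)
    up    = (col, min(row + 1, rows))
    down  = (col, max(row - 1, 1))
    dist = {}
    for s, p in ((right, 0.8), (up, 0.1), (down, 0.1)):
        dist[s] = dist.get(s, 0.0) + p
    return dist

start, goal = (3, 2), (4, 3)
p_goal = sum(p_mid * transitions_right(mid).get(goal, 0.0)
             for mid, p_mid in transitions_up(start).items())
print(p_goal)  # ~0.65
```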

**Utility Function**

- [4,3] provides the power supply (+1).
- [4,2] is a sand area from which the robot cannot escape (-1).
- The robot needs to recharge its batteries.
- [4,3] and [4,2] are terminal states.

**Utility of a History**

The utility of a history is defined by the utility of the last state (+1 or -1) minus n/25, where n is the number of moves.

**Utility of an Action Sequence**

- Consider the action sequence (U, R) from [3,2].
- A run produces one among 7 possible histories, each with some probability.
- The utility of the sequence is the expected utility of the histories: U = Σ_h U_h P(h).
- The optimal sequence is the one with maximal utility.
- But is the optimal action sequence what we want to compute? Only if it is executed blindly!
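For completeness, an illustrative sketch that enumerates these histories, reusing transitions_up and transitions_right from above; note the stated assumption about histories that end in a non-terminal state:

```python
# Enumerate the histories of the plan (U, R) from [3,2] and compute
# U = sum_h U_h * P(h).  Assumption: a history ending in a non-terminal
# state contributes only its move penalty (last-state utility 0).
def history_utility(final_state, n_moves):
    terminal = {(4, 3): 1.0, (4, 2): -1.0}
    return terminal.get(final_state, 0.0) - n_moves / 25.0

expected = 0.0
for s1, p1 in transitions_up((3, 2)).items():
    if s1 == (4, 2):                 # sand area: terminal after one move
        expected += p1 * history_utility(s1, 1)
        continue
    for s2, p2 in transitions_right(s1).items():
        expected += p1 * p2 * history_utility(s2, 2)
print(expected)  # ~0.464 under these assumptions (7 histories in total)
```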

**Reactive Agent Algorithm**

(Assumes an accessible, i.e., observable, state.)

Repeat:
- s ← sensed state
- If s is terminal then exit
- a ← choose action (given s)
- Perform a

**Policy (Reactive/Closed-Loop Strategy)**

A policy π is a complete mapping from states to actions.

**Reactive Agent Algorithm (with a policy)**

Repeat:
- s ← sensed state
- If s is terminal then exit
- a ← π(s)
- Perform a

**Optimal Policy**

- A policy π is a complete mapping from states to actions.
- The optimal policy π* is the one that always yields a history (ending at a terminal state) with maximal expected utility.
- Note that [3,2] is a dangerous state that the optimal policy tries to avoid.
- This makes sense because of the Markov property.

**Optimal Policy (cont.)**

- This problem is called a Markov Decision Problem (MDP).
- The optimal policy π* is the one that always yields a history with maximal expected utility.
- How to compute π*?

**Additive Utility**

History H = (s0, s1, …, sn). The utility of H is additive iff:

  U(s0, s1, …, sn) = R(0) + U(s1, …, sn) = Σ_i R(i)

where R(i) is the reward at step i.

**Additive Utility (cont.)**

  U(s0, s1, …, sn) = R(0) + U(s1, …, sn) = Σ_i R(i)

Robot navigation example:
- R(n) = +1 if sn = [4,3]
- R(n) = -1 if sn = [4,2]
- R(i) = -1/25 for i = 0, …, n-1

**Principle of Max Expected Utility**

History H = (s0, s1, …, sn). Utility of H: U(s0, …, sn) = Σ_i R(i)

First-step analysis:

  U(i) = R(i) + max_a Σ_k P(k | a, i) U(k)

  π*(i) = arg max_a Σ_k P(k | a, i) U(k)

**Value Iteration**

- Initialize the utility of each non-terminal state s_i to U_0(i) = 0.
- For t = 0, 1, 2, …, do:

  U_{t+1}(i) ← R(i) + max_a Σ_k P(k | a, i) U_t(k)
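Here is a compact, illustrative Python implementation of value iteration on this grid world (not taken from the slides). It assumes the 4×3 grid with terminals [4,3] = +1 and [4,2] = -1, step reward R(i) = -1/25, the 0.8/0.1/0.1 transition noise, and a blocked square at [2,2], which the utility grid on the next slide implies:

```python
# Value iteration on the 4x3 grid world.
COLS, ROWS = 4, 3
WALL, TERMINALS = {(2, 2)}, {(4, 3): 1.0, (4, 2): -1.0}
STATES = [(c, r) for c in range(1, COLS + 1) for r in range(1, ROWS + 1)
          if (c, r) not in WALL]
MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}

def step(s, move):
    """Target square of a move; bounce back on walls and grid edges."""
    t = (s[0] + move[0], s[1] + move[1])
    ok = 1 <= t[0] <= COLS and 1 <= t[1] <= ROWS and t not in WALL
    return t if ok else s

def transitions(s, a):
    """Intended move with prob 0.8, each perpendicular move with prob 0.1."""
    perp = {'U': 'LR', 'D': 'LR', 'L': 'UD', 'R': 'UD'}[a]
    dist = {}
    for m, p in [(a, 0.8), (perp[0], 0.1), (perp[1], 0.1)]:
        t = step(s, MOVES[m])
        dist[t] = dist.get(t, 0.0) + p
    return dist

def value_iteration(n_iters=100, r=-1.0 / 25):
    U = {s: TERMINALS.get(s, 0.0) for s in STATES}
    for _ in range(n_iters):
        U = {s: TERMINALS[s] if s in TERMINALS else
                r + max(sum(p * U[t] for t, p in transitions(s, a).items())
                        for a in MOVES)
             for s in STATES}
    return U
```

Running value_iteration() should reproduce, to within rounding, the utilities shown on the next slide (e.g., U([3,1]) ≈ 0.611).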

**Value Iteration: Result**

Note the importance of the terminal states and of the connectivity of the state-transition graph.

Converged utilities (step reward -1/25; [2,2] is the blocked square):

|       | col 1 | col 2 | col 3 | col 4 |
|-------|-------|-------|-------|-------|
| row 3 | 0.812 | 0.868 | 0.918 | +1 |
| row 2 | 0.762 |       | 0.660 | -1 |
| row 1 | 0.705 | 0.655 | 0.611 | 0.388 |

[Plot: U_t([3,1]) rises from 0 and levels off at 0.611 within roughly 20-30 iterations.]

**Policy Iteration**

- Pick a policy π at random.
- Repeat:
  - Compute the utility of each state for π:
    U_{t+1}(i) ← R(i) + Σ_k P(k | π(i), i) U_t(k)
  - Compute the policy π' given these utilities:
    π'(i) = arg max_a Σ_k P(k | a, i) U(k)
  - If π' = π then return π.
- Alternatively, compute the utilities by solving the set of linear equations (often a sparse system):
  U(i) = R(i) + Σ_k P(k | π(i), i) U(k)
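A matching illustrative sketch of policy iteration, reusing STATES, TERMINALS, MOVES, and transitions from the value-iteration sketch above (it uses iterative policy evaluation rather than the linear-equation solve):

```python
import random

def policy_iteration(r=-1.0 / 25, eval_iters=50):
    # Pick a policy at random for the non-terminal states.
    pi = {s: random.choice(list(MOVES)) for s in STATES if s not in TERMINALS}
    while True:
        # Policy evaluation: U(i) = R(i) + sum_k P(k | pi(i), i) U(k)
        U = {s: TERMINALS.get(s, 0.0) for s in STATES}
        for _ in range(eval_iters):
            U = {s: U[s] if s in TERMINALS else
                    r + sum(p * U[t] for t, p in transitions(s, pi[s]).items())
                 for s in STATES}
        # Policy improvement: pi'(i) = argmax_a sum_k P(k | a, i) U(k)
        new_pi = {s: max(MOVES, key=lambda a: sum(
                      p * U[t] for t, p in transitions(s, a).items()))
                  for s in pi}
        if new_pi == pi:
            return pi, U
        pi = new_pi
```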

**Example: Tracking a Target**

- The robot must keep the target in view.
- The target's trajectory is not known in advance.

**Infinite Horizon**

- What if the robot lives forever?
- In many problems, e.g., the robot navigation example, histories are potentially unbounded and the same state can be reached many times.
- One trick: use discounting to make the infinite-horizon problem mathematically tractable.
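As a sketch of the trick, only the Bellman update changes: a discount factor γ in (0, 1) (the 0.9 below is an arbitrary illustration) weights future utility so that infinite sums stay finite. This reuses the definitions from the value-iteration sketch above:

```python
GAMMA = 0.9  # assumed, illustrative discount factor

def discounted_value_iteration(n_iters=200, r=-1.0 / 25):
    U = {s: TERMINALS.get(s, 0.0) for s in STATES}
    for _ in range(n_iters):
        # Discounted update: U(i) <- R(i) + gamma * max_a sum_k P(k|a,i) U(k)
        U = {s: TERMINALS[s] if s in TERMINALS else
                r + GAMMA * max(sum(p * U[t] for t, p in transitions(s, a).items())
                                for a in MOVES)
             for s in STATES}
    return U
```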

and actually does not make sense (is not rational) .POMDP (Partially Observable Markov Decision Problem) A sensing operation returns multiple states. with a probability distribution Choosing the action that maximizes the expected utility of this state distribution assuming state utilities computed as above is not good enough.

**Example: Target Tracking**

- There is uncertainty in the robot's and target's positions, and this uncertainty grows with further motion.
- There is a risk that the target escapes behind the corner, requiring the robot to move appropriately.
- But there is a positioning landmark nearby. Should the robot try to reduce its position uncertainty?

**Summary**

- Decision making under uncertainty
- Utility function
- Optimal policy
- Maximal expected utility
- Value iteration
- Policy iteration
