Mitchell, Ch. 13
(see also the Sutton & Barto book, available online)
Rationale
• Learning from experience
• Adaptive control
• Examples are not explicitly labeled; feedback is delayed
• Problem of credit assignment – which action(s) led to the payoff?
• Trade off short-term gain (immediate reward) against long-term consequences
Agent Model
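• Transition function – T: S × A → S, describes the environment
• Reward function – R: S × A → ℝ, describes the payoff
• T and R may be stochastic, but they are Markov: they depend only on the current state and action, not on earlier history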
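The agent model above can be sketched in a few lines of Python. This is only an illustrative sketch: the two-state world, its state and action names, the probabilities, and the reward values are all made up for the example and do not come from Mitchell or from these slides. The names T and R follow the slide; P, REWARD, and rollout are hypothetical helpers.

```python
import random

# Hypothetical two-state world; every name and number here is illustrative.
STATES = ["cool", "hot"]
ACTIONS = ["slow", "fast"]

# P[(s, a)] lists (next_state, probability) pairs: the stochastic environment.
P = {
    ("cool", "slow"): [("cool", 1.0)],
    ("cool", "fast"): [("cool", 0.5), ("hot", 0.5)],
    ("hot", "slow"): [("cool", 0.5), ("hot", 0.5)],
    ("hot", "fast"): [("hot", 1.0)],
}

# REWARD[(s, a)] is the immediate payoff for taking action a in state s.
REWARD = {
    ("cool", "slow"): 1.0,
    ("cool", "fast"): 2.0,
    ("hot", "slow"): 1.0,
    ("hot", "fast"): -10.0,
}


def T(s, a, rng):
    """Sample the next state. Markov: depends only on the current (s, a)."""
    next_states, probs = zip(*P[(s, a)])
    return rng.choices(next_states, weights=probs, k=1)[0]


def R(s, a):
    """Return the immediate reward for taking action a in state s."""
    return REWARD[(s, a)]


def rollout(policy, s, steps, rng):
    """Follow a policy (a function from state to action) and sum the rewards."""
    total = 0.0
    for _ in range(steps):
        a = policy(s)
        total += R(s, a)
        s = T(s, a, rng)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # A cautious policy that always picks "slow".
    print(rollout(lambda s: "slow", "cool", 5, rng))
```

In this toy world the "fast" action pays more immediately but risks moving to the "hot" state, where repeating it is heavily penalized. That is one small instance of the trade-off between immediate reward and long-term consequences listed under Rationale.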