Professional Documents
Culture Documents
LEARNING
§ Introduction
§ Markov Decision Process
§ Dynamic Programming
2
REINFORCEMENT LEARNING INTRODUCTION
Intelligent agents learning and acting
Sequence of decision, reward
3
Reinforcement learning: What is it?
4
Characteristics of Reinforcement Learning
5
RL Applications
6
Robotics
https://www.youtube.com/watch?v=ZBFwe1gF0FU
7
Gaming
8
RL vs supervised and unsupervised learning
9
Rewards
10
Agent and Environment
12
Fully and Partially Observable Environments
§ Full observation:
q Agent fully observes environment state
14
Policy
15
Value Function
17
Maze Example
Rewards: -1 per time-step
Actions: N, E, S, W
States: Agent's location
Policy
Value function
18 Model
Categorizing Reinforcement Learning Agents
§ Agents Action:
q Value Based: Value function, no policy
q Policy Based: Policy, no value function
q Actor Critic: Both Policy and Value Function
Policy
§ Modelling environment
q Model Free: interacting directly environments
q Model Based: Learn and model environments
Value function
19
Learning and Planning
20
Exploration and Exploitation
22
MARKOV DECISION PROCESS
Markov decision process: Model of finite-state environment
Bellman Equation
Dynamic Programming
23
Markov Decision Process
(Model of the environment)
§ Terminologies:
24
Markov Decision Process
• Maximize return
• Episodic task: consider return over finite horizon (e.g. games, maze).
• Continuing task: consider return over infinite horizon (e.g. juggling, balancing).
25
How we get good decision?
26
Value functions
§ Value of a policy
28
Bellman’s equation
§ Optimal Q-function:
§ Optimal policy:
30
Dynamic Programming (DP)
§ Main idea of Dynamic Programming: turn Bellman eq: 𝑉 " = 𝑅" + 𝛾𝑃" 𝑉 "
Bellman equations to update rules
§ Problem: evaluate a given policy π
§ Iterative policy evaluation: Fix policy
32
DP: Improving a Policy
33
Gridworld example
34
Gridworld example
35
DP:Value Iteration
36
DP: Pros and Cons
37
Visualization and Codes
§ https://cs.stanford.edu/people/karpathy/reinforcejs/index.html
38
Recap on Reinforcement Learning
§ Introduction on RL
q Intelligent agents learning and acting
q Sequence of decision, reward
§ Markov Decision Process
q Model of finite-state environment
q Bellman Equation
q Dynamic Programming
§ Next:
q Online Learning
39
Questions?
THANK YOU!
40