Professional Documents
Culture Documents
Stockhammer TCP 2019
Stockhammer TCP 2019
Machine Learning
Unsupervised
Learning
[No Labeled Data]
Clustering
2
Machine Learning, Deep Learning, and Reinforcement Learning
Machine Learning
3
Machine Learning, Deep Learning, and Reinforcement Learning
Machine Learning
Deep Learning
Deep learning typically does not
involve feature extraction
4
Reinforcement Learning vs Machine Learning
Machine Learning
Reinforcement learning:
Unsupervised Supervised Learning
Reinforcement
Learning
Learning a behavior or
Learning [Labeled Data]
[No Labeled Data] [Interaction Data] accomplishing a task through
trial & error
Decision
[interaction]
Clustering Classification Regression Control
Making
Complex problems typically need
Deep Learning deep models
[Deep Reinforcement Learning]
Robotics
Autonomous driving
Finance
A.I. Gameplay
6
Why is reinforcement learning appealing?
Observations
Sensors
Motor
Leg &
Motor Commands
Balance Trunk
Control
Trajectories
Observations
8
What is the alternative approach?
Observations
Sensors
Sensors
Motor
Commands
Camera
Data
9
A Practical Example of Reinforcement Learning
Training a Robot to Walk
Rt = 𝑥 𝑡 𝑇 𝑅𝑥(𝑡) + 𝑢 𝑡 𝑇 𝑄𝑢(𝑡)
At Ot
11
Define Environment to Generate Data
Agent
Agent
Reward defines task to
learn
13
Define Environment to Generate Data
*Reward function inspired by: N. Heess et al, "Emergence of Locomotion Behaviours in Rich Environments," Technical Report, ArXiv, 2017. 14
Define Policy and Learning Algorithm
Agent
15
Code for Configuring Agent and Training
16
Create Critic Network
17
Create Actor Network
18
Create DDPG Agent
19
Training the Agent
20
Train Robot to Walk and Track Progress
Episode Reward
Episode Number
21
Train Robot to Walk and Track Progress
Episode Reward
...
Episode Number
22
Deploy Policy to Embedded Device
23
Everything is Great, Right?
24
Reward Function Design Matters
25
Reward Function Design Matters
26
Reward Function Design Matters
27
Reward Function Design Matters
28
Reward Function Design Matters
29
Simulation and Virtual Models are a Key Aspect of Reinforcement
Learning
Reinforcement learning needs a lot of data
(sample inefficient)
– Training on hardware can be prohibitively
expensive and dangerous
Complex end-to-end solutions can be developed Reward signal design, network layer structure &
(e.g. camera input→ car steering wheel) hyperparameter tuning can be challenging
Virtual models allow simulations of varying Further training might be necessary after
conditions and training parallelization deployment on real hardware
31
Reinforcement Learning Toolbox
New in
Built-in and custom algorithms for reinforcement
learning
env = rlFunctionEnv(obsInfo,actInfo,stepfcn,resetfcn)
– obsInfo: Observation Specification
– actInfo: Action Specification
– stepfcn: Function handle for stepping the environment
– resetfcn: Function handle for resetting the environment
34
Resources
35
Reinforcement Learning Toolbox
New in
Built-in and custom algorithms for reinforcement
learning
37