
Reinforcement Learning with MATLAB & Simulink

Christoph Stockhammer, MathWorks Application Engineering

© 2019 The MathWorks, Inc.


Machine Learning, Deep Learning, and Reinforcement Learning

Machine Learning
– Unsupervised Learning [No Labeled Data]: Clustering
– Supervised Learning [Labeled Data]: Classification, Regression
  (supervised learning typically involves manual feature extraction)

Deep Learning
– Deep learning typically does not involve feature extraction
Reinforcement Learning vs Machine Learning

Machine Learning
– Unsupervised Learning [No Labeled Data]: Clustering
– Supervised Learning [Labeled Data]: Classification, Regression
– Reinforcement Learning [Interaction Data]: Decision Making, Control
– Deep Learning combined with reinforcement learning: [Deep Reinforcement Learning]

Reinforcement learning:
 Learning a behavior or accomplishing a task through trial & error [interaction]
 Complex problems typically need deep models

Reinforcement Learning Toolbox (new)
Reinforcement Learning enables the use of Deep Learning for Controls and Decision Making Applications

– Robotics
– Autonomous driving
– Finance
– A.I. Gameplay
Why is reinforcement learning appealing?

Teach a robot to follow a straight line using camera data


Let’s try to solve this problem the traditional way

[Block diagram] Camera data → Feature Extraction → State Estimation → Controller → Motor Commands; the controller is itself decomposed into Balance, Trajectories, and Leg & Trunk Control blocks, and sensor observations are fed back to the pipeline.
What is the alternative approach?

[Block diagram] The traditional pipeline (Camera data → Feature Extraction → State Estimation → Controller → Motor Commands) is replaced by a single block that maps camera data and sensor observations directly to motor commands. How do we design this block?
A Practical Example of Reinforcement Learning
Training a Robot to Walk

 Robot’s computer learns how to walk… (agent)
 using sensor readings from joints, torso,… (state)
 that represent robot’s pose and orientation,… (environment)
 by generating joint torque commands,… (action)
 based on an internal state-to-action mapping… (policy)
 that tries to optimize forward locomotion,… (reward).

 The policy is updated through repeated trial-and-error by a reinforcement learning algorithm.

[Diagram] The AGENT (policy plus a reinforcement learning algorithm that updates it) receives STATE and REWARD from the ENVIRONMENT and returns an ACTION.
Connections with Controls

The reward can be chosen like the stage cost of an optimal control problem, i.e. a quadratic function of the state x(t) and control input u(t):

R_t = x(t)^T R x(t) + u(t)^T Q u(t)

[Diagram] At each time step t the agent receives observation O_t and applies action A_t.
Define Environment to Generate Data

 Physical modeling of robot dynamics and contact forces using Simscape

 The environment provides 29 observations to the agent: Y (lateral) and Z (vertical) translations of the torso center of mass; X (forward), Y (lateral), and Z (vertical) translation velocities; yaw, pitch, and roll angles of the torso; yaw, pitch, and roll angular velocities; angular position and velocity of 3 joints (ankle, knee, hip) on both legs; and previous actions from the agent. The translation in the Z direction is normalized to a similar range as the other observations.

 Reward defines the task to learn*

*Reward function inspired by: N. Heess et al., "Emergence of Locomotion Behaviours in Rich Environments," Technical Report, arXiv, 2017.
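A minimal sketch of how such a Simscape model might be wrapped as a reinforcement learning environment. The model name, the RL Agent block path, and the 6-element action vector are assumptions for illustration, not details taken from the talk.

% Sketch: wrap a Simulink/Simscape robot model as an RL environment.
mdl      = 'walkingRobot';            % hypothetical model name
agentBlk = [mdl '/RL Agent'];         % hypothetical path of the RL Agent block

% 29 observations: torso pose/velocities, joint states, previous actions
obsInfo = rlNumericSpec([29 1]);
obsInfo.Name = 'observations';

% Assumed: 6 joint torque commands (ankle, knee, hip on both legs) in [-1, 1]
actInfo = rlNumericSpec([6 1], 'LowerLimit', -1, 'UpperLimit', 1);
actInfo.Name = 'torques';

env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);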
Define Policy and Learning Algorithm

 Define agent’s policy and learning algorithm
Code for Configuring Agent and Training

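A hedged sketch of the kind of agent and training configuration this step involves; all numeric values are illustrative, not the ones used in the talk.

Ts = 0.025;                               % agent sample time (assumed)
Tf = 20;                                  % episode length in seconds (assumed)

agentOpts = rlDDPGAgentOptions( ...
    'SampleTime', Ts, ...
    'DiscountFactor', 0.99, ...
    'MiniBatchSize', 128, ...
    'ExperienceBufferLength', 1e6, ...
    'TargetSmoothFactor', 1e-3);
agentOpts.NoiseOptions.Variance = 0.1;            % exploration noise
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 2000, ...
    'MaxStepsPerEpisode', floor(Tf/Ts), ...
    'ScoreAveragingWindowLength', 100, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 100);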
Create Critic Network

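A minimal sketch of a Q-value critic for DDPG, reusing the obsInfo/actInfo specifications from the environment sketch above; layer sizes and names are illustrative, and the exact representation object names vary across toolbox releases.

% Observation and action paths are merged before the scalar Q-value output.
statePath = [
    featureInputLayer(29, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(400, 'Name', 'CriticStateFC1')
    reluLayer('Name', 'CriticStateRelu1')
    fullyConnectedLayer(300, 'Name', 'CriticStateFC2')];
actionPath = [
    featureInputLayer(6, 'Normalization', 'none', 'Name', 'action')
    fullyConnectedLayer(300, 'Name', 'CriticActionFC1')];
commonPath = [
    additionLayer(2, 'Name', 'add')
    reluLayer('Name', 'CriticCommonRelu')
    fullyConnectedLayer(1, 'Name', 'QValue')];

criticNet = layerGraph(statePath);
criticNet = addLayers(criticNet, actionPath);
criticNet = addLayers(criticNet, commonPath);
criticNet = connectLayers(criticNet, 'CriticStateFC2', 'add/in1');
criticNet = connectLayers(criticNet, 'CriticActionFC1', 'add/in2');

critic = rlQValueRepresentation(criticNet, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'action'}, ...
    rlRepresentationOptions('LearnRate', 1e-3, 'GradientThreshold', 1));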
Create Actor Network

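A matching sketch of the deterministic actor; the tanh output layer keeps the action within the assumed [-1, 1] torque limits.

actorNet = [
    featureInputLayer(29, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(400, 'Name', 'ActorFC1')
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(300, 'Name', 'ActorFC2')
    reluLayer('Name', 'ActorRelu2')
    fullyConnectedLayer(6, 'Name', 'ActorFC3')
    tanhLayer('Name', 'ActorTanh')];        % output bounded to [-1, 1]

actor = rlDeterministicActorRepresentation(actorNet, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'ActorTanh'}, ...
    rlRepresentationOptions('LearnRate', 1e-4, 'GradientThreshold', 1));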
Create DDPG Agent

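With actor, critic, and options in place, creating the agent is a single call (a sketch; other agent types follow the same pattern):

agent = rlDDPGAgent(actor, critic, agentOpts);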
Training the Agent

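Training then runs episodes against the environment until a stop criterion from trainOpts is met; a sketch, followed by a short simulation of the learned policy:

trainingStats = train(agent, env, trainOpts);   % returns per-episode statistics

% Exercise the trained policy in simulation
simOpts = rlSimulationOptions('MaxSteps', floor(Tf/Ts));
experience = sim(env, agent, simOpts);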
Train Robot to Walk and Track Progress
[Training progress plot: episode reward vs. episode number]
Train Robot to Walk and Track Progress
Accelerate Training with Parallel Computing

[Training progress plot: episode reward vs. episode number, with episodes simulated in parallel]
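Parallel training is the same train call with parallel options enabled (a sketch; this requires Parallel Computing Toolbox, and the exact option names can differ between releases):

trainOpts.UseParallel = true;                      % simulate episodes on parallel workers
trainOpts.ParallelizationOptions.Mode = 'async';   % workers send experiences asynchronously
trainingStats = train(agent, env, trainOpts);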
Deploy Policy to Embedded Device

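A sketch of the deployment path: generate a standalone policy evaluation function from the trained agent, then compile it with MATLAB Coder or GPU Coder (the coder configuration here is illustrative).

generatePolicyFunction(agent);    % creates evaluatePolicy.m and agentData.mat

% Generate C++ library code for the policy, assuming a 29-element observation
cfg = coder.config('lib');
cfg.TargetLang = 'C++';
codegen('-config', cfg, 'evaluatePolicy', '-args', {ones(29,1)}, '-report');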
Everything is Great, Right?

Reward Function Design Matters

Simulation and Virtual Models are a Key Aspect of Reinforcement Learning

 Reinforcement learning needs a lot of data (sample inefficient)
– Training on hardware can be prohibitively expensive and dangerous

 Virtual models allow you to simulate conditions hard to emulate in the real world
– This can help develop a more robust solution

 Many of you have already developed MATLAB and Simulink models that can be reused
Pros & Cons of Reinforcement Learning

Pros
– No need to label data before training
– Complex end-to-end solutions can be developed (e.g. camera input → car steering wheel)
– Can be applied to uncertain, nonlinear environments
– Virtual models allow simulations of varying conditions and training parallelization

Cons
– A lot of simulation trials required
– Reward signal design, network layer structure & hyperparameter tuning can be challenging
– No performance guarantees; training may not converge
– Further training might be necessary after deployment on real hardware
Reinforcement Learning Toolbox (new)

 Built-in and custom algorithms for reinforcement learning
 Environment modeling in MATLAB and Simulink
 Deep Learning Toolbox support for designing policies
 Training acceleration through GPUs and cloud resources
 Deployment to embedded devices and production systems
 Reference examples for getting started
Predefined Environments and Many Examples

 MATLAB Environments
– 'BasicGridWorld'
– 'CartPole-Discrete'
– 'CartPole-Continuous'
– 'DoubleIntegrator-Discrete'
– 'DoubleIntegrator-Continuous'
– 'SimplePendulumWithImage-Discrete'
– 'SimplePendulumWithImage-Continuous'
– 'WaterFallGridWorld-Stochastic'
– 'WaterFallGridWorld-Deterministic'

 Simulink Environments
– 'SimplePendulumModel-Discrete'
– 'SimplePendulumModel-Continuous'
– 'CartPoleSimscapeModel-Discrete'
– 'CartPoleSimscapeModel-Continuous'

 Examples
– Grid World, MDP
– Classical Control Benchmarks
– Automotive
– Robotics
– Custom LQR Agent
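A predefined environment is created by name with rlPredefinedEnv; a minimal sketch using the cart-pole system:

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);   % specifications used to build actor/critic
actInfo = getActionInfo(env);
plot(env)                            % built-in visualization of the system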
Extensible Environment Interface

 env = rlFunctionEnv(obsInfo,actInfo,stepfcn,resetfcn)
– obsInfo: Observation Specification
– actInfo: Action Specification
– stepfcn: Function handle for stepping the environment
– resetfcn: Function handle for resetting the environment

 Subclassing from rl.env.MATLABEnvironment
– Custom MATLAB Environments
– Interfacing with 3rd party simulators (e.g. OpenAI Gym)
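A minimal sketch of the function-based interface; myStepFunction and myResetFunction are hypothetical names you would implement yourself, with the signatures shown in the comments.

obsInfo = rlNumericSpec([4 1]);           % e.g. a cart-pole style state vector
actInfo = rlFiniteSetSpec([-10 10]);      % e.g. two discrete force values

% The step and reset functions must follow these signatures:
%   [NextObs, Reward, IsDone, LoggedSignals] = myStepFunction(Action, LoggedSignals)
%   [InitialObs, LoggedSignals] = myResetFunction()
env = rlFunctionEnv(obsInfo, actInfo, @myStepFunction, @myResetFunction);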
Resources

 Reference examples for controls, robotics, and autonomous system applications
 Documentation written for engineers and domain experts
 Tech Talk video series on reinforcement learning concepts for engineers
Questions?

