
Reinforcement Learning with MATLAB & Simulink

Christoph Stockhammer, MathWorks Application Engineering

© 2019 The MathWorks, Inc.


Machine Learning, Deep Learning, and Reinforcement Learning

Machine Learning
– Unsupervised Learning [No Labeled Data]: Clustering
– Supervised Learning [Labeled Data]: Classification, Regression
  (supervised learning typically involves manual feature extraction)

Deep Learning
– Deep learning typically does not involve feature extraction
Reinforcement Learning vs Machine Learning

Machine Learning
– Unsupervised Learning [No Labeled Data]: Clustering
– Supervised Learning [Labeled Data]: Classification, Regression
– Reinforcement Learning [Interaction Data]: Decision Making, Control
– Deep Learning combined with reinforcement learning: [Deep Reinforcement Learning]

Reinforcement learning:
 Learning a behavior or accomplishing a task through trial & error [interaction]
 Complex problems typically need deep models

Reinforcement Learning Toolbox (new)
Reinforcement Learning enables the use of Deep Learning for Controls and Decision Making Applications

– Robotics
– Autonomous driving
– Finance
– A.I. Gameplay
Why is reinforcement learning appealing?

Teach a robot to follow a straight line using camera data


Let’s try to solve this problem the traditional way

[Block diagram] Camera data → Feature Extraction → State Estimation → Controller → Motor Commands; the controller is itself decomposed into Balance, Trajectories, and Leg & Trunk Control blocks, and sensor observations are fed back to the pipeline.
What is the alternative approach?

[Block diagram] The traditional pipeline (Camera data → Feature Extraction → State Estimation → Controller → Motor Commands) is replaced by a single block that maps camera data and sensor observations directly to motor commands. How do we design this block?
A Practical Example of Reinforcement Learning
Training a Robot to Walk

 Robot’s computer learns how to walk… (agent)
 using sensor readings from joints, torso,… (state)
 that represent robot’s pose and orientation,… (environment)
 by generating joint torque commands,… (action)
 based on an internal state-to-action mapping… (policy)
 that tries to optimize forward locomotion,… (reward).

 The policy is updated through repeated trial-and-error by a reinforcement learning algorithm.

[Diagram] The AGENT (policy plus a reinforcement learning algorithm that updates it) receives STATE and REWARD from the ENVIRONMENT and returns an ACTION.
Connections with Controls

The reward can be chosen like the stage cost of an optimal control problem, i.e. a quadratic function of the state x(t) and control input u(t):

R_t = x(t)^T R x(t) + u(t)^T Q u(t)

[Diagram] At each time step t the agent receives observation O_t and applies action A_t.
Define Environment to Generate Data

 Physical modeling of robot dynamics and contact forces using Simscape

 The environment provides 29 observations to the agent: Y (lateral) and Z (vertical) translations of the torso center of mass; X (forward), Y (lateral), and Z (vertical) translation velocities; yaw, pitch, and roll angles of the torso; yaw, pitch, and roll angular velocities; angular position and velocity of 3 joints (ankle, knee, hip) on both legs; and previous actions from the agent. The translation in the Z direction is normalized to a similar range as the other observations.

 Reward defines the task to learn*

*Reward function inspired by: N. Heess et al., "Emergence of Locomotion Behaviours in Rich Environments," Technical Report, arXiv, 2017.
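A minimal sketch of how such a Simscape model might be wrapped as a reinforcement learning environment. The model name, the RL Agent block path, and the 6-element action vector are assumptions for illustration, not details taken from the talk.

% Sketch: wrap a Simulink/Simscape robot model as an RL environment.
mdl      = 'walkingRobot';            % hypothetical model name
agentBlk = [mdl '/RL Agent'];         % hypothetical path of the RL Agent block

% 29 observations: torso pose/velocities, joint states, previous actions
obsInfo = rlNumericSpec([29 1]);
obsInfo.Name = 'observations';

% Assumed: 6 joint torque commands (ankle, knee, hip on both legs) in [-1, 1]
actInfo = rlNumericSpec([6 1], 'LowerLimit', -1, 'UpperLimit', 1);
actInfo.Name = 'torques';

env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);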
Define Policy and Learning Algorithm

 Define agent’s policy and learning algorithm
Code for Configuring Agent and Training

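A hedged sketch of the kind of agent and training configuration this step involves; all numeric values are illustrative, not the ones used in the talk.

Ts = 0.025;                               % agent sample time (assumed)
Tf = 20;                                  % episode length in seconds (assumed)

agentOpts = rlDDPGAgentOptions( ...
    'SampleTime', Ts, ...
    'DiscountFactor', 0.99, ...
    'MiniBatchSize', 128, ...
    'ExperienceBufferLength', 1e6, ...
    'TargetSmoothFactor', 1e-3);
agentOpts.NoiseOptions.Variance = 0.1;            % exploration noise
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 2000, ...
    'MaxStepsPerEpisode', floor(Tf/Ts), ...
    'ScoreAveragingWindowLength', 100, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 100);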
Create Critic Network

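A minimal sketch of a Q-value critic for DDPG, reusing the obsInfo/actInfo specifications from the environment sketch above; layer sizes and names are illustrative, and the exact representation object names vary across toolbox releases.

% Observation and action paths are merged before the scalar Q-value output.
statePath = [
    featureInputLayer(29, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(400, 'Name', 'CriticStateFC1')
    reluLayer('Name', 'CriticStateRelu1')
    fullyConnectedLayer(300, 'Name', 'CriticStateFC2')];
actionPath = [
    featureInputLayer(6, 'Normalization', 'none', 'Name', 'action')
    fullyConnectedLayer(300, 'Name', 'CriticActionFC1')];
commonPath = [
    additionLayer(2, 'Name', 'add')
    reluLayer('Name', 'CriticCommonRelu')
    fullyConnectedLayer(1, 'Name', 'QValue')];

criticNet = layerGraph(statePath);
criticNet = addLayers(criticNet, actionPath);
criticNet = addLayers(criticNet, commonPath);
criticNet = connectLayers(criticNet, 'CriticStateFC2', 'add/in1');
criticNet = connectLayers(criticNet, 'CriticActionFC1', 'add/in2');

critic = rlQValueRepresentation(criticNet, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'action'}, ...
    rlRepresentationOptions('LearnRate', 1e-3, 'GradientThreshold', 1));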
Create Actor Network

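A matching sketch of the deterministic actor; the tanh output layer keeps the action within the assumed [-1, 1] torque limits.

actorNet = [
    featureInputLayer(29, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(400, 'Name', 'ActorFC1')
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(300, 'Name', 'ActorFC2')
    reluLayer('Name', 'ActorRelu2')
    fullyConnectedLayer(6, 'Name', 'ActorFC3')
    tanhLayer('Name', 'ActorTanh')];        % output bounded to [-1, 1]

actor = rlDeterministicActorRepresentation(actorNet, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'ActorTanh'}, ...
    rlRepresentationOptions('LearnRate', 1e-4, 'GradientThreshold', 1));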
Create DDPG Agent

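With actor, critic, and options in place, creating the agent is a single call (a sketch; other agent types follow the same pattern):

agent = rlDDPGAgent(actor, critic, agentOpts);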
Training the Agent

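Training then runs episodes against the environment until a stop criterion from trainOpts is met; a sketch, followed by a short simulation of the learned policy:

trainingStats = train(agent, env, trainOpts);   % returns per-episode statistics

% Exercise the trained policy in simulation
simOpts = rlSimulationOptions('MaxSteps', floor(Tf/Ts));
experience = sim(env, agent, simOpts);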
Train Robot to Walk and Track Progress
[Training progress plot: episode reward vs. episode number]
Train Robot to Walk and Track Progress
Accelerate Training with Parallel Computing

[Training progress plot: episode reward vs. episode number, with episodes simulated in parallel]
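Parallel training is the same train call with parallel options enabled (a sketch; this requires Parallel Computing Toolbox, and the exact option names can differ between releases):

trainOpts.UseParallel = true;                      % simulate episodes on parallel workers
trainOpts.ParallelizationOptions.Mode = 'async';   % workers send experiences asynchronously
trainingStats = train(agent, env, trainOpts);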
Deploy Policy to Embedded Device

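A sketch of the deployment path: generate a standalone policy evaluation function from the trained agent, then compile it with MATLAB Coder or GPU Coder (the coder configuration here is illustrative).

generatePolicyFunction(agent);    % creates evaluatePolicy.m and agentData.mat

% Generate C++ library code for the policy, assuming a 29-element observation
cfg = coder.config('lib');
cfg.TargetLang = 'C++';
codegen('-config', cfg, 'evaluatePolicy', '-args', {ones(29,1)}, '-report');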
Everything is Great, Right?

Reward Function Design Matters

Simulation and Virtual Models are a Key Aspect of Reinforcement Learning

 Reinforcement learning needs a lot of data (sample inefficient)
– Training on hardware can be prohibitively expensive and dangerous

 Virtual models allow you to simulate conditions hard to emulate in the real world
– This can help develop a more robust solution

 Many of you have already developed MATLAB and Simulink models that can be reused
Pros & Cons of Reinforcement Learning

Pros
– No need to label data before training
– Complex end-to-end solutions can be developed (e.g. camera input → car steering wheel)
– Can be applied to uncertain, nonlinear environments
– Virtual models allow simulations of varying conditions and training parallelization

Cons
– A lot of simulation trials required
– Reward signal design, network layer structure & hyperparameter tuning can be challenging
– No performance guarantees; training may not converge
– Further training might be necessary after deployment on real hardware
Reinforcement Learning Toolbox (new)

 Built-in and custom algorithms for reinforcement learning
 Environment modeling in MATLAB and Simulink
 Deep Learning Toolbox support for designing policies
 Training acceleration through GPUs and cloud resources
 Deployment to embedded devices and production systems
 Reference examples for getting started
Predefined Environments and Many Examples

 MATLAB Environments
– 'BasicGridWorld'
– 'CartPole-Discrete'
– 'CartPole-Continuous'
– 'DoubleIntegrator-Discrete'
– 'DoubleIntegrator-Continuous'
– 'SimplePendulumWithImage-Discrete'
– 'SimplePendulumWithImage-Continuous'
– 'WaterFallGridWorld-Stochastic'
– 'WaterFallGridWorld-Deterministic'

 Simulink Environments
– 'SimplePendulumModel-Discrete'
– 'SimplePendulumModel-Continuous'
– 'CartPoleSimscapeModel-Discrete'
– 'CartPoleSimscapeModel-Continuous'

 Examples
– Grid World, MDP
– Classical Control Benchmarks
– Automotive
– Robotics
– Custom LQR Agent
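A predefined environment is created by name with rlPredefinedEnv; a minimal sketch using the cart-pole system:

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);   % specifications used to build actor/critic
actInfo = getActionInfo(env);
plot(env)                            % built-in visualization of the system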
Extensible Environment Interface

 env = rlFunctionEnv(obsInfo,actInfo,stepfcn,resetfcn)
– obsInfo: Observation Specification
– actInfo: Action Specification
– stepfcn: Function handle for stepping the environment
– resetfcn: Function handle for resetting the environment

 Subclassing from rl.env.MATLABEnvironment
– Custom MATLAB Environments
– Interfacing with 3rd party simulators (e.g. OpenAI Gym)
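A minimal sketch of the function-based interface; myStepFunction and myResetFunction are hypothetical names you would implement yourself, with the signatures shown in the comments.

obsInfo = rlNumericSpec([4 1]);           % e.g. a cart-pole style state vector
actInfo = rlFiniteSetSpec([-10 10]);      % e.g. two discrete force values

% The step and reset functions must follow these signatures:
%   [NextObs, Reward, IsDone, LoggedSignals] = myStepFunction(Action, LoggedSignals)
%   [InitialObs, LoggedSignals] = myResetFunction()
env = rlFunctionEnv(obsInfo, actInfo, @myStepFunction, @myResetFunction);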
Resources

 Reference examples for controls, robotics, and autonomous system applications
 Documentation written for engineers and domain experts
 Tech Talk video series on reinforcement learning concepts for engineers
Questions?

