Reinforcement Learning Workflows For Ai

Reinforcement Learning Workflows for AI
Key Takeaways
▪ What is reinforcement learning and why should I care about it?
▪ How do I set up and solve a reinforcement learning problem?
▪ What are some common challenges?
2
Why Should You Care About Reinforcement Learning?
3
One Approach Could Be…
Joint trajectory generation Low-level control

State estimation
(e.g. MPC) (e.g. PID)
Measurements Motor torques
4
Any Alternatives?
Measurements Black Box Motor torques

Controller
5
Applications of Reinforcement Learning
6
What is reinforcement learning?
Type of machine learning that trains an ‘agent’ through trial & error
interactions with an environment
7
Reinforcement Learning vs Machine Learning vs Deep Learning
8
9
10
11
Deep Learning
What about deep learning?
Complex reinforcement learning problems typically need deep neural networks

[Deep Reinforcement Learning]
12
How does reinforcement learning training work?
Analogies with pet training

Let’s try
this…
AGENT
OBSERVATIONS ACTIONS
Policy
Policy update
Reinforcement
Yay!
Learning
Here is
Algorithm
your treat!
Fetch!
REWARD
ENVIRONMENT
13
Reinforcement Learning Concepts
Training a self-driving car
▪ Vehicle’s computer…
AGENT (agent)
OBSERVATIONS ACTIONS ▪ is reading sensor measurements from LIDAR, cameras,…
Policy (observations)
Policy update ▪ that represent road conditions, vehicle position,…
Reinforcement (environment)
Learning ▪ and generates steering, braking, throttle commands,…
Algorithm
(action)
▪ based on an internal state-to-action mapping…
REWARD
(policy)
▪ that tries to optimize, e.g., lap time & fuel efficiency…
(reward).
ENVIRONMENT
▪ The policy is updated through repeated trial-and-error by a
reinforcement learning algorithm
14
Reinforcement Learning Concepts
Training a self-driving car
After training, only trained

policy is needed ▪ Vehicle’s computer uses the final state-to-action
mapping… (policy)
▪ to generate steering, braking, throttle commands,…
(action)
POLICY
▪ based on sensor readings from LIDAR, cameras,…
(observations)
▪ that represent road conditions, vehicle position,…
(environment).
OBSERVATIONS ACTIONS
By definition, this trained policy is

optimizing lap time & fuel efficiency
ENVIRONMENT
15
Reinforcement Learning vs Controls
Control system Reinforcement learning system
AGENT
+ ERROR
CONTROLLER PLANT
OBSERVATIONS
Policy
ACTION
REFERENCE MANIPULATED Policy update
- VARIABLE Reinforcement
Learning
Algorithm
MEASUREMENT
REWARD
ENVIRONMENT
Adaptation mechanism RL Algorithm

Error/Cost function Reward
Manipulated variable Action
Measurement Observation
Plant Environment
Controller Policy
Reinforcement learning has parallels to control system design
16
Policy Representation and Deep Learning
Representation options
▪ Look-up table
▪ Polynomials
Observations 3 Next action
1 A
1 2 3 4 5
Look-up tables do not scale well
17
Policy Representation and Deep Learning
Representation options
▪ Look-up table
▪ Polynomials
▪ (Deep) neural networks Deep neural network policy
Observations Next action

(camera frame, sensors, …)
Neural networks allow representation of complex policies
18
How do I set up and solve
a reinforcement learning problem?
19
Reinforcement Learning Workflow
▪ Simulation models or real hardware ▪ Deep network? Table? Polynomial?
▪ Virtual models are safer and cheaper

▪ Select training algorithm
▪ Tune hyperparameters
▪ Trained policy is a standalone
function
Environment Reward Policy Agent Training Deployment

representation
▪ Large number of simulations needed

▪ Numerical value that evaluates goodness of policy
▪ Parallel & GPU computing can speed up training
▪ Reward shaping can be challenging
▪ Training could still take hours or days
20
Reinforcement Learning Workflow

representation
21
Reinforcement Learning Toolbox
Introduced in
▪ Built-in and custom reinforcement learning algorithms
▪ Environment modeling in MATLAB and Simulink

– Existing scripts and models can be reused
▪ Deep Learning Toolbox support for representing policies
▪ Training acceleration with Parallel Computing Toolbox and

MATLAB Parallel Server
▪ Deployment of trained policies with GPU Coder and

MATLAB Coder
▪ Reference examples for getting started
22
Example: Walking Robot
Measurements Motor torques
Control objective: Walk

on a straight line
23
Creating the Environment
24
Reward Shaping
25
26
Creating the Agent
27
Training the Agent
28
29
Applications of Reinforcement Learning
30
Autonomous Driving Example
Image (Observation) Environment
RL Agent
Traditional
+
Controller
Steer, throttle,
Start/Goal brake
Car Position (Observation)
Objective: Augment traditional controller with

reinforcement learning to improve lap time
31
RL Agent
Reward
Vehicle Model
Traditional
Controller
32
33
Results
Traditional controller +
Lap time (s) reinforcement learning
Baseline
RL+Baseline
30% performance improvement
34
Reference Applications in Documentation
▪ Controller Design
▪ Robotic Locomotion
▪ Lane Keep Assist
▪ Adaptive Cruise Control
▪ Imitation Learning
35
Pros & Cons of Reinforcement Learning
Pros Cons
▪ No data required before training ▪ Trained policies are hard to verify
(no performance guarantees)
▪ New possibilities with AI for
hard-to-solve problems ▪ Many trials/data points required
(sample inefficient)
▪ Complex end-to-end solutions can – Training with real hardware can be
be developed expensive and dangerous
▪ Uncertain, nonlinear environments ▪ Large number of design parameters

can be used – Reward signal
– Network architectures
– Training Hyperparameters
Simulations are key in Reinforcement Learning
36
How Can MATLAB and Simulink Help?
Challenges
▪ Trained policies are hard to verify ▪ Reuse existing code and models for
(no performance guarantees) environments
▪ Many trials/data points required ▪ Use simulations for policy

(sample inefficient) verification
– Training with real hardware can be – Simulate extreme scenarios
expensive and dangerous
▪ Run simulation trials in parallel to
▪ Large number of design parameters accelerate training
– Reward signal
– Network architectures ▪ Consult Reinforcement Learning
– Training Hyperparameters Toolbox examples
– Iterative tuning with simulations
37
Key Takeaways
▪ What is reinforcement learning and why should I care about it?
▪ How do I set up and solve a reinforcement learning problem?
▪ What are some common challenges?
38
Learn More
▪ Reference examples for controls,

robotics, and autonomous system
applications
▪ Documentation written for

engineers and domain experts
▪ Tech Talk video series on

Reinforcement Learning concepts
▪ Reinforcement Learning ebooks

available at mathworks.com
39

Reinforcement Learning Workflows For Ai

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Reinforcement Learning Workflows For Ai

Uploaded by

Copyright:

Available Formats

Reinforcement Learning Workflows for AI

▪ What is reinforcement learning and why should I care about it?

▪ How do I set up and solve a reinforcement learning problem?

▪ What are some common challenges?

Joint trajectory generation Low-level control

Measurements Motor torques

Measurements Black Box Motor torques

What about deep learning?

Complex reinforcement learning problems typically need deep neural networks

Analogies with pet training

After training, only trained

By definition, this trained policy is

REFERENCE MANIPULATED Policy update

Adaptation mechanism RL Algorithm

Observations 3 Next action

Look-up tables do not scale well

Observations Next action

Neural networks allow representation of complex policies

▪ Virtual models are safer and cheaper

Environment Reward Policy Agent Training Deployment

▪ Large number of simulations needed

Environment Reward Policy Agent Training Deployment

▪ Environment modeling in MATLAB and Simulink

▪ Deep Learning Toolbox support for representing policies

▪ Training acceleration with Parallel Computing Toolbox and

▪ Deployment of trained policies with GPU Coder and

▪ Reference examples for getting started

Environment Reward Policy Agent Training Deployment

Measurements Motor torques

Control objective: Walk

Image (Observation) Environment

Car Position (Observation)

Objective: Augment traditional controller with

30% performance improvement

▪ Lane Keep Assist

▪ Adaptive Cruise Control

▪ Uncertain, nonlinear environments ▪ Large number of design parameters

Simulations are key in Reinforcement Learning

▪ Many trials/data points required ▪ Use simulations for policy

▪ What is reinforcement learning and why should I care about it?

▪ How do I set up and solve a reinforcement learning problem?

▪ What are some common challenges?

▪ Reference examples for controls,

▪ Documentation written for

▪ Tech Talk video series on

▪ Reinforcement Learning ebooks

You might also like