You are on page 1of 39

Reinforcement Learning Workflows for AI

Key Takeaways

▪ What is reinforcement learning and why should I care about it?

▪ How do I set up and solve a reinforcement learning problem?

▪ What are some common challenges?

2
Why Should You Care About Reinforcement Learning?

3
One Approach Could Be…

Joint trajectory generation Low-level control


State estimation
(e.g. MPC) (e.g. PID)

Measurements Motor torques

4
Any Alternatives?

Measurements Black Box Motor torques


Controller

5
Applications of Reinforcement Learning

6
What is reinforcement learning?
Type of machine learning that trains an ‘agent’ through trial & error
interactions with an environment

7
Reinforcement Learning vs Machine Learning vs Deep Learning

8
Reinforcement Learning vs Machine Learning vs Deep Learning

9
Reinforcement Learning vs Machine Learning vs Deep Learning

10
Reinforcement Learning vs Machine Learning vs Deep Learning

11
Reinforcement Learning vs Machine Learning vs Deep Learning

Deep Learning

What about deep learning?

Complex reinforcement learning problems typically need deep neural networks


[Deep Reinforcement Learning]

12
How does reinforcement learning training work?

Analogies with pet training


Let’s try
this…
AGENT
OBSERVATIONS ACTIONS
Policy

Policy update

Reinforcement
Yay!
Learning
Here is
Algorithm
your treat!
Fetch!

REWARD

ENVIRONMENT
13
Reinforcement Learning Concepts
Training a self-driving car

▪ Vehicle’s computer…
AGENT (agent)
OBSERVATIONS ACTIONS ▪ is reading sensor measurements from LIDAR, cameras,…
Policy (observations)
Policy update ▪ that represent road conditions, vehicle position,…
Reinforcement (environment)
Learning ▪ and generates steering, braking, throttle commands,…
Algorithm
(action)
▪ based on an internal state-to-action mapping…
REWARD
(policy)
▪ that tries to optimize, e.g., lap time & fuel efficiency…
(reward).
ENVIRONMENT
▪ The policy is updated through repeated trial-and-error by a
reinforcement learning algorithm

14
Reinforcement Learning Concepts
Training a self-driving car

After training, only trained


policy is needed ▪ Vehicle’s computer uses the final state-to-action
mapping… (policy)
▪ to generate steering, braking, throttle commands,…
(action)
POLICY
▪ based on sensor readings from LIDAR, cameras,…
(observations)
▪ that represent road conditions, vehicle position,…
(environment).
OBSERVATIONS ACTIONS

By definition, this trained policy is


optimizing lap time & fuel efficiency
ENVIRONMENT

15
Reinforcement Learning vs Controls
Control system Reinforcement learning system
AGENT

+ ERROR
CONTROLLER PLANT
OBSERVATIONS
Policy
ACTION

REFERENCE MANIPULATED Policy update

- VARIABLE Reinforcement
Learning
Algorithm

MEASUREMENT
REWARD

ENVIRONMENT

Adaptation mechanism RL Algorithm


Error/Cost function Reward
Manipulated variable Action
Measurement Observation
Plant Environment
Controller Policy
Reinforcement learning has parallels to control system design

16
Policy Representation and Deep Learning

Representation options
▪ Look-up table
▪ Polynomials

Observations 3 Next action

1 A
1 2 3 4 5

Look-up tables do not scale well

17
Policy Representation and Deep Learning

Representation options
▪ Look-up table
▪ Polynomials
▪ (Deep) neural networks Deep neural network policy

Observations Next action


(camera frame, sensors, …)

Neural networks allow representation of complex policies

18
How do I set up and solve
a reinforcement learning problem?

19
Reinforcement Learning Workflow
▪ Simulation models or real hardware ▪ Deep network? Table? Polynomial?

▪ Virtual models are safer and cheaper


▪ Select training algorithm
▪ Tune hyperparameters
▪ Trained policy is a standalone
function

Environment Reward Policy Agent Training Deployment


representation

▪ Large number of simulations needed


▪ Numerical value that evaluates goodness of policy
▪ Parallel & GPU computing can speed up training
▪ Reward shaping can be challenging
▪ Training could still take hours or days

20
Reinforcement Learning Workflow

Environment Reward Policy Agent Training Deployment


representation

21
Reinforcement Learning Toolbox
Introduced in
▪ Built-in and custom reinforcement learning algorithms

▪ Environment modeling in MATLAB and Simulink


– Existing scripts and models can be reused

▪ Deep Learning Toolbox support for representing policies

▪ Training acceleration with Parallel Computing Toolbox and


MATLAB Parallel Server

▪ Deployment of trained policies with GPU Coder and


MATLAB Coder

▪ Reference examples for getting started

22
Example: Walking Robot

Environment Reward Policy Agent Training Deployment

Measurements Motor torques

Control objective: Walk


on a straight line

23
Creating the Environment

24
Reward Shaping

25
26
Creating the Agent

27
Training the Agent

28
29
Applications of Reinforcement Learning

30
Autonomous Driving Example

Image (Observation) Environment

RL Agent

Traditional
+
Controller
Steer, throttle,
Start/Goal brake

Car Position (Observation)

Objective: Augment traditional controller with


reinforcement learning to improve lap time

31
RL Agent

Reward

Vehicle Model

Traditional
Controller

32
33
Results
Traditional controller +
Lap time (s) reinforcement learning

Baseline

RL+Baseline

30% performance improvement

34
Reference Applications in Documentation

▪ Controller Design

▪ Robotic Locomotion

▪ Lane Keep Assist

▪ Adaptive Cruise Control

▪ Imitation Learning

35
Pros & Cons of Reinforcement Learning

Pros Cons
▪ No data required before training ▪ Trained policies are hard to verify
(no performance guarantees)
▪ New possibilities with AI for
hard-to-solve problems ▪ Many trials/data points required
(sample inefficient)
▪ Complex end-to-end solutions can – Training with real hardware can be
be developed expensive and dangerous

▪ Uncertain, nonlinear environments ▪ Large number of design parameters


can be used – Reward signal
– Network architectures
– Training Hyperparameters

Simulations are key in Reinforcement Learning

36
How Can MATLAB and Simulink Help?

Challenges
▪ Trained policies are hard to verify ▪ Reuse existing code and models for
(no performance guarantees) environments

▪ Many trials/data points required ▪ Use simulations for policy


(sample inefficient) verification
– Training with real hardware can be – Simulate extreme scenarios
expensive and dangerous
▪ Run simulation trials in parallel to
▪ Large number of design parameters accelerate training
– Reward signal
– Network architectures ▪ Consult Reinforcement Learning
– Training Hyperparameters Toolbox examples
– Iterative tuning with simulations

37
Key Takeaways

▪ What is reinforcement learning and why should I care about it?

▪ How do I set up and solve a reinforcement learning problem?

▪ What are some common challenges?

38
Learn More

▪ Reference examples for controls,


robotics, and autonomous system
applications

▪ Documentation written for


engineers and domain experts

▪ Tech Talk video series on


Reinforcement Learning concepts

▪ Reinforcement Learning ebooks


available at mathworks.com

39

You might also like