
AWS DEEPRACER

Sometime in June, or was it July, I applied for the AWS Machine Learning Engineer Foundations Nanodegree scholarship on Udacity. I, alongside many others, got enrolled into the program. The program took us through the basics of machine learning, partly using AWS offerings like DeepComposer, DeepLens and the topic of this blog post, DeepRacer. I was most fascinated by DeepRacer and enjoyed playing around with it. I will probably touch a bit on DeepComposer and DeepLens at the end.

Before AWS DeepRacer is introduced, the concept of reinforcement learning has to be understood, because the way DeepRacer works is built on it. Reinforcement learning is a type of machine learning in which an agent learns by taking actions in an environment. An action earns a reward when the agent behaves correctly; otherwise, the agent gets penalized. This form of learning makes no use of labelled data.

Let me use an example to explain this further. I am not much of a gamer, but from time to time I play Call of Duty and PUBG. I remember when I started playing PUBG, an online shooting game. It has different maps that allow you to pick a place you want to play in. One of my favourites is Sanhok, an imitation of a rainforest area. Let me get to the point. One thing I did not understand was the red zone: why was it there? After a few plays, I started to realise that the longer I stayed in the red zone, the faster I died. Soon enough, I learned that staying away from the red zone meant not dying by the red zone. Let's connect this to reinforcement learning. I am an agent in this case. An agent explores the environment and acts on it. The map, Sanhok, is the environment in which the agent is acting. What I do when I am in the red zone can then be said to be my action. An action is the move taken by an agent in an environment. When I die or lose a chunk of my health, that can be viewed as a penalty. When I do not get killed by the red zone, it can be seen as a reward. Putting it together: an agent is dropped into an environment and allowed to learn within that environment based on the rewards or penalties assigned to the actions it takes there. It is like a child putting his hand in a fire and realising from the pain that he shouldn't.
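To make that loop concrete, here is a minimal sketch in Python. The red-zone world, the actions and the reward values are all made up for illustration:

```python
import random

# Toy "red zone" world: one situation, two possible actions.
ACTIONS = ["stay_in_red_zone", "leave_red_zone"]

def environment_feedback(action):
    """Hypothetical environment: penalize staying in the red zone, reward leaving."""
    return -10 if action == "stay_in_red_zone" else 1

# The agent keeps a running value estimate for each action and
# gradually comes to prefer the action with the higher estimate.
values = {a: 0.0 for a in ACTIONS}
learning_rate = 0.1

for episode in range(100):
    # Explore occasionally, otherwise exploit the best-known action.
    if random.random() < 0.2:
        action = random.choice(ACTIONS)
    else:
        action = max(values, key=values.get)
    reward = environment_feedback(action)
    # Nudge the estimate for that action toward the observed reward.
    values[action] += learning_rate * (reward - values[action])

print(values)  # "leave_red_zone" ends up with the higher value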

To get started on AWS DeepRacer, you obviously need an account. Also, if you are on the free tier, as a new user you have 10 free hours to explore this AWS feature, after which you get charged. The ten hours were quite an experience for me, though. AWS also provides tutorials and extensive documentation to help you better understand the features.

AWS also offers a short course that walks you through DeepRacer's features and hyperparameters.

GET STARTED

Creating a model starts with giving your model a name; if you wish, you may add a description. The agent here is going to be your car. The environment is the track you select to train your agent on, and AWS provides numerous track types to choose from. The track you choose can depend on the type of race you wish to participate in. This can be a time trial, where speed is the main focus. It can also be an object-avoidance race, which means you would probably want your agent to be able to navigate around obstacles. Or it can be a head-to-head race, where you race against another vehicle on a two-lane track.
Selecting parameters

AWS has made two training algorithms available, to be selected based on what you want: PPO (Proximal Policy Optimization) and SAC (Soft Actor-Critic).

PPO: Uses on-policy learning. This means it learns the value function from observations made by the current policy exploring the environment.

SAC: Uses off-policy learning, which means it can also use observations made by previous policies exploring the environment.
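
The practical difference between the two is easiest to see in a small sketch. The stand-in step and act functions below are made up for illustration; this shows only the data flow, not how AWS implements either algorithm:

```python
from collections import deque
import random

# Stand-in environment and policy, just to show the data flow.
def step(action):
    return (action, random.random())  # (action taken, reward observed)

def act():
    return random.choice(["left", "straight", "right"])

# On-policy (PPO-style): update only from transitions the current
# policy just generated, then throw the batch away.
fresh_batch = [step(act()) for _ in range(20)]
# ...the policy update would consume fresh_batch here, exactly once...

# Off-policy (SAC-style): keep a replay buffer and reuse old
# transitions collected by earlier (possibly different) policies.
replay_buffer = deque(maxlen=10_000)
for _ in range(20):
    replay_buffer.append(step(act()))
reused_batch = random.sample(replay_buffer, k=10)
# ...the policy update can draw from replay_buffer many times over...
```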
Alongside the algorithm, training makes use of a handful of hyperparameters:

 Gradient descent batch size (32, 64, 128, 256 or 512)
 Number of epochs (3-10)
 Learning rate
 Entropy
 Discount factor
 Loss type (Huber or MSE)
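
To give a feel for how these fit together, here is one plausible combination written out as a Python dictionary. The values are illustrative mid-range picks, not a recommendation; check the console for the actual defaults and allowed ranges:

```python
# One plausible hyperparameter combination (illustrative only;
# check the DeepRacer console for actual defaults and ranges).
hyperparameters = {
    "batch_size": 64,          # one of 32, 64, 128, 256, 512
    "num_epochs": 10,          # between 3 and 10
    "learning_rate": 0.0003,   # size of each gradient descent step
    "entropy": 0.01,           # added randomness to encourage exploration
    "discount_factor": 0.999,  # how strongly future rewards count
    "loss_type": "huber",      # "huber" or "mse"
}
```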
Define action space

An action space is the set of actions available for an agent to take as it interacts with an environment. In the console, two action spaces are provided as options: the discrete action space and the continuous action space. However, for the SAC algorithm, only continuous is available. A continuous action space lets you define a range of actions for your agent. With a discrete action space, your agent has a limited set of actions to pick from when interacting with the environment. These actions are combinations of steering angle and speed, and their ranges are defined after selecting your preferred action space. The documentation goes deeper into action spaces.

For example, we can define the following ranges for steering angle and speed: [-20 degrees, 20 degrees] and [0.75 m/s, 4 m/s]. This means the policy can explore any combination within these ranges, as opposed to the discrete action space case, where it could only pick from a fixed menu of combinations (three, in the sketch below).
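
To make the two options concrete, here is a rough Python sketch of what each action space boils down to. The exact format AWS stores internally differs; the numbers simply mirror the ranges above:

```python
# Discrete: a fixed menu of (steering, speed) combinations.
# The agent can only ever pick one of these exact actions.
discrete_action_space = [
    {"steering_angle": -20.0, "speed": 1.0},
    {"steering_angle":   0.0, "speed": 4.0},
    {"steering_angle":  20.0, "speed": 1.0},
]

# Continuous: only the ranges are fixed; the agent can output any
# combination inside them, e.g. steering 7.3 degrees at 2.6 m/s.
continuous_action_space = {
    "steering_angle": {"low": -20.0, "high": 20.0},  # degrees
    "speed": {"low": 0.75, "high": 4.0},             # m/s
}
```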
Reward function

If you can recall, a reward is the positive feedback an agent receives for taking a desirable action in its environment.

The reward function here is the block of code that enables your agent to learn. It is written in Python, and AWS provides sample code you can build on. The function receives a dictionary of input parameters describing the car and the track; for example, all_wheels_on_track lets you assign a reward to your agent for keeping all of its wheels on the track. The available input parameters include:

 closest_waypoints
 closest_objects
 distance_from_center
 heading
 is_crashed
 is_left_of_center
 is_offtrack
 is_reversed
 objects_distance
 objects_heading
 objects_left_of_center
 objects_location
 objects_speed
 progress
 speed
 steering_angle
 steps
 track_length
 track_width
 x, y
 waypoints

Doc for code: https://docs.aws.amazon.com/deepracer/latest/developerguide/deepracer-reward-function-input.html#reward-function-input-steering_angle
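
As a starting point, here is a small reward function in the style of AWS's stay-on-track sample. It rewards the agent for keeping all four wheels on the track and staying close to the centre line; the exact weights are my own choice, not AWS's:

```python
def reward_function(params):
    """Reward staying on the track, close to the centre line."""
    all_wheels_on_track = params["all_wheels_on_track"]
    distance_from_center = params["distance_from_center"]
    track_width = params["track_width"]

    reward = 1e-3  # near-zero reward by default (e.g. off track)

    if all_wheels_on_track:
        # Scale the reward by how close the car is to the centre:
        # 1.0 at the centre line, approaching 0 at the track edge.
        reward = 1.0 - (distance_from_center / (track_width / 2.0))
        reward = max(reward, 1e-3)

    return float(reward)
```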
Train

Once you start training, expect the job to take a while (about six minutes, in my case) to initialize before the agent actually begins learning. You can then watch its progress from the console.

Evaluate

Evaluating runs your trained model on a track of your choice for a set number of trials, so you can see how it actually performs before racing it.

Your car is automatically submitted to the DeepRacer league once you are done training and evaluating, and you can then see how it does on the leaderboard. A car can be reused in another race, and cloned too. Below is the leaderboard of one of the cars I trained. It gives the time my car took to reach the finish line; keep in mind that this is a race with other cars on a virtual circuit. It also gives the time difference between my car and the car in first place. Off track counts the number of times your car went off the track before finishing the race.
So far, I am enjoying diving into the various services made available by AWS. AWS also has DeepLens and DeepComposer to help you further understand various machine learning algorithms. Although I enjoyed DeepRacer the most, I do find the other two quite interesting and educative.

AWS DeepComposer is basically a demonstration of generative AI: you play a short melody on a keyboard and generative models turn it into a full composition.

AWS DeepLens is a deep-learning-enabled video camera that lets you run computer vision models on a physical device.




