
Deep Reinforcement Learning for Drones in 3D Realistic Environments


Aqeel Anwar
Oct 23 · 8 min read

Complete code to get you started with implementing deep reinforcement learning in a realistic-looking environment using the Unreal game engine and Python.

Overview:
Last week, I made public a GitHub repository that contains stand-alone, detailed Python code implementing deep reinforcement learning on a drone in a 3D simulated environment. I decided to cover detailed documentation in this article. The 3D environments are made with Epic's Unreal Engine, and Python is used to interface with the environments and carry out deep reinforcement learning using TensorFlow.
Drone navigating in a 3D indoor environment.[4]

At the end of this article, you will have a working platform on your machine capable of implementing deep reinforcement learning on a realistic-looking environment for a drone. You will be able to

Design your custom environments

Interface them with your Python code

Use/modify existing Python code for DRL

For this article, the underlying objective will be autonomous drone navigation. There are no start or end positions; rather, the drone has to navigate for as long as it can without colliding with obstacles. The code can be modified for any user-defined objective.

The complete simulation consists of three major parts

3D Simulation Platform — To create and run simulated environments

Interface Platform — To simulate drone physics and interface between Unreal and
Python

DRL Python code platform — Contains the DRL code based on TensorFlow

There are multiple options to select each of these three platforms. But for this article, we
will select the following

3D simulation Platform — Unreal Engine [1]

Interface Platform — AirSim [2]

DRL Python code platform — DRLwithTL GitHub repository [3]


. . .

The rest of the article will be divided into three steps

Step 1 — Installing the platforms

Step 2 — Running the Python code

Step 3 — Controlling/modifying the code parameters

Step 1 — Installing the three Platforms:


It's advisable to make a new virtual environment for this project and install the dependencies there. The following steps can be taken to download and get started with these platforms.

1. Clone the repository: The repository containing the DRL code can be cloned using

git clone https://github.com/aqeelanwar/DRLwithTL.git


2. Download ImageNet weights for AlexNet: When initialized, the DNN uses ImageNet-learned weights for AlexNet instead of random weights. This gives the DNN a better starting point for training and helps with convergence.

The following link can be used to download the imagenet.npy file.

Download imagenet.npy

Once downloaded, create a 'models' folder in the DRLwithTL root folder and place the downloaded file there.

models/imagenet.npy
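
To quickly verify that the weights file is in the right place, you can run a short check like the one below. This is an optional sanity check and not part of the repository; the allow_pickle flag is an assumption based on such weight dumps usually being stored as pickled dictionaries.

# Optional sanity check that the ImageNet weights were placed correctly.
# allow_pickle=True is needed if the file stores a Python dict of layer weights.
import os
import numpy as np

weights_path = 'models/imagenet.npy'
assert os.path.isfile(weights_path), 'imagenet.npy not found in the models folder'
weights = np.load(weights_path, allow_pickle=True)
print('Loaded weights object of type:', type(weights))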

3. Install required packages: The provided requirements.txt file can be used to install all the required packages. Use the following commands

cd DRLwithTL
pip install -r requirements.txt

This will install the required packages in the activated Python environment.

4. Install Epic Unreal Engine: You can follow the guidelines in the link below to install Unreal Engine on your platform.

Instructions on installing Unreal engine

5. Install AirSim: AirSim is an open-source Unreal Engine plugin developed by Microsoft that provides physically and visually realistic simulations for agents (drones and cars). To interface between Python and the simulated environment, AirSim needs to be installed. It can be downloaded from the link below

Instructions on installing AirSim
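
Once AirSim is installed and an environment is running, a quick way to confirm that Python can reach the simulator is the snippet below. It only uses the standard AirSim Python client API and is meant as an optional sanity check, not part of the DRLwithTL code.

# Optional sanity check for the Python-to-AirSim connection.
# Requires a packaged environment (or the Unreal editor with AirSim) to be running.
import airsim

client = airsim.MultirotorClient()   # connects to the simulator on localhost by default
client.confirmConnection()           # prints the connection status in the terminal
print('Connected to AirSim successfully')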

Once everything is installed properly, we can move on to the next step of running the code.
. . .

Step 2 — Running DRLwithTL-Sim code:


Once you have the required packages and software downloaded and running, you can
take the following steps to run the code

Create/Download a simulated environment


You can either manually create your environment using Unreal Engine, or download one of the sample environments from the links below and run it.

Indoor Long Environment

Indoor Condo Environment

(more packaged environments will be added soon)

The links above download packaged versions of the environments. Run the indoor_long.exe file to launch the Indoor Long environment. If you have trouble running the environment, make sure the settings.json file in Documents/AirSim has been configured properly. You can use the F, M, and backslash keys to change the camera view in the environment. Keys 1, 2, 3, and 0 can be used to view the FPV, segmentation map, and depth map, and to toggle the sub-window views.
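
If you need to create the settings file from scratch, a minimal configuration for multirotor mode is sketched below. The field values shown are common defaults rather than values taken from the DRLwithTL repository, so adjust them for your AirSim version if needed.

# Writes a minimal multirotor settings.json to Documents/AirSim (sketch only).
import json, os

settings = {
    "SettingsVersion": 1.2,   # settings schema version expected by recent AirSim releases
    "SimMode": "Multirotor"   # simulate a drone rather than a car
}

path = os.path.join(os.path.expanduser("~"), "Documents", "AirSim", "settings.json")
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
    json.dump(settings, f, indent=2)
print("Wrote", path)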

Edit the configuration file (Optional)


The RL parameters for the DRL simulation can be set using the provided config file and are explained in the last section.

cd DRLwithTL\configs
notepad config.cfg   # on Windows

Run the Python code


The DRL code can be started using the following commands

cd DRLwithTL
python main.py

Running main.py carries out the following steps

1. Attempt to load the config file

2. Attempt to connect with the Unreal Engine (the indoor_long environment must be running for Python to connect with the environment; otherwise a connection-refused warning will appear, and the code won't proceed until a connection is established)

3. Attempt to create two instances of the DNN (Double DQN is being used) and
initialize them with the selected weights.

4. Attempt to initialize the PyGame screen for the user interface

5. Start the DRL algorithm

At this point, the drone can be seen moving around in the environment collecting data points. The block diagram below shows the DRL algorithm used.

Block diagram of DRL Training and associated segments
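
For readers who prefer code to diagrams, the loop below is a heavily simplified sketch of the DQN-style training cycle shown in the block diagram. The class and method names are illustrative placeholders, not the actual DRLwithTL implementation; only the config parameter names (max_iters, wait_before_train, train_interval, batch_size, update_target_interval) come from this article.

# Simplified sketch of the training loop; names other than the config parameters are placeholders.
import random

def train(env, agent, target_agent, buffer, cfg):
    state = env.reset()
    for iteration in range(cfg.max_iters):
        # epsilon-greedy action selection: explore early, exploit later
        if random.random() > agent.epsilon(iteration):
            action = agent.best_action(state)
        else:
            action = env.sample_random_action()

        next_state, reward, crashed = env.step(action)
        buffer.add((state, action, reward, next_state, crashed))

        # train on a minibatch sampled from the experience replay buffer
        if iteration > cfg.wait_before_train and iteration % cfg.train_interval == 0:
            batch = buffer.sample(cfg.batch_size)
            agent.train_on_batch(batch, target_agent)  # Double DQN: targets come from the second network

        # periodically switch between the two Q-networks (Double DQN)
        if iteration % cfg.update_target_interval == 0:
            agent, target_agent = target_agent, agent

        state = env.reset() if crashed else next_state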

Viewing learning parameters using TensorBoard


During the simulation, RL parameters such as epsilon, learning rate, average Q values, loss, and return can be viewed on TensorBoard. The path of the TensorBoard log files depends on the env_type, env_name, and train_type set in the config file and is given by
models/trained/<env_type>/<env_name>/Imagenet/ # Generic path
models/trained/Indoor/indoor_long/Imagenet/ # Example path

Once you have identified where the log files are stored, the following commands can be used in the terminal to launch TensorBoard.

cd models/trained/Indoor/indoor_long/Imagenet/
tensorboard --logdir <train_type> # Generic
tensorboard --logdir e2e # Example

The terminal will display a local URL that can be opened in any browser, and the TensorBoard dashboard will appear, plotting the DRL parameters at run time.

Run-time controls using PyGame screen


DRL is notorious for being data hungry. For a complex task such as autonomous drone navigation in a realistic-looking environment using only the front camera, the simulation can take hours of training (typically 8 to 12 hours on a GTX 1080 GPU) before the DRL converges. If you feel you need to change a few DRL parameters in the middle of the simulation, you can do so using the PyGame screen that appears during the simulation. This can be done using the following steps

1. Change the config file to reflect the modifications (for example, decrease the learning rate) and save it.

2. Select the PyGame screen and hit 'backspace'. This will pause the simulation.

3. Hit the 'L' key. This will load the updated parameters and print them on the terminal.

4. Hit the ‘backspace’ key to resume the simulation.

Right now the simulation only updates the learning rate. Other variables can be updated too by editing the check_user_input module in the aux_function.py file at the following lines.

Editing the check_user_input module to update other parameters too

The cfg variable at line 187 holds all the updated parameters; you only need to assign it to the corresponding variable and return the value for it to take effect.
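
As a hedged illustration of that idea (the section and key names below are assumptions for this sketch, not the repository's actual code), a helper that re-reads the config and hands back more than just the learning rate could look roughly like this:

# Illustrative sketch only; the real check_user_input in aux_function.py differs.
# The pattern: on the 'L' key press, re-read config.cfg and return every
# parameter you want to hot-swap, then assign the returned values to the live variables.
import configparser

def reload_parameters(cfg_path, lr, epsilon):
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    section = parser['simulation_params']   # section name is an assumption
    lr = section.getfloat('learning_rate', fallback=lr)
    epsilon = section.getfloat('epsilon_saturation', fallback=epsilon)
    print('Updated parameters -> learning_rate:', lr, 'epsilon_saturation:', epsilon)
    return lr, epsilon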

. . .

Step 3 — Control/Modify Parameters in DRLwithTL-Sim:


The code gives you control to

1. Change the DRL configurations

2. Change the Deep Neural Network (DNN)

3. Modify the drone action space

1. Change the DRL configurations:


The provided config file can be used to set the DRL parameters before starting the simulation.

Config file used for DRL and sample values

num_actions: Number of actions in the action space. The code uses a perception-based action space [4], dividing the camera frame into a grid of sqrt(num_actions) x sqrt(num_actions) bins.

train_type: Determines the number of layers to be trained in the DNN. The supported values are e2e, last4, last3, and last2. More values can be defined if needed.

wait_before_train: Sets the iteration at which training should begin. The simulation collects this many data points before it starts the training phase.

max_iters: Determines the maximum number of iterations used for DRL. The simulation stops once this many iterations have been completed.

buffer_len: Sets the size of the experience replay buffer. The simulation keeps collecting data points and storing them in the replay buffer. Data points are sampled from this buffer and used for training.

batch_size: Determines the batch size in one training iteration.

epsilon_saturation: An epsilon-greedy method is used to transition from the exploration to the exploitation phase. As the number of iterations approaches this value, epsilon approaches 0.9, i.e. 90% of actions are predicted through the DNN (exploitation) and only 10% are random (exploration).

crash_threshold: This value is used along with the depth map to determine when
the drone is considered to be virtually crashed. When the average depth to the
closest obstacle in the center dynamic window on the depth map falls below this
value, a reward of -1 is assigned to the data-tuple.

Q_clip: If set to True, Q values beyond a certain magnitude are clipped, which helps the DRL converge.

train_interval: This value determines how often training happens. For example, if
set to 3, training happens after every 3 iterations.

update_target_interval: The simulation uses a Double DQN approach to help the DRL loss converge. update_target_interval determines how often the simulation switches between the two Q-networks.

dropout_rate: Determines the probability with which connections are dropped out to avoid overfitting.

switch_env_steps: Determines how often the drone changes its initial positions.
These initial positions are set in environments/initial_positions.py under the
corresponding environment name.

epsilon_model: linear or exponential
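
For reference, a config.cfg containing the parameters above might look like the listing below. The section name and sample values are illustrative assumptions (the actual defaults ship with the repository); env_type and env_name are included because the TensorBoard log path described earlier depends on them.

# configs/config.cfg -- illustrative sample values only
[simulation_params]
env_type = Indoor
env_name = indoor_long
num_actions = 25
train_type = e2e
wait_before_train = 5000
max_iters = 150000
buffer_len = 10000
batch_size = 32
learning_rate = 2e-6
epsilon_saturation = 100000
epsilon_model = exponential
crash_threshold = 1.3
Q_clip = True
train_interval = 2
update_target_interval = 40000
dropout_rate = 0.1
switch_env_steps = 2000000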


2. Change the DNN topology:
The DNN used for mapping the states to their Q values can be modified in the following Python file.

network/network.py #Location of DNN

Different DNN topologies can be defined in this Python file as classes. The code comes with three different versions of a modified AlexNet network. More networks can be defined according to user needs. Once a new network is defined, the network/agent.py file can be modified to use the required network on line 30, as shown below

1 class DeepAgent():
2 def __init__(self, input_size, num_actions, client, env_type, train_fc, network_path
3 print('------------------------------ ' +str(name)+ ' --------------------------
4 self.g = tf.Graph()
5 self.iter=0
6 with self.g.as_default():
7
8 self.stat_writer = tf.summary.FileWriter(network_path+'return_plot')
9 # name_array = 'D:/train/loss'+'/'+name
10 self.loss_writer = tf.summary.FileWriter(network_path+'loss/'+name)
11 self.env_type=env_type
12 self.client=client
13 self.input_size = input_size
14 self.num_actions = num_actions
15
16 #Placeholders
17 self.batch_size = tf.placeholder(tf.int32, shape=())
18 self.learning_rate = tf.placeholder(tf.float32, shape=())
19 self.X1 = tf.placeholder(tf.float32, [None, input_size, input_size, 3], name
20
21 #self.X = tf.image.resize_images(self.X1, (227, 227))
22
23
24 self.X = tf.map_fn(lambda frame: tf.image.per_image_standardization(frame),
25 self.target = tf.placeholder(tf.float32, shape = [None], name='Qvals')
26 self.actions= tf.placeholder(tf.int32, shape = [None], name='Actions')
27
28 initial_weights = 'imagenet'
29 initial_weights = 'models/weights/weights.npy'
30 self.model = AlexNetDuel(self.X, num_actions, train_fc)
31
32 self.predict = self.model.output
33 ind = tf.one_hot(self.actions, num_actions)
34 pred_Q = tf.reduce_sum(tf.multiply(self.model.output, ind), axis=1)
35 self.loss = huber_loss(pred_Q, self.target)
36 self.train = tf.train.AdamOptimizer(learning_rate=self.learning_rate, beta1=
37
38 self.sess = tf.InteractiveSession()
39 tf.global_variables_initializer().run()
40 tf.local_variables_initializer().run()
41 self.saver = tf.train.Saver()
42
43 self.sess.graph.finalize()

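If you want to plug in your own topology, a new class in network/network.py only needs to expose what line 30 above relies on: it is constructed with the input tensor, num_actions, and train_fc, and it provides an output tensor with one Q value per action. The sketch below is illustrative only; the layer sizes and names are assumptions modeled on how AlexNetDuel is used, not code from the repository.

# Illustrative custom topology for network/network.py (TF1-style layers, as in the repo).
import tensorflow as tf

class SimpleConvNet:
    def __init__(self, X, num_actions, train_fc):
        # X: batch of input images; train_fc (which layers to train) is ignored in this sketch
        conv1 = tf.layers.conv2d(X, filters=32, kernel_size=5, strides=2, activation=tf.nn.relu)
        conv2 = tf.layers.conv2d(conv1, filters=64, kernel_size=3, strides=2, activation=tf.nn.relu)
        flat = tf.layers.flatten(conv2)
        fc1 = tf.layers.dense(flat, 512, activation=tf.nn.relu)
        # agent.py expects the model to expose .output with one Q value per action
        self.output = tf.layers.dense(fc1, num_actions, name='q_values')

Line 30 of network/agent.py would then become, for example, self.model = SimpleConvNet(self.X, num_actions, train_fc).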


3. Modify the drone action space:
The current version of the code supports a perception-based action space. Changing the num_actions parameter in the config file changes the number of bins the front-facing camera frame is divided into.
Perception-based action space — the default action space used in the DRLwithTL code [4]

If an entirely different type of action space needs to be used, the user can define it by modifying the following module

Module: take_action
Location: network/agent.py

If modified, this module should map the action number (say 0, 1, 2, …, num_actions−1) to a corresponding yaw and pitch value for the drone.
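
For the default perception-based action space, this mapping can be pictured as picking a cell in a sqrt(num_actions) x sqrt(num_actions) grid over the camera frame and steering toward it. The sketch below shows one possible mapping; the field-of-view values and scaling are assumptions for illustration, not the repository's take_action implementation.

# Illustrative mapping from an action index to yaw/pitch commands.
import math

def action_to_yaw_pitch(action, num_actions, h_fov_deg=80.0, v_fov_deg=60.0):
    grid = int(math.sqrt(num_actions))   # e.g. num_actions = 25 gives a 5x5 grid
    row, col = divmod(action, grid)      # which cell of the camera frame was chosen
    # offset of the chosen cell's center from the image center, in [-0.5, 0.5]
    x = (col + 0.5) / grid - 0.5
    y = (row + 0.5) / grid - 0.5
    yaw_deg = x * h_fov_deg              # steer left/right toward the cell
    pitch_deg = -y * v_fov_deg           # steer up/down toward the cell
    return yaw_deg, pitch_deg

# Example: the center cell of a 5x5 grid (action 12) maps to (0.0, 0.0), i.e. fly straight
print(action_to_yaw_pitch(12, 25))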

Summary:
This article aimed to get you started with a working platform for deep reinforcement learning for a drone in a realistic 3D environment. The article also points out the parts of the code that can be modified according to user needs. The complete platform in action can be seen in paper [4].

References:
1. https://www.unrealengine.com

2. https://github.com/microsoft/airsim

3. https://github.com/aqeelanwar/DRLwithTL.git

4. http://arxiv.org/abs/1910.05547
