
Deep Reinforcement Learning for Drones in 3D Realistic Environments


Aqeel Anwar
Oct 23 · 8 min read

Complete code to get you started with implementing deep reinforcement learning in a realistic-looking environment using the Unreal game engine and Python.

Overview:
Last week, I made public a GitHub repository that contains stand-alone, detailed Python code implementing deep reinforcement learning on a drone in a 3D simulated environment. I decided to cover detailed documentation in this article. The 3D environments are made with Epic's Unreal Engine, and Python is used to interface with the environments and carry out deep reinforcement learning using TensorFlow.
Drone navigating in a 3D indoor environment.[4]

At the end of this article, you will have a working platform on your machine capable of implementing deep reinforcement learning on a realistic-looking environment for a drone. You will be able to

Design your custom environments

Interface them with your Python code

Use/modify existing Python code for DRL

For this article, the underlying objective will be autonomous drone navigation. There are no start or end positions; rather, the drone has to navigate for as long as it can without colliding with obstacles. The code can be modified for any user-defined objective.

The complete simulation consists of three major parts

3D Simulation Platform — To create and run simulated environments

Interface Platform — To simulate drone physics and interface between Unreal and
Python

DRL Python code platform — Contains the DRL code based on TensorFlow

There are multiple options to select each of these three platforms. But for this article, we
will select the following

3D simulation Platform — Unreal Engine [1]

Interface Platform — AirSim [2]

DRL Python code platform — DRLwithTL GitHub repository [3]


. . .

The rest of the article will be divided into three steps

Step 1 — Installing the platforms

Step 2 — Running the Python code

Step 3 — Controlling/modifying the code parameters

Step 1 — Installing the three Platforms:


It's advisable to make a new virtual environment for this project and install the dependencies there. The following steps can be taken to download and get started with these platforms.

1. Clone the repository: The repository containing the DRL code can be cloned using

git clone https://github.com/aqeelanwar/DRLwithTL.git


2. Download ImageNet weights for AlexNet: When initialized, the DNN uses ImageNet-learned weights for AlexNet instead of random weights. This gives the DNN a better starting point for training and helps with convergence.

The following link can be used to download the imagenet.npy file.

Download imagenet.npy

Once downloaded, create a 'models' folder in the DRLwithTL root folder and place the downloaded file there.

models/imagenet.npy
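
To quickly verify that the weights file is in the right place, you can run a short check like the one below. This is an optional sanity check and not part of the repository; the allow_pickle flag is an assumption based on such weight dumps usually being stored as pickled dictionaries.

# Optional sanity check that the ImageNet weights were placed correctly.
# allow_pickle=True is needed if the file stores a Python dict of layer weights.
import os
import numpy as np

weights_path = 'models/imagenet.npy'
assert os.path.isfile(weights_path), 'imagenet.npy not found in the models folder'
weights = np.load(weights_path, allow_pickle=True)
print('Loaded weights object of type:', type(weights))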

3. Install required packages: The provided requirements.txt file can be used to install all the required packages. Use the following commands

cd DRLwithTL
pip install -r requirements.txt

This will install the required packages in the activated Python environment.

4. Install Epic Unreal Engine: You can follow the guidelines in the link below to install Unreal Engine on your platform.

Instructions on installing Unreal engine

5. Install AirSim: AirSim is an open-source Unreal Engine plugin developed by Microsoft that provides physically and visually realistic simulations for agents (drones and cars). To interface between Python and the simulated environment, AirSim needs to be installed. It can be downloaded from the link below

Instructions on installing AirSim
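
Once AirSim is installed and an environment is running, a quick way to confirm that Python can reach the simulator is the snippet below. It only uses the standard AirSim Python client API and is meant as an optional sanity check, not part of the DRLwithTL code.

# Optional sanity check for the Python-to-AirSim connection.
# Requires a packaged environment (or the Unreal editor with AirSim) to be running.
import airsim

client = airsim.MultirotorClient()   # connects to the simulator on localhost by default
client.confirmConnection()           # prints the connection status in the terminal
print('Connected to AirSim successfully')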

Once everything is installed properly, we can move on to the next step of running the code.
. . .

Step 2 — Running DRLwithTL-Sim code:


Once you have the required packages and software downloaded and running, you can
take the following steps to run the code

Create/Download a simulated environment


You can either manually create your environment using Unreal Engine, or download one of the sample environments from the links below and run it.

Indoor Long Environment

Indoor Condo Environment

(more packaged environments will be added soon)

The links above download packaged versions of the environments. Run the indoor_long.exe file to launch the Indoor Long environment. If you have trouble running the environment, make sure the settings.json file in Documents/AirSim has been configured properly. You can use the F, M, and backslash keys to change the camera view in the environment. Keys 1, 2, 3, and 0 can be used to view the FPV, segmentation map, and depth map, and to toggle the sub-window views.
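
If you need to create the settings file from scratch, a minimal configuration for multirotor mode is sketched below. The field values shown are common defaults rather than values taken from the DRLwithTL repository, so adjust them for your AirSim version if needed.

# Writes a minimal multirotor settings.json to Documents/AirSim (sketch only).
import json, os

settings = {
    "SettingsVersion": 1.2,   # settings schema version expected by recent AirSim releases
    "SimMode": "Multirotor"   # simulate a drone rather than a car
}

path = os.path.join(os.path.expanduser("~"), "Documents", "AirSim", "settings.json")
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
    json.dump(settings, f, indent=2)
print("Wrote", path)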

Edit the configuration file (Optional)


The RL parameters for the DRL simulation can be set using the provided config file and are explained in the last section.

cd DRLwithTL\configs
notepad config.cfg   # on Windows

Run the Python code


The DRL code can be started using the following commands

cd DRLwithTL
python main.py

Running main.py carries out the following steps

1. Attempt to load the config file

2. Attempt to connect with the Unreal Engine (the indoor_long environment must be running for Python to connect with the environment; otherwise a connection-refused warning will appear, and the code won't proceed until a connection is established)

3. Attempt to create two instances of the DNN (Double DQN is being used) and
initialize them with the selected weights.

4. Attempt to initialize the PyGame screen for the user interface

5. Start the DRL algorithm

At this point, the drone can be seen moving around in the environment collecting data points. The block diagram below shows the DRL algorithm used.

Block diagram of DRL Training and associated segments
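
For readers who prefer code to diagrams, the loop below is a heavily simplified sketch of the DQN-style training cycle shown in the block diagram. The class and method names are illustrative placeholders, not the actual DRLwithTL implementation; only the config parameter names (max_iters, wait_before_train, train_interval, batch_size, update_target_interval) come from this article.

# Simplified sketch of the training loop; names other than the config parameters are placeholders.
import random

def train(env, agent, target_agent, buffer, cfg):
    state = env.reset()
    for iteration in range(cfg.max_iters):
        # epsilon-greedy action selection: explore early, exploit later
        if random.random() > agent.epsilon(iteration):
            action = agent.best_action(state)
        else:
            action = env.sample_random_action()

        next_state, reward, crashed = env.step(action)
        buffer.add((state, action, reward, next_state, crashed))

        # train on a minibatch sampled from the experience replay buffer
        if iteration > cfg.wait_before_train and iteration % cfg.train_interval == 0:
            batch = buffer.sample(cfg.batch_size)
            agent.train_on_batch(batch, target_agent)  # Double DQN: targets come from the second network

        # periodically switch between the two Q-networks (Double DQN)
        if iteration % cfg.update_target_interval == 0:
            agent, target_agent = target_agent, agent

        state = env.reset() if crashed else next_state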

Viewing learning parameters using TensorBoard


During the simulation, RL parameters such as epsilon, learning rate, average Q values, loss, and return can be viewed on TensorBoard. The path of the TensorBoard log files depends on the env_type, env_name, and train_type set in the config file and is given by
models/trained/<env_type>/<env_name>/Imagenet/ # Generic path
models/trained/Indoor/indoor_long/Imagenet/ # Example path

Once you have identified where the log files are stored, the following commands can be used in the terminal to launch TensorBoard.

cd models/trained/Indoor/indoor_long/Imagenet/
tensorboard --logdir <train_type> # Generic
tensorboard --logdir e2e # Example

The terminal will display a local URL that can be opened in any browser, and the TensorBoard dashboard will appear, plotting the DRL parameters at run time.

Run-time controls using PyGame screen


DRL is notorious for being data hungry. For a complex task such as autonomous drone navigation in a realistic-looking environment using only the front camera, the simulation can take hours of training (typically 8 to 12 hours on a GTX 1080 GPU) before the DRL converges. If you feel you need to change a few DRL parameters in the middle of the simulation, you can do so using the PyGame screen that appears during the simulation. This can be done using the following steps

1. Change the config file to reflect the modifications (for example, decrease the learning rate) and save it.

2. Select the PyGame screen and hit 'backspace'. This will pause the simulation.

3. Hit the 'L' key. This will load the updated parameters and print them on the terminal.

4. Hit the ‘backspace’ key to resume the simulation.

Right now the simulation only updates the learning rate. Other variables can be updated too by editing the check_user_input module in the aux_function.py file at the following lines.

Editing the check_user_input module to update other parameters too

The cfg variable at line 187 holds all the updated parameters; you only need to assign it to the corresponding variable and return the value for it to take effect.
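
As a hedged illustration of that idea (the section and key names below are assumptions for this sketch, not the repository's actual code), a helper that re-reads the config and hands back more than just the learning rate could look roughly like this:

# Illustrative sketch only; the real check_user_input in aux_function.py differs.
# The pattern: on the 'L' key press, re-read config.cfg and return every
# parameter you want to hot-swap, then assign the returned values to the live variables.
import configparser

def reload_parameters(cfg_path, lr, epsilon):
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    section = parser['simulation_params']   # section name is an assumption
    lr = section.getfloat('learning_rate', fallback=lr)
    epsilon = section.getfloat('epsilon_saturation', fallback=epsilon)
    print('Updated parameters -> learning_rate:', lr, 'epsilon_saturation:', epsilon)
    return lr, epsilon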

. . .

Step 3 — Control/Modify Parameters in DRLwithTL-Sim:


The code gives you control to

1. Change the DRL configurations

2. Change the Deep Neural Network (DNN)

3. Modify the drone action space

1. Change the DRL configurations:


The provided config file can be used to set the DRL parameters before starting the simulation.

Config file used for DRL and sample values

num_actions: Number of actions in the action space. The code uses a perception-based action space [4], dividing the camera frame into a grid of sqrt(num_actions) x sqrt(num_actions) bins.

train_type: Determines the number of layers to be trained in the DNN. The supported values are e2e, last4, last3, and last2. More values can be defined if needed.

wait_before_train: Sets the iteration at which training should begin. The simulation collects this many data points before it starts the training phase.

max_iters: Determines the maximum number of iterations used for DRL. The simulation stops once this many iterations have been completed.

buffer_len: Sets the size of the experience replay buffer. The simulation keeps collecting data points and storing them in the replay buffer. Data points are sampled from this buffer and used for training.

batch_size: Determines the batch size in one training iteration.

epsilon_saturation: An epsilon-greedy method is used to transition from the exploration to the exploitation phase. As the number of iterations approaches this value, epsilon approaches 0.9, i.e. 90% of actions are predicted through the DNN (exploitation) and only 10% are random (exploration).

crash_threshold: This value is used along with the depth map to determine when
the drone is considered to be virtually crashed. When the average depth to the
closest obstacle in the center dynamic window on the depth map falls below this
value, a reward of -1 is assigned to the data-tuple.

Q_clip: If set to True, Q values beyond a certain magnitude are clipped, which helps the DRL converge.

train_interval: This value determines how often training happens. For example, if
set to 3, training happens after every 3 iterations.

update_target_interval: The simulation uses a Double DQN approach to help the DRL loss converge. update_target_interval determines how often the simulation switches between the two Q-networks.

dropout_rate: Determines the probability with which connections are dropped out to avoid overfitting.

switch_env_steps: Determines how often the drone changes its initial positions.
These initial positions are set in environments/initial_positions.py under the
corresponding environment name.

epsilon_model: linear or exponential
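
For reference, a config.cfg containing the parameters above might look like the listing below. The section name and sample values are illustrative assumptions (the actual defaults ship with the repository); env_type and env_name are included because the TensorBoard log path described earlier depends on them.

# configs/config.cfg -- illustrative sample values only
[simulation_params]
env_type = Indoor
env_name = indoor_long
num_actions = 25
train_type = e2e
wait_before_train = 5000
max_iters = 150000
buffer_len = 10000
batch_size = 32
learning_rate = 2e-6
epsilon_saturation = 100000
epsilon_model = exponential
crash_threshold = 1.3
Q_clip = True
train_interval = 2
update_target_interval = 40000
dropout_rate = 0.1
switch_env_steps = 2000000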


2. Change the DNN topology:
The DNN used for mapping the states to their Q values can be modified in the following Python file.

network/network.py #Location of DNN

Different DNN topologies can be defined in this Python file as classes. The code comes with three different versions of a modified AlexNet network. More networks can be defined according to user needs. Once a new network is defined, the network/agent.py file can be modified to use the required network on line 30, as shown below

1 class DeepAgent():
2 def __init__(self, input_size, num_actions, client, env_type, train_fc, network_path
3 print('------------------------------ ' +str(name)+ ' --------------------------
4 self.g = tf.Graph()
5 self.iter=0
6 with self.g.as_default():
7
8 self.stat_writer = tf.summary.FileWriter(network_path+'return_plot')
9 # name_array = 'D:/train/loss'+'/'+name
10 self.loss_writer = tf.summary.FileWriter(network_path+'loss/'+name)
11 self.env_type=env_type
12 self.client=client
13 self.input_size = input_size
14 self.num_actions = num_actions
15
16 #Placeholders
17 self.batch_size = tf.placeholder(tf.int32, shape=())
18 self.learning_rate = tf.placeholder(tf.float32, shape=())
19 self.X1 = tf.placeholder(tf.float32, [None, input_size, input_size, 3], name
20
21 #self.X = tf.image.resize_images(self.X1, (227, 227))
22
23
24 self.X = tf.map_fn(lambda frame: tf.image.per_image_standardization(frame),
25 self.target = tf.placeholder(tf.float32, shape = [None], name='Qvals')
26 self.actions= tf.placeholder(tf.int32, shape = [None], name='Actions')
27
28 initial_weights = 'imagenet'
29 initial_weights = 'models/weights/weights.npy'
30 self.model = AlexNetDuel(self.X, num_actions, train_fc)
31
32 self.predict = self.model.output
33 ind = tf.one_hot(self.actions, num_actions)
34 pred_Q = tf.reduce_sum(tf.multiply(self.model.output, ind), axis=1)
35 self.loss = huber_loss(pred_Q, self.target)
36 self.train = tf.train.AdamOptimizer(learning_rate=self.learning_rate, beta1=
37
38 self.sess = tf.InteractiveSession()
39 tf.global_variables_initializer().run()
40 tf.local_variables_initializer().run()
41 self.saver = tf.train.Saver()
42
43 self.sess.graph.finalize()

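If you want to plug in your own topology, a new class in network/network.py only needs to expose what line 30 above relies on: it is constructed with the input tensor, num_actions, and train_fc, and it provides an output tensor with one Q value per action. The sketch below is illustrative only; the layer sizes and names are assumptions modeled on how AlexNetDuel is used, not code from the repository.

# Illustrative custom topology for network/network.py (TF1-style layers, as in the repo).
import tensorflow as tf

class SimpleConvNet:
    def __init__(self, X, num_actions, train_fc):
        # X: batch of input images; train_fc (which layers to train) is ignored in this sketch
        conv1 = tf.layers.conv2d(X, filters=32, kernel_size=5, strides=2, activation=tf.nn.relu)
        conv2 = tf.layers.conv2d(conv1, filters=64, kernel_size=3, strides=2, activation=tf.nn.relu)
        flat = tf.layers.flatten(conv2)
        fc1 = tf.layers.dense(flat, 512, activation=tf.nn.relu)
        # agent.py expects the model to expose .output with one Q value per action
        self.output = tf.layers.dense(fc1, num_actions, name='q_values')

Line 30 of network/agent.py would then become, for example, self.model = SimpleConvNet(self.X, num_actions, train_fc).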


3. Modify the drone action space:
The current version of the code supports a perception-based action space. Changing the num_actions parameter in the config file changes the number of bins the front-facing camera frame is divided into.
Perception-based action space — the default action space used in the DRLwithTL code [4]

If an entirely different type of action space needs to be used, the user can define it by modifying the following module

Module: take_action
Location: network/agent.py

If modified, this module should map the action number (say 0, 1, 2, …, num_actions−1) to a corresponding yaw and pitch value for the drone.
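
For the default perception-based action space, this mapping can be pictured as picking a cell in a sqrt(num_actions) x sqrt(num_actions) grid over the camera frame and steering toward it. The sketch below shows one possible mapping; the field-of-view values and scaling are assumptions for illustration, not the repository's take_action implementation.

# Illustrative mapping from an action index to yaw/pitch commands.
import math

def action_to_yaw_pitch(action, num_actions, h_fov_deg=80.0, v_fov_deg=60.0):
    grid = int(math.sqrt(num_actions))   # e.g. num_actions = 25 gives a 5x5 grid
    row, col = divmod(action, grid)      # which cell of the camera frame was chosen
    # offset of the chosen cell's center from the image center, in [-0.5, 0.5]
    x = (col + 0.5) / grid - 0.5
    y = (row + 0.5) / grid - 0.5
    yaw_deg = x * h_fov_deg              # steer left/right toward the cell
    pitch_deg = -y * v_fov_deg           # steer up/down toward the cell
    return yaw_deg, pitch_deg

# Example: the center cell of a 5x5 grid (action 12) maps to (0.0, 0.0), i.e. fly straight
print(action_to_yaw_pitch(12, 25))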

Summary:
This article aimed to get you started with a working platform for deep reinforcement learning for a drone in a realistic 3D environment. The article also points out the parts of the code that can be modified according to user needs. The complete platform in action can be seen in paper [4].

References:
1. https://www.unrealengine.com

2. https://github.com/microsoft/airsim

3. https://github.com/aqeelanwar/DRLwithTL.git

4. http://arxiv.org/abs/1910.05547
