Complete code to get you started with implementing deep reinforcement learning in a
realistic-looking environment using the Unreal Engine and Python.
Overview:
Last week, I made public a GitHub repository that contains a stand-alone, detailed
Python code implementing deep reinforcement learning on a drone in a 3D simulated
environment. I decided to provide detailed documentation for it in this article. The 3D
environments are made with the Epic Unreal Engine, and Python is used to interface with
the environments and carry out deep reinforcement learning using TensorFlow.
Drone navigating in a 3D indoor environment.[4]
By the end of this article, you will have a working platform on your machine capable of
implementing deep reinforcement learning on a realistic-looking environment for a
drone.
For this article, the underlying objective will be drone autonomous navigation. There are
no start or end positions; rather, the drone has to navigate for as long as it can without
colliding with obstacles. The code can be modified for any user-defined objective.
The setup consists of three platforms:
Simulation Platform — To create and render the 3D environments
Interface Platform — To simulate drone physics and interface between Unreal and
Python
DRL python code Platform — Contains the DRL code based on TensorFlow
There are multiple options for each of these three platforms, but for this article we will
select Unreal Engine [1] for simulation, AirSim [2] as the interface, and DRLwithTL [3]
for the DRL code.
1. Clone the repository: The repository containing the DRL code can be cloned using the
command below.
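git clone https://github.com/aqeelanwar/DRLwithTL.git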
Download imagenet.npy
Once downloaded, create a folder 'models' in the DRLwithTL root folder, and place the
downloaded file there:
models/imagenet.npy
2. Install required packages:
cd DRLwithTL
pip install -r requirements.txt
This will install the required packages into the activated Python environment.
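If no Python environment is activated yet, a generic way to create and activate one (not
specific to this repository) is:
python -m venv drl_env
drl_env\Scripts\activate     # Windows
source drl_env/bin/activate  # Linux/macOS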
3. Install Epic Unreal Engine: You can follow the guidelines in the link below [1] to
install Unreal Engine on your platform.
Once everything is installed properly, we can move on to the next step of running the
code.
. . .
The link above will download the packaged version of the Indoor Long environment. Run
the indoor_long.exe file to start the environment. If you have trouble running the
environment, make sure your settings.json file in Documents/AirSim has been
configured properly (a minimal example is sketched below). You can use the keys F, M,
and backslash to change the camera view in the environment. Also, the keys 1, 2, 3, and 0
can be used to view the FPV, segmentation map, and depth map, and to toggle the
sub-window views.
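For reference, a minimal settings.json for drone (multirotor) simulation in AirSim looks
like the following; these two fields are standard AirSim settings, though your file may
contain additional entries:
{
  "SettingsVersion": 1.2,
  "SimMode": "Multirotor"
}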
cd DRLwithTL\configs
notepad config.cfg  # for Windows
2. Attempt to connect with the Unreal Engine (the indoor_long environment must be
running for Python to connect with the environment; otherwise a connection-refused
warning will appear, and the code won't proceed until a connection is established)
3. Attempt to create two instances of the DNN (a Double DQN is being used) and
initialize them with the selected weights.
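These steps occur once the training script is launched. Assuming the repository's entry
point is main.py (a hypothetical name here; check the repository's README for the
actual file), the run command would be:
cd DRLwithTL
python main.py  # entry-point name assumed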
At this point, the drone can be seen moving around in the environment collecting data-
points. The block diagram below shows the DRL algorithm used.
Once you have identified where the log files are stored, the following command can be
used in the terminal to activate TensorBoard.
cd models/trained/Indoor/indoor_long/Imagenet/
tensorboard --logdir <train_type> # Generic
tensorboard --logdir e2e # Example
The terminal will display a local URL that can be opened in any browser, and the
TensorBoard display will appear, plotting the DRL parameters at run time.
1. Update the desired parameter in the config file and save it.
2. Select the Pygame screen, and hit 'backspace'. This will pause the simulation.
3. Hit the 'L' key. This will load the updated parameters and print them on the
terminal.
Right now the simulation only updates the learning rate. Other variables can be updated
too by editing the aux_function.py file, in the check_user_input module, at the following
lines. The cfg variable at line 187 holds all the updated parameters; you only need to
assign the required field to the corresponding variable and return it for the change to
take effect, as in the sketch below.
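A hypothetical sketch of the pattern (field names on cfg are illustrative, and the repo's
actual check_user_input signature may differ):
# Inside check_user_input in aux_function.py: cfg holds the re-parsed
# config file. Assign the fields you want updated at run time and
# return them so the caller picks up the new values.
def apply_updated_params(cfg, lr, epsilon):
    lr = cfg.lr            # learning rate: the only value updated originally
    epsilon = cfg.epsilon  # illustrative extra parameter to pass through
    return lr, epsilon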
. . .
num_actions: Number of actions in the action space. The code uses a perception-based
action space [4], dividing the camera frame into a grid of
sqrt(num_actions) × sqrt(num_actions) bins (e.g., num_actions = 25 gives a 5 × 5 grid).
max_iters: Determines the maximum number of iterations used for DRL. The
simulation stops once this many iterations have been completed.
buffer_len: Sets the size of the experience-replay buffer. The simulation keeps
collecting data-points and stores them in the replay buffer; data-points are sampled
from this buffer and used for training.
crash_threshold: Used along with the depth map to determine when the drone is
considered to have virtually crashed. When the average depth to the closest obstacle,
within the center dynamic window on the depth map, falls below this value, a reward of
-1 is assigned to the data-tuple.
Q_clip: If set to True, Q values beyond a certain magnitude are clipped. This helps with
the convergence of DRL.
train_interval: This value determines how often training happens. For example, if
set to 3, training happens after every 3 iterations.
dropout_rate: Determines how often connections are dropped out to avoid overfitting.
switch_env_steps: Determines how often the drone changes its initial position.
These initial positions are set in environments/initial_positions.py under the
corresponding environment name. A sample config illustrating these parameters is
sketched after this list.
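Assuming an INI-style layout for config.cfg (the section name and all values below are
illustrative, not the repo's defaults), these parameters might look like:
[simulation_params]
num_actions      = 25
max_iters        = 150000
buffer_len       = 10000
crash_threshold  = 1.3
Q_clip           = True
train_interval   = 2
dropout_rate     = 0.1
switch_env_steps = 2000000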
Different DNN topologies can be defined in this Python file as classes. The code comes
with three different versions of a modified AlexNet network. More networks can be
defined according to user needs if required. Once a new network is defined, the
network/agent.py file can be modified to use the required network on line 30, as shown
below.
1 class DeepAgent():
2     def __init__(self, input_size, num_actions, client, env_type, train_fc, network_path, name):
3         print('------------------------------ ' + str(name) + ' ------------------------------')
4         self.g = tf.Graph()
5         self.iter = 0
6         with self.g.as_default():
7
8             self.stat_writer = tf.summary.FileWriter(network_path + 'return_plot')
9             # name_array = 'D:/train/loss'+'/'+name
10             self.loss_writer = tf.summary.FileWriter(network_path + 'loss/' + name)
11             self.env_type = env_type
12             self.client = client
13             self.input_size = input_size
14             self.num_actions = num_actions
15
16             # Placeholders
17             self.batch_size = tf.placeholder(tf.int32, shape=())
18             self.learning_rate = tf.placeholder(tf.float32, shape=())
19             self.X1 = tf.placeholder(tf.float32, [None, input_size, input_size, 3], name='States')
20
21             # self.X = tf.image.resize_images(self.X1, (227, 227))
22
23
24             self.X = tf.map_fn(lambda frame: tf.image.per_image_standardization(frame), self.X1)
25             self.target = tf.placeholder(tf.float32, shape=[None], name='Qvals')
26             self.actions = tf.placeholder(tf.int32, shape=[None], name='Actions')
27
28             initial_weights = 'imagenet'
29             initial_weights = 'models/weights/weights.npy'
30             self.model = AlexNetDuel(self.X, num_actions, train_fc)
31
32             self.predict = self.model.output
33             ind = tf.one_hot(self.actions, num_actions)
34             pred_Q = tf.reduce_sum(tf.multiply(self.model.output, ind), axis=1)
35             self.loss = huber_loss(pred_Q, self.target)
36             self.train = tf.train.AdamOptimizer(learning_rate=self.learning_rate, beta1=0.9).minimize(self.loss)
37
38             self.sess = tf.InteractiveSession()
39             tf.global_variables_initializer().run()
40             tf.local_variables_initializer().run()
41             self.saver = tf.train.Saver()
42
43             self.sess.graph.finalize()
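For example, if you have defined a new network class (here called MyNet, a hypothetical
name) alongside the AlexNet variants, line 30 would change to something like:
self.model = MyNet(self.X, num_actions, train_fc)  # hypothetical custom network class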
If an entirely different type of action space needs to be used, the user can define it by
modifying the following module:
Module: take_action
Location: network/agent.py
If modified, this module should map the action number (say 0, 1, 2, …, num_actions - 1)
to a corresponding yaw and pitch value for the drone, as in the sketch below.
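A minimal sketch of such a mapping, assuming the perception-based grid action space
described above (the function signature and the angle scaling are illustrative
assumptions, not the repo's actual implementation):
import math

# Hypothetical sketch: map an action index to a bin in a
# sqrt(num_actions) x sqrt(num_actions) grid over the camera frame,
# then convert the bin position into yaw and pitch commands.
def take_action(action, num_actions):
    grid = int(math.sqrt(num_actions))         # e.g. 25 actions -> 5x5 grid
    row, col = divmod(action, grid)            # selected bin in the frame
    yaw_deg = (col - (grid - 1) / 2) * 10.0    # horizontal offset -> yaw (scale assumed)
    pitch_deg = ((grid - 1) / 2 - row) * 10.0  # vertical offset -> pitch (scale assumed)
    return yaw_deg, pitch_deg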
Summary:
This article aimed to get you started with a working platform for deep reinforcement
learning in a realistic 3D environment. It also pointed out the parts of the code that can
be modified according to user needs. The complete code in action can be seen in
paper [4].
References:
1. https://www.unrealengine.com
2. https://github.com/microsoft/airsim
3. https://github.com/aqeelanwar/DRLwithTL.git
4. http://arxiv.org/abs/1910.05547