Recent Patents on Computer Science 2019, 12, 1-8
RESEARCH ARTICLE
Cloning Safe Driving Behavior for Self-Driving Cars
Wael Farag*
Electrical Engineering Department, American University of the Middle East, Kuwait, Cairo University, Egypt
Abstract: In this paper, a Convolutional Neural Network (CNN) that learns safe driving behavior and smooth steering manoeuvring is proposed to empower autonomous driving technologies. The training data is collected from a front-facing camera and from the steering commands issued by an experienced driver driving in traffic and on urban roads. This data is then used to train the proposed CNN to perform what is called "Behavioral Cloning". The proposed Behavioral Cloning CNN is named "BCNet", and its deep seventeen-layer architecture was selected after extensive trials. BCNet is trained using the Adam optimization algorithm, a variant of the Stochastic Gradient Descent (SGD) technique. The paper details the development and training process and shows the image processing pipeline used in the development. Extensive simulations show that the proposed approach successfully clones the driving behavior embedded in the training data set.

ARTICLE HISTORY
Received: August 25, 2018
Revised: October 30, 2018
Accepted: October 30, 2018
DOI: 10.2174/2213275911666181106160002
Keywords: Behavioral cloning, convolutional neural network, autonomous driving, machine learning, stochastic gradient descent.
*Address correspondence to this author at the Electrical Engineering Department, American University of the Middle East, Kuwait, Cairo University, Egypt; E-mails: wael.farag@aum.edu.kw, wael.farag@cu.edu.eg
2213-2759/19 $58.00+.00 © 2019 Bentham Science Publishers

1. INTRODUCTION

In the past decade, the automobile industry has shifted towards intelligent vehicles equipped with driving assistance systems [1, 2] and has recently introduced vision systems in high-end cars. Autonomous driving engineers use the vision system (the cameras mounted on the car, including the front-facing ones) to develop many features of future self-driving cars, such as: a) road-lane finding; b) free driving-space finding; c) traffic sign detection and recognition [3, 4]; d) traffic light detection and recognition; and e) road-object detection and tracking. In this paper, it is proposed to use the car's mounted vision system (more specifically, the front-facing camera) to improve the safety and the driving behavior of future self-driving cars.

The main idea is to construct a Convolutional Neural Network (CNN) that is able to learn safe driving manoeuvres from data collected while an expert driver drives on urban roads. The main focus of this paper is to let the proposed CNN map raw pixels from a single front-facing camera directly to the car's steering commands. This end-to-end approach lets the car drive without lane markings on highways and on roads with unclear visual guidance, such as parking lots and unpaved roads [5]. The CNN automatically learns internal representations of the necessary processing pipeline steps, such as detecting useful road features, with only the human steering angle as the training signal. In contrast to an explicit decomposition of the autonomous driving problem into lane-marking detection, path planning, and control, the proposed end-to-end CNN optimizes all processing steps simultaneously.

2. THE CNN ARCHITECTURE

The proposed CNN architecture is a seventeen-layer Behavior Cloning CNN model named “BCNet”. The model is coded using Keras [6] on top of TensorFlow [7] in Python [8]. Fig. (1) illustrates the BCNet architecture, and Table I describes the architecture in detail.

Four drop-out layers are added to prevent over-fitting during training, and the fully connected layers are widened. No pooling layers are used, since this is a regression problem rather than a classification problem. Additionally, all the convolutional layers are sized according to the input image size after normalization and cropping.

3. THE TRAINING DATA SET

The following are the two main sources of data used to construct the training data set for BCNet:
1) Source 1 - Udacity Supplied Data [9]: This collection, with an unzipped size of 365MB, consists of 24,108 images equally divided between center, left and right front camera shots. Each image is 160x320 pixels with 3 RGB color channels. The index of the data is stored in a CSV file that contains 8,036 records.
Table I. BCNet architecture (drop-out rows)
Layer | Type | Size | Parameters
10 | Drop-out | 2,176 | Keep probability: 0.5 → 0.7
12 | Drop-out | 200 | Keep probability: 0.5 → 0.7
14 | Drop-out | 100 | Keep probability: 0.5 → 0.7
16 | Drop-out | 20 | Keep probability: 0.5 → 0.7
2) Source 2 - Simulator Generated Data: This data was collected using the open-source Udacity driving simulator [10]. The recorded data set has an unzipped size of 808MB and consists of 49,851 images equally divided between center, left and right front camera shots. Each image is 160x320 pixels with 3 RGB color channels. The index of the data is stored in a CSV file that contains 16,617 records. The data has been generated by driving the car manually around Track 1 in the simulator several times (~10 times) with driving behavior that is as safe as possible. In particular, including "recovery" data while training is encouraged. This means that data should be captured starting from the point of approaching the edge of the track (perhaps nearly missing a turn and almost driving off the track), and the process of steering the car back toward the center of the track should be recorded, giving the model a chance to learn recovery behavior.
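A minimal sketch of how one CSV record can be expanded into the six training samples described in Section 4 (center, left, right, and their mirrored versions) is given below. It assumes the Udacity driving-log column order (center, left and right image paths, then the steering angle) and a hypothetical side-camera steering correction `SIDE_CAMERA_CORRECTION = 0.2`; the paper does not state whether or how the left and right camera angles were corrected. The helper names `expand_record` and `load_samples` are illustrative only.

```python
import csv
import io

# Hypothetical steering correction for the side cameras; not given in the paper.
SIDE_CAMERA_CORRECTION = 0.2


def expand_record(row, correction=SIDE_CAMERA_CORRECTION):
    """Turn one driving-log row into 6 (image_path, angle, flipped) samples."""
    # Assumed column order: center, left, right, steering, throttle, brake, speed
    center, left, right = (path.strip() for path in row[:3])
    angle = float(row[3])
    views = [(center, angle), (left, angle + correction), (right, angle - correction)]
    samples = []
    for path, view_angle in views:
        samples.append((path, view_angle, False))  # image as recorded
        samples.append((path, -view_angle, True))  # mirrored image, negated angle
    return samples


def load_samples(csv_text):
    """Expand every data record of a driving log into its 6 training samples."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and a header row (assumed to start with "center").
        if not row or row[0].strip() == "center":
            continue
        samples.extend(expand_record(row))
    return samples


log = "center,left,right,steering,throttle,brake,speed\nIMG/c1.jpg, IMG/l1.jpg, IMG/r1.jpg, 0.1, 0.5, 0, 20\n"
print(len(load_samples(log)))  # 6 samples from one data record
```

With six samples per record, the 8,036 records of source 1 expand to 48,216 samples and the 16,617 records of source 2 to 99,702 samples, which matches the counts given for the flipping step.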
Fig. (1). BCNet architecture: 160x320x3 input → cropping (65x320x3) → convolution #1 (31x159x24) → convolution #2 (14x78x36) → convolution #3 (5x38x48) → convolutions #4 and #5 (3x3 kernels, 1x1 strides) → fully connected layers of 200, 100 and 20 neurons, with drop-out layers (keep probability 0.5 / 0.7) → output neuron.
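The spatial sizes in Fig. (1) follow from the usual output-size formula for an unpadded convolution, floor((n - k) / s) + 1 for input size n, kernel k and stride s. The sketch below applies it to the 65x320 cropped input. The 5x5 kernels with stride 2 for convolutions #1 to #3 and the 64 filters of convolutions #4 and #5 are assumptions borrowed from the NVIDIA model [5]. Under these assumptions and Keras's default "valid" padding, the widths come out as 158, 77, 37, 35 and 33, giving 1x33x64 = 2,112 flattened values rather than the 31x159, 14x78 and 5x38 sizes in Fig. (1) and the 2,176 (1x34x64) in Table I, so the exact kernel or padding settings of BCNet may differ from this sketch.

```python
def conv_output(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


# (filters, kernel, stride) per convolution. Conv #1-#3 kernels/strides and the
# 64 filters of conv #4-#5 are assumptions modeled on the NVIDIA network [5].
CONV_LAYERS = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]


def bcnet_shapes(height=65, width=320):
    """Return (height, width, depth) after each convolution of the cropped input."""
    shapes = []
    for filters, kernel, stride in CONV_LAYERS:
        height = conv_output(height, kernel, stride)
        width = conv_output(width, kernel, stride)
        shapes.append((height, width, filters))
    return shapes


shapes = bcnet_shapes()
h, w, d = shapes[-1]
flat = h * w * d  # number of values entering the first drop-out layer
print(shapes, flat)
```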
Several subroutines have been written for data visualization and analysis. They act as a sanity check to verify that the pre-processing is not fundamentally flawed; flawed data will almost certainly confuse the model and result in unacceptable performance. Examples of the output of these subroutines are presented in Fig. (2), which displays a sample of the generated training data (source 2), and in Fig. (3), which presents the histogram of the steering angle values collected during driving (source 2).

The data (both sources 1 and 2) is divided into two separate parts: training data, which represents 80% of the data, and validation data, which represents the remaining 20%.

4. DRIVING DATA PRE-PROCESSING

Before the front camera images are used in the training/validation data sets, they need to be pre-processed to make them more useful and convenient throughout the learning process. The pre-processing steps are meant to improve the training results and reduce the computation as much as possible, or to strike a delicate balance between these two requirements. The implemented pre-processing steps, in order of execution, are:

1) Normalization (color): this is done for color images using a “Lambda” layer in Keras [6] that implements a simple min-max scaling. The RGB pixel values are scaled from the 0 → 255 range to the -1 → 1 range, centered on zero.
2) Cropping images: The images are cropped by 70 pixels from the top and 25 pixels from the bottom to focus on the region of interest (ROI) and to reduce the number of inputs (faster learning process). The cropped images have a size of 65x320x3.
3) Flipping images: The data is doubled (augmented) by flipping all the images horizontally (around the y axis) and reversing the sign of the corresponding steering angle. Accordingly, the source-1 data grows to 48,216 samples and the source-2 data to 99,702 samples. In other words, each CSV record can generate 6 training samples (center, left, right, flipped-center, flipped-left, and flipped-right). This technique also balances the data between left and right steering, which removes the bias, or tendency, of the car to steer right or left due to exposure to biased training data (more left turns than right turns, or vice versa).
4) Jittering images: To minimize the model's tendency to overfit to the conditions of the test track, images are "jittered" before being fed to BCNet. The jittering consists of a randomized brightness adjustment, a randomized shadow, and a randomized horizon shift. The shadow effect is simply a darkening of a random rectangular portion of the image, starting at either the left or right edge and spanning the height of the image. The horizon shift applies a perspective transform beginning at the horizon line (at roughly 2/5 of the image height) and shifts it up or down randomly by up to 1/8 of the image height. The horizon shift is meant to mimic the topology conditions of the test track.
5) Data Distribution Flattening: Because the test track includes long sections with very slight or no curvature, the data captured from it tends to be heavily skewed toward low and zero turning angles. This creates a problem for the neural network, which then becomes biased toward driving in a straight line and can be easily confused by sharp turns. The distribution of the input data can be observed in Fig. (3). To reduce the occurrence of low and zero angle data points, a histogram of the turning angles is produced and the average number of samples per bin is computed. Next, a "keep probability" is determined for the samples belonging to each bin. This keep probability is 1.0 for bins that contain fewer samples than the average, and for the other bins it is the average number of samples per bin divided by the number of samples in that bin. Finally, random data points are removed from the data set at a rate of (1 - "keep probability"). The resulting data distribution can be seen in Fig. (4). The distribution is not uniform overall, but it is much closer to uniform for low and zero turning angles. This method helped speed up the training process, as a smaller data set of higher quality is used.
6) Cleaning the dataset: the model was found to perform poorly on certain data points, and in several cases those data points turned out to be mislabeled.
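Steps 1 to 5 above can be sketched with NumPy as follows. This is an illustrative sketch, not the author's code: the brightness range (0.5 to 1.2), the shadow darkening factor (0.5) and the number of histogram bins (25) are assumed values, the horizon-shift perspective transform of step 4 is omitted, and the jitter is applied to raw 0 to 255 pixels before normalization. In `flatten_distribution`, the keep probability of a crowded bin is the average count per bin divided by that bin's count, so it never exceeds 1.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def normalize(img):
    """Step 1: min-max scale 0..255 pixels to the -1..1 range (x / 127.5 - 1)."""
    return img.astype(np.float32) / 127.5 - 1.0


def crop(img, top=70, bottom=25):
    """Step 2: remove 70 rows at the top and 25 at the bottom (160 -> 65 rows)."""
    return img[top:img.shape[0] - bottom]


def flip(img, angle):
    """Step 3: mirror the image left-right and negate the steering angle."""
    return img[:, ::-1], -angle


def random_brightness(img, low=0.5, high=1.2):
    """Step 4a: scale all pixels by one random factor (range is an assumption)."""
    return np.clip(img.astype(np.float32) * rng.uniform(low, high), 0.0, 255.0)


def random_shadow(img, factor=0.5):
    """Step 4b: darken a full-height band that starts at the left or right edge."""
    out = img.astype(np.float32)  # astype returns a copy
    width = out.shape[1]
    band = int(rng.integers(1, width))   # random band width in pixels
    if rng.random() < 0.5:
        out[:, :band] *= factor          # band anchored at the left edge
    else:
        out[:, width - band:] *= factor  # band anchored at the right edge
    return out


def flatten_distribution(angles, n_bins=25):
    """Step 5: return indices of the samples kept after histogram flattening."""
    angles = np.asarray(angles, dtype=float)
    counts, edges = np.histogram(angles, bins=n_bins)
    avg = len(angles) / n_bins  # average number of samples per bin
    # Keep probability: 1 for sparse bins, avg / count for crowded bins.
    keep_prob = np.where(counts < avg, 1.0, avg / np.maximum(counts, 1))
    # Bin index of every sample, consistent with np.histogram's binning.
    bins = np.clip(np.digitize(angles, edges[1:-1]), 0, n_bins - 1)
    keep = rng.random(len(angles)) < keep_prob[bins]
    return np.flatnonzero(keep)


frame = rng.integers(0, 256, size=(160, 320, 3), dtype=np.uint8)
jittered = random_shadow(random_brightness(frame))
model_input = normalize(crop(jittered))  # shape (65, 320, 3), values in [-1, 1]
flipped, flipped_angle = flip(model_input, 0.1)
```

In a Keras model, steps 1 and 2 would typically live inside the network (for example, a `Lambda` layer followed by `Cropping2D(cropping=((70, 25), (0, 0)))`), so that the simulator frames receive exactly the same processing at run time.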
Fig. (2). Sample of the collected images: center, left and right, respectively.
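The bias-corrected moment estimates of Eq. (2) fit into the Adam update [13] as in the scalar sketch below. The decay rates β1 = 0.9 and β2 = 0.999 and ε = 1e-8 are the defaults suggested by Kingma and Ba [13] and are assumed here, since the values used for BCNet are not stated in this section; the default learning rate of 0.001 matches Phases 1 and 2 of Table V. The toy loss (x - 3)^2 and the helper `adam_minimize` are hypothetical stand-ins for the BCNet loss and the Keras optimizer.

```python
import math


def quadratic_grad(x):
    """Gradient of the toy loss (x - 3)^2, standing in for the BCNet loss."""
    return 2.0 * (x - 3.0)


def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1):
    """Apply `steps` Adam updates to a scalar parameter and return it."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g      # biased first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g  # biased second raw-moment estimate
        m_hat = m / (1.0 - beta1 ** t)         # bias correction, Eq. (2)
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


print(adam_minimize(quadratic_grad, 0.0))                      # first step moves x by about lr
print(adam_minimize(quadratic_grad, 0.0, lr=0.1, steps=2000))  # ends near the minimum at 3
```

Because of the bias correction, the very first update moves x by almost exactly `lr`, whatever the scale of the gradient; without it, the zero-initialized m and v would make the early steps depend on (1 - β1) / sqrt(1 - β2) instead.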
during the initial time steps, and especially when the decay rates are small (i.e. β1 and β2 are close to 1). They counteract these biases by computing bias-corrected first and second moment estimates:

$$\hat{m}_t = \frac{m_t}{1-\beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^{t}} \qquad (2)$$

The BCNet model is trained with Adam's optimization algorithm, using the parameters listed in Tables II, III and IV. Fig. (5) shows the setup of BCNet during the training phase, while Fig. (6) shows the setup during the running and simulation modes. The training results are presented in Table V and Figs. (7 and 8). The model is slightly over-fitting after the training represented by Fig. (8). For this reason, the learning rate is further reduced and the keep probability increased.

Keep probability (drop-out layers): 0.5 → 0.7

The training of BCNet has been carried out through several trials to achieve the results presented in Table V. The following observations were collected during the training process:
Table V. BCNet training phases
Phase | Data | Epochs | Learning rate | Keep prob. | Training loss | Validation loss | Figure | Remarks
1: Coarse tuning | Source-1 | 5 | 0.001 | 0.5 | 0.0235 | 0.0205 | Fig. (7) | Coarse tuning with Udacity data; not enough for full learning.
2: Fine tuning | Source-2 | 5 | 0.001 | 0.5 | 0.0455 | 0.0411 | Fig. (8) | Fine tuning with self-collected data; proved enough for full learning with acceptable performance.
3: Fine tuning | Source-2 | 3 | 0.0005 | 0.6 | 0.0417 | 0.0377 | Fig. (9) | More fine tuning with self-collected data; caused over-fitting with somewhat inferior performance.
4: Fine tuning | Source-2 | 2 | 0.0002 | 0.7 | 0.0348 | 0.0295 | Fig. (10) | More fine tuning with self-collected data; full learning with very good performance.
1) Training the network using only the “source-1” Udacity-supplied data was tried several times, incorporating several data augmentation methods; however, acceptable results were not achieved, and the car always hit the borders.
2) Using the Udacity driving simulator [10], training data was collected by manoeuvring the car with a keyboard or a joystick. In this way, useful training data was successfully collected by looping the car around a track (for example, “Track 1”) several times (~10).
3) After coarse tuning the model using the “source-1” data, the resultant model weights are reused for further training and fine-tuning on the “source-2” data with an ADAM optimizer learning rate of 0.001, as in Table V and Fig. (8). This coarse-then-fine tuning resembles the transfer learning approach. Note that the two types of data are never used together. After this fine-tuning phase, the resultant model is tested on “Track 1” in the simulator and produces acceptable results (no unsafe or sudden manoeuvring).
4) To improve the performance further, the learning rate of the ADAM optimizer was halved to 0.0005 and the model was trained for a further 3 epochs. However, this results in an over-fitting model, as shown in Fig. (9). Furthermore, testing confirmed inferior performance, even though both the training and validation losses are lower than in the previous case. Consequently, this model is subjected to further fine-tuning.
5) The learning rate of the ADAM optimizer was reduced further to 0.0002, the keep probability was increased to 0.7, and the model was trained for an extra 2 epochs (Fig. (10)). The resultant model was then tested on “Track 1” and produced very good performance.

7. SHORTCOMINGS OF THE IMPLEMENTED APPROACH

The following list summarizes the identified shortcomings:
1) The presented neural network model does not have a memory; it makes momentary decisions and does not build on previous states to make the current decision. However, driving is believed to be a sequential process, and the current approach does not mimic that.
2) After training the network on one track and testing it on another one (considerably different from the first), it may produce unacceptable results in some scenarios in
terms of driving behavior, as it has never gone through these scenarios before. Accordingly, this approach may require the network to be exposed to a massive number of tracks in order to generalize well for actual street deployment (commercial application).

8. SUGGESTED IMPROVEMENTS

The following points summarize the suggested improvements:
1) Other network topologies with a memory, such as Long Short-Term Memory (LSTM) models, need to be tried for behavior cloning end-to-end learning.
2) The network needs to be trained on many more tracks, manoeuvring scenarios and road conditions in order to generalize as much as possible.
3) More useful data can be generated from the currently collected data by adding random distortion, brightness manipulation, jitter, rotation, etc.
4) A finite impulse response (FIR) filter or a moving average can be applied to the estimated steering angle before the final steering command is issued, instead of using the raw estimated value directly. In that case, the new estimated value also depends on the previous history.

CONCLUSION

In this paper, a CNN-based safe steering controller, “BCNet”, has been proposed. The architecture of the CNN is presented in detail. The structure of the comprehensive training, validation and testing data is described. The image processing algorithms involved have been described as well, and their contributions analyzed. BCNet has been shown to be able to learn the entire task of lane and road following without manual decomposition into road or lane marking detection, semantic abstraction, path planning, and control. A small amount of training data from one or two tracks was sufficient to train the car to drive safely on multiple tracks. The CNN is able to learn meaningful road features from a very sparse training signal (steering alone). The training process has shown that the quality of the data (much more than its quantity) is crucial for this application. Therefore, a comprehensive pipeline of training data pre-processing has been carefully implemented.

Moreover, the shortcomings of the proposed approach have been discussed, and improvement actions for future work have been proposed. The presented solution represents a cornerstone in facilitating the existence of fully autonomous cars in the near future.

CONSENT FOR PUBLICATION

Not applicable.

CONFLICT OF INTEREST

The authors declare no conflict of interest, financial or otherwise.

ACKNOWLEDGEMENTS

This work used the HPC facilities of the American University of the Middle East, Kuwait.

REFERENCES

[1] K. Mansour, W. Farag, “AiroDiag: A Sophisticated Tool that Diagnoses and Updates Vehicles Software Over Air”, 2012 IEEE International Electric Vehicle Conference (IEVC), TD Convention Center, Greenville, SC, USA, March 4, 2012, ISBN: 978-1-4673-1562-3.
[2] W. Farag, “CANTrack: Enhancing automotive CAN bus security using intuitive encryption algorithms”, 7th International Conference on Modeling, Simulation, and Applied Optimization (ICMSAO), UAE, March 2017.
[3] Á. Arcos-García, J.A. Álvarez-García, L.M. Soria-Morillo, “Deep neural network for traffic sign recognition systems: An analysis of spatial transformers and stochastic optimisation methods”, Neural Networks, vol. 99, pp. 158-165, Elsevier, 2018.
[4] W. Farag, Z. Saleh, “Traffic Signs Identification by Deep Learning for Autonomous Driving”, IET Smart Cities Symposium (SCS'18), Bahrain, 22-23 April 2018.
[5] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, ... et al., “End to End Learning for Self-Driving Cars”, arXiv:1604.07316, 25 Apr 2016.
[6] Keras Documentation, https://keras.io/.
[7] TensorFlow, https://www.tensorflow.org/.
[8] Python, https://www.python.org/.
[9] Udacity Sample Training Data, https://d17h27t6h515a5.cloudfront.net/topher/2016/December/584f6edd_data/data.zip.
[10] Udacity Simulator, https://github.com/udacity/self-driving-car-sim.
[11] S. Amidi, https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly.html.
[12] M. Nagiub, W. Farag, “Automatic selection of compiler options using genetic techniques for embedded software design”, IEEE 14th International Symposium on Computational Intelligence and Informatics (CINTI), Budapest, Hungary, 19 Nov. 2013, ISBN: 978-1-4799-0194-4.
[13] D.P. Kingma, J. Ba, “Adam: A Method for Stochastic Optimization”, 3rd International Conference for Learning Representations, San Diego, USA, 2015.
[14] L. Bottou, “Online Algorithms and Stochastic Approximations”, Online Learning and Neural Nets, Cambridge University Press, 1998, ISBN: 978-0-521-65263-6.
[15] S. Ruder, “An overview of gradient descent optimization algorithms”, arXiv:1609.04747v2, 15 Jun 2017.
[16] W. Farag, “Synthesis of intelligent hybrid systems for modeling and control”, University of Waterloo, Canada, 1998.