
A Deep Learning-Based Approach for Road Pothole Detection in Timor Leste

1st Vosco Pereira
Department of Intelligence Science and Engineering
Gifu University
Gifu, Japan
vosco@asr.info.gifu-u.ac.jp

2nd Satoshi Tamura
Department of Intelligence Science and Engineering
Gifu University
Gifu, Japan
tamura@info.gifu-u.ac.jp

3rd Satoru Hayamizu
Department of Intelligence Science and Engineering
Gifu University
Gifu, Japan
hayamizu@gifu-u.ac.jp

4th Hidekazu Fukai
Department of Intelligence Science and Engineering
Gifu University
Gifu, Japan
fukai@gifu-u.ac.jp

Abstract: This research proposes a low-cost solution for detecting road potholes in images using a convolutional neural network (CNN). Our model is trained entirely on images collected from several different places under varying conditions, such as wet, dry, and shady roads. An experiment on 500 test images showed that our model achieves an accuracy of 99.80%, a precision of 100%, a recall of 99.60%, and an F-measure of 99.60%.

Index Terms: Potholes, Deep Learning, Convolutional Neural Network, Image Classification

I. INTRODUCTION

Timor-Leste is one of the newest and least developed countries in Asia. Transportation infrastructure development and maintenance are therefore an essential part of developing the country. Roads are among the most important infrastructure in a transportation system. They contribute indirectly to the economic growth of the country, so road maintenance is a significant concern. The poor condition of the road network is the main obstacle to economic development and poverty alleviation [1]. Almost all of the road network was constructed by the Indonesian government about 20 years ago. Today, most of those roads are damaged, which greatly hampers transport circulation. Approximately 75-80% of rural roads are in poor condition. Not only district roads but also national roads are in very poor condition. About 80% of these roads are (or used to be) paved [2]. A 2006 Asian Development Bank (ADB) study showed that about half of these paved roads are in poor condition. Many other countries face a similar problem. One of the major problems in developing countries is road maintenance, including the repair of potholes. Potholes, which form due to heavy rain and the movement of heavy vehicles, are a major reason the transportation system fails to support economic development.

The most important step in road maintenance is the inspection stage. In least developed countries, visual inspection by humans is the main method of road condition assessment, used to ensure that roads still meet general service requirements. However, manually reviewing and assessing visual road data is a time-consuming and expensive procedure. In addition, the results are strongly influenced by the subjectivity and experience of the human raters. Moreover, such methods are very slow and uncomfortable for inspectors. With advances in science and technology and the popularity of deep learning models in engineering, sophisticated and low-cost intelligent systems can be used to detect potholes instead of humans. Such systems can reduce the time and cost of detecting potholes and can detect them more accurately. This paper presents an enhanced pothole detection method that applies one deep learning technique, the convolutional neural network (CNN), to a sequence of images extracted from road condition video recorded with simple, low-cost equipment. We develop a CNN model that can detect potholes in images captured under various conditions, such as dry, wet, and shady roads, and with various pothole sizes and shapes. To evaluate its performance, we compare the results with those of a conventional machine learning method. The resulting performance indicates that potholes can be recognized well.

The rest of the paper is organized as follows. Section 2 presents previous related work. Section 3 describes the technical details of the proposed method, including how the data was obtained, the architecture of the proposed model, and how the model is trained. Section 4 reports our experimental results. Finally, Section 5 presents our conclusions and future tasks.

II. RELATED WORK

A growing number of papers on road condition tasks, including road damage, pothole, and crack detection, have been published.

978-1-5386-4522-2/18/$31.00 © 2018 IEEE
Fig. 2. Installation Setup of the Camera on the Car.

Fig. 1. Sample Images of Dataset.


A traditional machine learning method based on a support vector machine (SVM) was proposed in [3] for road pothole detection. The authors extracted histogram-based features from image regions and used a nonlinear-kernel SVM to identify the target. Their results show that potholes can be recognized well.

In [4], a deep learning approach, specifically a convolutional neural network, was used as a classifier for crack damage detection in concrete images. The authors built a classifier that is less influenced by noise caused by illumination, shadow casting, and so on. The advantage of this approach over conventional methods is that it learns features automatically, without a separate feature extraction process and its computation.

Maeda et al. [5] introduced road damage detection using deep neural networks with images captured through a smartphone. They developed a new large-scale dataset for road damage detection, applied an end-to-end object detection method based on deep learning to the road surface damage detection problem, and verified its detection accuracy and processing speed for road damage detection and classification.

A binary classification method using a neural network [6] was presented for classifying road images as crack or normal. The network is fed features extracted from the images before it performs the classification.

Another convolutional neural network model, CrackNet [7], was proposed for pixel-level crack detection on pavement. Unlike common CNNs, this model does not include pooling layers. The experimental results show that the method works efficiently for crack detection.

A low-cost sensor and deep CNN-based approach was proposed in [8] for automatic crack detection. That work presented a CNN model that learns features automatically without any feature extraction process. The input images were manually annotated before being fed to the model.

Finally, a real-time automatic pavement crack and pothole recognition system for mobile Android-based devices was proposed by A. Tedeschi and F. Benedetto [9].

III. PROPOSED METHOD

A. Data Preparation

In this research we used several datasets taken in South Africa [10, 11], Bangalore [21], and Rangoon [20]. In addition, we recorded road videos in Timor Leste. We carried out several data collection runs using a SONY Handycam HDR-CX720V and a Samsung Galaxy S7 mobile phone mounted inside the vehicle, as shown in Figure 2. Several makes of vehicle, such as Toyota, Mitsubishi, and Nissan, were used to collect the data. The collected images were then annotated manually. Our dataset covers varying conditions such as wet, dry, and shady roads. Some examples from our dataset are shown in Figure 1. The images vary in size and contain multiple objects, so we resized each image in our dataset to 200 x 200 pixels and cropped the unnecessary parts of the image.

Finally, to avoid overfitting, we augmented the training set by rotating, shifting, and flipping the images, as shown in Figure 3.

B. Deep Learning Based Classification

Classification with artificial neural networks is a very popular approach to solving pattern recognition problems. One of the essential components behind these results is a special kind of neural network called a convolutional neural network (CNN). The main advantage of a CNN is that it detects important features automatically, without human supervision. As shown in Figure 4, a CNN model can be thought of as a combination of two components: a feature extraction part and a classification part. The convolution and pooling layers perform feature extraction, while the fully connected layer performs classification. The first convolutional layer tends to detect edges and form templates for edge detection. Subsequent layers then combine these edges into simple shapes.

The basic concept behind convolutional layers is the filter, or kernel. It is an operator applied to the entire image that transforms the information encoded in the pixels: the kernel is convolved with the input volume to obtain a feature map, as expressed in Eq. (1).

280 International Conference on Service Operations and Logistics, and Informatics (SOLI 2018)
Fig. 3. Image Augmentation Results.

$$M_i = b_i + \sum_k W_{ik} * X_k \tag{1}$$

where the feature map $M_i$ is the bias term $b_i$ plus the sum, over input channels $k$, of the sub-kernel $W_{ik}$ convolved with the input channel $X_k$ [19].

After a convolution operation, pooling is usually performed to reduce the dimensionality. Pooling provides a basic degree of invariance to small rotations and translations and improves the object detection capability of convolutional networks. It reduces the number of parameters, which both shortens the training time and combats overfitting. Pooling layers downsample each feature map independently, reducing the height and width while keeping the depth intact. The most common type of pooling is max pooling, which takes the maximum value in each pooling window, that is, the maximum of the features over small blocks of the previous layer. Pooling allows later convolutional layers to work on larger sections of the data, because a small patch after the pooling layer corresponds to a much larger patch before it.

After the convolutional and pooling layers, networks generally use fully connected layers, in which each pixel is treated as a separate neuron, just as in a regular neural network. The last fully connected layer provides the class prediction.

C. Architecture of the Model

The neural network design for this research is shown in Table 1. It consists of four convolutional and pooling layers and one fully connected layer. A Rectified Linear Unit (ReLU) activation function [13] is applied between the convolutional and pooling layers. The ReLU, a piecewise linear function, has the simplified form $f(x) = \max(x, 0)$. It retains only the positive part of the activation, reducing the negative part to zero, while the integrated maximum operator promotes faster computation. The sigmoid function is employed between the fully connected layer and the output layer, given by $f(x) = 1/(1 + \exp(-x))$, where $f$ is the neuron's output as a function of its input $x$.

The hyperparameters are a convolution filter size of 3, a pooling filter size of 2, a stride of 1, and zero-padding. The outputs of the fully connected layer are connected to the output layer, which performs binary classification with one neuron. The last fully connected layer gives the classification of the input images.

TABLE I
MODEL ARCHITECTURE

Layers               Size
Convolutional I      200 x 200 x 4
Pooling Layer I      100 x 100 x 4
ReLU
Convolutional II     100 x 100 x 8
Pooling Layer II     50 x 50 x 8
ReLU
Convolutional III    50 x 50 x 16
Pooling Layer III    25 x 25 x 16
ReLU
Convolutional IV     25 x 25 x 32
Pooling Layer IV     12 x 12 x 32
ReLU
Fully Connected      512
Sigmoid
Output

D. Training Methodology

The network was trained for 200 epochs on 13,244 training images and 3,250 validation images. The batch size was 16. The Adam optimizer was used to minimize the cost function [14]. Adam is an algorithm that works well across a wide range of deep learning architectures. It combines gradient descent with momentum and the RMSprop algorithm. Advantages of the Adam optimizer include relatively low memory requirements and good performance even with little tuning of hyperparameters. We kept the learning rate fixed at 0.0001, with β1 = 0.9, β2 = 0.999, epsilon = 1e-8, and decay = 0. Binary cross-entropy was used as the cost function. The cost function measures the discrepancy between the predicted values and the target values [19]:

$$C = -y_i \log a_i - (1 - y_i) \log(1 - a_i) \tag{2}$$

where $y_i$ is the given target value and $a_i$ is the predicted value. Although deep neural networks with many parameters, as shown in Table 2, are very powerful machine learning systems, overfitting is a serious problem in such networks.

To address this problem we applied dropout [15] between the fully connected layer and the sigmoid function. Dropout works by randomly dropping neurons from the neural network during training. In our experiment, we applied a dropout rate of 20%. The sigmoid activation function at the output layer yields predicted values between 0 and 1. In our experiment, a target value of 0 means not pothole and 1 means pothole.

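As a minimal sketch of how the sigmoid output and the binary cross-entropy cost of Eq. (2) behave numerically, the following plain-Python example may help. The function names, the clipping constant `eps`, and the sample values are illustrative assumptions, not part of the authors' implementation, and the cost is averaged over the samples.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Mean of C = -y*log(a) - (1 - y)*log(1 - a) over all samples."""
    total = 0.0
    for y, a in zip(targets, predictions):
        # Clip predictions away from 0 and 1 so that log() stays finite.
        a = min(max(a, eps), 1.0 - eps)
        total += -y * math.log(a) - (1.0 - y) * math.log(1.0 - a)
    return total / len(targets)


# Hypothetical pre-activation values and labels (1 = pothole, 0 = not pothole).
logits = [2.0, -1.0, 0.0]
targets = [1, 0, 1]
probabilities = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(targets, probabilities)
print(probabilities, loss)
```

The cost shrinks as the predicted value moves toward the target, which is the behaviour the optimizer exploits during training.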
Fig. 4. General graphics depiction of CNN model.
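The layer sizes in Table 1 and the parameter counts in Table 2 can be cross-checked with a short calculation. The sketch below is only an illustration under stated assumptions: a 3-channel (RGB) input, which the paper does not state explicitly, zero-padded 3 x 3 convolutions that preserve the spatial size, and 2 x 2 pooling that halves it with rounding down. The helper names are hypothetical.

```python
def conv_params(kernel: int, in_channels: int, filters: int) -> int:
    """Weights plus one bias per filter for a square 2D convolution."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs: int, units: int) -> int:
    """Weights plus one bias per unit for a fully connected layer."""
    return (inputs + 1) * units


def count_parameters(side=200, channels=3, filters=(4, 8, 16, 32),
                     kernel=3, pool=2, dense_units=512):
    """Return per-layer parameter counts for the assumed architecture."""
    counts = {}
    for index, n_filters in enumerate(filters, start=1):
        counts[f"conv{index}"] = conv_params(kernel, channels, n_filters)
        channels = n_filters
        # Zero-padded convolution keeps the size; pooling halves it (floor).
        side //= pool
    counts["fully_connected"] = dense_params(side * side * channels, dense_units)
    counts["output"] = dense_params(dense_units, 1)
    counts["total"] = sum(counts.values())
    return counts


for layer, n in count_parameters().items():
    print(f"{layer:16s} {n:>10,d}")
```

Under these assumptions the computed counts agree with the values listed in Table 2, which suggests that the input images have three channels.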

TABLE II
NUMBER OF PARAMETERS OF THE MODEL

Layers               Number of Parameters
Convolutional I      112
Convolutional II     296
Convolutional III    1,168
Convolutional IV     4,640
Fully Connected      2,359,808
Output               513
Total                2,366,537

IV. EXPERIMENTAL RESULT

The network was implemented using Keras [16] with a TensorFlow [17] backend, running on an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz with 8.00 GB of RAM. For training, 13,244 images were used, and for validation, a different set of 3,250 images was used. Table 3 shows the accuracy and loss obtained for both training and validation. In addition, 500 images were used to test the trained model. To evaluate the performance of our model, we compared it with a conventional machine learning method, SVM. For the SVM comparison, we reduced the image dimensions to 50 x 50 to minimize the computation time, and we used the Support Vector Classifier (SVC) from the scikit-learn library [18] with a linear kernel, gamma = 5.0, and C = 1.0.

TABLE III
TRAINING AND VALIDATION RESULT

              Accuracy    Loss
Training      100%        0.01%
Validation    99.94%      0.32%

We used a confusion matrix to evaluate the performance of our model. The metrics are based on the following values: (i) true positives (TP), the correctly predicted positive values, where the actual class is yes and the predicted class is also yes; (ii) true negatives (TN), the correctly predicted negative values, where the actual class is no and the predicted class is also no; (iii) false positives (FP), where the actual class is no and the predicted class is yes; (iv) false negatives (FN), where the actual class is yes but the predicted class is no. False positives and false negatives occur when the actual class contradicts the predicted class. The metrics are defined as follows.

Precision is intuitively the ability of the classifier not to label as positive a sample that is negative:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

Recall is intuitively the ability of the classifier to find all the positive samples:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$

Accuracy is the number of correct predictions made by the model divided by the total number of predictions made. It is computed as:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{5}$$

Finally, the F1-score is the harmonic mean of precision and recall:

$$\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$

TABLE IV
EXPERIMENT RESULT

Methods            Accuracy    Precision    Recall    F1-Score
Proposed Method    99.80%      100%         99.60%    99.60%
SVM                88.20%      86.87%       82.20%    81.62%
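Equations (3) to (6) can be evaluated directly from the four confusion-matrix counts. The sketch below is a minimal illustration; the function name and the counts are hypothetical and are not the paper's test results. It assumes that at least one sample is predicted positive and at least one is actually positive, otherwise the divisions are undefined.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)                           # Eq. (3)
    recall = tp / (tp + fn)                              # Eq. (4)
    accuracy = (tp + tn) / (tp + fp + tn + fn)           # Eq. (5)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (6)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fp=10, tn=95, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-score is a harmonic mean, it always lies between precision and recall and is pulled toward the smaller of the two.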

Fig. 5. Classification Result

Based on Table 4 and Equations (3)-(6), the SVM achieves reasonably high overall accuracy but fails to detect a number of actual potholes (false negatives). In other words, it fails to detect small potholes and potholes under shade and illumination variations. The results also show that its precision and recall are not balanced. In contrast, our model outperforms the SVM method: it has higher exactness (precision) and completeness (recall) in the pothole detection task (see Figure 5) and a better balance between precision and recall.

V. CONCLUSION AND FUTURE WORKS

In this paper, we proposed a deep learning-based, low-cost approach to road pothole detection that addresses the problem of the current method, which relies on humans to inspect roads for potholes. Our model was trained on 13,244 images collected from several different places, with various conditions, sizes, and shapes. When tested on new images, our model achieved an accuracy of 99.80%, a precision of 100%, a recall of 99.60%, and an F1-score of 99.60%. The comparison study demonstrates that our model significantly outperforms a conventional machine learning algorithm, SVM. The results also indicate that deep learning techniques can provide better solutions than traditional algorithms, which have many weaknesses in low-level feature detection.

In the future we will collect more data to overcome a weakness of the proposed method, namely its inability to detect potholes in images under illumination variations, and we will build an automatic pothole detection system that runs on smartphones.

VI. ACKNOWLEDGMENT

This research was supported by the Japan International Cooperation Agency (JICA).

REFERENCES

[1] Timor-Leste-Strategic-Plan-2011-2030.
[2] Roads-for-development-project-document.
[3] J. Lin and Y. Liu, "Potholes detection based on SVM in the pavement distress image", in Proc. 9th Int. Symp. Distrib. Comput. Appl. Bus. Eng. Sci., Aug. 2010, pp. 544-547.
[4] YoungJin Cha, Wooram Choi, Oral Büyüköztürk, "Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks", 2017.
[5] Hiroya Maeda, Yoshihide Sekimoto, Toshikazu Seto, Takehiro Kashiyama, Hiroshi Omata, "Road Damage Detection Using Deep Neural Networks with Images Captured Through a Smartphone", University of Tokyo, 4-6-1 Komaba, Tokyo, Japan.
[6] Justin Bray, Brijesh Verma, Xue Li, and Wade He, "A Neural Network based Technique for Automatic Classification of Road Cracks", 2006 International Joint Conference on Neural Networks, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21, 2006.
[7] Allen Zhang, Kelvin C. P. Wang, Baoxian Li, Enhui Yang, Xianxing Dai, Yi Peng, Yue Fei, Yang Liu, Joshua Q. Li, and Cheng Chen, "Automated Pixel-Level Pavement Crack Detection on 3D Asphalt Surfaces Using a Deep-Learning Network", Computer-Aided Civil and Infrastructure Engineering 00 (2017) 1-15.
[8] Lei Zhang, Fan Yang, Yimin Daniel Zhang, and Ying Julie Zhu, "Road Crack Detection Using Deep Convolutional Neural Network", DOI: 10.1109/ICIP.2016.7533052.
[9] A. Tedeschi, F. Benedetto, "A real-time automatic pavement crack and pothole recognition system for mobile Android-based devices", Advanced Engineering Informatics 32 (2017) 11-25.
[10] S. Nienaber, M.J. Booysen, R.S. Kroon, "Detecting potholes using simple image processing techniques and real-world footage", SATC, July 2015, Pretoria, South Africa.
[11] S. Nienaber, R.S. Kroon, M.J. Booysen, "A Comparison of Low-Cost Monocular Vision Techniques for Pothole Distance Estimation", IEEE CIVTS, December 2015, Cape Town, South Africa.
[12] Waseem Rawat, Zenghui Wang, "Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review", Neural Computation 29, 2352-2449, 2017, Massachusetts Institute of Technology.
[13] Vinod Nair, Geoffrey E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines", in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 2010.
[14] Diederik P. Kingma, Jimmy Lei Ba, "Adam: A Method for Stochastic Optimization", arXiv.
[15] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Journal of Machine Learning Research 15 (2014) 1929-1958.
[16] François Chollet and others, Keras, GitHub, 2015.
[17] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, "TensorFlow: Large-scale machine learning on heterogeneous systems", 2015.
[18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine Learning in Python", Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[19] Jihen Amara, Bassem Bouaziz, Alsayed Algergawy, "A Deep Learning-based Approach for Banana Leaf Diseases Classification", Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, Bonn 2017, 79.
[20] Road video data, https://www.youtube.com/watch?v=b2jQ42hVY8c
[21] Road video data, https://www.youtube.com/watch?v=ZUbDPhL2gEE

