Abstract—This research proposes a low-cost solution for detecting road potholes in images using a convolutional neural network (CNN). Our model is trained entirely on images collected from several different places under varying conditions, such as wet, dry and shady. An experiment on 500 test images showed that our model achieves an Accuracy of 99.80%, a Precision of 100%, a Recall of 99.60% and an F-Measure of 99.60%.
Index Terms—Potholes, Deep Learning, Convolutional Neural Network, Image Classification

I. INTRODUCTION
Timor-Leste is one of the newest and least developed countries in Asia. Transportation infrastructure development and maintenance are therefore an essential part of developing the country. Roads are one of the most important types of infrastructure in a transportation system. They contribute indirectly to the economic growth of the country, so road maintenance is a significant concern. The poor condition of the road network is a major obstacle to economic development and poverty alleviation [1]. Almost all of the road network was constructed by the Indonesian government some 20 years ago. Today, most of those roads are damaged, which greatly hampers transport circulation. Approximately 75-80% of the rural roads are in poor condition. Not only the district roads but also the national roads are in very poor condition. 80% of these roads are (or used to be) paved [2]. The 2006 Asian Development Bank (ADB) study showed that about half of these paved roads are in poor condition. Almost all developed countries have a similar problem. One of the major problems in developing countries is road maintenance, including the repair of potholes. Potholes, which form due to heavy rains and the movement of heavy vehicles, are also a major reason the transportation system is inhibited from supporting economic development. The most important step in road maintenance is the inspection stage. In least developed countries, visual inspection by humans is the main method of assessing road conditions in order to guarantee that the road still meets general service requirements. However, manually reviewing and assessing visual road data is a time-consuming and expensive procedure. In addition, the results are highly influenced by the subjectivity and experience of the human raters. Moreover, such methods are very slow and uncomfortable for inspectors. With advances in science and technology and the popularity of deep learning models in the engineering field, sophisticated, low-cost intelligent systems can be used to detect potholes instead of humans. Such systems reduce the time and cost of detecting potholes and detect them more accurately. This paper presents an enhanced pothole detection method that applies a Convolutional Neural Network (CNN), one of the deep learning techniques, to a sequence of images extracted from road condition video taken with simple, low-cost equipment. We develop a CNN model that can detect potholes in images under various conditions, such as dry, wet and shady, and of various sizes and shapes. To evaluate its performance, we compare the results with a conventional machine learning method. The resulting performance indicates that potholes can be well recognized.
The rest of the paper is organized as follows. Section 2 presents previous related work. Section 3 describes the technical details of the proposed work, including how the data was obtained, the architecture of the proposed model and how the model is trained. Section 4 reports our experimental results. Finally, Section 5 presents our conclusion and future tasks.

II. RELATED WORK
A growing number of papers focusing on road condition tasks, including road damage, pothole and crack detection, have been published.

Japan International Cooperation Agency (JICA).
978-1-5386-4522-2/18/$31.00 © 2018 IEEE
Fig. 2. Installation Setup of the Camera on the Car.
280 International Conference on Service Operations and Logistics, and Informatics (SOLI 2018)
Fig. 3. Image Augmentation Results.

$M_i = b_i + \sum_k W_{ik} * X_k \quad (1)$

where the feature map $M_i$ is the bias term $b_i$ plus the sum of the products between each input channel $X_k$ and the sub-kernel $W_{ik}$ [19].
A convolution operation is usually followed by pooling to reduce the dimensionality. Pooling provides basic invariance to rotations and translations and improves the object detection capability of convolutional networks. It reduces the number of parameters, which both shortens the training time and combats overfitting. Pooling layers downsample each feature map independently, reducing the height and width while keeping the depth intact. The most common type of pooling is max pooling, which takes the maximum value in the pooling window, that is, the maximum of the features over small blocks of the previous layer. Pooling layers allow later convolutional layers to work on larger sections of the data, because a small patch after the pooling layer corresponds to a much larger patch before it.
After the convolutional and pooling layers, networks generally use fully connected layers, in which each pixel is treated as a separate neuron, just as in a regular neural network. The last fully connected layer typically contains as many neurons as there are classes to be predicted.

C. Architecture of the Model
The neural network design for this research is shown in Table I. It consists of four convolutional and pooling layers and one fully connected layer. A Rectified Linear Unit (ReLU) activation function [13] is applied between the convolutional and pooling layers. The ReLU, a piecewise linear function, has the simplified form $f(x) = \max(x, 0)$. It retains only the positive part of the activation by reducing the negative part to zero, and the integrated maximum operator promotes faster computation. The sigmoid function is employed between the fully connected layer and the output layer and is given by $f(x) = 1/(1 + e^{-x})$, where $f$ is the neuron's output as a function of its input $x$.
A filter size of 3, a pooling filter of size 2, a stride of 1 and zero-padding are used as hyperparameters. The outputs of the fully connected layer are fed into the output layer, which performs binary classification with one neuron. The last fully connected layer gives the class of the input image.

TABLE I
MODEL ARCHITECTURE
Layers               Size
Convolutional I      200 x 200 x 4
Pooling Layer I      100 x 100 x 4
ReLU
Convolutional II     100 x 100 x 8
Pooling Layer II     50 x 50 x 8
ReLU
Convolutional III    50 x 50 x 16
Pooling Layer III    25 x 25 x 16
ReLU
Convolutional IV     25 x 25 x 32
Pooling Layer IV     12 x 12 x 32
ReLU
Fully Connected      512
Sigmoid
Output

D. Training Methodology
The network was trained for 200 epochs on 13,244 training images and 3,250 validation images. The batch size was 16. The Adam optimizer was used to minimize the cost function [14]. Adam is one of the algorithms that work well across a wide range of deep learning architectures. It combines gradient descent with momentum and the RMSprop algorithm. The advantages of the Adam optimizer include relatively low memory requirements and good performance even with little tuning of the hyperparameters. We kept the learning rate constant at 0.0001, with $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$ and decay = 0. Binary cross-entropy was applied as the cost function. The cost function measures how far the predicted values are from the target values [19]:

$C = -y_i \log a_i - (1 - y_i) \log(1 - a_i) \quad (2)$

where $y_i$ is the given target value and $a_i$ is the predicted value. Although deep neural networks with many parameters, as shown in Table II, are very powerful machine learning systems, overfitting is a serious problem in such networks.
To address this problem, we applied dropout [15] between the fully connected layer and the sigmoid function.
Dropout works by randomly dropping neurons from the neural network during training. In our experiment, we applied a dropout rate of 20%. The sigmoid activation function at the output layer produces predicted target values between 0 and 1. In our experiment, a target value of 0 means not pothole and 1 means pothole.
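The cost function in equation (2) and the Adam settings stated above can be illustrated with a minimal sketch in plain Python. It is not the authors' training code: the single-weight toy model, the helper names `sigmoid`, `binary_cross_entropy` and `adam_step`, and the clipping constant are assumptions made only for illustration. The learning rate, $\beta_1$, $\beta_2$ and $\epsilon$ are the values reported in the text.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(y: float, a: float, clip: float = 1e-12) -> float:
    """Equation (2) for one sample: C = -y log(a) - (1 - y) log(1 - a)."""
    a = min(max(a, clip), 1.0 - clip)  # assumption: clip to avoid log(0)
    return -y * math.log(a) - (1.0 - y) * math.log(1.0 - a)


def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight; returns (w, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Toy example (hypothetical): one weight, one input, target "pothole" (y = 1).
w, m, v = 0.0, 0.0, 0.0
x, y = 2.0, 1.0
a = sigmoid(w * x)
loss_before = binary_cross_entropy(y, a)
grad = (a - y) * x  # derivative of sigmoid + cross-entropy with respect to w
w, m, v = adam_step(w, grad, m, v, t=1)
loss_after = binary_cross_entropy(y, sigmoid(w * x))
```

On the first step the bias-corrected Adam update has a magnitude close to the learning rate regardless of the gradient scale, which is one reason the optimizer tends to need little tuning.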
Fig. 4. General graphics depiction of CNN model.
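The layer sizes in Table I and the parameter counts in Table II can be cross-checked with a few lines of arithmetic. The sketch below assumes a 200 x 200 input with 3 colour channels, 3 x 3 kernels with zero-padding that preserves the spatial size, and 2 x 2 pooling that halves the size (rounding down). None of these is stated in exactly this form in the text; they are the assumptions under which the tables are reproduced. The helper names are hypothetical.

```python
def conv_params(kernel: int, in_channels: int, out_channels: int) -> int:
    """Weights plus one bias per output filter of a 2-D convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Weights plus one bias per neuron of a fully connected layer."""
    return (n_inputs + 1) * n_outputs


size, channels = 200, 3       # assumption: 200 x 200 RGB input
filters = [4, 8, 16, 32]      # filters per convolutional layer (Table I)

conv_counts = []
shapes = []
for n_filters in filters:
    conv_counts.append(conv_params(3, channels, n_filters))
    shapes.append((size, size, n_filters))   # padded convolution keeps size
    size //= 2                               # 2 x 2 pooling halves the size
    shapes.append((size, size, n_filters))
    channels = n_filters

flattened = size * size * channels           # 12 * 12 * 32 inputs to the dense layer
fc_count = dense_params(flattened, 512)
output_count = dense_params(512, 1)
total = sum(conv_counts) + fc_count + output_count

print(conv_counts, fc_count, output_count, total)
```

Under these assumptions the script reproduces every row of Table II, which suggests that almost all of the model's parameters sit in the single fully connected layer.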
TABLE II
NUMBER OF PARAMETERS OF THE MODEL
Layers               Number of Parameters
Convolutional I      112
Convolutional II     296
Convolutional III    1,168
Convolutional IV     4,640
Fully Connected      2,359,808
Output               513
Total                2,366,537

We used a confusion matrix to evaluate the performance of our model.
This metric is based on the following values: (i) true positives (TP), the correctly predicted positive values, where the actual class is yes and the predicted class is also yes; (ii) true negatives (TN), the correctly predicted negative values, where the actual class is no and the predicted class is also no; (iii) false positives (FP), where the actual class is no and the predicted class is yes; (iv) false negatives (FN), where the actual class is yes but the predicted class is no. False positives and false negatives occur when the actual class contradicts the predicted class. The above metrics can be defined as follows.
Finally, the F1-Score can be interpreted as a weighted average (the harmonic mean) of the precision and recall:

$F1\text{-}Score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \quad (6)$
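The confusion-matrix metrics can be sketched as follows. The sketch assumes the standard definitions accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP) and recall = TP / (TP + FN), together with equation (6) for the F1-Score. The function name and the example counts are hypothetical and are not taken from the paper's results.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Equation (6): harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 500 test images, one pothole missed, no false alarms.
metrics = classification_metrics(tp=249, tn=250, fp=0, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the accuracy is 0.998, the precision 1.0 and the recall 0.996, and equation (6) gives an F1-Score of about 0.998.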
Fig. 5. Classification Result
Based on Table IV and equations (3)-(6), SVM has high overall accuracy but misses a number of true positives, that is, it produces false negatives. Put another way, it fails to detect potholes that are small, in shade or under illumination variations. The results also show that there is no balance between precision and recall. Our model, on the other hand, outperformed the SVM method: it has higher exactness and completeness in the pothole detection task (see Figure 5) and a better balance between precision and recall.

V. CONCLUSION AND FUTURE WORKS
In this paper, we propose a low-cost, deep learning-based approach to road pothole detection that addresses the problem of the current method, which relies on humans to perform the road pothole inspection task. Our model was trained on 13,244 images, which we collected from several different places under various conditions and with potholes of various sizes and shapes. When tested on new images, our model achieved very high performance: Accuracy (99.80%), Precision (100%), Recall (99.60%) and F1-Score (99.60%). The comparison study demonstrates that our model significantly outperforms a conventional machine learning algorithm such as SVM. The results also show that deep learning techniques can provide better solutions than traditional algorithms, which have many weaknesses in low-level feature detection.
In the future we will collect more data to overcome the weakness of the proposed method, which cannot yet detect potholes in images under illumination variations, and build a system for automatic pothole detection applicable on smartphones.

VI. ACKNOWLEDGMENT
This research was supported by Japan International Cooperation Agency (JICA).

REFERENCES
[1] Timor-Leste-Strategic-Plan-2011-2030.
[2] Roads-for-development-project-document.
[3] J. Lin and Y. Liu, "Potholes detection based on SVM in the pavement distress image," in Proc. 9th Int. Symp. Distrib. Comput. Appl. Bus. Eng. Sci., Aug. 2010, pp. 544-547.
[4] YoungJin Cha, Wooram Choi, Oral Büyüköztürk, "Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks," 2017.
[5] Hiroya Maeda, Yoshihide Sekimoto, Toshikazu Seto, Takehiro Kashiyama, Hiroshi Omata, "Road Damage Detection Using Deep Neural Networks with Images Captured Through a Smartphone," University of Tokyo, 4-6-1 Komaba, Tokyo, Japan.
[6] Justin Bray, Brijesh Verma, Xue Li, and Wade He, "A Neural Network based Technique for Automatic Classification of Road Cracks," 2006 International Joint Conference on Neural Networks, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21, 2006.
[7] Allen Zhang, Kelvin C. P. Wang, Baoxian Li, Enhui Yang, Xianxing Dai, Yi Peng, Yue Fei, Yang Liu, Joshua Q. Li, Cheng Chen, "Automated Pixel-Level Pavement Crack Detection on 3D Asphalt Surfaces Using a Deep-Learning Network," Computer-Aided Civil and Infrastructure Engineering 00 (2017) 1-15.
[8] Lei Zhang, Fan Yang, Yimin Daniel Zhang, and Ying Julie Zhu, "Road Crack Detection Using Deep Convolutional Neural Network," DOI: 10.1109/ICIP.2016.7533052.
[9] A. Tedeschi, F. Benedetto, "A real-time automatic pavement crack and pothole recognition system for mobile Android-based devices," Advanced Engineering Informatics 32 (2017) 11-25.
[10] S. Nienaber, M.J. Booysen, R.S. Kroon, "Detecting potholes using simple image processing techniques and real-world footage," SATC, July 2015, Pretoria, South Africa.
[11] S. Nienaber, R.S. Kroon, M.J. Booysen, "A Comparison of Low-Cost Monocular Vision Techniques for Pothole Distance Estimation," IEEE CIVTS, December 2015, Cape Town, South Africa.
[12] Waseem Rawat, Zenghui Wang, "Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review," Neural Computation 29, 2352-2449, 2017, Massachusetts Institute of Technology.
[13] Vinod Nair, Geoffrey E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 2010.
[14] Diederik P. Kingma, Jimmy Lei Ba, "Adam: A Method for Stochastic Optimization," arXiv.
[15] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research 15 (2014) 1929-1958.
[16] François Chollet and others, Keras, GitHub, 2015.
[17] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015.
[18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[19] Jihen Amara, Bassem Bouaziz, Alsayed Algergawy, "A Deep Learning-based Approach for Banana Leaf Diseases Classification," Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, Bonn 2017, 79.
[20] Road video data, https://www.youtube.com/watch?v=b2jQ42hVY8c
[21] Road video data, https://www.youtube.com/watch?v=ZUbDPhL2gEE