
Real-time pothole detection system on vehicle using improved YOLOv5 in Malaysia


Au Yang Her, Weng Kean Yew, Pang Jia Yew, Melissa Chong Jia Ying
School of Engineering and Physical Sciences
Heriot-Watt University Malaysia
Putrajaya, Malaysia
w.yew@hw.ac.uk, j.pang@hw.ac.uk

Abstract— Road surfaces in Malaysia commonly deteriorate over time, resulting in potholes. With the increasing number of potholes on the road, they have become a road hazard that threatens the safety of Malaysian citizens. Potholes generally cause drivers to swerve away from their original direction, which may result in vehicle damage and/or road accidents. Computer vision has improved significantly over the past decade, benefiting technologies that rely on image processing, such as pothole detection. However, it has several limitations in pothole detection: potholes have inconsistent shapes, making it difficult to obtain an accurate prediction, and detection algorithms are often too slow to predict potholes in real time. In this paper, the use of computer vision technology on vehicles is presented as a solution to detect potholes in real time. A deep learning model based on convolutional neural networks, YOLOv5, is found to improve the prediction accuracy compared with past results. The trained YOLOv5 models achieve a mAP@0.5 of 80.8%, 82.2% and 82.5% for YOLOv5m6, YOLOv5s6 and YOLOv5n6 respectively. The trained YOLOv5n6 model was chosen as the main image processing model for its robust performance-to-size ratio. The results demonstrate the actual deployment of the trained YOLOv5n6 model on the road in real time with distance estimation.

Keywords—computer vision, pothole detection, convolutional neural network

I. INTRODUCTION

Technologies have improved human lives severalfold over the past decade, to the point that humans are almost capable of developing autonomous navigation systems. Despite these advancements, one problem still bothers people today: potholes on road surfaces. Potholes vary in size and shape and are caused by ground water expanding and contracting. Potholes with severe depressions can cause vehicles to swerve away from their original direction, causing drivers to lose control and resulting in vehicle damage and/or road accidents. This problem is more prevalent in less developed countries and is quite common in Malaysia. Waze, an application used for Malaysian road traffic navigation, reported a whopping 52,295 potholes in the state of Selangor and the city of Kuala Lumpur alone. Besides that, statistics show that 840 deaths were caused by pothole accidents from 2000 to 2011, amounting to 11.2% of all road deaths attributed to road defects [1].

Many local drivers have expressed their concerns about potholes. Even the Malaysian Minister Khairy Jamaluddin met with an accident in December 2020, when he tripped over a pothole while cycling and ended up in a hospital for recovery treatment [2]. A few days later, on the 3rd of January 2021, a 75-year-old motorcyclist died after crashing into a pothole in Kuala Lumpur, where he flew off his motorcycle and landed 30 m away from it [3]. Given these cases, a method is needed to help drivers quickly detect potholes on road surfaces.

Several studies have been conducted on pothole detection. The authors in [4] used popular semantic segmentation techniques such as the Mask-RCNN and U-Net algorithms, which yielded accuracies of 80.00% and 41.67% respectively. This shows that Mask-RCNN significantly outperformed U-Net, with especially poor performance by U-Net on images with multiple potholes and long-shot images. The authors of [5] developed a stereo vision-based pothole detection system based on a novel disparity transformation algorithm and a disparity map modelling algorithm, which yielded an accuracy of 99.64%. Disparity map modelling prevents disparity errors larger than one pixel, which provides higher accuracy than point cloud modelling. However, this approach requires further segmentation of road surfaces, as not all road surfaces can be treated as quadratic. The pothole detection model in [6] was constructed using the "you only look once" (YOLO) algorithm based on a convolutional neural network (CNN). In that study, a pothole image was first converted to a single channel using grey scale to minimize loss in image quality and was then fed into the YOLO algorithm. Overall, the model showed a mean average precision (mAP) of 77.86%. The authors in [7] proposed a pothole detection system using a black-box camera. The proposed algorithm consists of three steps: pre-processing (cropping, greyscaling and thresholding), candidate extraction (detecting lanes to remove unwanted noise) and cascaded detection (distinguishing true from false detected potholes), and it obtained a mAP of 71%. Although extensive research has been conducted on various methodologies and on the accuracy of pothole detection, these studies were lacking in terms of real-time detection speed.

In this paper, the pothole detection system is constructed using the YOLOv5 model to build a pothole recognition methodology. After training the model on existing datasets, the optimal YOLOv5 model is selected for deployment on an Nvidia Jetson Xavier NX located on the vehicle. Images of the road from the Intel Realsense D435i camera on the moving vehicle are given as input, and the system detects potholes and their distance as the vehicle moves in real time. The remainder of the paper is structured as follows: Section II provides a description of

the YOLOv5 segmentation framework. Our methodology is presented in Section III, with results and discussion in Section IV. Finally, a summary of the primary findings is presented in Section V.

II. FRAMEWORK

A. Model development

In this paper, the CNN (convolutional neural network) model used is You Only Look Once version 5 (YOLOv5) by Ultralytics, whose model is available on GitHub. YOLOv5 is based on the PyTorch machine learning library and can be executed on a Linux-based OS using Python 3.6 and above. The schematic architecture of YOLOv5 can be seen in Figure 1.

Figure 1: Schematic Architecture of YOLOv5
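As a brief illustration of how this repository is typically used (this snippet is not taken from the paper; the checkpoint name and image URL are only examples), a pretrained model can be loaded through PyTorch Hub and run on a test image:

```python
import torch

# Load a pretrained YOLOv5 checkpoint from the Ultralytics repository via PyTorch Hub.
# 'yolov5s' is used purely as an illustration; the paper trains the *6 variants.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run inference on an example image (URL is a placeholder from the Ultralytics docs).
results = model('https://ultralytics.com/images/zidane.jpg')

# Print a summary and the bounding boxes of the detected objects.
results.print()
print(results.xyxy[0])  # rows of [x1, y1, x2, y2, confidence, class]
```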

In its initial stages, YOLOv5 requires a Backbone, a feature extraction network used to encode the network's input into feature representations. YOLOv5 uses DarkNet as its backbone, which displays information on the classified images and prints the top 10 classes of the images as it loads the configuration file and weights. This information is then passed to the Path Aggregation Network (PANet), which is the neck of YOLOv5. The PANet framework first shortens the information path and enhances the feature pyramid by creating bottom-up path augmentation [8]. It then recovers the broken information path between all feature levels through adaptive feature pooling, aggregating features from all feature levels while avoiding arbitrarily assigned results [8]. At the end of PANet, it captures different views of each proposal to augment mask prediction [8]. Figure 2 shows the sequence of the PANet framework.

Figure 2: PANet Framework: (a) Feature Pyramid Network (FPN). (b) Bottom-up Path Augmentation. (c) Adaptive Feature Pooling. (d) Box Branch. (e) Fully connected Fusion

At the output side of YOLOv5, the class approximation produced by the convolutional layers is displayed along with bounding boxes on the detected objects.

In addition, YOLOv5 offers five different models to choose from, with YOLOv5x6 (extra-large) being the most accurate but requiring the most graphical computational cost and being the largest in size, while YOLOv5n6 (nano) is the least accurate but requires the least graphical computational cost and is the smallest in size. The statistics and performance of each model can be found in Figures 3 and 4. These model weights were pretrained on the COCO dataset, which contains 5000 images across 80 classes, for 300 training epochs.

Figure 3: YOLOv5 Models and their Performances [9]

Figure 4: YOLOv5 Graphical Computational Performances [9]

B. Image Dataset for Labelling

The pothole image datasets were taken from the RTK, Kaggle, Pothole-600, Roboflow and PotDataset datasets, amounting to a total of 3784 images of potholes. For the best outcome and accuracy, the images were split in a 7:3 ratio, where the train folder contains 70% of the entire dataset collection and the test folder contains the remaining 30%.
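The paper does not include its splitting script; a minimal sketch of such a 7:3 split, assuming a YOLO-style images/labels folder layout (folder names are assumptions), could look as follows:

```python
import random
import shutil
from pathlib import Path

# Assumed layout: all 3784 pothole images in 'dataset/all/images' with
# YOLO-format label .txt files of the same stem in 'dataset/all/labels'.
root = Path('dataset')
images = sorted((root / 'all' / 'images').glob('*.jpg'))

random.seed(0)
random.shuffle(images)
split = int(0.7 * len(images))          # 7:3 train/test ratio as in the paper
subsets = {'train': images[:split], 'test': images[split:]}

for name, files in subsets.items():
    for sub in ('images', 'labels'):
        (root / name / sub).mkdir(parents=True, exist_ok=True)
    for img in files:
        label = root / 'all' / 'labels' / (img.stem + '.txt')
        shutil.copy(img, root / name / 'images' / img.name)
        if label.exists():
            shutil.copy(label, root / name / 'labels' / label.name)
```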
III. METHODOLOGY

A. Model training

Because the NVIDIA Jetson Xavier NX has only 384 CUDA cores, the following three YOLOv5 models were trained in this paper: YOLOv5m6, YOLOv5s6 and YOLOv5n6. These provide a good balance of performance to computational cost, making them more efficient for deployment. Google Colab Pro was used for model training, with a 'Tesla P100-PCIE-16GB' GPU and 32 GB of RAM.
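The paper does not report its exact training commands; as a hedged sketch, the three variants could be trained with the standard Ultralytics train.py script as follows, where the dataset YAML name, epoch count and batch size are assumptions:

```python
import subprocess

# Train the three chosen YOLOv5 variants on the pothole dataset with the
# standard Ultralytics train.py script. 'pothole.yaml', 100 epochs and
# batch size 16 are illustrative assumptions, not values from the paper.
for weights in ('yolov5m6.pt', 'yolov5s6.pt', 'yolov5n6.pt'):
    subprocess.run([
        'python', 'train.py',
        '--img', '1280',            # the *6 variants default to a 1280-pixel training resolution
        '--batch', '16',
        '--epochs', '100',
        '--data', 'pothole.yaml',   # points to the 70/30 train/test folders
        '--weights', weights,
        '--name', weights.replace('.pt', '_pothole'),
    ], check=True)
```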

B. Hardware integration with YOLOv5

The hardware architecture is shown in Figure 5. An Intel Realsense D435i camera is used for its stereoscopic functions, which allow 3D calculations and perspective for distance estimation. The camera is connected to an Nvidia Jetson Xavier NX, which holds and runs the AI algorithm that detects potholes and their distance from the vehicle. The Nvidia Jetson Xavier NX consumes 30 W of power and has 8 GB of RAM and 384 CUDA cores (GPU), which are crucial for AI computations.

Figure 5: Pothole detection hardware architecture

The Nvidia Jetson Xavier NX is loaded with the JetPack 4.4 SDK as specified in Nvidia's "Getting Started with Jetson Xavier NX Developer Kit" guide [10]. The home directory of the Nvidia Jetson Xavier NX houses the libraries and dependencies required for YOLOv5 to run: numpy, pandas, scipy, scikit-image, matplotlib, seaborn, opencv-python, torch, torchvision and cython.
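As an illustrative sketch only (the paper does not give package versions, and on JetPack 4.4 torch and torchvision are normally installed from NVIDIA-provided wheels rather than PyPI), the listed dependencies could be installed as follows:

```python
import subprocess
import sys

# Python packages listed in the paper as YOLOv5 dependencies on the Jetson.
# Versions are unspecified; torch/torchvision would in practice come from
# JetPack-compatible wheels, so this generic pip call is only an illustration.
packages = [
    'numpy', 'pandas', 'scipy', 'scikit-image', 'matplotlib',
    'seaborn', 'opencv-python', 'torch', 'torchvision', 'cython',
]
subprocess.run([sys.executable, '-m', 'pip', 'install', *packages], check=True)
```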

Once the configuration of YOLOv5 is done, the drivers for the Intel Realsense D435i camera are installed so that the camera can function with YOLOv5. The folder containing the shell script for installing the RealSense SDK was downloaded onto the Nvidia Jetson Xavier NX and run in the terminal. The flow chart of the proposed YOLOv5 instance segmentation is shown in Figure 6.

Figure 6: Flow chart of the proposed YOLOv5 Instance Segmentation

IV. RESULT & DISCUSSION

A. Model training performance

The following are the validation results in terms of precision, recall and mean average precision over IoU thresholds (mAP@0.5 and mAP@0.5:0.95) after model training. Since three models were trained (YOLOv5m6, YOLOv5s6 and YOLOv5n6), their results are shown in Table 1.
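The metrics reported in Table 1 below come from the standard Ultralytics validation routine; a minimal sketch of reproducing them for one checkpoint is given here, where the checkpoint path and the 'pothole.yaml' dataset file are assumptions not specified in the paper:

```python
import subprocess

# Compute precision, recall, mAP@0.5 and mAP@0.5:0.95 for one trained
# checkpoint with the standard Ultralytics val.py script (paths are assumed).
subprocess.run([
    'python', 'val.py',
    '--weights', 'runs/train/yolov5n6_pothole/weights/best.pt',
    '--data', 'pothole.yaml',
    '--img', '1280',
], check=True)
```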
Table 1 - Tabulation of Trained Models' Pothole Detection Performances

Models      Precision (%)   Recall (%)   mAP@0.5 (%)   mAP@0.5:0.95 (%)
YOLOv5m6    84.2            71.3         80.8          53.9
YOLOv5s6    80.7            75.4         82.2          54.7
YOLOv5n6    81.9            74.6         82.5          53.8

From Table 1, the trained m6 model has the highest precision, while, surprisingly, the trained n6 model has the highest mAP@0.5. In terms of graphical computational cost, the trained n6 model requires the least according to Figures 3 and 4. Thus, the n6 model provides the best precision-to-performance ratio and is selected for the pothole detection use case. To demonstrate the performance of the trained YOLOv5n6 model, Google Colab was used with the trained model. The image size was set to 1152 pixels and the 'conf' (confidence threshold) was set to 0.30, or 30%. The results are shown in Figure 7.

Figure 7: Results of the Detection in Google Colab (a) Before and (b) After
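A minimal sketch of the corresponding detection run, using the 1152-pixel image size and 0.30 confidence threshold quoted above (the checkpoint and source paths are assumptions):

```python
import subprocess

# Run the trained YOLOv5n6 checkpoint on sample pothole images with the
# settings described in the text: 1152-pixel inference size and a 0.30
# confidence threshold. Paths are illustrative assumptions.
subprocess.run([
    'python', 'detect.py',
    '--weights', 'runs/train/yolov5n6_pothole/weights/best.pt',
    '--img', '1152',
    '--conf', '0.30',
    '--source', 'sample_potholes/',   # folder of test images (placeholder)
], check=True)
```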

B. Model deployment on hardware

After obtaining these results on the performance of the trained model, the model was deployed onto the Nvidia Jetson Xavier NX to see how it performs on the device itself. The Nvidia Jetson Xavier NX and the Intel Realsense D435i camera were installed on board a vehicle (a Myvi) and powered through a car inverter with an output power of 220 W, as can be seen in Figure 8. The vehicle was deployed and tested in the neighbourhood of Casawood Cybersouth, Malaysia.

Figure 8: Pothole detection hardware architecture

The trained n6 model was executed in the directory of the YOLOv5 folder, as the necessary files are located there. After executing the code in the terminal, the image processing commences, and the terminal shows the inference time for each frame it processes, which averaged 0.040 s (40 ms) per frame, amounting to about 25 FPS (frames per second). Figures 9(a)-(d) show the model detection using the on-board Intel Realsense D435i camera with its distance measurements of the potholes.

Figure 9(a)-(d): Results of the Pothole Detection System on Vehicle in real time

While the YOLOv5n6 model was running, the distance measurements were recorded for future reference on distance estimation. Using the Intel Realsense D435i camera, the model detects accurately at distances of 1.5 m and beyond, whereas readings between 0 and 1 m are distorted. In brightly lit conditions, the model detects accurately up to 15 m, whereas in low-light conditions the detection is valid only up to 12 m.
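The paper does not publish its deployment script; the sketch below illustrates, under stated assumptions, how the detector output could be combined with the D435i depth stream to obtain the distance to each detected pothole, using pyrealsense2 and a torch.hub-loaded custom checkpoint (checkpoint path and stream settings are assumptions).

```python
import numpy as np
import pyrealsense2 as rs
import torch

# Load the trained pothole model (checkpoint path is an assumption).
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
model.conf = 0.30  # confidence threshold used in the paper

# Start aligned colour and depth streams on the D435i.
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)

try:
    while True:
        frames = align.process(pipeline.wait_for_frames())
        color, depth = frames.get_color_frame(), frames.get_depth_frame()
        if not color or not depth:
            continue
        img = np.asanyarray(color.get_data())[:, :, ::-1].copy()  # BGR -> RGB

        # Detect potholes and read the depth at each bounding-box centre.
        for *xyxy, conf, cls in model(img).xyxy[0].tolist():
            cx, cy = int((xyxy[0] + xyxy[2]) / 2), int((xyxy[1] + xyxy[3]) / 2)
            distance = depth.get_distance(cx, cy)  # metres
            print(f'pothole conf={conf:.2f} distance={distance:.2f} m')
finally:
    pipeline.stop()
```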
However, some imperfections were observed in the YOLOv5n6 model during pothole detection. As shown in Figure 10, the model detected a manhole as a pothole. This detection can be accepted, as manholes can also cause accidents in the same way potholes do. The detection also accounts for the severity and depth of the pothole in the images, i.e. how wide the pothole is and the dark colour characteristics that convey its depth, since the image datasets used contained potholes with severe depressions and wide surface areas. If a manhole is flush with the surface of the road, the model does not detect it as a pothole.

Figure 10: Manhole detected by the Pothole Detection System

In addition, the model also detected certain parts of trees as potholes and even detected shadows as potholes. These detections of trees and shadows depend on the angle at which the camera was mounted and on the intensity of the light, which can throw off the camera's white balance. This imbalance can make trees and shadows look like potholes to the model, as shown in Figure 11. Overall, the model did a good job of detecting potholes, with a mAP@0.5 of 82.5%.

Figure 11: Part of a tree detected by the Pothole Detection System

V. CONCLUSION

This paper presents the development of a pothole detection system. The pothole detection methodology is based on the YOLOv5 CNN model, which includes its pre-processing backbone DarkNet and its neck PANet, and uses convolutional methods to perform the object detection in the image processing model. The trained YOLOv5 models achieve a mAP@0.5 of 80.8%, 82.2% and 82.5% for YOLOv5m6, YOLOv5s6 and YOLOv5n6 respectively. The trained YOLOv5n6 model was chosen as the main image processing model for its robust performance-to-size ratio, which suited the mobility requirements of the deployment and the limited number of GPU cores in the Nvidia Jetson Xavier NX. Following that, the detection model was integrated with the Intel Realsense D435i stereo camera to provide images as the input. The devices on board the vehicle were installed and tested in a residential area called Casawood Cybersouth, located in Dengkil, Malaysia.

The results demonstrate the actual deployment of the trained YOLOv5n6 model on the road in real time with distance estimation. However, the results also showed some minor errors, as the model detects manholes, trees and shadows as potholes due to their resemblance to the large number of pothole images in the dataset. It is recommended that further research be conducted to enhance the image processing, particularly at night-time, and to improve detection performance on long-distance objects.

ACKNOWLEDGMENT

This work is part of the 'Road Obstacles and Pedestrian Awareness System' (ROPAS) project, funded by the Empower Grant Scheme of Heriot-Watt University Malaysia.

REFERENCES

[1] Y. Darma, M. R. Karim, and S. Abdullah, "An analysis of Malaysia road traffic death distribution by road environment," Sadhana - Academy Proceedings in Engineering Sciences, vol. 42, no. 9, pp. 1605-1615, Sep. 2017, doi: 10.1007/s12046-017-0694-9.
[2] "Malaysia's Minister Khairy Jamaluddin injured from fall after bicycle hits pothole," The Straits Times, Dec. 28, 2020. Accessed: May 31, 2022. [Online]. Available: https://www.straitstimes.com/asia/se-asia/malaysias-minister-khairy-jamaluddin-injured-from-fall-after-bicycle-hits-pothole
[3] K. Perimbanayagam, "75-year-old man killed after crashing into pothole," New Straits Times, Jan. 03, 2021. Accessed: May 31, 2022. [Online]. Available: https://www.nst.com.my/news/nation/2021/01/654169/75-year-old-man-killed-after-crashing-pothole
[4] S. Thiruppathiraj, U. Kumar, and S. Buchke, "Automatic pothole classification and segmentation using android smartphone sensors and camera images with machine learning techniques," in 2020 IEEE Region 10 Conference (TENCON), Nov. 2020, pp. 1386-1391, doi: 10.1109/TENCON50793.2020.9293883.
[5] R. Fan, U. Ozgunalp, B. Hosking, M. Liu, and I. Pitas, "Pothole Detection Based on Disparity Transformation and Road Surface Modeling," IEEE Transactions on Image Processing, vol. 29, pp. 897-908, 2020, doi: 10.1109/TIP.2019.2933750.
[6] J. W. Baek and K. Chung, "Pothole classification model using edge detection in road image," Applied Sciences, vol. 10, no. 19, Oct. 2020, doi: 10.3390/app10196662.
[7] Y. Jo and S. Ryu, "Pothole detection system using a black-box camera," Sensors, vol. 15, no. 11, pp. 29316-29331, Nov. 2015, doi: 10.3390/s151129316.
[8] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path Aggregation Network for Instance Segmentation," Mar. 2018.
[9] Ultralytics, "ultralytics/yolov5," GitHub. https://github.com/ultralytics/yolov5 (accessed May 31, 2022).
[10] NVIDIA Developer, "Getting Started With Jetson Xavier NX Developer Kit." https://developer.nvidia.com/embedded/learn/get-started-jetson-xavier-nx-devkit (accessed May 31, 2022).

