You are on page 1of 7

YOLOv8-Based Visual Detection of Road Hazards:

Potholes, Sewer Covers, and Manholes


Om Khare Shubham Gandhi Aditya Rahalkar
Dept. of Computer Engineering and IT Dept. of Computer Engineering and IT Dept. of Mechanical Engineering
COEP Technological University COEP Technological University COEP Technological University
Pune, India Pune, India Pune, India
khareom20.comp@coeptech.ac.in shubhammg20.comp@coeptech.ac.in rahalkaram20.mech@coeptech.ac.in

Sunil Mane
Dept. of Computer Engineering and IT
COEP Technological University
Pune, India
sunilbmane.comp@coeptech.ac.in
arXiv:2311.00073v1 [cs.CV] 31 Oct 2023

Abstract—Effective detection of road hazards plays a pivotal One notable innovation in the field of Computer Vision
role in road infrastructure maintenance and ensuring road safety. is the You Only Look Once version 8 (YOLOv8) object
This research paper provides a comprehensive evaluation of detection algorithm developed by Ultralytics et al. YOLOv8
YOLOv8, an object detection model, in the context of detecting
road hazards such as potholes, Sewer Covers, and Man Holes. is a state-of-the-art deep learning model that has demonstrated
A comparative analysis with previous iterations, YOLOv5 and success in various real-time object detection applications. This
YOLOv7, is conducted, emphasizing the importance of compu- paper investigates the potential effectiveness of YOLOv8 as a
tational efficiency in various applications. The paper delves into robust tool for Pothole, Sewer Cover, and Man Hole detec-
the architecture of YOLOv8 and explores image preprocessing tion. It conducts a comprehensive analysis and comparative
techniques aimed at enhancing detection accuracy across diverse
conditions, including variations in lighting, road types, hazard evaluation, shedding light on its performance compared to its
sizes, and types. Furthermore, hyperparameter tuning experi- predecessors, YOLOv7 and YOLOv5.
ments are performed to optimize model performance through
adjustments in learning rates, batch sizes, anchor box sizes, and A. Problem Statement
augmentation strategies. Model evaluation is based on Mean Traditional methods of pothole detection often rely on man-
Average Precision (mAP), a widely accepted metric for object
detection performance. The research assesses the robustness and ual inspections or sensor-based systems, both of which exhibit
generalization capabilities of the models through mAP scores limitations in terms of accuracy, scalability, and real-time
calculated across the diverse test scenarios, underlining the sig- responsiveness. Manual inspections are time-consuming, sub-
nificance of YOLOv8 in road hazard detection and infrastructure jective, and impractical for covering extensive road networks
maintenance. efficiently. On the other hand, sensor-based systems can be ex-
Index Terms—YOLO, Object Detection, Autonomous, Deep
pensive to deploy and maintain, and their effectiveness can be
learning, Potholes, Sewer Covers, Man Holes
hindered by various environmental factors. Furthermore, many
existing computer vision approaches for pothole detection lack
I. I NTRODUCTION the speed and accuracy required for real-time implementation,
especially in the context of autonomous vehicles.
In the era of advancing transportation systems, the goal While there have been previous attempts to use deep learn-
of enhancing transportation infrastructure for safety and ef- ing models like YOLOv7 and YOLOv5 for pothole detection,
ficiency has come to the forefront. Factors such as heavy these models have their own set of disadvantages. YOLOv7,
rainfall, inadequate road maintenance, and the possibility of for example, may exhibit lower accuracy and efficiency com-
natural disasters have highlighted the urgent need for swift pared to newer versions, while YOLOv5 might not offer the
and efficient road hazard detection. Potholes, Sewer Covers, same level of robustness required for diverse environmental
and Man Holes pose substantial threats to both motorists conditions. Addressing these challenges is crucial for enhanc-
and pedestrians, leading to approximately 4,800 [1] accidents ing road safety, minimizing vehicle damage, and optimizing
annually and placing a significant financial burden on public road maintenance efforts.
resources due to vehicle damages and related expenses. The
rugged contours of potholes have the potential to cause a B. Motivation and Objectives
range of damages, from tire tears to compromising wheel rim Motivated by the pressing need for accurate and timely
integrity upon impact, presenting risks that could potentially road hazards, the observed advantages of YOLOv8, including
lead to fatal consequences. increased speed and accuracy, as well as its improved model
TABLE I
A PPROACHES USED IN P OTHOLE DETECTION

Methodology Abstract Reference Pros Cons


Vibration based Uses GPS, accelerometer, gyro- [2], [3], [4] Real-time insights into road net- Requires vehicles to drive over
scope units for mapping road sur- work conditions. Approx. 90% ac- potholes. Limited to devices with
faces. Wavelet decomposition & curacy in severe anomalies. specific hardware/software capabil-
SVM used ities.
3D Laser-based Uses 3D laser scanning to iden- [5], [6] Accurate 3D point-cloud points. Costly, short range of detection.
tify pavement distresses. Grid- FNN for severity classification. Not suitable for early detection by
based approach for specific distress autonomous vehicles.
features.
3D Stereo-vision Reconstructs 3D pavement surface [7], [8] Precise representation of road sur- Requires high computational power
from input images. Uses stereo im- faces, Hence high accuracy. for 3D surface reconstruction.
ages for road distress identification.
Vision-based Uses CNN, DNN for road damage [9], [10], [11], Cost-effective, enables determining There is a tradeoff between dataset
(2D) detection using a dataset of road [12] the exact shape and area of a pot- diversity, model’s accuracy, pro-
damage images for training. hole. High accuracy. cessing time and model size.

architecture and user-friendliness, are sought to be evaluated road location, providing real-time insights into road network
by this study. Firstly, crucial insights into the model’s ar- conditions.
chitecture that enable it to excel in real-time identification, 3D construction methods are further classified into laser-
its precision, and its lightweight characteristics in identifying based and stereo-vision approaches. The potential of 3D
road hazards of varying sizes and categories will be offered laser scanning as a tool for identifying pavement distresses,
through this evaluation. Secondly, rigorous performance tests such as potholes, was explored [6]. The 3D laser scanning
and benchmarking are conducted to compare YOLOv8 with technology captured accurate 3D point cloud data, which
its predecessors, namely YOLOv7 and YOLOv5, and ascertain was then processed using a grid-based approach to focus on
how its enhancements in speed and accuracy surpass those of specific distress features. [7] utilized laser imaging to find
the earlier models. distress in pavements. This method captured pavement images
II. R ELATED W ORK and identified pothole areas, which were then represented
In recent years, there has been a significant surge in research using a matrix of square tiles. A feedforward neural network
focusing on road conditions, encompassing challenges like (FNN) was used to classify the severity of pothole and
potholes, manholes, sewer covers, and man hole detection. crack types, demonstrating its capability to enhance pavement
This heightened interest can largely be attributed to the ad- images, extract potholes, and analyze their severity. [8] in-
vancements in autonomous vehicle technologies, where the troduced a novel approach using the stereovision technique
accurate mapping of road conditions holds paramount impor- to reconstruct a full 3D pavement surface from the input
tance. Pothole detection methods have evolved into various images. The methodology involved calibrating the input data,
categories, [2] including vibration-based, 3D laser-based, 3D correcting any observed distortion, feature extraction, and 3D
reconstruction, and 2D vision-based approaches. Table I, out- reconstruction. [9] introduced a method that created a set of
lines the strengths and limitations associated with each of these points in a three-dimensional space, allowing for a precise
approaches. representation of road surfaces. By leveraging stereo images
In vibration-based methods, hazards are detected using and image processing technologies, the system could identify
accelerometers. A vibration-based system was developed to various road distresses, such as potholes, bumps, and cracks,
estimate pavement conditions [3]. It models the interactions etc
between the ground and the vehicle, considering the vehicle Vision-based methods utilize image processing and deep
to be under random force excitations. Real-time detection learning on 2D images obtained from cameras. A system that
of road irregularities, potential hazards, is achieved using uses Convolutional Neural Networks (CNNs) was proposed
a mobile sensing system that utilizes the accelerometers in to detect road damage [10]. A large dataset comprising ir-
smartphones [4]. These devices were specifically designed for regularities on road images captured via smartphones, with
limited access to hardware and software and did not require several instances of road surface damage, was used, achieving
extensive signal-processing techniques. an accuracy of 75%. [11] used thermal images as input
[5] explored the use of devices equipped with GPS, feed for deep neural networks for the detection and local
accelerometers, and gyroscope units to map road surfaces. mapping of potholes. To deal with the challenges posed by
Wavelet decomposition was employed for signal processing, changing weather conditions, a modified ResNet50-RetinaNet
while Support Vector Machine (SVM) was utilized for the model was employed, achieving a precision of 91.15% in
detection of cracks and irregularities on the road surfaces. pothole localization using thermal images. [12] explored the
They consistently achieved an accuracy of approximately 90% use of YOLO (You Only Look Once), a model that uses
in detecting severe anomalies, regardless of the vehicle type or a combination of region proposal algorithms and CNNs to
Fig. 1. YOLOv8 Architecture

detect and localize objects in images. They developed a new the YOLOv8 model.
dataset comprising 1500 images of Indian roads with 76%
as the highest precision obtained from YOLOv3 Tiny. [13] A. Architecture
used YOLOv7, leveraging the power of Convolutional Neural The architecture of YOLOv8, illustrated in Figure 1 [14],
Networks for detecting potholes. An open-source dataset was incorporates novel improvements for superior detection accu-
used for training the model, which achieved an F1 score of racy while maintaining high speed and efficiency. YOLOv8’s
0.51. backbone architecture is inspired by YOLOv5 but includes
The following drawbacks have been identified in current several key modifications [15]:
implementations: 1) CSPDarknet53 Feature Extractor: YOLOv8 employs
1) Insufficient diversity in the datasets utilized for training CSPDarknet53, a variant of the Darknet architecture, as
and testing the model. its feature extractor. This component consists of convo-
2) High processing time. lutional layers, batch normalization, and SiLU activation
3) Large model size. functions. Notably, YOLOv8 replaces the initial 6x6
4) Low mAP@0.5 metric values for the deployed model. convolutional layer with a 3x3 convolutional layer for
improved feature extraction.
III. M ETHODOLOGY 2) C2f Module (Cross-Stage Partial Bottleneck):
In this section, we discuss the architecture, dataset, prepro- YOLOv8 introduces the C2f module to effectively merge
cessing methods, and hyperparameter tuning used in training high-level features with contextual information. This
is achieved by concatenating the outputs of bottleneck camera’s perspective. Additionally, the dataset contains images
blocks, which consist of two 3x3 convolutions with of water-filled potholes, ensuring that the model is trained to
residual connections. This architectural change aims to recognize a variety of scenarios, thus enhancing its robustness.
enhance feature representation. The dataset consists of a total of 3,770 images, with 3,033
3) Detection Head: YOLOv8 adopts an anchor-free de- images in the training set, 491 images in the validation set,
tection strategy, eliminating the need for predefined and 246 images in the testing set.
anchor boxes and directly predicting object centers. The
detection head comprises the following components: D. Preprocessing
a) Independent Branches: YOLOv8 uses a decou- To enhance the robustness of our model, the dataset under-
pled head approach, where objectness, classifica- went several preprocessing methods. Since all input images
tion, and regression tasks are processed indepen- in the dataset were of size 640 x 640 pixels, no image
dently by separate branches. This design allows resizing was required. Min-max normalization was applied to
each branch to focus on its specific task, contribut- normalize the pixel values across all images.
ing to overall detection accuracy. Data augmentation was performed to increase the dataset’s
b) Activation Functions: The objectness score in the size and diversity, which proved beneficial given the uncer-
output layer uses the sigmoid activation function, tainty in weather conditions, camera mounting, and image
representing the probability of an object’s presence quality variability. Five types of modifications were applied:
within a bounding box. For class probabilities,
1) Image Flip: The image is horizontally flipped to account
YOLOv8 employs the softmax function, indicating
for different shapes and orientations of potholes.
the likelihood of an object belonging to each class.
2) Image Scaling: Image scaling helps the model adapt to
c) Loss Functions: To optimize the model, YOLOv8
different pothole sizes.
utilizes the CIoU (Complete Intersection over
3) Motion Blurring: To train the model for low-quality
Union) and DFL (Dynamic Focal Loss) loss func-
images and images with motion blur, a blur effect was
tions for bounding box regression and binary cross-
introduced.
entropy for classification. These loss functions are
4) Color Manipulation: Color manipulation or RGB ma-
effective in enhancing object detection, especially
nipulation helps the model adapt to varying lighting
for smaller objects.
conditions, such as bright sunlight or low ambient light
4) YOLOv8-Seg Model: In addition to object detection, at night.
YOLOv8 offers a semantic segmentation model known 5) Fog Addition: Adding fog to images makes the model
as YOLOv8-Seg. This model uses CSPDarknet53 as robust against foggy conditions.
its backbone feature extractor and incorporates the C2f
These data augmentations were applied to every alternate
module. Two segmentation heads are added to predict
image, resulting in five derivative images from the original
semantic segmentation masks, making it a versatile tool
image. This increased the size of the training set to 10,613
for various computer vision tasks.
images.
B. Training Techniques
To enhance model performance, YOLOv8 employs inno- E. Model Training
vative training techniques, including Mosaic Augmentation. Hyperparameter tuning was conducted by considering dif-
During training, YOLOv8 combines four images, encouraging ferent parameters and evaluating which ones yielded the best
the model to learn object contexts in different locations and results. All three models, namely, YOLOv5 tiny, YOLOv5
against varying backgrounds. However, this augmentation is small, YOLOv7, YOLOv8 tiny, and YOLOv8 small, were
disabled during the final ten training epochs to prevent poten- trained with the same hyperparameters, as shown in Table
tial performance degradation. II, to ensure comparable results. Training of all models was
performed on an NVIDIA GeForce RTX 3090 GPU with
C. Dataset
10,496 CUDA cores.
The dataset used for pothole detection is sourced from
Roboflow Universe, specifically the ”Pothole Detection” Hyperparameter Value
dataset provided by the Intel Unnati Training Program [16]. Epochs 250
This dataset offers a comprehensive collection of annotated Batch Size 16
images, classifying not only potholes but also manholes and
Learning Rate 0.01
sewer covers. This diversity aids in making the model more
Weight Decay 0.0005
robust and preventing misclassification. The dataset comprises
images captured under various lighting conditions, angles, and Optimizer Adam
environments. Momentum 0.937
The Indian Driving Dataset is included in this dataset, en- TABLE II
abling the model to train on images captured from a dashboard T RAINING H YPERPARAMETERS
Fig. 2. Prediction images by YOLOv8 tiny

IV. RESULTS AND DISCUSSION Model mAP@0.5 Processing Time Size of Model
YOLOv5 nano 0.84 28 ms 14.8 MB
A. Performance Metrics
YOLOv5 small 0.86 38 ms 15.1 MB
The following metrics [17] were used for evaluating and
YOLOv7 0.90 35 ms 74.8 MB
comparing the models:
YOLOv8 nano 0.911 8.8 ms 6.3 MB
1) Mean Average Precision: Mean Average Precision
YOLOv8 small 0.92 11 ms 21.5 MB
(mAP), computed using equation 1, is a widely accepted
performance metric for object detection models. mAP is TABLE III
M ODEL P ERFORMANCE M ETRICS
calculated by taking the mean of Average Precision (AP) for
each of the ’n’ classes. AP for each class ’k’ is determined
by calculating the area under the precision-recall curve. mAP
provides a single score that considers Recall, Precision, and Class mAP@0.5 mAP@0.95
Intersection over Union (IoU), eliminating bias in performance Overall 0.91 0.59
measurement. Manhole 0.979 0.606

k=n Pothole 0.871 0.648


1X Sewer Cover 0.88 0.516
mAP = APk (1)
n
k=1 TABLE IV
C LASS - WISE M EAN AVERAGE P RECISION ( M AP) M ETRICS FOR YOLOV 8
2) Processing Time: Processing time is a crucial metric NANO
for assessing the speed at which the model classifies an
input image. It encompasses the total time taken by the
model for pre-processing, inference, loss calculation, and post-
processing of an image. Swift decision-making is essential for maintaining a model size of just 6.3 MB, making it lightweight
autonomous vehicles to avoid potholes, making the minimiza- for deployment in embedded systems.
tion of processing time imperative.
3) Size of the Trained Model: The size of the deployed Table IV shows the results obtained for each class for the
model, responsible for processing incoming data in embedded YOLOv8 nano model. Our model achieved a precision of 0.90,
systems, depends on the size of the final trained model a recall of 0.81, an mAP of 91% at 0.5 IoU, and an mAP
generated. Keeping this size as small as possible is essential of 59% at 0.95 IoU. The results of the YOLOv8 model after
due to limitations in onboard memory and the storage capacity applying it to the images in the dataset are presented in Figure
of the hardware. Larger models require more computation, 2, and the confusion matrix for the trained model is illustrated
resulting in lower power efficiency. in Figure 3.
The inclusion of manholes and sewer covers in the training
B. Results dataset benefits the model by increasing its robustness in
Table III presents the results obtained for Processing Time correctly identifying potholes, which is crucial for autonomous
and the size of the model on various YOLO models. vehicles to take appropriate action. In the case of manholes
While YOLOv5 nano and small models have small sizes and sewer covers, there is no need for the vehicle to take any
(14.8 and 15.1 MB, respectively), they yield lower mean action, preventing the model from giving false positives. As
average precision compared to other YOLO models. YOLOv7 illustrated in the first image of Figure 2, the model accurately
achieves a good mean average precision value but results in identifies and distinguishes between a sewer cover and a
high processing time and model size. YOLOv8 nano and small pothole, highlighting the robust nature of our model.
models outperform the previous YOLO models in every met- Comparing our implementation of YOLOv8 with previous
ric. The most efficient model, considering all three parameter works based on YOLO, superior performance is displayed
metrics, is the YOLOv8 nano model. It provides an exceptional across all categories such as mAP, Processing Time, and Size
average processing time of 8.8 ms for a single image while of Model as illustrated in Table V.
precise detection of potholes.
In summary, the YOLOv8 model, particularly the nano
variant, emerges as a highly promising solution for road hazard
detection. It strikes a delicate balance between accuracy, speed,
and resource efficiency, making significant strides in advancing
road safety and infrastructure maintenance. This research not
only contributes to the field of road hazard detection but also
paves the way for enhanced safety and efficiency in road in-
frastructure maintenance. Future work may involve real-world
deployment and further optimizations to continually enhance
its performance in various road and lighting conditions.

R EFERENCES
[1] M. Sathvik, G. Saranya, and S. Karpagaselvi, “An intelligent convolu-
tional neural network based potholes detection using yolo-v7,” in 2022
Fig. 3. Confusion Matrix International Conference on Automation, Computing and Renewable
Systems (ICACRS), 2022, pp. 813–819.
[2] T. Kim and S.-K. Ryu, “Review and analysis of pothole detection
Reference Model mAP Processing Size of methods,” Journal of Emerging Trends in Computing and Information
(IOU=50%) Time Model Sciences, vol. 5, no. 8, pp. 603–608, 2014.
Ukhwah et. al. YOLOv3 88.93 40 ms N/A [3] B. X. Yu and X. Yu, “Vibration-based system for pavement condition
[18] evaluation,” in Applications of advanced technology in transportation,
2006, pp. 183–189.
Shaghouri et. YOLOv4 85.39 50 ms N/A [4] A. Mednis, G. Strazdins, R. Zviedris, G. Kanonirs, and L. Selavo, “Real
al. [19] time pothole detection using android smartphones with accelerometers,”
Gao et. al. [20] YOLOv5 93.99 12.78 ms N/A in 2011 International conference on distributed computing in sensor
systems and workshops (DCOSS). IEEE, 2011, pp. 1–6.
Proposed YOLOv8 91.11 8.8 ms 6.3 MB [5] F. Seraj, B. J. Van Der Zwaag, A. Dilo, T. Luarasi, and P. Havinga,
Implementation “Roads: A road pavement monitoring system for anomaly detection
using smart phones,” in International Workshop on Modeling Social
TABLE V Media. Springer, 2014, pp. 128–146.
C OMPARISON OF YOLOV 8 WITH PREVIOUS WORK [6] K. Chang, J. Chang, and J. Liu, “Detection of pavement distresses using
3d laser scanning technology,” in Computing in civil engineering (2005),
2005, pp. 1–11.
[7] X. Yu and E. Salari, “Pavement pothole detection and severity measure-
V. C ONCLUSION ment using laser imaging,” in 2011 IEEE International Conference on
Electro/Information Technology. IEEE, 2011, pp. 1–5.
[8] Z. Hou, K. C. Wang, and W. Gong, “Experimentation of 3d pavement
In conclusion, this study comprehensively evaluated the imaging through stereovision,” in International Conference on Trans-
performance of the YOLOv8 object detection model in the portation Engineering 2007, 2007, pp. 376–381.
critical task of pothole detection. The research presented a [9] M. Staniek, “Stereo vision techniques in the road pavement evaluation,”
detailed discussion of YOLOv8’s architecture, compared it in XXVIII International Baltic Road Conference, 2013, pp. 1–5.
[10] H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, and H. Omata, “Road
with earlier iterations, and various versions of YOLOv5 and damage detection using deep neural networks with images captured
YOLOv7, including nano and small variants. The evaluation through a smartphone,” arXiv preprint arXiv:1801.09454, 2018.
criteria encompassed processing time, model size, and robust- [11] S. Gupta, P. Sharma, D. Sharma, V. Gupta, and N. Sambyal, “Detection
and localization of potholes in thermal images using deep neural
ness under diverse conditions. networks,” Multimedia tools and applications, vol. 79, pp. 26 265–
The experiments conclusively establish the superiority of 26 284, 2020.
YOLOv8, particularly the nano variant, as the most efficient [12] J. Dharneeshkar, S. Aniruthan, R. Karthika, L. Parameswaran et al.,
“Deep learning based detection of potholes in indian roads using yolo,”
and effective model for pothole detection. It achieved an in 2020 international conference on inventive computation technologies
impressive mean average precision (mAP) of 0.911 at 0.5 IoU (ICICT). IEEE, 2020, pp. 381–385.
while maintaining a remarkably low processing time of just [13] M. Sathvik, G. Saranya, and S. Karpagaselvi, “An intelligent convolu-
tional neural network based potholes detection using yolo-v7,” in 2022
8.8 ms per image and a compact model size of 6.3 MB. These International Conference on Automation, Computing and Renewable
attributes are of paramount importance for seamless real-time Systems (ICACRS). IEEE, 2022, pp. 813–819.
implementation in the context of road hazard detection, where [14] L. Zhang, G. Ding, C. Li, and D. Li, “Dcf-yolov8: An improved
algorithm for aggregating low-level features to detect agricultural pests
rapid decision-making and resource efficiency are indispens- and diseases,” Agronomy, vol. 13, p. 2012, 07 2023.
able. [15] J. Terven and D. Cordova-Esparza, “A comprehensive review of yolo:
Furthermore, the research underscores the significance of a From yolov1 and beyond,” 2023.
[16] I. U. T. Program, “Pothole detection dataset,” https://universe.
diverse and robust training dataset, including not only potholes roboflow.com/intel-unnati-training-program/pothole-detection-bqu6s,
but also manholes, sewer covers, and various other road aug 2023. [Online]. Available: https://universe.roboflow.com/
elements. This meticulous approach significantly enhanced intel-unnati-training-program/pothole-detection-bqu6s
[17] D. Wu, S. Lv, M. Jiang, and H. Song, “Using channel pruning-based
our model’s capability to distinguish between different road yolo v4 deep learning algorithm for the real-time and accurate detection
hazards, substantially reducing false positives and ensuring the of apple flowers in natural environments,” Computers and Electronics
in Agriculture, vol. 178, p. 105742, 2020. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0168169920318986
[18] E. N. Ukhwah, E. M. Yuniarno, and Y. K. Suprapto, “Asphalt pavement
pothole detection using deep learning method based on yolo neural
network,” in 2019 International Seminar on Intelligent Technology and
Its Applications (ISITIA), 2019, pp. 35–40.
[19] A. A. Shaghouri, R. Alkhatib, and S. Berjaoui, “Real-time pothole
detection using deep learning,” arXiv preprint arXiv:2107.06356, 2021.
[20] Q. Gao, P. Liu, S. Chen, K. Jia, and X. Wang, “Detection method
of potholes on highway pavement based on yolov5,” in International
Conference on Artificial Intelligence and Security. Springer, 2022, pp.
188–199.

You might also like