
DEEP LEARNING-BASED TECHNIQUES FOR
PRECISE VEHICLE DETECTION AND DISTANCE
ESTIMATION IN AUTONOMOUS SYSTEMS
A dissertation report
submitted by

SRINIVASA C [MY.SC.P2MCA21008]
DHARUN G K [MY.SC.P2MCA21035]

in partial fulfillment of the requirements for the award of the degree of


MASTER OF COMPUTER APPLICATIONS
under the guidance of

Dr. Manohar N
Vice Chairperson
Assistant Professor
Department of Computer Science
Amrita Vishwa Vidyapeetham
Mysuru Campus

AMRITA SCHOOL OF COMPUTING


MYSORE CAMPUS

June 2023
BONAFIDE CERTIFICATE

This is to certify that this dissertation report entitled “Deep Learning-based
Techniques for Precise Vehicle Detection and Distance Estimation in
Autonomous Systems” submitted to Amrita School of Computing, Mysuru,
Amrita Vishwa Vidyapeetham, India, is a bonafide record of work done by
Srinivasa C and Dharun G K under my supervision from October 2022 to
June 2023.

Dr. Manohar N
Vice Chairperson
Asst. Professor
Department of Computer Science

Mr. Akshay                                Adwitiya Mukhopadhyay
PG PROJECT COORDINATOR                    CHAIRPERSON

Place:
Date:

Internal Examiner: External Examiner

1. …………………………. ……………………….
2. …………………………..

Declaration by Author(s)

This is to declare that this report has been written by us. No part of the report is
plagiarized from other sources. All information included from other sources has
been duly acknowledged. We aver that if any part of the report is found to be
plagiarized, we shall take full responsibility for it.

Srinivasa C
MY.SC.P2MCA21008

Dharun G K
MY.SC.P2MCA21035

Place: Mysuru
Date:

ACKNOWLEDGEMENT
First and foremost, we feel deeply indebted to Her Holiness Most Revered Mata
Amritanandamayi Devi (Amma) for her inspiration and guidance in both unseen and
unconcealed ways.

Wholeheartedly, we thank our respected Director, Br. Anantaananda Chaitanya, and
Correspondent, Br. Muktidamrta Chaitanya, for providing the necessary
environment, infrastructure, and encouragement for carrying out our dissertation work
at Amrita Vishwa Vidyapeetham, Mysuru Campus.

We would like to express our sincere thanks to our beloved Principal Dr. G
Ravindranath for giving us moral support and continuous encouragement, which has
been the key to the successful completion of the project.

We are pleased to acknowledge Mr. Adwitiya Mukhopadhyay, Chairperson,


Department of Computer Science, for his encouragement and support throughout the
project.

We would like to express our heartfelt gratitude to our Project Coordinator Ms.
Pallavi M S, Assistant Professor, Department of Computer Science, for her
valuable suggestions and excellent guidance rendered throughout this project.

We would like to express our heartfelt gratitude to our guide Dr. Manohar N, Vice
Chairperson, Assistant Professor, Department of Computer Science, and to our co-guide
Mr. Suresha R, Assistant Professor, Department of Computer Science, for their valuable
suggestions and excellent guidance rendered throughout this project.

Further, we extend our thanks to all the faculty members and technical staff of our
department and ICTS for their suggestions, support, and for providing resources
whenever needed.

Finally, we express our sincere gratitude to our parents and friends who have been the
embodiment of love and affection, which helped us to carry out the project in a smooth
and successful way.

CONTENTS
List of Figures vii

List of Tables vii

List of Abbreviations ix

Abstract x

1. Introduction

1.1. Introduction to broad area of research 1

1.2. Introduction to specific area of research 2-3


1.3. Introduction to the background of problem of research 3-4
1.4. Objectives of Research 4-5
1.5. Applications and contributions 6-7

1.6. Organisation of Report 8-9

1.7. Overall Block Diagram 10

2. Literature Survey

2.1 Literature Review 11-16

2.1.2. Observations Table 16-18

2.3. Motivation 18-20

3. Image/ Video Pre-processing

3.1. Introduction 21

3.2. Proposed Methodology 22

3.2.1. Bilinear Interpolation 24

3.2.2. Autoencoder-decoder EfficientNet B0 26

3.3. Experimental Analysis 27

3.4. Dataset Description 27

3.5. Results of Image/Video Pre-processing 27-29

3.6. Conclusion 29

4. Vehicle Detection

4.1. Introduction 32

4.2. Proposed Methodology 33

4.2.1. Faster RCNN ( Resnet-50) As Backbone For Vehicle Detection 34-35

4.3. Results of Vehicle Detection using Faster R-CNN 35-36

4.4. Conclusion 36

5. Distance Estimation and Classification

5.1. Introduction 37

5.2. Proposed Methodology 38


5.2.1. Euclidean Distance for calculating distance 39

5.2.2. Classifying the distance using various classifiers 40-41

5.2.3. Linear Regression for calculating MAE 42


5.3. Results of Distance Estimation and Classification 43-44

5.4. Conclusion 45

6. Conclusion
6.1. Summary 46
6.2. Future Scope 46

ANNEXURE A 47-51
AUTHORS RESEARCH PUBLICATIONS 52
BIBLIOGRAPHY 53-56

LIST OF FIGURES

Figure 1.7 Overall block diagram 17

Figure 3.1.1. Architecture of Image/video enhancement process 21

Figure 3.2.1.1 Architecture of Autoencoder-decoder 22

Figure 3.2.1.2. Architecture of EfficientNet-B0 23

Figure 3.5.1. Output of Enhancement Process 24

Figure 4.2.1. Architecture of Faster RCNN with RESNET-50 as the backbone 27

Figure 4.4.1. Vehicle detection results using Faster R-CNN with RESNET50 29

Figure 5.4.1. Performance Comparison of Classifiers for Distance Classification 32

Figure 5.4.2. Comparison of MAE Values among Different Classifiers at Various

Split Ratios using Bar Graph 34

LIST OF TABLES

Table 2.1.2. Literature Review Observations 14

Table 4.6.1. Classification report of vehicle detection using Faster RCNN 30

Table 5.6.1. Classification report of various classifiers with different split ratios 34

Table 5.6.2. MAE Values of Different Class Objects Detected at Various Split Ratios 35

LIST OF ABBREVIATIONS

Abbreviation Full Form


CNN Convolutional Neural Network
F-RCNN Faster Region-Based Convolutional Neural Network
KITTI Dataset Karlsruhe Institute of Technology and Toyota
Technological Institute Dataset
RPN Region Proposal Network
SVM Support Vector Machine
KNN K-Nearest Neighbour
MAE Mean Absolute Error

ABSTRACT
Vehicle detection and distance classification play a crucial role in computer vision, particularly in
applications such as intelligent traffic systems. However, existing methods face challenges in achieving
accurate vehicle detection and distance estimation, often resulting in low detection accuracy and
limited distance estimation capability. To address these issues, this paper proposes an integrated
approach that combines the Faster R-CNN architecture with scale-based distance estimation. Through
experimental evaluations, the results validate the effectiveness of the proposed method in achieving
high detection accuracy and accurate distance estimation. The introduced framework shows
potential in enhancing the performance and robustness of vehicle detection and distance estimation
systems, offering promising advancements in this field.

1. INTRODUCTION TO VEHICLE DETECTION AND DISTANCE
ESTIMATION

Vehicle detection and distance estimation are significant areas of computer vision and
are gaining more and more attention for applications in fields such as advanced driver
assistance systems (ADAS), self-driving cars, and traffic surveillance. Accurate and
efficient detection of vehicles and estimation of their location and distance are critical
to ensuring traffic safety and realizing intelligent traffic systems.

Traditional approaches to vehicle detection and range estimation have relied heavily on
hand-crafted features and rule-based algorithms. However, these methods often struggled
to cope with complex traffic scenarios involving difficult lighting conditions, shading, and
varying vehicle sizes and shapes. As a result, the introduction of deep learning methods,
particularly convolutional neural networks (CNNs), has transformed the field by enabling more
precise and robust identification and estimation. Deep learning approaches harness the power
of deep networks to learn complex properties from input data, making
them well-suited to address the challenges of vehicle detection and range estimation. By
training deep learning models on large-scale annotated datasets, these approaches effectively
capture complex patterns and representations, enabling generalization and improved
performance in real-world scenarios.

Various deep learning architectures and frameworks have been proposed in recent years for
vehicle detection and range estimation. These models use convolutional layers to extract spatial
features from input images and use advanced network structures for object localization and
classification. Among the popular architectures, Faster R-CNN, Single Shot MultiBox Detector
(SSD), and YOLO offer promising outcomes. In addition to advances in deep learning
architectures, researchers have investigated novel ways to increase the quality of
vehicle recognition and range prediction systems. Image enhancement strategies such as
bilinear interpolation and autoencoders can be used to improve the quality and
resolution of input images for better feature extraction and subsequent analysis by deep
learning models. These techniques help overcome challenges arising from low-resolution
images and noise, improving overall system accuracy and robustness.
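Bilinear interpolation, mentioned above as one enhancement strategy, can be sketched in a few lines of NumPy. This is a minimal illustrative implementation for grayscale images, not the exact enhancement pipeline used later in this report:

```python
import numpy as np

def bilinear_resize(img: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Resize a 2-D grayscale image with bilinear interpolation."""
    h, w = img.shape
    # Map each output pixel back to fractional source coordinates.
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :]   # horizontal interpolation weights
    # Blend the four nearest source pixels for every output pixel.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

low_res = np.array([[0.0, 1.0],
                    [2.0, 3.0]])
print(bilinear_resize(low_res, 3, 3))
# [[0.  0.5 1. ]
#  [1.  1.5 2. ]
#  [2.  2.5 3. ]]
```

In practice a library call such as OpenCV's `cv2.resize` with bilinear interpolation would be used; the sketch only shows the weighting scheme.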

The purpose of this report is to propose a comprehensive approach to vehicle detection and
range estimation using deep learning techniques. It combines cutting-edge architectures,
including EfficientNet and Faster R-CNN, with image enhancement techniques to achieve
accurate and efficient detection and estimation. The proposed methodology was evaluated
on diverse and extensive datasets collected from real traffic scenarios, and the results
demonstrate the effectiveness and superiority of deep learning-based approaches over
traditional methods.

Overall, this study contributes to the advancement of vehicle detection and distance
estimation techniques using deep learning approaches, with the aim of enhancing road
safety, enabling intelligent transportation systems, and supporting the development of
autonomous vehicles.

1.1. INTRODUCTION TO DEEP LEARNING TECHNIQUES IN
COMPUTER VISION

The broad research area of vehicle detection and range estimation using deep learning
approaches falls under computer vision and machine learning. Specifically, deep
learning methods such as convolutional neural networks (CNNs) and advanced
network architectures are used to accurately detect and localize vehicles in images and
videos. This research area focuses on developing robust and efficient algorithms and
models that can address a variety of challenges in real-world traffic scenarios, such as
occlusion, different lighting conditions, and varying vehicle sizes.

This research aims to overcome the limitations of traditional computer vision
techniques by leveraging the ability of deep learning models to automatically learn from
visual data and extract meaningful features. The models are trained on an extensive
dataset, namely the KITTI dataset, which contains images annotated with vehicle
labels, so that they can effectively recognize and locate vehicles in complex scenes. In
addition, this research also includes exploring image enhancement techniques such as
interpolation methods and autoencoders to improve the quality and resolution of input
images, thereby improving the performance of deep learning models. Using these
techniques, the researchers hope to overcome problems associated with low-resolution
and noisy images that can affect the performance of object detection and range
estimation algorithms.

The ultimate goal of this research is to advance the development of intelligent
transportation systems and autonomous driving technology. Accurate and efficient
vehicle detection and range estimation are essential for safe and reliable navigation,
collision avoidance and traffic management. By using deep learning approaches,
researchers aim to improve the robustness, accuracy, and real-time performance of
these systems, making them more practical and applicable to real-world scenarios.
Research in this area also contributes to the broader field of artificial intelligence and
autonomous systems, as it addresses fundamental challenges in computer vision, object
recognition, and estimation. The insights and techniques developed may extend beyond
vehicle-related applications to other areas requiring accurate object detection and range
estimation, such as surveillance, robotics, and object tracking.

In summary, the broad research area of vehicle detection and range estimation using
deep learning approaches encompasses advanced methods for tackling the issues of
effective vehicle identification, localization, and range prediction in real-world traffic
scenarios. The work, which includes the application of deep learning architectures and
image enhancement techniques, aims to produce resilient algorithms and models,
ultimately contributing to the further development of intelligent transportation systems
and related fields.

1.2. INTRODUCTION TO SPECIFIC AREA OF RESEARCH

One specific area of research in vehicle detection and distance estimation is the use of
deep learning techniques, particularly convolutional neural networks (CNNs), for
accurate and efficient detection and estimation. Deep learning techniques have shown
impressive results in various computer vision tasks, including object detection and
segmentation. CNNs, in particular, have been widely used in vehicle detection due to
their ability to learn features directly from the input images and their ability to handle
complex and non-linear relationships between the input and output. In the context of
vehicle detection and distance estimation, researchers are exploring various
architectures and training techniques to improve the accuracy and efficiency of CNN-
based models. For example, some researchers are investigating the use of multi-scale
and multi-level features to capture both local and global information about vehicles.

In addition to CNNs, researchers are also exploring the use of other deep learning
techniques, such as recurrent neural networks (RNNs), for tracking vehicles over time
and estimating their distances. RNNs can capture the temporal dynamics of vehicle
movements and enable more accurate distance estimation. Another specific area of
research is the integration of multiple sensors, such as cameras and lidars, for robust
and accurate vehicle detection and distance estimation. Researchers are investigating
various fusion techniques to combine the information from multiple sensors and
develop more robust and accurate models. Overall, the use of deep learning techniques
and sensor fusion is a promising area of research for vehicle detection and distance
estimation, with the potential to significantly improve the safety and efficiency of
transportation systems.

1.3. INTRODUCTION TO BACKGROUND OF THE PROBLEM OF
RESEARCH

Accurate vehicle detection and range estimation are fundamental tasks in the
development of intelligent traffic systems, autonomous vehicles, and advanced driver
assistance systems (ADAS). The purpose of these technologies is to increase traffic
safety, improve traffic management, and enable efficient and reliable navigation.
However, classical computer vision techniques for vehicle detection and range
estimation in complex traffic scenarios often face challenges that limit their
accuracy and effectiveness.

A major challenge is the presence of occlusion in real traffic scenes. Vehicles may be
partially or completely blocked by other vehicles, objects, or environmental elements
such as buildings or trees. Accurately detecting and locating occluded vehicles can be
difficult with traditional methods, leading to erroneous range estimates and
compromising safety.

Different lighting conditions present another challenge. Changes in lighting due to
weather conditions, time of day, and shadow effects can greatly affect the appearance
of vehicles in images and videos. Conventional techniques based on hand-crafted
features may not be robust enough to deal with such variations, resulting in poor
detection accuracy and unreliable range estimation.

Additionally, vehicles come in many different sizes and shapes, making accurate
distance estimation difficult. With vehicles of various sizes in the scene, estimating
distance based solely on image cues becomes unreliable. A robust method should
consider the scale of the detected vehicle and incorporate a size-distance relationship
to accurately estimate its distance from the observer.
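The size-distance relationship described above can be illustrated with the standard pinhole-camera approximation: for a known real-world vehicle width and camera focal length, distance is inversely proportional to the width of the bounding box in pixels. The focal length and average car width below are illustrative assumptions, not calibration constants from this work:

```python
def estimate_distance_m(bbox_width_px: float,
                        real_width_m: float = 1.8,      # assumed average car width
                        focal_length_px: float = 700.0  # assumed focal length
                        ) -> float:
    """Pinhole-camera distance estimate: Z = f * W / w_pixels."""
    return focal_length_px * real_width_m / bbox_width_px

# A vehicle whose bounding box is 126 px wide is roughly 10 m away
# under these assumed camera parameters.
print(round(estimate_distance_m(126.0), 1))  # 10.0
```

The key property the paragraph relies on is visible here: halving the pixel width doubles the estimated distance, so a detector that reports accurate box sizes directly supports scale-based distance estimation.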

To overcome these challenges, deep learning approaches have emerged as a promising
solution. Convolutional neural networks (CNNs) in particular exhibit strong feature
extraction capabilities, enabling precise and reliable vehicle identification and range
estimation. These models can automatically learn complex features from large
annotated datasets to capture appearance changes, handle occlusion, and estimate
distance more effectively. However, deep learning-based vehicle recognition and range
prediction can still be improved. Real-time performance is a key requirement for many
applications such as autonomous driving. The development of efficient architectures
and optimization techniques to achieve fast detection and estimation while maintaining
accuracy is the focus of ongoing research.

Additionally, integrating multimodal sensor data can improve the robustness and
reliability of vehicle detection and range estimation systems. Combining information
from camera, LiDAR, and radar sensors can provide complementary clues and mitigate
the limitations of individual sensor modalities. In summary, the research problem arises
from the shortcomings of conventional computer vision algorithms in accurately
detecting vehicles and estimating their distance in complex traffic scenarios.
Deep learning approaches offer promising solutions by leveraging the ability to learn
complex features and patterns from data. However, challenges such as occlusion,
lighting variations, and size differences still need to be addressed. Improving real-time
performance, integrating multimodal sensor data, and optimizing deep learning
architectures are key focus areas for further development of vehicle detection and range
estimation.

1.4. OBJECTIVES OF RESEARCH

 Develop a vehicle detection model based on deep learning.

The major objective of this study is to use deep learning techniques to build a reliable
and precise vehicle identification model. This entails investigating cutting-edge
architectures such as EfficientNet, putting them into practice, and training models on
sizable annotated datasets covering various traffic scenarios. Despite obstacles
including occlusion, varying lighting conditions, and varying vehicle sizes, the goal is
to obtain highly accurate vehicle localization and categorization.

 Consider image enhancement techniques to improve performance.

The study aims to determine how well vehicle detection and range estimation perform
when image enhancement techniques, particularly bilinear interpolation and
autoencoders, are applied. The goal is to increase the quality and resolution of input
images so that deep learning models can extract features more effectively. The studies
assess how these methods affect detection accuracy and robustness.
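The encoder-decoder idea behind this objective can be sketched as a small convolutional autoencoder in PyTorch. The layer sizes here are illustrative and far smaller than the EfficientNet-B0-based model described in Chapter 3; training would minimize a reconstruction loss (e.g. MSE) between input and output:

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Minimal conv autoencoder: compress an image, then reconstruct it."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # H -> H/2
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # H/2 -> H/4
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),    # H/4 -> H/2
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=2, stride=2),     # H/2 -> H
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyAutoencoder()
x = torch.rand(1, 3, 64, 64)   # one dummy 64x64 RGB image
recon = model(x)
print(recon.shape)             # torch.Size([1, 3, 64, 64])
```

The compressed representation produced by the encoder forces the network to learn salient structure, which is why the same idea can denoise or enhance low-quality frames.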

 Implement a Faster R-CNN model for scale-based distance estimation.

Research focuses on building Faster R-CNN models to calculate the distance to
observed vehicles. This methodology employs a region proposal network (RPN) to
generate object proposals and a classification network to categorise objects. The
objective is to develop a model, employing established size-distance relationships, that
can properly estimate the distance based on the size of identified vehicles.

 Develop a distance classification module.

The goal of the study is to create a distance classification module that will allow
observed vehicles to be divided into several distance ranges. As part of achieving this
objective, a classification model will be trained to analyze estimated distances and
group vehicles into the proper distance categories. Better distance comprehension
will facilitate better decision-making in applications like collision avoidance systems.
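At its simplest, the distance classification step amounts to binning a continuous estimate into 10 m ranges. The label strings below are assumptions for illustration, following the 0-10 m to 100 m ranges described for Chapter 5:

```python
def distance_label(distance_m: float, bin_size: float = 10.0,
                   max_m: float = 100.0) -> str:
    """Map an estimated distance to a 10 m range label, e.g. '20-30m'."""
    if distance_m >= max_m:
        return f">{int(max_m)}m"
    lower = int(distance_m // bin_size) * int(bin_size)
    return f"{lower}-{lower + int(bin_size)}m"

print(distance_label(7.3))    # 0-10m
print(distance_label(24.9))   # 20-30m
print(distance_label(120.0))  # >100m
```

In the actual module the mapping is learned by classifiers (SVM, KNN, etc.) rather than hard-coded, but the target labels they predict correspond to bins like these.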

 Evaluate and analyze the performance of the proposed approach.

The performance of the suggested approach for vehicle detection and range estimation
will be examined through large-scale trials and evaluations. Using big datasets gathered
from actual traffic scenarios, this includes testing the deep learning models' precision,
effectiveness, and resilience. The analysis aims to identify the approach's advantages,
disadvantages, and potential areas for improvement. By attaining these objectives, this
work will advance vehicle identification and range estimation systems. The suggested
method seeks to increase the precision, effectiveness, and usability of these systems in
real-world scenarios by combining deep learning models and image enhancement
methods. The outcomes of the research can be used to direct the creation of smarter,
more dependable transportation systems and to support the goal of safer, more
autonomous vehicles.

1.5. APPLICATIONS AND CONTRIBUTIONS

 Self-driving:

Research into vehicle detection and range estimation using deep learning approaches
has direct implications for autonomous driving systems. Accurately detecting vehicles
and estimating their location and distance is critical for autonomous vehicles to drive
safely and make informed decisions. The proposed approach can contribute to the
development of reliable and robust autonomous driving systems and improve their
ability to perceive and react to the environment.

 Advanced Driver Assistance Systems (ADAS):

ADAS, which are designed to help human drivers avoid collisions and increase
overall driving safety, can also benefit from the study's findings. Precise vehicle
identification and range calculation form the foundation for ADAS features like
adaptive cruise control, lane departure warning, and collision avoidance. The accuracy
and dependability of these systems can be improved through this study, which will
help lessen accidents and increase road safety.

 Traffic monitoring and surveillance:

Vehicle detection and distance estimation play an important role in traffic monitoring
and surveillance applications. By accurately detecting and estimating distances to
vehicles, these systems help with traffic flow analysis, congestion management, and
accident detection. This research may contribute to more effective and efficient traffic
enforcement systems and enable better traffic management and planning.

 Intelligent Transport System (ITS):

The proposed approach can contribute to the broader area of intelligent transportation
systems. Accurate vehicle detection and range estimation are essential for various ITS
applications such as signal control, intersection management, and pedestrian safety. By
improving the accuracy and efficiency of these systems, research can drive the
development of smarter and more efficient transportation infrastructure.

 Robustness and Adaptability:

Research into image enhancement techniques and deep learning models can help
improve the robustness and adaptability of vehicle detection and range estimation
systems. The proposed approach overcomes challenges such as occlusion, different
lighting conditions, and different vehicle sizes, enabling more reliable performance in
different real-world scenarios. This robustness and adaptability are very important to
ensure the reliability and applicability of the system in different environments and
weather conditions.

 Further development of deep learning techniques:

This research contributes to the advancement of deep learning techniques specialized
in vehicle detection and range estimation. This research advances our knowledge and
understanding of deep learning models in the context of computer vision and transport
domains by exploring novel architectures, image enhancement techniques, and
multimodal sensor fusion. The results may spark further research and innovation in this
area.

In summary, our research on vehicle detection and range estimation using deep learning
approaches has important applications and contributions in the advancement of autonomous
driving, ADAS, traffic monitoring, intelligent traffic systems, and deep learning. The proposed
approach improves the accuracy, efficiency, and robustness of these systems, contributing to
improved traffic safety, traffic management, and the realization of intelligent traffic systems.

1.6. Organisation of Report

The rest of the report is organized into the chapters described below.

Chapter 2 provides an overview of literature relevant to the research challenges in
terms of video enhancement, vehicle detection, region of interest, generating bounding
box coordinates, estimating distances between vehicles, and classifying them into
different labels.

Chapter 3 discusses techniques used in the video enhancement process, such as bilinear
interpolation and Autoencoder EfficientNet B0.

Chapter 4 discusses using Faster RCNN as an object detection model for detecting the
vehicles and generating bounding boxes around the vehicles and extracting their
coordinates.

Chapter 5 discusses the use of pairwise Euclidean distance for estimating the distances
between vehicles using the bounding box coordinates, linear regression for estimating
the difference between the actual and predicted distances, and the classification of
these distances into different labels ranging from 0-10 m up to 100 m.
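The pairwise Euclidean distance and MAE computations outlined for Chapter 5 can be sketched with NumPy. The boxes below are hypothetical `(x1, y1, x2, y2)` coordinates, and pixel distances would still need a scale factor (or the size-distance relationship) to become metres:

```python
import numpy as np

def box_centers(boxes: np.ndarray) -> np.ndarray:
    """(x1, y1, x2, y2) boxes -> (cx, cy) centers."""
    return np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                     (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)

def pairwise_euclidean(centers: np.ndarray) -> np.ndarray:
    """All-pairs Euclidean distances between detected vehicle centers."""
    diff = centers[:, None, :] - centers[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

boxes = np.array([[10, 10, 50, 40],     # vehicle A
                  [70, 10, 110, 40]])   # vehicle B
d = pairwise_euclidean(box_centers(boxes))
print(d[0, 1])   # 60.0 pixels between the two box centers

# Mean absolute error between actual and predicted distances
# (the figures here are invented purely for illustration).
actual = np.array([10.0, 25.0, 40.0])
predicted = np.array([12.0, 24.0, 37.0])
print(np.mean(np.abs(actual - predicted)))   # 2.0
```

The MAE line mirrors what the linear-regression evaluation in Chapter 5 reports: the average magnitude of the gap between actual and predicted distances.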

Chapter 6 presents the conclusion of the research work along with the scope for further
research in this direction.

Finally, the research papers published based on contributions and references of related
literature are presented at the end of the report.

1.7 Overall block diagram

Fig 1.7 Overall Block Diagram

2. LITERATURE SURVEY

A literature survey is a thorough examination and analysis of the body of knowledge on a
particular subject or area of study. It entails compiling, analyzing, and synthesizing data
from numerous sources, including books, academic papers, journals, and conference
proceedings. The goal is to get a comprehensive grasp of the state of knowledge in the
chosen field. Conducting a literature assessment on vehicle detection and distance
estimation involved carefully going over and examining 25 publications on these
subjects published between 2012 and 2022. By doing so, it is possible to discover
existing methodologies, techniques, and trends, as well as potential research gaps, to
inform and direct further research and development in this area.

Tamang et al. (2022), in their research paper, introduce a custom deep-learning technique
for vehicle detection and speed estimation. The method combines a convolutional neural
network (CNN) and recurrent neural network (RNN) to detect vehicles in video images and
estimate their speed. The trained model achieves high accuracy in vehicle detection, and
the speed estimation module utilizes motion analysis techniques. Experimental results
demonstrate the effectiveness of the proposed approach in real-world scenarios,
contributing to intelligent transportation systems.

Vajgl et al. (2022), in their research paper, present Dist-YOLO, a fast object detection
method that integrates distance estimation. By leveraging the YOLO architecture, the
proposed method achieves real-time performance while accurately estimating the distance
of detected objects. Test results demonstrate the effectiveness of Dist-YOLO in various
applications that require efficient and accurate object detection with distance information.

Manohar N et al. (2022), in their research paper, present a new method for traffic congestion
management using image processing techniques. By analyzing traffic camera footage, the
authors extract vehicle information to monitor real-time vehicle density. This approach
enables the implementation of effective traffic management strategies, such as optimizing
signal timings and diverting routes, to improve traffic flow and reduce congestion in urban
areas through data-driven decision-making and intelligent transportation systems.

Suresha R et al. (2022) explore the use of SVM (Support Vector Machine) classifiers
in lung cancer detection. The authors propose a fusion approach that combines multiple
features extracted from lung images to improve classification accuracy. The SVM
classifier is found to be effective in distinguishing between cancerous and
noncancerous cases, and is useful for early detection and treatment of lung cancer.

M Chandrajit et al. (2021), in their research paper, focus on developing robust methods for
segmenting and classifying moving objects in surveillance videos. The authors propose an
algorithm that effectively separates foreground objects from the background and accurately
classifies them into predefined categories. This approach offers potential applications in
video surveillance systems, enabling more reliable and efficient object detection and
tracking for security purposes.

G Dhyanjith et al. (2021) Their research focuses on helmet detection using two popular
object detection algorithms, YOLO V3 and Single Shot Detector (SSD). The authors
propose a method to effectively detect whether a person is wearing a helmet. This approach
could potentially be applied to improve safety measures such as mandatory helmet-wearing
in a variety of settings, such as construction sites and bike-sharing programs.

S Akshay et al. (2021) Their research addresses the problem of detecting sleepy drivers
using eye-tracking technology and machine learning algorithms. The authors propose a
method to determine a driver's level of drowsiness by analyzing eye movements. This
approach offers potential applications in improving road safety by providing timely
warnings and interventions to prevent accidents caused by driver fatigue.

Huang et al. (2021), in their research paper, propose a multi-stream attention learning
approach for monocular vehicle velocity and inter-vehicle distance estimation. By
incorporating attention mechanisms into a deep learning framework, the method achieves
an accurate estimation of vehicle velocity and distance using a single camera. Experimental
results demonstrate the effectiveness of the proposed approach, providing valuable insights
for intelligent transportation systems.

Lin et al. (2021), in their research paper, present a real-time system that uses a virtual
detection zone and the YOLO (You Only Look Once) philosophy to count vehicles,
estimate their speeds, and categorize them. The proposed system achieves accurate results
in real-time scenarios, providing valuable capabilities for traffic analysis and management
in intelligent transportation systems.

Bourja et al. (2021), in their research paper, present a real-time system for detecting,
tracking, and estimating the distance between vehicles using stereoscopic technology and
deep learning with YOLOv3. The proposed method combines the power of stereoscopic
depth perception with the accuracy of deep learning models to achieve powerful vehicle
tracking and detection. In addition, it provides a reliable estimate of the distance between
vehicles. The test findings show how effective and efficient the system is, and they offer
useful data for use in applications for intelligent transportation systems.

S Sushmitha et al. (2020) focus on the detection and tracking of multiple vehicles in
traffic jams. The authors show how computer vision techniques can be used to identify
and track vehicles, enabling various applications such as traffic monitoring and
surveillance. This approach contributes to improved traffic management and enhances
road safety measures.

Zaarane et al. (2020) present a vision-based approach to inter-vehicle distance estimation for a driver alarm system. By leveraging image processing and computer vision techniques, the suggested solution provides real-time, accurate distance estimates between the host car and surrounding vehicles, from which timely alarms are generated for the driver, improving road safety. Experimental results demonstrate the approach's efficacy and its potential for use in driver support systems, advancing the field of intelligent transportation.

Ma et al. (2020) present a stereo camera-based distance measurement system for autonomous vehicles. The suggested approach determines the distance between the vehicle and surrounding objects by utilising stereoscopic ranging principles: it gathers depth information from stereo image pairs and uses trigonometric equations to calculate distances. The test results demonstrate the effectiveness of the system and its potential to improve the perception capabilities of autonomous vehicles and ensure safe navigation in dynamic environments.

Ali et al. (2020) present a fast and precise method for vehicle detection and distance measurement. The suggested method combines pairwise Euclidean distance calculation with the Faster R-CNN detection framework. By integrating deep learning and computer vision techniques, the system achieves high detection precision and accurate inter-vehicle distance estimation. Experimental results show the approach is effective, making it a valuable solution for applications including autonomous driving, traffic monitoring, and collision avoidance systems.

Liu et al. (2019) present a real-time vehicle distance estimation approach based on single-view geometry. The suggested method uses the sizes of detected vehicles and the intrinsic camera parameters to calculate an accurate distance estimate. By exploiting single-view geometry and geometric transformations, this approach achieves real-time performance without extra sensors or complex calibration procedures. The test results show that the method works well, making it a viable option for distance estimation in contexts such as driver assistance systems and autonomous vehicles.

Kim (2019) presents an efficient approach to vehicle detection and distance estimation from a single camera based on aggregated channel features and inverse perspective mapping (IPM). The technique builds composite channel features from colour and gradient data, enabling effective vehicle detection, and uses IPM to calculate the separation between identified vehicles precisely. The test findings show that the approach is both successful and efficient, making it suited to applications such as advanced driver assistance systems and intelligent traffic systems.

Qi et al. (2019) present a technique for measuring distance using a monocular camera and knowledge of the vehicle's pose. The suggested method estimates the vehicle's pose from the image using its known dimensions, and by integrating pose estimation with camera calibration parameters it accurately determines the distance between the camera and the vehicle. Test results show its efficacy in real-world conditions, making the method useful for distance estimation in applications such as autonomous driving and collision avoidance systems.

Yilmaz et al. (2018) present a deep learning-based technique for vehicle detection. The suggested technique employs convolutional neural networks (CNNs) to identify vehicles in images; training on a large dataset of vehicle images lets the CNN model learn robust features for precise detection. The test results demonstrate high detection performance with fast processing times, making the approach appropriate for real-time applications such as traffic monitoring and autonomous driving. The study advances the use of deep learning algorithms for vehicle detection.

Lee et al. (2018) introduce a frame-similarity solution based on Faster R-CNN for rear-approaching vehicle identification. The suggested method compares consecutive frames to find approaching automobiles and incorporates the Faster R-CNN architecture to produce accurate detection outcomes. Experimental evaluations show the method is effective at identifying vehicles approaching from behind, making it useful for applications such as collision warning systems.

Kim et al. (2018) present a Faster R-CNN-based object detection system created specifically for unmanned surface vehicles (USVs). The suggested approach enables precise, real-time object recognition in the USV's environment, improving its navigation and safety capabilities. Experimental findings show the algorithm is effective at recognising a variety of objects, enabling USV applications.

Sommer et al. (2018) give a thorough review of deep learning techniques for vehicle detection in aerial images. Using several datasets and criteria, they assess the performance of models including Faster R-CNN, SSD, and YOLO. The findings illustrate each method's advantages and disadvantages, assisting in the selection of suitable algorithms for aerial vehicle detection applications.

Nguyen et al. (2018) propose a real-time vehicle detection technique that integrates an effective region proposal approach with depth and 3-channel pattern analysis. The algorithm offers accurate and efficient vehicle detection in a variety of settings, and the test results show that it is fast enough for real-time applications that require accurate vehicle detection.

Yabo et al. (2016) present a computer vision-based vehicle classification and speed estimation system. The suggested method accurately categorises vehicle types and calculates their speeds by examining the visual properties of moving objects. Experimental results show the strategy is effective and highlight its potential use in traffic management and intelligent traffic systems.

Rezaei et al. (2015) present a robust vehicle detection and distance estimation technique designed to work in poor lighting. By utilizing advanced image processing techniques, the suggested solution provides precise vehicle detection and distance calculation even under poor illumination. The test results show that the method can increase the accuracy of vehicle detection and distance estimation systems in a variety of real-world scenarios.

Chen et al. (2015) present a combined lane and vehicle detection system with distance estimation. The suggested method accurately detects vehicles and establishes lane boundaries by combining computer vision techniques, and it also offers an estimate of the separation between the host vehicle and other objects, enhancing scene comprehension. The test results show the integrated system's efficiency and its range of uses in computer vision.

Kim et al. (2012) present a vision-based system for vehicle detection and distance estimation designed specifically for driver warning systems. Using computer vision techniques, the suggested solution identifies vehicles in real time and calculates the distance to them, seeking to increase driver safety by giving timely alerts based on the location and distance of detected vehicles.

Table 2.1. LITERATURE REVIEW OBSERVATIONS

SL No | Title of Paper and Year of Publication | Author | Problem Addressed | Methods and Techniques Used
1 | Customized Deep Learning Technique for vehicle detection along with speed estimation (2022) | Tamang, B et al. | Vehicle detection and speed estimation | Customized deep learning technique
2 | Dist-YOLO: Fast Object Detection with Distance Estimation (2022) | Vajgl, M et al. | Object detection with distance estimation | Dist-YOLO model
3 | Traffic Congestion Management based on Vehicle Density Using Image Processing Techniques (2022) | Manohar, N et al. | Traffic congestion management based on vehicle density | Image processing techniques
4 | Support Vector Machine Classifier based Lung Cancer Recognition: A Fusion Approach (2022) | Suresha R et al. | Lung cancer recognition | Support Vector Machine (SVM), fusion approach
5 | Robust segmentation and classification of moving objects from surveillance video (2021) | Chandrajit et al. | Segmentation and classification of moving objects | Image processing techniques
6 | Drowsy driver detection using eye-tracking through machine learning (2021) | Akshay S et al. | Drowsy driver detection | Eye-tracking and machine learning
7 | Multi-Stream Attention Learning for Monocular Vehicle Velocity and Inter-Vehicle Distance Estimation (2021) | Huang, K et al. | Monocular vehicle velocity and inter-vehicle distance estimation | Multi-stream attention learning
8 | A real-time vehicle counting, speed estimation, and classification system based on virtual detection zone and YOLO (2021) | Lin, C. J et al. | Vehicle counting, speed estimation, and classification | Virtual detection zone, YOLO
9 | Real-time vehicle detection, tracking, and inter-vehicle distance estimation based on stereovision and deep learning using YOLO (2021) | Bourja, O et al. | Vehicle detection, tracking, and inter-vehicle distance estimation | Stereovision, deep learning, YOLO
10 | Distance measurement system for autonomous vehicles using stereo camera (2020) | Ma, Q et al. | Distance measurement for autonomous vehicles | Stereo camera
11 | Multiple car detection, recognition and tracking in traffic (2020) | Sushmitha, S et al. | Car detection, recognition, and tracking in traffic | Computer vision techniques
12 | Fast, Accurate Vehicle Detection and Distance Estimation (2020) | Ali, A et al. | Vehicle detection and distance estimation | Faster R-CNN, pairwise Euclidean distance
13 | Real-time vehicle distance estimation using single-view geometry (2019) | Liu, Z et al. | Vehicle distance estimation | Single-view geometry
14 | Efficient vehicle detection and distance estimation based on aggregated channel features and inverse perspective mapping from a single camera (2019) | Kim, J. B | Vehicle detection and distance estimation | Aggregated channel features, inverse perspective mapping
15 | Distance estimation of monocular based on vehicle pose information (2019) | Qi, S et al. | Monocular distance estimation based on vehicle pose | Monocular vision, vehicle pose information
16 | Vision-based vehicle detection and inter-vehicle distance estimation for driver alarm systems (2020) | Zaarane, A et al. | Vision-based vehicle detection and inter-vehicle distance estimation | Vision-based techniques
17 | A vehicle detection approach using deep learning methodologies (2018) | Yilmaz, A et al. | Vehicle detection using deep learning | Deep learning methodologies
18 | Rear-approaching vehicle detection using frame similarity based on faster R-CNN (2018) | Lee, Y et al. | Rear-approaching vehicle detection | Faster R-CNN, frame similarity
19 | Object detection algorithm for unmanned surface vehicles using faster R-CNN (2018) | Kim, H et al. | Object detection for unmanned surface vehicles | Faster R-CNN
20 | Comprehensive analysis of deep learning-based vehicle detection in aerial images (2018) | Sommer, L et al. | Vehicle detection in aerial images | Deep learning-based techniques
21 | Real-time vehicle detection using an effective region proposal-based depth and 3-channel pattern (2018) | Nguyen, V. D et al. | Real-time vehicle detection | Region proposal-based techniques
22 | Vehicle classification and speed estimation using computer vision techniques (2016) | Yabo, A et al. | Vehicle classification and speed estimation | Computer vision techniques
23 | Robust vehicle detection and distance estimation under challenging lighting conditions (2015) | Rezaei, M et al. | Vehicle detection and distance estimation under challenging lighting | Robust image processing techniques
24 | Integrated vehicle and lane detection with distance estimation (2014) | Chen, Y et al. | Vehicle and lane detection with distance estimation | Integrated detection and distance estimation
25 | Vision-based vehicle detection and inter-vehicle distance estimation for driver alarm systems (2012) | Kim, G et al. | Vision-based vehicle detection and inter-vehicle distance estimation | Vision-based techniques
2.2 MOTIVATION
There are several motivations for developing and improving vehicle detection and distance
estimation technologies. Here are a few:

 Safety: By warning drivers of potential risks and assisting with collision avoidance,
vehicle detection and distance estimating technology can help reduce the likelihood
of accidents.

 Efficiency: Accurate vehicle identification and distance estimation can be used to improve traffic flow, lessen congestion, and boost overall transportation efficiency.

 Autonomous driving: Autonomous driving systems must have vehicle detection and
distance estimation in order for vehicles to effectively assess their surroundings and
make decisions.

 Traffic management: Traffic can be monitored and managed using vehicle recognition and distance estimation, for example, by altering traffic signals to ease congestion or rerouting vehicles during emergencies.

 Environmental impact: By improving transportation efficiency and lowering congestion, vehicle detection and distance estimation can help lessen the environmental impact of transportation, particularly emissions from idling automobiles.

Overall, the development and improvement of vehicle detection and distance estimation
technologies have the potential to greatly enhance the safety, efficiency, and
sustainability of our transportation systems, making our roads safer and more accessible
for everyone.

3. IMAGE/VIDEO PRE-PROCESSING

3.1 INTRODUCTION

Video preprocessing is an important step in vehicle detection and range estimation, aimed at improving the quality and sharpness of video frames and thereby the accuracy and reliability of subsequent analyses. In this context, techniques such as bilinear interpolation and the EfficientNet-B0 autoencoder have received considerable attention due to their effectiveness in enhancing video content.

Vehicle detection and distance estimation rely heavily on visual information captured
from video images. However, videos often suffer from low resolution, noise, motion
blur, and other issues that can negatively impact the performance of recognition
algorithms. Therefore, preprocessing techniques play an important role in overcoming
these challenges and improving the overall quality of video images. Bilinear
interpolation is a widely used technique to improve the resolution and sharpness of
video images by estimating missing pixel values based on neighboring pixels. Bilinear
interpolation fills gaps and improves visual fidelity, which can improve the accuracy of
subsequent vehicle detection and range estimation algorithms.

Additionally, a powerful deep learning model, the EfficientNet-B0 autoencoder, can be used for video preprocessing. By leveraging the representation learning capabilities of autoencoders, EfficientNet-B0 can effectively denoise and reconstruct video frames, reducing the effects of noise and improving the quality of the visual information.

The purpose of this work is to investigate the application of bilinear interpolation and the EfficientNet-B0 autoencoder for video preprocessing in the context of vehicle detection and range estimation. The aim is to improve the quality and resolution of video frames to enable more accurate and reliable analysis. In the next section, we review the methods and experimental results related to these techniques and highlight their effectiveness in improving the performance of vehicle detection and range estimation algorithms.

3.2 PROPOSED METHODOLOGY FOR ENHANCEMENT PROCESS

The video/image enhancement process aims to enhance the visual quality of a given video or image by improving its clarity, sharpness, and overall appearance. Two widely used techniques in this process are bilinear interpolation and the EfficientNet-B0 autoencoder.


Fig 3.2.1. Architecture of Image/Video Enhancement

Bilinear interpolation is a method employed for resizing images or videos. It calculates the color values of new pixels based on a weighted average of the surrounding pixels. When applied to video/image enhancement, bilinear interpolation can effectively increase the resolution of low-resolution media. Estimating missing pixel values through interpolation enhances the overall visual quality.

On the other hand, EfficientNet-B0 is a deep learning model originally designed for image classification and feature extraction. Comprising an encoder and a decoder, the autoencoder built on EfficientNet-B0 learns to extract meaningful features from input images and reconstructs the original image from those features. Within the video/image enhancement process, EfficientNet-B0 serves as the feature extractor, capturing crucial visual details. Through training on a large dataset of high-quality images, the model can enhance the visual quality of low-resolution or degraded images by reconstructing them with higher fidelity.

To summarize, bilinear interpolation is utilized to resize and increase the resolution of images or videos, while the EfficientNet-B0 autoencoder is a deep learning model that extracts features and enhances the visual quality of low-resolution or degraded images. The combination of these two methods can lead to significant improvements in the overall visual quality of videos and images.

3.2.1 BILINEAR INTERPOLATION

Bilinear interpolation is a widely used technique in image and video enhancement for
increasing the resolution of low-resolution media. It estimates missing pixel values by
taking the weighted average of surrounding pixels. The process comprises four key
steps: sampling, weight calculation, multiplication, and summation.

To begin, neighboring pixels closest to the target pixel are sampled, usually forming a
square or rectangle around it. The next step involves calculating weights for each
neighboring pixel based on their distances from the target pixel. These weights
determine the influence of each pixel on the final estimation and are commonly
determined using linear or quadratic functions.

Once the weights are determined, they are multiplied by the corresponding color values
of the neighboring pixels. These weighted color values are then summed up to obtain
an interpolated value for the target pixel. This process is repeated for each pixel in the
low-resolution image or video, resulting in an enhanced output with increased
resolution.

Bilinear interpolation provides a smooth and continuous estimation of missing pixel values, resulting in visually pleasing enhancements. However, it's important to note that bilinear interpolation may introduce some blurring or loss of detail, particularly when significantly increasing the resolution.

The formula used for bilinear interpolation is:

NewPixel = (1 - x_frac) * (1 - y_frac) * Pixel_A + x_frac * (1 - y_frac) * Pixel_B + (1 - x_frac) * y_frac * Pixel_C + x_frac * y_frac * Pixel_D

Here, x_frac and y_frac represent the fractional parts of the x and y coordinates of the
target pixel within the square formed by the neighboring pixels. Pixel_A, Pixel_B,
Pixel_C, and Pixel_D are the color values of the four neighboring pixels.

The weights assigned to each neighboring pixel depend on their distance from the target
pixel. The closer a neighboring pixel is to the target pixel, the higher its weight in the
interpolation process. By applying this formula to all pixels in a low-resolution image,
we can resize and increase its resolution.

The interpolation process improves the visual quality of the resized image by estimating
missing pixel values and smoothing out the transitions between pixels. It ensures that
the resized image appears more natural and visually pleasing.

To summarize, bilinear interpolation involves calculating the color value of a target pixel by taking a weighted average of neighboring pixel values. This formula, applied to each pixel in a low-resolution image, helps to enhance its visual quality by estimating missing pixels and creating a smoother transition between adjacent pixels.
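The per-pixel formula above can be implemented directly. The following sketch, a plain NumPy implementation rather than the exact code used in this work, resizes a grayscale image with bilinear interpolation:

```python
import numpy as np

def bilinear_resize(img, new_h, new_w):
    """Resize a 2-D (grayscale) image with bilinear interpolation."""
    h, w = img.shape
    out = np.empty((new_h, new_w), dtype=np.float64)
    for i in range(new_h):
        for j in range(new_w):
            # Map the target pixel back into source coordinates.
            y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            y_frac, x_frac = y - y0, x - x0
            # Weighted average of the four neighbours (Pixel_A..Pixel_D).
            out[i, j] = ((1 - x_frac) * (1 - y_frac) * img[y0, x0]
                         + x_frac * (1 - y_frac) * img[y0, x1]
                         + (1 - x_frac) * y_frac * img[y1, x0]
                         + x_frac * y_frac * img[y1, x1])
    return out
```

For example, upscaling the 2x2 image [[0, 1], [2, 3]] to 3x3 places the value 1.5 at the centre, the equally weighted average of the four corners.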

3.2.2. AUTOENCODER-DECODER EFFICIENTNET-B0

The autoencoder-decoder EfficientNet-B0 is a powerful deep learning model frequently employed in the process of improving images and videos. It combines an autoencoder with the EfficientNet-B0 convolutional neural network (CNN) architecture, which was originally designed for image classification and feature extraction.

Fig 3.2.2.1: Architecture of Autoencoder-decoder


An autoencoder is a neural network made up of an encoder and a decoder. The encoder learns to compress the input data and extract useful features, and the decoder reconstructs the original input from those features. By training on a sizable dataset of high-quality images, the autoencoder can be taught to recognise key visual details and improve the visual quality of low-resolution or degraded photos.

The EfficientNet-B0 architecture is employed as the encoder in the autoencoder-decoder, while the decoder is in charge of recreating the improved image. EfficientNet-B0 is a lightweight and effective CNN architecture that has been successful in a number of computer vision workloads.

Fig 3.2.2.2. Architecture of EfficientNet-B0

The encoder (efficientNet B0) learns to take useful features out of high-resolution
images during the training process. The decoder receives these features after which it
combines them to rebuild the improved image. The aim is to reduce the difference
between the high-resolution original image and the augmented reconstructed image.

Once the autoencoder decoder efficientNet B0 model is trained, it can be used for image
and video enhancement. Given a low-resolution or degraded image or video, the model
applies the learned feature extraction and reconstruction process to enhance the visual
quality. It reconstructs the image or video with higher fidelity, improving clarity,
sharpness, and overall appearance.

The advantage of using the autoencoder decoder efficientNet B0 for image and video
enhancement is that it can capture complex visual patterns and details, allowing for
significant improvements in visual quality. Moreover, the efficientNet B0 architecture
provides an efficient and lightweight solution, making it suitable for real-time
applications.

In summary, the autoencoder-decoder EfficientNet-B0 combines the power of autoencoders and the EfficientNet-B0 architecture for image and video enhancement. By learning to extract meaningful features from high-resolution images and reconstructing enhanced versions, this model can effectively enhance the visual quality of low-resolution or degraded images and videos.

3.3 EXPERIMENTAL ANALYSIS

The experiment was conducted using Python 3.7 on a machine with an Intel Core i5 processor @ 2.00 GHz and 8 GB of RAM, using videos from the KITTI dataset for evaluation. The aim was to assess the performance of the proposed autoencoder-decoder EfficientNet-B0 model for image and video enhancement.

3.4 DATASET DESCRIPTION

The KITTI Vision Benchmark dataset is extensively utilized in computer vision research, especially in the autonomous driving domain. It consists of high-resolution images and videos recorded from a vehicle navigating urban environments. The dataset encompasses multiple data modalities such as RGB images, depth maps, lidar data, and camera calibration parameters. Researchers rely on the KITTI dataset to develop and assess algorithms for tasks like object detection, tracking, scene understanding, and other relevant applications in real-world driving scenarios.

Table 3.4.1 represents the distribution of the dataset into training and testing parts with an 80:20 split ratio, further classified into vehicle and non-vehicle class labels.

Table 3.4.1 KITTI Dataset Distribution

Split | Class Labels | No. of Samples | Total Images
Training | Vehicles | 24620 | 32456
Training | Non-vehicles | 7836 |
Testing | Vehicles | 5600 | 8114
Testing | Non-vehicles | 2514 |
Total | | | 40570
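The counts in Table 3.4.1 can be checked against the stated 80:20 split:

```python
train = {"vehicles": 24620, "non_vehicles": 7836}   # 32456 training images
test = {"vehicles": 5600, "non_vehicles": 2514}     # 8114 testing images

n_train, n_test = sum(train.values()), sum(test.values())
total = n_train + n_test                            # 40570 images overall
print(n_train / total, n_test / total)              # 0.8 and 0.2
```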

Figure 3.4.1 shows some samples from the KITTI dataset, which comprises 40570 images overall, categorized into vehicle and non-vehicle classes.

Fig 3.4.1. Samples of KITTI Dataset

3.5 RESULTS OF IMAGE/VIDEO PRE-PROCESSING

Figure 3.5.1 showcases the output of the enhancement process using bilinear
interpolation and autoencoder-decoder EfficientNet B0 techniques.
(Left: before enhancement; right: after enhancement.)
Figure 3.5.1. The output of the Enhancement Process

3.6 CONCLUSION

In conclusion, the enhancement of videos and images using bilinear interpolation and
the autoencoder decoder efficientNet B0 model offers promising results for improving
visual quality. Bilinear interpolation provides a straightforward method for increasing resolution and smoothing pixel transitions, resulting in enhanced clarity and sharpness.
On the other hand, the autoencoder decoder efficientNet B0 model leverages the power
of deep learning to extract meaningful features and reconstruct high-quality images or
videos. By combining these approaches, we can achieve significant improvements in
the overall visual quality of low-resolution or degraded content. The proposed
methodology offers a versatile and effective solution for enhancing videos and images,
with the potential for various applications, including multimedia content restoration,
video surveillance, and autonomous driving systems. Further research and
experimentation can explore the full potential of these methods and refine their
performance for even better results.

4. VEHICLE DETECTION

4.1 INTRODUCTION

Vehicle detection is a fundamental task in computer vision with applications in traffic
management, autonomous driving, and surveillance systems. Accurate localization and
identification of vehicles in images and videos are crucial for understanding traffic
patterns and ensuring road safety.

Vehicle detection is challenging due to various factors like diverse vehicle appearances,
occlusions, changing lighting conditions, and complex backgrounds. To address these
challenges, deep learning and convolutional neural networks (CNNs) have
revolutionized vehicle detection by capturing intricate features that differentiate
vehicles from the background.

Researchers have developed different techniques, such as region-based methods, sliding window approaches, and single-shot detectors, which leverage CNNs for feature extraction and employ advanced algorithms for object localization and classification.

Vehicle detection finds applications in traffic management, enabling real-time traffic monitoring, congestion detection, and traffic control. In autonomous driving, accurate vehicle detection is crucial for perceiving the environment and ensuring safe navigation. Surveillance systems use it to identify and track vehicles for security purposes.

In conclusion, vehicle detection is a vital task in computer vision with diverse applications. Advancements in deep learning and CNNs have greatly improved the accuracy and efficiency of vehicle detection algorithms, contributing to the development of intelligent transportation systems and enhancing safety in various domains.

4.2 PROPOSED METHODOLOGY

The proposed methodology for vehicle detection utilizes the Faster R-CNN framework with RESNET50 as the backbone. Faster R-CNN is a two-stage object detection system that combines region proposal generation and object classification in a single network. RESNET50 serves as the backbone architecture responsible for extracting informative features from input images.

In this methodology, RESNET50 extracts discriminative features by employing multiple convolutional layers to capture hierarchical representations of the images. These features are crucial for distinguishing vehicles from the background. The region proposal network (RPN) then generates potential bounding box proposals based on the feature map produced by RESNET50. These proposals are subsequently refined and classified by the network's later layers.

By leveraging the feature extraction capabilities of RESNET50 and the region proposals generated by the RPN, the Faster R-CNN architecture enables accurate localization and classification of vehicles. The model is trained on a comprehensive dataset of labeled vehicle images, allowing it to learn the visual characteristics and spatial relationships necessary for vehicle detection.

Once trained, the Faster R-CNN with RESNET50 can be applied to new images or
videos to detect vehicles. It processes the input data, identifies potential vehicle regions,
refines the bounding box predictions, and classifies the detected regions as either
vehicles or backgrounds.

This proposed methodology provides a robust and accurate approach to vehicle detection by combining deep learning techniques with the strong feature extraction capabilities of RESNET50. By incorporating these components, it achieves reliable vehicle detection across various real-world scenarios, making it highly valuable in applications such as autonomous driving, traffic monitoring, and surveillance systems.

4.2.1 FASTER RCNN (RESNET-50) AS BACKBONE FOR VEHICLE DETECTION

The vehicle detection process using Faster R-CNN with RESNET50 as the backbone comprises several stages to precisely identify and locate vehicles in images or videos.

Fig 4.2.1. The architecture of Faster RCNN with RESNET-50 as the backbone

To begin, the input image or video frame is preprocessed to ensure it is in a suitable format for the model, including resizing and normalization. RESNET50, serving as the backbone architecture, extracts essential features from the input through multiple convolutional layers, capturing intricate visual patterns that distinguish vehicles from the background.

Next, the feature map obtained from RESNET50 is fed into the Region Proposal
Network (RPN), which generates potential bounding box proposals. The RPN analyzes
different regions of the feature map and predicts the likelihood of containing objects,
specifically vehicles. These proposals serve as initial regions of interest.

In order to handle fixed-size feature maps for classification and regression tasks, the
RoI pooling layer extracts feature maps from each proposed region. The next step is to
classify whether each suggested region contains a vehicle and to refine the bounding
box coordinates of the identified vehicles using the retrieved features.

Non-maximum suppression (NMS) is used to get rid of redundant detections. This method eliminates overlapping bounding box predictions for the same vehicle and keeps the most confident detection.
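A minimal sketch of the NMS step is shown below; the IoU threshold of 0.5 is an assumed parameter, and boxes are in (x1, y1, x2, y2) form:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box and drop overlapping duplicates."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box with the remaining candidates.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # Suppress candidates overlapping the kept box too much.
        order = order[1:][iou <= iou_thresh]
    return keep
```

For two heavily overlapping boxes, only the higher-scoring one survives, while a distant box is kept regardless of its score.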
Finally, post-processing techniques are employed to refine the detected vehicle regions
further. This may involve filtering out detections based on confidence scores, applying

size constraints, or incorporating domain-specific knowledge to enhance detection
accuracy.
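Such a post-processing filter can be sketched as follows (the confidence threshold and minimum-area constraint are illustrative assumptions, not the tuned values of our system):

```python
def filter_detections(boxes, scores, score_thresh=0.5, min_area=100.0):
    """Keep only detections that are confident enough and whose
    bounding box ([x1, y1, x2, y2]) has a plausible area."""
    kept_boxes, kept_scores = [], []
    for box, score in zip(boxes, scores):
        width, height = box[2] - box[0], box[3] - box[1]
        if score >= score_thresh and width * height >= min_area:
            kept_boxes.append(box)
            kept_scores.append(score)
    return kept_boxes, kept_scores
```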

During training, the Faster R-CNN with RESNET50 is optimized using a large dataset
of labeled vehicle images. Through backpropagation and gradient descent, the model
learns to accurately identify and localize vehicles by optimizing classification and
regression components in the loss function.

Once trained, the model can be applied to new images or videos. The input data
undergoes preprocessing, followed by feature extraction, region proposal generation,
classification, regression, NMS, and post-processing. The result is a set of bounding
box predictions representing the detected vehicles.

The combination of Faster R-CNN and RESNET50 as the backbone architecture


provides a robust solution for vehicle detection. RESNET50's deep learning capabilities
enable effective feature extraction, while the two-stage architecture of Faster R-CNN
ensures accurate localization and classification of vehicles. This methodology has
demonstrated high effectiveness in applications like autonomous driving, traffic
monitoring, and surveillance systems.

4.3 RESULTS OF VEHICLE DETECTION USING FASTER RCNN

Table 4.3.1 contains the classification report for vehicle detection using the Faster
R-CNN model. Precision, recall, and F1-score are calculated for each class. Precision
describes the accuracy of the positive predictions, recall measures the percentage of
actual positives that were recognized correctly, and the F1-score is the harmonic mean
of the two. The Support column indicates the number of instances of each class, and
the macro average summarizes overall performance across all classes. The overall
accuracy, computed as the average across all classes, is about 0.95, i.e., 95 percent.

Table 4.3.1. Classification report of vehicle detection using Faster RCNN

Class        Precision  Recall  F1-Score  Support

Shadow          0.95     0.96     0.95       76

Non Shadow      0.93     0.97     0.95       68

Hard Shadow     0.94     0.88     0.95       34

Soft Shadow     0.98     0.93     0.95       45

Macro avg       0.95     0.94     0.94      223

The classification outcomes of our proposed model for vehicle detection are evaluated
using various metrics such as F1 score, precision, recall, and support. The formulas for
the same are mentioned below

P = T_P / (T_P + F_P)

R = T_P / (T_P + F_N)

F1-score = 2 * (P * R) / (P + R)

Support = total number of samples belonging to a class

Here P stands for precision and R for recall; T_P denotes true positives (vehicles that
were correctly detected), F_P false positives (detections that do not correspond to
vehicles), and F_N false negatives (vehicles that went undetected). These metrics
provide valuable insight into the performance of our model in identifying vehicles and
estimating their distance.
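These formulas translate directly into code; a minimal sketch computing the per-class metrics from raw counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1-score from raw counts:
    tp = correctly detected vehicles, fp = false detections,
    fn = vehicles that went undetected."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```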

Figure 4.3.1 showcases the vehicle detection output obtained using Faster R-CNN with
ResNet-50 as the backbone.

Fig 4.3.1. Vehicle detection results using Faster R-CNN with RESNET50

4.4. CONCLUSION

In conclusion, the integration of Faster R-CNN with RESNET50 as the backbone


architecture provides a powerful and reliable solution for vehicle detection in images
and videos. By leveraging the capabilities of deep learning and the advanced feature
extraction abilities of RESNET50, this methodology achieves accurate identification
and localization of vehicles in complex scenes.

The two-stage approach of Faster R-CNN enables the generation of region proposals
and precise object classification and bounding box regression. With the assistance of
the RESNET50 backbone, the model extracts meaningful and discriminative features,
capturing intricate visual patterns that distinguish vehicles from the background.

The utilization of deep learning and convolutional neural networks has brought
significant improvements in the accuracy and efficiency of vehicle detection. The
incorporation of RESNET50 as the backbone architecture further enhances the feature
extraction process, facilitating better discrimination and representation of vehicles.

The Faster R-CNN with ResNet-50 backbone has demonstrated exceptional
performance across various domains, including autonomous driving, traffic monitoring,
and surveillance systems. Its ability to accurately detect and localize vehicles
contributes to enhanced road safety, intelligent transportation systems, and improved
traffic management.

To further augment this methodology, additional techniques such as data augmentation,
fine-tuning, and multi-scale evaluation can be employed to enhance the model's
robustness and generalization capabilities.

Overall, the integration of Faster R-CNN with RESNET50 as the backbone architecture
presents a cutting-edge approach to vehicle detection, offering high accuracy and real-
time performance. Ongoing research and development continue to advance this
methodology, propelling the field of computer vision and driving advancements in road
safety and transportation efficiency.

5. DISTANCE ESTIMATION AND CLASSIFICATION

5.1. INTRODUCTION

Distance estimation and distance classification are fundamental tasks in machine vision
that play an important role in various applications such as robotics, augmented reality,
and autonomous driving. Accurately determining the distance between objects in a
scene is important for understanding spatial relationships, depth perception, and
interaction with the environment.

Distance estimation is estimating the exact physical distance between the camera or
observer and the objects in the scene. This task is difficult due to factors such as
perspective distortion, occlusion, varying lighting conditions, and complex scene
geometry. Researchers have developed various techniques for estimating distance,
including stereo vision, depth from monocular cues, and depth sensing techniques such
as LiDAR and structured light. Distance classification, on the other hand, focuses on
classifying objects into different distance ranges, such as near, medium, and far. This
classification provides valuable information for scene understanding and object
interaction. This enables applications such as obstacle detection, path planning, and
depth-aware scene segmentation.

Both distance estimation and distance classification use mathematical models, machine
learning algorithms, and sensor techniques to derive distance information from visual
input. These approaches use features extracted from images or sensor data to accurately
estimate or classify distances. Accurate distance estimation and classification enable
accurate object localization, 3D reconstruction, and scene understanding. These are key
components in developing intelligent systems that can effectively perceive and interact
with their environment.

Ongoing research and advances in computer vision continue to improve the accuracy
and robustness of distance estimation and classification techniques. These advances are
helping develop safer and smarter systems in areas such as self-driving cars, robotics,
and immersive augmented reality experiences.

5.2. PROPOSED METHODOLOGY

Distance estimation and distance classification are important tasks in image processing
that allow us to accurately measure and classify the distances between objects in a
scene. A common approach to distance estimation is to use the Euclidean distance,
which computes the straight-line distance between two points in space.

Various machine learning classifiers can be used to classify estimated distances into
specific labels. Decision tree classifiers partition the feature space based on a series of
hierarchical decisions, while random forest classifiers combine multiple decision trees
to improve accuracy. K-Nearest Neighbors (KNN) assigns labels based on a majority
vote of the nearest neighbors in the feature space. Support vector machines (SVMs)
create decision boundaries that maximize the margins between different classes. Naive
Bayesian classifiers classify distances using a probabilistic model based on Bayes'
theorem.

After the distances have been estimated, it is common to assign labels to distance
ranges such as 0-10 m, 10-20 m, and so on up to 100 m. This classification provides
meaningful information for a variety of applications, such as avoiding obstacles and
determining appropriate interactions with objects in the environment.
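Mapping an estimated distance to such a range label can be sketched as follows (the 10 m bin width and 100 m cap follow the ranges mentioned above):

```python
def distance_label(distance_m, bin_width=10.0, max_dist=100.0):
    """Map an estimated distance in metres to a range label such as
    '0-10m' or '40-50m'; anything beyond max_dist falls into a
    single open-ended bucket."""
    if distance_m >= max_dist:
        return f"{int(max_dist)}m+"
    low = int(distance_m // bin_width) * int(bin_width)
    return f"{low}-{low + int(bin_width)}m"
```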

To assess the accuracy of our distance estimates, we can use linear regression to find
the mean absolute error (MAE) between the actual and predicted distances. Linear
regression models the relationship between estimated and actual distances, allowing us
to measure the mean absolute difference between them.

Combining distance estimation using the Euclidean distance with classifiers such as
decision trees, random forests, KNN, SVM, and naive Bayes enables distances to be
both estimated accurately and classified into specific labels. Calculating the MAE
using linear regression then gives insight into the accuracy of the distance estimation
model.

These techniques have applications in a variety of fields, including robotics, self-
driving cars, virtual reality, and human-computer interaction. They enable accurate
distance perception and support decision-making processes based on distance
information. Ongoing research continues to improve the performance and robustness of
these techniques, driving advances in computer vision and related fields.

5.2.1. EUCLIDEAN DISTANCE FOR ESTIMATING DISTANCE

Euclidean distance is a commonly used measure for estimating the distance between
two vehicles in two-dimensional (2D) space. It is computed as the straight-line distance
between the centroids of the bounding boxes that enclose the vehicles.

To estimate distance using Euclidean distance, we first need to find the coordinates of
the centroid of the bounding box. The centroid represents the center of the bounding
box and serves as a representative point for the vehicle's position. The coordinates can
be found using the formula:

centroid_x = (bbox_x_min + bbox_x_max) / 2

centroid_y = (bbox_y_min + bbox_y_max) / 2

where bbox_x_min and bbox_x_max are the minimum and maximum x-coordinates of
the bounding box, and bbox_y_min and bbox_y_max are the minimum and maximum
y-coordinates, respectively.

Once the coordinates of the centroids of both vehicles are known, you can calculate the
Euclidean distance between them using the following formula:

distance = sqrt((centroid2_x - centroid1_x)^2 + (centroid2_y - centroid1_y)^2)

Here, centroid1_x and centroid1_y are the coordinates of the centroid of the first
vehicle, and centroid2_x and centroid2_y those of the second vehicle.

Calculating the Euclidean distance gives an estimate of the distance between two
vehicles in 2D space. This distance information is useful in various applications such
as collision avoidance, traffic monitoring, and autonomous driving systems.
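Putting the centroid and distance formulas above together, a minimal sketch:

```python
import math

def centroid(box):
    """Centre point of an [x_min, y_min, x_max, y_max] bounding box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def euclidean_distance(box1, box2):
    """Straight-line distance between two bounding-box centroids."""
    (x1, y1), (x2, y2) = centroid(box1), centroid(box2)
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
```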

5.2.2. CLASSIFICATION OF DISTANCES USING VARIOUS CLASSIFIERS

In the task of classifying distances into different labels, various classifiers can be
employed to effectively categorize the predicted distances. Commonly used classifiers
include decision trees, random forests, k-nearest neighbors (KNN), support vector
machines (SVM), and naive Bayes.

These classifiers use features extracted from the data, such as Euclidean distances
between vehicles, to make predictions. The predicted distances are categorized into
various labels or ranges starting from 0-10m up to 100m.

Additionally, a scale-based index called the bounding-box ratio can be used as a feature
in the classification process. The bounding-box ratio is calculated by dividing the area
of the bounding box by the area of the image frame. This ratio indicates the relative
size of the vehicle within the frame: a larger ratio suggests the vehicle is closer, while
a smaller ratio suggests it is farther away.

The bounding-box ratio can be calculated using the formulas below:

bb_ratio = bounding_box_area / image_frame_area

bounding_box_area = (right - left) * (bottom - top)

image_frame_area = width * height

Classifiers take these features, such as the Euclidean distance and the bounding-box
ratio, and apply classification algorithms to map predicted distances to the appropriate
labels or ranges. Decision tree and random forest algorithms create decision rules based
on the features, KNN classifies based on the nearest neighbors, SVM separates the data
with hyperplanes, and naive Bayes estimates the probabilities of the different classes.
These classifiers make it possible to accurately assign predicted distances to specific
distance labels, giving a better understanding of the spatial relationships between
vehicles. This information is important in applications such as collision avoidance
systems, traffic management, and autonomous driving.
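As an illustration, one of these classifiers, a decision tree over the two features, can be sketched with scikit-learn (the training samples, feature values, and label mapping below are entirely hypothetical toy data, not our dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy features: [centroid_distance_px, bb_ratio].
# A larger bb_ratio (vehicle fills more of the frame) suggests "near".
X = np.array([
    [40.0, 0.30], [55.0, 0.25], [60.0, 0.22],      # near vehicles
    [150.0, 0.08], [170.0, 0.06], [160.0, 0.07],   # mid-range vehicles
    [320.0, 0.01], [300.0, 0.02], [340.0, 0.015],  # far vehicles
])
y = ["0-10m"] * 3 + ["10-20m"] * 3 + ["20-30m"] * 3

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict([[50.0, 0.28], [330.0, 0.012]])
```

The same feature matrix can be passed unchanged to `RandomForestClassifier`, `KNeighborsClassifier`, `SVC`, or `GaussianNB` for comparison.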

5.2.3. LINEAR REGRESSION FOR CALCULATING MEAN ABSOLUTE ERROR (MAE)

Linear regression is used to calculate the mean absolute error (MAE) between predicted
distances and actual distances obtained from the KITTI benchmark dataset. The
bounding box coordinates and the Euclidean distance between vehicles are used as
input variables for the regression model. By fitting a linear regression line to the data,
the model estimates the relationship between predicted and actual distances. The MAE
is then calculated by comparing the predicted distances to the ground truth distances.
This provides a quantitative measure of the mean absolute deviation between the
predicted distance and the actual distance, allowing us to assess the accuracy of distance
estimation methods.
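A minimal sketch of this evaluation with scikit-learn (the feature values and ground-truth distances below are hypothetical toy values, not KITTI samples):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical pairs: a single scale feature (bb_ratio) against
# ground-truth distances in metres (toy values, not KITTI data).
X = np.array([[0.30], [0.15], [0.10], [0.075], [0.06]])
y_true = np.array([5.0, 10.0, 15.0, 20.0, 25.0])

# Fit a regression line, predict, and score with the MAE.
reg = LinearRegression().fit(X, y_true)
y_pred = reg.predict(X)
mae = mean_absolute_error(y_true, y_pred)  # mean |actual - predicted|
```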

5.4. RESULTS OF DISTANCE ESTIMATION AND CLASSIFICATION

Table 5.4.1 presents a classification report for different classifiers used to categorize
distances into various distance range labels. Three split ratios (80:20, 70:30, and 60:40)
were considered for each classifier, and precision, recall, and F1-score were used as
evaluation metrics.

The findings demonstrate that Decision Tree and Random Forest classifiers perform
well across all split ratios, exhibiting higher precision, recall, and F1-scores. On the
other hand, SVM, KNN, and Naïve Bayes classifiers show lower performance, with
lower precision, recall, and F1-scores. These results indicate that Decision Tree and
Random Forest classifiers are more effective in accurately classifying distances into
different distance range labels compared to the other classifiers.

Table 5.4.1. Classification report of various classifiers with different split ratios

(P / R / F1 = precision / recall / F1-score)

Classifier     |  80:20 (P / R / F1)        |  70:30 (P / R / F1)        |  60:40 (P / R / F1)
Decision Tree  |  0.7159 / 0.7171 / 0.7163  |  0.6976 / 0.7028 / 0.7     |  0.6948 / 0.9714 / 0.6806
Random Forest  |  0.7386 / 0.7615 / 0.7492  |  0.7444 / 0.7588 / 0.7511  |  0.7328 / 0.7269 / 0.727
SVM            |  0.1623 / 0.42 / 0.32      |  0.29 / 0.43 / 0.32        |  0.29 / 0.39 / 0.32
KNN            |  0.29 / 0.1069 / 0.127     |  0.1644 / 0.1082 / 0.1283  |  0.16446 / 0.1083 / 0.1286
Naïve Bayes    |  0.33 / 0.23 / 0.22        |  0.31 / 0.23 / 0.21        |  0.32 / 0.24 / 0.21

Table 5.4.2 presents the MAE values for the different object classes detected at various
split ratios: Car, Cyclist, Miscellaneous, Pedestrian, Person-sitting, Tram, Truck, and
Van. The MAE values range from 9.9369 to 28.0924, providing insight into the accuracy
of distance estimation for each object class under the different split ratios.

Table 5.4.2. MAE values of different object classes detected at various split ratios

Object Name 80:20 70:30 60:40


Car 9.978733886 10.0268095 9.936901406
Cyclist 11.16428242 11.4215627 11.17915503
Miscellaneous 12.51776097 13.40813384 13.61461826
Pedestrian 15.19717265 14.87062405 14.93786301
Person-sitting 17.9232253 18.02652054 17.9346339
Tram 27.58358487 27.60191057 28.09238412
Truck 21.73652416 21.54591141 21.53762992
Van 13.62831799 13.99093808 13.90897253

Figure 5.4.1 presents the contents of Table 5.4.1 as a bar graph, allowing the
performance of each classifier to be analyzed visually.


Figure 5.4.1. Distance Classification Performance Comparison of Classifiers

Figure 5.4.2 presents the contents of Table 5.4.2 as a bar graph, showing the mean
absolute error of each object class at the various split ratios.


Figure 5.4.2. Analysis of Mean Absolute Error (MAE) Values for Various Vehicle Types

5.5. CONCLUSION
To conclude, employing different classifiers for distance classification, as demonstrated
in Table 5.4.1, provides valuable insight into their ability to accurately categorize
distances into distance-range labels. The results consistently show that the Decision
Tree and Random Forest classifiers outperform the others across the various split ratios,
exhibiting higher precision, recall, and F1-scores, and prove effective at assigning
distances to their respective categories.

On the contrary, SVM, KNN, and Naïve Bayes classifiers exhibit relatively lower
performance, with lower precision, recall, and F1-scores. This indicates that they may
not be as reliable for distance classification in this particular scenario.

In summary, this study emphasizes the significance of carefully selecting suitable
classifiers for distance classification tasks. Decision Tree and Random Forest
classifiers show promise for accurately categorizing distances, warranting further
experimentation and optimization to improve the performance of the classifiers with
lower accuracy.

Future research can explore the evaluation of additional classifiers and alternative
techniques to improve the accuracy and robustness of distance classification. These
findings contribute to the advancement of more accurate and reliable systems for
distance estimation and classification, which find crucial applications in fields such as
autonomous driving, object tracking, and traffic analysis.

6. CONCLUSION
In summary, this research presents an efficient model for vehicle detection, distance
estimation, and classification using image enhancement, Faster R-CNN, and scale-based
distance estimation. Test results demonstrate the effectiveness of the proposed method,
which achieves a real-time frame rate of 25 fps. GPU acceleration and the use of
PyTorch CUDA version 7.0 contribute to the high performance of the system.
Additionally, the classification stage uses several classifiers that allow accurate
distance classification based on the bounding-box scale index. The proposed model
achieves 95 percent accuracy for vehicle detection and 72 percent accuracy for distance
estimation, with an MAE of 11 percent, which is better than existing models.

6.1.1. FUTURE WORK


Future research should extend the distance estimator to handle vehicles approaching
from all directions. This includes developing algorithms that consider multiple viewing
angles, such as front, rear, and side views, to accurately estimate distance in real-time
scenarios. Additionally, exploring advanced deep learning architectures and integrating
other sensor modalities, such as LiDAR and radar, can further improve the accuracy
and robustness of the distance estimation system. Taken together, these advances will
contribute to comprehensive and reliable driver-assistance systems that improve safety
and efficiency across various transportation sectors.

ANNEXURE – A

KITTI Dataset Screenshot

KITTI Vision-Based Benchmark Dataset in CSV format

Image/Video Enhancement Output Screenshot

Vehicle Detection Output Screenshot

Classification report of Vehicle detection using Faster RCNN screenshot

Bounding Box Coordinates in CSV format screen

Distance Estimation & Classification Output Screenshot

Classification report of various classifiers used for classifying distances into
different labels output screenshot

Linear Regression used for finding difference between actual and predicted
distance screenshot

MAE values for different vehicle and non-vehicle class objects screenshot

Authors Research Publications

1. Srinivasa C, Dharun G K, Manohar N, Suresha R, "Deep Learning-based Techniques
for Precise Vehicle Detection and Distance Estimation in Autonomous Systems", 2023.
(Forwarded for publication)

BIBLIOGRAPHY

[Tam 2022] Tamang, B., Poudel, S., Bhandari, S., Damase, B., & Pande, S.D. (2022).
Customized Deep Learning Technique for vehicle detection along with speed
estimation (No. 7851). EasyChair.

[Vaj 2022] Vajgl, M., Hurtik, P., & Nejezchleba, T. (2022). Dist-YOLO: Fast Object
Detection with Distance Estimation. Applied Sciences, 12(3),1354.

[Man 2022] Manohar, N. (2022, November). Traffic Congestion Management based on
Vehicle Density Using Image Processing Techniques. In 2022 International
Conference on Futuristic Technologies (INCOFT) (pp. 1-4). IEEE.

[Sur 2022] Suresha, R., Devika, K. M., & Prabhu, A. (2022, October). Support Vector
Machine Classifier-based Lung Cancer Recognition: A Fusion Approach. In
2022 International Conference on Edge Computing and Applications
(ICECAA) (pp. 1-8). IEEE.

[Cha 2021] Chandrajit, M., Rani, N. S., & Manohar, N. (2021, February). Robust
segmentation and classification of moving objects from surveillance video. In
IOP conference series: materials science and engineering (Vol. 1085, No. 1, p.
012009). IOP Publishing.

[Dha 2021] Dhyanjith, G., Manohar, N., & Raj, A. V. (2021, July). Helmet Detection Using
YOLO V3 And Single Shot Detector. In 2021 6th International Conference on
Communication and Electronics Systems (ICCES) (pp. 1844-1848). IEEE.

[Hua 2021] Huang, K. C., Huang, Y. K., & Hsu, W. H. (2021). Multi-Stream Attention
Learning for Monocular Vehicle Velocity and Inter- Vehicle Distance
Estimation. arXiv preprint arXiv:2110.11608.

[Lin 2021] Lin, C. J., Jeng, S. Y., & Lioa, H. W. (2021). A real-time vehicle counting,
speed estimation, and classification system based on virtual detection zone and
YOLO. Mathematical Problems in Engineering, 2021, 1-10.

[Bou 2021] Bourja, O., Derrouz, H., Abdelali, H. A., Maach, A., Thami, R. O. H., &
Bourzeix, F. (2021). Real time vehicle detection, tracking, and inter-vehicle
distance estimation based on stereovision and deep learning using YOLOv3.
International Journal of Advanced Computer Science and Applications, 12(8).

[Aks 2021] Akshay, S., Abhishek, M. B., Sudhanshu, D., & Anuvaishnav, C. (2021,
August). Drowsy driver detection using eye-tracking through Machine learning.
In 2021 Second International Conference on Electronics and Sustainable
Communication Systems (ICESC) (pp. 1916-1923). IEEE.

[Zaa 2020] Zaarane, A., Slimani, I., Al Okaishi, W., Atouf, I., & Hamdoun, A. (2020).
Distance measurement system for autonomous vehicles using stereo camera.
Array, 5, 100016.

[Ma 2020] Ma, Q., Jiang, G., Lai, D., & Song, H. (2020). Fast, Accurate Vehicle Detection
and Distance Estimation. KSII Transactions on Internet & Information Systems,
14(2).

[Ali 2020] Ali, A., Hassan, A., Ali, A. R., Khan, H. U., Kazmi, W., & Zaheer, A. (2020). Real-
time vehicle distance estimation using single view geometry. In Proceedings of
the IEEE/CVF Winter Conference on Applications of Computer Vision (pp.
1111-1120).

[Sus 2020] Sushmitha, S., Satheesh, N., & Kanchana, V. (2020, June). Multiple car
detection, recognition and tracking in traffic. In 2020 International Conference
for Emerging Technology (INCET) (pp. 1-5). IEEE.

[Kim 2019] Kim, J. B. (2019). Efficient vehicle detection and distance estimation based on
aggregated channel features and inverse perspective mapping from a single
camera. Symmetry, 11(10), 1205.

[Liu 2019] Liu, Z., Lu, D., Qian, W., Ren, K., Zhang, J., & Xu, L. (2019). Vision‐ based inter‐
vehicle distance estimation for driver alarm system. IET Intelligent Transport
Systems, 13(6), 927-932.

[Qi S 2019] Qi, S. H., Li, J., Sun, Z. P., Zhang, J. T., & Sun, Y. (2019, February). Distance
estimation of monocular based on vehicle pose information. In Journal of
Physics: Conference Series (Vol. 1168, No. 3, p. 032040). IOP Publishing.

[Yil 2018] Yilmaz, A. A., Guzel, M. S., Askerbeyli, I., & Bostanci, E. (2018). A vehicle
detection approach using deep learning methodologies. arXiv preprint
arXiv:1804.00429.

[Lee 2018] Lee, Y., Ansari, I., & Shim, J. (2018). Rear-approaching vehicle detection using
frame similarity base on faster R-CNN. Int. J. Eng. Technol, 7, 177-180.

[Kim 2018] Kim, H., Boulougouris, E., & Kim, S. H. (2018, December). Object detection
algorithm for unmanned surface vehicle using faster R-CNN. In World Maritime
Technology Conference 2018.

[Som 2018] Sommer, L., Schuchert, T., & Beyerer, J. (2018). Comprehensive analysis of
deep learning-based vehicle detection in aerial images. IEEE Transactions on
Circuits and Systems for Video Technology, 29(9), 2733-2747.

[Ngu 2018] Nguyen, V. D., Tran, D. T., Byun, J. Y., & Jeon, J. W. (2018). Real- time vehicle
detection using an effective region proposal-based depth and 3-channel pattern.
IEEE Transactions on Intelligent Transportation Systems, 20(10), 3634-3646.

[Yab 2016] Yabo, A., Arroyo, S. I., Safar, F. G., & Oliva, D. (2016). Vehicle classification
and speed estimation using computer vision techniques. In XXV Congreso
Argentino de Control Automático (AADECA 2016)(Buenos Aires, 2016).

[Rez 2015] Rezaei, M., Terauchi, M., & Klette, R. (2015). Robust vehicle detection and
distance estimation under challenging lighting conditions. IEEE transactions
on intelligent transportation systems, 16(5), 2723-2743.

[Che 2015] Chen, Y. C., Su, T. F., & Lai, S. H. (2015). Integrated vehicle and
lane detection with distance estimation. In Computer Vision-ACCV 2014
Workshops: Singapore, Singapore, November 1-2, 2014, Revised Selected
Papers, Part III 12 (pp. 473-485). Springer International Publishing.

[Kim 2012] Kim, G., & Cho, J. S. (2012, October). Vision-based vehicle detection and inter-
vehicle distance estimation. In 2012 12th International Conference on Control,
Automation and Systems (pp. 625-629). IEEE.

