
3D OBJECT DETECTION
for autonomous driving
(2D wasn't enough)

Team: Nerd Herd

Deepshikha Biswas, CSE-3, 3rd year, 18700120114
Aakansha Prasad, CSE-3, 3rd year, 18700120123
Sinjini Hom Roy, CSE-3, 3rd year, 18700120124
Maaz Shahid, CSE-3, 3rd year, 18700120161
Content

Introduction
Background Study
Objective
Literature Review
Architecture
Data Collection
Data Processing
Detection
Advantages and Disadvantages
Proposed Methodology
Future Scope
Conclusion
Introduction
Autonomous driving is gaining attention for its potential to improve driving
safety and reduce driver burden. Crucial to the perception system of
autonomous vehicles is 3D object detection, which involves predicting the
sizes, locations, and categories of nearby objects. As autonomous driving
technology advances, the need for more advanced and precise object
detection is growing. The Birds-Eye-View (BEV) method uses a top-down view
to convert LiDAR point cloud data into a 2D image representation,
simplifying object detection. This technique has garnered attention from
researchers for its promising results in accurately detecting and recognizing
objects in real-world driving scenarios.
Background Study
With the rise of autonomous driving technology,
there is a growing demand for efficient and
reliable object detection systems to identify
objects in real-time. LiDAR sensors are frequently
used for 3D object detection due to their high-
resolution 3D point cloud data, but processing
large amounts of this data can be computationally
expensive.
The BEV method overcomes this challenge by
transforming the point cloud data into a 2D image
representation, which is more computationally
efficient for object detection.
Objective
Our aim is to improve the accuracy and
efficiency of existing 3D object detection
architectures for autonomous driving.
We plan to do so by merging a LiDAR-based
tracking algorithm with a radar-based
tracking algorithm.
Literature Review
Author: Martin Simon, Stefan Milz, Karl Amende, Horst-Michael Gross
Year: 2018
Title: Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds
Method: Implements an Euler-Region-Proposal Network (E-RPN) that regresses orientation in a closed complex space, avoiding the singularities that occur with single-angle estimation.
Findings: No external cameras or radars were required. A stable 50 FPS was achieved on a GTX Titan X GPU.

Author: Shaoshuai Shi, Xiaogang Wang, Hongsheng Li
Year: 2019
Title: PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud
Method: The proposed framework consists of two stages: the first stage generates 3D bounding box proposals in a bottom-up scheme, and the second stage conducts canonical 3D box refinement.
Findings: The method completely eliminates the use of RGB cameras and relies on LiDAR data alone. It reached an average of 85.94 Average Precision (AP) points.

Author: Eduardo Arnold, Omar Y. Al-Jarrah, Mehrdad Dianati, Saber Fallah
Year: 2019
Title: A Survey on 3D Object Detection Methods for Autonomous Driving Applications
Method: The authors conducted a survey of image-based, point-cloud-based, and multimodal methods.
Findings: VoxelNet took the first lead in proposing an end-to-end trainable network that learns an informative 3D volumetric representation instead of the manual feature engineering used in most previous works.

Author: Sambit Mohapatra, Senthil Yogamani, Heinrich Gotzig, Stefan Milz, Patrick Mader
Year: 2020
Title: BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object Detection for Autonomous Driving
Method: The authors propose that BEV images are useful for representing LiDAR data in a 2D grid with non-overlapping object locations; the BirdNet+ method is used for BEV image construction at 512x256 resolution.
Findings: Helped develop detection frameworks that leverage multi-sensor fusion in a recurrent fashion along with RGB-based recurrent object detection.
Architecture
Complex YOLO V4 Architecture

The data from the LiDAR is stored in the form of a 3D point cloud map.
The generated map is then converted into a Birds-Eye-View (BEV) image by selecting only the relevant 2D axes of each point.
A CNN is used to identify the objects in the BEV image.
Finally, a 3D box is displayed around each detected object in the 2D image; a rough sketch of this reconversion step is given after the pipeline overview below.
Pipeline: 1. Point cloud to Birds-Eye-View conversion -> 2. Complex YOLO on the Birds-Eye-View -> 3. 3D bounding box reconversion

[Figure: 3D point cloud, Bird's-Eye-View conversion output, and final output frame]
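As a rough illustration of step 3, the sketch below converts a BEV detection back into a 3D box. The grid resolution, ground height, and object height used here are illustrative assumptions, not values from the actual Complex YOLO V4 implementation, which also regresses the vertical extent.

import numpy as np

# Hypothetical sketch of BEV -> 3D box reconversion.
RES = 0.1  # metres per BEV pixel (assumed grid resolution)

def bev_to_3d_box(cx_px, cy_px, l_px, w_px, yaw, z_min=-1.7, height=1.6):
    """Turn a BEV box (pixel centre, pixel size, yaw) into 8 corners in metres."""
    cx, cy = cx_px * RES, cy_px * RES
    l, w = l_px * RES, w_px * RES
    # axis-aligned footprint corners around the origin, then rotate by yaw
    x = np.array([ l,  l, -l, -l]) / 2.0
    y = np.array([ w, -w, -w,  w]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    xr, yr = c * x - s * y, s * x + c * y
    bottom = np.stack([xr + cx, yr + cy, np.full(4, z_min)], axis=1)
    top = bottom + np.array([0.0, 0.0, height])
    return np.vstack([bottom, top])  # (8, 3) array of box corners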


Data Collection
The LiDAR sensor mounted on the autonomous vehicle collects data in the form of a set of 3D points.
The KITTI dataset, recorded using two high-resolution colour and grayscale video cameras, is used to train the Complex YOLO V4 model.
A GPS localization system and a 360° Velodyne laser scanner provide a precise reference for accurate ground truth.
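As a small illustration of how the raw LiDAR data is read, the sketch below loads one KITTI Velodyne sweep; the file path is only a placeholder.

import numpy as np

# KITTI stores each Velodyne sweep as a flat float32 binary file with
# four values per point: x, y, z and reflectance (intensity).
def load_kitti_scan(path="training/velodyne/000000.bin"):  # placeholder path
    points = np.fromfile(path, dtype=np.float32).reshape(-1, 4)
    return points  # (N, 4) array of x, y, z, intensity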
Data Processing
The point cloud data is preprocessed by converting it into a 2D top-down view of the environment using a Birds-Eye-View (BEV) representation.
The BEV is divided into a grid of cells, and representative points are sampled, reducing the computational complexity of subsequent steps.
The characteristics of the sampled points are represented through features learned by CNNs trained on annotated point cloud datasets.
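A minimal sketch of such a BEV encoding is shown below, assuming a 50 m x 50 m region at 0.1 m resolution and three per-cell channels (maximum height, maximum intensity, point density), in the spirit of the RGB-map used by Complex-YOLO; the exact ranges and normalisation are assumptions for illustration.

import numpy as np

# Sketch: discretise LiDAR points into a 3-channel BEV grid.
# Region, resolution and the log-density normalisation are assumptions.
X_RANGE, Y_RANGE, RES = (0.0, 50.0), (-25.0, 25.0), 0.1
H = int((X_RANGE[1] - X_RANGE[0]) / RES)  # 500 rows
W = int((Y_RANGE[1] - Y_RANGE[0]) / RES)  # 500 cols

def points_to_bev(points):
    """points: (N, 4) array of x, y, z, intensity -> (H, W, 3) BEV map."""
    x, y, z, inten = points.T
    keep = (x >= X_RANGE[0]) & (x < X_RANGE[1]) & \
           (y >= Y_RANGE[0]) & (y < Y_RANGE[1])
    x, y, z, inten = x[keep], y[keep], z[keep], inten[keep]
    rows = ((x - X_RANGE[0]) / RES).astype(int)
    cols = ((y - Y_RANGE[0]) / RES).astype(int)
    bev = np.zeros((H, W, 3), dtype=np.float32)
    np.maximum.at(bev[:, :, 0], (rows, cols), z)      # max height per cell
    np.maximum.at(bev[:, :, 1], (rows, cols), inten)  # max intensity per cell
    np.add.at(bev[:, :, 2], (rows, cols), 1.0)        # point count per cell
    bev[:, :, 2] = np.minimum(1.0, np.log1p(bev[:, :, 2]) / np.log(64.0))
    return bev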
Detection
3D object detection also helps with scene flow and flow prediction for the objects around the vehicle.

The CNN-based detector employs the learned features to forecast the size, position, and category of objects in the environment.
The location and size estimates of the detected objects are refined by post-processing the detections, for example with non-maximum suppression as sketched below.
The system produces a list of detected objects with size, location, and class for autonomous driving.
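The sketch below applies a simple greedy non-maximum suppression to axis-aligned BEV boxes; a rotated-IoU NMS would normally be used for oriented boxes, so treat this only as an illustration of the refinement idea.

import numpy as np

# Illustrative greedy NMS on boxes given as (x1, y1, x2, y2, score).
def nms_bev(boxes, iou_thresh=0.5):
    boxes = boxes[np.argsort(-boxes[:, 4])]  # highest score first
    keep = []
    while len(boxes):
        best, boxes = boxes[0], boxes[1:]
        keep.append(best)
        if not len(boxes):
            break
        # intersection of the best box with every remaining box
        xx1 = np.maximum(best[0], boxes[:, 0])
        yy1 = np.maximum(best[1], boxes[:, 1])
        xx2 = np.minimum(best[2], boxes[:, 2])
        yy2 = np.minimum(best[3], boxes[:, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_best = (best[2] - best[0]) * (best[3] - best[1])
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (area_best + areas - inter)
        boxes = boxes[iou <= iou_thresh]  # drop heavily overlapping boxes
    return np.array(keep)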
Some Points to Consider

Advantages
Highly accurate data of the environment is available.
Data points for vehicles, pedestrians, and other obstacles have a much higher resolution than those from radars and other sensors.
Environment scanning is fast.

Disadvantages
High computational cost compared to other methods, such as using radars.
LiDAR does not work well in foggy environments, as the light gets scattered.
The hardware is bulky and expensive.
Proposed Methodology
Point cloud data is fed to the Complex YOLO V4 3D object detection algorithm, which generates the detections for the 1st frame.
Radar data collected over the next n frames is processed by a radar detection algorithm.
The outputs of the two branches are then combined for 3D bounding box construction, as in the rough fusion sketch below.
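A very rough sketch of the fusion idea follows; the data layout (LiDAR box centres and radar returns as flat arrays), the gating distance, and the blending weight are hypothetical choices made for illustration, not an established algorithm.

import numpy as np

# Hypothetical sketch of the proposed LiDAR + radar fusion.
def fuse_radar(lidar_boxes, radar_points, max_dist=2.0, alpha=0.7):
    """lidar_boxes: (M, 3) of x, y, yaw; radar_points: (K, 3) of x, y, radial velocity."""
    fused = lidar_boxes.copy()
    if len(fused) == 0:
        return fused
    for rx, ry, _vel in radar_points:
        dists = np.hypot(fused[:, 0] - rx, fused[:, 1] - ry)
        j = np.argmin(dists)
        if dists[j] < max_dist:  # nearest-neighbour gating
            # blend the LiDAR position estimate with the radar return
            fused[j, 0] = alpha * fused[j, 0] + (1 - alpha) * rx
            fused[j, 1] = alpha * fused[j, 1] + (1 - alpha) * ry
    return fused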
Future Scope

With increased interest in autonomous driving, there is a greater need for accurate and efficient LiDAR-based 3D object detection. Future research aims to improve the accuracy and efficiency of the detection method and enhance its robustness and generalizability by collecting diverse datasets and integrating additional sensors, such as cameras and radar, to improve detection performance and provide LiDAR with supplementary data.
Conclusion
In conclusion, 3D object detection is a rapidly evolving field with various applications in
industries such as autonomous driving, robotics, augmented reality, and manufacturing.
The advancement of deep learning-based algorithms and sensor technologies has led
to increased accuracy, efficiency, and applicability.
The future of 3D object detection is promising, with expectations of further
advancements in accuracy, speed, and new applications. Continued development of this
technology has the potential to revolutionize many industries and improve safety and
efficiency across a broad range of applications.
Thank you!
