
3D OBJECT DETECTION
for autonomous driving
(2D wasn't enough)

Team: Nerd Herd

Deepshikha Biswas, CSE-3, 3rd year, 18700120114
Aakansha Prasad, CSE-3, 3rd year, 18700120123
Sinjini Hom Roy, CSE-3, 3rd year, 18700120124
Maaz Shahid, CSE-3, 3rd year, 18700120161
Content

Introduction
Background Study
Objective
Literature Review
Architecture
Data Collection
Data Processing
Detection
Advantages and Disadvantages
Proposed Methodology
Future Scope
Conclusion
Introduction
Autonomous driving is gaining attention for its potential to improve driving
safety and reduce driver burden. Crucial to the perception system of
autonomous vehicles is 3D object detection, which involves predicting the
sizes, locations, and categories of nearby objects. As autonomous driving
technology advances, the need for more advanced and precise object
detection is growing. The Birds-Eye-View (BEV) method uses a top-down view
to convert LiDAR point cloud data into a 2D image representation,
simplifying object detection. This technique has garnered attention from
researchers for its promising results in accurately detecting and recognizing
objects in real-world driving scenarios.
Background Study
With the rise of autonomous driving technology,
there is a growing demand for efficient and
reliable object detection systems to identify
objects in real-time. LiDAR sensors are frequently
used for 3D object detection due to their high-
resolution 3D point cloud data, but processing
large amounts of this data can be computationally
expensive.
The BEV method overcomes this challenge by
transforming the point cloud data into a 2D image
representation, which is more computationally
efficient for object detection.
Objective
Our aim is to improve the accuracy and
efficiency of existing 3D object detection
architectures for autonomous driving.
We plan to do so by merging a LiDAR-based
tracking algorithm with a radar-based
tracking algorithm.
Literature Review
Author: Martin Simon, Stefan Milz, Karl Amende, Horst-Michael Gross
Year: 2018
Title: Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds
Method: Implements an Euler-Region-Proposal Network (E-RPN) that regresses orientation in a closed complex space, avoiding the singularities that occur with single-angle estimation.
Findings: No external cameras or radars were required. A stable 50 FPS was achieved on a GTX Titan X GPU.

Author: Shaoshuai Shi, Xiaogang Wang, Hongsheng Li
Year: 2019
Title: PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud
Method: The proposed framework consists of two stages: the first stage generates 3D bounding box proposals in a bottom-up scheme, and the second stage conducts canonical 3D box refinement.
Findings: The method completely eliminates the use of RGB cameras and relies on LiDAR data alone. It reached an average of 85.94 Average Precision (AP) points.

Author: Eduardo Arnold, Omar Y. Al-Jarrah, Mehrdad Dianati, Saber Fallah
Year: 2019
Title: A Survey on 3D Object Detection Methods for Autonomous Driving Applications
Method: The authors conducted a survey of image-based, point-cloud-based, and multimodal methods.
Findings: VoxelNet took the first lead in proposing an end-to-end trainable network that learns an informative 3D volumetric representation instead of the manual feature engineering used in most previous works.

Author: Sambit Mohapatra, Senthil Yogamani, Heinrich Gotzig, Stefan Milz, Patrick Mader
Year: 2020
Title: BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object Detection for Autonomous Driving
Method: The authors propose that BEV images are useful for representing LiDAR data in a 2D grid with non-overlapping object locations; the BirdNet+ method is used for BEV image construction at 512x256 resolution.
Findings: Helped develop detection frameworks that leverage multi-sensor fusion in a recurrent fashion along with RGB-based recurrent object detection.
Architecture
Complex YOLO V4 Architecture

The data from the LiDAR is stored in the form of a 3D point cloud map.
The generated map is then converted into a Birds-Eye-View (BEV) image by selecting only the relevant 2D axes of each point.
A CNN is used to identify the objects in the BEV image.
Finally, a 3D box is displayed around each detected object in the 2D image; a rough sketch of this reconversion step is given after the pipeline overview below.
Pipeline: 1. Point cloud to Birds-Eye-View conversion -> 2. Complex YOLO on the Birds-Eye-View -> 3. 3D bounding box reconversion

[Figure: 3D point cloud, Bird's-Eye-View conversion output, and final output frame]
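As a rough illustration of step 3, the sketch below converts a BEV detection back into a 3D box. The grid resolution, ground height, and object height used here are illustrative assumptions, not values from the actual Complex YOLO V4 implementation, which also regresses the vertical extent.

import numpy as np

# Hypothetical sketch of BEV -> 3D box reconversion.
RES = 0.1  # metres per BEV pixel (assumed grid resolution)

def bev_to_3d_box(cx_px, cy_px, l_px, w_px, yaw, z_min=-1.7, height=1.6):
    """Turn a BEV box (pixel centre, pixel size, yaw) into 8 corners in metres."""
    cx, cy = cx_px * RES, cy_px * RES
    l, w = l_px * RES, w_px * RES
    # axis-aligned footprint corners around the origin, then rotate by yaw
    x = np.array([ l,  l, -l, -l]) / 2.0
    y = np.array([ w, -w, -w,  w]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    xr, yr = c * x - s * y, s * x + c * y
    bottom = np.stack([xr + cx, yr + cy, np.full(4, z_min)], axis=1)
    top = bottom + np.array([0.0, 0.0, height])
    return np.vstack([bottom, top])  # (8, 3) array of box corners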


Data Collection
The LiDAR sensor mounted on the autonomous vehicle collects data in the form of a set of 3D points.
The KITTI dataset, recorded using two high-resolution colour and grayscale video cameras, is used to train the Complex YOLO V4 model.
A GPS localization system and a 360° Velodyne laser scanner provide a precise reference for accurate ground truth.
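As a small illustration of how the raw LiDAR data is read, the sketch below loads one KITTI Velodyne sweep; the file path is only a placeholder.

import numpy as np

# KITTI stores each Velodyne sweep as a flat float32 binary file with
# four values per point: x, y, z and reflectance (intensity).
def load_kitti_scan(path="training/velodyne/000000.bin"):  # placeholder path
    points = np.fromfile(path, dtype=np.float32).reshape(-1, 4)
    return points  # (N, 4) array of x, y, z, intensity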
Data Processing
The point cloud data is preprocessed by converting it into a 2D top-down view of the environment using a Birds-Eye-View (BEV) representation.
The BEV is divided into a grid of cells, and representative points are sampled, reducing the computational complexity of subsequent steps.
The characteristics of the sampled points are represented through features learned by CNNs trained on annotated point cloud datasets.
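A minimal sketch of such a BEV encoding is shown below, assuming a 50 m x 50 m region at 0.1 m resolution and three per-cell channels (maximum height, maximum intensity, point density), in the spirit of the RGB-map used by Complex-YOLO; the exact ranges and normalisation are assumptions for illustration.

import numpy as np

# Sketch: discretise LiDAR points into a 3-channel BEV grid.
# Region, resolution and the log-density normalisation are assumptions.
X_RANGE, Y_RANGE, RES = (0.0, 50.0), (-25.0, 25.0), 0.1
H = int((X_RANGE[1] - X_RANGE[0]) / RES)  # 500 rows
W = int((Y_RANGE[1] - Y_RANGE[0]) / RES)  # 500 cols

def points_to_bev(points):
    """points: (N, 4) array of x, y, z, intensity -> (H, W, 3) BEV map."""
    x, y, z, inten = points.T
    keep = (x >= X_RANGE[0]) & (x < X_RANGE[1]) & \
           (y >= Y_RANGE[0]) & (y < Y_RANGE[1])
    x, y, z, inten = x[keep], y[keep], z[keep], inten[keep]
    rows = ((x - X_RANGE[0]) / RES).astype(int)
    cols = ((y - Y_RANGE[0]) / RES).astype(int)
    bev = np.zeros((H, W, 3), dtype=np.float32)
    np.maximum.at(bev[:, :, 0], (rows, cols), z)      # max height per cell
    np.maximum.at(bev[:, :, 1], (rows, cols), inten)  # max intensity per cell
    np.add.at(bev[:, :, 2], (rows, cols), 1.0)        # point count per cell
    bev[:, :, 2] = np.minimum(1.0, np.log1p(bev[:, :, 2]) / np.log(64.0))
    return bev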
Detection
3D object detection also helps with scene flow and flow prediction for the objects around the vehicle.

The CNN-based detector employs the learned features to forecast the size, position, and category of objects in the environment.
The location and size estimates of the detected objects are refined by post-processing the detections, for example with non-maximum suppression as sketched below.
The system produces a list of detected objects with size, location, and class for autonomous driving.
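The sketch below applies a simple greedy non-maximum suppression to axis-aligned BEV boxes; a rotated-IoU NMS would normally be used for oriented boxes, so treat this only as an illustration of the refinement idea.

import numpy as np

# Illustrative greedy NMS on boxes given as (x1, y1, x2, y2, score).
def nms_bev(boxes, iou_thresh=0.5):
    boxes = boxes[np.argsort(-boxes[:, 4])]  # highest score first
    keep = []
    while len(boxes):
        best, boxes = boxes[0], boxes[1:]
        keep.append(best)
        if not len(boxes):
            break
        # intersection of the best box with every remaining box
        xx1 = np.maximum(best[0], boxes[:, 0])
        yy1 = np.maximum(best[1], boxes[:, 1])
        xx2 = np.minimum(best[2], boxes[:, 2])
        yy2 = np.minimum(best[3], boxes[:, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_best = (best[2] - best[0]) * (best[3] - best[1])
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (area_best + areas - inter)
        boxes = boxes[iou <= iou_thresh]  # drop heavily overlapping boxes
    return np.array(keep)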
Some Points to Consider

Advantages
Highly accurate data of the environment is available.
Data points for vehicles, pedestrians, and other obstacles have a much higher resolution than those from radars and other sensors.
Environment scanning is fast.

Disadvantages
High computational cost compared to other methods, such as using radars.
LiDAR does not work well in foggy environments, as the light gets scattered.
The hardware is bulky and expensive.
Proposed Methodology
Point cloud data is fed to the Complex YOLO V4 3D object detection algorithm, which generates the detections for the 1st frame.
Radar data collected over the next n frames is processed by a radar detection algorithm.
The outputs of the two branches are then combined for 3D bounding box construction, as in the rough fusion sketch below.
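A very rough sketch of the fusion idea follows; the data layout (LiDAR box centres and radar returns as flat arrays), the gating distance, and the blending weight are hypothetical choices made for illustration, not an established algorithm.

import numpy as np

# Hypothetical sketch of the proposed LiDAR + radar fusion.
def fuse_radar(lidar_boxes, radar_points, max_dist=2.0, alpha=0.7):
    """lidar_boxes: (M, 3) of x, y, yaw; radar_points: (K, 3) of x, y, radial velocity."""
    fused = lidar_boxes.copy()
    if len(fused) == 0:
        return fused
    for rx, ry, _vel in radar_points:
        dists = np.hypot(fused[:, 0] - rx, fused[:, 1] - ry)
        j = np.argmin(dists)
        if dists[j] < max_dist:  # nearest-neighbour gating
            # blend the LiDAR position estimate with the radar return
            fused[j, 0] = alpha * fused[j, 0] + (1 - alpha) * rx
            fused[j, 1] = alpha * fused[j, 1] + (1 - alpha) * ry
    return fused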
Future Scope

With increased interest in autonomous driving, there is a greater need for accurate and efficient LiDAR-based 3D object detection. Future research aims to improve the accuracy and efficiency of the detection method and enhance its robustness and generalizability by collecting diverse datasets and integrating additional sensors, such as cameras and radar, to improve detection performance and provide LiDAR with supplementary data.
Conclusion
In conclusion, 3D object detection is a rapidly evolving field with various applications in
industries such as autonomous driving, robotics, augmented reality, and manufacturing.
The advancement of deep learning-based algorithms and sensor technologies has led
to increased accuracy, efficiency, and applicability.
The future of 3D object detection is promising, with expectations of further
advancements in accuracy, speed, and new applications. Continued development of this
technology has the potential to revolutionize many industries and improve safety and
efficiency across a broad range of applications.
Thank you!
