HO CHI MINH CITY
INTERNATIONAL UNIVERSITY
SCHOOL OF ELECTRICAL ENGINEERING
3D Object Detection from Images for Autonomous Driving
BY
Under the guidance and approval of the committee, and approved by its members, this
thesis has been accepted in partial fulfillment of the requirements for the degree.
Approved:
________________________________
Chairperson
_______________________________
Committee member
________________________________
Committee member
________________________________
Committee member
________________________________
Committee member
HONESTY DECLARATION
My name is Nguyen Hoàng Duy Bảo. I declare that, apart from the acknowledged
references, this thesis does not use language, ideas, or other original material from
anyone else, and has not been previously submitted to any other educational or
research program or institution. I fully understand that any writing in this thesis that
contradicts the above statement will automatically lead to rejection from the EE program.
Date:
Student’s Signature
(Full name)
TURNITIN DECLARATION
Date: 08/06/2022
ACKNOWLEDGMENT
This thesis was completed under the guidance of Dr. Nguyen Ngoc Hung. His constant
encouragement and support helped me to achieve my goal.
TABLE OF CONTENTS
HONESTY DECLARATION.............................................................................................ii
TURNITIN DECLARATION............................................................................................iii
ACKNOWLEDGMENT......................................................................................iv
TABLE OF CONTENTS....................................................................................................v
LIST OF TABLES............................................................................................................vii
LIST OF FIGURES..........................................................................................................viii
ABSTRACT........................................................................................................................x
CHAPTER I INTRODUCTION.........................................................................................1
3.2. Project Schedule.......................................................................................................5
CHAPTER V METHODOLOGY.......................................................................................7
REFERENCES..................................................................................................................10
APPENDICES...................................................................................................................11
LIST OF TABLES
Table 1.1..........................................................................................................................1
Table 2.1..........................................................................................................................3
Table 2.3........................................................................................................................12
LIST OF FIGURES
Figure 1.1..........................................................................................................................1
Figure 1.2..........................................................................................................................1
Figure 2.1..........................................................................................................................6
Figure 2.2........................................................................................................................13
Figure 2.3........................................................................................................................22
Figure 3.1........................................................................................................................26
ABBREVIATIONS AND NOTATIONS
3D: Three-dimensional
2D: Two-dimensional
ABSTRACT
Though three-dimensional (3D) detection based on stereo images has advanced
greatly in recent years, most state-of-the-art methods still rely on anchor-based two-
dimensional (2D) detection or depth estimation to solve this problem. To better exploit
the rich structural information in stereo images, a method called SC is proposed in this
study. SC predicts the four semantic keypoints of the object's 3D bounding box and
restores the bounding box of the object in 3D space using the 2D left and right boxes,
the 3D dimensions, and the orientation.
CHAPTER I
INTRODUCTION
1.1 Overview
Object detection is a computer vision task that is now used in a wide range of consumer
applications, including surveillance and security systems, autonomous driving, mobile text
recognition, and disease diagnosis using MRI/CT scans. The objective of this senior project is
to study 3D object detection from images for autonomous driving.
Autonomous driving has the potential to significantly improve people's lives by reducing
travel time, energy consumption, and emissions. One of the most important aspects of
autonomous driving is 3D object detection. As a result, both research and industry have made
significant efforts in the last decade to develop self-driving vehicles, and 3D object detection
has received a lot of attention as one of the key enabling technologies, with deep learning
driving much of the recent progress.
One of the most important components for autonomous driving is object detection. To
ensure safe and reliable driving, autonomous vehicles rely on perception of their surroundings.
Object detection algorithms are used by this perception system to accurately determine objects in
the vehicle's vicinity, such as pedestrians, vehicles, traffic signs, and barriers. Object detectors
based on deep learning are critical for finding and localizing these objects in real time. This
chapter discusses the current state of object detectors, as well as the challenges that remain
unsolved.
1.2 Objectives
The main goal of this senior project is to study 3D Object Detection from Images for
Autonomous Driving with deep learning and computer vision, then to suggest suitable models
that can be used in real applications, and finally to create a training and testing model and run
a demo.
Chapter III: the management, planning, project schedule, and resource planning of the project.
CHAPTER II
System type: 64-bit operating system, x64-based processor
All the experiments in this study were conducted on the personal computer of
student Duy Bảo. The computer runs the latest version of Anaconda, and all the experiments
are run on the Jupyter Notebook platform, which is provided by Anaconda. The following
libraries are also required:
OpenCV
Cython
Numba
Matplotlib
SciPy
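As a quick sanity check before running the notebooks, the presence of these libraries can be verified from Python. This is an illustrative helper, not part of the thesis code; the function name and the package list (note that OpenCV's import name is `cv2`) are my own assumptions:

```python
import importlib.util

# Packages used in the experiments (import names; "cv2" is OpenCV's module name).
REQUIRED = ["cv2", "Cython", "numba", "matplotlib", "scipy"]

def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Running the script prints one line per package, so any missing dependency is caught before the experiments start.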
CHAPTER III
PROJECT MANAGEMENT
The total cost of this study is about 15 million VND, mainly for the personal computer.
The first part: finding and understanding all the methods (7 weeks).
The second part: building, fixing, and compiling the model (4 weeks).
The last part: getting the results from the model and writing the report (4 weeks).
CHAPTER IV
LITERATURE REVIEW
Autonomous vehicles (AVs) are vehicles that use technology to partially or completely
replace a human driver in navigating a vehicle from point A to point B while avoiding road
hazards and reacting to traffic conditions. The Society of Automotive Engineers (SAE) has
created a six-level classification system based on the amount of human intervention. This
classification system is used by the National Highway Traffic Safety Administration (NHTSA).
The idea of the self-driving vehicle was introduced by General Motors in 1939. The
self-driving car has undergone a complete transformation since then into an autonomous
vehicle. To drive without human involvement, an autonomous vehicle employs a mix of sensors,
artificial intelligence, radars, and cameras. This sort of vehicle is still in the development stage,
since numerous components must be considered to ensure the safety of its passengers.
The priority of AVs is recognizing objects. There are various moving as well as
stationary objects on the road like pedestrians, other vehicles, traffic lights, and more. To avoid
accidents or collisions while driving, the vehicle needs to identify various objects. Autonomous
vehicles use sensors and cameras to collect data and make 3D maps. This helps to identify and
detect objects on the road while driving and makes it safe for its passengers.
Deep neural network-based 3D object detection systems are becoming a key component
of autonomous perception: they localize physical objects in 3D space, which is necessary for
predicting future object movements. While image-based 2D object recognition and instance
segmentation have made significant advances, 3D detection remains challenging in many
application domains, including robotics and autonomous driving. Several current mainstream 3D detectors
rely on light detection and ranging (LiDAR) sensors to gather accurate 3D information, and the
use of LiDAR data is seen as critical to the success of 3D detectors. Despite this significant
success and the emergence of low-cost LiDAR research, it is important to note that LiDAR still
confronts a few problems, including its high cost, short lifespan, and restricted perception range. In
contrast, stereo cameras, which operate in a way similar to human binocular vision, are less
expensive and have greater resolutions; hence, they have received significant interest in recent
years. Research on recovering 3D information from pictures may be traced back to triangulation
and the perspective-n-point (PnP) problem. Because of
the advent of large datasets, 3D pose estimation joined the field of object detection. Machine
learning-based approaches have been widely used in practical engineering to date. However,
these approaches have a restricted capacity to search for information in 3D space without
requiring extra information; as a result, their accuracy is unlikely to match that of deep learning-
based methods.
+ Methods based on 2D features: these methods use the input RGB images; they begin by
estimating 2D positions, orientations, and dimensions, and then reconstruct the 3D locations from
the 2D features. As a result, these approaches are often known as 'result lifting'-based methods.
+ Methods based on 3D features: these methods first build 3D features from the images and then
directly estimate all elements of the 3D box from them.
(Figure: taxonomy of image-based 3D detectors — methods based on 2D features (result lifting)
versus methods based on 3D features (feature lifting, data lifting).)
Based on a variety of sensors located throughout the vehicle, autonomous cars create and
maintain a map of their surroundings. Radar sensors keep an eye on the movement of nearby
vehicles. Traffic lights are detected by video cameras, which also read road signs, track other
vehicles, and look for pedestrians. Lidar sensors measure distances, detect road edges, and
identify lane markings by bouncing light pulses off the car's surroundings.
After processing all of this sensory data, sophisticated software plots a path and sends
commands to the car's actuators, which control acceleration, braking, and steering.
The software follows traffic rules and navigates obstacles thanks to hard-coded rules,
obstacle-avoidance algorithms, predictive modeling, and object recognition.
Fully autonomous (Level 5) vehicles are being tested in a number of locations around the
world, but none are currently available to the general public. The difficulties range from
technological to environmental. In this thesis, I would like to introduce a solution to the vision
problem for AVs: 3D Object Detection from Images for Autonomous Driving.
Autonomous driving promises to improve mobility while decreasing travel time, energy
consumption, and emissions. Unsurprisingly, both academia and industry have made major
efforts in the recent decade to build self-driving automobiles, and 3D object detection has
received a lot of attention as one of the major enabling technologies.
Existing 3D object detection methods may be divided into two groups based on
whether the input data is images or LiDAR signals. Approaches that estimate 3D bounding
boxes from images alone confront a far bigger difficulty than LiDAR-based algorithms, since
recovering 3D information from 2D input data is an ill-posed problem. Despite this inherent
difficulty, image-based 3D detection has progressed rapidly in the computer vision field during
the last six years, with more than 80 publications in this domain. The task is to estimate the
3D bounding boxes of the items of interest given the RGB images and the corresponding
camera parameters. In the 3D world, each bounding box is parameterized by:
Location: [x, y, z]
Dimension: [h, w, l]
Orientation: [θ, φ, ψ]
Only the heading angle θ around the up-axis (the yaw angle) is evaluated in most works.
Figure 2: 3D bounding box
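Given this parameterization, the eight corners of the 3D bounding box follow directly from the location, dimensions, and yaw angle. The sketch below is illustrative, not thesis code; it assumes a KITTI-style camera frame (x right, y down, z forward) where (x, y, z) is the center of the box's bottom face:

```python
import math

def box3d_corners(x, y, z, h, w, l, yaw):
    """Eight corners of a 3D box from its parameterization.

    Assumed KITTI-style camera frame: x right, y down, z forward; (x, y, z)
    is the center of the box's bottom face; yaw rotates around the vertical axis.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    # Corner offsets in the box frame: length along x, height up (-y), width along z.
    xs = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]
    ys = [ 0.0,  0.0,  0.0,  0.0,  -h,   -h,   -h,   -h ]
    zs = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]
    # Rotate each offset around the vertical axis, then translate by the center.
    return [(x + c*dx + s*dz, y + dy, z - s*dx + c*dz)
            for dx, dy, dz in zip(xs, ys, zs)]
```

Projecting these eight corners with the camera matrix gives the familiar wireframe box drawn on the image, as in Figure 2.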
CHAPTER V
METHODOLOGY
The dimension estimator was trained using an L1 loss; during inference, the
predicted size offset was used to recover the size of each item.
For orientation, we estimate the car's local direction α instead of the yaw rotation θ.
The feature map uses eight scalars to indicate the orientation.
The orientation head was trained with the L1 loss. Subsequently, we utilized α and the object
position to restore the yaw angle θ.
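Recovering θ from α and the object position is a standard step in image-based 3D detection; a minimal sketch, assuming the common KITTI-style relation θ = α + atan2(x, z) (an assumption, not taken from the thesis code):

```python
import math

def local_to_global_yaw(alpha, x, z):
    """Convert the local (observation) angle alpha to the global yaw theta.

    Assumed KITTI-style relation: theta = alpha + atan2(x, z), where (x, z)
    is the object's position on the ground plane in the camera frame.
    """
    return alpha + math.atan2(x, z)
```

The intuition is that α describes how the car looks relative to the viewing ray, so adding the ray's own angle atan2(x, z) yields the heading in the world frame.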
To construct more stringent constraints, this method predicts the four vertices at the
bottom of the 3D bounding box; keypoint detection is performed solely on the left image.
Following the Gaussian kernel, we generate a ground-truth vertex heat map
$V \in [0,1]^{\frac{W_i}{R} \times \frac{H_i}{R} \times 4}$:

$$V_{xyv} = \exp\left(-\frac{(x - x_v)^2 + (y - y_v)^2}{2\sigma_v^2}\right) \quad \text{(Gaussian kernel)}$$
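The Gaussian-kernel ground truth can be generated with a few lines of code; this is a plain illustrative sketch (function name and grid layout are my own), not the thesis implementation:

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma):
    """Ground-truth heat map for one vertex at (cx, cy):
    V[y][x] = exp(-((x - cx)^2 + (y - cy)^2) / (2 * sigma^2)).
    """
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]
```

Each vertex gets its own channel of the heat map, with a peak of 1 at the ground-truth pixel that decays smoothly with distance.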
To improve the accuracy of the keypoints, we regressed the downsampling offset
$F_v^{off} \in \mathbb{R}^{\frac{W_i}{R} \times \frac{H_i}{R} \times 2}$ of each vertex.
To correlate the vertices with the center of the left image, we also regressed the distance
$F_v^{dis} \in \mathbb{R}^{\frac{W_i}{R} \times \frac{H_i}{R} \times 8}$ from the main
center to each vertex; both the vertex offset and the vertex distance are trained with the
L1 loss. We define the total loss of multitasking:
$$L = \omega_m L_m + \omega_{off} L_{off} + \omega_{dis} L_{dis} + \omega_{lr} L_{lr} + \omega_{w_r} L_{w_r} + \omega_{dim} L_{dim} + \omega_o L_o + \omega_v L_v + \omega_{off_v} L_{off_v} + \omega_{dis_v} L_{dis_v}$$

where $L_o$, $L_v$, $L_{off_v}$, and $L_{dis_v}$ represent the orientation, vertex coordinate,
vertex coordinate offset, and vertex coordinate distance losses, respectively. For the
parameter $\omega$ before each item, we adopted uncertainty weighting instead of manual tuning.
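The uncertainty-based weighting mentioned above is commonly implemented in the style of Kendall et al., where each manual weight ω is replaced by a learned log-variance term. The sketch below is an assumption about that scheme, not the thesis's actual training code:

```python
import math

def multitask_loss(losses, log_vars):
    """Uncertainty-weighted sum of task losses (Kendall-style sketch):
    total = sum(exp(-s_i) * L_i + s_i), where s_i = log(sigma_i^2)
    is a learned per-task parameter replacing a hand-tuned weight.
    """
    return sum(math.exp(-s) * L + s for L, s in zip(losses, log_vars))
```

During training the s_i are optimized together with the network, so tasks with high uncertainty are automatically down-weighted while the +s_i term prevents the weights from collapsing to zero.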
As shown in Figure 5, a set of constraint equations may be created using the sparse
geometric link between 2D and 3D, where:
w, h, l: the dimensions obtained by regression;
b: the baseline length of the stereo cameras;
x, y, z: the coordinates of the 3D bounding box's center point.
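The role of the baseline b in these constraints can be illustrated with the basic stereo triangulation relation between disparity and depth. This is a textbook sketch assuming a rectified pinhole stereo pair, not the thesis's full constraint solver:

```python
def stereo_depth(f, b, u_left, u_right):
    """Depth of a point from a rectified stereo pair (pinhole model):
    z = f * b / d, where d = u_left - u_right is the disparity in pixels,
    f the focal length in pixels, and b the baseline in meters.
    """
    d = u_left - u_right
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return f * b / d
```

This is why the left and right 2D boxes together with b constrain the 3D center (x, y, z): the horizontal shift between the two views determines the depth z directly.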
CHAPTER VI
RESULTS
5.1.2. Dataset
Figure 6.1. Sample images of the dataset
5.1. Results
[1] G. Brazil, G. Pons-Moll, X. Liu, and B. Schiele, “Kinematic 3d object detection in monocular video,” in
ECCV, 2020.