HO CHI MINH CITY
INTERNATIONAL UNIVERSITY
SCHOOL OF ELECTRICAL ENGINEERING
3D Object Detection from Images for Autonomous Driving
BY
Under the guidance and approval of the committee, and approved by its members, this
thesis has been accepted in partial fulfillment of the requirements for the degree.
Approved:
________________________________
Chairperson
_______________________________
Committee member
________________________________
Committee member
________________________________
Committee member
________________________________
Committee member
HONESTY DECLARATION
My name is Nguyen Hoàng Duy Bảo. I declare that, apart from the acknowledged
references, this thesis does not use language, ideas, or other original material from
anyone else, and has not been previously submitted to any other educational or
research program or institution. I fully understand that any writing in this thesis that
contradicts the above statement will automatically lead to rejection from the EE program.
Date:
Student’s Signature
(Full name)
TURNITIN DECLARATION
Date: 08/06/2022
ACKNOWLEDGMENT
This thesis was completed under the guidance of Dr. Nguyen Ngoc Hung. His constant
encouragement and support helped me to achieve my goal.
TABLE OF CONTENTS
HONESTY DECLARATION.............................................................................................ii
TURNITIN DECLARATION............................................................................................iii
ACKNOWLEDGMENT......................................................................................iv
TABLE OF CONTENTS....................................................................................................v
LIST OF TABLES............................................................................................................vii
LIST OF FIGURES..........................................................................................................viii
ABSTRACT........................................................................................................................x
CHAPTER I INTRODUCTION.........................................................................................1
3.2. Project Schedule.......................................................................................................5
CHAPTER V METHODOLOGY.......................................................................................7
REFERENCES..................................................................................................................10
APPENDICES...................................................................................................................11
LIST OF TABLES
Table 1.1..........................................................................................................................1
Table 2.1..........................................................................................................................3
Table 2.3........................................................................................................................12
LIST OF FIGURES
Figure 1.1..........................................................................................................................1
Figure 1.2..........................................................................................................................1
Figure 2.1..........................................................................................................................6
Figure 2.2........................................................................................................................13
Figure 2.3........................................................................................................................22
Figure 3.1........................................................................................................................26
ABBREVIATIONS AND NOTATIONS
3D: Three-dimensional
2D: Two-dimensional
ABSTRACT
Though three-dimensional (3D) detection based on stereo images has advanced
greatly in recent years, most state-of-the-art methods still rely on anchor-based two-
dimensional (2D) detection or depth estimation to solve this problem. To better exploit
the rich structural information in stereo images, a method called SC is proposed in this
study. SC predicts the four semantic keypoints of the object's 3D bounding box and
restores the bounding box of the object in 3D space using the 2D left and right boxes,
the 3D dimensions, and the orientation.
CHAPTER I
INTRODUCTION
1.1 Overview
Object detection is a computer vision task that is now used in a wide range of consumer
applications, including surveillance and security systems, autonomous driving, mobile text
recognition, and disease diagnosis using MRI/CT scans. The objective of this senior project is
to study 3D object detection from images for autonomous driving.
Autonomous driving has the potential to significantly improve people's lives by reducing
travel time, energy consumption, and emissions. One of the most important aspects of
autonomous driving is 3D object detection. As a result, both research and industry have made
significant efforts in the last decade to develop self-driving vehicles, and 3D object detection
has received a lot of attention as one of the key enabling technologies, with deep learning
driving much of the recent progress.
One of the most important components for autonomous driving is object detection. To
ensure safe and reliable driving, autonomous vehicles rely on perception of their surroundings.
Object detection algorithms are used by this perception system to accurately determine objects in
the vehicle's vicinity, such as pedestrians, vehicles, traffic signs, and barriers. Object detectors
based on deep learning are critical for finding and localizing these objects in real time. This
chapter discusses the current state of object detectors, as well as the challenges that remain
unsolved.
1.2 Objectives
The main goal of this senior project is to study 3D Object Detection from Images for
Autonomous Driving with deep learning and computer vision, then to suggest suitable models
that can be used in real applications, and finally to create a training and testing model and run
a demo.
Chapter III: the management, planning, project schedule, and resource planning of the project.
CHAPTER II
System type: 64-bit operating system, x64-based processor
All the experiments in this study were conducted on the personal computer of
student Duy Bảo. The computer runs the latest version of Anaconda, and all the experiments
are run on the Jupyter Notebook platform, which is provided by Anaconda. The following
libraries are also required:
OpenCV
Cython
Numba
Matplotlib
SciPy
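As a quick sanity check before running the notebooks, the presence of these libraries can be verified from Python. This is an illustrative helper, not part of the thesis code; the function name and the package list (note that OpenCV's import name is `cv2`) are my own assumptions:

```python
import importlib.util

# Packages used in the experiments (import names; "cv2" is OpenCV's module name).
REQUIRED = ["cv2", "Cython", "numba", "matplotlib", "scipy"]

def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Running the script prints one line per package, so any missing dependency is caught before the experiments start.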
CHAPTER III
PROJECT MANAGEMENT
The total cost of this study is about 15 million VND, mainly for the personal computer.
The first part: finding and understanding all the methods (7 weeks).
The second part: building, fixing, and compiling the model (4 weeks).
The last part: getting the results from the model and writing the report (4 weeks).
CHAPTER IV
LITERATURE REVIEW
Autonomous vehicles (AVs) are vehicles that use technology to partially or completely
replace a human driver in navigating a vehicle from point A to point B while avoiding road
hazards and reacting to traffic conditions. The Society of Automotive Engineers (SAE) has
created a six-level classification system based on the amount of human intervention. This
classification system is used by the National Highway Traffic Safety Administration (NHTSA).
The idea of the self-driving vehicle was introduced by General Motors in 1939. The
self-driving car has undergone a complete transformation since then into an autonomous
vehicle. To drive without human involvement, an autonomous vehicle employs a mix of sensors,
artificial intelligence, radars, and cameras. This sort of vehicle is still in the development stage,
since numerous components must be considered to ensure the safety of its passengers.
The priority of AVs is recognizing objects. There are various moving as well as
stationary objects on the road like pedestrians, other vehicles, traffic lights, and more. To avoid
accidents or collisions while driving, the vehicle needs to identify various objects. Autonomous
vehicles use sensors and cameras to collect data and make 3D maps. This helps to identify and
detect objects on the road while driving and makes it safe for its passengers.
Deep neural network-based 3D object detection systems are becoming a key component
of autonomous perception: they localize physical objects in 3D space, which is necessary for
predicting future object movements. While image-based 2D object recognition and instance
segmentation have made significant advances, 3D detection remains challenging in many
application domains, including robotics and autonomous driving. Several current mainstream 3D detectors
rely on light detection and ranging (LiDAR) sensors to gather accurate 3D information, and the
use of LiDAR data is seen as critical to the success of 3D detectors. Despite this significant
success and the emergence of low-cost LiDAR research, it is important to note that LiDAR still
confronts a few problems, including its high cost, short lifespan, and restricted perception range. In
contrast, stereo cameras, which operate in a way similar to human binocular vision, are less
expensive and have greater resolutions; hence, they have received significant interest in recent
years. Research on recovering 3D information from pictures may be traced back to triangulation
and the perspective-n-point (PnP) problem. Because of
the advent of large datasets, 3D pose estimation joined the field of object detection. Machine
learning-based approaches have been widely used in practical engineering to date. However,
these approaches have a restricted capacity to search for information in 3D space without
requiring extra information; as a result, their accuracy is unlikely to match that of deep learning-
based methods.
+ Methods based on 2D features: these methods use the input RGB images; they begin by
estimating 2D positions, orientations, and dimensions, and then reconstruct the 3D locations from
the 2D features. As a result, these approaches are often known as 'result lifting'-based methods.
+ Methods based on 3D features: these methods first build 3D features from the images and then
directly estimate all elements of the 3D box from them.
(Figure: taxonomy of image-based 3D detectors — methods based on 2D features (result lifting)
versus methods based on 3D features (feature lifting, data lifting).)
Based on a variety of sensors located throughout the vehicle, autonomous cars create and
maintain a map of their surroundings. Radar sensors keep an eye on the movement of nearby
vehicles. Traffic lights are detected by video cameras, which also read road signs, track other
vehicles, and look for pedestrians. Lidar sensors measure distances, detect road edges, and
identify lane markings by bouncing light pulses off the car's surroundings.
After processing all of this sensory data, sophisticated software plots a path and sends
commands to the car's actuators, which control acceleration, braking, and steering.
The software follows traffic rules and navigates obstacles thanks to hard-coded rules,
obstacle-avoidance algorithms, predictive modeling, and object recognition.
Fully autonomous (Level 5) vehicles are being tested in a number of locations around the
world, but none are currently available to the general public. The difficulties range from
technological to environmental. In this thesis, I would like to introduce a solution to the vision
problem for AVs: 3D Object Detection from Images for Autonomous Driving.
Autonomous driving promises to improve mobility while decreasing travel time, energy
consumption, and emissions. Unsurprisingly, both academia and industry have made major
efforts in the recent decade to build self-driving automobiles, and 3D object detection has
received a lot of attention as one of the major enabling technologies.
Existing 3D object detection methods may be divided into two groups based on
whether the input data is images or LiDAR signals. Approaches that estimate 3D bounding
boxes from images alone confront a far bigger difficulty than LiDAR-based algorithms, since
recovering 3D information from 2D input data is an ill-posed problem. Despite this inherent
difficulty, image-based 3D detection has progressed rapidly in the computer vision field during
the last six years, with more than 80 publications in this domain. The task is to estimate the
3D bounding boxes of the items of interest given the RGB images and the corresponding
camera parameters. In the 3D world, each bounding box is parameterized by:
Location: [x, y, z]
Dimension: [h, w, l]
Orientation: [θ, φ, ψ]
Only the heading angle θ around the up-axis (the yaw angle) is evaluated in most works.
Figure 2: 3D bounding box
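Given this parameterization, the eight corners of the 3D bounding box follow directly from the location, dimensions, and yaw angle. The sketch below is illustrative, not thesis code; it assumes a KITTI-style camera frame (x right, y down, z forward) where (x, y, z) is the center of the box's bottom face:

```python
import math

def box3d_corners(x, y, z, h, w, l, yaw):
    """Eight corners of a 3D box from its parameterization.

    Assumed KITTI-style camera frame: x right, y down, z forward; (x, y, z)
    is the center of the box's bottom face; yaw rotates around the vertical axis.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    # Corner offsets in the box frame: length along x, height up (-y), width along z.
    xs = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]
    ys = [ 0.0,  0.0,  0.0,  0.0,  -h,   -h,   -h,   -h ]
    zs = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]
    # Rotate each offset around the vertical axis, then translate by the center.
    return [(x + c*dx + s*dz, y + dy, z - s*dx + c*dz)
            for dx, dy, dz in zip(xs, ys, zs)]
```

Projecting these eight corners with the camera matrix gives the familiar wireframe box drawn on the image, as in Figure 2.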
CHAPTER V
METHODOLOGY
The dimension estimator was trained using an L1 loss; during inference, the
predicted size offset was used to recover the size of each item.
For orientation, we estimate the car's local direction α instead of the yaw rotation θ.
The feature map uses eight scalars to indicate the orientation.
The orientation head was trained with the L1 loss. Subsequently, we utilized α and the object
position to restore the yaw angle θ.
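Recovering θ from α and the object position is a standard step in image-based 3D detection; a minimal sketch, assuming the common KITTI-style relation θ = α + atan2(x, z) (an assumption, not taken from the thesis code):

```python
import math

def local_to_global_yaw(alpha, x, z):
    """Convert the local (observation) angle alpha to the global yaw theta.

    Assumed KITTI-style relation: theta = alpha + atan2(x, z), where (x, z)
    is the object's position on the ground plane in the camera frame.
    """
    return alpha + math.atan2(x, z)
```

The intuition is that α describes how the car looks relative to the viewing ray, so adding the ray's own angle atan2(x, z) yields the heading in the world frame.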
To construct more stringent constraints, this method predicts the four vertices at the
bottom of the 3D bounding box; keypoint detection is performed solely on the left image.
Following the Gaussian kernel, we generate a ground-truth vertex heat map
$V \in [0,1]^{\frac{W_i}{R} \times \frac{H_i}{R} \times 4}$:

$$V_{xyv} = \exp\left(-\frac{(x - x_v)^2 + (y - y_v)^2}{2\sigma_v^2}\right) \quad \text{(Gaussian kernel)}$$
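The Gaussian-kernel ground truth can be generated with a few lines of code; this is a plain illustrative sketch (function name and grid layout are my own), not the thesis implementation:

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma):
    """Ground-truth heat map for one vertex at (cx, cy):
    V[y][x] = exp(-((x - cx)^2 + (y - cy)^2) / (2 * sigma^2)).
    """
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]
```

Each vertex gets its own channel of the heat map, with a peak of 1 at the ground-truth pixel that decays smoothly with distance.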
To improve the accuracy of the keypoints, we regressed the downsampling offset
$F_v^{off} \in \mathbb{R}^{\frac{W_i}{R} \times \frac{H_i}{R} \times 2}$ of each vertex.
To correlate the vertices with the center of the left image, we also regressed the distance
$F_v^{dis} \in \mathbb{R}^{\frac{W_i}{R} \times \frac{H_i}{R} \times 8}$ from the main
center to each vertex; both the vertex offset and the vertex distance are trained with the
L1 loss. We define the total loss of multitasking:
$$L = \omega_m L_m + \omega_{off} L_{off} + \omega_{dis} L_{dis} + \omega_{lr} L_{lr} + \omega_{w_r} L_{w_r} + \omega_{dim} L_{dim} + \omega_o L_o + \omega_v L_v + \omega_{off_v} L_{off_v} + \omega_{dis_v} L_{dis_v}$$

where $L_o$, $L_v$, $L_{off_v}$, and $L_{dis_v}$ represent the orientation, vertex coordinate,
vertex coordinate offset, and vertex coordinate distance losses, respectively. For the
parameter $\omega$ before each item, we adopted uncertainty weighting instead of manual tuning.
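The uncertainty-based weighting mentioned above is commonly implemented in the style of Kendall et al., where each manual weight ω is replaced by a learned log-variance term. The sketch below is an assumption about that scheme, not the thesis's actual training code:

```python
import math

def multitask_loss(losses, log_vars):
    """Uncertainty-weighted sum of task losses (Kendall-style sketch):
    total = sum(exp(-s_i) * L_i + s_i), where s_i = log(sigma_i^2)
    is a learned per-task parameter replacing a hand-tuned weight.
    """
    return sum(math.exp(-s) * L + s for L, s in zip(losses, log_vars))
```

During training the s_i are optimized together with the network, so tasks with high uncertainty are automatically down-weighted while the +s_i term prevents the weights from collapsing to zero.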
As shown in Figure 5, a set of constraint equations may be created using the sparse
geometric link between 2D and 3D, where:
w, h, l: the dimensions obtained by regression;
b: the baseline length of the stereo cameras;
x, y, z: the coordinates of the 3D bounding box's center point.
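The role of the baseline b in these constraints can be illustrated with the basic stereo triangulation relation between disparity and depth. This is a textbook sketch assuming a rectified pinhole stereo pair, not the thesis's full constraint solver:

```python
def stereo_depth(f, b, u_left, u_right):
    """Depth of a point from a rectified stereo pair (pinhole model):
    z = f * b / d, where d = u_left - u_right is the disparity in pixels,
    f the focal length in pixels, and b the baseline in meters.
    """
    d = u_left - u_right
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return f * b / d
```

This is why the left and right 2D boxes together with b constrain the 3D center (x, y, z): the horizontal shift between the two views determines the depth z directly.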
CHAPTER VI
RESULTS
5.1.2. Dataset
Figure 6.1. Sample images of the dataset
5.1. Results
[1] G. Brazil, G. Pons-Moll, X. Liu, and B. Schiele, “Kinematic 3d object detection in monocular video,” in
ECCV, 2020.