
VIETNAM NATIONAL UNIVERSITY – HO CHI MINH CITY
INTERNATIONAL UNIVERSITY
SCHOOL OF ELECTRICAL ENGINEERING

3D Object Detection from Images for


Autonomous Driving

NGUYỄN HOÀNG DUY BẢO – EEACIU17036

A SENIOR PROJECT SUBMITTED TO THE SCHOOL OF ELECTRICAL


ENGINEERING IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF BACHELOR OF ELECTRICAL ENGINEERING
HO CHI MINH CITY, VIET NAM
2022

3D Object Detection from Images for Autonomous Driving

BY

NGUYEN HOANG DUY BAO

Under the guidance of the advisor and with the approval of the committee members, this thesis has been accepted in partial fulfillment of the requirements for the degree.

Approved:

________________________________
Chairperson

_______________________________
Committee member

________________________________
Committee member

________________________________
Committee member

________________________________
Committee member

HONESTY DECLARATION

My name is Nguyễn Hoàng Duy Bảo. I declare that, apart from the acknowledged references, this thesis does not use unattributed language, ideas, or other original material from anyone, and has not been previously submitted to any other educational or research program or institution. I fully understand that any writing in this thesis that contradicts the above statement will automatically lead to rejection from the EE program at the International University – Vietnam National University Ho Chi Minh City.

Date:

Student’s Signature

(Full name)

TURNITIN DECLARATION

Name of Student: Nguyễn Hoàng Duy Bảo

Date: 08/06/2022

Advisor Signature Student Signature

ACKNOWLEDGMENT

It is with deep gratitude and appreciation that I acknowledge the professional

guidance of Dr. Nguyen Ngoc Hung. His constant encouragement and support helped me

to achieve my goal.


TABLE OF CONTENTS

HONESTY DECLARATION.............................................................................................ii

TURNITIN DECLARATION............................................................................................iii

ACKNOWLEDGMENT.....................................................................................iv

TABLE OF CONTENTS....................................................................................................v

LIST OF TABLES............................................................................................................vii

LIST OF FIGURES..........................................................................................................viii

ABBREVIATIONS AND NOTATIONS..........................................................................ix

ABSTRACT........................................................................................................................x

CHAPTER I INTRODUCTION.........................................................................................1

1.1. Overview..................................................................................................................1

1.2. Objectives.................................................................................................................2

1.3. Report Organization.................................................................................................2
CHAPTER II DESIGN SPECIFICATIONS AND STANDARDS...................................3

2.1. Hardware Description..............................................................................................3

2.2. Software Installation.................................................................................................3

CHAPTER III PROJECT MANAGEMENT......................................................................5

3.1. Budget and Cost Management Plan.........................................................................5

3.2. Project Schedule.......................................................................................................5

3.3. Resource Planning....................................................................................................5

CHAPTER IV LITERATURE REVIEW............................................................................6

CHAPTER V METHODOLOGY.......................................................................................7

CHAPTER VI RESULTS........................................................................................8

CHAPTER VII CONCLUSION AND FUTURE WORK..................................................9

REFERENCES..................................................................................................................10

APPENDICES...................................................................................................................11

LIST OF TABLES

Table 2.1: Hardware specifications...................................................................................3

LIST OF FIGURES

Figure 4.1: Levels of driving automation..........................................................................5

Figure 4.2: Taxonomy of image-based 3D object detection methods..............................7

Figure 4.3: 3D bounding box...........................................................................................10

Figure 5.1: Aerial view of the same car from different perspectives...............................11

Figure 5.2: 3D box estimation with stereo key-point constraints.....................................12

Figure 6.1: Sample images of the dataset.........................................................................14

ABBREVIATIONS AND NOTATIONS

SAE: Society of Automotive Engineers

AVs: Autonomous Vehicles

NHTSA: National Highway Traffic Safety Administration

LiDAR: Light Detection and Ranging

SC: Stereo CenterNet

3D: Three-dimensional

2D: Two-dimensional

ABSTRACT
Although three-dimensional (3D) detection based on stereo images has advanced greatly in recent years, most state-of-the-art methods still rely on anchor-based two-dimensional (2D) detection or depth estimation to solve the problem. However, the high computational cost of these methods prevents them from achieving real-time performance. This study presents Stereo CenterNet (SC), a 3D object detection method based on the structural information in stereo visual images. SC predicts the four semantic keypoints of the object's 3D bounding box in space and recovers the 3D bounding box using the 2D left and right boxes, the 3D dimensions, the orientation, and the keypoints. Compared to bounding box-based detectors, CenterNet is end-to-end differentiable, simpler, faster, and more accurate.

CHAPTER I

INTRODUCTION

1.1 Overview

Object detection is a computer vision task that is now used in a wide range of consumer applications, including surveillance and security systems, autonomous driving, mobile text recognition, and disease diagnosis from MRI/CT scans. The objective of this senior project is 3D object detection from images for autonomous driving.

Autonomous driving has the potential to significantly improve people's lives by reducing travel time, energy consumption, and emissions. As a result, both research and industry have made significant efforts over the last decade to develop self-driving vehicles. 3D object detection, one of the most important aspects of autonomous driving, has received a great deal of attention as a key enabling technology, and deep learning-based 3D object detection approaches have recently gained popularity.

Object detection is one of the most important components of autonomous driving. To ensure safe and reliable driving, autonomous vehicles rely on perception of their surroundings. The perception system uses object detection algorithms to accurately identify objects in the vehicle's vicinity, such as pedestrians, vehicles, traffic signs, and barriers, and deep learning-based object detectors are critical for finding and localizing these objects in real time. This thesis discusses the current state of object detectors, as well as the challenges that remain for their integration into autonomous vehicles.

1.2 Objectives

The main goal of this senior project is to study 3D object detection from images for autonomous driving using deep learning and computer vision, then to suggest suitable models that can be used in real applications, and finally to build a training and testing pipeline and run a demo to verify that it works properly.

1.3. Report organization

The organization of this thesis report is described below:

 Chapter I: Introduces the objectives and motivation of this thesis.

 Chapter II: States the design specifications and standards of the project.

 Chapter III: Explains the management, planning, project schedule, and resource planning involved in the thesis project.

 Chapter IV: Reviews popular methods.

 Chapter V: Describes the proposed methods applied in this project.

 Chapter VI: Discusses the results of the project.

 Chapter VII: Presents the conclusion of the project and recommendations.

CHAPTER II

DESIGN SPECIFICATIONS AND STANDARDS

2.1. Hardware description

Table 2.1: Hardware specifications

Edition: Ubuntu 16.04
Processor: Intel(R) Core(TM) i5-10400F CPU @ 2.90 GHz (up to 4.3 GHz)
Installed RAM: 16.0 GB
System type: 64-bit operating system, x64-based processor

2.2. Software installation

All the experiment in this study had been conducted from the personal computer of

student Duy Bảo. The computer is installed in the latest version of Anaconda, all the experiments

are run on the platform called Jupyter Notebook, which is provided by Anaconda. There are also

some libraries must be installed in order to make this study works:

 OpenCv

 Cython

 Numba

 Matplotlib

 Scipy
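
As a quick sanity check, a minimal sketch along the following lines can confirm that the environment is set up correctly (module names are assumed from the list above; the OpenCV package imports as cv2):

import cv2
import Cython
import numba
import matplotlib
import scipy

# Print each module's name and version to confirm the installation.
for mod in (cv2, Cython, numba, matplotlib, scipy):
    print(mod.__name__, getattr(mod, "__version__", "unknown"))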

CHAPTER III

PROJECT MANAGEMENT

3.1. Budget and Cost Management Plan

The total cost of this study is about 15 million VND, which mainly covers the personal computer.

3.2. Project Schedule

The project has three parts:

 The first part: finding and understanding all the methods (7 weeks).

 The second part: building, fixing, and compiling the model (4 weeks).

 The last part: getting the results from the model and writing the report (4 weeks).

CHAPTER IV

LITERATURE REVIEW

4.1. Autonomous Car

Autonomous vehicles (AVs) use technology to partially or completely replace a human driver in navigating from point A to point B while avoiding road hazards and reacting to traffic conditions. The Society of Automotive Engineers (SAE) has created a six-level classification system based on the amount of human intervention. This

classification system is used by the National Highway Traffic Safety Administration (NHTSA)

in the United States.

Figure 4.1: Levels of driving automation


4.1.1. Background

The idea of the self-driving vehicle was introduced by General Motors in 1939, and the self-driving car has undergone a complete transformation since then. To drive without human involvement, an autonomous vehicle employs a mix of sensors, artificial intelligence, radars, and cameras. This sort of vehicle is still in the development stage, since numerous components must be considered to ensure the safety of its passengers.

The first priority of AVs is recognizing objects. There are various moving and stationary objects on the road, such as pedestrians, other vehicles, and traffic lights. To avoid accidents or collisions while driving, the vehicle needs to identify these objects. Autonomous vehicles use sensors and cameras to collect data and build 3D maps, which helps them identify and detect objects on the road while driving and keeps passengers safe.

Deep neural network-based 3D object detection systems are becoming a key component of self-driving cars. 3D object recognition helps in understanding the geometry of physical objects in 3D space, which is necessary for predicting future object movements. While image-based 2D object detection and instance segmentation have made significant advances, 3D object detection has received less attention in the literature.

Detecting three-dimensional (3D) objects is an essential but difficult job in many domains, including robotics and autonomous driving. Several current mainstream 3D detectors rely on light detection and ranging (LiDAR) sensors to gather accurate 3D information, and the use of LiDAR data is seen as critical to the success of 3D detectors. Despite their significant success and the emergence of low-cost LiDAR research, it is important to note that LiDAR still confronts a few problems, including its high cost, short service life, and restricted perception. In contrast, stereo cameras, which operate in a way similar to human binocular vision, are less expensive and have higher resolutions; hence, they have received significant interest in academia and industry.

The fundamental theoretical understanding of 3D detection based on stereo images can be traced back to triangulation and the perspective-n-point (PnP) problem. With the advent of large datasets, 3D pose estimation joined the field of object detection, and machine learning-based approaches have been widely used in practical engineering to date. However, these approaches have a restricted capacity to search for information in 3D space without requiring extra information; as a result, their accuracy is unlikely to match that of deep learning-based methods.

Up to now, image-based 3D object detection methods for autonomous driving have been categorized as follows:
+ Methods based on 2D features: These methods take RGB images as input; they begin by estimating 2D positions, orientations, and dimensions, and then reconstruct the 3D locations from the 2D features. As a result, these approaches are often known as 'result lifting'-based methods.

+ Methods based on 3D features: The fundamental advantage of these approaches is that they first build 3D features from images and then directly estimate all elements of the 3D bounding boxes, including the 3D positions, in 3D space. These techniques are classified into two categories: feature lifting-based methods and data lifting-based methods. Figure 4.2 below illustrates this categorization.

Taxonomies
 ├─ Methods based on 3D features: feature lifting, data lifting
 └─ Methods based on 2D features: result lifting

Figure 4.2: Taxonomy of image-based 3D object detection methods

4.1.2. How do autonomous vehicles work?


Autonomous vehicles rely on sensors, actuators, complex algorithms, machine learning systems, and powerful processors to run their software.

Autonomous cars create and maintain a map of their surroundings based on a variety of sensors located throughout the vehicle. Radar sensors monitor the movement of nearby vehicles. Video cameras detect traffic lights, read road signs, track other vehicles, and look for pedestrians. Lidar sensors measure distances, detect road edges, and identify lane markings by bouncing light pulses off the car's surroundings.

After processing all of this sensory data, sophisticated software plots a path and sends

commands to the car's actuators, which control acceleration, braking, and steering.

The software follows traffic rules and navigates obstacles thanks to hard-coded rules,

obstacle avoidance algorithms, predictive modeling, and object recognition.

4.1.3. Challenges of self-driving cars

Fully autonomous (Level 5) vehicles are being tested in a number of locations around the world, but none are currently available to the general public. The difficulties range from technological to environmental. In this thesis, I would like to introduce a solution to the vision problem for AVs: 3D object detection from images for autonomous driving.

4.2. 3D Object Detection from Images for Autonomous Driving concept

Autonomous driving has the potential to transform people's lives by increasing mobility while decreasing travel time, energy consumption, and emissions. Unsurprisingly, both academia and industry have made major efforts in the recent decade to build self-driving automobiles. 3D object detection has received a great deal of attention as one of the major enabling technologies for autonomous driving, and deep learning-based 3D object detection techniques have recently gained favor.

Existing 3D object detection methods can be divided into two groups based on whether the input data are images or LiDAR signals. Approaches that estimate 3D bounding boxes from images alone face a far greater challenge than LiDAR-based algorithms, since recovering 3D information from 2D input data is an ill-posed problem. Despite this inherent complexity, image-based 3D object detection approaches have developed rapidly in the computer vision field during the last six years: more than 80 publications in this domain have appeared in top-tier conferences and journals, delivering notable improvements in detection accuracy and inference speed.

4.2.1. The overview

The purpose of image-based 3D object detection is to categorize and locate the objects of interest, given the RGB images and the corresponding camera parameters. In 3D world space, each object is represented by its category and bounding box.

The 3D bounding box is parameterized by:

 Location [x, y, z]

 Dimensions [h, w, l]

 Orientation [θ, φ, ψ]

Only the heading angle θ around the up-axis (the yaw angle) is considered in most autonomous driving scenarios, as sketched below.
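
As an illustration only, the parameterization above can be captured in a small container like the following sketch (names are illustrative, not taken from any cited method):

from dataclasses import dataclass

# Minimal container for the 3D box parameterization described above.
# Pitch and roll are assumed zero, since only yaw is considered here.
@dataclass
class Box3D:
    x: float    # location of the box center (m)
    y: float
    z: float
    h: float    # dimensions: height, width, length (m)
    w: float
    l: float
    yaw: float  # heading angle θ around the up-axis (rad)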

Figure 4.3: 3D bounding box

4.2.2. Image-based 3D object detection methods

Up to now, most existing methods are built on top of five basic ideas: TLNet, 3DOP, Stereo R-CNN, RTS3D, and Pseudo-LiDAR [1].

CHAPTER V

METHODOLOGY

5.1. Stereo CenterNet

5.1.1. Stereo 3D components


We add three regression branches to create a 3D bounding box [d, o, v], where:

 d denotes the dimensions of the object's 3D box, [L, W, H]^T (length, width, and height);

 o denotes the orientation of each object;

 v denotes the bottom vertices of the 3D box, used as keypoints.

Let [L̄, W̄, H̄]^T be the average 3D dimensions of a category over the entire dataset. Instead of regressing the true size directly, the network regresses the offset between the true value and this prior size:

off_dim = 2 · ([L, W, H]^T − [L̄, W̄, H̄]^T)

10
The dimension estimator is trained with an L1 loss; during inference, the predicted size offset is used to recover the size of each object.
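
A minimal sketch of this encoding and its inverse, assuming the per-category mean size is precomputed (function names are illustrative, not from the original implementation):

import numpy as np

def encode_dim_offset(gt_size: np.ndarray, prior: np.ndarray) -> np.ndarray:
    # off_dim = 2 * ([L, W, H] - [L̄, W̄, H̄]), as in the formula above.
    return 2.0 * (gt_size - prior)

def decode_dim_offset(pred_offset: np.ndarray, prior: np.ndarray) -> np.ndarray:
    # Inverse mapping, used at inference to recover the object size.
    return prior + 0.5 * pred_offset

def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    # The dimension branch is trained with an L1 loss.
    return float(np.abs(pred - target).mean())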

For orientation, we regress the car's local direction α instead of the yaw rotation θ; the feature map uses eight scalars to encode the orientation. The orientation branch is trained with the L1 loss. Subsequently, we use α and the object position to recover the yaw angle θ.
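
The recovery of θ from α can be sketched as follows; the relation θ = α + arctan2(x, z) is the usual KITTI-style convention, and the exact sign convention in Stereo CenterNet may differ:

import numpy as np

def alpha_to_yaw(alpha: float, x: float, z: float) -> float:
    # Global yaw = local (observation) angle + viewing-ray direction.
    theta = alpha + np.arctan2(x, z)
    # Wrap the result to [-pi, pi).
    return (theta + np.pi) % (2 * np.pi) - np.pi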

Figure 5.1: Aerial view of the same car from different perspectives: A, B, C, D

To construct more stringent constraints, this method predicts the four vertices at the bottom of the 3D bounding box, performing keypoint detection only on the left image. A ground-truth vertex heatmap V ∈ [0, 1]^(W/R × H/R × 4) is generated with a Gaussian kernel:

V_xyv = exp( −( (x − x_v)² + (y − y_v)² ) / (2σ_v²) )
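
A sketch of rendering this ground-truth heatmap for a single vertex, assuming the object-size-adaptive radius σ_v is given:

import numpy as np

def draw_vertex_heatmap(heatmap: np.ndarray, x_v: float, y_v: float,
                        sigma_v: float) -> np.ndarray:
    # heatmap has shape (H/R, W/R); in practice there is one channel per vertex.
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - x_v) ** 2 + (ys - y_v) ** 2) / (2.0 * sigma_v ** 2))
    # Element-wise maximum keeps the peaks of nearby objects intact.
    return np.maximum(heatmap, g)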

To improve the accuracy of the keypoints, we regress the downsampling offset F_v ∈ R^(W/R × H/R × 2) of each vertex. To correlate the vertices with the center of the left image, we also regress the distances from the main center to the vertices as a map in R^(W/R × H/R × 8); both the vertex offset and the vertex distance are trained with the L1 loss. We define the total multi-task loss as:

L = ω_m·L_m + ω_off·L_off + ω_dis·L_dis + ω_lr·L_lr + ω_w_r·L_w_r + ω_dim·L_dim + ω_o·L_o + ω_v·L_v + ω_off_v·L_off_v + ω_dis_v·L_dis_v

where L_o, L_v, L_off_v, and L_dis_v represent the orientation, vertex coordinate, vertex coordinate offset, and vertex coordinate distance losses, respectively. For the weight ω of each term, we adopt uncertainty-based weighting instead of manual tuning, as sketched below.
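
A minimal sketch of uncertainty-based weighting in the spirit of Kendall et al. (2018), assuming a PyTorch-style training setup (which this report does not specify): each task i gets a learnable log-variance s_i, and the total loss is Σ_i exp(−s_i)·L_i + s_i.

import torch

class UncertaintyWeighting(torch.nn.Module):
    def __init__(self, num_tasks: int):
        super().__init__()
        # One learnable log-variance per task, initialized to zero.
        self.log_vars = torch.nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        # losses: an iterable of scalar task losses.
        total = torch.zeros(())
        for s, loss in zip(self.log_vars, losses):
            total = total + torch.exp(-s) * loss + s
        return total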

5.1.2. Stereo 3D box estimation

From the results predicted in Section 5.1.1, we obtain seven key values Z = {(u_l, v_l), (u_r, v_r), u'_l, u'_r, x_v}, representing the upper-left and lower-right coordinates of the 2D box in the left image, the left and right abscissas of the 2D box in the right image, and the abscissa of the perspective key vertex.

Figure 5.2: 3D box estimation with stereo key-point constraints

As shown in Figure 5.2, a set of constraint equations can be formed from the sparse geometric relations between 2D and 3D, where w, h, and l are the regressed dimensions; b is the baseline length of the stereo camera pair; and x, y, z are the coordinates of the 3D bounding box's center point.
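
For intuition, the core stereo relation behind these constraints is plain triangulation: with focal length f (in pixels) and baseline b, a point observed at horizontal coordinates u_left and u_right has depth z = f·b/(u_left − u_right). The sketch below illustrates only this relation, not the full system that jointly fits x, y, z against all seven measurements:

def depth_from_disparity(u_left: float, u_right: float,
                         f: float, b: float) -> float:
    # Depth from stereo disparity; assumes rectified images.
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("left abscissa must exceed the right abscissa")
    return f * b / disparity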

CHAPTER VI

RESULTS

6.1. Dataset

Stereo CenterNet was evaluated on the popular KITTI 3D object detection dataset, which contains 7,481 training images and 7,518 testing images. Following 3DOP, KITTI classifies objects into three difficulty levels: Easy, Moderate, and Hard. In this method, the original 1280 × 384 images are used for training and testing.
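
For reference, each line of a standard KITTI label file can be parsed into the quantities used here with a sketch like the following (field layout follows the public KITTI object-label format):

def parse_kitti_label(line: str) -> dict:
    f = line.split()
    return {
        "type": f[0],                            # e.g., 'Car', 'Pedestrian'
        "alpha": float(f[3]),                    # observation angle α
        "bbox2d": [float(v) for v in f[4:8]],    # left, top, right, bottom
        "dim_hwl": [float(v) for v in f[8:11]],  # height, width, length (m)
        "loc_xyz": [float(v) for v in f[11:14]], # center in camera coords (m)
        "yaw": float(f[14]),                     # rotation around the up-axis
    }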

Figure 6.1: Sample images of the dataset

6.2. Results

The given input images:

The results after applying the model:

REFERENCES

[1] G. Brazil, G. Pons-Moll, X. Liu, and B. Schiele, "Kinematic 3D object detection in monocular video," in ECCV, 2020.

