
SEMANTIC SEGMENTATION AND MAPPING OF A TRAFFIC SCENE

Project Members
Bhavesh S (RA1611018010028)
Jacob P. Mathew (RA1611018010148)
Jawahar Sriram K (RA1611018010158)

Under the Guidance of


Mrs. G. Madhumitha, M.E.

12/28/2021 15MH496L - Viva Voce 1


Objective (as per the 1st and 2nd reviews)
 The main objective of this project is to create a 3D map of a real-world
traffic scene using semantic segmentation.
 The 3D map is an integration of semantic segmentation and semantic
mapping, which gives the absolute distance of each object from the user.



Revised objective (after lockdown)
 The revised objective is to perform semantic segmentation and semantic
mapping separately for the traffic scenes.
 Due to the unavailability of resources, the integration of semantic
segmentation and semantic mapping could not be carried out.



Introduction
 Optical vision is an essential component for autonomous cars.
 Accurate detection of vehicles, street buildings, pedestrians, lanes, traffic
signals and road signs could assist self-driving cars to drive as safely as
humans.
 However, object detection has been a challenging task for decades, since
images of objects in real-world environments are affected by illumination,
rotation, scale, and occlusion.
 As essential information about the environment, such as the objects present,
their locations and their distances, must be known to an autonomous vehicle,
semantic segmentation is used to provide these details.
Need for the project
 To continuously detect and provide 3D information about the traffic scenario
to the user.
 To help the user navigate safely and conveniently by providing useful
information.



Literature survey
1. Title of the article: ENet: A Deep Neural Network Architecture for Real-Time Semantic
Segmentation
2. Authors: Adam Paszke, Abhishek Chaurasia, Eugenio Culurciello and Sangpil Kim
3. Journal details: IEEE Conference on Computer Vision and Pattern Recognition, 2015.
4. Inference:
 Proposed a deep neural network architecture, named Efficient Neural Network (ENet),
for faster inference and higher accuracy.
 ENet is designed especially for tasks requiring minimal delay.
 ENet is a replacement for large neural networks such as SegNet or fully
convolutional networks (FCN) with the VGG16 architecture, used to both finely
segment and spatially classify images.



Literature survey
1. Title of the article: Effective Object Detection from Traffic Camera Videos
2. Authors: Honghui Shi, Zhichao Liu, Yuchen Fan, Xinchao Wang and Thomas Huang
3. Journal details: IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced &
Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing,
Internet of People and Smart City Innovation, 2017.
4. Inference:
 This paper discussed an effective object detection algorithm for the
detection of on-road objects such as vehicles, pedestrians, sign boards and
traffic lights.
 Information on these objects forms the basis for any navigation task
that can be suggested to a driver assistance system or to an autonomous
vehicle.



Literature survey
1. Title of the article: Extrinsic Calibration between Camera and LIDAR Sensors by Matching
Multiple 3D Planes
2. Authors: Eung-su Kim and Soon-Yong Park
3. Journal details: MDPI Sensors, 2020.
4. Inference:
 A camera-LIDAR combination forming an independent sensing system
helps in recording the objects in the traffic scene with their distances
from the ego vehicle.
 Proper calibration between them can be accomplished with the
rotational and translational matrices relating their locations.
 A 360-degree field-of-view LIDAR and multi-view vision cameras are
required to record the shape and colour information of the scene from a
vehicle.
 An accurate extrinsic calibration between the cameras and LIDAR sensors
adds depth information to the shape and colour information of the camera
image.



New system

[Block diagram: TRAFFIC VIDEO feeding both SEMANTIC SEGMENTATION and
LIDAR-CAMERA FUSION, with the fusion output leading to MAPPING]


Methodology for Semantic Segmentation



Results and Discussion for Semantic Segmentation
 The task of semantic segmentation is performed with the help of a Python
script that contains the class labels, their colours and the model used for the task.
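The label and colour tables mentioned above can be sketched as follows. This is a minimal, illustrative excerpt only: the class list and colour triples below are placeholders (the real script loads an ENet model, e.g. via OpenCV's dnn module, and its own palette), but the per-pixel class-to-colour lookup works the same way.

```python
# Hypothetical label/colour tables; the real script's classes and colours differ.
CLASSES = ["unlabeled", "road", "sidewalk", "building", "vegetation", "car"]
COLOURS = [(0, 0, 0), (128, 64, 128), (244, 35, 232),
           (70, 70, 70), (107, 142, 35), (0, 0, 142)]

def colourize(class_map):
    """Convert a 2-D grid of class indices into a grid of RGB triples."""
    return [[COLOURS[c] for c in row] for row in class_map]

frame = [[1, 1, 5], [4, 4, 0]]   # toy 2x3 class map produced by the network
print(colourize(frame)[0][2])    # colour assigned to class 5 ("car")
```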



Results and Discussion
 The program window shows the total number of frames present in the traffic video. It also estimates the time taken
to read a frame and the total time required to read all the frames in the video. When the task was
performed on a test video of duration 29 minutes and 9 seconds, the total number of frames in the video was
calculated to be 38034. The time estimated to read one frame was 0.3922 seconds, and the total time
estimated to read all the frames was 14915.7194 seconds.
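The quoted figures are mutually consistent, as a quick sketch of the arithmetic shows (values taken from the slide; the real script derives them from the video file, e.g. with cv2.VideoCapture):

```python
# Reproduce the reported frame-count / time-estimation arithmetic.
duration_s = 29 * 60 + 9        # 29 min 9 s test video
total_frames = 38034            # frame count reported by the script
per_frame_s = 0.3922            # average time to read one frame

fps = total_frames / duration_s             # playback frame rate of the video
est_total_s = total_frames * per_frame_s    # estimated total reading time

print(round(fps, 2))                 # ~21.75 fps
print(round(est_total_s / 60, 1))    # ~248.6 minutes, matching "248 minutes"
```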



Results and Discussion
 After the execution of the code, the output window contains the semantically segmented
version of the given test video. The segmented scene shows the identified objects
represented in their respective colours.



Results and Discussion
 The comparison between the actual scene and the segmented scene is shown.



Methodology for Mapping

1. Define a function init_params().
2. Load the image, velodyne points and semantic labels.
3. Downsample the point cloud and the semantic labels at the same time.
4. Filter out the points that are behind us.
5. Project the in-view 3D points to the 2D image using the RT matrix.
6. Map the labels to the SemanticKITTI class labels.
7. Save the frames.
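The filtering and projection steps can be sketched with NumPy as below. This is a simplified illustration under assumed matrices: the intrinsic matrix K and the extrinsic [R|t] here are made-up example values, not the project's actual KITTI calibration.

```python
import numpy as np

def project_points(pts_xyz, RT, K):
    """Project LiDAR points to pixels: drop points behind the camera, then
    apply the (3x4) extrinsic RT and the (3x3) intrinsic K."""
    homo = np.hstack([pts_xyz, np.ones((len(pts_xyz), 1))])  # (N, 4) homogeneous
    cam = homo @ RT.T                                        # camera-frame coords
    in_front = cam[:, 2] > 0                                 # filter points behind us
    uvw = cam[in_front] @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]                            # perspective divide
    return uv, in_front

# Illustrative calibration: identity pose, simple pinhole intrinsics.
K = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
RT = np.hstack([np.eye(3), np.zeros((3, 1))])

pts = np.array([[0., 0., 10.],    # point on the optical axis, 10 m ahead
                [1., 0., -5.]])   # point behind the camera, gets filtered out
uv, mask = project_points(pts, RT, K)
print(uv, mask)   # one projected pixel at the image centre; mask [True, False]
```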



Results and Discussion
 The map is built with the help of two open-source libraries, Open3D and Pillow.
 The Open3D library deals with the 3D LIDAR data to help build the map, and the
Pillow library helps with imaging.
 For displaying the results, three pop-up windows, named depth, learn_mapping
and semantic, are created.



Results and Discussion
 The depth window shows the LIDAR points projected onto the image.
 Each colour represents a corresponding distance range.
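The distance-range colour coding can be sketched as a simple binning step. The bin edges and colour names below are illustrative placeholders, not the project's actual values.

```python
# Hypothetical distance bins for colouring projected LiDAR points.
BINS = [(0, 10, "red"), (10, 25, "yellow"), (25, 50, "green"),
        (50, float("inf"), "blue")]

def depth_colour(dist_m):
    """Return the colour of the distance range containing dist_m (metres)."""
    for lo, hi, colour in BINS:
        if lo <= dist_m < hi:
            return colour
    raise ValueError("distance must be non-negative")

print([depth_colour(d) for d in (3.0, 12.5, 60.0)])  # ['red', 'yellow', 'blue']
```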



Results and Discussion
 The second window, 'learn_mapping', shows the map created.
 In the figure shown, the road is displayed in purple, vegetation in green,
unwanted objects in black and cars in peach.



Results and Discussion
 The window ‘semantic’ shows the semantically segmented points plotted on
the image.



Queries

1. On what basis was ENet selected?

Ans: The task of semantic segmentation requires long running times and a large number of floating-point operations,
which limits the use of existing deep neural networks. So we use a newer deep neural network architecture called
Efficient Neural Network (ENet), designed especially for such tasks: it is up to 18 times faster, requires far fewer
floating-point operations and parameters, and provides comparable or better accuracy than other models. ENet
therefore serves as a better option for executing semantic segmentation.
2. (Slide 12) The time estimated to read a video of duration 29 minutes is 248 minutes. Comment on this.
Ans: The estimated total time is the time taken by the system to read all the frames present in the video. Here, reading
a video of 29 minutes duration takes 248 minutes, since the time taken to read each frame is 0.3922 seconds. The speed
of frame reading depends on the processor used in the system; we used a system with a basic CPU (no GPU present). The
time taken to read this video can be decreased if a system with a high-end GPU is used, which will be more efficient.
3. (Slide 13) The colour representation has some contradictions, e.g. the traffic-sign colour is shown for vegetation. Why?
Ans: It was a mistake made by us while preparing the legend, where we interchanged the colour and the class.
4. (Slide 18) The map shown corresponds to which scenario?
Ans: The map shown also corresponds to a traffic scenario. It shows the aerial view of the traffic scene, developed by
combining all the LIDAR points mapped for each frame of the traffic video.
5. In reality, how fast does your algorithm or system map the scenario?
Ans: Due to the unavailability of computers with a higher-end GPU and more RAM, we are not able to conclude how fast
the developed algorithm will map the traffic scenario in reality.



Hardware used

 Computer - available in the Motion Analysis lab



Conclusions
 In this project, techniques for the semantic segmentation and mapping of a traffic scene are
presented. Semantic segmentation is accomplished with the help of the Efficient Neural Network
(ENet), which has the ability to perform pixel-wise semantic segmentation in real time. The data from
the LIDAR and the camera are the two main sensory inputs used in this project. When the two are
used together, it is possible to overcome their individual limitations while still providing
perception and scene understanding. When the two sensors are used onboard with calibration,
both of them share the same coordinate system.
 Mapping of the traffic scene is developed with mainly two dependencies, Open3D and Pillow, which
help in handling the 3D data. From the map constructed, it is possible to detect roads, sidewalks,
vegetation, etc. by comparing against the colour assigned to the particular class.
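The "comparing against the colour assigned to the particular class" step amounts to inverting the colour palette. A minimal sketch, with an illustrative palette loosely following the colours mentioned earlier (purple road, green vegetation, black unlabeled, peach cars), not the project's actual values:

```python
# Hypothetical inverse palette: colour triple -> class name.
PALETTE = {(128, 0, 128): "road", (0, 128, 0): "vegetation",
           (0, 0, 0): "unlabeled", (255, 218, 185): "car"}

def class_at(map_pixels, row, col):
    """Recover the semantic class at a map location from its colour."""
    return PALETTE.get(tuple(map_pixels[row][col]), "unknown")

toy_map = [[(128, 0, 128), (0, 128, 0)],
           [(255, 218, 185), (0, 0, 0)]]
print(class_at(toy_map, 1, 0))   # the peach pixel is classified as "car"
```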



Future scope
 The integration of semantic segmentation and semantic mapping can be carried out
to generate an aerial map that shows the absolute distance between the user and the
objects present in the traffic scene, using the LIDAR data.
 A camera and a LIDAR can be mounted together on a vehicle to perform real-time
semantic segmentation and create the map of the traffic scene.
 Computers with high computational power (a good GPU and ample RAM) can be used to
perform semantic segmentation and mapping faster.



References
 Adam Paszke, Abhishek Chaurasia, Eugenio Culurciello and Sangpil Kim (2015), “ENet: A Deep
Neural Network Architecture for Real-Time Semantic Segmentation”.
 Honghui Shi, Zhichao Liu, Yuchen Fan, Xinchao Wang and Thomas Huang (2017), “Effective Object
Detection from Traffic Camera Videos”.
 Eung-su Kim and Soon-Yong Park (2020), “Extrinsic Calibration between Camera and LIDAR
Sensors by Matching Multiple 3D Planes”.
 Abdelkrim Menra, Demim Fethi, Kahina Louadj and Mustapha Hamerlain (2018), ‘Simultaneous
localization, mapping, and path planning for unmanned vehicle using optimal control’, Advances
in Mechanical Engineering, SAGE Journals, Vol. 10.
 Xujie Kang, Jing Li, Xiangtao Fan and Wenhui Wan (2019), ‘Real-Time RGB-D Simultaneous
Localization and Mapping Guided by Terrestrial LiDAR Point Cloud for Indoor 3-D
Reconstruction and Camera Pose Estimation’, MDPI Applied Sciences.



Percentage of work completed as per the original plan

S.No | Details of work as per the original plan | Work completed in %
1 | The tasks of semantic segmentation and mapping are done separately | 70%

The innovation / new idea / concept / technique that can be highlighted in the project
 The idea of integrating semantic segmentation and mapping to build an aerial
map was the new concept to be implemented, but due to computational constraints
it could not be achieved.



Software tools used

S.No | Description of work (Design/Modelling/Analysis/Simulation/Testing) | Software
1 | Program execution for semantic segmentation | SPYDER
2 | Program script for semantic segmentation | OpenCV
3 | Programming environment for mapping | SPYDER


Contribution made by the individual team members

S.No | Student name and register number | Contribution
1 | Bhavesh S (RA1611018010028) | Identified datasets, carried out the literature survey and assisted in task completion.
2 | Jacob P Mathew (RA1611018010148) | Performed the task of semantic mapping.
3 | Jawahar Sriram K (RA1611018010158) | Performed the task of semantic segmentation.


Works completed using SRM and external facilities

S.No | Facilities used (SRM/External) | Work detail with percentage
1 | Motion Analysis Lab | The laboratory PC was used to complete the task of semantic segmentation. Work percentage: 40%.


Technical challenges faced during this project

S.No | Challenges | Action taken to overcome this challenge
1 | Computational power | A computer with a high-computational-power GPU, present in the Motion Analysis lab, was used.
2 | Downloading large datasets | The internet facility provided on the campus was used.


Due to the unexpected lockdown, modifications / deviations made to
shape the project work to completion
 Our proposed system was to integrate the semantic segmentation and
mapping tasks to build a 3D map, but due to the lack of
computational power we executed the two tasks separately.



Thank You

