
HKBK COLLEGE OF ENGINEERING

DEPARTMENT OF EC

PROJECT WORK-PHASE I
PROJECT SYNOPSIS GUIDELINES
This is to inform all final-year students that the synopsis report is to be
submitted in 3 sets during the PROJECT WORK-PHASE I presentations and should
consist of the following headings:
1. Project title page – format is enclosed
2. Certificate – format is enclosed
3. Table of contents with page numbers
4. Abstract (heading centre aligned), single page
5. From the next new page onward: Introduction (heading left aligned), Problem
Identification, Relevant Literature Survey, Objectives, Methodology [viz. block diagram,
hardware & software requirements], Expected Result
6. On a new page: References [books, journals, IEEE papers] in IEEE style

The font type is Times New Roman; the heading font size is 14 and the rest of the text is
font size 12, with 1.5-line spacing. Page margins: 1.25″ left; 1″ top, right and bottom.

Only the Abstract heading must be centre aligned. The remaining headings must be
appropriately numbered and left aligned.

The synopsis report, duly signed by the respective guide, is to be submitted to the
project coordinators during the presentation.

The synopsis report should be bound between transparent sheets (OHP sheets)
with black/brown tape; stick files will not be accepted.

The necessary literature/base papers and data sheets for the project should be
produced during the preliminary presentation.

Project Co-ordinators
Prof. Shaik Imam
Prof. Mohamed Jebran P
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
BELAGAVI

A
PROJECT SYNOPSIS
ON
DEPTH PREDICTION USING MACHINE LEARNING
TECHNIQUES
Submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Engineering
in
Electronics and Communication Engineering
For the academic year
2020-2021

Submitted by

SYED TAUHEED AHMED (1HK16EC111)
SYED DAWOOD SHAH (1HK16EC107)
SALMAN SALEEM (1HK16EC088)
RIYAZ AHMED (1HK16EC076)
Under the Guidance of
Dr. SANJANA PRASAD
Designation, Dept. of E&C Engg.
H.K.B.K.C.E, Bangalore

DEPARTMENT OF
ELECTRONICS AND COMMUNICATION ENGINEERING
HKBK COLLEGE OF ENGINEERING
#22/1 Nagawara, Bangalore-560045
H.K.B.K COLLEGE OF ENGINEERING
S.No.22/1, Nagawara, Bengaluru -560045
Department of Electronics and Communication Engineering

Certificate

Certified that the Project Work Phase - I entitled DEPTH PREDICTION USING MACHINE
LEARNING TECHNIQUES is a bonafide work carried out by SYED TAUHEED
AHMED (1HK16EC111), SYED DAWOOD SHAH (1HK16EC107), SALMAN SALEEM (1HK16EC088)
and RIYAZ AHMED (1HK16EC076) in partial fulfillment for the award of Bachelor of Engineering in
Electronics and Communication Engineering of the Visvesvaraya Technological University, Belagavi during
the year 2020-21. It is certified that all corrections/suggestions indicated for the Internal Assessment have
been incorporated in the report deposited in the departmental library. The Synopsis report has been approved
as it satisfies the academic requirements in respect of Project Work Phase - I prescribed for Bachelor of
Engineering Degree.

Signature of the Guide Signature of the HOD

Signature of the Internal Examiners Signature of the Project Coordinator

1.

2.
3.

Table of Contents
Abstract ................................................................. i
Introduction ............................................................. 1
Problem Statement ........................................................ #
Literature Survey ........................................................ #
Objectives ............................................................... #
Methodology .............................................................. #
Expected Result .......................................................... #
References ............................................................... #
Abstract
Learning to predict depth from RGB inputs is a challenging task for both indoor and
outdoor robot navigation. In this work we address unsupervised learning of depth, where
supervision is provided by monocular videos, as cameras are the cheapest, least restrictive
and most ubiquitous sensor for robotics. Depth estimation from an image has applications
such as autonomous driving, rescue and disaster-relief drones, and computer vision. Several
earlier works by various researchers employed LiDAR and SONAR for depth estimation in
robots. LiDAR technology has drawbacks: it is expensive, bulky, and incompatible with
mobile robotic platforms. SONAR technology has issues such as low resolution and a lack
of object detection. This project will be implemented using TensorFlow, an open-source
Machine Learning library. The model uses a Convolutional Neural Network and is trained
with unsupervised learning; the programming language used is Python. The datasets used
for the project are KITTI and NYU; these datasets are preferred because of their
accessibility. Once the model is trained to acceptable prediction accuracy, it can be loaded
onto any hardware that supports machine learning instructions and executed on it. Most
supervised methods for learning depth require carefully calibrated setups. This severely
limits the amount and variety of training data they can use, which is why unsupervised
techniques are often applied.
Introduction
This project proposes a solution which overcomes the limitations of previous technologies
and is also accurate, compact and low cost. We use Machine Learning techniques to
create a model which can predict depth from simple RGB images. The proposed method
performs unsupervised learning of depth from monocular (single-camera) video. Cameras
are by far the best understood and most ubiquitous sensor available to us. High-quality
cameras are inexpensive and easy to deploy. The ability to train on arbitrary monocular
video opens up virtually infinite amounts of training data, without sensing artifacts or
inter-sensor calibration issues.

Problem Statement
Many previous works employed LiDAR and SONAR for depth estimation in robots.
LiDAR technology has drawbacks: it is expensive, bulky, and incompatible with mobile
robotic platforms. SONAR technology has issues such as low resolution and a lack of
object detection.
LiDAR provides great detail of the surroundings, but its accuracy degrades in bad
weather. The LiDAR setup is bulky and cannot be used on mobile robotic platforms like
drones and in other applications where the required form factor is small, and one of its
major drawbacks is that it is expensive. SONAR (ultrasound) technology only provides a
distance estimate: it does not have good resolution due to the wide beam used, the slow
speed of sound reduces the sensing rate, and objects with smooth surfaces or
sound-dampening structures appear invisible.

Literature Survey
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised
Learning from Monocular Videos
Authors: Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova.

Abstract:

Learning to predict scene depth from RGB inputs is a challenging task both for indoor and
outdoor robot navigation. In this work we address unsupervised learning of scene depth and
robot ego-motion where supervision is provided by monocular videos, as cameras are the
cheapest, least restrictive and most ubiquitous sensor for robotics. Previous work in
unsupervised image-to-depth learning has established strong baselines in the domain. We
propose a novel approach which produces higher quality results, is able to model moving
objects and is shown to transfer across data domains, e.g., from outdoors to indoor scenes.
The main idea is to introduce geometric structure in the learning process, by modeling the
scene and the individual objects; camera ego-motion and object motions are learned from
monocular videos as input. Furthermore, an online refinement method is introduced to
adapt learning on the fly to unknown domains. The proposed approach outperforms all
state-of-the-art approaches, including those that handle motion e.g., through learned flow.
Our results are comparable in quality to the ones which used stereo as supervision and
significantly improve depth prediction on scenes and datasets which contain a lot of object
motion. The approach is of practical relevance, as it allows transfer across environments.

Unsupervised Monocular Depth Estimation CNN Robust to Training Data Diversity

Authors: Valery Anisimovskiy, Andrey Shcherbinin, Sergey Turko, Ilya Kurilin.

Abstract:

We present an unsupervised learning method for the task of monocular depth estimation. In
common with many recent works, we leverage convolutional neural network (CNN)
training on stereo pair images with view reconstruction as a self-supervisory signal. In
contrast to the previous work, we employ a stereo camera parameters estimation network to
make our model robust to training data diversity. Another of our contributions is the
introduction of self-supervision correction. With it we address one of the serious drawbacks
of the stereo pair self-supervision in the unsupervised monocular depth estimation
approach: at later training stages, self-supervision by view reconstruction fails to improve
predicted depth map due to various ambiguities in the input images. We mitigate this
problem by making depth estimation CNN produce both depth map and correction map
used to modify the input stereo pair images in the areas of ambiguity. Our contributions
allow us to achieve state-of-the-art results on the KITTI driving dataset (among
unsupervised methods) by training our model on hybrid city driving dataset.

Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D


Geometric Constraints

Authors: Reza Mahjourian, Martin Wicke, Anelia Angelova.


Abstract: We present a novel approach for unsupervised learning of depth and ego-motion
from monocular video. Unsupervised learning removes the need for separate supervisory
signals (depth or ego-motion ground truth, or multi-view video). Prior work in
unsupervised depth learning uses pixel-wise or gradient-based losses, which only consider
pixels in small local neighborhoods. Our main contribution is to explicitly consider the
inferred 3D geometry of the whole scene, and enforce consistency of the estimated 3D
point clouds and ego-motion across consecutive frames. This is a challenging task and is
solved by a novel (approximate) backpropagation algorithm for aligning 3D structures.

Objectives
 To estimate depth without using any dedicated depth sensors (e.g., LiDAR or SONAR).
 To estimate depth with a single camera.
 To estimate depth maps of a series of images from a camera in real time.

Methodology
We implement this project using convolutional neural networks (CNNs). We use CNNs
instead of regular fully connected neural networks because, in a fully connected network,
each neuron in one layer is connected to every neuron in the next layer. Because of this
full connectivity, fully connected networks are not suitable for applications like computer
vision: the dense connections give the network a very large number of parameters even
for a simple input.
Another major drawback of a fully connected network is that it does not take into account
the spatial structure of the data, treating input pixels which are far apart in the same way
as pixels that are close together, thus losing the locality of reference in the image.
CNNs overcome these drawbacks because their neurons are not fully connected: the input
image is divided into segments, and each segment is connected only to a few neurons in
the next layer. This preserves the spatial structure, and the number of network parameters
remains manageable.
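The parameter explosion described above can be checked with simple arithmetic. The sketch below compares the weight count of one fully connected layer against one 3×3 convolutional layer; the 128×416 image size and the layer widths are illustrative assumptions (chosen to resemble a KITTI-style input), not the project's actual network configuration:

```python
# Compare parameter counts: fully connected vs. convolutional layer
# on an assumed 128 x 416 RGB input (illustrative sizes only).
h, w, c = 128, 416, 3          # image height, width, channels
units = 64                     # neurons in the fully connected layer
filters, k = 64, 3             # conv filters and kernel size

# Fully connected: every input pixel/channel connects to every unit.
dense_params = (h * w * c) * units + units       # weights + biases

# Convolution: one small kernel per filter, shared across all
# spatial positions, so the count is independent of image size.
conv_params = (k * k * c) * filters + filters    # weights + biases

print(dense_params)  # -> 10223680
print(conv_params)   # -> 1792
```

A single fully connected layer here needs over ten million weights, while the convolutional layer needs fewer than two thousand, which is why CNNs scale to image inputs.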
Block Diagram:
The convolutional neural network is trained using unsupervised learning. Unsupervised
learning is preferred because the network learns the different features present in the
training data without requiring any definitions or labeling, which greatly reduces the time
and effort required.
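Concretely, the self-supervision signal in monocular depth training comes from view reconstruction: a neighbouring video frame is warped into the current view using the predicted depth and camera motion, and the training loss is the photometric difference between the warped frame and the real one. The helper below is a minimal sketch of that loss, assuming the warping step has already been done; the function name and the plain L1 form are illustrative choices, not the project's exact loss:

```python
import numpy as np

def photometric_loss(target_frame, warped_frame):
    """Mean absolute (L1) photometric difference between the real frame
    and the frame reconstructed by warping a neighbouring frame with
    the predicted depth and ego-motion. A low loss means the predicted
    depth explained the scene geometry well -- no labels needed."""
    t = target_frame.astype(np.float32)
    w = warped_frame.astype(np.float32)
    return float(np.mean(np.abs(t - w)))

# A perfect reconstruction gives zero loss.
frame = np.random.rand(128, 416, 3)
print(photometric_loss(frame, frame))  # -> 0.0
```

Published systems typically combine this L1 term with structural-similarity (SSIM) and smoothness terms, but the principle is the same: the video itself supplies the training signal.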
This project will be implemented using TensorFlow, an open-source Machine Learning
library. The model uses a convolutional neural network and is trained with unsupervised
learning; the programming language used is Python.
The model is trained on datasets that are available on the internet. The datasets used for
the project are KITTI and NYU; these are preferred because of their free accessibility.
Once the model is trained to acceptable prediction accuracy, it can be loaded onto any
hardware that supports machine learning instructions (AVX) and executed on it.
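At inference time, depth networks of this kind commonly output a disparity (inverse-depth) map with values in [0, 1], which is rescaled into depth before visualisation. The sketch below shows that post-processing step; the min/max depth constants are illustrative assumptions (real pipelines tune them per dataset), not values taken from this project:

```python
def disparity_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Map a network's sigmoid disparity output (a value in [0, 1])
    to depth. Depth-range constants are illustrative assumptions."""
    min_disp = 1.0 / max_depth   # disparity of the farthest point
    max_disp = 1.0 / min_depth   # disparity of the nearest point
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return 1.0 / scaled_disp

# disp = 0 maps to the farthest depth, disp = 1 to the nearest.
print(disparity_to_depth(0.0))  # -> 100.0
print(disparity_to_depth(1.0))  # -> 0.1
```

In practice this function would be applied elementwise to the whole predicted disparity map (e.g. via NumPy broadcasting) and the result rendered as the colour-coded depth image shown under Expected Result.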

Expected Result

Estimated Depth map (bottom image) from the input RGB image (top image).

References
• M. Bloesch, J. Czarnowski, R. Clark, S. Leutenegger, and A. J. Davison, "CodeSLAM:
Learning a compact, optimisable representation for dense visual SLAM," CVPR, 2018.
• Z. Yang, P. Wang, Y. Wang, W. Xu, and R. Nevatia, "LEGO: Learning edge with
geometry all at once by watching videos," CVPR, 2018.
• T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, "Unsupervised learning of depth and
ego-motion from video," CVPR, 2017.
• R. Garg, G. Carneiro, and I. Reid, "Unsupervised CNN for single view depth
estimation: Geometry to the rescue," ECCV, 2016.
• S. Vijayanarasimhan, S. Ricco, C. Schmid, R. Sukthankar, and K. Fragkiadaki,
"SfM-Net: Learning of structure and motion from video," 2017.
• K. Karsch, C. Liu, and S. B. Kang, "Depth extraction from video using nonparametric
sampling," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.
