You are on page 1of 58

IMPLEMENTATION OF STEREO

MATCHING ALGORITHM FOR


DEPTH CALCULATION OF
OBSTACLES
A PROJECT REPORT

Submitted by

AYISHA SIDDEEQUA A 811220104007


BENASIR S 811220104008
KIRUTHIKA M 811220104027

in partial fulfillment for the award of the degree

of

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE AND ENGINEERING

INDRA GANESAN COLLEGE OF ENGINEERING,

TRICHY

ANNA UNIVERSITY: CHENNAI 600 025

MAY 2024
ANNA UNIVERSITY: CHENNAI 600

025 BONAFIDE CERTIFICATE

Certified that this project report “IMPLEMENTATION OF STEREO


MATCHING ALGORITHM FOR DEPTH CALCULATION OF
OBSTACLES” is the bonafide work of AYISHA
SIDDEEQUA(811220104007), BENASIR S (811220104026) and
KIRUTHIKA M(811220104027) who carried out the project under my
supervision.

SIGNATURE SIGNATURE
Prof. T.Sugashini Dr. G. Balakrishnan
HEAD OF THE DEPARTMENT SUPERVISOR
Assistant Professor, Principal,
Department of Computer Science Department of Computer Science
and Engineering and Engineering
Indra Ganesan College of Indra Ganesan College of
Engineering, Manikandam, Engineering, Manikandam,
Trichy – 620012 Trichy – 620012

Submitted for Anna university project Viva-voce examination held on

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

We praise the god for the blessings in all aspects of our project work
and guiding us in the path of his light. First of all, we praise and thank “our
beloved parents” from the depth of the heart.

We express our deepest gratitude to our honourable Chairman Shri T.Ganesan


and Secretary Er.G.RAJASEKARAN for their valuable guidance and blessings

We would like to thank our dynamic Director Dr. G. BALAKRISHNAN for his
unwavering support during the entire course of this project work. We are grateful
to our Principal Dr. G. BALAKRISHNAN for providing us an excellent
environment to carry out our course successfully.

We are deeply indebted to our beloved Head of the Department,


Prof. T. SUGASHINI, our project coordinator, who make us both technically and
morally for achieving greater success in life.

We express our sincere thanks to our guide Dr. G. BALAKRISHNAN, for


being instrumental in the completion of our project with his exemplary
guidance. We thank all our teaching and nonteaching staff members of the
department and my friends for their valuable support and assistance at various
stages of our project development.
ABSTRACT

Stereo vision, enabled by stereo matching algorithms, has emerged as a powerful technique
for depth estimation in computer vision applications. With the widespread availability of
dual-camera setups in modern smart phones, stereo vision has become increasingly feasible
for real-time depth calculation of obstacles in various scenarios. Stereo matching algorithms
leverage the disparity between corresponding points in stereo image pairs to infer depth
information. Traditional algorithms like block matching, Semi-Global Block Matching
(SGBM), and graph-based methods have been adapted and optimized for smart phone
platforms to enable real-time depth calculation. These algorithms exploit the spatial and
intensity information captured by the dual cameras to generate dense depth maps, providing
valuable insights into the 3D structure of the scene. Stereo Image Acquisition Dual-camera
smart phones capture stereo image pairs simultaneously, providing the necessary input for
stereo matching algorithms. Images acquired by the left and right cameras undergo
preprocessing steps, including rectification and color correction, to ensure alignment and
consistency between the stereo images. The heart of the depth calculation process lies in the
stereo matching algorithm. Camera implementations leverage efficient versions of block
matching, SGBM, or deep learning-based methods to compute the disparity map, which
represents the depth information of the scene The implementation of stereo matching
algorithms on smart phones offers numerous advantages, including portability, accessibility,
and integration with existing smart phone functionalities. By harnessing the computational
power of modern smart phones and leveraging the capabilities of dual-camera setups, stereo
vision becomes a practical and versatile tool for depth calculation and obstacle detection in
everyday scenarios. In conclusion, the implementation of stereo matching algorithms on
smart phones facilitates real-time depth calculation of obstacles are estimated.

iv
TABLE OF CONTENTS
CHAPTER NO. TITLE PAGE NO.

ABSTARCT iv

LIST OF FIGURES vii

LIST OF ABBREVIATIONS viii

1 INTRODUCTION 1

1.1 INTRODUCTION 1

1.2 SCOPE OF THE PROJECT 2

1.3 OBJECTIVE OF THE PROJECT 2

2 LITERATURE REVIEW 3

2.1 INFERENCE FROM THE LITERATUURE 5

3 SYSTEM ANALYSIS 6

3.1 PROPOSED SYSTEM 6

4 SYSTEM SPECIFICATION 8

4.1 HARDWARE SPECIFICATION 8


4.2 SOFTWARE SPECIFICATION 8

4.3 SOFTWARE DESCRIPTION 8

5 SYSTEM DESIGN 17

5.1 SYSTEM ARCHITECTURE 17

5.2 DATA FLOW DIAGRAM 17

5.3 CLASS DIAGRAM 20


5.4 ACTIVITY DIAGRAM 20
5.5 SEQUENCE DIAGRAM 22
6 SYSTEM IMPLEMENTATION 23
6.1 STEREO MATCHING ALGORITHM 23
6.1.1 SEMI -GLOBAL BLOCK MATCHING 23
(SGBM)
6.1.2 COST CALCULATION 27
6.1.3 DISPARITY MAPPING AND 29
PORTRAIT
7 SYSTEM TESTING 31
7.1 FUNCTIONALITY TESTING 31
7.1.1 UNIT TESTING 31
7.1.2 INTEGRATION TESTING 32
7.2 NON-FUNCTIONAL TESTING 32
7.2.1 PERFORMANCE TESTING 32
7.2.2 COMPATABILITY TESTING 33
7.2.3 USABILITY TESTING 34
7.2.4 RELIABILITY TESTING 34
8 EXPERIMENTAL RESULT 36
9 CONCLUSION AND FUTURE 37
ENHANCEMENT
9.1 CONCLUSION 36

9.2 FUTURE ENHANCEMENT 37

10 APPENDICES 38

10.1 SOURCE CODE 39

10.2 SCRENSHOTS 48

REFERENCES 49
LIST OF FIGURES

FIGURE NO. FIGURE NAME PAGE NO.

3.1 PORTRAIT IMAGE 6

3.2 DISTANCE 7

5.1 SYSTEM ARCHITECTURE 17

5.2 DATAFLOW SYMBOL 18

5.2.1 DATAFLOW DIAGRAM 19

5.3 CLASS DIAGRAM 20

5.4 ACTIVITY DIAGRAM 21

5.5 SEQUENCE DIAGRAM 22

6.1.1.1 INTRINSIC AND EXTRINSIC 23


PARAMETER

6.1.1.2 CHECKERBOARD PATTERN 24

6.1.1.3 IMAGE ACQUISITION 25

6.1.1.4 CORNER DETECTION 26

6.1.1.5 RECTIFICATION 27

6.1.3.1 DIFFERENCE OF NORMAL AND 29


DISPARITY MAP

8.1 EXPERIMENTAL RESULT 36

vii
LIST OF ABBEREVIATIONS

ABBEREVIATION EXPLANATION

SGBM Semi-Global Block Matching


GPL General Public License
HTML Hyper-Text Markup Language
CSS Cascading Style Sheet
JS Java Script
API Application Programming Interface
SAD Sum of Absolute Differences
SSD Sum of Squared Differences
NCC Normalized Cross-Correlation

viii
CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

Stereo vision, a fundamental aspect of human perception, enables us to perceive


depth and three-dimensional structure by integrating visual information from two
eyes. This capability has been replicated in computer vision through stereo
matching algorithms, which analyze pairs of images captured by stereo camera
systems to estimate depth information. With the proliferation of dual-camera setups
in modern smart phones, stereo vision has found its way into the palms of our
hands, opening up a plethora of possibilities for depth calculation and obstacle
detection in real-time scenarios. Traditional depth sensing techniques, such as time-
of-flight and structured light, have been employed in specialized hardware
solutions but are often limited in terms of range, accuracy, and cost-effectiveness.
Stereo vision, on the other hand, offers a compelling alternative, leveraging the
power of computational imaging to infer depth from natural image pairs captured
by dual cameras. Stereo matching algorithms lie at the heart of stereo vision
systems, enabling the computation of depth maps from stereo image pairs. These
algorithms work by identifying corresponding points or features in the left and
right images and measuring the disparity (or depth difference) between them. Over
the years, various stereo matching techniques have been developed, ranging from
traditional block matching and Semi-Global Block Matching (SGBM) to more
advanced deep learning-based approaches.

The integration of stereo matching algorithms into smart phones brings stereo
vision capabilities to a wide range of users, democratizing access to depth
calculation and obstacle detection. By harnessing the computational power of
modern smart phones and leveraging dual-camera setups, stereo vision becomes
accessible and practical for everyday use.

1
1.2 SCOPE OF THE PROJECT

Exploration of different stereo matching algorithms suitable for implementation


on smart phones, including traditional techniques like block matching and Semi-
Global Block Matching (SGBM), as well as deep learning-based approaches.

Consideration of the hardware and software components necessary for stereo vision
on smart phones, including dual-camera setups, camera calibration, image
processing capabilities, and computational resources.

Investigation of techniques for achieving real-time depth calculation on smart


phones, focusing on algorithm optimization, parallelization, and efficient utilization
of Camera resources to ensure interactive performance.

1.3 OBJECTIVE OF THE PROJECT

It develops an efficient and accurate implementation of stereo matching


algorithms on cameras to enable real-time depth calculation of obstacles. It
Optimize stereo matching algorithms for resource-constrained camera platforms,
considering factors such as computational complexity, memory usage, and power
consumption.

Investigating techniques for camera calibration and stereo image acquisition on


smart phones to ensure precise alignment and consistency between stereo image
pairs. It explore strategies for obstacle detection and avoidance using depth
information obtained from stereo vision, including algorithmic approaches for
segmentation, object recognition, and path planning.

In conclusion, it utilizes stereo vision for obstacle detection, focusing on user


feedback, visualization of depth information, and seamless integration with other
camera functionalities.

2
CHAPTER 2

LITERATURE REVIEW

TITLE: MobiDepth: Real-Time Depth Estimation Using On-Device


Dual Cameras
Authors: Jinrui Zhang1†, Huan Yang1, Ju Ren2, Deyu Zhang1, Bangwen
He1, Ting Cao3, Yuanchun Li4, Yaoxue Zhang2, Yunxin Liu4, 2022

Real-time depth estimation is critical for the increasingly popular augmented


reality and virtual reality applications on mobile devices. Yet existing solutions are
insufficient as they require expensivedepth sensors or motion of the device, or have
a high latency. We propose MobiDepth, a real-time depth estimation system using
the widely-available on-device dual cameras. While binocular depth estimation is a
mature technique, it is challenging to realize the technique on commodity mobile
devices due to the different focal lengths and unsynchronized frame flows of the
on-device dual cameras and the heavy stereo-matching algorithm. To address the
challenges, MobiDepth integrates three novel techniques: 1) iterative field-of-view
cropping, which crops the field-of-views of the dual cameras to achieve the
equivalent focal lengths for accurate epipolar rectification; 2) heterogeneous
camera synchronization, which synchronizes the frame flows captured by the dual
cameras to avoid the displacement of moving objects across the frames in the same
pair; 3) mobile GPU-friendly stereo matching, which effectively reduces the
latency of stereo matching on a mobile GPU. We implement MobiDepth on
multiple commodity mobile devices and conduct comprehensive evaluations.
Experimental results show that MobiDepth achieves real-time depth estimation of
22 frames per second with a significantly reduced depth-estimation error compared
with the baselines. Using MobiDepth, we further build an example application of
3D pose estimation, which significantly outperforms the state-of-the-art 3D pose-

3
estimation method, reducing the pose-estimation latency and error by up to 57.1%
and 29.5%, respectively.

TITLE: A Deep Learning-Enhanced Stereo Matching Method and Its


Application to Bin Picking Problems Involving Tiny Cubic Workpieces

Authors: Masaru Yoshizawa, Kazuhiro Motegi and Yoichi Shiraishi, 2023

This paper proposes a stereo matching method enhanced by object detection and
instance segmentation results obtained through the use of a deep convolutional
neural network. Then, this method is applied to generate a picking plan to solve bin
picking problems, that is, to automatically pick up objects with random poses in a
stack using a robotic arm. The system configuration and bin picking process flow
are suggested using the proposed method, and it is applied to bin picking problems,
especially those involving tiny cubic workpieces. The picking plan is generated by
applying the Harris corner detection algorithm to the point cloud in the generated
three-dimensional map. In the experiments, two kinds of stacks consisting of cubic
workpieces with an edge length of 10 mm or 5 mm are tested for bin picking. In the
first bin picking problem, all workpieces are successfully picked up, whereas in the
second, the depths of the workpieces are obtained, but the instance segmentation
process is not completed. In future work, not only cubic workpieces but also other
arbitrarily shaped workpieces should be recognized in various types of bin picking
problems.

TITLE: Normal Assisted Stereo Depth Estimation


Authors: Uday Kusupati; Shuo Cheng; Rui Chen; Hao Su, 2020

Accurate stereo depth estimation plays a critical role in various 3D tasks in both
indoor and outdoor environments. Recently, learning-based multi-view stereo
methods have demonstrated competitive performance with limited number of

4
views. However, in challenging scenarios, especially when building cross-view
correspondences is hard, these methods still cannot produce satisfying results. In
this paper, we study how to enforce the consistency between surface normal and
depth at training time to improve the performance. We couple the learning of a
multi-view normal estimation module and a multi-view depth estimation module.
In addition, we propose a novel consistency loss to train an independent
consistency module that refines the depths from depth/normal pairs. We find that
the joint learning can improve both the prediction of normal and depth, and the
accuracy and smoothness can be further improved by enforcing the consistency.
Experiments on MVS, SUN3D, RGBD and Scenes11 demonstrate the effectiveness
of our method and state-of-the-art performance.

2.1 INFERENCE FROM THE LITERATURE

The implementation of stereo matching algorithms on camera for real-time


depth calculation of obstacles holds significant promise for various applications.
By leveraging the computational power and dual-camera setups of modern camera,
accurate depth estimation and obstacle detection can be achieved in real-time
scenarios. Optimizing stereo matching algorithms for camera platforms enables
efficient utilization of resources while ensuring high accuracy and reliability.

5
CHAPTER 3

SYSTEM ANALYSIS

3.1 PROPOSED SYSTEM

In the proposed system, the utilization of the portrait mode of the image entails
capturing images with a focus on the subject, often resulting in a blurred
background. This mode, commonly available in cameras, enhances the depth
perception of the captured scene by creating a distinct separation between the
subject and its surroundings. By leveraging the portrait mode, the system ensures
that the object of interest, typically an obstacle, stands out prominently in the
image, facilitating accurate depth calculation through stereo matching algorithms.

Fig.3.1Portrait Image

The process begins with the acquisition of images from the smart phone camera in
portrait mode. This mode utilizes depth-sensing technologies, such as dual-camera
setups or depth sensors, to capture images with a shallow depth of field,
emphasizing the subject while blurring the background. The resultant images
exhibit a clear distinction between foreground and background elements, enabling
effective depth mapping for obstacle detection. Once the portrait image is obtained,
the system proceeds to calculate the depth information using stereo matching
algorithms. Stereo matching relies on the principle of triangulation, wherein

6
corresponding points in stereo image pairs are identified and used to estimate the
distance or depth of objects in the scene. By analyzing the disparities between
corresponding points in the left and right images, the system can infer the relative
distances of objects from the camera.

The stereo matching algorithm employed in the system performs a pixel-by-pixel


comparison of the left and right images to determine correspondences. It uses
global block matching algorithm prioritizes accuracy by considering global
information across the entire image. This approach entails comparing pixel blocks
between the left and right images to determine disparities accurately. By analyzing
disparities, the algorithm discerns the relative distances of objects from the camera
with a high degree of precision.

Fig.3.2. Distance

7
CHAPTER 4

SYSTEM SPECIFICATION

4.1 HARDWARE SPECIFICATION

 Processors: Intel® Core™ i5 processor 4300M at 2.60 GHz or 2.59


GHz (1 socket, 2 cores, 2 threads per core), 8 GB of DRAM
 Disk space: 320 GB
 Operating systems: Windows® 10.

4.2 SOFTWARE SPECIFICATION

 Server Side : Python 3.7.4(64-bit)


 Client Side : HTML, CSS, JS
 IDE : Flask 1.1.1
 OS : Windows 10 64 –bit

4.3 SOFTWARE DESCRIPTION

4.3.1 Python 3.7.4


Python is a general-purpose interpreted, interactive, object-oriented, and
high-level programming language. It was created by Guido van Re sum during
1985- 1990. Like Perl, Python source code is also available under the GNU
General Public License (GPL). This tutorial gives enough understanding on Python
programming language.
Python, renowned for its simplicity and elegance, has become a cornerstone in the
world of programming. Since its inception by Guido van Rossum in 1991, Python
has garnered a loyal following due to its readable syntax and versatility. This high-
level language boasts dynamic typing, allowing for fluidity in code creation and
execution. The language's extensive standard library serves as a treasure trove of
modules and packages, catering to diverse programming needs from data
manipulation to web development.

8
One of Python's standout features is its interpretive nature, facilitating rapid
prototyping and interactive development. Developers can swiftly experiment with
code snippets or debug in real-time using Python's interactive interpreter,
enhancing productivity and fostering a seamless development experience.
Furthermore, Python's cross-platform compatibility ensures that code written on
one operating system can effortlessly run on others, offering unparalleled flexibility
for developers across different environments.
The Python ecosystem thrives on collaboration and innovation, with a vibrant
community continuously enriching it with an array of third-party libraries and
frameworks. From web frameworks like Django and Flask to data science tools
such as NumPy and pandas, Python empowers developers to tackle a myriad of
tasks with ease. Its applications span diverse domains, including web development,
data analysis, scientific computing, machine learning, automation, and more.
In terms of syntax, Python's simplicity shines through, with intuitive constructs for
variables, control flow, functions, classes, and file I/O. This simplicity not only
enhances readability but also accelerates the learning curve for beginners
transitioning into programming. Additionally, Python's support for object-oriented,
procedural, and functional programming paradigms ensures adaptability to varying
project requirements and coding styles.
Python's journey from a humble scripting language to a powerhouse in the tech
industry underscores its enduring appeal and relevance. Its user-friendly design,
coupled with an extensive ecosystem and strong community support, cements
Python's position as a top choice for developers worldwide. Whether you're a
seasoned programmer or a newcomer embarking on your coding journey, Python's
charm and utility make it an indispensable tool for turning ideas into reality.

General application programming with Python


Python's versatility lends itself well to a myriad of application domains, making
it a go-to language for developers worldwide. In web development, frameworks

9
like Django and Flask empower developers to build robust and scalable web
applications and APIs with ease. Meanwhile, Python's prowess in data analysis and
visualization is evident through libraries such as NumPy, pandas, and Matplotlib,
enabling data scientists to manipulate and visualize data effortlessly. For desktop
GUI applications, Python offers Tkinter, PyQt, and wxPython, providing
developers with tools to create intuitive user interfaces. In the realm of gaming,
Pygame facilitates the development of 2D games, while frameworks like Panda3D
and Godot Engine support 3D game development. Python's scripting capabilities
make it indispensable for automation tasks, from file manipulation to system
administration. Moreover, Python finds application in mobile app development
through frameworks like Kivy and BeeWare, enabling developers to write code
once and deploy it across multiple platforms. In the IoT space, Python's simplicity
and flexibility shine, with libraries like MicroPython and CircuitPython tailored for
microcontroller programming. Scientific computing and engineering benefit from
Python's rich ecosystem of libraries such as SciPy, SymPy, and OpenCV,
facilitating simulations, modeling, and analysis tasks. Finally, Python's readability
and simplicity make it an ideal choice for educational purposes, with many
institutions using it to teach programming fundamentals to beginners. With its
extensive ecosystem and active community support, Python continues to be a top
choice for developers across diverse application domains.

Data science and machine learning with Python


Sophisticated data analysis has become one of fastest-moving areas of IT and one
of Python’s star use cases. The vast majority of the libraries used for data science
or machine learning have Python interfaces, making the language the most popular
high-level command interface to for machine learning libraries and other numerical
algorithms.

This system services and REST Full APIs in Python

10
Python’s native libraries and third-party This system frameworks provide fast and
convenient ways to create everything from simple REST APIs in a few lines of
code to full-blown, data-driven sites. Python’s latest versions have strong support
for asynchronous operations, letting sites handle tens of thousands of requests per
second with the right libraries.

Meta programming and code generation in Python


In Python, everything in the language is an object, including Python modules and
libraries themselves. This lets Python work as a highly efficient code generator,
making it possible to write applications that manipulate their own functions and
have the kind of extensibility that would be difficult or impossible to pull off in
other languages.
Python can also be used to drive code-generation systems, such as LLVM, to
efficiently create code in other languages.
“Glue code” in Python
Python is often described as a “glue language,” meaning it can let disparate code
(typically libraries with C language interfaces) interoperate. Its use in data science
and machine learning is in this vein, but that’s just one incarnation of the general
idea. If you have applications or program domains that you would like to hitch up,
but cannot talk to each other directly, you can use Python to connect them.

Python 2 vs. Python 3


Python is available in two versions, which are different enough to trip up many
new users. Python 2.x, the older “legacy” branch, will continue to be supported
(that is, receive official updates) through 2020, and it might persist unofficially
after that. Python 3.x, the current and future incarnation of the language, has many
useful and important features not found in Python 2.x, such as new syntax features
(e.g., the “walrus operator”), better concurrency controls, and a more efficient
interpreter.

11
Python 3 adoption was sloThis systemd for the longest time by the relative lack of
third-party library support. Many Python libraries supported only Python 2, making
it difficult to switch.
But over the last couple of years, the number of libraries supporting only Python 2
has dwindled; all of the most popular libraries are now compatible with both
Python 2 and Python 3. Today, Python 3 is the best choice for new projects; there
is no reason to pick Python 2 unless you have no choice.

Python’s libraries
The success of Python rests on a rich ecosystem of first- and third-party software.
Python benefits from both a strong standard library and a generous assortment of
easily obtained and readily used libraries from third-party developers. Python has
been enriched by decades of expansion and contribution.
Python’s standard library provides modules for common programming tasks
—math, string handling, file and directory access, networking, asynchronous
operations, threading, multiprocessors management, and so on. But it also includes
modules that manage common, high-level programming tasks needed by modern
applications: reading and writing structured file formats like JSON and XML,
manipulating compressed files, working with internet protocols and data formats
(This system pages, URLs, email). Most any external code that exposes a C-
compatible foreign function interface can be accessed with Python’s types module.

The default Python distribution also provides a rudimentary, but useful, cross-
platform GUI library via Tkinter, and an embedded copy of the SQLite 3 database.
The thousands of third-party libraries, available through the Python Package
Index (PyPI), constitute the strongest showcase for Python’s popularity and
versatility.
For example:
The Beautiful Soup library provides an all-in-one toolbox for scraping HTML—

12
even tricky, broken HTML—and extracting data from it.
Requests makes working with HTTP requests at scale painless and simple.
Frameworks like Flask and Django allow rapid development of This system
services that encompass both simple and advanced use cases.
Like C#, Java, and Go, Python has garbage-collected memory management,
meaning the programmer doesn’t have to implement code to track and release
objects. Normally, garbage collection happens automatically in the background, but
if that poses a performance problem, you can trigger it manually or disable it
entirely, or declare whole regions of objects exempt from garbage collection as a
performance enhancement.
An important aspect of Python is its dynamism. Everything in the language,
including functions and modules themselves, are handled as objects. This comes at
the expense of speed (more on that later), but makes it far easier to write high-level
code.
Developers can perform complex object manipulations with only a few
instructions, and even treat parts of an application as abstractions that can be
altered if needed.
Python’s use of significant whitespace has been cited as both one of
Python’s best and worst attributes. The indentation on the second line below isn’t
just for readability; it is part of Python’s syntax.
Python interpreters will reject programs that don’t use proper indentation to
indicate control flow.
with open(‘myfile.txt’) as my_file:
file lines = [x. strips(‘\n’) for x in my_file]
Syntactical white space might cause noses to wrinkle, and some people do
reject Python for this reason. But strict indentation rules are far less obtrusive in
practice than they might seem in theory, even with the most minimal of code
editors, and the result is code that is cleaner and more readable.
Another potential turnoff, especially for those coming from languages like C

13
or Java, is how Python handles variable typing. By default, Python uses dynamic or
“duck” typing—great for quick coding, but potentially problematic in large code
bases. That said, Python has recently added support for optional compile-time type
hinting, so projects that might benefit from static typing can use it.
Using an IDE
As good as dedicated program editors can be for your programming productivity,
their utility pales into insignificance when compared to Integrated Developing
Environments (IDEs), which offer many additional features such as in-editor
debugging and program testing, as This system as function descriptions and much
more.
This system Framework
This system Application Framework or simply This system Framework represents
a collection of libraries and Modules that enables a This system application
developer to write applications without having to bother about low-level details
such as protocols, thread manage etc.

4.3.2 HTML, CSS, JS


HTML, CSS, and JavaScript collectively represent the backbone of web
development, with each technology contributing essential elements to the creation
of immersive and dynamic web experiences.
HTML (HyperText Markup Language) serves as the cornerstone of web content,
providing the structure and semantics that define the layout and organization of
information on a webpage. Through a series of tags and elements, HTML enables
developers to structure content hierarchically, from overarching sections and
headings down to individual paragraphs, lists, and media elements. Additionally,
HTML facilitates accessibility by allowing developers to incorporate semantic tags
that convey the meaning and purpose of each element to assistive technologies like
screen readers.
CSS (Cascading Style Sheets) complements HTML by adding style and

14
presentation to web content, transforming plain markup into visually compelling
and aesthetically pleasing designs. By defining stylesheets that dictate the
appearance of HTML elements, developers can control aspects such as typography,
color schemes, layout, and responsive behavior. CSS offers a powerful set of tools
for achieving pixel-perfect designs, enabling developers to create visually stunning
websites that resonate with users across various devices and screen sizes.
JavaScript elevates web development to new heights by introducing interactivity,
dynamic behavior, and client-side scripting capabilities to web pages. As a versatile
programming language, JavaScript empowers developers to create interactive
features, handle user input, manipulate the DOM (Document Object Model), and
communicate with web servers asynchronously. With JavaScript, developers can
build sophisticated web applications that respond to user actions in real-time,
deliver personalized experiences, and seamlessly integrate with backend services
and APIs.
Together, HTML, CSS, and JavaScript form a powerful trio that enables
developers to design, build, and deploy web applications that captivate audiences
and deliver exceptional user experiences. By mastering these technologies and
leveraging their unique strengths, developers can unleash their creativity and bring
their web development visions to life in a dynamic and ever-evolving digital
landscape.

4.3.3 FLASK
Flask's simplicity extends beyond its minimalist design to its ease of use and
rapid development capabilities. With Flask, developers can quickly set up a
development environment and start coding without the overhead of complex
configuration. Its lightweight nature and modular architecture enable developers to
focus on building the core functionality of their applications without being bogged
down by unnecessary features or constraints.

15
One of Flask's strengths lies in its adaptability to various project requirements and
development workflows. Whether building a small prototype, a RESTful API, or a
full-fledged web application, Flask provides the flexibility to tailor the
development process to specific needs. Developers can choose from a wide range
of extensions to integrate additional features seamlessly, ensuring scalability and
extensibility as projects evolve.
Flask's minimalist approach also extends to its learning curve, making it an
excellent choice for developers of all skill levels. Beginners can quickly grasp
Flask's concepts and start building simple web applications, while experienced
developers can leverage its flexibility to tackle more complex projects. The
abundance of tutorials, guides, and community resources further facilitates the
learning process, empowering developers to master Flask and unlock its full
potential.
Moreover, Flask's focus on simplicity does not come at the expense of performance
or reliability. Despite its lightweight footprint, Flask is robust and capable of
handling high-traffic applications with ease. Its minimal overhead and efficient
request handling ensure optimal performance, while its built-in support for testing
enables developers to write reliable and maintainable code.
In summary, Flask's minimalist design, flexibility, ease of use, and performance
make it a compelling choice for web development projects of all sizes and
complexities. Whether building a simple blog, a RESTful API for a mobile app, or
a sophisticated web application, Flask provides the tools and resources needed to
bring ideas to life quickly and efficiently, making it a favorite among developers
worldwide.

16
CHAPTER 5
SYSTEM DESIGN
5.1 SYSTEM ARCHITECTURE
The system architecture, depicted in Fig. 5.1, encompasses the following
components and functionalities: Real-time images are acquired from one or more
cameras, serving as the input source for the system. The system utilizes appropriate

Fig 5.1 SYSTEM ARCHITECTURE


APIs or libraries to access the camera hardware, enabling the capture of live video
feed from the cameras. Incorporating stereo matching algorithms, the system
analyzes pairs of images obtained from the left and right cameras to identify
disparities between corresponding pixels. Leveraging the stereo matching results,
the system extracts foreground subjects from the background, effectively
simulating portrait mode effects commonly found in modern camera systems.

5.2 DATA FLOW DIAGRAM


The Data Flow Diagram (DFD) illustrates the sequential flow of data and
processes involved in the journey from capturing an image on a camera to making
a decision based on obstacle detection. It begins with the camera, serving as the

17
central hub for data capture and processing.
The Image Capture App is responsible for acquiring images through the camera,
which are then passed to the Stereo Cameras for stereoscopic vision. The Stereo
Image Processing stage prepares these images for further analysis, leading to the
application of the Stereo Matching Algorithm. This algorithm computes a Disparity
Map, providing depth information from the stereo image pairs. Concurrently, the
system detects Portrait Mode, determining the orientation of the captured image.
The Disparity & Portrait Mode Comparison stage juxtaposes these two pieces of
information to assess if any obstacles are present. Once obstacles are detected, the
Distance Calculation for Obstacle computes the distance to these obstacles using
the disparity map. Finally, the Stop Decision Logic evaluates the calculated
distance and makes a decision whether to halt or proceed based on the presence and
proximity of obstacles. This structured flow ensures a systematic approach to
obstacle detection and decision-making in real-time scenarios, contributing to
enhanced safety and efficiency.

DATA FLOW SYMBOLS:


Symbol Description

An entity. A source of data or a


destination for data.

A process or task that is


performed by the system.

18
A data store, a place where data
is held betThis system en
processes.

A data flows.

Fig 5.2 DFD Symbol

Fig 5.2.1 Data Flow Diagram

19
5.3 CLASS DIAGRAM
The central class is the 'Stereo Matching Algorithm', which encapsulates
attribute 'Video Capture'. This attributes represent the cameras and the module
responsible for detecting obstacles. It also contains a method ‘#Get_frame()' to
extract one frame from the image. The 'Disparity Mapping to Depth' class
characterized by attributes such as 'depth' defining the distance of the image
captured, ‘focal length’ and ‘baseline’. It also contains a method
‘#Get_DisparityMap()' to acquire an disparity of the image and ‘#Get Portrait()’ to
acquire the portrait of the image. This class hierarchy facilitates the implementation
of stereo matching and obstacle detection functionalities on camera, enabling
effective obstacle detection for various applications.

Fig 5.3 CLASS DIAGRAM

5.4 ACTIVITY DIAGRAM

The Activity Diagram illustrates the sequential flow of activities starting


from the initiation of the system on the camera and culminating in the decision to
stop based on obstacle detection. It begins with the 'Start' node, representing the
system's activation on the camera. Subsequently, the 'Capture Image' activity is
initiated, wherein an image is captured using the camera. This is followed by the
'Stereo Matching' activity, where stereo matching algorithms are applied to the
captured images to generate a disparity map. The process then progresses to

20
'Disparity Map Generation', where the disparity map is constructed, providing
depth information. Next, the system proceeds to 'Detect Portrait Mode' to identify
whether the captured image is in portrait mode. Upon completion, the system
engages in 'Compare Disparity and Portrait', where the disparity map and portrait
mode information are juxtaposed to detect obstacles. Subsequently, the 'Calculate
Distance for Obstacle' activity calculates the distance to the detected obstacles
based on the disparity map. Finally, the system executes the 'Stop Decision Logic'
activity to determine whether to halt or continue based on the calculated obstacle
distances. The sequence concludes with the 'Stop' node, representing the
termination of the system's operation. This Activity Diagram provides a clear
visualization of the system's operational flow, facilitating understanding and
communication of the sequential activities involved in obstacle detection and
decision-making.

Fig 5.4 ACTIVITY DIAGRAM

21
5.5 SEQUENCE DIAGRAM
This Sequence Diagram provides a comprehensive view of the sequential
interactions between the camera and the Stereomatching. It begins with the 'Start'
message, indicating the initialization of the system on the camera. Upon system
initialization, the camera sends a 'Capture Image' message to the Stereomatching,
initiating the capture of an image. Once the image is captured, the process proceeds
Converting it into the disparity map and gives portrait mode of the captured image
and then 'Compare Disparity and Portrait' step, where the disparity map and
portrait mode information are evaluated to potentially identify obstacles.
Subsequently, the module calculates the distance to any detected obstacles
('Calculate Distance for Obstacle') and makes a decision to stop or proceed based
on obstacle detection ('Stop Decision') Finally, the 'Stop' message is sent back to
the camera, marking the conclusion of the system's operation. This Sequence
Diagram offers a clear depiction of the sequential interactions and message passing
between the camera and the stereomatching, facilitating understanding of the
system's operational flow.

Fig 5.5 Sequence Diagram

22
CHAPTER 6
SYSTEM IMPLEMENTATION
6.1 SETREO MATCHING ALGORITHM
Stereo matching is a computer vision technique used to extract depth
information from a pair of stereo images captured by cameras. It involves
identifying corresponding points in the left and right images and then computing
the disparity between these points, which represents the perceived shift in position
due to parallax. The stereo matching algorithm aims to find the best
correspondence between points in the left and right images, typically by
minimizing a cost function that measures the similarity between image patches or
pixels. In this
6.1.1 SEMI-GLOBAL BLOCK MATCHING (SGBM)
Semi-Global Block Matching (SGBM) is a stereo matching algorithm that
combines the strengths of both block matching and global optimization technique.
It aims to find the disparity map between a pair of stereo images by considering the
local similarities between image patches while also incorporating global constraints
to ensure consistency and accuracy.
Calibration:
Process of determining the Intrinsic and Extrinsic parameters of the camera
system. In this calibration process, It identifies the focal length, principle point as
Intrinsic parameters and Translation vector and Rotational matrix as Extrinsic
parameters

Fig 6.1.1.1 Intrinsic and Extrinsic parameter

23
Extrinsic calibration converts World Coordinates to Camera Coordinates. The
extrinsic parameters are called R (rotation matrix) and T (translation matrix).
Intrinsic calibration converts Camera Coordinates to Pixel Coordinates. It requires
inner values for the camera such as focal length, optical center. The intrinsic
parameter is a matrix we call K.
Calibration procedure
Step 1: Checkerboard pattern
This is the first step in the calibration procedure and is the most commonly used
calibration pattern for camera calibration. The control points of the image for this
pattern are the corners that lie inside the checkerboard. Because corners are
extremely small, they are often invariant to perspective and lens distortion.

Fig 6.1.1.2 Checkerboard Pattern


Step 2: Image Acquisition
This is the second step in the calibration procedure, Multiple images of the
chessboard pattern are captured from different viewpoints using both cameras in
the stereo pair. It's crucial to capture images from various angles and distances to
cover the entire field of view and capture different perspectives of the pattern.
In stereo vision, cameras are used to capture images from slightly different
perspectives, akin to the left and right eyes of a human observer. These images are
then processed to extract depth information by identifying corresponding points or

24
features in both images and computing the disparity between them. The greater the
disparity, the closer the object is to the cameras.

Fig 6.1.1.3 Image Acquisition


Step 3: Corner detection
corner detection is an integral part of the calibration procedure for stereo matching
algorithms on camera because it facilitates accurate camera calibration, robust
feature extraction, and reliable stereo correspondence, ultimately leading to
improved depth calculation for obstacle detection applications.

25
Fig 6.1.1.4 Corner Detection
Step 4: Calibration
Calibration in stereo matching on camera is essential for correcting lens distortions,
estimating intrinsic and extrinsic camera parameters, and rectifying stereo image
pairs. Lens distortion correction ensures accurate depth calculation by mitigating
errors caused by lens imperfections. Intrinsic parameter estimation, including focal
length and principal point, aids in accurately aligning images and computing
disparities. Extrinsic parameter estimation determines the relative orientation and
position of the cameras, crucial for precise depth calculation. Rectification
simplifies stereo matching by aligning corresponding epipolar lines, facilitating
accurate correspondence between image pairs. Calibration enhances accuracy and
precision in depth estimation, vital for reliable obstacle detection on camera. It
minimizes errors from camera imperfections and misalignments, ensuring
dependable depth calculation results. Overall, calibration optimizes stereo vision
systems for accurate depth perception, critical for various applications, including
obstacle avoidance and augmented reality on camera.

Step 5: Rectification
This is the last step in the calibration procedure, It aligns stereo image pairs,

26
making corresponding epipolar lines parallel. This simplifies feature matching,
enhancing depth estimation accuracy.
Rectification corrects perspective distortion, ensuring disparities accurately
represent depth differences. It eliminates horizontal disparities, reducing
computational complexity in matching. Rectification improves algorithm
robustness by reducing sensitivity to variations. By aligning images, rectification
enhances reliability for obstacle detection. It optimizes stereo vision systems for
accurate depth perception on camera. This process enhances real-time performance
crucial for dynamic environments. Rectification simplifies stereo matching, aiding
efficient correspondence between images. It mitigates errors caused by
misalignments and perspective variations. Ultimately, rectification in calibration
ensures precise depth estimation for obstacle detection.

27
Fig 6.1.1.5 Rectification

6.1.2 COST CALCULATION


1. Candidate Pixel Selection: For each pixel in the left image, a set of
candidate pixels is selected along the corresponding scanline in the right
image. These candidate pixels represent potential matches for the pixel in the
left image.
2. Comparison Metric: A metric is used to measure the dissimilarity or
similarity between the intensity values of the pixel in the left image and each
candidate pixel in the right image. Common metrics include the sum of
absolute differences (SAD), sum of squared differences (SSD), or
normalized cross-correlation (NCC).
Sum of Absolute Differences (SAD):

SAD=∑i,j∣Ileft(i,j)−Iright(i+d,j)∣
Sum of Squared Differences (SSD):

SSD=∑i,j(Ileft(i,j)−Iright(i+d,j))2

28
Normalized Cross-Correlation (NCC):

NCC=∑i,j(Ileft(i,j)−Iˉleft)2∑i,j(Iright(i+d,j)−Iˉright)2∑i,j(Ileft(i,j)−Iˉleft)(Iright(i+d,j)−Iˉright)
where Iˉleft and Iˉright are the mean intensities of the left and right
image patches, respectively
3. Matching Cost Calculation: The matching cost is computed for each
candidate pixel by applying the chosen metric to compare its intensity value
with that of the pixel in the left image. The lower the matching cost, the
better the match between the pixel pairs.
4. Cost Volume Construction: These matching costs are typically organized
into a cost volume, where each pixel in the left image corresponds to a
disparity range along the horizontal axis, and the value at each disparity
represents the matching cost between the left pixel and the candidate pixel in
the right image at that particular disparity.
5. Disparity Estimation: The disparity (horizontal offset) corresponding to the
minimum matching cost is determined for each pixel in the left image. This
disparity value represents the perceived shift between corresponding points
in the left and right images and is used to calculate the depth of the scene.
.
6.1.3 DISPARITY MAPPING AND PORTRAIT
DISPARITY MAP
A Disparity map is a visual representation of the perceived depth in a scene.
Computed from stereo image pairs captured by the cameras, the disparity map
illustrates the horizontal shift or disparity between corresponding points in the left
and right images. Each pixel in the disparity map corresponds to a pixel in the left
image, with its intensity value indicating the calculated disparity. Brighter regions
denote closer objects or higher disparities, while darker regions represent farther
objects or lower disparities. Depth estimation can be derived from the disparity
values, enabling accurate distance calculation to objects in the scene.

29
Fig 6.1.3.1 Difference of Normal and Disparity map

PORTRAIT IMAGE
portrait images can provide valuable depth information for obstacle detection and
scene understanding. The stereo matching algorithm processes the stereo image
pair captured in portrait orientation to calculate depth maps, which represent the
spatial distribution of obstacles in the scene. These depth maps are then used to
infer distances to objects, aiding in obstacle detection and avoidance tasks. Portrait
images offer unique challenges and opportunities for stereo matching algorithms
due to their vertical orientation, requiring specialized processing techniques to
accurately estimate depth and calculate obstacle distances effectively.

30
Fig 6.1.3.2 Portrait image

CHAPTER 7
SYSTEM TESTING
System testing is a crucial phase in software development where the entire system
is tested as a whole. It aims to validate that the integrated software and hardware
components function correctly and meet the specified requirements.

System testing verifies the behavior of the entire system in a controlled


31
environment, including interactions between software modules, hardware
components, networks, databases, and external interfaces.

The primary objectives of the system testing are to ensure that the system operates
according to the specified requirements, performs all intended functions
accurately, handles expected and unexpected inputs appropriately, and meets
quality standards.

The most common types of testing involved in the development process are:

 Functionality Test
 Non Functionality Test

7.1 FUNCTIONALITY TESTING


This testing focuses on verifying that the algorithm correctly identifies
corresponding points in the left and right images, computes the depth using stereo
disparity, and accurately represents the distance to obstacles in the scene. Test
cases would assess various scenarios, such as different obstacle shapes, sizes, and
distances, as well as challenging conditions like occlusions and textureless
regions. The goal is to ensure that the algorithm performs effectively in real-world
scenarios and meets the requirements for accurate depth estimation in stereo
vision applications.
7.1.1 UNIT TESTING
Each unit test focuses on a specific feature or aspect of the algorithm. For
example, input validation mechanisms are tested to ensure they handle valid input
stereo image pairs appropriately and gracefully manage invalid or missing data.
Similarly, tests are conducted to verify the accuracy of feature detection,
correspondence matching, and disparity calculation. Edge cases, such as zero-
disparity regions or areas with low texture, are simulated to assess the algorithm's
robustness.

7.1.2 INTEGRATION TESTING


32
This testing phase ensures that individual units, which have already been
tested in isolation during unit testing, work together seamlessly as a unified
system.
Integration tests focus on validating the communication and data flow between
different parts of the system. Integration testing also verifies the integration of
the algorithm with external dependencies or interfaces, such as image
acquisition devices or user interfaces. Tests are performed to ensure that the
algorithm can effectively interact with these external components and handle
input data from various sources. Overall, integration testing aims to validate
the interaction and integration of all components within the system, ensuring
that the stereo matching algorithm functions correctly and reliably within the
broader context of the application..

7.2 NON FUNCTIONAL TESTING


Non-functional testing focuses on assessing aspects of the system beyond its
core functionality. This testing evaluates various quality attributes crucial for the
algorithm’s success in real-world scenarios.

7.2.1 PERFORMANCE TESTING


This testing environment comprised high-performance hardware
configurations, ensuring accurate assessment of the algorithm's capabilities.
Throughout the testing process, we followed predefined methodologies, including
speed testing, scalability testing, stress testing. Each methodology was applied to
comprehensively assess different facets of the algorithm's performance under
varied conditions.

The test cases were designed to cover a broad spectrum of real-world scenarios.
During the testing process, we diligently recorded performance metrics such as
execution times, throughput, and ensuring thorough analysis of the algorithm's
behaviour.

33
The performance testing offers insights into the stereo matching algorithm's
performance, ensuring its effectiveness and reliability in real-world environment.

7.2.2 COMPATABILITY TESTING


Compatibility testing ensures that the stereo matching algorithm operates
seamlessly across a variety of environments, including different hardware
configurations, operating systems, and software setups.

We conducted tests on various hardware platforms, including desktop computers,


laptops, to verify the algorithm's compatibility. This involved testing on different
CPU architectures, GPU configurations, and memory capacities to ensure
consistent performance across diverse hardware setups.

Additionally, compatibility testing was performed on operating systems such as


Windows and Android. The algorithm was tested on different versions and editions
of these operating systems to confirm its compatibility and functionality across
various environments.

Furthermore, we assessed the algorithm's compatibility with different software


configurations, including third-party libraries, frameworks, and dependencies
commonly used in stereo vision applications. This involved integration testing with
popular libraries such as OpenCV and TensorFlow to ensure interoperability and
compatibility with industry-standard tools and frameworks.

Throughout the compatibility testing process, we verified that the algorithm


behaved consistently and reliably across all tested environments. Any compatibility
issues or discrepancies were documented, analyzed, and addressed to ensure
optimal performance and compatibility across a wide range of hardware and
software configurations.

7.2.3 USABILITY TESTING

Usability testing focuses on assessing the user-friendliness and intuitiveness of the

34
stereo matching algorithm. Our goal was to ensure that users, including developers
and end-users, can interact with the algorithm effectively and efficiently.

During usability testing, participants were provided with access to the algorithm's
user interface or configuration interface, depending on the context of usage. They
were given tasks to perform, such as adjusting algorithm parameters, inputting
stereo image pairs, and interpreting the algorithm's output, representing typical
usage scenarios.

Participants' interactions with the algorithm were observed and recorded, along
with any difficulties or challenges encountered during the testing process. The
usability testing process identified areas for improvement in the algorithm's user
interface, configuration options, and documentation.

By conducting usability testing, we ensure that the stereo matching algorithm is


user-friendly, intuitive, and easy to use, ultimately enhancing its adoption and
effectiveness in real-world applications.

7.2.4 RELIABILITY TESTING

Reliability testing is crucial for evaluating the consistency and dependability of


the stereo matching algorithm in producing accurate results across different
scenarios and usage conditions.

During reliability testing, the algorithm was subjected to prolonged periods of


operation under various environmental factors, such as changes in lighting
conditions, image quality, and scene complexity. This was done to simulate real-
world usage scenarios and identify any potential reliability issues that may arise
over time.

Throughout the reliability testing process, we monitored the algorithm's


performance metrics, including accuracy, consistency, and error rates. Any
deviations from expected behavior were carefully analyzed, and adjustments were
made to enhance the algorithm's reliability and stability.

35
By conducting thorough reliability testing, we ensure that the stereo matching
algorithm consistently delivers accurate results, providing users with confidence
in its performance and reliability in real-world applications.

CHAPTER 8
EXPERIMENTAL RESULT

36
Block Matching

SGM

Efficient Graph Standard deviation


Execution time

Deep Learning

SGBM

0 0.5 1 1.5 2 2.5 3

Fig 8.1 Experimental Result

These estimates are highly dependent on factors such as image resolution,


disparity range, hardware specifications, and algorithmic optimizations. The
analysis highlights the trade-offs between speed, accuracy, and computational
complexity among the tested stereo matching algorithms. The choice of algorithm
should be based on the specific requirements of the application, balancing
performance considerations with other factors such as accuracy and resource
constraints.

CHAPTER 9
CONCLUSION AND FUTURE ENHANCEMENT

9.1 CONCLUSION

37
In conclusion, the implementation of stereo matching algorithms for depth
calculation of obstacles represents a significant advancement in computer vision
technology. By leveraging stereo image pairs captured by cameras, these
algorithms enable accurate depth estimation and obstacle detection in real- time
scenarios. Through processes such as cost calculation, disparity mapping, and
depth estimation, stereo matching algorithms provide valuable spatial
understanding of the environment, crucial for applications ranging from augmented
reality to navigation assistance. Furthermore, the utilization of portrait images
enhances depth perception and obstacle detection capabilities, particularly in
human-centric scenes and environments with vertical structures.

As technology continues to evolve, stereo matching algorithms offer promising


opportunities for further innovation and improvement in depth calculation,
contributing to enhanced user experiences and safety across diverse applications.

9.2 FUTURE ENHANCEMENT

Future enhancements for stereo matching algorithms on smartphones for depth


calculation of obstacles are poised to revolutionize mobile vision technology. One
potential advancement involves integrating specialized hardware accelerators like
dedicated neural processing units (NPUs) or custom image signal processors (ISPs)
to optimize computations and enhance real-time performance. This would enable
smartphones to process stereo image pairs more efficiently, leading to improved
obstacle detection capabilities.

Additionally, the integration of advanced machine learning techniques, including


deep learning-based stereo matching algorithms, could further enhance accuracy
and robustness by enabling smartphones to learn complex depth cues and scene
features from large-scale datasets. Furthermore, the incorporation of additional
sensors such as depth sensors or LiDAR for multi-sensor fusion could complement

38
stereo vision, improving depth estimation accuracy, especially in challenging
environments.

Advancements in real-time 3D reconstruction techniques could enable smartphones


to generate detailed 3D models of the environment, supporting immersive
augmented reality experiences and applications requiring spatial understanding.
Lastly, optimizing stereo matching algorithms for energy efficiency through low-
power design and adaptive processing would prolong battery life while enabling
prolonged usage of depth calculation features. These future enhancements promise
to elevate obstacle detection capabilities and unlock new possibilities for
augmented reality and spatial computing applications on smartphones.

CHAPTER 10
APPENDICES
10.1 SOURCE CODE

<!DOCTYPE html>

<html lang="en">

39
<head>

<meta charset="UTF-8">

<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>Depth Estimation & Object Detection</title>

<style>

body {

margin: 0;

padding: 0;

overflow: hidden;

#container {

position: relative;

width: 100%;

height: auto;

text-align: center;

#video, #output, #depthInfo {

display: block;

margin: auto;

40
#video {

width: 320px; /* Shorter width */

height: 240px; /* Shorter height */

#output {

width: 320px; /* Shorter width */

height: 240px; /* Shorter height */

#depthInfo {

font-size: 18px;

font-weight: bold;

color: blue;

margin-top: 10px;

@media only screen and (max-width: 600px) {

#depthInfo {

font-size: 16px;

41
}

</style>

</head>

<body>

<div id="container" style="border: 2px solid red;">

<h1> Depth Calculation & Object Detection</h1>

<video id="video" autoplay></video>

<canvas id="output"></canvas>

<div id="depthInfo">Depth: </div>

</div>

<script>

const video = document.getElementById('video');

const canvas = document.getElementById('output');

const depthInfo = document.getElementById('depthInfo');

const ctx = canvas.getContext('2d');

let prevFrame = null;

let classifier;

// Request access to the camera

async function setupCamera() {

try {

42
const stream = await navigator.mediaDevices.getUserMedia({ video:

true });

video.srcObject = stream;

return new Promise(resolve => {

video.onloadedmetadata = () => {

resolve(video);

};

});

} catch (err) {

console.error('Error accessing the camera:', err);

alert('Error accessing the camera. Please make sure you have granted

camera access permissions.');

return null;

// Function to calculate depth using Global Block Matching

function calculateDepth(currentFrame, prevFrame) {

if (!prevFrame) return null; // Return null if there's no previous frame

const { width, height } = currentFrame;

const depthMap = new Float32Array(width * height); // Use Float32Array

to prevent overflow

43
// Global Block Matching Algorithm (Simplest version)

for (let i = 0; i < height; i++) {

for (let j = 0; j < width; j++) {

const idx = i * width + j;

const currPixel = currentFrame.data.slice(idx * 4, idx * 4 + 3); //

Extract RGB values

const prevPixel = prevFrame.data.slice(idx * 4, idx * 4 + 3);

const diff = Math.abs(currPixel[0] - prevPixel[0]); // Compute

difference

depthMap[idx] = diff;

return depthMap;

// Function to render depth map and display depth information

function renderDepth(depthMap) {

ctx.clearRect(0, 0, canvas.width, canvas.height);

const imageData = ctx.createImageData(canvas.width, canvas.height);

for (let i = 0; i < depthMap.length; i++) {

const pixelValue = depthMap[i];

44
const normalizedValue = Math.min(255, pixelValue); // Ensure pixel

values don't exceed 255

imageData.data[i * 4] = normalizedValue;

imageData.data[i * 4 + 1] = normalizedValue;

imageData.data[i * 4 + 2] = normalizedValue;

imageData.data[i * 4 + 3] = 255;

ctx.putImageData(imageData, 0, 0);

// Display depth information (e.g., average depth)

const averageDepth = (depthMap.reduce((acc, val) => acc + val, 0) /

depthMap.length).toFixed(2);

depthInfo.textContent = `Depth: ${averageDepth}`;

// Function to draw bounding boxes around detected objects

function drawBoundingBoxes(objects) {

ctx.strokeStyle = 'red';

ctx.lineWidth = 2;

for (let i = 0; i < objects.size(); ++i) {

const rect = objects.get(i);

const x = rect.x;

45
const y = rect.y;

const w = rect.width;

const h = rect.height;

ctx.strokeRect(x, y, w, h);

// Function to process each frame

function processFrame() {

ctx.drawImage(video, 0, 0, canvas.width, canvas.height);

const currentFrame = ctx.getImageData(0, 0, canvas.width, canvas.height);

// Calculate depth map and render

const depthMap = calculateDepth(currentFrame, prevFrame);

if (depthMap) {

renderDepth(depthMap);

prevFrame = currentFrame;

// Object detection

if (classifier) {

classifier.detect(canvas).then(objects => {

46
drawBoundingBoxes(objects);

});

requestAnimationFrame(processFrame);

// Load object detection model

function loadModel() {

cocoSsd.load().then(model => {

classifier = model;

start();

});

// Start the camera and processing

async function start() {

const videoElement = await setupCamera();

if (videoElement) {

videoElement.play();

canvas.width = videoElement.videoWidth;

canvas.height = videoElement.videoHeight;

47
requestAnimationFrame(processFrame);

// Start the application

loadModel();

</script>

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>

<script

src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>

</body>

</html>

10.2 SCRENSHOT

48
REFERENCES

49
[1]. MobiDepth: Real-Time Depth Estimation Using On-Device Dual
Cameras Jinrui Zhang1†, Huan Yang1, Ju Ren2, Deyu Zhang1, Bangwen
He1, Ting Cao3, Yuanchun Li4, Yaoxue Zhang2, Yunxin Liu4, 2022
[2]. A Deep Learning-Enhanced Stereo Matching Method and Its
Application to Bin Picking Problems Involving Tiny Cubic Workpieces
Masaru Yoshizawa, Kazuhiro Motegi and Yoichi Shiraishi, 2023

[3]. Normal Assisted Stereo Depth Estimation


Authors: Uday Kusupati; Shuo Cheng; Rui Chen; Hao Su, 2020

[4]. Stereovision on mobile devices for obstacle detection in low speed traffic
scenarios A. Trif, F. Oniga, S. Nedevschi Published in International
Conference on… 24 October 2013
[5]. Sensors (Basel). 2019 Jun; 19(12): 2771. Published online 2019 Jun 20. doi:
10.3390/s19122771 Stereo Vision Based Sensory Substitution for the
Visually Impaired
[6]. Region of interest constrained negative obstacle detection and tracking with
a stereo camera Tian Sun, Wei Pan, Yujie Wang, Yong Liu
[7]. Stereo matching algorithm based on deep learning: A survey Mohd Saad
Hamid, NurulFajar Abd Manap, Rostam Affendi Hamzah, Ahmad Fauzan
Kadmin
[8]. Stereo vision based sensory substitution for the visually impaired Simona
Caraiman, Otilia Zvoristeanu, Adrian Burlacu, Paul Herghelegiu Sensors 19
(12),2771,2019
[9]. Real-time 3d object detection and recognition using a smartphone [real-time
3d object detection and recognition using a smartphone] Jin Chen, Zhigang
ZhuProceedings of the 2nd International Conference on Image Processing
and Vision Engineering-IMPROVE, 2022

50

You might also like