OBJECT DETECTION AND RECOGNITION USING
TENSORFLOW FOR BLIND PEOPLE
Submitted in partial fulfillment for the award of the degree of
Bachelor of Technology
in
COMPUTER SCIENCE AND ENGINEERING
Submitted by
S P JESWANTH 184E1A0523
BILLU NAGAMANI 184E1A0531
T ANUJA CHOWDARY 184E1A0502
MADHAN KAVERIPAKAM 184E1A0527
NAMA CHANDU 184E1A0509
Certificate
This is to certify that the Project entitled
“OBJECT DETECTION AND RECOGNITION USING TENSORFLOW
FOR BLIND PEOPLE”
that is being submitted by
S P JESWANTH 184E1A0523
BILLU NAGAMANI 184E1A0531
T ANUJA CHOWDARY 184E1A0502
MADHAN KAVERIPAKAM 184E1A0527
NAMA CHANDU 184E1A0509
in partial fulfillment of the requirements for the award of BACHELOR OF
TECHNOLOGY in Computer Science And Engineering to JNTUA,
Ananthapuramu. This project Phase – II (18CS0534) work or part thereof has
not been submitted to any other University or Institute for the award of any
degree.
The satisfaction that accompanies the successful completion of any task would be
incomplete without the mention of the people who made it possible, without whose
guidance, encouragement and help this venture would not have been a success. The
acknowledgement transcends the reality of formality when we express deep
gratitude and respect to all those people behind the screen who guided, inspired and
helped us to complete our project work in time and up to the standards.
We owe our gratitude to our Honorable Chairman Dr. K. ASHOK RAJU, Ph.D.,
and also deep sense of gratitude to our honorable principal Dr. M. JANARDHANA
RAJU, M.E, Ph.D., for having provided all the facilities and support in completing our
project successfully.
We owe our deep sense of gratitude to our Head of the Department
Dr. M.A. MANIVASAGAM, M.E, Ph.D., for his valuable guidance and constant
encouragement given to us during this work.
We express our deep sense of gratitude to our project coordinator
Mr. R. PURUSHOTHAMAN, M.E, (Ph.D), who evinced keen interest in our efforts and
provided his valuable guidance throughout our project work.
We express our deep sense of gratitude to our project guide Ms. P. DEVIKA,
M.Tech, for her guidance and supervision at all levels of our project work. We are indebted
to her for her valuable suggestions and sustained help in completing our project work.
We are also thankful to all the staff members of the CSE Department for helping us
complete this project work by giving valuable suggestions.
Last, but not least, we express our sincere thanks to all our friends who have
supported us in the accomplishment of this project.
TABLE OF CONTENTS
1. INTRODUCTION
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
4. SYSTEM REQUIREMENTS
5. SYSTEM DEVELOPMENT
   5.1 SYSTEM DESIGN
6. RESULTS
7. CONCLUSION AND FUTURE ENHANCEMENT
   7.1 Conclusion
   7.2 Future Enhancement
REFERENCES
ABSTRACT
Visual impairment or blindness is one of the top ten disabilities in
humans, and unfortunately, India has the world's largest visually impaired population.
We are therefore creating a framework to assist the visually impaired with object detection
and recognition, so that they can navigate independently and be aware of their
surroundings. In this system, the image captured by the camera is taken as input. The
SSD architecture, based on deep neural networks, is used here for precise object detection.
This input is given to the software and processed against the COCO dataset, which is
predefined in the TensorFlow library and used as the training dataset for the system; in
general, this dataset contains features for ninety percent of real-world objects. The
distance is calculated by depth estimation, and by using voice assistance packages the
software produces the output in the form of audio.
LIST OF FIGURES
Figure No. Figure Name
5.17 Camera
5.18 Monitor
5.20 SD Card
5.21 Buzzer
LIST OF TABLES
CHAPTER 1
INTRODUCTION
1.1 DOMAIN DESCRIPTION
Object detection is a computer technology related to computer
vision and image processing that deals with detecting instances of semantic objects
of a certain class (such as humans, buildings, or cars) in digital images and
videos. Well-researched domains of object detection include face
detection and pedestrian detection. Object detection has applications in many areas
of computer vision, including image retrieval and video surveillance.
It is widely used in computer vision tasks such as image annotation, activity
recognition, face detection, face recognition, video object co-segmentation. It is also
used in tracking objects, for example tracking a ball during a football
match, tracking movement of a cricket bat, or tracking a person in a video.
Every object class has its own special features that help in classifying the
class – for example all circles are round. Object class detection uses these special
features. For example, when looking for circles, objects that are at a particular
distance from a point (i.e. the center) are sought. Similarly, when looking for
squares, objects that are perpendicular at corners and have equal side lengths are
needed. A similar approach is used for face identification where eyes, nose, and lips
can be found and features like skin color and distance between eyes can be found.
A more generalized (multi-class) application can be used in autonomous
driving, where a variety of objects need to be detected. It also has an important role to
play in surveillance systems. These systems can be integrated with other tasks such
as pose estimation where the first stage in the pipeline is to detect the object, and
then the second stage will be to estimate pose in the detected region. It can be used
for tracking objects and thus can be used in robotics and medical applications. Thus
this problem serves a multitude of applications.
Image Processing
Image processing is a method to perform some operations on an image, in
order to get an enhanced image or to extract some useful information from it. It is a
type of signal processing in which the input is an image and the output may be an
image or the characteristics/features associated with that image.
There are two types of methods used for image processing, namely analogue
and digital image processing. Analogue image processing can be used for hard
copies like printouts and photographs. Image analysts use various fundamentals of
interpretation while using these visual techniques. Digital image processing
techniques help in the manipulation of digital images by using computers. The three
general phases that all types of data undergo in the digital technique
are pre-processing, enhancement and display, and information extraction.
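The three phases above can be illustrated with a small NumPy sketch; the array values and the thresholding step that stands in for information extraction are illustrative assumptions, not part of the report's pipeline:

```python
import numpy as np

def process_image(img):
    """Toy digital image-processing pipeline on a grayscale array."""
    # Pre-processing: scale pixel values to the range [0, 1]
    pre = img.astype(float) / 255.0
    # Enhancement: contrast stretching to use the full dynamic range
    lo, hi = pre.min(), pre.max()
    enhanced = (pre - lo) / (hi - lo) if hi > lo else pre
    # Information extraction: a simple binary threshold
    mask = enhanced > 0.5
    return enhanced, mask

img = np.array([[10, 200], [60, 120]], dtype=np.uint8)
enhanced, mask = process_image(img)
print(mask)
```

After contrast stretching, only the brighter half of the pixels survive the threshold.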
CHAPTER 2
LITERATURE SURVEY
OBJECT DETECTION USING CONVOLUTIONAL NEURAL
NETWORK
In 2019, "Object Detection using Convolutional Neural Networks" was
published. Vision systems are essential in building a mobile robot that will complete
a certain task like navigation, surveillance, or explosive ordnance disposal (EOD).
A project was proposed based on Convolutional Neural Networks (CNN), which are
used to detect objects in the environment.
Methodology used - Two state-of-the-art models are compared for object
detection: a Single Shot MultiBox Detector (SSD) with MobileNetV1, and a Faster
Region-based Convolutional Neural Network (Faster-RCNN) with InceptionV2.
A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm
which can take in an input image, assign importance (learnable weights and biases)
to various aspects/objects in the image, and differentiate one from the
other.
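The core operation such a network applies is the 2-D convolution; the following minimal NumPy sketch (an illustration only, not the project's model) slides a 3x3 edge filter over a small image:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in CNN libraries)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A vertical-edge filter responds strongly where intensity changes left to right
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
out = conv2d(image, sobel_x)
print(out)
```

In a CNN the kernel values are the learnable weights; here a fixed Sobel filter stands in for a learned one.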
CHAPTER 3
SYSTEM ANALYSIS
3.1 PROBLEM STATEMENT
The estimated number of visually impaired people in the world is 285
million; of these, 39 million are blind and 246 million have low vision. They are an
important portion of our society. It is very difficult for them to face the outside
world.
Today in the fast moving society, visually impaired people require
supportive instruments in their day-to-day life. Our thought primarily centered on
designing and actualizing an assistive framework for visually impaired people to
detect objects effectively.
CHAPTER 4
SYSTEM REQUIREMENTS SPECIFICATION
4.1 HARDWARE REQUIREMENTS
Web Camera
Speakers
Hard Disk
High Performing Processor
RAM (1 GB)
4.2 SOFTWARE REQUIREMENTS
IDE : Spyder
Languages Used : Python
Tensorflow API
Packages Used:
Pytesseract
torch
tarfile
tensorflow as tf
pyttsx3
numpy
Pytesseract, or Python-tesseract, is an Optical Character Recognition (OCR) tool for
Python. It reads and recognizes the text embedded in images (any image with text,
such as scanned documents or license plates), typically after pre-processing steps
such as binarizing the image (converting it to binary).
PyTorch is an open-source Python library for deep learning developed and
maintained by Facebook. The project started in 2016 and quickly became a popular
framework among developers and researchers.
The Python tarfile module is used to read and write tar archives. Python provides
excellent tools and modules to manage compressed files, which include (but are not
limited to) performing file and directory compression with different mechanisms
like gzip, bz2 and lzma compression. This is similar to Python's zipfile module.
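A short standard-library sketch of the tarfile usage described above, creating a gzip-compressed archive in a scratch directory and reading it back:

```python
import os
import tarfile
import tempfile

# Work in a scratch directory so the example is self-contained
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'hello.txt')
with open(path, 'w') as f:
    f.write('hello archive')

archive = os.path.join(workdir, 'demo.tar.gz')
with tarfile.open(archive, 'w:gz') as tar:      # 'w:gz' selects gzip compression
    tar.add(path, arcname='hello.txt')

with tarfile.open(archive, 'r:gz') as tar:
    names = tar.getnames()                      # list members without extracting
    data = tar.extractfile('hello.txt').read().decode()
print(names, data)
```

The mode string ('w:gz', 'w:bz2', 'w:xz') selects the compression mechanism mentioned in the text.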
CHAPTER 5
SYSTEM DEVELOPMENT
5.1 SYSTEM DESIGN
5.1.1 System Architecture
5.1.2 MODULES
5.1.2.1 Video Capturing Module:
When the system is turned on, it captures images using the camera. These
frames are given as input to the model trained on the COCO dataset, and
classification of pixels and features takes place.
The captured frames can be seen on the monitor with drawn boundaries and
labels.
The method cv2.VideoCapture() is used to start the camera and capture the video.
5.1.3 Algorithm
We use an algorithm called Single Shot Detection (SSD) Algorithm
Single Shot MultiBox Detector (SSD) is an object detection algorithm that is a
modification of the VGG16 architecture. Presented in 2016, it reached new records
in terms of performance and precision for object detection tasks, scoring over 74%
mAP (mean Average Precision) at 59 frames per second on standard datasets such
as PASCAL VOC and COCO.
SSD's architecture builds on the venerable VGG-16 architecture, but discards the
fully connected layers.
Accuracy Comparison
Python Introduction
Spyder
Initially created and developed by Pierre Raybaut in 2009, since 2012 Spyder has
been maintained and continuously improved by a team of scientific Python
developers and the community.
Spyder uses Qt for its GUI and is designed to use either of the PyQt or PySide
Python bindings. QtPy, a thin abstraction layer developed by the Spyder project
and later adopted by multiple other packages, provides the flexibility to use either
backend.
IMPLEMENTATION:
TENSORFLOW APIs
Compilation Code
The protos files are compiled from the research directory with
protoc object_detection/protos/*.proto --python_out=.
And boom! You've successfully converted your protos files into Python files.
Anchor box
Multiple anchor/prior boxes can be assigned to each grid cell in SSD. These
assigned anchor boxes are pre-defined, and each one is responsible for a size and
shape within a grid cell. A matching phase is used by SSD while training, so that
each ground-truth box is matched to the anchor box with the best overlap, and that
anchor becomes responsible for predicting it.
Zoom Level
It is not mandatory for the anchor boxes to have the same size as that of the grid
cell. The user might be interested in finding both smaller and larger objects within a
grid cell. In order to specify how much the anchor boxes need to be scaled up or
down with respect to each grid cell, the zooms parameter is used.
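The zoom idea can be sketched in plain Python: given a grid-cell size, each (zoom, aspect-ratio) pair yields one anchor box. The numeric values below are illustrative, not the exact SSD configuration:

```python
def anchor_sizes(cell_size, zooms, ratios):
    """Return (width, height) of one anchor per (zoom, ratio) pair.

    zoom scales the anchor relative to the grid cell; ratio = width / height.
    """
    boxes = []
    for z in zooms:
        for r in ratios:
            w = cell_size * z * (r ** 0.5)
            h = cell_size * z / (r ** 0.5)
            boxes.append((round(w, 2), round(h, 2)))
    return boxes

# One 32-pixel grid cell, anchors at 1x and 2x zoom, square and wide shapes
boxes = anchor_sizes(32, zooms=[1.0, 2.0], ratios=[1.0, 2.0])
print(boxes)
```

Scaling width by sqrt(ratio) and height by 1/sqrt(ratio) keeps the anchor's area fixed for a given zoom.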
MobileNet
The MobileNet model is based on depthwise separable convolutions, which form a
factorized convolution: a standard convolution is split into a depthwise convolution
and a 1x1 convolution, also called a pointwise convolution. The depthwise
convolution applies a single filter to each input channel. The pointwise convolution
then applies a 1x1 convolution to merge the outputs of the depthwise convolution.
A standard convolution both filters and combines the inputs into a new set of
outputs in one single step. The depthwise separable convolution splits this into two
layers: a separate layer for the filtering purpose and another separate layer for the
combining purpose. This factorization has the effect of drastically reducing both the
computation and the model size.
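The computation saving can be made concrete. For a D_K x D_K kernel, M input channels, N output channels and a D_F x D_F feature map, a standard convolution costs D_K * D_K * M * N * D_F * D_F multiply-adds, while the depthwise-plus-pointwise factorization costs D_K * D_K * M * D_F * D_F + M * N * D_F * D_F. A quick check with typical (illustrative) layer sizes:

```python
def conv_cost(dk, m, n, df):
    """Multiply-add counts: standard vs depthwise separable convolution."""
    standard = dk * dk * m * n * df * df
    separable = dk * dk * m * df * df + m * n * df * df  # depthwise + pointwise 1x1
    return standard, separable

# 3x3 kernel, 32 input channels, 64 output channels, 112x112 feature map
std, sep = conv_cost(3, 32, 64, 112)
print(std, sep, round(std / sep, 1))
```

For these values the factorization is roughly an 8x reduction in multiply-adds, which is the "drastic reduction" the text refers to.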
We get the centre on each axis by adding the start and end coordinates of that axis
and dividing by two. In this way the centre of the detected rectangle is calculated,
and finally a dot is drawn at the centre. The default threshold for drawing boxes is a
score of 0.5: if scores[0][i] >= 0.5 (i.e. 50 percent or more), we assume that the
object is detected.
In the above formula, mid_x is the centre on the X axis and mid_y is the centre on
the Y axis. If apx_distance < 0.5 and 0.3 < mid_x < 0.7, then it can be concluded
that the object is too close to the person. With this code, the relative distance of the
object from the person can be calculated. After an object is detected, the code
determines its relative distance from the person; if the object is too close, a warning
is issued to the person through the voice generation module.
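The heuristic described above can be isolated as a small function. Box coordinates follow the TensorFlow convention [ymin, xmin, ymax, xmax], normalized to [0, 1]; the threshold values are the ones quoted in the text, and the function name is our own:

```python
def proximity_alert(box, score, score_thresh=0.5):
    """Return (apx_distance, warn) for one detection box.

    box = [ymin, xmin, ymax, xmax], all normalized to [0, 1].
    apx_distance shrinks toward 0 as the box widens (object approaches).
    """
    if score < score_thresh:
        return None, False
    ymin, xmin, ymax, xmax = box
    mid_x = (xmin + xmax) / 2                    # horizontal centre of the box
    apx_distance = round((1 - (xmax - xmin)) ** 4, 1)
    # Warn only if the object is close AND roughly in front of the camera
    warn = apx_distance <= 0.5 and 0.3 < mid_x < 0.7
    return apx_distance, warn

# A wide, centred box: the object fills most of the frame width
print(proximity_alert([0.1, 0.2, 0.9, 0.8], score=0.9))
```

A narrow box far from the centre produces a large apx_distance and no warning, while a wide centred box triggers the alert.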
OpenCV
OpenCV (Open Source Computer Vision Library) is an open source
computer vision and machine learning software library. OpenCV was built to
provide a common infrastructure for computer vision applications and to accelerate
the use of machine perception in the commercial products. Being a BSD-licensed
product, OpenCV makes it easy for businesses to utilize and modify the code. The
library has more than 2500 optimized algorithms, which includes a comprehensive
set of both classic and state-of-the-art computer vision and machine learning
algorithms.
These algorithms can be used to detect and recognize faces, identify
objects, classify human actions in videos, track camera movements, track moving
objects, extract 3D models of objects, produce 3D point clouds from stereo
cameras, stitch images together to produce a high resolution image of an entire
scene, find similar images from an image database, remove red eyes from images
taken using flash, follow eye movements, recognize scenery and establish markers
to overlay it with augmented reality, etc
.
Package
A package is basically a directory with Python files and a file with the
name __init__.py. This means that every directory inside the Python path that
contains a file named __init__.py is treated as a package and can be imported.
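A minimal demonstration, building a throwaway package at runtime purely to show the role of __init__.py (the package and module names here are made up for the example):

```python
import os
import sys
import tempfile
import importlib

# Create mypkg/__init__.py and mypkg/greet.py in a scratch directory
root = tempfile.mkdtemp()
pkg = os.path.join(root, 'mypkg')
os.makedirs(pkg)
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write('')                                  # its presence marks the package
with open(os.path.join(pkg, 'greet.py'), 'w') as f:
    f.write('def hello():\n    return "hello from mypkg"\n')

sys.path.insert(0, root)                         # make the directory importable
greet = importlib.import_module('mypkg.greet')
print(greet.hello())
```

Without the __init__.py file, older Python versions would refuse the import entirely.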
Importing Packages
1) cv2
cv2 (old interface in old OpenCV versions was named as cv ) is the name
that OpenCV developers chose when they created the binding generators. OpenCV-
Python makes use of Numpy, which is a highly optimized library for numerical
operations with a MATLAB-style syntax. All the OpenCV array structures are
converted to and from Numpy arrays. This also makes it easier to integrate with
other libraries that use Numpy such as SciPy and Matplotlib.
2) torch
3) pytesseract
Python-tesseract is an optical character recognition (OCR) tool for Python. That is,
it will recognize and "read" the text embedded in images. Python-tesseract is a
wrapper for Google's Tesseract-OCR Engine. It is also useful as a stand-alone
invocation script for tesseract, as it can read all image types supported by the Pillow
and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others.
4) numpy
5) time
● One of the ways you can manage the concept of Python time in your
application is by using a floating point number that represents the number of
seconds that have passed since the beginning of an era, that is, since a
certain starting point.
● Let's dive deeper into what that means, why it's useful, and how you can use
it to implement logic, based on Python time, in your application.
● Understand core concepts at the heart of working with dates and times, such
as epochs, time zones, and daylight saving time.
● Represent time in code using floats, tuples, and struct_time.
● Convert between different time representations.
● Suspend thread execution.
● Measure code performance using perf_counter().
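Two of the points above in a brief standard-library sketch: epoch seconds as a float, and perf_counter() for timing a piece of code:

```python
import time

# Seconds elapsed since the epoch, as a floating point number
now = time.time()
print(now)

# perf_counter() is a monotonic, high-resolution clock meant for measurement
start = time.perf_counter()
total = sum(range(100_000))          # some work to time
elapsed = time.perf_counter() - start
print(total, elapsed)
```

time.time() can jump if the system clock changes, which is why perf_counter() is preferred for measuring durations.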
Monitor
A monitor is an electronic visual computer display that includes a screen,
circuitry and the case in which that circuitry is enclosed. Older computer monitors
made use of cathode ray tubes (CRT), which made them large, heavy and
inefficient. Nowadays, flat-screen LCD monitors are used in devices like laptops,
PDAs and desktop computers because they are lighter and more energy efficient.
A monitor is also known as a screen or a visual display unit (VDU).
Power Supply
A power supply is a component that supplies power to at least one electric
load. Typically, it converts one type of electrical power to another, but it may also
convert a different form of energy – such as solar, mechanical, or chemical - into
electrical energy.
A power supply provides components with electric power. The term usually pertains
to devices integrated within the component being powered. For example, computer
power supplies convert AC current to DC current and are generally located at the
rear of the computer case, along with at least one fan.
A power supply is also known as a power supply unit, power brick or power
adapter.
Buzzer (Speaker)
A buzzer or beeper is an audio signaling device, which may be mechanical,
electromechanical, or piezoelectric ( piezo for short). Typical uses of buzzers and
beepers include alarm devices, timers, and confirmation of user input such as a
mouse click or keystroke.
The buzzer consists of an outside case with two pins to attach it to power and
ground. When current is applied to the buzzer it causes the ceramic disk to contract
or expand. Changing this then causes the surrounding disc to vibrate. That's the
sound that you hear.
# Reconstructed listing: the missing imports, the frozen-graph path, the camera
# handle and the sess.run() detection call (standard TensorFlow Object
# Detection API usage) have been filled in, and the per-class warning blocks
# are grouped into one detection loop.
import os
import cv2
import numpy as np
import tensorflow as tf
import torch
from torch.autograd import Variable as V
import models as models                      # Places365 model definitions
from torchvision import transforms as trn
from torch.nn import functional as F
from PIL import Image
import pyttsx3

engine = pyttsx3.init()
arch = 'resnet18'
NUM_CLASSES = 90
PATH_TO_CKPT = 'frozen_inference_graph.pb'   # frozen SSD COCO model (assumed path)

# Load the frozen detection graph
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
# label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
# categories = label_map_util.convert_label_map_to_categories(label_map,
#     max_num_classes=NUM_CLASSES, use_display_name=True)
# category_index = label_map_util.create_category_index(categories)

# Places365 scene-recognition model (PyTorch), run on frames saved with 'b'
model_file = 'whole_%s_places365_python36.pth.tar' % arch
if not os.access(model_file, os.W_OK):
    weight_url = 'http://places2.csail.mit.edu/models_places365/' + model_file
    os.system('wget ' + weight_url)
useGPU = 1
if useGPU == 1:
    model = torch.load(model_file)
else:
    # a model trained on GPU can be deployed on a CPU machine like this!
    model = torch.load(model_file, map_location=lambda storage, loc: storage)
model.eval()

centre_crop = trn.Compose([
    trn.Resize((256, 256)),
    trn.CenterCrop(224),
    trn.ToTensor(),
    trn.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
file_name = 'categories_places365.txt'       # Places365 class names, fetched separately

cap = cv2.VideoCapture(0)                    # default webcam

def warn(image_np, message):
    """Overlay a warning on the frame and speak it aloud."""
    cv2.putText(image_np, 'WARNING!!!', (50, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 3)
    print(message)
    engine.say(message)
    engine.runAndWait()

with detection_graph.as_default():
    with tf.compat.v1.Session(graph=detection_graph) as sess:
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        det_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        det_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        det_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        ret = True
        while ret:
            ret, image_np = cap.read()
            if cv2.waitKey(20) & 0xFF == ord('b'):
                # Save the frame and run Places365 scene recognition on it
                cv2.imwrite('opencv.jpg', image_np)
                img = Image.open('opencv.jpg')
                input_img = V(centre_crop(img).unsqueeze(0))  # volatile=True is deprecated
                logit = model.forward(input_img)
                h_x = F.softmax(logit, 1).data.squeeze()
                probs, idx = h_x.sort(0, True)
            # Expand dimensions since the model expects images of shape [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            (boxes, scores, classes) = sess.run(
                [det_boxes, det_scores, det_classes],
                feed_dict={image_tensor: image_np_expanded})
            for i in range(boxes.shape[1]):
                if scores[0][i] < 0.5:               # keep detections of 50% or more
                    continue
                mid_x = (boxes[0][i][1] + boxes[0][i][3]) / 2
                mid_y = (boxes[0][i][0] + boxes[0][i][2]) / 2
                apx_distance = round(((1 - (boxes[0][i][3] - boxes[0][i][1])) ** 4), 1)
                cv2.putText(image_np, '{}'.format(apx_distance),
                            (int(mid_x * 800), int(mid_y * 450)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
                print(apx_distance)
                if classes[0][i] == 44:              # COCO class 44: bottle
                    if apx_distance <= 0.5 and 0.3 < mid_x < 0.7:
                        warn(image_np, "Warning - BOTTLE very close to the frame")
                elif classes[0][i] == 84:            # COCO class 84: book
                    engine.say(apx_distance)
                    engine.say("units")
                    engine.say("BOOK IS AT A SAFER DISTANCE")
                    if apx_distance <= 0.5 and 0.3 < mid_x < 0.7:
                        warn(image_np, "Warning - BOOK very close to the frame")
                elif classes[0][i] == 1:             # COCO class 1: person
                    if apx_distance <= 0.0 and 0.3 < mid_x < 0.4:
                        engine.say(apx_distance)
                        engine.say("meters")
                        warn(image_np, "Warning - Person very close to the frame")
                elif classes[0][i] in (3, 6, 8):     # COCO: car, bus, truck
                    if apx_distance <= 0.5 and 0.3 < mid_x < 0.7:
                        warn(image_np, "Warning - Vehicles Approaching")
            cv2.imshow('image', cv2.resize(image_np, (1024, 768)))
            if cv2.waitKey(2) & 0xFF == ord('t'):
                cv2.destroyAllWindows()
                cap.release()
                break
Integration testing is specifically aimed at exposing the problems that arise from
the combination of components.
Test objectives
Test Results: All the test cases mentioned above passed successfully; no defects
were encountered.
Test Case 1: Status - PASS; Remarks - none
Test Case 2: Status - PASS; Remarks - none
Test Case 3: Status - PASS; Remarks - none
CHAPTER 6
RESULTS
6.1 EXECUTION PROCEDURE
1. After setting up the TensorFlow API on the system, we need to convert the
protos files into executable Python files.
2. Next, in the object_detection folder we need to choose the research folder, and
the Python code needs to be saved there.
3. We open the Python file in the Spyder IDE (from Anaconda).
4. Now when we run the code, the output will be generated.
5. Output will be:
A popup window displaying the objects will be opened.
In it, the objects are surrounded with boundaries labeled with
object names.
Distance is calculated and warnings are displayed in the frames.
Along with distance, alerts are sent to the user based on the safe
and unsafe distance, which is calculated using the mid-ranges
of the frames by the system.
6.2 SCREENSHOTS
6.2.1 Detecting Objects
Figure 6.1: System detected the person and labeled it with accuracy percent.
Figure 6.2: System detecting different objects (cell phone and person) and giving
warnings if one is very close to the camera.
Figure 6.4: System finding the mid-ranges for calculating the distance of the object
from the camera.
Description: After detecting the object, frames are drawn over the objects, and
those frames carry the depth values that are used to calculate the distance of the
object from the camera.
These calculated mid-ranges are used for the distance calculation, and using voice
modules, alerts are sent to the user based on the safe and unsafe
distances.
7.1 CONCLUSION
Previous studies have proposed a number of methods to detect objects. After
doing the literature survey, different techniques were found for the detection and
recognition of objects, and they use different types of data as input for their
methodology. After surveying these methods, it was found that the SSD
architecture model trained on the COCO dataset is an easy method which can be
readily applied and is appropriate in all conditions. We decided to explore this
computer vision method and proposed a novel method to detect and recognize
objects based on TensorFlow, finding the distance and sending the output through
voice assistance such as a speaker. With this, a blind person can carry out day-to-day
object detection and recognition without depending on others and will be alerted
by the voice outputs.
REFERENCES
[1] Aditya Raj, "Model for Object Detection using Computer Vision and Machine
Learning for Decision Making," International Journal of Computer Applications,
2019.
[2] Bhumika Gupta, "Study on Object Detection using Open CV Python,"
International Journal of Computer Applications Foundation of Computer Science,
vol. 162, 2017.
[3] Abdul Muhsin M, "Online Blind Assistive System using Object Recognition,"
International Research Journal of Innovations in Engineering and Technology, vol.
3, pp. 47-51, 2019.
[4] "OpenCV," [Online]. Available: https://opencv.org/ .
[5] "Python programming language," [Online]. Available: https://www.python.org/.
[6] "TensorFlow," [Online]. Available: https://www.tensorflow.org/ .
WEBSITES
[1] www.python.org
[2] wiki.python.org
TEXTBOOKS
[1] Computer Vision: Algorithms and Applications
[2] Learning with Python - How to Think Like a Computer Scientist
Fig 1: Objects for object recognition, consisting of a dog and a duck on the beach
Along with the object finding, we have used an alert framework where the distance is calculated. Whether the
blind person is especially close to the object or far away at a safe spot, it will produce voice-based outputs
along with distance units.
The backend of the application is where the video clip is sent and taken as input; it goes through the object
detection model trained on the COCO dataset, one of the predefined datasets, which tests and detects with
accurate metrics.
After testing, the output of the application is sent to the voice modules: the class of the object is converted
into default voice notes, which can then be sent to the users for their needs.
III. LITERATURE SURVEY
OBJECT DETECTION USING CONVOLUTIONAL NEURAL NETWORK
In 2019, "Object Detection using Convolutional Neural Networks" was published. Vision systems are essential in
building a mobile robot that will complete a certain task like navigation, surveillance, or explosive ordnance
disposal (EOD). A project was proposed based on CNNs, which are used to detect objects in the environment.
Methodology used - two state-of-the-art models are compared for object detection: a Single Shot Multi-Box
Detector (SSD) with MobileNetV1, and a Faster Region-based Convolutional Neural Network (Faster-RCNN)
with InceptionV2.
IMAGE BASED REAL TIME OBJECT DETECTION ALONG WITH RECOGNITION IN IMAGE PROCESSING
In 2019, "Image Based Real Time Object Detection and Recognition in Image Processing" noted that object
detection and tracking, mainly of humans and vehicles, is presently a most active research topic, used in
applications such as surveillance and image retrieval. A solution was proposed which reviewed recent
technologies for each phase of object detection. The methodology used here is four different methods for
object detection (a computer technology related to computer vision and image processing that deals with
detecting instances of semantic objects of a certain class in digital images and videos): feature-based
detection, region-based detection, outline-based detection, and model-based detection.
SALIENT OBJECTS DETECTING WITH SEGMENT FEATURES USING MEAN SHIFT TECHNOLOGY
In 2020, "Salient Object Detection with Segment Features Using Mean Shift Algorithm" appeared. Object
recognition has attracted high attention for its diverse applications in everyday life, such as surveillance and
image retrieval. A solution was proposed which introduced a new fast method for detecting salient objects
within images. The main aim was the detection of objects in complex images. The methodology used has
four steps: regional feature extraction, segment clustering, saliency score computation, and post-processing.
REAL-TIME OBJECT DETECTION USING DEEP LEARNING
In 2020, "Real-Time Object Detection Using Deep Learning" was published. Detecting and recognizing objects
in images and videos is a major task today, and a solution was proposed using deep learning. The methodology
used here includes feature extraction with the help of Darknet-53, along with feature-map upsampling and
concatenation. The model includes various changes in object detection techniques.
(Published in the International Research Journal of Modernization in Engineering Technology and Science,
a peer-reviewed, open access, fully refereed international journal, e-ISSN: 2582-5208, Volume 04, Issue 03,
March 2022, www.irjmets.com.)
ASSISTIVE OBJECT FINDING SYSTEM FOR VISUALLY IMPAIRED PEOPLE
In 2020, "Assistive Object Recognition/Finding System for the Visually Impaired" addressed the issue of visual
impairment or blindness faced worldwide. A solution was proposed where two cameras placed on the blind
person's glasses, a GPS-free service, and ultrasonic sensors are employed to give information about the
surroundings. The methodology used here: the system takes real-time images as input, the images are
pre-processed based on the job, their background and foreground are separated, and then a DNN module
with a pre-trained YOLO model is applied for feature extraction.
IV. PROBLEM FINDING
The estimated number of visually impaired people in the world is more than 290 million; of these, 42% are
blind and 58% have low vision. They are an important part of our society. It is very difficult for them to live in
the outside world independently. Today, in our fast-moving society, visually impaired people require
supportive instruments in their day-to-day life. Our thought primarily centered on developing an assistive
framework for visually impaired people to detect objects effectively, which can help them live independently.
PROBLEM DESCRIPTION
The system is designed in such a way that a mobile application captures real-time objects and sends them to
a laptop-based networked server where all the important computations take place. Utilizing a pre-trained
SSD detection model trained on the COCO dataset, the objects are detected and recognized by the system.
After that, the distance is calculated, and the output is in audio form, where the system gives warnings
along with the calculated distance.
V. EXISTING SYSTEM
Many computer vision systems exist nowadays to help people who are visually impaired in their life.
These include Augmented Reality wearable goggles, video calling applications for the visually impaired to
ask for assistance, AI- and GPS-based navigation systems, etc. These systems are developed to work in
specific cases or conditions, and cannot be used broadly. There are cases where people with visual
impairment need to be made aware of their surroundings, which is not possible with the existing systems.
DISADVANTAGES OF EXISTING SYSTEM
They are expensive.
Most visually impaired people cannot afford such highly priced products.
These systems may be complex in functionality, making them difficult for blind people to use.
Some systems are not real-time.
VI. PROPOSED SYSTEM
In this proposed system, we are using Python with a TensorFlow-based approach to solve the problem of
object detection in an end-to-end fashion. We used the SSD detection model, based on deep neural networks,
for effective detection, and the OpenCV library for real-time picture capturing. Among the ImageNet, Google
Open Images, and COCO datasets, we are using COCO since it provides classified features for more than 90%
of real-world objects. The image is sent as input to the model; meanwhile, the distance is calculated using
depth estimation, and with the help of voice modules predefined in Python, the object name is converted
into default voice notes which are sent to the blind person, along with the calculated distance, for their help.
Certificate of Publication
This is to certify that author “S P Jeswanth” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief
Certificate of Publication
This is to certify that author “Billu Nagamani” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief
Certificate of Publication
This is to certify that author “T Anuja Chowdary” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief
Certificate of Publication
This is to certify that author “Madhan Kaveripakam” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief
Certificate of Publication
This is to certify that author “Nama Chandu” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief