OBJECT DETECTION AND RECOGNITION USING
TENSORFLOW FOR BLIND PEOPLE
Submitted in partial fulfillment for the award of the degree of
Bachelor of Technology
in
COMPUTER SCIENCE AND ENGINEERING
Submitted by
S P JESWANTH 184E1A0523
BILLU NAGAMANI 184E1A0531
T ANUJA CHOWDARY 184E1A0502
MADHAN KAVERIPAKAM 184E1A0527
NAMA CHANDU 184E1A0509
Certificate
This is to certify that the Project entitled
“OBJECT DETECTION AND RECOGNITION USING TENSORFLOW
FOR BLIND PEOPLE”
that is being submitted by
S P JESWANTH 184E1A0523
BILLU NAGAMANI 184E1A0531
T ANUJA CHOWDARY 184E1A0502
MADHAN KAVERIPAKAM 184E1A0527
NAMA CHANDU 184E1A0509
in partial fulfillment of the requirements for the award of BACHELOR OF
TECHNOLOGY in Computer Science And Engineering to JNTUA,
Ananthapuramu. This project Phase – II (18CS0534) work or part thereof has
not been submitted to any other University or Institute for the award of any
degree.
The satisfaction that accompanies the successful completion of any task would be
incomplete without the mention of the people who made it possible, without whose
guidance, encouragement and help this venture would not have been a success. The
acknowledgement transcends the reality of formality when we express deep
gratitude and respect to all those people behind the screen who guided, inspired and
helped us to complete our project work in time and up to the standards.
We owe our gratitude to our Honorable Chairman Dr. K. ASHOK RAJU, Ph.D.,
and also deep sense of gratitude to our honorable principal Dr. M. JANARDHANA
RAJU, M.E, Ph.D., for having provided all the facilities and support in completing our
project successfully.
We owe our deep sense of gratitude to our Head of the Department
Dr. M.A. MANIVASAGAM, M.E, Ph.D., for his valuable guidance and constant
encouragement given to us during this work.
We express our deep sense of gratitude to our project coordinator
Mr. R. PURUSHOTHAMAN, M.E, (Ph.D), who evinced keen interest in our efforts and
provided his valuable guidance throughout our project work.
We express our deep sense of gratitude to our project guide Ms. P. DEVIKA,
M.Tech, for her guidance and supervision at all levels of our project work. We are indebted
to her for her valuable suggestions and sustained help in completing our project work.
We are also thankful to all the staff members of the CSE Department for helping us
complete this project work by giving valuable suggestions.
Last, but not least, we express our sincere thanks to all our friends who have
supported us in the accomplishment of this project.
TABLE OF CONTENTS
1. INTRODUCTION
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
4. SYSTEM REQUIREMENTS
5. SYSTEM DEVELOPMENT
   5.1 SYSTEM DESIGN
6. RESULTS
7. CONCLUSION AND FUTURE ENHANCEMENT
   7.1 Conclusion
   7.2 Future Enhancement
REFERENCES
ABSTRACT
Visual impairment or blindness is one of the top ten disabilities in
humans, and unfortunately, India has the world's largest visually impaired population.
We are therefore creating a framework to assist the visually impaired with object detection
and recognition, so that they can navigate independently and be aware of their
surroundings. In this system, the image captured by the camera is taken as input. The
SSD architecture, based on deep neural networks, is used here for precise object detection.
This input is given to the software and processed against the COCO dataset, which is
predefined in the TensorFlow library and used as the training dataset for the system; in
general, this dataset contains features for ninety percent of real-world objects. The
distance is calculated by depth estimation, and by using voice assistance packages the
software produces the output in the form of audio.
LIST OF FIGURES
Figure No. Figure Name
5.17 Camera
5.18 Monitor
5.20 SD Card
5.21 Buzzer
LIST OF TABLES
CHAPTER 1
INTRODUCTION
1.1 DOMAIN DESCRIPTION
Object detection is a computer technology related to computer
vision and image processing that deals with detecting instances of semantic objects
of a certain class (such as humans, buildings, or cars) in digital images and
videos. Well-researched domains of object detection include face
detection and pedestrian detection. Object detection has applications in many areas
of computer vision, including image retrieval and video surveillance.
It is widely used in computer vision tasks such as image annotation, activity
recognition, face detection, face recognition, video object co-segmentation. It is also
used in tracking objects, for example tracking a ball during a football
match, tracking movement of a cricket bat, or tracking a person in a video.
Every object class has its own special features that help in classifying the
class – for example all circles are round. Object class detection uses these special
features. For example, when looking for circles, objects that are at a particular
distance from a point (i.e. the center) are sought. Similarly, when looking for
squares, objects that are perpendicular at corners and have equal side lengths are
needed. A similar approach is used for face identification where eyes, nose, and lips
can be found and features like skin color and distance between eyes can be found.
A more generalized (multi-class) application can be used in autonomous
driving, where a variety of objects need to be detected. It also has an important role to
play in surveillance systems. These systems can be integrated with other tasks such
as pose estimation where the first stage in the pipeline is to detect the object, and
then the second stage will be to estimate pose in the detected region. It can be used
for tracking objects and thus can be used in robotics and medical applications. Thus
this problem serves a multitude of applications.
Image Processing
Image processing is a method to perform some operations on an image, in
order to get an enhanced image or to extract some useful information from it. It is a
type of signal processing in which the input is an image and the output may be an
image or the characteristics/features associated with that image.
There are two types of methods used for image processing, namely analogue
and digital image processing. Analogue image processing can be used for hard
copies like printouts and photographs. Image analysts use various fundamentals of
interpretation while using these visual techniques. Digital image processing
techniques help in the manipulation of digital images by using computers. The three
general phases that all types of data undergo in the digital technique
are pre-processing, enhancement and display, and information extraction.
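The three phases above can be illustrated with a small NumPy sketch; the array values and the thresholding step that stands in for information extraction are illustrative assumptions, not part of the report's pipeline:

```python
import numpy as np

def process_image(img):
    """Toy digital image-processing pipeline on a grayscale array."""
    # Pre-processing: scale pixel values to the range [0, 1]
    pre = img.astype(float) / 255.0
    # Enhancement: contrast stretching to use the full dynamic range
    lo, hi = pre.min(), pre.max()
    enhanced = (pre - lo) / (hi - lo) if hi > lo else pre
    # Information extraction: a simple binary threshold
    mask = enhanced > 0.5
    return enhanced, mask

img = np.array([[10, 200], [60, 120]], dtype=np.uint8)
enhanced, mask = process_image(img)
print(mask)
```

After contrast stretching, only the brighter half of the pixels survive the threshold.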
CHAPTER 2
LITERATURE SURVEY
OBJECT DETECTION USING CONVOLUTIONAL NEURAL
NETWORK
In 2019, "Object Detection using Convolutional Neural Networks" was
published. Vision systems are essential in building a mobile robot that will complete
a certain task like navigation, surveillance, or explosive ordnance disposal (EOD).
A project was proposed based on Convolutional Neural Networks (CNN), which are
used to detect objects in the environment.
Methodology used - Two state-of-the-art models are compared for object
detection: a Single Shot MultiBox Detector (SSD) with MobileNetV1, and a Faster
Region-based Convolutional Neural Network (Faster-RCNN) with InceptionV2.
A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm
which can take in an input image, assign importance (learnable weights and biases)
to various aspects/objects in the image, and differentiate one from the
other.
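The core operation such a network applies is the 2-D convolution; the following minimal NumPy sketch (an illustration only, not the project's model) slides a 3x3 edge filter over a small image:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in CNN libraries)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A vertical-edge filter responds strongly where intensity changes left to right
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
out = conv2d(image, sobel_x)
print(out)
```

In a CNN the kernel values are the learnable weights; here a fixed Sobel filter stands in for a learned one.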
CHAPTER 3
SYSTEM ANALYSIS
3.1 PROBLEM STATEMENT
The estimated number of visually impaired people in the world is 285
million; of these, 39 million are blind and 246 million have low vision. They are an
important portion of our society. It is very difficult for them to face the outside
world.
Today in the fast moving society, visually impaired people require
supportive instruments in their day-to-day life. Our thought primarily centered on
designing and actualizing an assistive framework for visually impaired people to
detect objects effectively.
CHAPTER 4
SYSTEM REQUIREMENTS SPECIFICATION
4.1 HARDWARE REQUIREMENTS
Web Camera
Speakers
Hard Disk
High Performing Processor
RAM (1 GB)
4.2 SOFTWARE REQUIREMENTS
IDE : Spyder
Languages Used : Python
Tensorflow API
Packages Used:
Pytesseract
torch
tarfile
tensorflow as tf
pyttsx3
numpy
Pytesseract, or Python-tesseract, is an Optical Character Recognition (OCR) tool for
Python. It reads and recognizes the text embedded in images (any image with text,
such as scanned documents or license plates), typically after pre-processing steps
such as binarizing the image (converting it to binary).
PyTorch is an open-source Python library for deep learning developed and
maintained by Facebook. The project started in 2016 and quickly became a popular
framework among developers and researchers.
The Python tarfile module is used to read and write tar archives. Python provides
excellent tools and modules to manage compressed files, which include (but are not
limited to) performing file and directory compression with different mechanisms
like gzip, bz2 and lzma compression. This is similar to Python's zipfile module.
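A short standard-library sketch of the tarfile usage described above, creating a gzip-compressed archive in a scratch directory and reading it back:

```python
import os
import tarfile
import tempfile

# Work in a scratch directory so the example is self-contained
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'hello.txt')
with open(path, 'w') as f:
    f.write('hello archive')

archive = os.path.join(workdir, 'demo.tar.gz')
with tarfile.open(archive, 'w:gz') as tar:      # 'w:gz' selects gzip compression
    tar.add(path, arcname='hello.txt')

with tarfile.open(archive, 'r:gz') as tar:
    names = tar.getnames()                      # list members without extracting
    data = tar.extractfile('hello.txt').read().decode()
print(names, data)
```

The mode string ('w:gz', 'w:bz2', 'w:xz') selects the compression mechanism mentioned in the text.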
CHAPTER 5
SYSTEM DEVELOPMENT
5.1 SYSTEM DESIGN
5.1.1 System Architecture
5.1.2 MODULES
5.1.2.1 Video Capturing Module:
When the system is turned on, it captures images using the camera. These
frames are given as input to the model trained on the COCO dataset, and
classification of pixels and features takes place.
The captured frames can be seen on the monitor with drawn boundaries and
labels.
The method cv2.VideoCapture() is used to start the camera and capture the video.
5.1.3 Algorithm
We use an algorithm called Single Shot Detection (SSD) Algorithm
Single Shot MultiBox Detector (SSD) is an object detection algorithm that is a
modification of the VGG16 architecture. Presented in 2016, it reached new records
in terms of performance and precision for object detection tasks, scoring over 74%
mAP (mean Average Precision) at 59 frames per second on standard datasets such
as PASCAL VOC and COCO.
SSD's architecture builds on the venerable VGG-16 architecture, but discards the
fully connected layers.
Accuracy Comparison
Python Introduction
Spyder
Initially created and developed by Pierre Raybaut in 2009, since 2012 Spyder has
been maintained and continuously improved by a team of scientific Python
developers and the community.
Spyder uses Qt for its GUI and is designed to use either of the PyQt or PySide
Python bindings. QtPy, a thin abstraction layer developed by the Spyder project
and later adopted by multiple other packages, provides the flexibility to use either
backend.
IMPLEMENTATION:
TENSORFLOW APIs
Compilation Code
The protos files are compiled from the research directory with
protoc object_detection/protos/*.proto --python_out=.
And boom! You've successfully converted your protos files into Python files.
Anchor box
Multiple anchor/prior boxes can be assigned to each grid cell in SSD. These
assigned anchor boxes are pre-defined, and each one is responsible for a size and
shape within a grid cell. A matching phase is used by SSD while training, so that
each ground-truth box is matched to the anchor box with the best overlap, and that
anchor becomes responsible for predicting it.
Zoom Level
It is not mandatory for the anchor boxes to have the same size as that of the grid
cell. The user might be interested in finding both smaller and larger objects within a
grid cell. In order to specify how much the anchor boxes need to be scaled up or
down with respect to each grid cell, the zooms parameter is used.
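The zoom idea can be sketched in plain Python: given a grid-cell size, each (zoom, aspect-ratio) pair yields one anchor box. The numeric values below are illustrative, not the exact SSD configuration:

```python
def anchor_sizes(cell_size, zooms, ratios):
    """Return (width, height) of one anchor per (zoom, ratio) pair.

    zoom scales the anchor relative to the grid cell; ratio = width / height.
    """
    boxes = []
    for z in zooms:
        for r in ratios:
            w = cell_size * z * (r ** 0.5)
            h = cell_size * z / (r ** 0.5)
            boxes.append((round(w, 2), round(h, 2)))
    return boxes

# One 32-pixel grid cell, anchors at 1x and 2x zoom, square and wide shapes
boxes = anchor_sizes(32, zooms=[1.0, 2.0], ratios=[1.0, 2.0])
print(boxes)
```

Scaling width by sqrt(ratio) and height by 1/sqrt(ratio) keeps the anchor's area fixed for a given zoom.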
MobileNet
The MobileNet model is based on depthwise separable convolutions, which form a
factorized convolution: a standard convolution is split into a depthwise convolution
and a 1x1 convolution, also called a pointwise convolution. The depthwise
convolution applies a single filter to each input channel. The pointwise convolution
then applies a 1x1 convolution to merge the outputs of the depthwise convolution.
A standard convolution both filters and combines the inputs into a new set of
outputs in one single step. The depthwise separable convolution splits this into two
layers: a separate layer for the filtering purpose and another separate layer for the
combining purpose. This factorization has the effect of drastically reducing both the
computation and the model size.
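The computation saving can be made concrete. For a D_K x D_K kernel, M input channels, N output channels and a D_F x D_F feature map, a standard convolution costs D_K * D_K * M * N * D_F * D_F multiply-adds, while the depthwise-plus-pointwise factorization costs D_K * D_K * M * D_F * D_F + M * N * D_F * D_F. A quick check with typical (illustrative) layer sizes:

```python
def conv_cost(dk, m, n, df):
    """Multiply-add counts: standard vs depthwise separable convolution."""
    standard = dk * dk * m * n * df * df
    separable = dk * dk * m * df * df + m * n * df * df  # depthwise + pointwise 1x1
    return standard, separable

# 3x3 kernel, 32 input channels, 64 output channels, 112x112 feature map
std, sep = conv_cost(3, 32, 64, 112)
print(std, sep, round(std / sep, 1))
```

For these values the factorization is roughly an 8x reduction in multiply-adds, which is the "drastic reduction" the text refers to.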
We get the centre on each axis by adding the start and end coordinates of that axis
and dividing by two. In this way the centre of the detected rectangle is calculated,
and finally a dot is drawn at the centre. The default threshold for drawing boxes is a
score of 0.5: if scores[0][i] >= 0.5 (i.e. 50 percent or more), we assume that the
object is detected.
In the above formula, mid_x is the centre on the X axis and mid_y is the centre on
the Y axis. If apx_distance < 0.5 and 0.3 < mid_x < 0.7, then it can be concluded
that the object is too close to the person. With this code, the relative distance of the
object from the person can be calculated. After an object is detected, the code
determines its relative distance from the person; if the object is too close, a warning
is issued to the person through the voice generation module.
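The heuristic described above can be isolated as a small function. Box coordinates follow the TensorFlow convention [ymin, xmin, ymax, xmax], normalized to [0, 1]; the threshold values are the ones quoted in the text, and the function name is our own:

```python
def proximity_alert(box, score, score_thresh=0.5):
    """Return (apx_distance, warn) for one detection box.

    box = [ymin, xmin, ymax, xmax], all normalized to [0, 1].
    apx_distance shrinks toward 0 as the box widens (object approaches).
    """
    if score < score_thresh:
        return None, False
    ymin, xmin, ymax, xmax = box
    mid_x = (xmin + xmax) / 2                    # horizontal centre of the box
    apx_distance = round((1 - (xmax - xmin)) ** 4, 1)
    # Warn only if the object is close AND roughly in front of the camera
    warn = apx_distance <= 0.5 and 0.3 < mid_x < 0.7
    return apx_distance, warn

# A wide, centred box: the object fills most of the frame width
print(proximity_alert([0.1, 0.2, 0.9, 0.8], score=0.9))
```

A narrow box far from the centre produces a large apx_distance and no warning, while a wide centred box triggers the alert.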
OpenCV
OpenCV (Open Source Computer Vision Library) is an open source
computer vision and machine learning software library. OpenCV was built to
provide a common infrastructure for computer vision applications and to accelerate
the use of machine perception in the commercial products. Being a BSD-licensed
product, OpenCV makes it easy for businesses to utilize and modify the code. The
library has more than 2500 optimized algorithms, which includes a comprehensive
set of both classic and state-of-the-art computer vision and machine learning
algorithms.
These algorithms can be used to detect and recognize faces, identify
objects, classify human actions in videos, track camera movements, track moving
objects, extract 3D models of objects, produce 3D point clouds from stereo
cameras, stitch images together to produce a high resolution image of an entire
scene, find similar images from an image database, remove red eyes from images
taken using flash, follow eye movements, recognize scenery and establish markers
to overlay it with augmented reality, etc
.
Package
A package is basically a directory with Python files and a file with the
name __init__.py. This means that every directory inside the Python path that
contains a file named __init__.py is treated as a package and can be imported.
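A minimal demonstration, building a throwaway package at runtime purely to show the role of __init__.py (the package and module names here are made up for the example):

```python
import os
import sys
import tempfile
import importlib

# Create mypkg/__init__.py and mypkg/greet.py in a scratch directory
root = tempfile.mkdtemp()
pkg = os.path.join(root, 'mypkg')
os.makedirs(pkg)
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write('')                                  # its presence marks the package
with open(os.path.join(pkg, 'greet.py'), 'w') as f:
    f.write('def hello():\n    return "hello from mypkg"\n')

sys.path.insert(0, root)                         # make the directory importable
greet = importlib.import_module('mypkg.greet')
print(greet.hello())
```

Without the __init__.py file, older Python versions would refuse the import entirely.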
Importing Packages
1) cv2
cv2 (old interface in old OpenCV versions was named as cv ) is the name
that OpenCV developers chose when they created the binding generators. OpenCV-
Python makes use of Numpy, which is a highly optimized library for numerical
operations with a MATLAB-style syntax. All the OpenCV array structures are
converted to and from Numpy arrays. This also makes it easier to integrate with
other libraries that use Numpy such as SciPy and Matplotlib.
2) torch
3) pytesseract
Python-tesseract is an optical character recognition (OCR) tool for Python. That is,
it will recognize and "read" the text embedded in images. Python-tesseract is a
wrapper for Google's Tesseract-OCR Engine. It is also useful as a stand-alone
invocation script for tesseract, as it can read all image types supported by the Pillow
and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others.
4) numpy
5) time
● One of the ways you can manage the concept of Python time in your
application is by using a floating point number that represents the number of
seconds that have passed since the beginning of an era, that is, since a
certain starting point.
● Let's dive deeper into what that means, why it's useful, and how you can use
it to implement logic, based on Python time, in your application.
● Understand core concepts at the heart of working with dates and times, such
as epochs, time zones, and daylight saving time.
● Represent time in code using floats, tuples, and struct_time.
● Convert between different time representations.
● Suspend thread execution.
● Measure code performance using perf_counter().
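Two of the points above in a brief standard-library sketch: epoch seconds as a float, and perf_counter() for timing a piece of code:

```python
import time

# Seconds elapsed since the epoch, as a floating point number
now = time.time()
print(now)

# perf_counter() is a monotonic, high-resolution clock meant for measurement
start = time.perf_counter()
total = sum(range(100_000))          # some work to time
elapsed = time.perf_counter() - start
print(total, elapsed)
```

time.time() can jump if the system clock changes, which is why perf_counter() is preferred for measuring durations.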
Monitor
A monitor is an electronic visual computer display that includes a screen,
circuitry and the case in which that circuitry is enclosed. Older computer monitors
made use of cathode ray tubes (CRT), which made them large, heavy and
inefficient. Nowadays, flat-screen LCD monitors are used in devices like laptops,
PDAs and desktop computers because they are lighter and more energy efficient.
A monitor is also known as a screen or a visual display unit (VDU).
Power Supply
A power supply is a component that supplies power to at least one electric
load. Typically, it converts one type of electrical power to another, but it may also
convert a different form of energy – such as solar, mechanical, or chemical - into
electrical energy.
A power supply provides components with electric power. The term usually pertains
to devices integrated within the component being powered. For example, computer
power supplies convert AC current to DC current and are generally located at the
rear of the computer case, along with at least one fan.
A power supply is also known as a power supply unit, power brick or power
adapter.
Buzzer (Speaker)
A buzzer or beeper is an audio signaling device, which may be mechanical,
electromechanical, or piezoelectric ( piezo for short). Typical uses of buzzers and
beepers include alarm devices, timers, and confirmation of user input such as a
mouse click or keystroke.
The buzzer consists of an outside case with two pins to attach it to power and
ground. When current is applied to the buzzer it causes the ceramic disk to contract
or expand. Changing this then causes the surrounding disc to vibrate. That's the
sound that you hear.
# Reconstructed listing: the missing imports, the frozen-graph path, the camera
# handle and the sess.run() detection call (standard TensorFlow Object
# Detection API usage) have been filled in, and the per-class warning blocks
# are grouped into one detection loop.
import os
import cv2
import numpy as np
import tensorflow as tf
import torch
from torch.autograd import Variable as V
import models as models                      # Places365 model definitions
from torchvision import transforms as trn
from torch.nn import functional as F
from PIL import Image
import pyttsx3

engine = pyttsx3.init()
arch = 'resnet18'
NUM_CLASSES = 90
PATH_TO_CKPT = 'frozen_inference_graph.pb'   # frozen SSD COCO model (assumed path)

# Load the frozen detection graph
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
# label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
# categories = label_map_util.convert_label_map_to_categories(label_map,
#     max_num_classes=NUM_CLASSES, use_display_name=True)
# category_index = label_map_util.create_category_index(categories)

# Places365 scene-recognition model (PyTorch), run on frames saved with 'b'
model_file = 'whole_%s_places365_python36.pth.tar' % arch
if not os.access(model_file, os.W_OK):
    weight_url = 'http://places2.csail.mit.edu/models_places365/' + model_file
    os.system('wget ' + weight_url)
useGPU = 1
if useGPU == 1:
    model = torch.load(model_file)
else:
    # a model trained on GPU can be deployed on a CPU machine like this!
    model = torch.load(model_file, map_location=lambda storage, loc: storage)
model.eval()

centre_crop = trn.Compose([
    trn.Resize((256, 256)),
    trn.CenterCrop(224),
    trn.ToTensor(),
    trn.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
file_name = 'categories_places365.txt'       # Places365 class names, fetched separately

cap = cv2.VideoCapture(0)                    # default webcam

def warn(image_np, message):
    """Overlay a warning on the frame and speak it aloud."""
    cv2.putText(image_np, 'WARNING!!!', (50, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 3)
    print(message)
    engine.say(message)
    engine.runAndWait()

with detection_graph.as_default():
    with tf.compat.v1.Session(graph=detection_graph) as sess:
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        det_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        det_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        det_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        ret = True
        while ret:
            ret, image_np = cap.read()
            if cv2.waitKey(20) & 0xFF == ord('b'):
                # Save the frame and run Places365 scene recognition on it
                cv2.imwrite('opencv.jpg', image_np)
                img = Image.open('opencv.jpg')
                input_img = V(centre_crop(img).unsqueeze(0))  # volatile=True is deprecated
                logit = model.forward(input_img)
                h_x = F.softmax(logit, 1).data.squeeze()
                probs, idx = h_x.sort(0, True)
            # Expand dimensions since the model expects images of shape [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            (boxes, scores, classes) = sess.run(
                [det_boxes, det_scores, det_classes],
                feed_dict={image_tensor: image_np_expanded})
            for i in range(boxes.shape[1]):
                if scores[0][i] < 0.5:               # keep detections of 50% or more
                    continue
                mid_x = (boxes[0][i][1] + boxes[0][i][3]) / 2
                mid_y = (boxes[0][i][0] + boxes[0][i][2]) / 2
                apx_distance = round(((1 - (boxes[0][i][3] - boxes[0][i][1])) ** 4), 1)
                cv2.putText(image_np, '{}'.format(apx_distance),
                            (int(mid_x * 800), int(mid_y * 450)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
                print(apx_distance)
                if classes[0][i] == 44:              # COCO class 44: bottle
                    if apx_distance <= 0.5 and 0.3 < mid_x < 0.7:
                        warn(image_np, "Warning - BOTTLE very close to the frame")
                elif classes[0][i] == 84:            # COCO class 84: book
                    engine.say(apx_distance)
                    engine.say("units")
                    engine.say("BOOK IS AT A SAFER DISTANCE")
                    if apx_distance <= 0.5 and 0.3 < mid_x < 0.7:
                        warn(image_np, "Warning - BOOK very close to the frame")
                elif classes[0][i] == 1:             # COCO class 1: person
                    if apx_distance <= 0.0 and 0.3 < mid_x < 0.4:
                        engine.say(apx_distance)
                        engine.say("meters")
                        warn(image_np, "Warning - Person very close to the frame")
                elif classes[0][i] in (3, 6, 8):     # COCO: car, bus, truck
                    if apx_distance <= 0.5 and 0.3 < mid_x < 0.7:
                        warn(image_np, "Warning - Vehicles Approaching")
            cv2.imshow('image', cv2.resize(image_np, (1024, 768)))
            if cv2.waitKey(2) & 0xFF == ord('t'):
                cv2.destroyAllWindows()
                cap.release()
                break
Integration testing is specifically aimed at exposing the problems that arise from
the combination of components.
Test objectives
Test Results: All the test cases mentioned above passed successfully; no defects
were encountered.
Test Case 1: Status - PASS; Remarks - none
Test Case 2: Status - PASS; Remarks - none
Test Case 3: Status - PASS; Remarks - none
CHAPTER 6
RESULTS
6.1 EXECUTION PROCEDURE
1. After setting up the TensorFlow API on the system, we need to convert the
protos files into executable Python files.
2. Next, in the object_detection folder we need to choose the research folder, and
the Python code needs to be saved there.
3. We open the Python file in the Spyder IDE (from Anaconda).
4. Now when we run the code, the output will be generated.
5. Output will be:
A popup window displaying the objects will be opened.
In it, the objects are surrounded with boundaries labeled with
object names.
Distance is calculated and warnings are displayed in the frames.
Along with distance, alerts are sent to the user based on the safe
and unsafe distance, which is calculated using the mid-ranges
of the frames by the system.
6.2 SCREENSHOTS
6.2.1 Detecting Objects
Figure 6.1: System detected the person and labeled it with accuracy percent.
Figure 6.2: System detecting different objects (cell phone and person) and giving
warnings if one is very close to the camera.
Figure 6.4: System finding the mid-ranges for calculating the distance of the object
from the camera.
Description: After detecting the object, frames are drawn over the objects, and
those frames carry the depth values that are used to calculate the distance of the
object from the camera.
These calculated mid-ranges are used for the distance calculation, and using voice
modules, alerts are sent to the user based on the safe and unsafe
distances.
7.1 CONCLUSION
Previous studies have proposed a number of methods to detect objects. After
doing the literature survey, different techniques were found for the detection and
recognition of objects, and they use different types of data as input for their
methodology. After surveying these methods, it was found that the SSD
architecture model trained on the COCO dataset is an easy method which can be
readily applied and is appropriate in all conditions. We decided to explore this
computer vision method and proposed a novel method to detect and recognize
objects based on TensorFlow, finding the distance and sending the output through
voice assistance such as a speaker. With this, a blind person can carry out day-to-day
object detection and recognition without depending on others and will be alerted
by the voice outputs.
REFERENCES
[1] Aditya Raj, "Model for Object Detection using Computer Vision and Machine
Learning for Decision Making," International Journal of Computer Applications,
2019.
[2] Bhumika Gupta, "Study on Object Detection using Open CV Python,"
International Journal of Computer Applications Foundation of Computer Science,
vol. 162, 2017.
[3] Abdul Muhsin M, "Online Blind Assistive System using Object Recognition,"
International Research Journal of Innovations in Engineering and Technology, vol.
3, pp. 47-51, 2019.
[4] "OpenCV," [Online]. Available: https://opencv.org/ .
[5] "Python programming language," [Online]. Available: https://www.python.org/.
[6] "TensorFlow," [Online]. Available: https://www.tensorflow.org/ .
WEBSITES
[1] www.python.org
[2] wiki.python.org
TEXTBOOKS
[1] Computer Vision: Algorithms and Applications
[2] Learning with Python - How to Think Like a Computer Scientist
Fig 1: Objects for object recognition, consisting of a dog and a duck on the beach
Along with the object finding, we have used an alert framework where the distance is calculated. Whether the
blind person is especially close to the object or far away at a safe spot, it will produce voice-based outputs
along with distance units.
The backend of the application is where the video clip is sent and taken as input; it goes through the object
detection model trained on the COCO dataset, one of the predefined datasets, which tests and detects with
accurate metrics.
After testing, the output of the application is sent to the voice modules: the class of the object is converted
into default voice notes, which can then be sent to the users for their needs.
III. LITERATURE SURVEY
OBJECT DETECTION USING CONVOLUTIONAL NEURAL NETWORK
In 2019, "Object Detection using Convolutional Neural Networks" was published. Vision systems are essential in
building a mobile robot that will complete a certain task like navigation, surveillance, or explosive ordnance
disposal (EOD). A project was proposed based on CNNs, which are used to detect objects in the environment.
Methodology used - two state-of-the-art models are compared for object detection: a Single Shot Multi-Box
Detector (SSD) with MobileNetV1, and a Faster Region-based Convolutional Neural Network (Faster-RCNN)
with InceptionV2.
IMAGE BASED REAL TIME OBJECT DETECTION ALONG WITH RECOGNITION IN IMAGE PROCESSING
In 2019, "Image Based Real Time Object Detection and Recognition in Image Processing" noted that object
detection and tracking, mainly of humans and vehicles, is presently a most active research topic, used in
applications such as surveillance and image retrieval. A solution was proposed which reviewed recent
technologies for each phase of object detection. The methodology used here is four different methods for
object detection (a computer technology related to computer vision and image processing that deals with
detecting instances of semantic objects of a certain class in digital images and videos): feature-based
detection, region-based detection, outline-based detection, and model-based detection.
SALIENT OBJECTS DETECTING WITH SEGMENT FEATURES USING MEAN SHIFT TECHNOLOGY
In 2020, "Salient Object Detection with Segment Features Using Mean Shift Algorithm" appeared. Object
recognition has attracted high attention for its diverse applications in everyday life, such as surveillance and
image retrieval. A solution was proposed which introduced a new fast method for detecting salient objects
within images. The main aim was the detection of objects in complex images. The methodology used has
four steps: regional feature extraction, segment clustering, saliency score computation, and post-processing.
REAL-TIME OBJECT DETECTION USING DEEP LEARNING
In 2020, "Real-Time Object Detection Using Deep Learning" was published. Detecting and recognizing objects
in images and videos is a major task today, and a solution was proposed using deep learning. The methodology
used here includes feature extraction with the help of Darknet-53, along with feature-map upsampling and
concatenation. The model includes various changes in object detection techniques.
(Published in the International Research Journal of Modernization in Engineering Technology and Science,
a peer-reviewed, open access, fully refereed international journal, e-ISSN: 2582-5208, Volume 04, Issue 03,
March 2022, www.irjmets.com.)
ASSISTIVE OBJECT FINDING SYSTEM FOR VISUALLY IMPAIRED PEOPLE
In 2020, "Assistive Object Recognition/Finding System for the Visually Impaired" addressed the issue of visual
impairment or blindness faced worldwide. A solution was proposed where two cameras placed on the blind
person's glasses, a GPS-free service, and ultrasonic sensors are employed to give information about the
surroundings. The methodology used here: the system takes real-time images as input, the images are
pre-processed based on the job, their background and foreground are separated, and then a DNN module
with a pre-trained YOLO model is applied for feature extraction.
IV. PROBLEM FINDING
The estimated number of visually impaired people in the world is more than 290 million; of these, 42% are
blind and 58% have low vision. They are an important part of our society. It is very difficult for them to live in
the outside world independently. Today, in our fast-moving society, visually impaired people require
supportive instruments in their day-to-day life. Our thought primarily centered on developing an assistive
framework for visually impaired people to detect objects effectively, which can help them live independently.
PROBLEM DESCRIPTION
The system is designed in such a way that a mobile application captures real-time objects and sends them to
a laptop-based networked server where all the important computations take place. Utilizing a pre-trained
SSD detection model trained on the COCO dataset, the objects are detected and recognized by the system.
After that, the distance is calculated, and the output is in audio form, where the system gives warnings
along with the calculated distance.
V. EXISTING SYSTEM
Many computer vision systems exist nowadays to help people who are visually impaired in their life.
These include Augmented Reality wearable goggles, video calling applications for the visually impaired to
ask for assistance, AI- and GPS-based navigation systems, etc. These systems are developed to work in
specific cases or conditions, and cannot be used broadly. There are cases where people with visual
impairment need to be made aware of their surroundings, which is not possible with the existing systems.
DISADVANTAGES OF EXISTING SYSTEM
They are expensive.
Most visually impaired people cannot afford such highly priced products.
These systems may be complex in functionality, making them difficult for blind people to use.
Some systems are not real-time.
VI. PROPOSED SYSTEM
In this proposed system, we are using Python with a TensorFlow-based approach to solve the problem of
object detection in an end-to-end fashion. We used the SSD detection model, based on deep neural networks,
for effective detection, and the OpenCV library for real-time picture capturing. Among the ImageNet, Google
Open Images, and COCO datasets, we are using COCO since it provides classified features for more than 90%
of real-world objects. The image is sent as input to the model; meanwhile, the distance is calculated using
depth estimation, and with the help of voice modules predefined in Python, the object name is converted
into default voice notes which are sent to the blind person, along with the calculated distance, for their help.
Certificate of Publication
This is to certify that author “S P Jeswanth” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief
Certificate of Publication
This is to certify that author “Billu Nagamani” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief
Certificate of Publication
This is to certify that author “T Anuja Chowdary” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief
Certificate of Publication
This is to certify that author “Madhan Kaveripakam” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief
Certificate of Publication
This is to certify that author “Nama Chandu” with paper ID
“IRJMETS40300080316” has published a paper entitled “OBJECT
DETECTION AND RECOGNITION USING TENSORFLOW FOR
BLIND PEOPLE” in International Research Journal Of Modernization
In Engineering Technology And Science (IRJMETS), Volume 4, Issue
03, March 2022
Editor in Chief