
Proceedings of the Fourth International Conference on Smart Systems and Inventive Technology (ICSSIT-2022)

IEEE Xplore Part Number: CFP22P17-ART; ISBN: 978-1-6654-0118-0

Virtual Smart Glass for Blind using Object Detection


V.P. Gowtham Raj1, E. Esakki Vigneswaran2, M. Deshnaa3, K. Raj Prasanth4
Assistant Professor 2, Students 1,3,4
Department of Electronics and Communication Engineering
Sri Ramakrishna Engineering College,
Coimbatore, India
esakkivignesh@srec.ac.in

Abstract— One of the prevailing problems today is that vision disabilities are increasing rapidly. A person with a vision difficulty cannot see and recognize people or objects, and therefore has to walk depending on a stick, a guide dog or another person. The main motive of this work is to help the blind by modelling a device that lets them walk like sighted people and overcome their disability. This device, the virtual smart glass, assists them on their way without the need for human help and lets them walk independently. The device is installed on a glass frame, and the glasses help to figure out the surroundings. The object detection output is given as input to the microcontroller to find out whether there is any obstacle in front of the user. If there is an obstacle, the device produces an audio alert for the user.

Keywords— microcontroller, GPU processor, NumPy, OpenCV, MATLAB, YOLO, API, XML, TTS, CoCreate.
I. INTRODUCTION

In recent days, many people suffer from different diseases or are disabled. NCBI reports that 1.5% of the population in Saudi Arabia is blind and 7.8% has vision difficulties. These people require some help to make their life easier and more comfortable, and a new technology is needed to assist them. Thus, the motive of the Smart Glass is to aid them in walking independently and freely. The system consists of a wearable smart glass with object detection using a neural network methodology to identify objects such as persons, rupee notes, QR codes, cars, buses, etc. Individual image-processing description labels are used to detect the objects and are converted into voice commands. Smart glasses are therefore a device that augments what the wearer sees, wherever he or she is physically located or looks.
There are several ways of differentiating the visual information which a wearer perceives. The human brain makes vision seem very easy: it does not have to work hard to identify a person, read a sign, or recognize a face. These tasks are complicated to solve with a computer; they only seem easy because our brains are remarkably good at understanding and recognizing images. Image recognition and object detection is software which helps in the identification of objects, places, people, writing and actions in images. Computers use computer vision technologies in combination with a camera and artificial intelligence software to achieve image recognition. Software for image recognition requires deep machine learning. Convolutional neural networks give the best performance but require a massive amount of processing power.

II. RELATED WORK

In recent times, the number of vision difficulties in the world has increased, which has led to innovation in this area to serve the blind community.

Mayuresh Banne (2020) [5] proposed a system which consists of a box-like structure with a portable camera and a processing system that processes the input image. The pictures are captured with a camera-mounted device performing real-time image recognition with an existing object detection methodology. V. Balaji (2020) [9] explained the idea, methodology and architecture of such a system; he implemented it, obtained immediate results and analyzed the performance of the tools. Comparing these papers, the virtual smart glass can be implemented on any easily portable device so that visually impaired people can detect objects from a particular distance.

Bhumika Gupta (2017) [14] explained that image processing and object detection can be used for several purposes, including video surveillance and image retrieval. In that study, various basic image-processing concepts were applied to object detection using the OpenCV library for Python 2.7, which improves the efficiency and accuracy of object detection. Lijun Yu (2018) [10] reported experimental results showing that their object recognition system has significant effects on the detection and recognition of single-target moving objects and produces accurate output values, giving the concept good prospects for engineering applications.

The above works provide a brief explanation of the object recognition steps offered by various tools; the uniqueness of the present work lies in the audio conversion and the 3D-printed glass frame that have been implemented in order to provide an appropriate solution.

A. Interpreting images and determining each object in the images through TensorFlow

The camera tracks objects using an image processing algorithm, which lets the device receive information on the user's surroundings. The smart glasses' dedicated computing device can also detect more demanding objects.

B. Conversion from Text to Audio

After each object is recognized, it is labelled with text; text-to-voice conversion is then performed using speech synthesis, and the result can be heard through the headphone.

C. Prerequisites

The prerequisites are a processor device with a mini-GPU and a nano camera in the glasses frame. In this smart glass, the software solution plays a wide role using image processing.

The machine learning algorithm is trained to make the glass able to recognize objects in the input images. The training set used in the recognition phase contains the data about objects known to the computer through the training algorithm (for example, YOLO). The classification phase uses the training set to recognize the objects read from the camera's images.
III. PROPOSED WORK

A. Image Input

The detection task is to identify the image content along with labelled boxes. Deep learning vision algorithms such as SSD and Faster R-CNN reach approximately 90 FPS with accurate results on their datasets. Database support for blobs is not universal, but there is no need to use a server because the data is fixed: the things which the smart glass captures and recognizes have particular features and shapes. For example, at home (tables, chairs, doors and so on) and in streets (cars, sidewalks, vehicles and so on) these things have the same features in all places. Therefore, the database for pattern recognition is fixed and can be stored on the device which runs the eyeglasses' application.

B. Audio Input

The detected objects and the distance at which they lie are converted to text. Then, using the prepared audio library for text-to-voice conversion, the voice output is heard through the headphones.

C. Tools Utilized

A stereo-view pre-trained model, along with the necessary packages, covers about 30 trained object classes and is useful for detecting objects. A framework model used to create deep learning networks solves the object detection problem.

Some necessary tools utilized are:
1. Python 3.5
2. Caffe model framework
3. OpenCV
4. NumPy 1.14
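To illustrate how these tools fit together, the sketch below shows a minimal real-time detection loop built on OpenCV's dnn module with a Caffe model. The MobileNet-SSD file names and the 20-class label list are assumptions based on the commonly used public model, not artifacts of this work.

```python
# Minimal sketch of a real-time Caffe-model detection loop with OpenCV.
import cv2
import numpy as np

# Assumed file names for the public MobileNet-SSD Caffe model.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
           "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
           "train", "tvmonitor"]

cap = cv2.VideoCapture(0)          # the mini camera on the glass frame
while True:
    ok, frame = cap.read()
    if not ok:
        break                      # camera unavailable or stream ended
    h, w = frame.shape[:2]
    # SSD expects a 300x300 mean-subtracted, scaled blob
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()     # shape: (1, 1, N, 7)
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:       # keep confident detections only
            label = CLASSES[int(detections[0, 0, i, 1])]
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            print(label, box.astype(int), round(float(confidence), 2))
cap.release()
```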


The pre-trained Python OpenCV model recognizes objects using the camera. It performs real-time object detection by accessing the mini camera efficiently for each frame. The mini camera is pre-installed in the system, and the live stream can be accessed from the mini-cam output. How fast the video frames are processed for detection depends on the speed of the computer's CPU or GPU resources.

The Anaconda command prompt can also be used, provided the commands are run without changing the directories on the path.

D. Object Recognition

The object recognition workflow of the detection model is created to avoid complexity in the system. TensorFlow and OpenCV models are used with the dataset. These deep learning frameworks provide libraries for real-time neural networks in many real-time applications such as medical recognition, face detection and object detection.

E. Speech Synthesis

The process of synthesizing text into voice comments is handled by Google's Text-to-Speech. The API commonly used is the gTTS API, which drives the system speech synthesizer and is available as a Python library. Thus the system converts the detection text into spoken output.
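The gTTS call described above can be sketched as follows; converting a detection label to speech takes two lines, and the playback command at the end is an assumption, since any MP3 player can be substituted.

```python
# Sketch of text-to-speech via the gTTS library.
from gtts import gTTS
import os

def speak(text):
    # gTTS sends the text to Google's TTS service and returns an MP3
    tts = gTTS(text=text, lang="en")
    tts.save("alert.mp3")
    os.system("mpg321 alert.mp3")  # assumed player; any MP3 player works

speak("Person ahead")
```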
Fig. 1. Block Diagram of the Proposed Work

As shown in fig. 1, object detection images and videos are input to the high-GPU board (Raspberry Pi 3), a quad-core processor with roughly 50-60 percent better performance in 32-bit mode and about ten times the speed of the original single-core Raspberry Pi. The camera placed in the camera slot detects images using the detection algorithm; the glasses capture images with this camera to recognize and determine each object in front of the user through computer vision techniques and pattern recognition. The system can calculate the distance between the person and each specific object and convert the information into voice commands understood by the blind user via earphones, alerting them if they are very near to an object.
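The paper does not state the distance formula. One common monocular approximation consistent with this description infers distance from the apparent pixel width of an object whose real-world width is roughly known; the calibration values below are illustrative assumptions.

```python
# Sketch of monocular distance estimation from apparent object width.
KNOWN_WIDTH_M = {"person": 0.5, "car": 1.8, "chair": 0.45}  # typical widths
FOCAL_LENGTH_PX = 700.0   # calibrated once from a reference image

def estimate_distance(label, box_width_px):
    """Distance (m) ~= real width * focal length / width in pixels."""
    if label not in KNOWN_WIDTH_M or box_width_px <= 0:
        return None
    return KNOWN_WIDTH_M[label] * FOCAL_LENGTH_PX / box_width_px

print(estimate_distance("person", 175))   # ~2.0 m
```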


The proposed system is well trained, and its novelty lies in providing accurate detection results in a simplified manner. The input of the virtual glass is taken from the output provided by the strip camera, and the system gives subsequent feedback to the person through the earphones; thus it can help the blind while walking alone in an outdoor environment. It is a form of computing device worn on the head.

When the person wearing the glass moves from his or her position, the camera adjusts so that it gets a clear view according to the new position and orientation. Thus the glasses are smart and unique compared with other computing devices of this kind; they also enable new applications that are not easily supported by others.

IV. METHODOLOGY

Initially the image is processed by several methodologies (detection, extraction and classification) and finally the objects are recognized. Binary images are processed in the form of pixels, which are then extracted and used for real-time binary image processing.

Fig. 2. Object Recognition Methodology

Fig. 2 explains the methodology of the object identification and recognition techniques incorporated in the work.

A. Feature Detection

One of the methodologies employed in the system is feature detection, where computing abstractions of the image information are made and a decision is taken at every point of the image as to whether there is an image feature at that point or not. The resulting features are subsets of the image domain. Feature detection is mostly applied in four ways: edges, corners (interest points), blobs (regions of interest) and ridges.

B. Edges

The boundary between two image regions is considered to be an edge. Edges can include junctions and are mostly arbitrary in shape. In practice, edges are usually defined as sets of points in the image having a strong gradient magnitude; such gradient points can be chained together to form a more complete description of an edge.
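As a small illustration of this gradient-based view of edges, the sketch below runs OpenCV's Canny detector on a grayscale frame; the blur kernel and hysteresis thresholds are illustrative assumptions.

```python
# Sketch of gradient-based edge detection with OpenCV's Canny detector.
import cv2

gray = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # suppress noise before gradients
edges = cv2.Canny(blurred, 50, 150)           # lower/upper hysteresis thresholds
cv2.imwrite("edges.jpg", edges)               # white pixels mark edge points
```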

C. Interest Points

The terms corner and interest point refer to point-like features in a picture (fig. 3) that have a two-dimensional (2D) structure. Early corner detection was performed by first running edge detection and then analyzing the edges to find rapid changes in direction, i.e. corners. However, it was noticed that corners were also being detected on parts of the image which were not corners in the usual sense.

Fig. 3. Corner Interest Points
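A minimal corner detection sketch using OpenCV's Shi-Tomasi detector is given below; the parameter values are illustrative assumptions.

```python
# Sketch of corner (interest point) detection with Shi-Tomasi.
import cv2

gray = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)
# Returns up to maxCorners (x, y) points ranked by corner strength
corners = cv2.goodFeaturesToTrack(gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=10)
if corners is not None:
    for x, y in corners.reshape(-1, 2):
        print(int(x), int(y))   # one interest point per line
```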
D. Feature Extraction

Feature extraction is the method of extracting the useful and crucial information from images. In this step, the image is transformed from the input file into a group of specific features. Features are distinctive properties of the input patterns that help in differentiating between categories of input; deriving new features from the existing initial features reduces the cost and effort of feature measurement and raises the efficiency of the image classifier toward a higher classification accuracy. If the features are carefully chosen during extraction, the classification phase is reported to produce better results.

Several data analysis software packages exist to supply and support feature extraction as well as dimensionality reduction. Commonly used numerical programming environments such as MATLAB and the C language support a number of simpler feature extraction techniques. Fig. 4 illustrates the feature extraction process.
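One concrete way to realize this step in the Python/OpenCV toolchain used here is a keypoint descriptor such as ORB, sketched below. The choice of ORB and its parameters are assumptions, since the paper does not name a specific extractor.

```python
# Sketch of feature extraction with ORB keypoint descriptors.
import cv2

gray = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(nfeatures=500)           # cap the number of keypoints
keypoints, descriptors = orb.detectAndCompute(gray, None)
# Each keypoint gets a 32-byte binary descriptor: the feature vectors
print(len(keypoints), "keypoints; descriptor matrix", descriptors.shape)
```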


Fig. 4. Feature Extraction

E. Classification

Image classification is the methodology of recognizing images using the feature vector which is the output of the feature extraction phase, supported by the training set given to the program. Spectral signatures are obtained from the training samples using the supervised classification method employed to classify the picture (fig. 5). Training samples representing the classes to be extracted can be created easily using the image classification toolbar.

Fig. 5. Detection Result
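A minimal sketch of this style of supervised classification over feature vectors is given below, assigning each input the label of its nearest training sample; the two-dimensional vectors and labels are stand-ins, not the project's dataset.

```python
# Sketch of nearest-neighbour classification over feature vectors.
import numpy as np

train_X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
train_y = ["chair", "chair", "person", "person"]

def classify(feature_vec):
    # Label of the closest training sample in Euclidean distance
    dists = np.linalg.norm(train_X - feature_vec, axis=1)
    return train_y[int(np.argmin(dists))]

print(classify(np.array([0.85, 0.15])))   # -> "chair"
```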
This project uses the Speech Application Programming Interface (SAPI), which allows the use of speech recognition and speech synthesis within the respective applications. Several applications use SAPI, such as MS Office, MS Agent and Microsoft Speech Server. The Speech API is generally a freely redistributable component and can be shipped with any Windows application that uses speech technology. Many other versions of the speech recognition and synthesis engines are also available for free redistribution.

F. Text to Speech Conversion (TTS)

The text-to-speech conversion is built on the COM framework. Steps are available for every new function, demonstrating the use of XML tags to modify speech in the sample. One step creates the assistant voice simply by declaring the instance with CoCreateInstance, whereupon SAPI uses intelligent defaults; this requires a minimal amount of initialization, and the voice can be used immediately.

Here, the defaults are located in the Speech properties of the Control Panel, which include a variety of voices in several languages, including native ones such as English, Japanese, etc. The speech properties can also be set to default values programmatically. The voice can further be modified by methods such as applying XML commands to the stream; in this situation, a negative relative rating lowers the pitch of the voice.
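Driving SAPI from Python can be sketched as below (Windows only, via the pywin32 package), including an inline XML tag that lowers the pitch as described above.

```python
# Sketch of SAPI speech synthesis with inline XML pitch markup.
import win32com.client   # pywin32

voice = win32com.client.Dispatch("SAPI.SpVoice")   # default system voice
voice.Speak("Obstacle ahead")
# SVSFIsXML flag (value 8) tells SAPI to parse the inline XML markup;
# a negative relative pitch value lowers the voice
voice.Speak('<pitch middle="-5">Obstacle very close</pitch>', 8)
```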
V. RESULT ANALYSIS

Fig. 6. Developed Hardware Setup

Fig. 7. Hardware Model

Fig. 6 shows the hardware setup of the project: the outline skeleton of the glass frame has been designed as a 3D-printed frame in which the camera has been set up to recognize the objects detected in front of the model. Earphones are built into the side and deliver the speech output of the detected objects generated through TTS.

Object recognition is done by extracting features from each object and comparing them with the objects in the training set. After segmentation in 2D and triangulation in 3D, each object is associated with its Z-dimension values to calculate the distance between the blind user and every object within the image. The results of distance measurement and recognition are converted to text and then to voice, which helps the blind user overcome obstacles.

The virtual smart glass (fig. 7) is a guidance device that shows the blind user the way to traverse.


This device ensures that a person with vision difficulty who uses it can walk almost like a sighted person. Since the system requires some amount of time to process the detected values and guide the user, he or she needs to walk at a relatively low speed.

TABLE I. COMPARISON METRICS

Metrics            YOLO     YOLO V3
Precision (%)      83.12    93.27
Recall (%)         81.04    92.11
Inference Time     5.0      2.6
Table I shows the comparison metrics of YOLO and YOLO V3. From this table it is evident that YOLO V3 produces much higher accuracy. YOLO V3 is therefore employed in the proposed project, and its detections are fed into the system to produce the sound signal of the corresponding word.
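For reference, a minimal sketch of loading YOLO V3 through OpenCV's Darknet reader, consistent with the toolchain above; the standard public yolov3.cfg and yolov3.weights file names are assumptions.

```python
# Sketch of running YOLO V3 inference via OpenCV's Darknet reader.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
img = cv2.imread("frame.jpg")
# YOLO V3 expects RGB input scaled to [0, 1] at 416x416
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
outs = net.forward(net.getUnconnectedOutLayersNames())
print([o.shape for o in outs])   # one output array per detection scale
```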
Fig. 8. Detected Objects

The device, with its camera placed in the camera slot on the frame, is an individual computer processing unit compact enough to fit in a pocket; the displays on the eyepieces are transparent and accessible to the software that presents the images of the detected objects. The device is suitable for completely blind people and is also recommended for people with low vision or nyctalopia (night blindness).
VI. CONCLUSION

The main objective of this paper, the need for a voice assistant system for blind people all over the globe, has been explained briefly. The "Virtual Smart Glass for Blind" is a practically feasible device which can be easily worn and used by any person. Fig. 8 shows the detected objects in front of the camera; these detections are converted to audio and delivered to the user. Thus, the smart glass will be a major help to the blind community in guiding them.

REFERENCES

[1] Akey Sungheetha and Rajesh Sharma, "3D Image Processing using Machine Learning based Input Processing for Man-Machine Interaction," Journal of Innovative Image Processing (JIIP), vol. 3, no. 1, pp. 1-6, 2021.
[2] R. Dhaya, "Analysis of Adaptive Image Retrieval by Transition Kalman Filter Approach based on Intensity Parameter," Journal of Innovative Image Processing (JIIP), vol. 3, no. 1, pp. 7-20, 2021.
[3] J. Samuel Manoharan, "Capsule Network Algorithm for Performance Optimization of Text Classification," Journal of Soft Computing Paradigm (JSCP), vol. 3, no. 1, pp. 1-9, 2021.
[4] Hadish Habte Tesfamikael, Adam Fray, Israel Mengsteab, Adonay Semere and Zebib Amanuel, "Simulation of Eye Tracking Control based Electric Wheelchair Construction by Image Segmentation Algorithm," Journal of Innovative Image Processing (JIIP), vol. 3, no. 1, pp. 21-35, 2021.
[5] Mayuresh Banne, Rahul Vhatkar and Ruchita Tatkare, "Object Detection and Translation for Blind People Using Deep Learning," International Research Journal of Engineering and Technology (IRJET), vol. 7, no. 3, 2020.
[6] M. Murali, Shreya Sharma and Neel Nagansure, "Reader and Object Detector for Blind," International Conference on Communication and Signal Processing, India, July 28-30, 2020.
[7] Qi-Chao Mao, Hong-Mei Sun, Yan-Bo Liu and Rui-Sheng Jia, "Mini-YOLOv3: Real-Time Object Detector for Embedded Applications," IEEE Access, vol. 7, 2019.
[8] Mohammad Marufur Rahman, Milon Islam, Shishir Ahmmed and Saeed Anwar Khan, "Obstacle and Fall Detection to Guide the Visually Impaired People with Real Time Monitoring," SN Computer Science, vol. 1, article 219, 2020.
[9] V. Balaji, S. Kanaga Suba Raja, C. J. Raman, S. Priyadarshini, S. Priyanka and S. P. Salaikamalathai, "Real Time Object Detector for Visually Impaired using OpenCV," European Journal of Molecular & Clinical Medicine, ISSN 2515-8260, vol. 7, no. 4, 2020.
[10] Lijun Yu, Weijie Sun, Hui Wang, Qiang Wang and Chaoda Liu, "The Design of Single Moving Object Detection and Recognition System Based on OpenCV," Proceedings of the 2018 IEEE International Conference on Mechatronics and Automation, Changchun, China, August 5-8, 2018.
[11] Hemalatha Vadlamudi, "Evaluation of Object Tracking System using Open-CV in Python," International Journal of Engineering Research & Technology (IJERT), ISSN 2278-0181, vol. 9, no. 9, 2020.
[12] M. Ramalingeswara Rao, Sai Sri Harsha, Alekhya Sai Sree, Srikanth and Sai Chaitanya, "Raspberry Pi based Accessory for Blind People," JETIR, vol. 7, no. 4, April 2020.
[13] L. B. Chen, J. P. Su, M. C. Chen, W. J. Chang, C. H. Yang and C. Y. Sie, "An Implementation of an Intelligent Assistance System for Visually Impaired/Blind People," 2019.
[14] Bhumika Gupta, Ashish Chaube, Ashish Negi and Umang Goel, "Study on Object Detection using Open CV - Python," International Journal of Computer Applications (0975-8887), vol. 162, no. 8, 2017.


[15] Wei Fang, Lin Wang and Peiming Ren, "Tinier-YOLO: A Real-Time Object Detection Method for Constrained Environments," IEEE Access, vol. 8, 2019.
