COMMUNICATION SYSTEM FOR DIFFERENTLY ABLED PEOPLE
A Project Phase II report submitted in partial fulfilment of the requirements for the degree of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
BY
Mr. Subodh Ravindra Jangale
Mr. Shreeyansh Deepak Bandal
Under the guidance of
PROF. A. R. BABHULGOANKAR
DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY, LONERE
CERTIFICATE
This is to certify that the Project Phase II report on "Communication System for Differently Abled People", submitted by Mr. Subodh Ravindra Jangale (10303320181124610009) and Mr. Shreeyansh Deepak Bandal (10303320181124613001) in partial fulfilment of the requirements of the degree of Bachelor of Technology in Information Technology of Dr. Babasaheb Ambedkar Technological University, Lonere, is a bonafide work carried out during the academic year 2021-2022.
EXAMINER:
1.
2.
Date: 12 July 2022
Place: Lonere
ACKNOWLEDGEMENT
This work is not an individual contribution; we take this opportunity to thank everyone who helped bring it to completion. Special thanks go to our guide, Prof. A. R. Babhulgoankar, for leading us to many new insights, for his encouragement, and for teaching us how to get to the root of a problem.
We would also like to thank all our friends and well-wishers for their support during the demanding period of this work, and our wonderful colleagues for listening to our ideas, asking questions, and providing feedback and suggestions for improving them.
ABSTRACT
According to the World Health Organization (WHO), 466 million people across the world have disabling hearing loss, of whom around 34 million are children. There are only about 250 certified sign language interpreters in India for a deaf population of around 7 million. Given these statistics, the need for a tool that enables a smooth flow of communication between hearing people and people with speech or hearing impairment is very high. Our application supports a two-way conversation: it deploys machine learning and deep learning models to convert sign language to speech and text, while the other party can speak or type a response, which is then shown to the impaired person as text. Users can also make use of the built-in tutorials to learn the basic functioning of the application and of ASL. The system eliminates the need for an interpreter, allows the traditional pen-and-paper methods to be discarded, and thereby automates communication, providing a solution to the hurdles faced by hearing- and speech-impaired people.
Contents
1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Literature Survey 3
3 Problem Definition 5
4 Proposed System 6
5 Existing Systems 8
6 System Specification 10
6.1 System Requirement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
6.1.1 Hardware Requirements . . . . . . . . . . . . . . . . . . . . . . 10
6.1.2 Software Requirements . . . . . . . . . . . . . . . . . . . . . . . 10
6.2 System Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7 System Design 12
7.1 Modules in the system . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
7.2 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
7.3 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.4 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8 Implementation 16
8.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
8.2 Proposed Hand Gesture Recognition System . . . . . . . . . . . . . . . . 17
8.2.1 Camera module . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8.2.2 Detection module . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8.2.3 Interface module . . . . . . . . . . . . . . . . . . . . . . . . . . 19
v
8.3 Proposed method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
8.3.1 MediaPipe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
8.3.2 Noise removal and Image smoothening . . . . . . . . . . . . . . 21
8.3.3 Long Short Term Memory(LSTM) . . . . . . . . . . . . . . . . . 22
8.3.4 Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
8.3.5 Contour Extraction . . . . . . . . . . . . . . . . . . . . . . . . . 24
8.3.6 Convex hull and Convexity defects . . . . . . . . . . . . . . . . . 25
8.3.7 Haar Cascade Classifier . . . . . . . . . . . . . . . . . . . . . . 26
8.3.8 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
8.3.9 Firebase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
8.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
References 32
List of Figures
7.1 System Architecture for Sign Language Recognition Using Hand Gestures. . . . 13
7.2 Use Case Diagram for Sign Language Recognition Using Hand Gestures. 14
7.3 Activity Diagram for Sign Language Recognition Using Hand Gestures. . 15
Chapter 1
Introduction
1.1 Overview
One of the major problems faced by people who are unable to speak is that they cannot express their emotions freely. They cannot utilize the voice recognition and voice search systems in smartphones, since no audio input can be given. They are also unable to use AI personal assistants such as Google Assistant or Apple's Siri, because all of those applications are voice controlled.
There is a need for a platform for such people. American Sign Language (ASL) is a complete, complex language that employs signs made by moving the hands combined with facial expressions and postures of the body. It is the primary language of many North Americans who are unable to talk and is one of several communication alternatives used by people who are deaf or hard of hearing.
While sign language is essential for deaf-mute people to communicate, both with hearing people and among themselves, it still receives little attention from the hearing population. The importance of sign language tends to be ignored unless one is directly concerned with deaf-mute individuals. One way to talk with deaf-mute people is through the mechanisms of sign language.
Hand gestures are one of the methods used in sign language for non-verbal communication. They are most commonly used by people with hearing or speech disorders to communicate among themselves or with others. Various sign language systems have been developed by manufacturers around the world, but they are neither flexible nor cost-effective for end users.
1.2 Scope
One of the solutions for communicating with deaf-mute people is to use the services of a sign language interpreter, but interpreters can be expensive. A cost-effective solution is required so that deaf-mute and hearing people can communicate normally and easily.
Our strategy is to implement an application that detects predefined American Sign Language (ASL) through hand gestures. Detecting the movement of a gesture requires only basic hardware, namely a camera and its interface. Our application is a comprehensive, user-friendly system built on the PyQt5 module.
Instead of using technology like gloves or a Kinect, we try to solve this problem using state-of-the-art computer vision and machine learning algorithms.
The application comprises two core modules: one simply detects a gesture and displays the corresponding alphabet letter; the other stores the scanned frame into a buffer after a certain interval so that a string of characters can be accumulated, forming a meaningful word.
Additionally, an add-on facility lets the user build a custom gesture for a special character such as a period (.) or any other delimiter, so that whole sentences, and in turn paragraphs, can be formed. Whatever the predicted outcome is, it is stored into a .txt file.
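The buffering and export behaviour described above can be sketched in a few lines. This is a minimal illustration, not the application's actual code; the class name, the "." delimiter default, and the output path are illustrative assumptions:

```python
class GestureBuffer:
    """Accumulates per-frame predicted characters into words and sentences.

    Illustrative sketch: names and the '.' delimiter are assumptions,
    not the application's real identifiers.
    """

    def __init__(self, delimiter: str = "."):
        self.delimiter = delimiter
        self.chars: list[str] = []

    def add(self, ch: str) -> None:
        # Each scanned gesture character is appended to the previous results.
        self.chars.append(ch)

    def flush_to_file(self, path: str) -> str:
        # Export the accumulated stream to an ASCII .txt file.
        text = "".join(self.chars)
        with open(path, "w") as f:
            f.write(text)
        return text

buf = GestureBuffer()
for ch in ["H", "I", "."]:   # predicted characters, ending at the delimiter
    buf.add(ch)
print(buf.flush_to_file("out.txt"))
```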
1.3 Objectives
1. Eliminate the need for an interpreter.
2. Ease the communication flow for hearing/speech impaired people through our model predictions and text-to-speech system.
3. Ability to create new signs for any text or sentence in the browser (client side).
Chapter 2
Literature Survey
Many classification methods are available, and many papers have been published on sign language recognition. Based on these, a literature survey was carried out: we searched for papers in IEEE transactions and went through them.
[1] S. Suresh, H. T. P. Mithun and M. H. Supriya, "Sign Language Recognition System Using Deep Neural Network," 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), 2019, pp. 614-618, doi: 10.1109/ICACCS.2019.8728411.
[2] Suharjito, R. Anderson, F. Wiryana, M. Ariesta and I. G. P. Kusuma Negara, "Sign Language Recognition Application Systems for Deaf-Mute People: A Review Based on Input-Process-Output," Procedia Computer Science, vol. 116, pp. 441-448, 2017, doi: 10.1016/j.procs.2017.10.028.
Sign Language Recognition is a breakthrough for helping deaf-mute people and has been researched for many years. Unfortunately, every study has its own limitations, and none is yet usable commercially. Some research has succeeded in recognizing sign language, but requires an expensive setup to be commercialized. Nowadays, researchers pay more attention to developing Sign Language Recognition that can be used commercially, and they approach it in various ways, starting with data acquisition. Acquisition methods vary because good devices are costly, yet a cheap method is needed if a Sign Language Recognition system is to be commercialized. The recognition methods also vary between researchers: each method has its own strengths and limitations compared to the others, and researchers continue to use different methods in developing their own systems.
[3] G. A. Rao, K. Syamala, P. V. V. Kishore and A. S. C. S. Sastry, "Deep convolutional neural networks for sign language recognition," 2018 Conference on Signal Processing And Communication Engineering Systems (SPACES), 2018, pp. 194-197, doi: 10.1109/SPACES.2018.8316344.
Extraction of complex head and hand movements along with their constantly changing shapes for recognition of sign language is considered a difficult problem in computer vision. This paper proposes the recognition of Indian sign language gestures using a powerful artificial intelligence tool, convolutional neural networks (CNN). Selfie-mode continuous sign language video is the capture method used in this work, so that a hearing-impaired person can operate the SLR mobile application independently. Due to the non-availability of datasets on mobile selfie sign language, the authors created a dataset with five different subjects performing 200 signs from 5 different viewing angles under various background environments, each sign occupying 60 frames of a video. CNN training was performed with 3 different sample sizes, each consisting of multiple sets of subjects and viewing angles, and the remaining 2 samples were used for testing the trained CNN. Different CNN architectures were designed and tested with this selfie sign language data to obtain better recognition accuracy.
Chapter 3
Problem Definition
"To design a system for sign language recognition using hand gestures."
The traditional methods of communicating with the deaf and mute are inconvenient in many respects, and the available alternatives for breaking this barrier have definite flaws. An interpreter is not always available, and that option is not cost-efficient either. The pen-and-paper method is unprofessional and time-consuming. Texting and messaging help to a certain extent but do not tackle the bigger problem at hand. There is therefore a grave need for a solution that effectively breaks down this barrier to communication.
Given a hand gesture, the goal is to implement an application that detects predefined American Sign Language (ASL) in real time, provides a facility for the user to store the detected characters in a .txt file, and allows users to build their own customized gestures, so that people who cannot talk vocally are accommodated with technological assistance and the barrier to expressing themselves is overcome.
Chapter 4
Proposed System
The proposed study aims to develop a system that recognizes static sign gestures and converts them into corresponding words. A vision-based approach using a web camera is introduced to obtain the data from the signer, and the system can be used offline. The system is also intended to serve as a learning tool for those who want to know the basics of sign language, such as alphabets, numbers, and common static signs. The proponents provided a white background and a specific location for image processing of the hand, thus improving the accuracy of the system, and used a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) as the recognizers of the system. The scope of the study includes basic static signs, numbers, the ASL alphabet (A-Z) and gestures. One of the main features of this study is the ability of the system to form simple words by fingerspelling and to understand gestures without the use of sensors or other external technologies.
For the purpose of the study, we used some of the gestures in ASL. Fig. 4.1 represents the ASL gestures that are fed to the system, including gestures such as "hello", "bye", and "how are you". ASL is also strict about the angle of the hands while gesturing, and the accompanying facial movement may differ for another gesture; this affects the accuracy of the system.
Figure 4.1: ASL gestures
Chapter 5
Existing Systems
1. The first sign-language glove to gain any notoriety came out in 2001. A high-school student from Colorado, Ryan Patterson, fitted a leather golf glove with 10 sensors that monitored finger position, then relayed finger spellings to a computer which rendered them as text on a screen. In 2002, the public-affairs office of the National Institute on Deafness and Other Communication Disorders effused about Patterson's invention. However, the glove does not translate anything beyond individual letters, certainly not the full range of signs used in American Sign Language, and works only with the American Manual Alphabet.
2. MotionSavvy is building a tablet which detects when a person is using ASL and
converts it to text or voice. The software also has voice recognition through the
tablet’s mic, which allows a hearing person to respond with voice to the person
signing. It then converts their voice into text, which the hearing-impaired receiver
can understand.
3. The application LingoJam only translates alphabetically. Manual sign language, i.e. fingerspelling, is followed rather than the universal sign language: each letter is translated as-is and displayed as text only.
4. The CAS-PEAL face database [14] was developed by the Joint Research and Development Laboratory (JDL) for Advanced Computer and Communication Technologies of the Chinese Academy of Sciences (CAS), under the support of the Chinese National Hi-Tech Program and ISVISION Tech. Co. Ltd. The CAS-PEAL face database was constructed to provide researchers with a large-scale Chinese face database for studying, developing, and evaluating their algorithms. Its large-scale face images with different sources of variation, namely Pose, Expression, Accessories, and Lighting (PEAL), were used to advance state-of-the-art face recognition technologies. The database contains 99,594 images from 1,040 individuals (595 males and 445 females). For each subject, nine equally spaced cameras in a horizontal semicircular layout were set up to capture images across different poses in one shot; each subject was also asked to look up and down to capture 18 more images in another two shots. The developers also considered five kinds of expressions, six kinds of accessories (three goggles and three caps), and fifteen lighting directions, along with varying backgrounds, distances from the cameras, and aging.
Chapter 6
System Specification
6.1 System Requirement
6.1.1 Hardware Requirements
• 512 MB RAM.
• Any external or inbuilt camera with a minimum resolution of 200 x 200 pixels (300 ppi or 150 lpi); 4-megapixel cameras and up.
6.2 System Features
• Real-time American standard character detection based on the gesture made by the user.
• Customized gesture generation.
• Forming a stream of sentences based on the gestures made after a certain interval of time.
Chapter 7
System Design
7.1 Modules in the system
• Scan Single Gesture – A gesture scanner will be available in front of the end user, where the user performs a hand gesture. Based on the pre-processed module output, the user sees the label associated with each hand gesture, following the predefined American Sign Language (ASL) standard, inside the output window.
• Create gesture – The user gives a desired hand gesture as input to the system and types into the text box at the bottom of the screen whatever he/she wishes to associate that gesture with. This customized gesture is then stored for future use and will be detected from then on.
• Formation of a sentence – The user selects a delimiter; until that delimiter is encountered, every scanned gesture character is appended to the previous results, forming a stream of meaningful words and sentences.
• Exporting – The user can export the results of the scanned characters into an ASCII-standard text file.
7.2 System Architecture
Figure 7.1: System Architecture for Sign Language Recognition Using Hand Gestures.
7.3 Use Case Diagram
Figure 7.2: Use Case Diagram for Sign Language Recognition Using Hand Gestures.
7.4 Activity Diagram
Figure 7.3: Activity Diagram for Sign Language Recognition Using Hand Gestures.
Chapter 8
Implementation
The basic goal of Human Computer Interaction is to improve the interaction between users
and computers by making the computer more receptive to user needs. Human Computer
Interaction with a personal computer today is not just limited to keyboard and mouse
interaction. Interaction between humans comes from different sensory modes like gesture,
speech, facial and body expressions. Being able to interact with the system naturally is
becoming ever more important in many fields of Human Computer Interaction. Both non-
vision and vision based approaches have been used to achieve hand gesture recognition.
An example of a non-vision based approach is the detection of finger movement with a
pair of wired gloves. In general vision based approaches are more natural as they require
no hand devices. The literature classifies hand gestures into two types: static and dynamic gestures. Static hand gestures are those in which the position and orientation of the hand in space do not change for an amount of time; if there are changes within the given time, the gestures are called dynamic. Dynamic hand gestures include gestures like waving of the hand, while static hand gestures include joining the thumb and the forefinger to form the "Ok" symbol.
Several earlier systems performed input extraction through data gloves; for example, a hand belt with a gyroscope, an accelerometer and Bluetooth was deployed to read hand movements. Other authors used a Creative Senz3D camera to capture both colour and depth information, a Bumblebee2 stereo camera, or a monocular camera. Cost-efficient models like [8] and [9] have implemented their systems using simple web cameras. Some methods make use of a Kinect depth RGB camera to capture the colour stream, since depth cameras provide additional depth information for each pixel (depth images) at frame rate along with the traditional images. Most such technologies allow a hand region to be extracted robustly by utilizing the colour space, but they do not fully solve the background problem; one system resolved it using a black-and-white pattern of augmented reality markers (a monochrome glove). While inbuilt webcams do not give depth information, they require less computing cost. Hence, in our model we used the webcam available in the laptop, without any additional cameras or hand markers such as gloves.
A large number of methods have been utilized for pre-processing the image, including algorithms for noise removal, edge detection, and smoothening, followed by different segmentation techniques for boundary extraction, i.e. separating the foreground from the background. Some authors used a morphology algorithm that performs image erosion and image dilation to eliminate noise, with a Gaussian filter to smoothen the contours after binarization. To perform segmentation, in [6] a depth map was calculated by matching the left and right images with the SAD (Sum of Absolute Differences) algorithm, and the Theo Pavlidis algorithm, which visits only the boundary pixels, was used to find the contours; this method brings down the computational cost. In one approach the biggest contour was chosen as the contour of the hand palm, after which it was simplified using polygonal approximation. Classification is a process in which individual items are grouped based on the similarity between them. One approach uses a Euclidean-distance-based classifier to recognise 25 hand postures; a Support Vector Machine (SVM) classifier has also been used. We deviate from these traditional methods by not using any hand markers such as gloves for gesture recognition. In our model, we used the webcam available in the laptop without any additional cameras, making the system cost-effective. Thus our system finds applications in day-to-day systems.
Figure 8.1: Back End Architecture.
8.2.3 Interface module
This module is responsible for mapping the detected hand gestures to their associated ac-
tions. These actions are then passed to the appropriate application. The front end consists
of three windows. The first window consists of the video input that is captured from the
camera with the corresponding name of the gesture detected. The second window dis-
plays the contours found within the input images. The third window displays the smooth
thresholded version of the image. The advantage of adding the threshold and contour
window as a part of the Graphical User Interface is to make the user aware of the back-
ground inconsistencies that would affect the input to the system and thus they can adjust
their laptop or desktop web camera in order to avoid them. This would result in better
performance.
8.3 Proposed method
We propose a markerless gesture recognition system that follows a very efficient methodology, as shown in the figure.
8.3.1 MediaPipe
Hand signs are tracked with the help of MediaPipe. MediaPipe powers revolutionary products and services we use daily. Unlike power-hungry machine learning frameworks, MediaPipe requires minimal resources; it is so small and efficient that even embedded IoT devices can run it.
MediaPipe is a framework for building machine learning pipelines for processing time-series data like video and audio. This cross-platform framework works on desktop/server, Android, iOS, and embedded devices like the Raspberry Pi and Jetson Nano.
Graphs
The MediaPipe perception pipeline is called a graph. Take the example of the first solution, Hands: we feed in a stream of images as input, and it comes out with the hand landmarks rendered on the images. The flow chart below represents the MediaPipe hand solution graph.
Figure 8.3: Hand landmarks
Calculators
Packets of data (a video frame or an audio segment) enter and leave through the ports of a calculator. When initialized, a calculator declares the packet payload type that will traverse each port. Every time a graph runs, the framework invokes the Open, Process, and Close methods of its calculators: Open initiates the calculator, Process runs repeatedly as packets arrive, and Close is called after the entire graph run. For example, a calculator such as ImageTransform takes an image at its input port and returns a transformed image at its output port, while a calculator such as ImageToTensor takes an image as input and outputs a tensor.
Figure 8.5: Process of cropping and converting RGB input image to grey scale
$$G_0(x, y) = A \, e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2} - \frac{(y-\mu_y)^2}{2\sigma_y^2}}$$
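A sampled smoothing kernel can be built directly from this Gaussian. The sketch below (kernel size and sigmas are illustrative choices) normalizes the kernel so its weights sum to 1 before it is convolved with the image:

```python
import numpy as np

def gaussian_kernel(size: int = 5, sigma_x: float = 1.0, sigma_y: float = 1.0):
    # Sample G0(x, y) on a size x size grid centred at (mu_x, mu_y) = (0, 0).
    ax = np.arange(size) - size // 2
    X, Y = np.meshgrid(ax, ax)
    G = np.exp(-(X ** 2) / (2 * sigma_x ** 2) - (Y ** 2) / (2 * sigma_y ** 2))
    # Normalize so that smoothing preserves the overall image brightness
    # (this absorbs the amplitude constant A).
    return G / G.sum()

K = gaussian_kernel()
print(K.shape, bool(K[2, 2] == K.max()))
```

The centre weight is the largest, and the weights fall off with distance, which is exactly the behaviour that blurs away pixel-level noise while keeping the hand's overall shape.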
8.3.3 Long Short Term Memory (LSTM)
An LSTM has four "gates": forget, remember, learn, and use (output).
Step 1: When the inputs enter the LSTM, they go into either the forget gate or the learn gate. The long-term information goes into the forget gate, where the irrelevant parts of it are forgotten. The short-term information and the current event go into the learn gate, which decides what information will be learned.
Step 2: Information that passes the forget gate (i.e. is not forgotten) and information that passes the learn gate (i.e. is learned) go to the remember gate, which forms the new long-term memory, and to the use gate, which updates the short-term memory and is the output of the network.
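In the standard formulation, these four gates correspond to the forget, input, candidate, and output transformations of an LSTM cell. A minimal single-step sketch in NumPy follows; the dimensions and random weights are illustrative, not our trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b stack the parameters of the four gates in the order:
    # forget, learn (input), candidate, use (output).
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])          # forget gate: drop irrelevant long-term info
    i = sigmoid(z[H:2 * H])      # learn gate: decide what new info to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate content to be learned
    o = sigmoid(z[3 * H:4 * H])  # use gate: what to expose as output
    c = f * c_prev + i * g       # remember: new long-term memory (cell state)
    h = o * np.tanh(c)           # new short-term memory / network output
    return h, c

rng = np.random.default_rng(0)
H, D = 8, 4                      # hidden size and input size (illustrative)
x = rng.standard_normal(D)
h, c = np.zeros(H), np.zeros(H)
W = 0.1 * rng.standard_normal((4 * H, D))
U = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, bool(np.all(np.abs(h) < 1.0)))
```

In practice a deep-learning framework's LSTM layer performs this step over the whole sequence of frames; the sketch only makes the gate arithmetic explicit.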
8.3.4 Thresholding
Thresholding, a simple segmentation method, is then carried out to obtain a binary image from the grayscale image. The thresholding technique compares each pixel intensity value (I) with a threshold value (T): if I < T, the pixel is replaced with a black pixel, and if I > T, with a white pixel. A threshold value of T = 127 is used in our work to classify the pixel intensities in the grayscale image, and 255 is the maximum value assigned to a pixel that passes the threshold. The two types of thresholding implemented are Inverted Binary Thresholding and Otsu's Thresholding. Inverted Binary Thresholding inverts the colors, producing a white object on a black background. This thresholding operation can be expressed as:
$$\text{Dest}(x, y) = \begin{cases} 0, & \text{if } src(x, y) \geq T \\ 255, & \text{otherwise} \end{cases}$$
So, if the pixel intensity src(x, y) is greater than the threshold value T, the new intensity of the pixel is set to 0; otherwise it is set to maxVal. Otsu's method [20], due to Nobuyuki Otsu, achieves clustering-based image thresholding: Otsu binarization automatically calculates a threshold value from the image histogram of a bimodal image, i.e. an image whose histogram has two peaks. In Otsu's method we try to find the threshold t that minimizes the intra-class variance (the variance within the classes), defined as a weighted sum of the variances of the two classes:
$$\sigma_w^2(t) = \omega_0(t)\,\sigma_0^2(t) + \omega_1(t)\,\sigma_1^2(t)$$
$$\omega_0(t) = \sum_{i=0}^{t-1} p(i), \qquad \omega_1(t) = \sum_{i=t}^{L-1} p(i)$$
Otsu showed that minimizing the intra-class variance is equivalent to maximizing the inter-class variance. This is expressed in terms of the probabilities ω and the class means µ, which can be written as:
$$\mu_0(t) = \sum_{i=0}^{t-1} \frac{i\,p(i)}{\omega_0(t)}, \qquad \mu_1(t) = \sum_{i=t}^{L-1} \frac{i\,p(i)}{\omega_1(t)}, \qquad \mu_T = \sum_{i=0}^{L-1} i\,p(i)$$
$$\omega_0\mu_0 + \omega_1\mu_1 = \mu_T, \qquad \omega_0 + \omega_1 = 1$$
The class probabilities and means can be computed iteratively, which yields an effective algorithm. Before finding contours, thresholding is applied to the image to achieve higher accuracy. The figure below shows the front-end window that portrays the thresholded version of the user's gesture input.
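The Otsu equations above translate directly into a brute-force implementation. The NumPy sketch below (the toy bimodal image and 256-level histogram are illustrative) picks the threshold that minimizes the within-class variance:

```python
import numpy as np

def otsu_threshold(img: np.ndarray, L: int = 256) -> int:
    """Exhaustively pick t minimizing sigma_w^2(t) = w0*var0 + w1*var1."""
    hist = np.bincount(img.ravel(), minlength=L).astype(float)
    p = hist / hist.sum()                       # p(i): intensity probabilities
    best_t, best_var = 0, np.inf
    for t in range(1, L):
        w0, w1 = p[:t].sum(), p[t:].sum()       # class probabilities
        if w0 == 0 or w1 == 0:
            continue                            # one class empty: skip t
        mu0 = (np.arange(t) * p[:t]).sum() / w0          # class 0 mean
        mu1 = (np.arange(t, L) * p[t:]).sum() / w1       # class 1 mean
        var0 = (((np.arange(t) - mu0) ** 2) * p[:t]).sum() / w0
        var1 = (((np.arange(t, L) - mu1) ** 2) * p[t:]).sum() / w1
        sw = w0 * var0 + w1 * var1              # within-class variance
        if sw < best_var:
            best_t, best_var = t, sw
    return best_t

# Bimodal toy "image": dark background (value 30) and a bright hand (value 200)
img = np.concatenate([np.full(500, 30), np.full(300, 200)]).astype(np.uint8)
print(otsu_threshold(img))
```

OpenCV's `cv2.threshold(..., cv2.THRESH_BINARY + cv2.THRESH_OTSU)` performs this search in optimized form; the sketch just makes the equations concrete.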
Figure 8.7: Front end window that shows the thresholded version of the input gesture
The second step is to draw the contours, which can be used to draw any shape provided its boundary points are known. Some gestures in our recognition system, with their corresponding contours, are shown in the figure below.
$$\text{Convex}(S) = \left\{ \sum_{i=1}^{|S|} \alpha_i x_i \;\middle|\; (\forall i : \alpha_i \geq 0) \wedge \sum_{i=1}^{|S|} \alpha_i = 1 \right\}$$
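This definition can be realized computationally. The self-contained sketch below uses Andrew's monotone chain algorithm in pure Python; it is independent of OpenCV's `cv2.convexHull`, which is what a cv2-based pipeline would typically call on the hand contour:

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order.

    Every point of S lies in the convex combinations of these vertices,
    matching the Convex(S) definition above.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o): > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                     # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):           # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]    # drop duplicated endpoints

print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) is discarded, leaving only the square's corners. Convexity defects are then the places where the hand contour dips away from this hull, e.g. the valleys between extended fingers.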
8.3.9 Firebase
Firebase is a toolset to "build, improve, and grow an app", and its tools cover a large portion of the services that developers would normally have to build themselves but don't really want to, because they would rather focus on the app experience itself. This includes analytics, authentication, databases, configuration, file storage, push messaging, and more. The services are hosted in the cloud and scale with little to no effort on the part of the developer. In our project we used Firebase as the database for storing the result of the program.
Firebase Realtime Database is a cloud-hosted database. "Realtime" means that any changes in data are reflected immediately across all platforms and devices within milliseconds. Most traditional databases make you work with a request/response model, but the Realtime Database uses data synchronization and subscriber mechanisms instead of typical HTTP requests, which allows you to build flexible real-time apps easily, with less effort and without the need to worry about networking code.
Figure 8.9: User Interface
Many apps become unresponsive when they lose the network connection. The Realtime Database provides good offline support because it keeps an internal cache of all the data you have queried; when there is no Internet connection, the app uses the data from the cache and remains responsive. When the device reconnects, the Realtime Database synchronizes the local data changes with the remote updates that occurred while the client was offline, resolving any conflicts automatically. The Realtime Database is a NoSQL database; NoSQL stands for "Not only SQL", i.e. a database that does not adhere to the traditional relational database management system (RDBMS) structure. As such, the Realtime Database has different optimizations and functionality compared to a relational database. It stores the data in JSON format: the entire database is a big JSON tree with multiple nodes.
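As a sketch of how a recognition result could be written as one node of that JSON tree (the record fields, the /results path, and the service-account setup below are illustrative assumptions, not our exact schema):

```python
import json
import time

def make_result_record(gesture: str, confidence: float) -> dict:
    # Illustrative shape of one node in the Realtime Database JSON tree.
    return {
        "gesture": gesture,
        "confidence": round(confidence, 3),
        "timestamp": int(time.time()),
    }

record = make_result_record("hello", 0.9712)
print(json.dumps(record, sort_keys=True))

# With the firebase-admin SDK (requires a service-account key file),
# the write would look like:
#   import firebase_admin
#   from firebase_admin import credentials, db
#   cred = credentials.Certificate("serviceAccountKey.json")
#   firebase_admin.initialize_app(
#       cred, {"databaseURL": "https://<project>.firebaseio.com"})
#   db.reference("/results").push(record)
```

Subscribed clients would then receive the new node within milliseconds, without polling.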
8.4 Results
In our gesture recognition system we have included a total of seven gestures: six static gestures and one dynamic gesture. The static gestures are shown in the figure below. The captions written at the top of each gesture, e.g. "1", "2", denote the number of convexity defects in each gesture; for gestures that do not have any defects, i.e. fist and palm, the name is written as the caption above the gesture.
Figure 8.10: The static gestures used in the gesture recognition system
The first gesture from the left is a "V" sign (number two), which launches the VLC Media Player application. The second is a number-three gesture, which launches the Google home page in the user's default browser, and the third, a number-four gesture, launches the YouTube home page. The fourth gesture is a number-five or open-palm gesture, which in our system closes the application running in the foreground. The fifth gesture is a closed fist, which launches Microsoft PowerPoint, and the sixth and final static gesture is a closed palm, which toggles the Wi-Fi of the computing apparatus.
In addition to the above-mentioned static gestures, the model also provides for a dynamic
gesture. When a moving closed-palm gesture is recognized for 5 continuous frames, it is
considered a dynamic swipe motion. It is used when Microsoft PowerPoint is running in
the foreground, to swipe to the next slide within the presentation.
Our first approach to building a gesture recognition system was background subtraction.
Background subtraction, as the name suggests, is the process of separating foreground
objects from the background in a sequence of video frames; it is a widely used approach
for detecting moving objects from static cameras. When implementing the recognition
system with background subtraction, we encountered several drawbacks and accuracy
issues: it cannot cope with sudden, drastic lighting changes, which leads to several
inconsistencies, and it requires relatively many parameters, which need to be selected
intelligently. Because of these complications, we decided to use contours, convexity
defects and a Haar cascade to detect the object (the hand). The combination of these
methods gave us greater accuracy and overcame the challenges we faced with background
subtraction. To compute the accuracy of our system, we conducted two sets of evaluations. In
the first set of evaluations, we used environments with different kinds of plain backgrounds
free of any inconsistencies. In the second, we used backgrounds with several inconsistencies.
Each gesture was performed 10 times in both environmental setups. The average of the
number of times a particular gesture was recognized correctly was taken as its accuracy,
in percent; the accuracy obtained is shown in Table 1. When run against any plain
background, the gesture recognition system was robust and performed with good accuracy.
This accuracy was maintained irrespective of the colour of the background, provided it was
a plain, solid-colour background devoid of inconsistencies. In cases where the background
was not plain, objects in the background introduced inconsistencies into the image capture
process, resulting in faulty outputs; thus, the accuracy was not as good as in scenarios with
a plain background. After observing the results produced by the gesture recognition system
against different backgrounds, we recommend that the system be used with a plain
background to obtain the best possible results and accuracy.
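The convexity-defect counting used to distinguish the static gestures relies on filtering candidate defects by geometry: a defect counts as the valley between two extended fingers only if the angle at its deepest point is acute enough. A minimal sketch of that filter, assuming the start/end/far defect points have already been extracted (e.g. with OpenCV's cv2.convexityDefects), is:

```python
import math

def count_finger_defects(defects, max_angle_deg=90.0):
    """Count convexity defects whose far-point angle is acute enough to
    be the valley between two extended fingers.

    `defects` is a list of (start, end, far) 2-D point triples, as a
    hull/defect extraction step would produce."""
    count = 0
    for start, end, far in defects:
        a = math.dist(start, end)  # hull edge spanning the defect
        b = math.dist(start, far)
        c = math.dist(end, far)
        # Cosine rule: angle at the far (deepest) point of the defect.
        angle = math.degrees(math.acos((b * b + c * c - a * a) / (2 * b * c)))
        if angle <= max_angle_deg:
            count += 1
    return count
```

With this filter, the fist and closed palm yield zero counted defects, while the number two to number five gestures yield one to four, matching the captions in Fig. 8.10.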
Chapter 9
9.1 Conclusion
In this project we have tried to address some of the major problems that disabled
persons face in communicating. We found that the root cause of why they cannot express
themselves freely is that the other side of the conversation is unable to interpret what
they are trying to say or what message they want to convey. This application therefore
serves anyone who wants to learn and talk in sign language. With it, a user can quickly
learn various gestures and their meanings as per ASL standards, and which alphabet is
assigned to which gesture. In addition, a custom gesture facility is provided along with
sentence formation. The user need not be literate: if they know how to perform a gesture,
they can form it and the appropriate assigned character will be shown on the screen.
Concerning the implementation, we have used the TensorFlow framework with the Keras
API, and for usability the complete front-end is designed using PyQt5. User-friendly
messages are prompted in response to user actions, along with a window showing which
gesture means which character.
• This project currently works on still images; further development could extend it to
detecting motion in a video sequence and mapping it to a meaningful sentence with
TTS assistance.
• The model and text-to-speech could be embedded into a video calling system, allowing
the user to show gestures while the receiver on the call gets the message in the form
of text or speech. When the receiver responds, the message will be relayed to the
hearing/speech-impaired user as text (subtitles).
References
[1] Shobhit Agarwal, “What are some problems faced by deaf and dumb people
while using todays common tech like phones and PCs”, 2017 [Online]. Avail-
able: https://www.quora.com/What-are-some-problems-faced-by-deaf-and-dumb-
people-while-using-todays-common-tech-like-phones-and-PCs, [Accessed April
06, 2019].
[3] M. Ibrahim, “Sign Language Translation via Image Processing”, [Online]. Avail-
able: https://www.kics.edu.pk/project/startup/203 [Accessed April 06, 2019].
[4] NAD, “American sign language-community and culture frequently asked ques-
tions”, 2017 [Online]. Available: https://www.nad.org/resources/american-sign-
language/community-and-culture-frequently-asked-questions/ [Accessed April 06,
2019].
[6] C. Hardie and D. Fahim, “Sign Language Recognition Using Temporal Classifica-
tion”, arXiv, 2017.
Appendices