
MINOR-2 PROJECT REPORT

END-SEM Report

Sign Language Recognition

Submitted By:
Name Roll No SAP Id Branch
Harshita Mittal R134219044 500077452 CSF
Ankur Gupta R110219018 500076127 CCVT
Astha Kumari R110219027 500076107 CCVT
F Rohith Immanuel R134219041 500075157 CSF

Under the guidance of:


Dr. Neelu Jyoti Ahuja
Professor
Department of Systemics

School of Computer Science


UNIVERSITY OF PETROLEUM AND ENERGY STUDIES
B. Tech, 6th Sem, Batch: 2019-23

Approved By:

Project Guide: Cluster Head:

Dr. Neelu Jyoti Ahuja Dr. Neelu Jyoti Ahuja


Table of Contents
1 Abstract
2 Introduction
  2.1 Sign Language
  2.2 Sign Language Detection
    2.2.1 OpenCV
    2.2.2 MediaPipe
    2.2.3 TensorFlow
    2.2.4 RNN/LSTM
3 Literature Review
4 Problem Statement
5 Project Objective
  5.1 Sub-Objective
6 Methodology
7 Experimental Setup
8 UML Diagrams
  8.1 Use Case Diagram
  8.2 Class Diagram
  8.3 Dataflow Diagram
  8.4 Sequence Diagram
9 PERT Chart
10 References and Git Link
Project Title: Sign Language Recognition

I. Abstract
Communication is difficult for those with hearing and speech impairments. The ability to
communicate in sign language can help deaf and hearing individuals interact more effectively.
In this paper, a new sign language recognition technique for detecting the 26 letters of the
alphabet and 6 gestures of sign language is proposed. Using computer vision and neural networks,
we can recognize the signs and produce the corresponding text output. A complicated and expensive
hardware system is no longer needed to identify sign language; a smartphone or a webcam will
suffice. This is achieved using Google's MediaPipe framework, released in 2019, together with a
recurrent neural network model.
Keywords: RNN, TensorFlow, OpenCV, MediaPipe.

II. Introduction

There have been various technological improvements, as well as much research, to assist
people with hearing and speech impairments. Deep learning and computer vision can also be
utilized to help with this cause. Because not everyone understands sign language, our
project can be very useful for hearing- and speech-impaired people when interacting with others.
Furthermore, it can be extended to building automated editors, where a person can write
using only their hand movements.

2.1 Sign Language

Sign languages are sets of signs/gestures used by people with hearing and speech impairments
so that they can communicate easily with others. Sign language helps bridge the
communication gap between people. Many different sign languages are in use,
such as ASL (American Sign Language), ISL (Indian Sign Language) and BSL
(British Sign Language). ASL is shown in (Fig 1).
Fig 1 (ASL Symbols) [1]
2.2 Sign Language Detection

The availability of AI algorithms, massive data sets and large processing capacity
gives us a technological advantage. With the assistance of these technologies, we can
identify what people wish to say by turning gestures into letters using machine
learning and OpenCV, and these techniques can be applied in real-time
scenarios.

2.2.1 OpenCV

OpenCV is an open-source computer vision and machine learning software library. It was built
to accelerate the use of machine learning algorithms. It can be used to detect and
recognize faces, identify objects and, most importantly, classify human actions by
accessing the webcam in real time. We use this library along with MediaPipe
to extract the landmarks of the face and hands.
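As an illustration, the minimal loop below (a sketch of typical OpenCV usage, not the exact code from our project; the window name and quit key are arbitrary choices) shows how frames are read from a webcam in real time:

import cv2

cap = cv2.VideoCapture(0)                    # open the default webcam
while cap.isOpened():
    ret, frame = cap.read()                  # grab one frame
    if not ret:
        break
    cv2.imshow('Webcam feed', frame)
    if cv2.waitKey(10) & 0xFF == ord('q'):   # press q to quit
        break
cap.release()
cv2.destroyAllWindows()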
2.2.2 MediaPipe
Live perception of simultaneous human pose, face landmarks and hand tracking in real time
can enable various modern applications: fitness and sport analysis, gesture
control, sign language recognition and mid-air writing.
Using the inferred pose landmarks, we derive three region-of-interest (ROI) crops, one for
each hand (2x) and one for the face, and employ a re-crop model to improve the ROI.
We then crop the full-resolution input frame to these ROIs and apply task-specific face
and hand models to estimate their corresponding landmarks. Finally, we merge all
landmarks with those of the pose model to yield the full set of 540+ landmarks, as shown in
(Fig 2).

(Fig 2) Hand landmarks after extraction [2]
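As a brief sketch (assuming the MediaPipe Python package and its Holistic solution, which we also use in the methodology; the input file name is hypothetical), a frame can be processed to obtain pose, face and hand landmarks as follows:

import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    frame = cv2.imread('sample_frame.jpg')          # any BGR frame, e.g. grabbed from a webcam
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)    # MediaPipe expects RGB input
    results = holistic.process(rgb)                 # run the pose, face and hand models
    # results.pose_landmarks, results.face_landmarks,
    # results.left_hand_landmarks and results.right_hand_landmarks
    # hold the detected landmark sets (or None if a part is not detected)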

2.2.3 TensorFlow
TensorFlow is one of Google's most well-known deep learning frameworks. It is a
free and open-source software library based on the Python programming language.
We use TensorFlow to train the model on our dataset. The TensorFlow library combines many APIs
to produce large-scale deep learning architectures such as RNNs. To decrease the
computing effort, TensorFlow employs a graph structure.

2.2.4 RNN/ LSTM


Sometimes, in real-world situations, we have to retain a previous value and decide the
output based on both the previous and the current input context. In our
project we have created and defined the dataset for different hand gestures, and the RNN
model uses this predefined dataset to predict the input for the next layer. Here, we
require long-term temporal dependencies. An RNN uses the tanh or sigmoid function as its
activation function, but it suffers from the vanishing gradient and exploding gradient
problems during training. This problem is solved with the LSTM model,
which we used in our project. LSTM cells are operated with the help of real-number
parameters called gates. The input gate decides how much of the new input
is used to change the memory state. The forget gate decides how
much of the previous value is retained in the memory state. And the output gate controls
how strongly the current memory state is passed on to the next layer. The LSTM model is
shown in (Fig 3)
(Fig 3) Working of Forget, Input and Output gate [4]
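For reference, the gates described above are commonly written as the following equations (a standard textbook formulation of the LSTM cell, not something specific to our implementation), where x_t is the current input, h_{t-1} the previous hidden state, c_t the memory state, \sigma the sigmoid function and \odot elementwise multiplication:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)            (forget gate)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)            (input gate)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)     (candidate memory)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t   (updated memory state)
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)            (output gate)
h_t = o_t \odot \tanh(c_t)                        (output passed to the next layer)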
III. Literature Review

[1] Sign Language Recognition: State of the Art (Feb 2014) by Ashok K Sahoo, Gouri Sankar
Mishra and Kiran Kumar Ravulakollu, ARPN Journal of Engineering and Applied Sciences.
They point out that systems should be able to distinguish the face, the hands (right/left)
and other parts of the body simultaneously.

[2] American Sign Language Recognition System: An Optimal Approach (2018) by
Shivashankara S and Srinath S. The authors note that the work can be extended to recognize
rotation- and distance-invariant ASL alphabet gestures, number gestures and other complex
gestures against different backgrounds.

[3] Sign Language Recognition by Muskan Dhiman and Dr G.N. Rathna.
In this paper they introduce a user-dependent approach: the user gives a set of images to
the model for training, so that the model becomes familiar with the user.

IV. Problem Statement


Understanding the exact context of the symbolic expressions of people with hearing and speech
impairments is a challenging job in real life unless it is properly specified.
V. Project Objective

The objective of this project is to develop a system that recognizes symbolic expressions
from images, so that the communication gap between physically impaired people and others
can be easily bridged.

5.1 Sub Objective


To detect hand gestures in real time.
To achieve the maximum accuracy.

VI. Methodology
Phase 1 Extracting the holistic key points
As described in Section 2.2.2, MediaPipe Holistic infers the pose landmarks, derives
region-of-interest (ROI) crops for each hand and the face, applies task-specific face and
hand models to those crops, and merges all landmarks with the pose landmarks to yield the
full set of 540+ landmarks per frame.
Step 1: We first defined draw_landmarks, through which we draw the landmarks for the face,
hands and pose.
Step 2: Then we defined styled_landmarks, which gives some style to our landmarks by defining
the radius, thickness and a dedicated colour for the face, left hand, right hand and pose, for example:
mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACE_CONNECTIONS,
    mp_drawing.DrawingSpec(color=(80, 110, 10), thickness=1, circle_radius=1),
    mp_drawing.DrawingSpec(color=(80, 256, 121), thickness=1, circle_radius=1))
Step 3: Then we saved the extracted key points (landmarks) for further use in dataset creation,
as shown in (Fig 4).
(Fig 4) After merging the hand and face landmarks extraction
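As a sketch of what Step 3 looks like in code (the helper name extract_keypoints and the exact flattened layout below are our illustration of the usual approach for MediaPipe Holistic results, not a verbatim copy of the project code):

import numpy as np

def extract_keypoints(results):
    # Flatten each landmark set; substitute zeros when a part is not detected in the frame
    pose = np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else np.zeros(33 * 4)
    face = np.array([[lm.x, lm.y, lm.z] for lm in results.face_landmarks.landmark]).flatten() if results.face_landmarks else np.zeros(468 * 3)
    lh = np.array([[lm.x, lm.y, lm.z] for lm in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21 * 3)
    rh = np.array([[lm.x, lm.y, lm.z] for lm in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21 * 3)
    return np.concatenate([pose, face, lh, rh])      # one 1662-value vector per frame

# keypoints = extract_keypoints(results)
# np.save('0.npy', keypoints)                        # saved for dataset creation (Phase 2)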

Phase 2 Creating the dataset


2.1 Comparing Methods
2.1.1 Using grayscale for dataset creation
In this method we grayscale the input video stream/frames to decrease the computational
load, but it needs a plain background when capturing the images in order to get a clear
structure of the hand gesture (Fig 5). When we tried this with a different background,
the hand gestures were not clear (Fig 6).

(Fig 5) The hand structure is clearly visible in this scenario


(Fig 6) With this background the hand structure is not clear
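A minimal sketch of the grayscale approach we compared (the file names and the threshold value are illustrative assumptions, not the exact settings we used):

import cv2

frame = cv2.imread('gesture.jpg')                           # a captured frame (hypothetical file)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)              # drop colour to cut computation
blur = cv2.GaussianBlur(gray, (5, 5), 0)                    # smooth noise before thresholding
_, mask = cv2.threshold(blur, 120, 255, cv2.THRESH_BINARY)  # hand silhouette, works best on a plain background
cv2.imwrite('gesture_gray.jpg', mask)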

2.1.2 Using MediaPipe for dataset creation


In this method MediaPipe has been used to create the dataset efficiently, because with this
approach we can capture the images with any background (Fig): it works on the key points,
or landmarks, of the hand and face poses and as a result provides higher accuracy in
real-time detection.
Further, we create the dataset with the help of various users (Fig) and from different
angles to make the dataset more varied.
Step 1: First we loaded the extracted keypoints: np.load('0.npy').
Step 2: We then created a folder named "MP Data" with the help of the OS library:
DATA_PATH = os.path.join('MP Data')
Step 3: After creating the folder, we defined an array containing the names of the hand gestures
that our system will recognize in real time:
actions = np.array(['Hello', 'Thank you'])
**Note: For the initial testing we have created just 2 gestures.
Step 4: Then we defined the number of sequences that we will capture and store for the
training of our model.
Step 5: Then we created a loop, capturing the frames with the help of OpenCV and the
key points extracted earlier.
Step 6: After capturing the frames we labelled them in the form of a dictionary:
{'Hello': 0, 'Thank you': 1}
As shown in (Fig 7).
(Fig 7) Hand and face landmarks after extraction
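A condensed sketch of the folder layout and labelling from Steps 2-6 (the sequence and frame counts are illustrative assumptions; the full project code also runs the OpenCV/MediaPipe capture loop around this):

import os
import numpy as np

DATA_PATH = os.path.join('MP Data')
actions = np.array(['Hello', 'Thank you'])
no_sequences = 30           # assumed number of recorded sequences (videos) per gesture
sequence_length = 30        # assumed number of frames per sequence

# One folder per action and per sequence; each frame's keypoints go into one .npy file
for action in actions:
    for sequence in range(no_sequences):
        os.makedirs(os.path.join(DATA_PATH, action, str(sequence)), exist_ok=True)

label_map = {label: num for num, label in enumerate(actions)}   # {'Hello': 0, 'Thank you': 1}

# Inside the capture loop, each frame's keypoints would be saved as, e.g.:
# np.save(os.path.join(DATA_PATH, action, str(sequence), str(frame_num)), keypoints)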

Phase 3 Training and testing the model (LSTM)


Step 1: We begin by creating the model object: model = Sequential()
Step 2: The model then consists of layers added with model.add(<name of the layer>),
for example Max-pooling or Dense layers.
Step 3: After adding sufficient layers, Keras communicates with
TensorFlow to construct the model.
Step 4: Then we compile the model; during compilation it is important to add a loss
function and an optimizer algorithm:
model.compile(optimizer='Adam', loss='categorical_crossentropy',
    metrics=['categorical_accuracy'])
Step 5: Then we save the model with model.save().
Step 6: After compilation we train this model with our dataset.
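A minimal sketch of these steps put together (the layer sizes and the 30 x 1662 input shape are assumptions based on the keypoint vectors from Phase 1, not necessarily the exact architecture we trained):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
# 30 frames per sequence, 1662 keypoint values per frame (assumed shape)
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(30, 1662)))
model.add(LSTM(128, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(2, activation='softmax'))        # one output per gesture ('Hello', 'Thank you')

model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
# model.fit(X_train, y_train, epochs=200)        # trained on the dataset built in Phase 2
# model.save('action.h5')                        # hypothetical file name for the saved model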
3.1 Optimizer (Adam)
Adam is an adaptive learning-rate method: it computes individual learning rates for
different parameters. It uses estimates of the first and second moments of the gradient
to adapt the learning rate for each weight of the neural network.
3.2 Loss Function
Categorical cross entropy is a loss function used for single-label categorization,
that is, when only one category is applicable for each data point. In other words, an
example can belong to one class only.
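For a single example with one-hot label y and predicted class probabilities \hat{y}, this is the standard cross-entropy loss

L = -\sum_{i} y_i \log(\hat{y}_i)

where only the term of the true class contributes to the sum.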
(Fig 8) Accuracy graph

Phase 4 Real-Time Detection


Step 1: Load the model that was created and saved earlier.
Step 2: Then we used OpenCV for the real-time detection.
Step 3: Loaded the MediaPipe model created earlier for extracting the landmarks.
Step 4: Labelled the gestures so that they are visible on screen while detecting the signs.
Step 5: Created a break condition so that we can stop the program at any time by pressing q on
the keyboard.
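A condensed sketch of this detection loop (assuming the saved model from Phase 3 and the extract_keypoints helper sketched in Phase 1; the 30-frame window and the model file name are illustrative):

import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras.models import load_model

model = load_model('action.h5')                  # hypothetical saved model from Phase 3
actions = np.array(['Hello', 'Thank you'])
sequence = []                                    # rolling window of the most recent frames

mp_holistic = mp.solutions.holistic
cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))          # helper sketched in Phase 1
        sequence = sequence[-30:]                            # keep only the last 30 frames
        if len(sequence) == 30:
            res = model.predict(np.expand_dims(sequence, axis=0))[0]
            cv2.putText(frame, actions[np.argmax(res)], (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Sign Language Recognition', frame)
        if cv2.waitKey(10) & 0xFF == ord('q'):               # press q to stop
            break
cap.release()
cv2.destroyAllWindows()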

(Fig 9) Real time detection


VII. Experimental Setup
System Requirements
Recommended operating systems:
1. Windows 7 or later
2. Linux
3. macOS
Hardware requirements:
1. Processor: minimum 1 GHz
2. A webcam or a USB camera
3. Memory (RAM): minimum 2 GB
Other software used (to run the project):
1. Python 3.7.4 IDE
2. Jupyter Notebook

VIII. UML diagrams


8.1 Use case Diagram

(Fig 10) Use case Diagram


Here, we have a physically impaired person as the user, and our system.
Prerequisite: Webcam
Scenario 1: Detecting the hand gesture

Primary Actor: User


Secondary Actor: System
Steps:
1. The user opens his/her webcam.
2. The system waits for the user to make a gesture.
3. The user makes a hand gesture of their choice.
4. The system recognizes the gesture.
5. The system shows the name of that hand gesture on the screen.

8.2 Class diagram

(Fig 11) Class Diagram


We have a class named Camera, which has a function to capture video and is associated with the
OpenCV class, since the latter reads the video in binary and performs functions such as
transforming it into frames and displaying the label. We have one more class, Model, which has
an image-frame attribute and a function to predict the label.
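As an illustration only (the class and method names follow the diagram; this is not actual project code), the classes could be sketched as:

class Camera:
    # Captures the live video and hands it to the OpenCV wrapper
    def capture_video(self):
        pass

class OpenCVWrapper:
    # Reads the video stream, transforms it into frames and displays the predicted label
    def transform_in_frames(self, video):
        pass
    def display_label(self, label):
        pass

class Model:
    # Holds the current image frame and predicts its label
    def __init__(self, image_frame):
        self.image_frame = image_frame
    def predict_label(self):
        pass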
8.3 Dataflow Diagram

(Fig 12) Dataflow Diagram


Here in the DFD we have the entities that interact with the system, represented by
rectangles: the user, the webcam, OpenCV and our model. Along with these entities we have
the processes they perform, represented by circles: transforming into frames, extracting
landmarks and predicting the gestures.

8.4 Sequence Diagram

(Fig 13) Sequence Diagram


In the sequence diagram, the webcam first captures the gesture and then calls OpenCV (sends
it a message) to process the video stream and transform it into frames. After that, a
synchronous message is sent to the model to predict the label, and OpenCV is asked to display
that label to the end user.
IX. Pert Chart

(Fig 14) Pert Chart

X. References and Git Link

Git link: https://github.com/harshita0501/Sign-Language-Recongnition/
[1] https://www.ai-media.tv/ai-media-blog/sign-language-alphabets-from-around-the-world/
[2] https://google.github.io/mediapipe/
[3] https://www.researchgate.net/publication/262187093_Sign_language_recognition_State_of_the_art/
[4] https://www.researchgate.net/publication/326972551_American_Sign_Language_Recognition_System_An_Optimal_Approach/
[5] https://edu.authorcafe.com/academies/6813/sign-language-recognition/
[6] https://upcommons.upc.edu/bitstream/handle/2117/343984/ASL%20recognition%20in%20real%20time%20with%20RNN%20-%20Antonio%20Dom%C3%A8nech.pdf?sequence=1&isAllowed=y/
