CS4099D Project
End Semester Report
Submitted by
Challa Saketh
Suddala Varun
Masina Sai Bhargav Teja
Under the guidance of
Dr. Lijiya A
Assistant Professor
May 2023
NATIONAL INSTITUTE OF TECHNOLOGY CALICUT
KERALA, INDIA - 673 601
CERTIFICATE
Certified that this is a bonafide report of the project work titled Indian
Sign Language Recognition done by
Challa Saketh
Suddala Varun
Masina Sai Bhargav Teja
of Eighth Semester B. Tech, during the Winter Semester 2022-’23, in
partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science and Engineering of the
National Institute of Technology, Calicut.
(Dr. Lijiya A)
04-05-2023 (Assistant Professor)
Date Project Guide
DECLARATION
We hereby declare that the project titled, Indian Sign Language Recog-
nition, is our own work and that, to the best of our knowledge and belief,
it contains no material previously published or written by another person,
nor material which has been accepted for the award of any other degree or
diploma of the university or any other institute of higher learning, except
where due acknowledgement and reference have been made in the text.
Abstract
This project explores the design and implementation of an Indian Sign Lan-
guage recognition system for static gestures. The system segregates and iden-
tifies the signs in an input video file containing gestures. The project aims
to ease communication between the hearing impaired and the hearing
without involving any sophisticated devices. The system takes a
video with a gesture or series of gestures as input and gives the corresponding
text as output.
ACKNOWLEDGEMENT
We would like to express our sincere and heartfelt gratitude to our guides
and mentors, Dr. Lijiya A and Renjith P, who have guided us throughout
the course of the final-year project. Without their active guidance, help,
cooperation and encouragement, we would not have made headway in the
project. We would like to thank our parents and the faculty members for
motivating us and being supportive throughout our work. We also take this
opportunity to thank our friends who have cooperated with us throughout
the course of the project.
Contents
1 Introduction 2
2 Problem Statement 4
3 Literature Survey 5
4 Proposed Work and Design Overview 8
5 Experimental Results 14
5.1 Vision Transformers . . . . . . . . . . . . . . . . . . . . . . . 14
5.2 Convolutional Neural Networks . . . . . . . . . . . . . . . . . 17
6 Conclusion 20
References 21
List of Figures
List of Tables
Chapter 1
Introduction
Indian Sign Language (ISL) is the sign language used by the speech- and
hearing-impaired populations in India. To convey linguistic information,
it makes use of face, head, arm, and hand motions. ISL comprises both
isolated and continuous signs. An isolated sign is portrayed with a precise
hand placement and pose and involves only a single hand motion. A series
of images used to indicate a moving gesture is called a continuous sign.
However, since the gestures made by the hearing impaired may not always
be directly related to the referent phrase, there may be a significant communi-
cation gap between them and the hearing. As a result, it is essential to trans-
late sign language into text or speech that everyone can understand. This
project aims to build such a translation system for static ISL gestures.
Chapter 2
Problem Statement
Chapter 3
Literature Survey
Another recent work addresses single-hand dynamic gesture recognition [2].
The database used here was collected from Rahmaniya HSS, a special school
in Calicut, India, and included 900 static images and 700 videos. During the
pre-processing phase, the Viola-Jones algorithm is used to remove the face,
and the standard RGB colour space is converted to YCbCr, which is much
less sensitive to lighting; the largest connected region is then assumed to be
the hand, thereby eliminating the rest. The next step is a Region of Interest
(ROI) extraction algorithm. Its speciality is that, unlike the other approaches
discussed, which impose extra conditions on the imagery, such as wearing full
sleeves or an identification band on the palm, it simply removes all unwanted
skin areas and identifies the palm without any such external conditions. This
is achieved by extending the bounding box of the face down to the neck so
that both are masked out completely. Centroid calculation comes next, in
which the centroid of the minimal bounding box surrounding the palm area
is computed. Tracking the movement of this point over time yields a trajec-
tory that can be used for trajectory-based gesture recognition. A series of
steps follows: key-frame extraction (eliminating uninformative frames with
no significant change in hand position or shape) and co-articulation detec-
tion and resolution, where one gesture is influenced by another; this can
be of three types: static-static, static-dynamic, and dynamic-dynamic co-
articulation. Next is feature extraction, where the features are hand shape,
hand motion, hand location, and orientation. Finally comes the classifica-
tion part, where the separation of these gestures into classes is done carefully.
This model achieves an accuracy of 89%.
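The skin-segmentation and centroid-tracking steps described above can be sketched in NumPy. The BT.601 YCbCr conversion below is standard, but the Cb/Cr skin thresholds are common illustrative values, not the exact ones used in [2]:

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Convert an RGB uint8 image (H, W, 3) to YCbCr (ITU-R BT.601)."""
    r = img[..., 0].astype(np.float32)
    g = img[..., 1].astype(np.float32)
    b = img[..., 2].astype(np.float32)
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(ycbcr, cb_lo=77, cb_hi=127, cr_lo=133, cr_hi=173):
    """Threshold only the chroma channels; intensity (Y) is ignored,
    which is why YCbCr is less sensitive to lighting than RGB."""
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return (cb >= cb_lo) & (cb <= cb_hi) & (cr >= cr_lo) & (cr <= cr_hi)

def palm_centroid(mask):
    """Centroid of the detected palm pixels; tracking this point across
    frames gives the trajectory used for dynamic gesture recognition."""
    ys, xs = np.nonzero(mask)
    return float(xs.mean()), float(ys.mean())
```

Face removal (Viola-Jones) and keeping only the largest connected skin region are omitted here; in practice those would come from a vision library rather than hand-rolled code.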
Chapter 4
Proposed Work and Design Overview
4.2 Design
1. Image preprocessing is done first to obtain noise-free frames, and the
colour space is converted to YCbCr, which is much less sensitive to
lighting.
2. The palm region is extracted, and other skin areas such as the face
and neck are removed.
3. Each frame is XORed with the previous frame to identify the key
frames, based on the change in the number of white pixels.
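Step 3 above can be sketched as follows; the 2% change threshold is an illustrative value, not one taken from the report:

```python
import numpy as np

def key_frames(frames, threshold=0.02):
    """Pick key frames from a list of binary (H, W) masks: a frame is a
    key frame when XOR with the last kept frame changes more than
    `threshold` of the pixels (i.e. the hand moved or changed shape)."""
    keys = [0]                     # always keep the first frame
    prev = frames[0]
    for i in range(1, len(frames)):
        changed = np.logical_xor(prev, frames[i]).mean()
        if changed > threshold:    # significant change in white pixels
            keys.append(i)
            prev = frames[i]
    return keys
```

Comparing each frame against the last *kept* frame (rather than its immediate neighbour) avoids dropping a slow gesture whose per-frame change is always just below the threshold.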
The models are trained on static gestures to recognize the letters. The
data set is split in an 80:20 ratio for training and testing respectively and,
accordingly, labels are assigned. Necessary image transformations are applied
to make the raw image set robust against overfitting. Batches of images are
created to train the model. Training parameters are defined for each model
and the models are trained accordingly. Lastly, each model is tested against
the test set and the results are analyzed.
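A minimal NumPy sketch of this split/augment/batch pipeline follows; the actual project presumably used a deep-learning framework's data loaders, and the flip augmentation and batch size here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_80_20(images, labels):
    """Shuffle, then split the dataset 80:20 into train and test sets."""
    idx = rng.permutation(len(images))
    cut = int(0.8 * len(images))
    tr, te = idx[:cut], idx[cut:]
    return images[tr], labels[tr], images[te], labels[te]

def augment(batch):
    """A simple robustness transform: random horizontal flips of (N, H, W)."""
    out = batch.copy()
    flip = rng.random(len(out)) < 0.5
    out[flip] = out[flip, :, ::-1]
    return out

def batches(images, labels, batch_size=32):
    """Yield mini-batches for one training epoch."""
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size], labels[start:start + batch_size]
```

Note that for sign language letters a horizontal flip can change the meaning of a gesture, so in practice milder transforms (small rotations, brightness jitter) may be preferable.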
For the current dataset, both models were applied and the results were an-
alyzed. The Vision Transformer achieved an accuracy of about 76%, whereas
the CNN outperformed it with an accuracy of about 98%. The CNN also pro-
duced considerably lower training loss and higher training accuracy compared
to the ViT model. Fig. 5.1 and Fig. 5.2 show the graphs of training loss and
training accuracy versus epoch during training; similarly, Fig. 5.5 and
Fig. 5.6 show the same data for the Convolutional Neural Network.
Also, the variation in training accuracy with each epoch can be
analyzed with the help of Table 5.1 and Table 5.2.
Finally, it can be concluded that CNNs perform better than Vision Transformers
at classifying the hand signs accurately for this dataset.
The following chapter of experimental results gives a more detailed picture
of the performance of the models.
Chapter 5
Experimental Results
Table 5.1: Per-epoch training results for the Vision Transformer model.

Epoch      Loss   Accuracy   Validation loss
    1  2.621300   0.612790          2.427532
    2  1.660000   0.711876          1.607804
    3  1.358800   0.735770          1.375403
    4  1.106200   0.738580          1.205604
    5  1.188300   0.744202          1.095763
    6  1.159000   0.750527          1.022064
    7  0.993400   0.767393          0.983076
    8  0.888500   0.743500          1.027860
    9  0.932500   0.768096          0.951064
   10  0.925400   0.754041          0.981194
Table 5.2: Per-epoch training results for the Convolutional Neural Network model.

Epoch    Loss   Accuracy   Validation loss
    1  0.7559     0.8472            0.2046
    2  0.0681     0.9834            0.1596
    3  0.0241     0.9934            0.1283
    4  0.0131     0.9965            0.1167
    5  0.0018     0.9995            0.1189
    6  0.0016     0.9995            0.1215
    7  0.0015     0.9995            0.1235
    8  0.0015     0.9998            0.1256
    9  0.0015     0.9998            0.1271
   10  0.0015     0.9998            0.1286
Chapter 6
Conclusion
References
[1] C. J. Sruthi and A. Lijiya, "Signet: A Deep Learning based Indian Sign
Language Recognition System," 2019 International Conference on Com-
munication and Signal Processing (ICCSP), 2019, pp. 0596-0600.
[2] P. K. Athira, C. J. Sruthi, and A. Lijiya, "A Signer Independent Sign
Language Recognition with Co-articulation Elimination from Live
Videos: An Indian Scenario," Journal of King Saud University - Com-
puter and Information Sciences, Volume 34, Issue 3, 2022.
[3] M. J. Cheok, Z. Omar, and M. Jaward, "A review of hand gesture and
sign language recognition techniques," International Journal of Machine
Learning and Cybernetics, vol. 10, 2019, doi: 10.1007/s13042-017-0705-5.
[8] S. Das, S. K. Biswas, and B. Purkayastha, "A deep sign language recog-
nition system for Indian sign language," Neural Computing and Appli-
cations, 2022.