Sign Language Detection using LSTM

Shreyas Mhatre
M. Tech Scholar
School of Mechatronics Engineering
Symbiosis Skills and Professional University, Pune (MH), India
shreyasnarendramhatre@gmail.com

Sarang Joshi
Assistant Professor
School of Mechatronics Engineering
Symbiosis Skills and Professional University, Pune (MH), India
Sarang.joshi@sspu.ac.in

Hrushikesh B. Kulkarni
Assistant Professor
School of Mechatronics Engineering
Symbiosis Skills and Professional University, Pune (MH), India
hbkulkarni.coeo@gmail.com
2022 IEEE International Conference on Current Development in Engineering and Technology (CCET) | 978-1-6654-5415-5/22/$31.00 ©2022 IEEE

Abstract— Sign language is used by speech and hearing-impaired people as a method of communication. There are thousands of sign languages used all around the globe, and understanding sign language is a challenging task for a hearing person. At present, speech and hearing-impaired people depend on human translators, but a human translator is not always available. The objective of the present work is to design and implement an algorithm which detects real-time sign language using deep learning. The proposed system uses a Long Short-Term Memory (LSTM) model to train on the dataset. The LSTM model works similarly to a recurrent neural network, and LSTM networks are used here for the classification of sign language actions. A deep learning neural network resembles the human brain in that it uses a combination of factors such as inputs, weights, and biases to perform tasks such as identifying and classifying objects, and deep learning algorithms work better when trained with a large amount of data. The performance of the proposed system has been evaluated based on accuracy, precision, and recall. The dataset consists of Marathi signs that are used daily. The output of a sign gesture is produced in text as well as audio format; for audio, the Google Text-to-Speech library is used. The proposed system could classify seven gestures with the highest training accuracy of 90-96%.

Keywords— Sign language, speech and deaf people, deep learning, LSTM.

I. INTRODUCTION

Humans communicate with each other through some form of language: verbally, and non-verbally through body movement and hand gestures that present their emotions [1]. Hands are most frequently used to perform actions that convey the thought a person wants to share with others [2]. Speech and hearing-impaired people can communicate only by sign language [3]. Hearing-impaired people use sign language to communicate their message; in sign language, they can show letters, words, or even sentences by hand or finger actions.

Every country has its own mother tongues and national languages [4]. The biggest problem arises when an impaired person wants to converse with a hearing person who does not understand sign language. A human interpreter may understand sign language and communicate the message to a hearing person, but availability and affordability are another problem [5]. To tackle these problems, the best alternative is to develop a technology-based automatic translator device that recognizes sign language and converts it into a message which a hearing person can understand [6-10].

Sign Language Recognition (SLR) is a computational method to identify signs from the actions of hearing-impaired people [11-13]. An SLR system must recognize and extract hand shape, position, facial expressions, and body posture.

Object detection is a computer vision technique in which a system draws bounding boxes around objects and locates signs or actions. SLR systems can be sensor based or vision based [14-18]. In a sensor-based system, users need to wear a glove, which may create hesitance if they do not want to wear it. A vision-based system does not require any wearables; it is based on image processing techniques that capture images to recognize hand gestures.

The first step in an SLR system is to capture the hand signs made by the user. A camera is used to capture the signs, as it provides a good balance between accuracy and affordability. SLR systems must be trained with sign language data for smooth and uninterrupted sign recognition. The aim of the present work is to build and implement an SLR system using deep learning on a real-time basis.

II. LITERATURE REVIEW

Challa Neha et al. have surveyed sign language detection and the work done in this field; various methodologies such as sign gestures, text-to-speech enabling, and their combinations are explained [19]. Rastgoo et al. have reviewed vision-based models of SLR using deep learning. It is observed that the models proposed in this research show improvement in SLR accuracy, with some remaining challenges; a taxonomy to categorize these models is proposed for isolated and continuous SLR, considering applications, datasets, complexity, and future scope [20]. Rastgoo et al. have also reviewed deep learning for Sign Language Recognition and Production (SLP), summarizing recent developments, their advantages, limitations, and future scope [21]. Abedin et al. have proposed a novel "Concatenated Bangla Sign Language (BdSL) Network" architecture with 91.51% accuracy [22]. Mustafa et al. have reviewed many SLR systems based on different classifiers, focusing on Arabic SLR systems using deep learning [23]. A few authors have also worked on recognition of American and Bengali sign languages by developing gloves enclosed with sensors, which can detect static as well as dynamic alphabet symbols with up to 96% accuracy [24]. Further, an innovative approach is used to perform continuous assessment of actions on run-time data [25].

III. SIGN LANGUAGE AND ITS USE

Sign language is a naturally developed language like other spoken languages. It is used by the deaf to communicate in daily life.

It is regarded as the native language of the mute. It has complicated grammar and is an independent language, just like other languages, using naturally evolved visual and manual signs.

Sign languages have been used by humans for three centuries as a means of delivering messages and communicating, especially in mute communities. To convey a speaker's thoughts via sign language, hand gestures; arm, hand, and body movements; and facial expressions may all be used. But few hearing people understand sign language. Accordingly, with the huge evolution of technology, hearing aids were introduced to help mute communities communicate with others or hear others. Fig. 1 shows how the classification of the sign language family has been done.

Fig. 1. Classification of Sign Language Family

A sign language interpreter translates oral speech into sign language so that deaf people can understand what others are saying. When working with deaf children, teachers and special educators use sign language to convey knowledge and teach a variety of subjects. Parents of deaf children often use everyday life circumstances to communicate with their kids. Sign language is also used by many different professionals who work with people with cognitive disabilities to address their unique needs.

A. Recognition of Sign Language

The recognition of sign language largely uses two methods: sensor-based and vision-based.

1. Sensor-Based Approach

In sensor-based recognition, hardware complexity is greater than in a vision-based approach, and most of the work is performed by the sensors. There are mainly two types of sensors used for data input: flex sensors and accelerometer sensors. The processing is handled by a microcontroller or microprocessor: the sensors provide input to the microprocessor, which analyzes the signals and provides output in the form of audio or text on a mobile device or computer. Fig. 2 shows the block diagram of glove-based sign gesture recognition. The sensor-based approach limits the natural motion of the hand because of the complex hardware, but it requires less computational power than the vision-based approach.

Fig. 2. Glove based Sign gesture recognition

2. Vision-Based Approach

In this approach, the software plays the important role in the recognition of sign gestures, and hardware complexity is lower. A camera is the primary tool to obtain the required input data. Within the vision-based approach there are two types: dynamic sign recognition, which uses videos, and static sign recognition, which uses images. A dataset of sign language must be created, and various techniques and algorithms are available for this. The only limitation of the vision-based approach is that high computational power is required. Fig. 3 shows the block diagram of vision-based sign gesture recognition.

Fig. 3. Vision-based Sign gesture recognition

IV. SIGN LANGUAGE DETECTION

In this project, a sign language detection system is designed which detects hand gestures and generates voice using a deep learning approach.

A. Indian Sign Language (ISL)

Every country in the world follows its own sign language, and Indian people use the sign language of their state's mother tongue. In Maharashtra, deaf people use the Marathi sign language alphabet. In this project, a Marathi SLR system is designed and developed: when a deaf person makes a particular sign, it is recognized and the corresponding text along with voice is generated. Fig. 4 shows the alphabet and signs in Marathi Sign Language.

Fig. 4. Marathi Sign Language images

Fig. 5 shows the proposed system flowchart.

Fig. 5. Proposed System Flowchart

B. Real Time Sign Recognition Process

The system takes a hand sign gesture as input through a camera and gives output in real time on the computer/laptop screen. For detection of hand signs, many libraries exist to extract features from the camera input; one of them is MediaPipe. Fig. 6 shows a flow chart of the SLR process.

Fig. 6. Flow Chart of SLR process

C. Working of Real Time SLR process

In this process, a model first needs to be trained; then a program is executed which recognizes hand gestures on a real-time basis, word by word, according to the user's signs. The proposed SLR system consists of four major phases: video capturing, data pre-processing, training, and testing of the dataset.

i. Video Capturing

For video capturing, a web camera is used as the input device, though any other camera will also work. Using the NumPy library, the videos are stored in NumPy format; each video is 10 sec long, and the dataset is stored offline. The video capturing is shown in fig. 7, and a minimal capture sketch follows the figure.

Fig. 7. Video capturing
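The paper does not reproduce its capture code, so the following is a minimal sketch of this step, assuming OpenCV (cv2) for webcam access. The gesture name, folder layout, and file names are illustrative; the 30 videos x 30 frames counts are taken from the data pre-processing description below.

```python
import os

import cv2
import numpy as np

ACTION = "namaskar"          # illustrative gesture label
NUM_VIDEOS = 30              # sample videos per gesture (per the text)
FRAMES_PER_VIDEO = 30        # frames kept per video (per the text)
DATA_DIR = "MP_Data"         # hypothetical offline dataset folder

cap = cv2.VideoCapture(0)    # default webcam as input device
for video_idx in range(NUM_VIDEOS):
    frames = []
    for _ in range(FRAMES_PER_VIDEO):
        ret, frame = cap.read()          # grab one BGR frame
        if not ret:
            break
        frames.append(frame)
        cv2.imshow("capture", frame)
        cv2.waitKey(1)
    out_dir = os.path.join(DATA_DIR, ACTION)
    os.makedirs(out_dir, exist_ok=True)
    # Store the whole clip offline as one NumPy array per video.
    np.save(os.path.join(out_dir, f"{video_idx}.npy"), np.array(frames))
cap.release()
cv2.destroyAllWindows()
```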

ii. Data Pre-processing

In data pre-processing, MediaPipe Holistic generates a total of 543 landmarks. Hand gestures/signs are recognized via the MediaPipe Holistic library, which collects the key points of the hands and face. The hand landmark model is used to get high-fidelity 3D hand key points; a total of 21 key points are collected for each hand. The key point values are extracted in the form of a NumPy array. For training, 30 sample videos of each sign gesture were collected for 8 signs commonly used in Marathi sign language.

Each action has 30 videos, and each video has 30 frames. A sketch of the key point extraction is given below.
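Below is a minimal sketch of this pre-processing step, assuming the mediapipe and opencv-python packages. The flattening order (pose with visibility, then face, then left and right hands) is a common convention, not specified by the paper; it covers the 543 Holistic landmarks mentioned above and yields 1662 values per frame.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    """Flatten the 543 MediaPipe Holistic landmarks into one vector.

    Pose: 33 points (x, y, z, visibility); face: 468 points and each
    hand: 21 points (x, y, z). Missing parts become zeros so every
    frame yields a fixed-length vector of 1662 values.
    """
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])

# Example: extract key points from the first frame of a stored video.
frame = np.load("MP_Data/namaskar/0.npy")[0]   # hypothetical path from the capture step
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    keypoints = extract_keypoints(results)     # shape: (1662,)
```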
iii. Training

The dataset is split into two parts, train and test, and during pre-processing a label is assigned to each action. Long Short-Term Memory is a kind of advanced recurrent neural network; the model here uses three LSTM layers and three dense layers. After model training, the weights are saved. Fig. 8 shows the system architecture, and a sketch of the model definition follows it.

Fig. 8. System Architecture
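The paper specifies three LSTM layers and three dense layers but not their widths or hyperparameters; the sizes, activations, and optimizer below are assumptions in the style of common Keras implementations, with the input shape taken from the steps above (30 frames of 1662 key point values).

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

NUM_ACTIONS = 8               # gestures in the dataset described above
FRAMES, FEATURES = 30, 1662   # frames per video x key point values per frame

# Three LSTM layers followed by three dense layers, per the text;
# layer widths and activations are illustrative assumptions.
model = Sequential([
    LSTM(64, return_sequences=True, activation="relu",
         input_shape=(FRAMES, FEATURES)),
    LSTM(128, return_sequences=True, activation="relu"),
    LSTM(64, return_sequences=False, activation="relu"),
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    Dense(NUM_ACTIONS, activation="softmax"),  # one probability per sign
])
model.compile(optimizer="Adam", loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])

# X: (num_videos, 30, 1662) sequences; y: one-hot labels per action.
# model.fit(X_train, y_train, epochs=200)
# model.save_weights("action.weights.h5")  # weights saved after training
```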

iv. Testing

In the testing phase, the model is evaluated using a confusion matrix and accuracy. A confusion matrix is a representation of the model's predictions on test data; its four main values are true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). On the basis of these counts, accuracy, precision, and recall are calculated.
Accuracy is calculated by the relation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision is calculated by the relation:

Precision = TP / (TP + FP)

Recall is calculated by the relation:

Recall = TP / (TP + FN)
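As a small sketch, these quantities can be computed with scikit-learn (an assumption; the paper does not name its evaluation library). The label arrays here are hypothetical placeholders for the test-set classes and predictions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, multilabel_confusion_matrix,
                             precision_score, recall_score)

# Hypothetical integer class labels for a handful of test videos.
y_true = np.array([0, 1, 2, 2, 3, 4, 5, 6, 7, 1])
y_pred = np.array([0, 1, 2, 3, 3, 4, 5, 6, 7, 1])

# One 2x2 matrix per class, holding the TN, FP, FN, TP counts.
print(multilabel_confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
```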
Fig. 9 shows the resulting confusion matrix.

Fig. 9. Confusion matrix

For audio output, the Google Text-to-Speech library is used; it gives real-time output. A minimal sketch is given below.
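This sketch assumes the gTTS package for the Google Text-to-Speech library named above and the playsound package for playback (an assumption; any audio player would do). The Marathi language code and output file name are illustrative.

```python
from gtts import gTTS            # Google Text-to-Speech library
from playsound import playsound  # assumed playback helper

def speak(text, lang="mr"):
    """Convert recognized sign text to Marathi speech and play it."""
    gTTS(text=text, lang=lang).save("sign.mp3")
    playsound("sign.mp3")

speak("Namaskar")  # e.g. after the sign for "Hello" is recognized
```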

V. TESTING OF MARATHI SIGN LANGUAGE DETECTION SYSTEM

In this work, some signs used daily are collected and converted into voice, as follows:

1. Namaskar (Hello)
2. Dhanyawad (Thank you)
3. Mi thik ahe (I am ok)
4. Kay kartay? (What are you doing?)
5. Bharat (India)
6. Somvar (Monday)
7. Mangalvar (Tuesday)

The code was developed in Python; Fig. 10 shows the code flowchart, and Table I shows a comparative analysis. A sketch of the real-time recognition loop follows the figure.

Fig. 10. Flowchart of code in Python
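The paper presents the recognition loop only as a flowchart, so the following sketch reconstructs it under the assumptions of the earlier sketches (a sliding window of the last 30 key point vectors fed to the trained LSTM); the confidence threshold and gesture list are illustrative.

```python
import cv2
import numpy as np
import mediapipe as mp

# Assumes `model`, `extract_keypoints`, and `speak` from the sketches above.
ACTIONS = ["namaskar", "dhanyawad", "mi thik ahe", "kay kartay",
           "bharat", "somvar", "mangalvar"]
THRESHOLD = 0.8   # illustrative minimum prediction confidence
window = []       # sliding window of key points for the last 30 frames

cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                    min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        window.append(extract_keypoints(results))
        window = window[-30:]                    # keep only the latest 30 frames
        if len(window) == 30:
            probs = model.predict(np.expand_dims(window, axis=0))[0]
            if probs.max() > THRESHOLD:
                text = ACTIONS[int(np.argmax(probs))]
                cv2.putText(frame, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                            1, (0, 255, 0), 2)   # text output on screen
                speak(text)                      # audio output via gTTS
        cv2.imshow("SLR", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```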

A. Test Results

Fig. 11. Output of First Sign

Fig. 12. Output of Second Sign

Fig. 13. Output of Third Sign

Fig. 14. Output of Fourth Sign

Fig. 15. Accuracy graph

TABLE I. COMPARATIVE ANALYSIS

Author              | Technique used | Accuracy
--------------------|----------------|---------
Abedin et al.       | CNN            | 91.51%
Rahaman et al.      | KNN            | 95.95%
Kacper and Urszula  | CNN            | 93%
Uddin and Chowdhury | SVM            | 97%
Neha Baranwal       | DWPT           | 91%
Dong                | Glove based    | 90%

VI. RESULT

After processing the data with these methods and the evaluation strategy, promising results were achieved. Seven dynamic signs from Marathi Sign Language, with 50 video samples of each sign, were used for training, validation, and testing. Figs. 11-15 show the recognized sign outputs and the accuracy graph over the epoch run. The audio output is in Marathi. Table I above shows a comparative analysis of various techniques. The proposed system has achieved an accuracy of 97.22%.

VII. CONCLUSION

In the present work, a system for speech and hearing-impaired persons has been designed and developed for the detection of Marathi sign language. This work will help deaf people communicate with other people. Long short-term memory networks are used to categorize the received gesture data into appropriate text and audio outputs. The Google MediaPipe hand tracking tool is effective enough to be utilized for gesture recognition, and Google Text-to-Speech works effectively for audio output. Compared to a picture dataset, a video dataset produces more accurate results; in this project, video datasets were created, and different datasets can be created for other languages. Further, the system converts hand gestures into text as well as speech audible to a hearing person.

VIII. FUTURE SCOPE

The efficiency of this system can be further improved by recognizing dynamic hand gestures together with facial expressions, since facial expressions show the emotion of the person and may improve communication. This could be done with the Python library dlib, whose standard face landmark model tracks a total of 68 points of the human face; a brief sketch is given below.
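A minimal sketch of this idea using dlib's face detector and its pre-trained 68-point landmark model; the predictor file (the usual shape_predictor_68_face_landmarks.dat) must be downloaded separately, and the input image name is illustrative.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Pre-trained 68-point model, downloaded separately from the dlib model zoo.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("face.jpg")                  # illustrative input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for face in detector(gray):
    shape = predictor(gray, face)
    # Collect the 68 facial landmark coordinates as expression features.
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    for (x, y) in points:
        cv2.circle(frame, (x, y), 2, (0, 255, 0), -1)
cv2.imwrite("face_landmarks.jpg", frame)
```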
REFERENCES

[1] DeVito, Joseph A., Susan O'Rourke, and Linda O'Neill. Human Communication. New York: Longman, 2000.
[2] Narayana, Pradyumna, Ross Beveridge, and Bruce A. Draper. "Gesture recognition: Focus on the hands." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5235-5244.
[3] M. K. Bhuyan, D. Ghosh and P. K. Bora, "A Framework for Hand Gesture Recognition with Applications to Sign Language," 2006 Annual IEEE India Conference, New Delhi, India, 2006, pp. 1-6.
[4] Jepsen, Julie Bakken, et al., eds. Sign Languages of the World: A Comparative Handbook. Walter de Gruyter GmbH & Co KG, 2015.
[5] Bragg, Danielle, et al. "Sign language recognition, generation, and translation: An interdisciplinary perspective." The 21st International ACM SIGACCESS Conference on Computers and Accessibility, 2019, pp. 16-35.
[6] M. F. Tolba and A. S. Elons, "Recent developments in sign language recognition systems," 2013 8th International Conference on Computer Engineering & Systems (ICCES), Cairo, Egypt, 2013, pp. xxxvi-xlii.
[7] R. Akmeliawati, M. P.-L. Ooi and Y. C. Kuang, "Real-Time Malaysian Sign Language Translation using Colour Segmentation and Neural Network," 2007 IEEE Instrumentation & Measurement Technology Conference (IMTC 2007), Warsaw, Poland, 2007, pp. 1-6.
[8] Luqman, Hamzah, and Sabri A. Mahmoud. "Automatic translation of Arabic text-to-Arabic sign language." Universal Access in the Information Society 18.4 (2019), pp. 939-951.
[9] San-Segundo, Rubén, et al. "Design, development and field evaluation of a Spanish into sign language translation system." Pattern Analysis and Applications 15.2 (2012), pp. 203-224.
[10] Dhanjal, Amandeep Singh, and Williamjeet Singh. "An automatic machine translation system for multi-lingual speech to Indian sign language." Multimedia Tools and Applications 81.3 (2022), pp. 4283-4321.
[11] K. Amrutha and P. Prabu, "ML Based Sign Language Recognition System," 2021 International Conference on Innovative Trends in Information Technology (ICITIIT), Kottayam, India, 2021, pp. 1-6.
[12] Yin, Kayo, and Jesse Read. "Better sign language translation with STMC-transformer." Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 5975-5989.
[13] Khomami, Sara Askari, and Sina Shamekhi. "Persian sign language recognition using IMU and surface EMG sensors." Measurement 168 (2021): 108471.
[14] B. Samal and M. Panda, "Integrative Review on Vision-Based Dynamic Indian Sign Language Recognition Systems," 2021 1st Odisha International Conference on Electrical Power Engineering, Communication and Computing Technology (ODICON), Bhubaneswar, India, 2021, pp. 1-6.
[15] Aloysius, Neena, and M. Geetha. "Understanding vision-based continuous sign language recognition." Multimedia Tools and Applications 79.31 (2020), pp. 22177-22209.
[16] Kudrinko, Karly, et al. "Wearable sensor-based sign language recognition: A comprehensive review." IEEE Reviews in Biomedical Engineering 14 (2020), pp. 82-97.
[17] A. S. Antony, K. B. V. Santhosh, N. Salimath, S. H. Tanmaya, Y. Ramyapriya and M. Suchith, "Sign Language Recognition using Sensor and Vision Based Approach," 2022 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), Chennai, India, 2022, pp. 1-8.
[18] M. Al-Qurishi, T. Khalid and R. Souissi, "Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues," IEEE Access, vol. 9, pp. 126917-126951, 2021.
[19] Challa, Neha, et al. "Recent Advances in Sign Language Detection: A Brief Survey." (July 14, 2022).
[20] Rastgoo, Razieh, Kourosh Kiani, and Sergio Escalera. "Sign language recognition: A deep survey." Expert Systems with Applications 164 (2021): 113794.
[21] Rastgoo, Razieh, et al. "Sign language production: A review." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[22] Abedin, Thasin, et al. "Bangla sign language recognition using concatenated BdSL network." arXiv preprint arXiv:2107.11818 (2021).
[23] Mustafa, Mohammed. "A study on Arabic sign language recognition for differently abled using advanced machine learning classifiers." Journal of Ambient Intelligence and Humanized Computing 12.3 (2021), pp. 4101-4115.
[24] Saquib, Nazmus, and Ashikur Rahman. "Application of machine learning techniques for real-time sign language detection using wearable sensors." Proceedings of the 11th ACM Multimedia Systems Conference, 2020.
[25] H. D. Alon, M. A. D. Ligayo, M. P. Melegrito, C. Franco Cunanan and E. E. Uy II, "Deep-Hand: A Deep Inference Vision Approach of Recognizing a Hand Sign Language using American Alphabet," 2021 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), Dubai, United Arab Emirates, 2021, pp. 373-377.
