
RELATED STUDIES

The work related to sign language translation can be broadly classified into two categories: sensor-based sign language translation systems and camera-based sign language translation systems. Sensor-based systems use IMUs for hand and finger tracking [2, 4, 6], while camera-based systems record the user's hand movements and apply computer vision algorithms for hand and finger tracking [1, 3, 5]. In [1], the authors used deep learning and computer vision to recognize American Sign Language. Their proposed model takes video sequences of signing as input and extracts temporal features (corresponding to hand movements) and spatial features (corresponding to hand position) from these sequences. A CNN is used to recognize the extracted spatial features and an RNN is trained on the temporal features; the two networks work together to recognize gestures. Since these models are trained on video sequences, they show visible training bias, such as a dependency on skin tone. In [3], similar to [1], the authors used a camera-based system: their model takes a real-time video stream or a video recording of hand movements and uses optical flow to interpret sign language. Optical flow determines how far an image pixel moves between two sequential frames in 2D. In this method, the user is required to wear a red glove for hand recognition and tracking across frames. In other research, such as [5], the authors used HMM-based algorithms for gesture recognition. The major drawback of the above camera-based research is that the signer has to perform signs in front of the camera, and fingers or movements can sometimes be hidden from the camera's field of view.

In [4], the authors used a hardware-based system for sign detection, with Hall effect sensors and a strong magnet for finger tracking: the magnet is placed on the palm and a Hall sensor on the tip of each finger. The value produced by each sensor is related to its distance from the magnet, and this information is used to track finger position. In [2] and [6], the authors used similar hardware sensors (flex sensors, an accelerometer, and a gyroscope) to obtain finger position and orientation. The methods of [2, 4, 6] are hardware-dependent, and the glove and training process are custom for each person. Our research extends [6] with methods to reduce these hardware-dependency issues and with the use of machine learning rather than hard-coded rules for interpreting the sensors.
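As a rough illustration of the spatial-plus-temporal pipeline described for [1], the sketch below chains a per-frame CNN with an LSTM over the frame sequence. It assumes TensorFlow/Keras; the input resolution, layer sizes, and number of gesture classes are illustrative placeholders, not the configuration used in [1].

```python
# Minimal sketch of a CNN + RNN gesture recognizer over video clips.
# Assumes TensorFlow/Keras; shapes and class count are illustrative only.
import tensorflow as tf

NUM_FRAMES, HEIGHT, WIDTH, CHANNELS = 16, 64, 64, 3  # assumed clip shape
NUM_GESTURES = 26                                    # assumed class count

model = tf.keras.Sequential([
    # Spatial features: the same small CNN is applied to every frame.
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        input_shape=(NUM_FRAMES, HEIGHT, WIDTH, CHANNELS)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D()),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten()),
    # Temporal features: an LSTM aggregates the per-frame descriptors.
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(NUM_GESTURES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```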

Steve Daniels, Nanik Suciati, and Chastine Fathichah (2021) implemented Indonesian Sign Language recognition using YOLO. Their experiments on image and video data achieved 100% and 72.97% accuracy, respectively. Recognition of the transition frames between one sign and the next in video data contributes to the misrecognition errors, so an algorithm to distinguish transition frames from sign frames should be considered in future work to improve the accuracy of the system. With a processing speed of 8 fps, real-time recognition of video data can be performed by adjusting the speed of the recognition process and the frame rate. https://iopscience.iop.org/article/10.1088/1757-899X/1077/1/012029/pdf
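Daniels et al. leave the separation of transition frames from sign frames as future work; one simple form such a filter could take is a frame-differencing heuristic like the sketch below. It assumes OpenCV and NumPy, the motion threshold is an arbitrary placeholder, and it is not the method of the cited paper.

```python
# Illustrative heuristic (not from the cited paper): flag high-motion frames
# as likely sign-to-sign transitions before running the recognizer.
import cv2
import numpy as np

MOTION_THRESHOLD = 12.0  # assumed value; would need tuning on real data

def is_transition(prev_frame, frame, threshold=MOTION_THRESHOLD):
    """Return True if inter-frame motion suggests a transition frame."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Mean absolute pixel difference as a crude motion measure.
    motion = np.mean(cv2.absdiff(prev_gray, gray))
    return motion > threshold

cap = cv2.VideoCapture("signing.mp4")  # hypothetical input video
ok, prev = cap.read()
while ok:
    ok, frame = cap.read()
    if not ok:
        break
    if not is_transition(prev, frame):
        pass  # run the sign recognizer on this (stable) frame
    prev = frame
cap.release()
```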

Bhavadharshini M. et al. (2021), in their study entitled "American Sign Dialect Translator," used a Convolutional Neural Network (along with YOLO) in a real-world scenario. The proposed system was verified to detect hand gestures with an accuracy of 92.5% for commonly used phrases. In the future, a high-end semantic analysis could be coupled to the operational system to improve its recognition capability for complex individual tasks, and the recognition rate could also be increased by improving the image-processing step.
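Both Daniels et al. and Bhavadharshini et al. rely on YOLO-style object detection to localize and classify the signing hand in each frame. The snippet below is a minimal sketch of running such a detector with the ultralytics package; the package choice, the weights file ("yolov8n.pt"), and the input image are assumptions for illustration, not the setups used in the cited studies.

```python
# Minimal sketch of YOLO-based detection on a single frame.
# Uses the ultralytics package as a stand-in; the cited studies do not
# specify this exact YOLO version or these weights.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # assumed pretrained weights
results = model("frame.jpg")    # hypothetical input image

for box in results[0].boxes:
    class_id = int(box.cls[0])
    confidence = float(box.conf[0])
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"class={results[0].names[class_id]} "
          f"conf={confidence:.2f} bbox=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```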

Abey Abraham et al. (2018) concluded in their study entitled "Real-time conversion of sign language to speech and prediction of gestures using Artificial Neural Network" that various needs of mute people can be associated with the values of flex sensors, and those needs can then be predicted using a back-propagation neural network. The neural network model adjusts its weights to make the prediction accurate. Along with real-time conversion of American Sign Language to speech, the model also predicts the mute person's needs. As future work, the predicted values could be delivered to the mute person's Android application, making it much easier for them to communicate with others and express their needs without strain; the person could then evaluate whether the prediction is true or not, and these evaluation results can again be used as input for training the neural network for future predictions.
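To make the flex-sensor-to-needs mapping in Abraham et al. concrete, the sketch below trains a small back-propagation network (scikit-learn's MLPClassifier) on sensor readings. The number of flex sensors, the need labels, and the training data are all invented placeholders; the cited work's actual sensor count, classes, and network configuration are not reproduced here.

```python
# Illustrative sketch: map flex-sensor readings to predicted "needs" with a
# back-propagation neural network. Sensor count, labels, and data are
# placeholders, not values from Abraham et al. (2018).
import numpy as np
from sklearn.neural_network import MLPClassifier

NUM_FLEX_SENSORS = 5  # assumed: one flex sensor per finger

# Toy training set: rows of normalized flex readings and a need label.
X_train = np.array([
    [0.9, 0.8, 0.9, 0.8, 0.9],   # fist-like posture
    [0.1, 0.1, 0.1, 0.1, 0.1],   # open hand
    [0.1, 0.9, 0.9, 0.9, 0.9],   # index finger extended
])
y_train = ["water", "help", "food"]  # hypothetical need classes

model = MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                      max_iter=2000, random_state=0)
model.fit(X_train, y_train)

reading = np.array([[0.85, 0.80, 0.88, 0.82, 0.90]])  # new sensor sample
print(model.predict(reading)[0])  # predicted need, e.g. "water"
```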

Medhini Prabhakar et al. (2022), in their study entitled Sign Language Conversion to Text and Speech, recommended a system that uses four different algorithms to recognize sign language. The FRCNN algorithm crosses the mark of good accuracy but fails to achieve the speed required for real-world use. The second algorithm, a CNN for ASL, performs well in terms of speed but falls behind in accuracy, which again does not meet the project objective. YOLO was then used with the aim of achieving good accuracy and speed at the same time, which the first two models lacked; although this combination of accuracy and speed was achieved, the model did not collect input in real time. Keeping these hindrances in mind, the project was finally implemented using the MediaPipe algorithm, which classifies the sign language alphabet efficiently with good accuracy, and the identified hand gesture is converted to speech so that the system can be used for communication not only between deaf-mute and hearing people but also by visually impaired people.

The proposed system is developed to solve communication problems for vocally disabled people, thereby encouraging better interaction and ensuring they do not feel isolated. To improve accuracy and applicability, the work can be further extended to server-based systems to improve coverage, and high-range cameras can be used for better sign detection. Interconnecting such systems with a central computer would help accumulate data, and the performance of the whole network would benefit from this exchange of data, since it would help train the algorithm better.
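As a rough sketch of the MediaPipe-plus-speech pipeline that Prabhakar et al. settled on, the snippet below extracts hand landmarks with MediaPipe Hands and speaks a recognized letter with pyttsx3. The classify_letter step is a hypothetical placeholder for whatever landmark classifier the project used, and pyttsx3 is an assumed choice of text-to-speech engine; this is one way such a pipeline can be wired together, not the cited implementation.

```python
# Illustrative MediaPipe-to-speech pipeline. The landmark classifier is a
# placeholder; pyttsx3 is an assumed text-to-speech engine.
import cv2
import mediapipe as mp
import pyttsx3

def classify_letter(landmarks):
    """Hypothetical classifier: map 21 hand landmarks to a letter."""
    return "A"  # placeholder result

engine = pyttsx3.init()
hands = mp.solutions.hands.Hands(max_num_hands=1,
                                 min_detection_confidence=0.7)

cap = cv2.VideoCapture(0)  # webcam input
ok, frame = cap.read()
while ok:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB
    results = hands.process(rgb)
    if results.multi_hand_landmarks:
        letter = classify_letter(results.multi_hand_landmarks[0].landmark)
        engine.say(letter)        # speak the recognized letter
        engine.runAndWait()
    ok, frame = cap.read()
cap.release()
hands.close()
```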
