
Alphabet Signs Recognition using Pixels-based Analysis

Mohammad Raees1, Sehat Ullah2


Department of Computer Science and IT, University of Malakand, Pakistan.
1 visitrais@yahoo.com
2 sehatullah@uom.edu.pk

Abstract

Sign language is the language of gestures and postures used for non-verbal communication. This paper presents a novel vision-based approach for the detection of isolated signs of Pakistan Sign Language (PSL). The signs representing alphabets of Urdu (the national language of Pakistan) are recognized by distinguishing fingers. The algorithm, following a model of seven phases, identifies each of the five fingers from its respective position. After finger recognition, signs are deduced from the fingers' states of being raised or down. For quick recognition, signs are categorized into three groups based on the thumb position. Five testers evaluated the system using a simple low-cost USB camera in a semi-controlled environment. The results obtained are encouraging, as the accuracy of the system exceeds 85.4%.

1. Introduction

Sign Language (SL) is used to communicate through gestures instead of sounds. Apart from sign language, the language of mute people, hand gestures are used in different domains such as traffic control, robot instruction, hands-free computing and remote video gaming. Although there is no international sign language, every region of the world has its own sign language. Like American Sign Language (ASL) and British Sign Language (BSL), Pakistan has its own sign language, called Pakistan Sign Language (PSL), in which signs are used to represent Urdu letters and terms. Mute people always need an interpreter to be able to communicate in society. Leaving aside the availability of interpreters, it is impossible for an interpreter to accompany a (mute) individual in every matter and at every time for the whole of his life. To address this problem, this paper is the first step of our long-term goal of designing a Digitized Urdu Sign Language Interpreter (DUSLI). Unlike other sign language systems, most of which rely on prior training of signs [1], the proposed algorithm dynamically finds out which finger is the small finger, which one is the middle finger and so on. Keeping in view the position of each known finger, the system recognizes all those isolated signs in which the thumb is visible. The two earlier efforts made for Urdu alphabet recognition in PSL are those of Alvi et al. [2] and Kausar et al. [3]. The former system is based on a 5DT DataGlove and compares a scanned gesture image with stored database images. The second PSL recognition system needs a glove of eleven different colors.
One distinguishing feature of the proposed approach is that neither colors nor gloves are used for finger identification; rather, a finger-cap is used to cover the upper part of the thumb. To further reduce computation, instead of scanning the whole scene, the system captures only the specified Region of Interest (ROI). Along with displaying the appropriate Urdu alphabet, the system pronounces the letter on behalf of the signer as well.

2. Literature review

Sign language recognition is becoming a fast-growing area of HCI aimed at enabling mute persons to carry out their daily conversation. The crux of these HCI domains is to make the computer able to recognize signs and symbols. Several methods have been suggested to detect signs easily and accurately. Common methodologies used for sign recognition are Template Matching [4], Conditional Random Fields (CRF) [5] and Dynamic Time Warping (DTW) [6].
The procedure of Transition-Movement Models (TMM) [7] is based on hand-movement recognition to deduce continuous signs. Although the system recognizes a large vocabulary of 513 signs, the costly sensor-based glove with magnetic trackers makes the technique rarely affordable.
HMM-based hand gesture recognition is proposed in [8], with 82.2% accuracy in recognizing 20 signs for Arabic alphabets. The system needs pre-training of hand gestures with which an input image sign is compared using eight feature points. Most sign recognition systems are glove based, as with gloves, finger positions and orientations are easily detectable. The system of Al-Jarrah [9] uses CyberGloves, while that of Kadous et al. [10] needs PowerGloves.
The approach based on Data Gloves for FSL (French Sign Language) [11] has 83% accuracy. Although all signs of the Arabic alphabet can be recognized by the algorithm of Assaleh and Al-Rousan [12], accuracy remains below 57% due to light sensitivity. The algorithm of Lang et al. [13] makes use of Kinect which, using an infrared camera, is independent of lighting conditions. As some signs need facial expression and body motion, a Kinect-based system can hardly cope with such dynamic signs. Skin-color segmentation is followed in [14, 15], where the hand section is extracted from the image. A boundary-tracking and fingertip methodology is pursued for sign detection in [16]. Though the algorithm is robust, as only 5% errors occurred in the detection of all those ASL signs in which the fingers are open, the approach is applicable neither to signs in which fingers are closed nor to motion-based signs. The approach of thinning the segmented image is presented in [17]. The technique of morphological operations has been exercised in [18] for hand gesture recognition. The work of Nagarajan et al. [19] identifies hand postures used for counting signs with the help of convexity defects. The convexity defects of fingers, however, vary at large with a slight orientation of the hand or of the fingers. Furthermore, a skin-colored object in the background could be falsely treated as a finger. Non-manual feature extraction is a sub-domain of sign language recognition used mostly for continuous signs. Head pose and motion for sign recognition are suggested in [20], where head-shake-based signs are well recognized by trackers. Facial expression and gaze direction are traced in the approaches discussed in [21] and [22] respectively for continuous signs. The module of T. Shanableh extracts features by K-NN and Polynomial Networks [23]. The algorithm of Tauseef et al. [24] employs a neural network for alphabet recognition, where the learning scheme requires significant training and data sampling. The Visual Command system is suggested in [25], where the hand posture is surrounded by ordinary daily items. The work is interesting and can pave the way for the use of new symbols in the sign language domain.

3. The proposed method

The proposed method is the first of its kind to make use of both Template Matching and low-level pixel analysis for recognition of images captured from live video. The system's algorithm comprises seven phases. In the first phase, an image tightly bounding the hand posture is extracted, for which edges are detected in the second phase. The third phase is to search out the thumb's position through Template Matching. Fingers are distinguished from one another in the fourth phase. The group of the sign is identified in the fifth phase using the positions and status of the distinguished fingers. Signs are recognized in the sixth phase, while the last phase is to produce output in both audio and visual forms. The abstract-level diagram of the algorithm is given in Fig.1.

Fig.1 Schematic of the proposed system (Core and Kernel extraction -> Edge detection -> Template matching -> Distinguishing fingers -> Group detection -> Sign recognition -> Output)

3.1 Core and Kernel images extraction

The Core image, in the proposed algorithm, is the portion of the captured scene most probable to hold the hand posture, while the Kernel is the image that encloses exactly and merely the hand gesture. The very first phase is to segment out the portion containing the hand posture from the rest of the whole input scene. This is achieved in two steps: first extracting the Core image and then the Kernel image. The process to obtain the meaningful hand posture (Kernel) from a scanned frame is shown in Fig.2.

Fig.2 Schematic of Kernel extraction (Capturing whole scene -> Core image extraction -> Binary of Core image -> Sliding scan -> Kernel image)

The Core image is a 265x245-pixel image extracted in RGB from the dedicated region of the running video, as shown in Fig.3. The Core image, which contains the hand posture along with the unwanted background, is assigned to an image variable Cr.

Fig.3 The extracted Core image

If Cr, Kr and HP represent the Core image, the Kernel image and the Hand Posture respectively, then

HP ⊆ Kr ⊂ Cr   (1)

The white spaces of Cr, containing mostly the meaningless background, are removed to achieve faster computation. The Kernel (Kr) is the crux of the Core image, tightly bounding the hand posture. To get Kr, Cr is first assigned to a temporary image (Timg), which is converted to binary for the purpose of finding the Tm (topmost), Lm (leftmost) and Rm (rightmost) pixels enclosing the HP, as shown in Fig.4.

Fig.4 Binary of Cr with Lm, Tm and Rm pixels

Considering the rows and columns of Cr enclosed inside Tm, Lm and Rm, Kr is obtained from the Core image:

Kr = Cr(r, c),  Tm ≤ r,  Lm ≤ c ≤ Rm   (2)
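As an illustration of this two-step cropping, a minimal OpenCV sketch is given below. It assumes the hand is darker than the background; the fixed ROI rectangle, the threshold value of 128 and the names frame and roiRect are implementation choices of ours, not values taken from the paper.

#include <opencv2/opencv.hpp>
#include <algorithm>

// Crop the Core image from a fixed region of the frame, binarize it and
// return the Kernel: the tightest crop that still contains the hand posture.
cv::Mat extractKernel(const cv::Mat& frame, const cv::Rect& roiRect)
{
    // Core image Cr: the fixed region of the frame most likely to hold the hand.
    cv::Mat core = frame(roiRect).clone();

    // Binary of the Core image; the hand is assumed darker than the background,
    // so hand pixels come out black (0) and the background white (255).
    cv::Mat gray, binary;
    cv::cvtColor(core, gray, cv::COLOR_BGR2GRAY);
    cv::threshold(gray, binary, 128, 255, cv::THRESH_BINARY);

    // Topmost (Tm), leftmost (Lm) and rightmost (Rm) hand pixels.
    // (No bounds checking: a hand is assumed to be present in the ROI.)
    int Tm = core.rows - 1, Lm = core.cols - 1, Rm = 0;
    for (int r = 0; r < binary.rows; ++r)
        for (int c = 0; c < binary.cols; ++c)
            if (binary.at<uchar>(r, c) == 0) {
                Tm = std::min(Tm, r);
                Lm = std::min(Lm, c);
                Rm = std::max(Rm, c);
            }

    // Kernel Kr: the slice of Cr confined by Tm, Lm and Rm.
    cv::Rect kernelRect(Lm, Tm, Rm - Lm + 1, core.rows - Tm);
    return core(kernelRect).clone();
}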

3.2 Edge detection

Before the detection of edges, the slice of the scanned RGB image confined by Tm, Lm and Rm is assigned to Kr, as shown in Fig.5(a). Kr, with rows m and columns n extracted from rows r and columns c of Cr, exactly circumscribes the hand posture:

Kr(m, n) ⊆ Cr(r, c)   (3)

HP ⊆ Kr   (4)

For efficient results, edges are detected by the Sobel operator. For each point of Kr the gradient is obtained as G:

G = √(Gx² + Gy²)   (5)

The edge-detected Kernel is shown in Fig.5(b).

Fig.5 (a) The RGB of the Kernel (b) The edge-detected Kernel
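The gradient of Equation (5) maps directly onto OpenCV's Sobel operator; the short sketch below shows one possible form, where the 3x3 kernel size and the CV_32F depth are our assumptions rather than details stated in the paper.

#include <opencv2/opencv.hpp>

// Edge map of the Kernel: G = sqrt(Gx^2 + Gy^2) at every pixel (Equation (5)).
cv::Mat sobelEdges(const cv::Mat& kernelGray)
{
    cv::Mat gx, gy, mag, edges;
    cv::Sobel(kernelGray, gx, CV_32F, 1, 0, 3);   // horizontal gradient Gx
    cv::Sobel(kernelGray, gy, CV_32F, 0, 1, 3);   // vertical gradient Gy
    cv::magnitude(gx, gy, mag);                   // sqrt(Gx^2 + Gy^2)
    mag.convertTo(edges, CV_8U);                  // back to an 8-bit edge image
    return edges;
}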

3.3 Template matching

For easy identification of the thumb, a unique template sign is designed over the thumb jacket (covering), which can be seen in all the given figures. The thumb's template (Tt) is searched out in G of Kr using the most suitable Square Difference Matching method:

R(x, y) = ∑ [ Tt(x′, y′) − G(x + x′, y + y′) ]²   (6)

where the sum runs over 0 ≤ x′ < Tx and 0 ≤ y′ < Ty, and Tx and Ty represent the width and height of the template image respectively.
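Equation (6) corresponds to OpenCV's square-difference template matching, where the best match is the minimum of the response map. A possible sketch, with the function and variable names being ours:

#include <opencv2/opencv.hpp>

// Locate the thumb template Tt inside the edge image G of the Kernel.
// cv::TM_SQDIFF computes the squared-difference sum of Equation (6),
// so the best match is the location with the smallest response.
cv::Point findThumb(const cv::Mat& G, const cv::Mat& thumbTemplate)
{
    cv::Mat response;
    cv::matchTemplate(G, thumbTemplate, response, cv::TM_SQDIFF);

    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(response, &minVal, &maxVal, &minLoc, &maxLoc);
    return minLoc;   // top-left corner of the matched thumb region
}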
3.4 Distinguishing fingers

Excluding the thumb, the rest of the fingers are first detected and then distinguished from each other. The algorithm follows the sliding technique of scanning to detect a finger's tip: the Kernel is scanned from top-left to bottom-right for the first black (hand) pixel to be encountered. To make the system signer-independent and to avoid the distance problem, the Finger's Width (Fw) is calculated once, for the topmost finger. This is achieved by going five pixels deep from the finger's tip and calculating the Euclidean distance between the left (xL, yL) and right (xR, yR) edge pixels of the finger, as shown in Fig.6(b).

Fw = √((xR − xL)² + (yR − yL)²)   (7)

To avoid the slight width differences among fingers, Fw is enlarged by the addition of five pixels and is then assumed constant for all the four fingers.

Fw = Fw + 5   (8)

Fig.6 (a) Finger's tip (b) Finger's width

The same method of sliding is repeated to find the topmost pixel (Tp_i) for the detection of the remaining three fingers. To avoid the scanning of an already identified finger, points Tp_i lying inside any of the previously identified fingers are omitted:

Tp_i.x ∉ [Tp_j.x − Fw, Tp_j.x + Fw]  for every j < i

where i, representing the four fingers, ranges from 1 to 4.
The next step is to distinguish each finger from the rest. For this purpose, the leftmost tip is found from the set of known tips (Tp_i). The finger thus accessed first from the left is sure to be the SMALL finger, the next will be the RING finger and so on, as shown in Fig.7.

Fig.7 Small (S), Ring (R), Middle (M) and Index (I) fingers recognized
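A rough sketch of the sliding scan, the one-time width estimation and the left-to-right ordering described above is given below; the way already-identified fingers are skipped, the black-pixel test and all names are our own illustrative choices under the assumption of a binary Kernel with black hand pixels.

#include <opencv2/opencv.hpp>
#include <vector>
#include <algorithm>
#include <cstdlib>

// Scan the binary Kernel top-left to bottom-right for up to four finger tips,
// omitting pixels that fall inside an already identified finger.
std::vector<cv::Point> findFingerTips(const cv::Mat& binaryKernel)
{
    std::vector<cv::Point> tips;      // Tp_i, tips found so far
    int fingerWidth = 0;              // Fw, fixed after the first (topmost) finger

    for (int r = 0; r < binaryKernel.rows && tips.size() < 4; ++r) {
        for (int c = 0; c < binaryKernel.cols && tips.size() < 4; ++c) {
            if (binaryKernel.at<uchar>(r, c) != 0) continue;   // white background pixel

            // Omit pixels lying inside an already identified finger.
            bool insideKnown = false;
            for (const cv::Point& t : tips)
                if (std::abs(c - t.x) <= fingerWidth) insideKnown = true;
            if (insideKnown) continue;

            if (tips.empty()) {
                // Width of the topmost finger, measured five pixels below its tip
                // (Equation (7)), then enlarged by five pixels (Equation (8)).
                int row = std::min(r + 5, binaryKernel.rows - 1);
                int left = c, right = c;
                while (left > 0 && binaryKernel.at<uchar>(row, left - 1) == 0) --left;
                while (right < binaryKernel.cols - 1 && binaryKernel.at<uchar>(row, right + 1) == 0) ++right;
                fingerWidth = (right - left) + 5;
            }
            tips.push_back(cv::Point(c, r));
        }
    }
    // The leftmost tip is the SMALL finger, then RING, MIDDLE and INDEX.
    std::sort(tips.begin(), tips.end(),
              [](const cv::Point& a, const cv::Point& b) { return a.x < b.x; });
    return tips;
}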
3.5 Group detection

Based on the respective positions of the fingers, closely resembling signs are grouped together. The logic structure of the groups, namely G1, G2 and G3, is given as:

G1 = 1 if Thumb.x − 20 ≤ Index.x ≤ Thumb.x, 0 otherwise
G2 = 1 if Thumb.x < Index.x, 0 otherwise
G3 = 1 if Thumb.x ≥ Index.x + 20, 0 otherwise
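The three-way grouping can be expressed as one small function; the sketch below simply encodes the three conditions above (the enum and function names are illustrative, not taken from the paper).

#include <opencv2/core.hpp>

enum SignGroup { GROUP1, GROUP2, GROUP3 };

// G1: thumb within 20 pixels to the right of the index finger.
// G2: thumb to the left of the index finger.
// G3: thumb at least 20 pixels to the right of the index finger.
SignGroup detectGroup(const cv::Point& thumb, const cv::Point& index)
{
    if (thumb.x < index.x)        return GROUP2;
    if (thumb.x >= index.x + 20)  return GROUP3;
    return GROUP1;                // Thumb.x - 20 <= Index.x <= Thumb.x
}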
3.6 Sign recognition

The distinguished fingers, with their respective positions, are fed to the Engine especially designed for isolated signs. The system dynamically assigns the x and y positions of each finger to a Point variable declared with the name of that finger. The Engine, inferring from the respective positions of the distinguished fingers, recognizes the signs.
The standard deviation (SD) about the y-axis of all the fingers, except the thumb, is calculated with the following formula to find out whether the fingers are down or not.

SD = √( ∑ (y_i − ȳ)² / n )   (9)

where y_i is the y-position of the i-th finger, ȳ is their mean and n = 4. If all the four fingers are down, SD will be less than 6, while in the signs where some fingers are down and others are raised, the value of SD will be greater than 6.
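Equation (9) is an ordinary standard deviation over the four non-thumb finger y-positions; a small helper might look as follows (the population form is our assumption, since the paper does not state which variant is used).

#include <cmath>
#include <vector>

// Standard deviation (Equation (9)) of the y-positions of the small, ring,
// middle and index fingers; SD < 6 suggests that all four fingers are down.
double fingerSD(const std::vector<double>& y)
{
    double mean = 0.0;
    for (double v : y) mean += v;
    mean /= y.size();

    double var = 0.0;
    for (double v : y) var += (v - mean) * (v - mean);
    var /= y.size();

    return std::sqrt(var);
}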

3.6.1 G1 Signs recognition. In G1 signs the thumb, being adjacent to the index finger, rests within a limit of 20 pixels at the right of the index finger, as is clear from all the signs shown in Fig.8.

Fig.8 Signs of (a) ALIF (b) JAA (c) CHOTE-YA and (d) WAWO

Signs of G1 are recognized using the following decision control statements.

ALIF       if ((SD<6) AND (Thumb.y<Index.y))
JAA        if ((SD>6) AND (Small.y>Ring.y) AND (Middle.y<Index.y))
WAWO       if ((SD<6) AND (Thumb.y>Index.y+40))
CHOTE YAA  if ((SD>6) AND (Thumb.y<Index.y) AND (Small.y<Index.y) AND (Index.y<=Middle.y))

3.6.2 G2 Signs recognition. In G2 signs the thumb lies at the left of the index finger, as shown in Fig.9.

Fig.9 Signs (a) BAA (b) KAF (c) YAA (d) ZOWAD and (e) SEEN

The logic structure of G2 signs is presented as:

BAA    if ((SD<8) AND (Middle.y<Ring.y) AND (Ring.y<Index.y) AND (Index.y<Small.y))
ZOWAD  if ((SD>6) AND (Index.y<Small.y) AND (Small.y<Ring.y) AND (Abs(Ring.y-Middle.y)<5))
SEEN   if ((SD<6) AND (Thumb.y>Index.y+10))
KAF    if ((Abs(Middle.y-Index.y)<5) AND (Abs(Ring.y-Small.y)<5) AND (Thumb.x<Index.x) AND (Thumb.x>Middle.x))
YAA    if ((SD<6) AND (Thumb.y>Index.y+40) AND (Thumb.x<=Index.x))
3.6.3 G3 Signs recognition. The signs in which the thumb is at least 20 pixels away at the right of the index finger are grouped in G3, as shown in Fig.10.

Fig.10 Signs (a) LAAM (b) SOWAD

The two signs of G3 are distinguished using the following rule.

LAAM   if ((Index.y<Middle.y) AND (Index.y<Ring.y) AND (Index.y<Small.y) AND (Thumb.x>Index.x+20))
SOWAD  if ((Thumb.x>Index.x+20) AND (SD<6))
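To show how such decision statements compose, the sketch below wires a few representative rules (ALIF, WAWO, BAA and SOWAD) as listed in Sections 3.6.1-3.6.3; the Hand structure and the function name are our own scaffolding, not part of the paper.

#include <opencv2/core.hpp>
#include <string>

// x/y positions produced by the earlier phases, plus the SD of Equation (9).
struct Hand {
    cv::Point thumb, index, middle, ring, smallFinger;
    double sd;
};

// A few of the paper's decision statements, transcribed from Sections 3.6.1-3.6.3.
std::string recognize(const Hand& h)
{
    // G1: thumb within 20 pixels to the right of the index finger.
    if (h.index.x <= h.thumb.x && h.thumb.x < h.index.x + 20) {
        if (h.sd < 6 && h.thumb.y < h.index.y)       return "ALIF";
        if (h.sd < 6 && h.thumb.y > h.index.y + 40)  return "WAWO";
    }
    // G2: thumb to the left of the index finger.
    if (h.thumb.x < h.index.x) {
        if (h.sd < 8 && h.middle.y < h.ring.y &&
            h.ring.y < h.index.y && h.index.y < h.smallFinger.y)
            return "BAA";
    }
    // G3: thumb at least 20 pixels to the right of the index finger.
    if (h.thumb.x >= h.index.x + 20 && h.sd < 6)
        return "SOWAD";

    return "UNKNOWN";
}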
3.7 Output

The Engine of the system, after recognizing the exact sign, invokes the last module, which deals with both audio and visual outputs. The detected sign is pronounced in the exact standard Urdu accent and is displayed as a 200x200-pixel image.

4. Experimental result and analysis

We implemented the model in Microsoft Visual Studio 2010 with the OpenCV library. A Core i5 laptop running at 2.5 GHz was used for the development and testing of the system. Each captured frame holds four buttons, which are highlighted on mouse move-over, as shown in Fig.11. A new frame is captured after providing a healthy time of 30 ms for the signer to pose. One can directly capture a sign by clicking the Capture button if the hand gesture is posed before the expiry of the 30 ms. The ReStart button is to reset the system for a new sign. Clicking the Still button halts the whole system, while the Exit button is to quit. To test the system, twenty trials were performed for each sign by five signers.

Fig.11 The captured image with the specified ROI (audio and visual output)

Details of the correct, false and missed detections of all the tested alphabets are shown in Table 1. Accuracy was calculated using the following formula, which yields 85.4%.

Accuracy (in %) = (No. of Correct Recognitions − No. of Failed Recognitions) / Total Signs * 100

Counting both false and missed detections as failed recognitions, this gives (204 − 16) / 220 * 100 ≈ 85.4%.

Table 1
Alphabets   Correct detection   False detection   Missed detection   TOTAL
ALIF        18                  2                 0                  20
JAA         18                  1                 1                  20
WAWO        17                  1                 2                  20
CHOTE YA    18                  2                 0                  20
BAA         20                  0                 0                  20
KAF         19                  1                 0                  20
ZOWAD       19                  1                 0                  20
SEEN        20                  0                 0                  20
YAA         18                  0                 2                  20
LAAM        19                  0                 1                  20
SOWAD       18                  1                 1                  20
Total       204                 9                 7                  220
A noteworthy attribute of the method is its accurate recognition of closely resembling signs, like those of ALIF and SOWAD, already shown in Fig.8(a) and Fig.10(b), and the signs of WAWO and YAA, shown in Fig.12(a) and 12(b) respectively. Earlier highly accurate systems, like the Fuzzy Classifier method, fail to distinguish such signs.

Fig.12 The closely resembled signs of (a) WAWO and (b) YAA

5. Conclusion and future work

The system is specifically designed for the recognition of alphabets of PSL, but the applicability and accuracy of the algorithm testify that it is pursuable for any sign language. Tilt or orientation of the hand exceeding ten degrees in either axis is the main reason for false detection. We aim to enhance the algorithm in the future so that orientation can be tolerated to a large extent.

6. References

[1] P. S. Rajam and G. Balakrishnan, "Real Time Indian Sign Language Recognition System to aid Deaf-dumb People", IEEE 13th International Conference on Communication Technology, China, 2011, pp. 737-742.
[2] A. K. Alvi, M. Y. Azhar, M. Usman, S. Mumtaz, S. Rafiq, R. U. Rehman and I. Ahmed, "Pakistan Sign Language Recognition Using Statistical Template Matching", International Conference on Information Technology, Istanbul, Turkey, December 2005, pp. 108-111.
[3] S. Kausar, M. Younus Javed and S. Sohail, "Recognition of Gestures in Pakistani Sign Language using Fuzzy Classifier", 8th WSEAS International Conference on Signal Processing, Computational Geometry and Artificial Vision, Rhodes, Greece, 2008, pp. 101-105.
[4] T. Starner, J. Weaver and A. Pentland, "Real-time American Sign Language recognition using desk and wearable computer based video", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, Vol. 20, pp. 1371-1375.
[5] S. Akram, J. Beskow and H. Kjellström, "Visual Recognition of Isolated Swedish Sign Language Signs", Ph.D. thesis, Cornell University, Ithaca, New York, United States, 2012.
[6] P. Doliotis, C. McMurrough, D. Eckhard and V. Athitsos, "Comparing gesture recognition accuracy using color and depth information", Conference on Pervasive Technologies Related to Assistive Environments, 2011, pp. 20-20.
[7] G. Fang, W. Gao and D. Zhao, "Large-Vocabulary Continuous Sign Language Recognition Based on Transition-Movement Models", IEEE Transactions on Systems, Man and Cybernetics, 2007, Vol. 34, Issue 3, pp. 305-314.
[8] W. Gao, J. Ma, J. Wu and C. Wang, "Sign language recognition based on HMM/ANN/DP", International Journal of Pattern Recognition and Artificial Intelligence, 2000, pp. 587-602.
[9] O. Al-Jarrah and A. Halawani, "Recognition of gestures in Arabic sign language using neuro-fuzzy systems", Artificial Intelligence and Soft Computing, 2006, pp. 117-138.
[10] M. W. Kadous, "Machine recognition of Auslan signs using PowerGloves: towards large-lexicon recognition of sign language", Proceedings of the Workshop on the Integration of Gestures in Language and Speech, 1996, pp. 165-174.
[11] H. D. Yang and S. W. Lee, "Robust sign language recognition with hierarchical conditional random fields", IAPR International Conference on Pattern Recognition, Istanbul, Turkey, August 2010, pp. 2202-2205.
[12] K. Assaleh and M. Al-Rousan, "Recognition of Arabic Sign Language Alphabet Using Polynomial Classifiers", EURASIP Journal on Applied Signal Processing, 2005, pp. 2136-2146.
[13] S. Lang, D. Marco and B. Berlitz, "Sign Language Recognition with Kinect", MS thesis, Freie Universität, Berlin, 2011.
[14] V. Vezhnevets, V. Sazonov and A. Andreeva, "A Survey on Pixel-based Skin Color Detection Techniques", Proc. Graphicon, Moscow, Russia, 2003, pp. 85-92.
[15] X. Zabulis, H. Baltzakis and A. Argyros, "Vision-based Hand Gesture Recognition for Human-Computer Interaction", Ph.D. thesis, Institute of Computer Science, Foundation for Research and Technology, University of Crete, Heraklion, Crete, Greece, 2007.
[16] J. Ravikiran, K. Mahesh, S. Mahishi, R. Dheeraj, S. Sudheender and N. V. Pujari, "Finger Detection for Sign Language Recognition", Proceedings of the International MultiConference of Engineers and Computer Scientists, Hong Kong, 2009, Vol. 1, pp. 18-20.
[17] R. Rokade and M. Kokare, "Hand Gesture Recognition by Thinning Method", International Conference on Digital Image Processing, IEEE Computer Society, Bangkok, 2009, pp. 284-287.
[18] J. Kuch and T. S. Huang, "Vision-based hand modeling and tracking for virtual teleconferencing and telecollaboration", IEEE Int. Conf. Computer Vision, Cambridge, 1995, pp. 666-671.
[19] S. Nagarajan, T. Subashini and V. Ramalingam,
“Vision Based Real Time Finger Counter for Hand
Gesture Recognition”, International Journal of
Technology CPMR-IJT, December 2012. Vol. 2.
[20] U.M. Erdem and S. Sclaroff, “Automatic detection of
relevant head gestures in American Sign Language
communication”, IAPR International Conference on
Pattern Recognition, BRAC University, 2002, Vol.1,
pp.460-463.
[21] V. Agris, M. Knorr, K.F. Kraiss and J. Kim, “The
significance of facial features for automatic sign
language recognition”, 8th IEEE International
Conference on Automatic Face Gesture Recognition,
Amsterdam, Netherlands, Sept. 17-19 2008, pp. 1-6.
[22] L. Muir and S. Leaper, “Gaze tracking and its
application to video coding for sign language”,
Picture Coding Symposium, the Robert Gordon
University, Schoolhill, Aberdeen, UK, 2003, pp. 321-
325.
[23] T. Shanableh and K. Assaleh, “Arabic sign language
recognition in user independent mode”, IEEE
International Conference on Intelligent and
Advanced Systems, Kuala Lumpur 2007, pp.597-600.
[24] H. Tauseef, M.A. Fahiem, and S. Farhan,
“Recognition and Translation of Hand Gestures to
Urdu Alphabets Using a Geometrical Classification”,
Second International Conference in Visualization,
pp. 213-217, Barcelona, Spain, July 2009.
[25] Chalechale and F. Safae, “Visual-based interface
using hand gesture Recognition and Object
Tracking”, Iranian Journal of Science & Technology,
Transaction B, Engineering Shiraz University, 2008
Vol. 32, pp. 279–293.
