
IPASJ International Journal of Information Technology (IIJIT)

Volume 5, Issue 10, October 2017 ISSN 2321-5976

Implementation of Human Hand Gesture Recognition System for External Control
R Vadivelu1, G Santhakumar2, D Balasubramaniam3
1, 2 Department of Electronics and Communication Engineering, Sri Krishna College of Technology, Coimbatore, India
3 Department of Electronics and Communication Engineering, GKM College of Engineering and Technology, Chennai, India

Abstract: Image processing plays a vital role in many emerging domains. Cameras connected to computer systems record sequences of digital images of the human hand in order to interpret human postures and gestures. Hand posture/gesture recognition has been used to provide virtual-reality interaction and remains an active research topic in the human-computer interaction (HCI) community. Conventional HCI devices such as keyboards, mice and joysticks limit the speed of human interaction. In this paper we propose a system for detecting and recognizing human hand gestures that is user-friendly and can operate even in robust environments. From the many possible single-hand gestures, the required gestures are first recorded as images; at run time, the gesture to be recognized is also captured as an image and compared with the previously recorded set. When an input image matches a stored one, the corresponding output signal is generated. These signals drive hardware circuits through an Arduino board that interfaces the computer with the external network. The hand gestures of a human are thereby used to control electronic devices. The technique can be applied to controlling robots, machines and other electronic instruments.
Keywords: Digital image, human-computer interaction, hardware circuits, hand gesture, electronic devices.

1. Introduction
Digital image processing (DIP) is the processing of digital images using computer algorithms [1]. Image processing is classified into digital image processing and analog image processing. DIP has many advantages over analog image processing: a wider range of algorithms can be applied to the input data, and noise and signal distortion during processing can be avoided. Since images are defined over two dimensions, DIP may be modelled as a multidimensional system. Image processing converts an image into digital form and performs operations on it in order to obtain an enhanced image or to extract useful information from it. It is a type of signal processing in which the input is an image, such as a video frame or photograph, and the output is either an image or a set of characteristics associated with that image. Image processing usually treats images as two-dimensional signals and applies a set of signal-processing algorithms to them. DIP applications range from digital camera images and intelligent transport systems to image sharpening and restoration, biomedical image processing, machine vision, pattern recognition and video processing.

A digital image is composed of a finite number of elements, each of which has a particular location and value; these elements are referred to as pixels. Before processing, an image is converted into digital form. Many processing and analysis steps can then be performed on a digital image; in this paper, the edge detection, segmentation, image matching and image recognition processes [2] are considered.

Since the evolution of computers in the middle of the last century, pattern recognition (PR) based on digital images and correlations has been applied in various fields of science and technology. Pattern recognition is a probabilistic method in which objects are represented by their statistical characteristic features [3]. PR has been used for face recognition, image restoration [4, 5] and fingerprint classification [6, 7]. These works are useful for classifying micro-objects, where object variation, background, scale, illumination, etc., are taken into account [8, 9, 10]. The aim of this work is to


identify a specific target regardless of the position or the angle of rotation presented on the plane.
2.1 Edge Detection Algorithm
Edge detection is a type of image segmentation technique that determines the presence of an edge or a line in an image and outlines it in an appropriate way. Applying edge detection simplifies subsequent image processing while preserving the data of interest. Generally, an edge is defined as the border between two image regions where large changes in intensity occur. The detection operation begins with the examination of the local discontinuity at each pixel of the image. Amplitude, orientation and location are the essentially important characteristics of a possible edge; based on these, the detector decides whether each examined pixel is an edge pixel or not. A widely used method for first-order derivative edge detection, adopted in this paper, evaluates the gradients generated along two orthogonal directions. According to this method, an edge is said to be present if the gradient exceeds a defined threshold value t = T. For a location (x, y), the gradient can be computed from the derivatives along both orthogonal axes
G(x, y) = (∂F(x, y)/∂x) cos θ + (∂F(x, y)/∂y) sin θ (1)

The gradient is estimated in a direction normal to the edge. The spatial average gradient can be expressed as

G(j, k) = [G_R(j, k)^2 + G_C(j, k)^2]^(1/2) (2)

In equations (1) and (2), F(x, y) is the original input image, whereas G(j, k) refers to the output differential image. The simplest discrete row and column gradients (G_R and G_C) are given by

G_R(j, k) = F(j, k) - F(j, k - 1) (3)

G_C(j, k) = F(j, k) - F(j + 1, k) (4)

Based on equations (3) and (4), Roberts proposed an operator for edge detection that evaluates gradients and directions for image pixels using 2x2 arrays referred to as mask operators. The elements of the Roberts mask operators are shown in Figure 1.

x-axis: [ 1  0 ]    y-axis: [  0  1 ]
        [ 0 -1 ]            [ -1  0 ]

Figure 1 Roberts mask operators

Prewitt, Sobel and Canny [11] developed other edge-detector operators, whose mask operators for the x and y axes are 3x3 arrays.
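As an illustration of Eq. (2) and the Roberts masks of Figure 1, the following is a minimal, dependency-free Python sketch; the tiny list-of-lists image is a stand-in for a real frame and is not from the paper.

```python
import math

def roberts_gradient(F):
    """Gradient magnitude using the 2x2 Roberts cross operators
    (the last row/column is left at zero)."""
    rows, cols = len(F), len(F[0])
    G = [[0.0] * cols for _ in range(rows)]
    for j in range(rows - 1):
        for k in range(cols - 1):
            gx = F[j][k] - F[j + 1][k + 1]   # mask [[1, 0], [0, -1]]
            gy = F[j][k + 1] - F[j + 1][k]   # mask [[0, 1], [-1, 0]]
            G[j][k] = math.sqrt(gx * gx + gy * gy)
    return G

# Tiny synthetic image: dark left half, bright right half.
img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
grad = roberts_gradient(img)
```

The response peaks on the vertical boundary between the two halves and is zero inside each uniform region, which is exactly the thresholding behaviour the text describes.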

After applying an edge detector, a smoothing process can be conducted, in which the detected edges are subjected to thinning and linking to optimize the boundaries, leading to segmentation. A Canny-style smoothing function is adopted. Edges are significant local changes of intensity in an image; they typically occur on the boundary between two different regions. Edge detection is used to produce a line drawing of a scene from an image of that scene. Important features such as corners, lines and curves can be extracted from the edges of an image, and these features are used by higher-level computer vision algorithms, e.g. recognition. Intensity changes are caused by various physical events:
Geometric events - object boundaries (discontinuity in depth and/or surface colour and texture) and surface boundaries (discontinuity in surface orientation and/or surface colour and texture).
Non-geometric events - specularity (direct reflection of light, such as from a mirror), shadows (from other objects or from the same object) and inter-reflections.
Edge descriptors
Edge normal - Unit vector in the direction of maximum intensity change.
Edge direction - Unit vector perpendicular to the edge normal.
Edge position or centre - Image position at which the edge is located.
Edge strength - It is related to the local image contrast along the normal.
The four steps of edge detection:
Smoothing - Suppress as much noise as possible, without destroying the true edges.
Enhancement - Apply a filter to enhance the quality of the edges in the image (sharpening).
Detection - Determine which edge pixels should be discarded as noise and which should be retained (usually,
thresholding provides the criterion used for detection).


Localization - Determine the exact location of an edge (sub-pixel resolution might be required for some
applications, that is, estimate the location of an edge to better than the spacing between pixels).
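The four steps above can be sketched end to end in plain Python. This is a toy pipeline under stated assumptions: a 3x3 box filter stands in for the smoothing step, a simple gradient-magnitude threshold (Eqs. 3-4) for enhancement and detection, and sub-pixel localization is omitted.

```python
def box_smooth(F):
    """Step 1, smoothing: 3x3 mean filter (interior pixels only)."""
    r, c = len(F), len(F[0])
    out = [row[:] for row in F]
    for j in range(1, r - 1):
        for k in range(1, c - 1):
            out[j][k] = sum(F[j + dj][k + dk]
                            for dj in (-1, 0, 1)
                            for dk in (-1, 0, 1)) / 9.0
    return out

def detect_edges(F, t):
    """Steps 2-3: row/column gradients and thresholding at t."""
    r, c = len(F), len(F[0])
    edges = [[0] * c for _ in range(r)]
    for j in range(1, r):
        for k in range(1, c):
            gr = F[j][k] - F[j][k - 1]      # row gradient, Eq. (3)
            gc = F[j][k] - F[j - 1][k]      # column gradient, Eq. (4)
            if (gr * gr + gc * gc) ** 0.5 > t:
                edges[j][k] = 1
    return edges

img = [[0, 0, 0, 8, 8],
       [0, 0, 0, 8, 8],
       [0, 0, 0, 8, 8]]
edges = detect_edges(box_smooth(img), t=2.0)
```

The threshold t plays the role of the detection criterion in the text: pixels whose smoothed gradient magnitude stays below it are discarded as noise.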
2.2 Feature Extraction

Feature extraction and representation is a crucial step for multimedia processing. How to extract ideal features that reflect the intrinsic content of images as completely as possible is still a challenging problem in computer vision, and relatively little research has paid attention to it in recent decades. In this paper we therefore review the latest developments in image feature extraction and provide a survey of image feature representation techniques. In particular, we analyze the effectiveness of fusing global and local features in the automatic image annotation and content-based image retrieval communities, including some classic models and their illustrations in the literature.

2.3 Issues of Hand Gesture Interaction

Most researchers divide a gesture recognition system into three main steps after acquiring the input image from the camera: extraction, feature estimation and classification.

Extraction Method -> Features -> Classification

Figure 2 Gesture Recognition System

3.1 Image Feature Extraction

1) Colour Features: Colour is one of the most important features of an image. Colour features are defined subject to a particular colour space or model. A number of colour spaces have been used in the literature, such as RGB, LUV, HSV and HMMD. Once the colour space is specified, colour features can be extracted from images or regions. A number of important colour features have been proposed in the literature, including the colour histogram, colour moments (CM), the colour coherence vector (CCV) and the colour correlogram. Among these, CM is one of the simplest yet effective features. The common moments are mean, standard deviation and skewness; the corresponding calculations can be defined as

mu_i = (1/N) * sum_{j=1..N} f_ij (5)

sigma_i = [ (1/N) * sum_{j=1..N} (f_ij - mu_i)^2 ]^(1/2) (6)

s_i = [ (1/N) * sum_{j=1..N} (f_ij - mu_i)^3 ]^(1/3) (7)

where f_ij is the colour value of the i-th colour component of the j-th image pixel and N is the total number of pixels in the image. mu_i, sigma_i and s_i (i = 1, 2, 3) denote the mean, standard deviation and skewness of each channel of the image respectively.
Table I provides pros and cons of different colour methods from the literature such as dominant colour descriptor
(DCD), colour structure descriptor (CSD) and scalable colour descriptor (SCD) respectively.
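Eqs. (5)-(7) can be computed directly per channel; below is a small sketch in which the four-pixel channel is an illustrative stand-in for real image data.

```python
def colour_moments(channel):
    """Mean, standard deviation and skewness of one colour channel
    (Eqs. 5-7); `channel` is a flat list of pixel values f_ij."""
    n = float(len(channel))
    mu = sum(channel) / n
    var = sum((f - mu) ** 2 for f in channel) / n
    sigma = var ** 0.5
    m3 = sum((f - mu) ** 3 for f in channel) / n
    # Cube root, preserving the sign of the third central moment.
    skew = (abs(m3) ** (1.0 / 3.0)) * (1 if m3 >= 0 else -1)
    return mu, sigma, skew

mu, sigma, skew = colour_moments([10, 10, 20, 20])
# mu = 15.0, sigma = 5.0, skew = 0.0 for this symmetric channel
```

Concatenating the three moments of each of the three channels gives the compact 9-value descriptor that makes CM attractive in Table I.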

TABLE I Contrast of different colour descriptors

Method        Pros                              Cons
Histogram     Simple to compute, intuitive      High dimension, no spatial info, sensitive to noise
CM            Compact, robust                   Not enough to describe all the colours, no spatial info
CCV           Spatial info                      High dimension and computation
Correlogram   Spatial info                      High computation cost; sensitive to noise, rotation and scale
DCD           Compact, robust and perceptual    Needs post-processing for spatial info
CSD           Spatial info, robust              Sensitive to noise
SCD           Compact on need                   No spatial info, less accurate

Texture Features: Texture is a very useful characteristic of an image. It is believed that human visual systems use texture for recognition and interpretation. In general, colour is a property of a single pixel, whereas texture is a measure over a region of pixels. Various techniques have been proposed to extract texture features; they are broadly classified into spatial and spectral texture feature extraction methods. In the former approach, texture features are extracted by computing pixel statistics or finding local pixel structures in the original image domain, whereas the latter transforms the image into the frequency domain and then extracts texture features from the transformed image. Table II summarizes the pros and cons of the spatial and spectral features.


TABLE II Contrast of spatial and spectral texture features

Method            Pros                                               Cons
Spatial texture   Meaningful, easy to understand, can be extracted   Sensitive to noise and distortions
                  from any shape without losing information
Spectral texture  Robust, requires less computation                  No semantic meaning, needs square image
                                                                     regions of sufficient size

As the most common method for texture feature extraction, Gabor filter [12] has been widely used in image texture
feature extraction. To be specific, Gabor filter is designed to sample the entire frequency domain of an image by
characterizing the centre frequency and orientation parameters. The image is filtered with a bank of Gabor wavelets of
different preferred spatial frequencies and orientations. Each wavelet captures energy at a specific frequency and
direction that provide a localized frequency as a feature vector.
Thus, texture features can be extracted from this group of energy distributions. Given an input image I(x, y), the Gabor wavelet transform convolves I(x, y) with a set of Gabor filters of different spatial frequencies and orientations. A two-dimensional Gabor function g(x, y) can be defined as

g(x, y) = [1 / (2 pi sigma_x sigma_y)] exp[ -(1/2)(x^2/sigma_x^2 + y^2/sigma_y^2) + 2 pi j W x ] (8)

where sigma_x and sigma_y are the scaling parameters of the filter (the standard deviations of the Gaussian envelope), W is the centre frequency, and theta defines the orientation of the filter. Figure 3 shows the Gabor function in the spatial domain.

Figure 3 Gabor function in the spatial domain
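Eq. (8) can be sampled directly to build a filter kernel. The sketch below evaluates the complex Gabor function at the filter's own (zero-rotation) orientation; the sigma and W values are illustrative, not the paper's settings.

```python
import math

def gabor(x, y, sigma_x, sigma_y, W):
    """Complex 2-D Gabor function of Eq. (8) at one point (x, y)."""
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    envelope = math.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))
    # Complex carrier exp(2*pi*j*W*x), written out as cos + j*sin.
    carrier = complex(math.cos(2.0 * math.pi * W * x),
                      math.sin(2.0 * math.pi * W * x))
    return norm * envelope * carrier

# Sample a small 7x7 kernel; a filter bank would repeat this for
# several centre frequencies W and rotated coordinate frames.
kernel = [[gabor(x, y, 2.0, 2.0, 0.25) for x in range(-3, 4)]
          for y in range(-3, 4)]
```

At the origin the envelope and carrier are both 1, so the kernel value there is simply 1 / (2 pi sigma_x sigma_y).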


Hand gestures have been identified as one of the most common and natural communication media among human beings. Hand gesture recognition research has gained a lot of attention because of its applications in interactive human-machine interfaces and virtual environments. Most recent work on hand gesture interface techniques is categorized as:
Glove-based methods and
Vision-based methods
Glove-based gesture interfaces require the user to wear a cumbersome device and generally carry a load of cables that connect the device to a computer. Many vision-based techniques, such as model-based and state-based approaches, have been proposed for locating objects and recognizing gestures, and the number of vision-based gesture recognition studies has been increasing. The goal here is to design and build a human-machine interface that uses the 3D information provided by a camera for automatic, real-time identification of hand gestures.

In this paper, our focus is on two applications.

An application designed to recognize the number of raised fingers that appear in a hand gesture.
An application for moving an object in a virtual environment using only the hand gesture in the acquired images.

For a real-time application, the expectation is to obtain the best possible images of the hand gesture within the lowest possible time. Some experiments have been conducted to define the best configuration for imaging the hand. This configuration includes the relative position of the hand and the camera, the integration time of the camera, the amplitude threshold, the lighting conditions of the environment, the surrounding objects and the skin colour.


After segmentation of the hand information from the captured image, the points are projected into the palm plane. The following steps are then used to extract the hand outline, which is smoothed; the number of active fingers is inferred from the number of U-turns that appear in the refined outline, since ragged corners still appear in the outline after smoothing. The methodology used in the proposed system is shown in Figure 4.

Segmentation -> Projection into Palm -> Extraction of Outer Points -> Outline Smoothing -> Gesture Recognition

Figure 4 Methodologies followed for the proposed system

1. Projection into Palm Plane: In order to extract the outline of the hand gesture, the point cloud describing the
hand is projected onto the palm plane. First the palm is detected by computing the centre of gravity of the segmented
data. The points within a range threshold from the centre of gravity are assumed to belong to the palm. Using a least
square adjustment, a regression plane is fitted within these points. All points in the hand segment are then projected
into that plane.
2. Extraction of Outer Points: The objective of extraction of outer points is to generate a first approximation of the
outline with the maximum number of corners. To achieve this goal, the method of a modified version of the convex hull
has been adopted. The convex hull is the smallest convex boundary containing all the points. The use of a modified
of convex hull instead of the original version is justified by the fact that the original convex hull doesn't
provide a boundary with all necessary corners. Some concave corners are not part of the outline. The process starts with
the determination of the lowest left corner. The following corners are determined successively. A moving window
centred on the current corner is used to collect neighbouring points. The second outline corner is the point that forms
with the first corner the least azimuth (Line 1 in Figure 5). For the remaining corners, the exterior topographic angle
between the previous corner, the current corner and each of the selected points is computed. The next corner of the
outline is chosen in such a way that the computed angle is the least and the current segment line doesn't cross over
any previously determined segment (Lines 2 and 3 in Figure 5). The line 4 in Figure 5 shows the result of the computed
boundary compared to the raw points and the original convex hull.


3. Smoothing Outline: This step takes as input the outer points produced in the previous step and generates line segments. The difference from the previous step is that the outline is better regularized, with a noticeable reduction in the number of outline points. The Douglas-Peucker algorithm [13] uses the closeness of a vertex to an edge segment to smooth an outline. It starts by considering the single edge joining the first and last vertices of the polyline (Stage 1 in Figure 6). The remaining vertices are then tested for closeness to that edge. If any vertices lie further than a specified tolerance away from the edge, the vertex furthest from it is added to the simplification, creating a new guess for the simplified polyline (Stages 2, 3 and 4 in Figure 6). This procedure is repeated recursively; if at any time all of the intermediate distances are less than the threshold, all the intermediate points are eliminated. Successive stages of this process and an example result are shown in Figure 6.
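The recursion described above can be sketched as follows. This is a generic Douglas-Peucker implementation, not the paper's exact code, and the tolerance value is illustrative.

```python
def perp_dist(p, a, b):
    """Perpendicular distance from point p to the line through a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = (dx * dx + dy * dy) ** 0.5
    if norm == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * (px - ax) - dx * (py - ay)) / norm

def douglas_peucker(points, tol):
    """Keep the vertex farthest from the current edge whenever it
    exceeds `tol`; otherwise drop all intermediate vertices."""
    if len(points) < 3:
        return points[:]
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= tol:                       # all vertices close enough
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], tol)
    right = douglas_peucker(points[idx:], tol)
    return left[:-1] + right              # splice, dropping the duplicate

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6),
        (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(line, tol=1.0)
```

The end points always survive, and only vertices that genuinely bend the outline by more than the tolerance are kept, which is what regularizes the hand contour here.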
4. Feature Extraction: In the proposed system, the motion of the object provides important and useful information for object localization and extraction. To obtain the movement information, it is assumed that the input gesture is non-stationary. When objects move in the spatial-time space, the motion detector is able to track them by examining the local grey-level changes. Let F_i(x, y) be the i-th frame of the sequence and D_i(x, y) be the difference image between the i-th and (i-1)-th frames, defined as

D_i(x, y) = T_i{ |F_i(x, y) - F_{i-1}(x, y)| } (9)

D_i(x, y) = 1 if |F_i(x, y) - F_{i-1}(x, y)| >= threshold, and 0 otherwise (10)
5. Thresholding: Having extracted the moving object region, we can apply thresholding to the frame difference (i.e. Eq. (10)) to extract the possible moving region against a complex background. Conventional thresholding methods, such as Otsu thresholding [14], are found to be unsuitable for detecting motion differences; instead, a simple thresholding technique is used to extract moving regions. The threshold for motion detection is determined as t = 0.2m, where m is the average luminance of the captured image F_i(x, y). Figure 7 shows that if there is no significant movement, the Otsu method will generate noise. We choose the weighting factor 0.2 because a highly precise segmented image is not needed. Our thresholding technique is not very sensitive to the speed of the hand movement, and it is more stable than the Otsu method.
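Eqs. (9)-(10) with the adaptive threshold t = 0.2m can be sketched as follows; the 2x2 grey-level frames are illustrative toy data.

```python
def motion_mask(prev, curr):
    """Binary motion image of Eq. (10): the absolute frame
    difference thresholded at t = 0.2 * m, where m is the mean
    luminance of the current frame."""
    r, c = len(curr), len(curr[0])
    m = sum(sum(row) for row in curr) / float(r * c)
    t = 0.2 * m
    return [[1 if abs(curr[j][k] - prev[j][k]) >= t else 0
             for k in range(c)]
            for j in range(r)]

prev = [[50, 50], [50, 50]]
curr = [[50, 50], [50, 200]]
mask = motion_mask(prev, curr)
```

Only the pixel whose intensity changed by more than 0.2 times the mean luminance is marked as moving, so the threshold adapts to overall scene brightness rather than being fixed.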

Figure 5 Principle of the modified version of the convex hull


Figure 6 At each stage: dashed line - next approximation; gray line - original polyline; black line - initial approximation

Figure 7 Douglas-Peucker Principle

6. Skin colour detection: Skin can be easily detected using colour information. First, we use the constraint R > G > B to find skin-colour regions. This constraint admits a wide range of colours, such as red, pink, brown and orange, so many regions other than skin regions will be found; however, combining the constraint with the motion information excludes most of them, e.g. a static region in orange colour will not be misidentified as the hand region. Second, we obtain some sample colours from the hand region. To find the skin regions, we compare the colours in the candidate regions with the pre-stored sample colours; if they are similar, the region is taken as a skin region. The hand region is obtained by the hand tracking process in the previous frame. Figure 8 shows our skin detection results; the rectangular region is the hand region in the previous frame. Finally, we eliminate some skin-similar colours, e.g. orange, and denote the skin colour image as S_i(x, y).

Figure 8 (a) The original frame, (b) extracted skin regions satisfying R > G > B, and (c) comparison of the colours of the extracted skin regions with the sample skin colour.
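The first-pass ordering constraint is a one-liner per pixel; a minimal sketch with an illustrative 2x2 RGB frame:

```python
def skin_mask(rgb_image):
    """First-pass skin detection using the ordering constraint
    R > G > B; returns a binary mask."""
    return [[1 if r > g > b else 0 for (r, g, b) in row]
            for row in rgb_image]

frame = [[(200, 140, 110), (90, 160, 220)],   # skin-like, sky-like
         [(180, 120, 100), (40, 40, 40)]]     # skin-like, grey
mask = skin_mask(frame)
```

As the text notes, this deliberately over-detects (orange and brown pass too); the sample-colour comparison and motion information are what prune the false positives.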

7. Edge detection: Edge detection is applied to separate the arm region from the hand region. It is easy to find that there are fewer edges on the arm region than on the palm region. Here, we use the Kirsch edge operator [15] to obtain edges in different directions and then choose the absolute maximum value at each pixel (x, y), as shown in Figure 9. Since the edges on the arm region are fewer than those on the palm region, we combine edge, motion and skin-colour region information to locate the hand region. The edge image of the i-th frame is denoted E_i(x, y); the combination of motion, skin


colour and edge information describes the hand gesture. The movement, skin colour and edge features are combined using a logical AND, that is

C_i(x, y) = D_i(x, y) AND S_i(x, y) AND E_i(x, y) (11)

where D_i(x, y), S_i(x, y) and E_i(x, y) indicate the movement, skin colour and edge images. Many features can be extracted from the combined image C_i(x, y), because the different image processing methods extract different kinds of information; each image consists of different characteristic regions, such as motion regions, skin colour regions and edge regions, as shown in Figure 10, and Figure 11 shows the combined region C_i(x, y). The combined image consists of a large region in the palm area and some small regions in the arm area. We may separate these two regions to locate the hand.
Figure 9. (a) Origin frame, (b) Edge detection result.
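Eq. (11) is a per-pixel logical AND of the three binary images; a minimal sketch with illustrative 2x2 masks:

```python
def combine(D, S, E):
    """Eq. (11): AND the motion, skin-colour and edge binary
    images to isolate candidate hand pixels."""
    return [[D[j][k] & S[j][k] & E[j][k] for k in range(len(D[0]))]
            for j in range(len(D))]

D = [[1, 1], [0, 1]]   # motion mask
S = [[1, 0], [1, 1]]   # skin-colour mask
E = [[1, 1], [1, 1]]   # edge mask
C = combine(D, S, E)
```

A pixel survives only if all three cues agree, which is why C_i concentrates in the moving, skin-coloured, edge-rich palm area.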

8. Region identification: A simple method for region identification is to label each region with a unique integer number; this is called the labelling process. After labelling, the largest integer label indicates the number of regions in the image, and the small regions can be treated as noise and removed. Figure 9 (a) shows the labelling results and Figure 9 (b) shows the centre position.
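The labelling process can be sketched with a standard flood-fill connected-component pass (4-connectivity is assumed here; the paper does not state its connectivity):

```python
def label_regions(mask):
    """Label 4-connected foreground regions with unique integers;
    the largest label equals the number of regions."""
    r, c = len(mask), len(mask[0])
    labels = [[0] * c for _ in range(r)]
    next_label = 0
    for j in range(r):
        for k in range(c):
            if mask[j][k] and not labels[j][k]:
                next_label += 1
                stack = [(j, k)]
                while stack:                  # flood-fill one region
                    y, x = stack.pop()
                    if 0 <= y < r and 0 <= x < c \
                            and mask[y][x] and not labels[y][x]:
                        labels[y][x] = next_label
                        stack += [(y - 1, x), (y + 1, x),
                                  (y, x - 1), (y, x + 1)]
    return labels, next_label

mask = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
labels, count = label_regions(mask)
```

After labelling, regions below a size threshold can be discarded as noise and the largest remaining region taken as the hand candidate, as the text describes.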
9. Robustness and low complexity: Using motion and colour information alone is not sufficient, and the hand shape is not always the largest labelled region; if other skin-colour objects move rapidly, the tracking process may fail. We need to take advantage of the motion smoothness constraint for trajectory justification, then use background subtraction to find the foreground object, and finally identify the hand region as shown in Figure 10.

Figure 10 Hand Gesture Information (a) Original Image F_i(x,y) (b) Motion Region D_i(x,y) (c) Skin colour region S_i(x,y) (d) Edge Region E_i(x,y)

Figure 11 Combined Region Ci(x,y)



To find a more precise hand region, we use the foreground region information. The hand position has been found using motion, skin colour and edge information, but sometimes the identified location will not be at the centre of the real hand region, because the extracted information is located on the boundary of the moving object. Therefore, local refinement is necessary. The overall system for hand region tracking has two stages:
The first stage focuses on the motion information
The second stage focuses on the foreground information
The local tracking process is as follows:
Select the foreground and skin colour region near the inaccurate centre
Select the boundary points in the foreground region
Find the centre of the boundary points as the new centre.
The process, illustrated in Figure 12, is formulated as

p_c2(i) = T_c{ T_R( p_c(i), FG_i AND E_i AND S_i ) } (11)

Figure 12 The three functional blocks: (a) Motion Detection, (b) Skin Colour Detection, (c) Edge Detection.

1. Background Subtraction Process: After refining the hand gesture centre point, we find the bounding box of the hand region. The bounding box process uses the foreground, the skin colour information and the centre point located in the second stage. We search for the boundary of the hand region from the centre towards the top, bottom, left and right, and use four parameters to describe the extracted hand region: the left and right extents LW and RW and the top and bottom extents TH and BH. Since the arm region is not the target region, we apply a simple criterion to obtain a more precise hand region. The bounding box is determined by considering:
If RW > LW then RW = 1.1 LW else LW = 1.1 RW, and
If TH > BH then TH = 1.1 BH else BH = 1.1 TH.
In Figure 12 (a), the length TH is shorter than BH, so we let BH = 1.1 TH and, similarly, RW = 1.1 LW. Figure 12 (b) shows the updated bounding box; the new bounding box does not include the arm region, as shown in Figure 12 (c). The method is effective for the following gesture recognition process using the background subtraction process shown in Figure 13.

Figure 13 Background Subtraction Process
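The bounding-box criterion above is a pair of clamps; a direct sketch (the example extents are illustrative, imagining an arm extending to the right and below the palm centre):

```python
def refine_box(lw, rw, th, bh):
    """Clamp the longer extent on each axis to 1.1x the shorter
    one, so the elongated arm side is cut off the box."""
    if rw > lw:
        rw = 1.1 * lw
    else:
        lw = 1.1 * rw
    if th > bh:
        th = 1.1 * bh
    else:
        bh = 1.1 * th
    return lw, rw, th, bh

# Palm centre with the arm running right and down:
lw, rw, th, bh = refine_box(lw=40, rw=90, th=30, bh=120)
```

The long right and bottom extents shrink to 1.1 times their opposite sides (44 and 33 here), which is how the arm is excluded without an explicit arm detector.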


2. Gesture Recognition Process: The objective is to identify the number of fingers active in a hand gesture. The strategy applied is to count the number of U-turns appearing in the refined outline. In the example shown in Figure 14, the posture contains five active fingers and ten U-turns; this scale factor of two holds for all other cases except zero, which was not considered in this project. The average length of a human finger is used as a threshold, to avoid counting short, spurious turns. To detect the U-turns, four consecutive segment lines are considered at a time. By computing the azimuth, the algorithm looks for two segments with opposite directions within a threshold. The process is repeated throughout the outline.

Figure 14 U-turns in a hand posture
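The U-turn test can be sketched as follows. This is a simplified reading of the method: each outline segment is compared with the next few segments, and a near-opposite azimuth pair (within a tolerance, 30 degrees here as an illustrative value) counts as one U-turn; the paper's exact windowing and finger-length threshold are not reproduced.

```python
import math

def azimuth(p, q):
    """Direction angle of segment p->q in degrees, in [0, 360)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0

def count_u_turns(outline, tol=30.0):
    """Count near-opposite direction pairs among nearby segments."""
    segs = [azimuth(outline[i], outline[i + 1])
            for i in range(len(outline) - 1)]
    u_turns = 0
    for i in range(len(segs)):
        for j in range(i + 1, min(i + 4, len(segs))):
            # Wrapped angular difference in [0, 180].
            diff = abs((segs[i] - segs[j] + 180.0) % 360.0 - 180.0)
            if abs(diff - 180.0) <= tol:
                u_turns += 1
                break
    return u_turns

# A single raised "finger": up, across the tip, and back down.
outline = [(0, 0), (0, 10), (1, 10), (1, 0)]
u_turns = count_u_turns(outline)
```

The up-stroke and down-stroke of the toy finger differ in azimuth by 180 degrees, so they register as one opposite-direction pair.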

The methodology applied for manipulating the moving object in virtual environment is as shown in Figure 15.

Hand Segmentation -> Position of Moving Object -> Orientation of Moving Object

Figure 15 Methodology for moving object

The movement of the virtual object is supposed to be the same as the movement of the hand. Thus the 3D translation and rotation applied to the moving object are obtained from the hand segment. Regarding the translation, the coordinates of the centre of the moving object are the same as those of the centre of gravity of the hand segment. The rotation angles omega, phi and kappa are taken from three orthogonal vectors computed after fitting a plane to the hand points. The first vector joins the centre of gravity of the hand segment to the farthest point in the computed plane; the second vector is the normal to the plane; and the third is the cross product of the first two. After normalizing them to unit length, the resulting coordinates m_ij are used to define a rotation matrix M, from which the angles omega, phi and kappa are derived. These rotation angles are applied to the moving object.
M = R3(kappa) R2(phi) R1(omega) = [ m11 m12 m13 ; m21 m22 m23 ; m31 m32 m33 ] (12)

omega = arctan(m32 / m33) (13)
phi = -arcsin(m31) (14)
kappa = arctan(m21 / m11) (15)
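A sketch of this angle recovery follows. It uses the standard R3(kappa) R2(phi) R1(omega) factorization (so phi = -arcsin(m31)); the sign conventions are an assumption and may differ from the authors' exact convention, and the test angles are illustrative.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]

def rotation_angles(M):
    """Invert M = R3(kappa) R2(phi) R1(omega)."""
    phi = -math.asin(M[2][0])                 # m31 = -sin(phi)
    omega = math.atan2(M[2][1], M[2][2])      # m32/m33
    kappa = math.atan2(M[1][0], M[0][0])      # m21/m11
    return omega, phi, kappa

# Round-trip check: build M from known angles, then recover them.
M = matmul(rot_z(0.1), matmul(rot_y(0.2), rot_x(0.3)))
omega, phi, kappa = rotation_angles(M)
```

Using atan2 rather than a bare arctangent keeps the recovered angles in the correct quadrant.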
3. Hand Gesture Recognition: From the results, it is concluded that the hand gesture recognition algorithm is independent of the hand type (left or right hand) and of the distance between the hand and the camera, as shown by the confusion matrix in Figure 16.


Obtained Results       Reference Values (No. of Fingers)
(No. of Fingers)        1    2    3    4    5   Total
         1              3    1    0    0    0     4
         2              4    6    5    0    0    15
         3              1    1    5    5    1    13
         4              0    1    2    2    4     7
         5              1    0    1    1    3     5
      Others            1    1    2    2    2     6
      Total            10   10   10   10   10    50

Figure 16 Confusion Matrix of First Application

The results are summarized in the confusion matrix, which shows the occurrences of predicted outcomes against actual values. The proportion of correctly classified gestures is estimated as (3+6+5+2+3)/50 = 38%. Though promising, the obtained results are not as good as expected; nevertheless, a trend can be noticed showing that a high number of results are close to the true values.
Indeed, most of the obtained segments contain information related to the hand gesture, but in several cases some noisy pixels could not be removed. Because of the presence of hanging points in the segment, the obtained outline does not properly delineate all the fingers appearing in the gesture. Some incorrect results also come from the gesture recognition algorithm itself: the strategy designed to identify U-turns fails either because of the threshold used to compare two segment lines' azimuths or because of a highly smoothed outline. As a consequence, the number of counted U-turns is incorrect, and so is the number of fingers.

Applying an edge detection scheme between the difference stage and the thresholding stage increases the accuracy of the system, so the proposed system can operate in robust environments, thereby addressing one of the problem definitions. Applying skin detection schemes makes the system usable by people of different cultures with different skin tones. A set of gestures of the user is stored when the system is installed; this is done while the system is interconnected with the external networks. The histogram values of this set of images are stored once the images are loaded. The whole system then operates at run time: the camera captures images continuously at a rate of 2 frames per second, and these images are given as input to the system. Each input image to be tracked and recognized is compared, through its histogram values, with the set of previously stored images, and similarity is measured with well-known similarity matching algorithms. Once a match has been found, the corresponding output signal is produced. Each output signal has its own function to control a device. The signals control the external circuit through an interface such as the Arduino board. The external network may be a network of electrical appliances, a traffic control network, or other electronic goods. With this system, human gestures act as switches, and the application of this concept extends to many fields.
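The histogram-matching step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the stored gesture images, the correlation measure, and the serial protocol at the end are all assumptions (histogram correlation is one of the well-known similarity measures, cf. OpenCV's HISTCMP_CORREL):

```python
import numpy as np

def gray_hist(image):
    """Normalized 256-bin grayscale histogram of an image."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    return hist / hist.sum()

def correlation(h1, h2):
    """Pearson correlation between two histograms."""
    return np.corrcoef(h1, h2)[0, 1]

# Hypothetical stored gesture set, captured at install time; real frames
# would come from the camera at 2 frames per second.
rng = np.random.default_rng(0)
stored = {name: rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
          for name in ("one", "two", "three")}
stored_hists = {name: gray_hist(img) for name, img in stored.items()}

def match_gesture(frame):
    """Return the label of the stored gesture whose histogram best matches."""
    h = gray_hist(frame)
    return max(stored_hists, key=lambda name: correlation(stored_hists[name], h))

best = match_gesture(stored["two"])
print(best)  # a frame correlates perfectly with its own stored histogram

# The matched label could then drive the external circuit via the Arduino;
# the port name and byte protocol below are assumptions, not from the paper:
# import serial
# with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as port:
#     port.write(best.encode())
```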


References
1) R. C. Gonzalez, R. E. Woods and S. L. Eddins, Digital Image Processing Using MATLAB, 2nd ed., Prentice Hall Press, 2007.
2) Peihua Qiu, Image Processing and Jump Regression Analysis, John Wiley & Sons, 2005.
3) R. Fergus, P. Perona and A. Zisserman, Object Class Recognition by Unsupervised Scale-Invariant Learning, Computer Vision and Pattern Recognition, 2003.
4) S. G. Kong, J. Heo, F. Boughorbel, Y. Zheng, B. R. Abidi, A. Koschan, M. Yi and M. A. Abidi, Multiscale fusion of visible and thermal IR images for illumination invariant face recognition, International Journal of Computer Vision, Vol. 71, Issue 2, pp. 215-233, 2007.
5) S. Lazebnik, C. Schmid and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, Computer Vision and Pattern Recognition, Vol. 2, pp. 2169-2178, 2006.
6) A. K. Jain, A. Ross and S. Prabhakar, An introduction to biometric recognition, IEEE Trans. Circuits and Systems for Video Technology, 14(1), pp. 4-20, 2004.
7) Y. Adini, Y. Moses and S. Ullman, Face recognition: The problem of compensating for changes in illumination direction, IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7), pp. 721-732, 1997.
8) R. Arandjelovic and A. Zisserman, Efficient image retrieval for 3D structures, Proceedings of the British Machine Vision Conference, 2010.
9) R. Shapovalov, A. Velizhev and O. Barinova, Non-associative Markov networks for 3D point cloud classification, Photogrammetric Computer Vision and Image Analysis, 38 (Part 3A), pp. 103-108, 2010.
10) M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, The PASCAL Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, 88(2), pp. 303-338, 2010.
11) R. Muthukrishnan and M. Radha, Edge detection techniques for image segmentation, International Computer Science and Information Technology Journal, 3, pp. 259-267, 2011.
12) Weitao Li, KeZhi Mao, Hong Zhang and Tianyou Chai, Selection of Gabor filters for improved texture feature extraction, IEEE International Conference on Image Processing, 2010.
13) Shin-Ting Wu, Adler C. G. da Silva and Mercedes R. G. Márquez, The Douglas-Peucker algorithm: sufficiency conditions for non-self-intersections, J. Braz. Comp. Soc., Vol. 9, No. 3, 2004.
14) Hetal J. Vala and Astha Baxi, A review on Otsu image segmentation algorithm, International Journal of Advanced Research in Computer Engineering & Technology, Vol. 2, Issue 2, 2013.
15) A. R. Venmathi, E. N. Ganesh and N. Kumaratharan, Kirsch compass kernel edge detection algorithm for micro calcification clusters in mammograms, Middle-East Journal of Scientific Research, 24(4), pp. 1530-1535, 2016.


R Vadivelu received the B.Sc. in Physics from Bharathiar University in 1995, the M.Sc. in Electronics from Madras University in 1998, the M.Tech. in Electronics with Microwave and Radar Systems from Cochin University of Science and Technology, Cochin, in 2001, and the Ph.D. in Information and Communication Engineering from Anna University, Chennai, in 2015. He has published various articles in peer-reviewed international journals in the area of cognitive radio, and his areas of interest are antennas and radar systems. He is currently working as Associate Professor in the Department of Electronics and Communication Engineering, Sri Krishna College of Technology, Coimbatore, India.
