

2010 International Conference on Advances in Computer Engineering

Wireless Vision based Mobile Robot control using Hand Gesture Recognition
through Perceptual Color Space

Mr. Manigandan M
Senior Lecturer, Department of ECE,
Velammal Engineering College,
Chennai, INDIA
mani.mit2005@gmail.com

Mrs. I. Manju Jackin
Assistant Professor, Department of ECE,
Velammal Engineering College,
Chennai, INDIA
rebejack@gmail.com

Abstract—In this paper we have implemented wireless vision based mobile robot control through hand gesture recognition based on perceptual color spaces such as HSI, HSV/HSB, and HSL. Vision-based hand gesture recognition is an important problem in the field of human-computer interaction, since hand motions and gestures could potentially be used to interact with computers in more natural ways. The robot control was done purely based on orientation histograms, a simple and fast algorithm, on a system which recognizes static hand gestures with HSV color spaces as the major parameters. The wireless vision based mobile robot system using hand gestures is a new, innovative user interface that resolves the complications of using numerous remote controls for various applications. Based on one unified set of hand gestures, this system interprets the user hand gestures into pre-defined commands to control the remote robot. The experimental results are very encouraging, as the system produces real-time responses and highly accurate recognition of various gestures under different lighting conditions.

Keywords - Hand Gesture recognition, Wireless vision, Mobile robot, Color spaces, Skin Detection

I. INTRODUCTION

Vision-based automatic hand gesture recognition has been a very active research topic in recent years, with motivating applications such as human-computer interaction (HCI), robot control, and sign language interpretation. The general problem is quite challenging due to a number of issues, including the complicated nature of static and dynamic hand gestures, complex backgrounds, and occlusions. Attacking the problem in its generality requires elaborate algorithms demanding intensive computer resources. What motivated us for this work is a robot navigation problem, in which we were interested in controlling a robot by pose signs given by a human. Due to real-time operational requirements, we were interested in a computationally efficient algorithm.

Our approach involves segmenting the hand based on skin colour statistics as well as size constraints. We then find the centre of gravity (COG) of the hand region as well as the farthest point from the COG. Based on these pre-processing steps, we derive a signal that carries information on the activity of the fingers in the sign. Finally, we identify the sign based on that signal. We demonstrate the effectiveness of our approach on real images of hand gestures. We focus on the recognition of static images, in which skin colour and texture detection play a vital role.

Skin color and texture are important cues that people use, consciously or unconsciously, to infer a variety of culture-related aspects about each other. In images and videos, skin color is an indication of the existence of humans in such media. Skin detection means detecting the image pixels and regions that contain skin-tone color. Many approaches have been implemented that focus on a single aspect of gestures, such as hand tracking, hand posture estimation, or hand pose classification using uniquely colored gloves or markers on hands/fingers.

The desire to develop a limited set of hand gestures that are distinctive has improved the processing accuracy of gestures captured from the wireless pin-hole camera. It also permits a less sophisticated system model for controlling the robot action from a remote place, using proper decision-making rules implemented in the MATLAB software engine.

Related works are discussed in Section II, and Section III describes the color spaces for skin detection. Section IV describes gesture normalization, and Section V describes hand gesture recognition using the COG profile. Section VI describes the system model implementation, and the test results are discussed in Section VII.

II. RELATED WORKS

Early approaches to the hand gesture recognition problem in a robot control context involved the use of markers on the finger tips [1]. An associated algorithm is used to detect the presence and colour of the markers, through which one can identify which fingers are active in the gesture. The inconvenience of placing markers on the user's hand makes this an infeasible approach in practice. Recent methods use more advanced computer vision techniques and do not require markers. Hand gesture recognition is performed through a curvature space method in [2], which involves finding the boundary contours of the hand. This is a robust approach that is scale, translation and rotation invariant to the hand pose, yet it is computationally demanding. In [3], a vision-based hand pose recognition technique using skeleton images is proposed, in which a multi-camera system is used to pick the centre of gravity of the hand and the points with the farthest distances from the centre, providing the locations of the finger tips, which are then used to obtain a skeleton image, and finally for gesture recognition.

As mentioned in the introduction, the skin detection problem is still a much investigated problem; many authors have proposed techniques to solve it by fixing one or more parameters of the problem. The use of a normalized color space, in this case the normalized RGB color space, is interesting because it allows isolating the skin locus with simple quadratic functions. Also for [4], [5], [6], [7], a normalized color space, again the normalized RGB color space, is the most effective for successfully extracting a skin locus. This is because it is as little as possible dependent on the illuminant. In addition, Albiol et al. [8] affirmed that an optimum filter for skin detection will have the same performance even when working in different color spaces. In [6] again, and also in [9], a static prefilter on RGB space is used too: with this last kind of filter it is easier and more natural to remove zones that surely are non-skin areas (pixels too inclined to black, to green, or to blue, etc.).

Our focus is the recognition of a fixed set of manual commands by a robot, in a reasonably structured environment, in real time, and control of the robot based on the static gesture with a proper decision-making algorithm built on the threshold values of each hand gesture, applying Canny edge detection and proper image normalization for better gesture recognition, along with the COG algorithm.

III. COLOR SPACES FOR SKIN DETECTION

Most of the research in this area has focused on detecting skin pixels and regions based on their color. In many applications, where the background is controlled or unlikely to contain skin-colored regions, detecting skin-colored pixels can be a very efficient cue to find human faces and hands in images. Detecting skin-colored pixels, although it seems a straightforward task, has proven quite challenging for many reasons. The appearance of skin in an image depends on the illumination conditions (illumination geometry and color) under which the image was captured. As will be discussed shortly, the choice of the color space affects the performance of any skin detector and its sensitivity to changes in illumination conditions. Another challenge comes from the fact that many objects in the real world might have skin-tone colors: for example, wood, leather, skin-colored clothing, hair, sand, etc. This causes any skin detector to produce many false detections in the background if the environment is not controlled. In any given color space, skin color occupies a part of such a space, which might be a compact or large region in the space. Such a region is usually called the skin color cluster.

A. Perceptual Color Spaces

Perceptual color spaces, such as HSI, HSV/HSB, and HSL (HLS), have also been popular in skin detection. These color spaces separate three components: the hue (H), the saturation (S) and the brightness (I, V or L). Essentially, HSV-type color spaces are deformations of the RGB color cube, and they can be mapped from the RGB space via a nonlinear transformation. One of the advantages of these color spaces in skin detection is that they allow users to intuitively specify the boundary of the skin color class in terms of the hue and saturation. As I, V or L give the brightness information, they are often dropped to reduce the illumination dependency of skin color. These spaces have been used by Shin et al. [10] and Albiol et al. [8].

Skin detection in color images and videos is a very efficient way to locate skin-colored pixels, which might indicate the existence of human faces and hands. However, many objects in the real world have skin-tone colors, such as some kinds of leather, sand, wood, fur, etc., which might be mistakenly detected by a skin detector. Therefore, skin detection can be very useful in finding human faces and hands in controlled environments where the background is guaranteed not to contain skin-tone colors. Since skin detection depends on locating skin-colored pixels, its use is limited to color images, i.e., it is not useful with gray-scale, infrared, or other types of image modalities that do not contain color information. There has been extensive research on finding human faces in images and videos using other cues, such as finding local facial features or finding holistic facial templates [11]. Skin detection can also be used as an efficient preprocessing filter to find potential skin regions in color images prior to applying more computationally expensive face or hand detectors.

In order to do this, we first create a model of the desired hue using a color histogram. Normally the Hue Saturation Value (HSV) color system is used, which corresponds to projecting the standard Red, Green, and Blue (RGB) color space along its principal diagonal from white to black. This results in the hex cone in Figure 1.

Figure 1. Hex Cone Based HSV
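To make the projection concrete, a minimal MATLAB sketch of the HSV conversion and a hue/saturation skin threshold might look as follows; the numeric bounds are illustrative placeholders, not the thresholds trained for this system.

    % Minimal sketch: HSV conversion and skin-tone thresholding.
    % The hue/saturation bounds below are illustrative only.
    rgb = imread('hand.png');        % frame from the wireless camera
    hsv = rgb2hsv(rgb);              % RGB cube -> HSV hex cone
    h = hsv(:,:,1);                  % hue
    s = hsv(:,:,2);                  % saturation
    % V (brightness) is dropped to reduce illumination dependency.
    skin = (h < 0.10 | h > 0.95) & (s > 0.20) & (s < 0.70);
    imshow(skin)                     % logical skin mask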
IV. GESTURE NORMALIZATION

First of all, we shall formalize the skin detection problem as generally as possible. Given I(R, G, B), in the following simply I, an arbitrary image about which we know nothing (its contents, the type of source, and the environmental conditions under which it was generated), we want to identify all the regions, and only the regions, of I where human skin is present.

A. Image Morphological Filtering

Gesture normalization is done by the well-known morphological filtering technique, erosion combined with dilation. The output of this stage is a smooth region of the hand figure, which is stored in a logical bitmap image as shown in Figure 2. Image normalization is implemented using the morphological functions available in MATLAB.
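Since the paper names MATLAB's morphological functions, a minimal sketch of this stage could be the following, reusing the skin mask from the earlier sketch; the disk-shaped structuring element and its radius are assumptions, and the hole-filling and blob-removal steps stand in for the cleanup of remaining noise and unfilled pixels described in the next subsection.

    % Gesture normalization sketch: erosion followed by dilation.
    % The disk radius is an assumed value, not taken from the paper.
    se   = strel('disk', 3);
    mask = imerode(skin, se);        % strip isolated noise pixels
    mask = imdilate(mask, se);       % restore the hand silhouette
    mask = imfill(mask, 'holes');    % remove unfilled pixels in the hand
    mask = bwareaopen(mask, 200);    % drop remaining small blobs
    % 'mask' is the smooth logical bitmap of the hand region (Figure 2).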

B. Skin segmentation testing

This test was mainly aimed at evaluating the performance of the skin segmentation and normalization modules of the control system. A number of hand gesture images were taken, and the skin segmentation and the subsequent normalization are shown in Figure 3. As seen from Figure 3, the filter successfully segmented the skin regions out of all the tested images. It was also noticeable that the shadow of the hand and the body did not have any effect on the filtering process. The remaining noise and unfilled pixels were removed by the normalization filter, which resulted in a smooth and clear region. The resultant effect was that the object might have been recognized as a skin region, as its color had been modified. In conclusion, the performance of the skin segmentation and normalization filters was firmly robust against variance in backgrounds and lighting conditions.

Figure 2. Smooth region output of the morphological filtering technique

It is not too difficult to realize that effective real-time classification cannot be achieved using attempts such as template matching. Template matching itself is very much prone to error when a user cannot exactly reproduce a hand gesture that is already stored in the library. It also fails because of variance in scaling, as the distance to the camera may produce a scaled version of the gesture.

Figure 3. Smooth region output by deploying the Canny edge detector and the morphological filtering technique

The gesture variations caused by rotation, scaling and translation can be circumvented using a set of features that are invariant to these operations. Moment invariants offer a set of features that encapsulate these properties.

C. Canny edge detector

The purpose of edge detection in general is to significantly reduce the amount of data in an image, while preserving the structural properties to be used for further image processing. In this case, to extract the hand gesture pose without any noise due to the lighting conditions, we use the Canny edge detector along with the morphological techniques. The steps involved in Canny edge detection are shown in Figure 4.

Figure 4. Steps involved while computing Canny edge detection
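Continuing the sketches above (the carried-over variable names are assumptions), this step could be rendered in MATLAB as:

    % Canny edge extraction on the normalized gesture region.
    gray = rgb2gray(rgb);            % Canny works on intensity images
    gray(~mask) = 0;                 % keep only the segmented hand
    edges = edge(gray, 'canny');     % default hysteresis thresholds;
                                     % pass [low high] to tune them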

V. HAND GESTURE RECOGNITION

Chain code has been applied to acquire the contour from the extracted hand region, and hand gesture recognition is carried out by an improved centroidal profile with proper gesture normalization, as described in Section IV above.

A. Centroidal Profile

The centroidal profile expresses a detected object as a set of vectors of its contour. Usually this type of algorithm computes the centroid and expresses the shape with normalized distances from the centroid. Figure 5 shows the centroidal profile for object recognition. The algorithm computes the centroidal profile and compares it with the profile of an ideal object to recognize the shape of the object. When the contour of an object has more than one point for any angle, one usually takes the point closest to the center. However, in the case of the hand shape this is not possible; thus every pixel point on the contour is used to compute the normalized distances. Chain code expresses the contour of an object with a predefined set of vectors, moving along the edge of the object, and evaluates the boundary. In this study, 8-directional chain codes are used.
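An illustrative rendering of 8-directional chain coding, continuing from the normalized mask above; the direction-numbering convention here is arbitrary rather than the paper's.

    % 8-directional chain code sketch (direction numbering is an
    % arbitrary convention; continues the sketches above).
    steps = [0 1; -1 1; -1 0; -1 -1; 0 -1; 1 -1; 1 0; 1 1];  % [drow dcol]
    B = bwboundaries(mask);          % 8-connected boundary tracing
    b = B{1};                        % assumes one hand region in the mask
    d = diff(b);                     % unit moves along the contour
    code = zeros(size(d,1), 1);
    for k = 1:size(d,1)
        [~, loc] = ismember(d(k,:), steps, 'rows');
        code(k) = loc - 1;           % chain code symbol in 0..7
    end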
We compute the distance r of each point (x, y) on the contour from the centroid; the distance is given by equation (1). The centroid can be computed with each pixel's intensity as its weight, as in equation (2), where I(i, j) is the intensity of the pixel and A is the total number of pixels in the image, expressed as equation (3).

    r = \sqrt{(x_i - \bar{x})^2 + (y_i - \bar{y})^2}                      (1)

    \bar{x} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} i \, I(i,j)}{A}, \qquad
    \bar{y} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} j \, I(i,j)}{A}          (2)

    A = \sum_{i=1}^{N}\sum_{j=1}^{M} I(i,j)                               (3)
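A MATLAB transcription of equations (1)-(3), continuing from the sketches above, with the intensity image as the weight and b holding the traced contour points:

    % Equations (1)-(3) in MATLAB (continues the sketches above).
    I = double(gray);                % pixel intensity as weight
    [nr, nc] = size(I);
    [jj, ii] = meshgrid(1:nc, 1:nr); % ii = row index, jj = column index
    A    = sum(I(:));                               % equation (3)
    xbar = sum(sum(ii .* I)) / A;                   % equation (2)
    ybar = sum(sum(jj .* I)) / A;
    r = sqrt((b(:,1) - xbar).^2 + (b(:,2) - ybar).^2);   % equation (1)
    profile = r / max(r);            % normalized centroidal profile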
Figure 5. Centre of Gravity for various Hand Gesture poses
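The profile comparison described above might be sketched as follows; refProfiles is a hypothetical library of stored reference profiles (one per gesture), and this simplified matcher resamples profiles to a common length and ignores rotation alignment.

    % Illustrative matcher: nearest stored centroidal profile.
    % 'refProfiles' and 'gestures' are hypothetical placeholders.
    gestures = {'Left', 'Right', 'Front', 'Back', 'Stop'};
    best = 1; bestDist = inf;
    for g = 1:numel(refProfiles)
        ref = refProfiles{g};
        % resample the template to the measured profile's length
        p = interp1(linspace(0, 1, numel(ref)), ref, ...
                    linspace(0, 1, numel(profile)));
        dist = norm(profile(:) - p(:));
        if dist < bestDist, bestDist = dist; best = g; end
    end
    recognized = gestures{best};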

VI. OVERVIEW OF SYSTEM MODEL IMPLEMENTATION

The system comprises a wireless pin-hole camera, a gesture processing unit, a hardware interface for the control unit, and a Zigbee module. The wireless camera is used to capture the hand gestures, which are then registered, normalized, and feature-extracted for eventual classification to control the wireless robot, which carries a Zigbee module for receiving the data from the remote personal computer based on the hand gesture pose. The setup of the basic components is shown in Figure 7. MATLAB is used throughout the project for real-time data processing and classification, and for controlling the robot through the data transferred from the system. Once the user hand gesture matches a pre-defined command programmed under the MATLAB environment, the command is issued to the robot control via a serial port. If an unknown gesture is issued, the system rejects it, notifying the user. A threshold filter is applied to remove 'non-skin' components. The major advantage of this approach is that the influence of luminosity can be removed during the conversion process. Thus it makes the segmentation less dependent on the lighting condition, which has always been a critical obstacle for image recognition. The threshold values were obtained using our own data set.

Figure 6 gives a better idea regarding the lighting condition, i.e., the luminosity behind the hand. This can be removed by a software approach based on a threshold value change; the threshold value should always be less than 1, since we deal with gray images during the entire computation of the system.

Figure 6. Luminosity behind the hand

A number of images are used to evaluate the effectiveness of the classification system. Both the original image and the skin-segmented image are used to observe and verify the accuracy of the segmentation algorithm. Images with low lighting conditions are also tested. As is quite apparent from Figure 6, there are always a number of noisy spots in the filtered images, regardless of the lighting condition. This distortion, as expected, becomes more pronounced in low lighting conditions. As a result, the skin-segmented image is noisy and distorted and is likely to result in incorrect recognition at the subsequent stages. These distortions, however, can be removed during the gesture normalization stage.

Figure 7. Block structure of the entire hand gesture recognition based wireless robot control with wireless vision capability

VII. TEST RESULTS

A software program has been written to display the feedback to the user controlling the robot from a remote place, based on the hand gesture recognition with proper decision-making rules. Currently the system needs a few seconds to analyze the user's hand in order to determine the threshold value for skin segmentation and store it. The first gesture needed to initialize the hardware is 'Start', followed by the 'Front' gesture for the robot to start and move forward. The images are acquired at the remote place from the robot. Any command can be issued randomly; however, if commands are not issued in a logical manner, a proper course of action cannot be taken.

98
For instance, if the 'Back' command is issued out of sequence to drive the robot backward, even though the command is recognized, no action will be taken. The system was observed to be 100% accurate under normal lighting conditions. The tests have firmly consolidated the hardware design and the software interface of the developed prototype. Different hand gestures, along with their gesture extractions, are shown in Figure 8. It is suggested that the hand gestures can easily be recognized with the maximum distribution values of the centroidal profile. Table I below shows the equivalent command words for controlling the robot.

Figure 8. Hand Gesture Pose for Various Robot action control

TABLE I. COMPLETE TABLE FOR COMMAND WORDS

    S. No | Robot Control Operation | Command Word to be Transmitted
    1     | LEFT                    | 'L'
    2     | RIGHT                   | 'R'
    3     | FRONT                   | 'F'
    4     | BACK                    | 'B'
    5     | STOP                    | 'S'
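Issuing a Table I command word from MATLAB over the serial port, as described in Section VI, might look like this sketch; the COM port name and baud rate are assumptions, and the serial API shown is the one current in MATLAB releases of this period.

    % Sketch: send a Table I command word over the serial port to
    % the Zigbee transmitter. Port and baud rate are assumed values.
    cmd = containers.Map({'Left','Right','Front','Back','Stop'}, ...
                         {'L','R','F','B','S'});
    s = serial('COM1', 'BaudRate', 9600);
    fopen(s);
    fwrite(s, cmd(recognized));      % e.g. 'F' drives the robot forward
    fclose(s);
    delete(s);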
Table II shows the recognition results from 5 different users, each supplying 5 input images of different hand gestures. The postures in the table correspond to the hand gesture numbers in Figure 8. The proposed hand gesture recognition method can process the recognitions at the rate of 15 fps (frames/sec) with an image size of 131x109; the PNG image format is used for better accuracy and recognition.

TABLE II. RECOGNITION RATE OF HAND POSTURE WITH THE PROPOSED ALGORITHM

    Users | Left  | Right | Back | Stop  | Front
    #1    | 100%  | 100%  | 97%  | 100%  | 93%
    #2    | 100%  | 100%  | 94%  | 100%  | 98%
    #3    | 100%  | 100%  | 96%  | 100%  | 100%
    #4    | 100%  | 100%  | 98%  | 100%  | 97%
    #5    | 100%  | 100%  | 95%  | 100%  | 96%
VIII. CONCLUSIONS

The system is developed to reject unintentional and erratic hand gestures and to supply visual feedback on the gestures registered. This work managed to invent a set of gestures that are distinct from each other yet easy for the system to recognize. The accuracy of the control system was 100%, mainly because of the limited number of hand gestures. This set of hand gestures is adequate for any electronic control system. Our research is distinguished from previous attempts by a few marked differences: a minimum number of gestures is used to offer higher accuracy with less confusion; only low processing power is required to process the gestures, which is useful for simple consumer control devices; it is very robust to lighting variations; and it operates in real time.

IX. FUTURE WORK

These algorithms can be incorporated as part of a robot test-bed architecture which can be used to demonstrate their effectiveness. By processing real-time images and communicating wirelessly with our robot, we can control different appliances even under complex and cluttered backgrounds. Work is currently underway to extend our platform by incorporating these algorithms for Automatic Guided Vehicle (AGV) deployment for traffic sign recognition; the AGV should be capable of taking its control action based on the traffic sign.

ACKNOWLEDGMENT

The authors gratefully acknowledge the following individuals for their support: Prof. V. Bhaskaran, Dept. of Mathematics, and our colleagues, for their valuable guidance, for devoting their precious time, for sharing their knowledge, and for their co-operation.

REFERENCES

[1] J. Davis and M. Shah, "Visual Gesture Recognition", IEE Proc.-Vis. Image Signal Process., Vol. 141, No. 2, April 1994.
[2] C.-C. Chang, I.-Y. Chen, and Y.-S. Huang, "Hand Pose Recognition Using Curvature Scale Space", IEEE International Conference on Pattern Recognition, 2002.
[3] A. Utsumi, T. Miyasato, and F. Kishino, "Multi-Camera Hand Pose Recognition System Using Skeleton Image", IEEE International Workshop on Robot and Human Communication, pp. 219-224, 1995.
[4] B. Martinkauppi, "Face colour under varying illumination - analysis and applications", PhD thesis, University of Oulu, 2002.
[5] K. Schwerdt and J. Crowley, "Robust face tracking using color", 2000.
[6] F. Tomaz, T. Candeias, and H. Shahbazkia, "Improved automatic skin detection in color images", in 7th Digital Image Computing: Techniques and Applications, pp. 419-427, 2003.
[7] T. Wilhelm, H. J. Böhme, and H. M. Gross, "A multi-modal system for tracking and analyzing faces on a mobile robot", Robotics and Autonomous Systems 48, pp. 31-40, 2004.
[8] A. Albiol, L. Torres, and E. J. Delp, "Optimum color spaces for skin detection", in International Conference on Image Processing, vol. 1, pp. 122-124, 2001.
[9] R. Kjeldsen and J. Kender, "Finding skin in color images", in 2nd International Conference on Automatic Face and Gesture Recognition, pp. 312-317, 1996.
[10] M. C. Shin, K. I. Chang, and L. V. Tsap, "Does color space transformation make any difference on skin detection?", in WACV '02: Proceedings of the Sixth IEEE Workshop on Applications of Computer Vision, Washington, DC, USA, IEEE Computer Society, 2002, p. 275.
[11] M. Yang, D. Kriegman, and N. Ahuja, "Detecting faces in images: A survey", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) 24(1), 2002, pp. 34-58.
[12] C. Shan, Y. Wei, X. Qiu, and T. Tan, "Gesture recognition using temporal template based trajectories", Proc. 17th Int. Conf. Pattern Recognition, 2004, vol. 3, pp. 954-957.
