
2014 IEEE Students Conference on Electrical, Electronics and Computer Science

Real-time Gesture Recognition and Robot Control through Blob Tracking
Deepika Ravipati
Electronics & Communication Engineering
Padmasri Dr. B.V. Raju Institute of Technology
Narsapur, Andhra Pradesh, India
deepikaravipati@gmail.com

Prathyusha Karreddi
Electronics & Communication Engineering
Padmasri Dr. B.V. Raju Institute of Technology
Narsapur, Andhra Pradesh, India
kprathyu14@gmail.com

Amulya Patlola
Computer Science Engineering
Padmasri Dr. B.V. Raju Institute of Technology
Narsapur, Andhra Pradesh, India
Patlola.amulya@gmail.com
Abstract—This paper presents a framework for a vision-based interface designed to instruct a humanoid robot through gestures using image processing. Image thresholding and blob detection techniques are used to obtain the gestures. The images are then analyzed to recognize the gesture given by the user in front of a web camera, and an appropriate action is taken (such as taking a picture or moving the robot). The application is developed using the OpenCV (Open Source Computer Vision) libraries and Microsoft Visual C++. The gestures obtained by processing the live images are used to command a humanoid robot with simple capabilities. A commercial humanoid toy robot, Robosapien, was used as the output module of the system. The robot was interfaced to the computer by a USB-UIRT (Universal Infrared Receiver and Transmitter) module.

Keywords—OpenCV libraries; blob tracking; thresholding; USB-UIRT; image moments; Robosapien

I. INTRODUCTION

In recent years, research efforts seeking to provide more natural, human-centered means of interacting with computers have gained growing interest. A particularly important direction is that of perceptive user interfaces, where the computer is endowed with perceptive capabilities that allow it to acquire both implicit and explicit information about the user and the environment. Vision has the potential of carrying a wealth of information in a non-intrusive manner and at low cost; it therefore constitutes a very attractive sensing modality for developing perceptive user interfaces. Proposed approaches for vision-driven interactive user interfaces resort to technologies such as head tracking, face and facial expression recognition, eye tracking, and gesture recognition.
There has been rapid development in human-computer interfaces to facilitate users and make them more comfortable, and the mode of interfacing has been changing quickly over the last decade. Keyboards are fast being replaced by touch screens, speech is ready to take the place of touch, and it is predictable that the next mode of interface will be gestures, followed by thoughts. Generations of devices have moved through keypads, touch, multi-touch, and so on; the time has now come to move to a better form of human-computer interaction.
Gestures are a means of non-verbal communication. Our natural hand gestures can be used as an interface to operate machines and to communicate with intelligent environments, for example to control home appliances in a smart home. In this paper we work with gestures of this type, identified by color.
II. MOTIVATION

Robots are used successfully in many areas today, particularly in industrial production, military operations, deep-sea drilling, and space exploration. This success drives the
interest in the feasibility of using robots in human social
environments, particularly in the care of the aged and the
handicapped. In social environments, humans communicate
easily and naturally by both speech (audio) and gesture
(vision) without the use of any external devices (like
keyboards) requiring special training. Robots have to adapt to
human modes of communication to promote a more natural
interaction with humans. Given a choice between speech and
gesture, some researchers have opined that gesture recognition
would be more reliable than speech recognition because the
latter would need a greater number of training datasets to deal
with the greater variability in human voice and speech.
Emerging sixth sense technology is the motivation of this
project. Sixth sense technology is a wearable gesture interface
that augments the physical world around us with digital
information and lets us use natural hand gestures to interact
with that information. In this paper, we present the first part
of the technology - hand gesture recognition, and hence
developing a Gesture Controlled User Interface (GCUI).
III. RELATED WORKS

Many systems exist for controlling a robot through gestures. Some gesture recognition systems involve adaptive color segmentation [1], hand finding and labeling with blocking, morphological filtering, and then identifying gesture actions by template matching. Template matching, however, does not allow dynamic gesture inputs. Another system uses a machine interface device to provide real-time gestures to the robot [2]. Analog flex sensors mounted on a hand glove have been used to measure finger bending [3], and hand position and orientation have been measured ultrasonically for gesture recognition [4]. In yet another approach, gestures are recognized using the MS (Microsoft) Kinect sensor [5], which gathers color and depth information using an RGB camera and an infrared camera, respectively. This system, however, is not very cost-effective.
IV. PROPOSED SYSTEM

We propose a system with which the user can control the robot in any environment using various gesture commands, making it semi-autonomous. The user operates the robot through a laptop or PC with a good-quality built-in or external webcam. This webcam captures a real-time video stream of hand gestures, from which commands for the robot are generated. Gesture commands are given using the palm of the hand. Using the gesture technique developed, the robot can be moved in all possible directions in its environment. The designed system can capture a limited number of gestures: move right, move left, move front, move back, raise the hand, and lower the hand.

An image frame is taken as input and processed to extract the gesture, which can be one of the six possible commands specified above. From the recognized gesture, a control signal is generated to pass the command to the robot. The robot used here is IR remote controlled, so the generated signal is sent to the robot via the USB-UIRT, which transmits the required IR signal. Once a command signal is given, the robot continues to move in that direction until the next command is given or an obstacle appears in its path. Figure 1 shows the basic flow of the system.

Fig. 1. Proposed system: web camera → real-time video capture → blob detection and tracking → pattern recognition → command detection → signal generation → USB-UIRT → Robosapien

V. BLOCK DIAGRAM

Figure 2 shows the block diagram of the complete project. The remote control of the robot is learned by the USB-UIRT module, which is then programmed to react to human hand gestures.

Fig. 2. Block diagram

VI. APPROACH

A. OpenCV Libraries
OpenCV (Open Source Computer Vision Library) is a library of programming functions aimed mainly at real-time computer vision and image processing. It was originally developed by Intel and can be used to build accurate and efficient image processing systems.
B. Thresholding
Thresholding is the simplest method of image segmentation, i.e., the process of partitioning a digital image into multiple segments (sets of pixels). Thresholding can be used to create binary images from their corresponding grayscale images:

If f(x, y) > T, then f(x, y) = 0; otherwise f(x, y) = 255.


Based on the above rule, every point (x, y) of the image is assigned to one of two categories, object or background. The pixel's intensity value becomes zero if it satisfies the threshold condition; otherwise the pixel is set to 255. In this way binary images are created.
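As an illustration only (the paper does not reproduce its source code), the thresholding rule above could be written with OpenCV in C++ roughly as follows; the input file name and the threshold value T = 100 are placeholder assumptions.

```cpp
// Illustrative thresholding sketch (not the authors' exact code).
// The input path and threshold T are placeholder assumptions.
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat frame = cv::imread("frame.png");          // one frame grabbed from the webcam
    cv::Mat gray, binary;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);    // work on a single-channel copy

    const double T = 100.0;
    // Matches the rule above: f(x,y) > T -> 0, otherwise 255 (inverted binary threshold).
    // Swap to cv::THRESH_BINARY if the blob should instead be the white region.
    cv::threshold(gray, binary, T, 255, cv::THRESH_BINARY_INV);

    cv::imwrite("binary.png", binary);                // binary image handed to blob detection
    return 0;
}
```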


C. Blobs and Blob Detection
A blob is a region of an image in which some properties are constant or vary within a prescribed range of values; all the points in a blob can be considered, in some sense, similar to each other. We use blob detection to detect blobs in an image and to make selected measurements of those blobs. Blob detection refers to mathematical methods aimed at detecting regions in an image that differ in properties, such as brightness or color, from the surrounding areas.



The basic algorithm for this blob detection technique is to filter the image at different scales and thereby extract the blob regions corresponding to the chosen color. The Laplacian of Gaussian (LoG) technique can be used for more advanced blob detection applications.
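One possible way to isolate the dominant blob from the binary image is sketched below using OpenCV contours; the paper's actual blob-detection calls may differ.

```cpp
// Sketch: find the largest connected region (blob) in a binary image.
// Uses OpenCV contours; the paper's actual blob-library calls may differ.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Point> largestBlob(const cv::Mat& binary) {
    std::vector<std::vector<cv::Point>> contours;
    // findContours may modify its input, so work on a copy.
    cv::findContours(binary.clone(), contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    std::vector<cv::Point> best;
    double bestArea = 0.0;
    for (const auto& c : contours) {
        double area = cv::contourArea(c);              // pixel area of this candidate blob
        if (area > bestArea) { bestArea = area; best = c; }
    }
    return best;                                       // empty if no blob was found
}
```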




Fig. 3. Static red blob detection (user's palm)

D. USB-UIRT
As stated above, USB-UIRT stands for Universal Infrared Receiver and Transmitter, and it is available as a USB module. The module can learn the IR signals sent by most remote controls. It contains a small, flash-upgradable microcontroller; when an IR signal is detected, the module interprets and decodes it and sends it to the PC over the USB connection.
Fig. 4. Flow chart of the gesture recognition

VII. IMPLEMENTATION
The entire process starts with compiling the OpenCV libraries on the desktop and configuring them with Visual C++, which requires several other dependencies to be installed as well. When the user comes in front of the camera and gives a gesture with the colored palm, the first step is to acquire each single frame from the web camera. Treating an image as a two-dimensional matrix, we apply thresholding followed by blob detection. Once the static blob is detected, its position in the image is calculated; the frames form a time-ordered series (the video), and hence the blob is tracked. The blob positions are calculated from image moments, so once the positions are known the blob can be tracked. Depending on the pattern traced by the palm, the command is extracted according to the predefined set of six gestures. Thus, when the user gives a gesture with a particular colored hand, the video is processed and the blob is tracked as described above.
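A compressed, illustrative sketch of this per-frame loop is given below; the threshold value, the polarity of the binary image (palm assumed to end up as the bright region), and the variable names are assumptions rather than the paper's code.

```cpp
// Sketch of the per-frame loop: grab a frame, threshold it, locate the blob
// centre and area via image moments, and keep the previous values so the
// gesture can be derived from the motion. Threshold value and the bright-blob
// assumption are illustrative, not taken from the paper.
#include <cstdio>
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);                        // default webcam
    if (!cap.isOpened()) return 1;

    cv::Point2d prev(-1.0, -1.0);                   // previous blob centre
    double prevArea = 0.0;                          // previous blob area
    cv::Mat frame, gray, binary;

    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::threshold(gray, binary, 100, 255, cv::THRESH_BINARY);

        cv::Moments m = cv::moments(binary, true);  // non-zero pixels form the blob
        if (m.m00 > 0) {
            cv::Point2d cur(m.m10 / m.m00, m.m01 / m.m00);
            if (prev.x >= 0) {
                double p  = cur.x - prev.x;         // horizontal motion since last frame
                double q  = cur.y - prev.y;         // vertical motion since last frame
                double Ar = m.m00 - prevArea;       // change in blob area
                std::printf("p=%.1f q=%.1f Ar=%.1f\n", p, q, Ar);
            }
            prev = cur;
            prevArea = m.m00;
        }
        if (cv::waitKey(30) == 27) break;           // Esc stops the loop
    }
    return 0;
}
```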

The process of gesture capture works like a simple C switch case: the gesture pattern given by the user is captured, and when it matches a pattern in the set of predefined gestures, the corresponding control signal is generated. The control signals are sent to the USB-UIRT, which in turn sends them as IR signals to the robot. Hence the robot can be activated and commanded by the user through gestures.
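The dispatch itself could look roughly like the sketch below; sendIR() is a hypothetical stand-in for the call into the USB-UIRT driver, and the learned-code names are placeholders rather than values from the paper.

```cpp
// Illustrative dispatch in the spirit of the "simple C switch case" above.
// sendIR() is a hypothetical stand-in for the USB-UIRT driver call, and the
// learned-code names are placeholders, not values from the paper.
enum Command { NONE, MOVE_FRONT, MOVE_BACK, MOVE_LEFT, MOVE_RIGHT, RAISE_HAND, LOWER_HAND };

void sendIR(const char* learnedCode);   // hypothetical wrapper around the USB-UIRT API

void dispatch(Command cmd) {
    switch (cmd) {
        case MOVE_FRONT: sendIR("robo_forward");   break;
        case MOVE_BACK:  sendIR("robo_backward");  break;
        case MOVE_LEFT:  sendIR("robo_left");      break;
        case MOVE_RIGHT: sendIR("robo_right");     break;
        case RAISE_HAND: sendIR("robo_arm_up");    break;
        case LOWER_HAND: sendIR("robo_arm_down");  break;
        default: break;                            // no recognised gesture: do nothing
    }
}
```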
The blob-area criterion can also restrict the interface to a specific blob (in practice, a colored glove) acting as the commanding unit. The area is calculated as the zeroth moment of the binary image.


In general, the (m, n)th-order raw moment of a two-dimensional function f(x, y) can be represented as

M_{mn} = ∬ x^m y^n f(x, y) dx dy
The two-dimensional function in our case is an image. For the zeroth-order moment, (m, n) becomes (0, 0), and since we take the moment about the origin, the equation reduces to the double integral of f(x, y), which gives the area of the image; for our binary image, this is the area of the blob.
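In OpenCV terms, this zeroth moment can be read directly from cv::moments(); the following small sketch (function and variable names assumed) extracts the blob area and centroid from the binary image.

```cpp
// The zeroth moment m00 of the binary image is the blob area (the discrete
// counterpart of the double integral above), and the first moments m10, m01
// give the centroid used for tracking. Names are illustrative.
#include <opencv2/opencv.hpp>

bool blobAreaAndCentroid(const cv::Mat& binary, double& area, cv::Point2d& centre) {
    cv::Moments mo = cv::moments(binary, /*binaryImage=*/true);
    area = mo.m00;                                  // zeroth moment: count of blob pixels
    if (mo.m00 <= 0) return false;                  // no blob in this frame
    centre = cv::Point2d(mo.m10 / mo.m00, mo.m01 / mo.m00);
    return true;
}
```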


Fig. 5. Screenshot showing the entire project setup

Likewise, using this principle of moments, the coordinates (x, y) of the blob are calculated. To track the blob and recognize the gesture, it is necessary to store the previous (x, y) position. Three variables, namely p, q and Ar, continuously keep track of the difference between the present and previous x values, y values and areas, respectively. Gestures are defined using these three variables, listed below.
Fig. 6. Screenshot of the console showing the commands given by the user
through gestures

p = CurrentX − PreviousX
q = CurrentY − PreviousY
Ar = CurrentArea − PreviousArea
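As an illustration, these differences could be mapped to the six commands roughly as follows, using the threshold values listed in Table I below; the function name and string labels are assumptions, not taken from the paper's code.

```cpp
// Map the frame-to-frame differences p, q and the area change Ar to one of
// the six commands, using the threshold values listed in Table I.
// Function and label names are illustrative, not from the paper's code.
#include <cstdio>

const char* classifyGesture(double p, double q, double Ar) {
    if (Ar > 3000)                       return "move front"; // blob grew: palm moved towards the camera
    if (Ar < -3000)                      return "move back";  // blob shrank: palm moved away
    if (p < -30 && q <= 40 && q >= -40)  return "move right"; // mostly horizontal sweep
    if (p > 30  && q <= 50 && q >= -50)  return "move left";
    if (q < -30 && p <= 50 && p >= -50)  return "raise hand"; // image y decreases upwards
    if (q > 30  && p <= 50 && p >= -50)  return "down hand";
    return "none";                                            // no gesture recognised this frame
}

int main() {
    std::printf("%s\n", classifyGesture(-45.0, 10.0, 0.0));   // example: prints "move right"
}
```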
TABLE I
PATTERN MATCHING USING THE CONDITIONS AND POSING THE GESTURES TO THE ROBOT

Condition                          Gesture
p < -30 && q <= 40 && q >= -40     Move right
p > 30 && q <= 50 && q >= -50      Move left
q < -30 && p <= 50 && p >= -50     Raise your hand
q > 30 && p <= 50 && p >= -50      Down your hand
Ar > 3000                          Move front
Ar < -3000                         Move back

The designed gesture interface works with an accuracy of 97% and takes at most 3 seconds to recognize a gesture. The interface was tested in different complex environments and found to work successfully. Hence, when the user raises a hand, the robot does the same; when the user waves a hand from left to right, the robot moves from left to right. This work can therefore augment research on the development of gesture interfaces.

Fig. 7. Screenshot showing the robot raising its hand in response to a human gesture

VIII. APPLICATIONS
Image processing has become increasingly powerful in recent years, largely due to its numerous applications in many fields. The potential applications of image-processing-based gesture capture include:


1. Ability to track a person's movements accurately. This covers applications where one or more subjects are tracked over time and possibly monitored for special actions. A classic example is the surveillance of a parking lot, where a system tracks subjects to evaluate whether they may be about to commit a crime, e.g., stealing a car.
2. Applications where the captured motion is used to provide controlling functionality, such as an interface to games, virtual environments, or animation, or to control remotely located implements.
3. Development of a smart home, where the home appliances can be controlled by user gestures.
4. A phase of human-machine interaction.
5. Motion-tracking-controlled video games and other interface units.

IX. CONCLUSION

Image processing is currently a hot research topic in the digital world. The central idea of this work is to derive features from images in a way that can be used in the field of robotics. Gestures are among the most costly features to derive from humans, so different methodologies can be adopted to capture them, keeping real-time operation, efficiency, and accuracy in mind. The method adopted for this gesture capture is very economical, and in that sense it is distinctive within robotics and the open-software era.
ACKNOWLEDGEMENT
We are very thankful to Randy Pausch Robotics
Engineering Center of Padmasri Dr.B.V.Raju Institute of
Technology for giving us this opportunity to implement this
project. We would like to express our sincere gratitude to our
professors Dr.I.A.Pasha, Head of Electronics and
Communication Engineering, BVRIT and Prof. Prabhakar
Kapula for their kind support and encouragement. We are also
thankful to our mentors Dr. Sasidhar Tadanki, Vanderbilt
University and Mr. Nagasrikanth Kallakuri, Carnegie Mellon
University for their constant support without which we could
not have completed the project.
REFERENCES
[1] Chao Hy Xiang Wang, Mrinal K. Mandal, Max Meng, and Donglin Li, "Efficient Face and Gesture Recognition Techniques for Robot Control," CCECE, pp. 1757-1762, 2003.
[2] Asanterabi Malima, Erol Ozgur, and Mujdat Cetin, "A Fast Algorithm for Vision-Based Hand Gesture Recognition for Robot Control," IEEE International Conference on Computer Vision, 2006.
[3] Thomas G. Zimmerman, Jaron Lanier, Chuck Blanchard, Steve Bryson, and Young Harvill, "A Hand Gesture Interface Device," pp. 189-192, 1987.
[4] Jagdish Lal Raheja, Radhey Shyam, Umesh Kumar, and P. Bhanu Prasad, "Real-Time Robotic Hand Control using Hand Gestures," Second International Conference on Machine Learning and Computing, 2010.
[5] "Gesture Controlled Robot using Kinect," http://www.eyantra.org/home/projects-wiki/item/180-gesturecontrolled-robot-usingfirebirdv-and-kinect
[6] Harish Kumar Kaura, Vipul Honrao, Sayali Patil, and Pravish Shetty, "Gesture Controlled Robot using Image Processing," International Journal of Advanced Research in Artificial Intelligence, vol. 2, no. 5, 2013.
[7] X. Zabulis, H. Baltzakis, and A. Argyros, "Vision-based Hand Gesture Recognition for Human-Computer Interaction."
[8] Md. Hasanuzzaman, T. Zhang, V. Ampornaramveth, H. Gotoda, Y. Shirai, and H. Ueno, "Adaptive visual gesture recognition for human-robot interaction using a knowledge-based software platform," Robotics and Autonomous Systems, vol. 55, no. 8, pp. 643-657, August 2007.
[9] M. Van den Bergh, F. Bosch, E. Koller-Meier, and L. Van Gool, "Haarlet-based hand gesture recognition for 3D interaction," Proc. of the Workshop on Applications of Computer Vision (WACV), December 2009.
