
Color Based Hand and Finger Detection Technology for User Interaction

Originally Published By: Sung Kwan Kang, Mi Young Nam, Phill Kyu Rhee
Dept. of Computer Science & Engineering, Inha University, 253 Yong-Hyun Dong, Nam-Gu, Incheon, South Korea

Rewritten By: Daniel Dupaľ


Dept. of Applied Informatics, Comenius University, Bratislava, Slovakia

Today's computers and most other devices are designed to react to some kind of user interaction,
whether through a mouse commonly used with computers, sensor gloves, a remote control for a TV set,
or simply the buttons on a Hi-Fi system. All of these "sensors" can be replaced by one that all of
us already have and use all the time: our hands. With our hands alone, we should be able to control
most of a device's functionality. With our fingers we can form gestures for numbers, directions,
key words, and much more. We can use a finger to point exactly at an object of interest, or use the
index finger as a cursor, much like the mouse we know today. And it does not stop there.

Spatial hand modeling can, in general, be divided into two main categories: 3D model-based
techniques and appearance-based approaches. The approaches differ in their use and in the time they
need to compute. While a 3D model takes longer to compute, it can produce (depending on the time
available per image) more accurate information about hand placement and shape; appearance-based
(view-based) techniques, by contrast, are much less time consuming, yet they still offer enough
information about the user's hands.

3D Model-Based Spatial Gesture Models


With two cameras (stereo vision) we can create a 3D volumetric model of a hand as it would
appear in real life. This technique is commonly used in computer animation: it takes a 3D model of
the hand and then varies its parameters until the desired similarity is reached, and even the
surface of the skin can be modeled with many realistic techniques. However, all of these
computations are too complex to run in real time in gesture recognition applications. To shorten
the computation time, we can represent the hand with much more elementary objects such as
cylinders, spheres, ellipsoids, and hyper-rectangles, which can still yield a fairly realistic
representation. Even after this effort, it remains complex and time consuming to create a hand
model for each user with computer-vision-based techniques, because every pair of hands would need a
different model.
To reduce the number of parameters needed per user, we can use a skeletal model approach to
obtain information about hand position and placement. This technique produces a model of the
hand described with joints and lines, creating a representation similar to the hand's bones. Again,
because of its great complexity and generality, this technique may not be suited for real-time
gesture recognition applications.

Appearance-Based Spatial Gesture Models


Appearance-based models take a different approach. Instead of generating a model of the hand
and then obtaining information about its position or gesture, these models are derived directly
from the information contained in the images. This approach can work with a single camera, but any
number can be used to achieve better or more varied results.
In general, a gesture recognition system can be divided into three parts. First, we need to
create models of the hand gestures that we want to detect later; depending on the collection of
gestures to be detected by the application, we must determine the complexity of this model or
models. Next, analysis is performed on the collected image set to compute model parameters from
image features extracted from the video stream. Finally, a gesture recognizer classifies the
collected model parameters as representative of a specific gesture, taking into consideration the
model and perhaps a grammar.

Figure: the gesture recognition pipeline — video inputs feed an analysis stage (A) that produces
model parameters, which a recognition stage (B) turns into a gesture description (C), guided by a
mathematical model of gestures, a grammar, and the model parameter space and classes.
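The three-stage structure described above can be sketched as a simple composition of callables. This is a hypothetical skeleton, not the paper's code; the three stage functions are placeholders supplied by a concrete application:

```python
def recognize(frame, extract_features, estimate_params, classify):
    """Three-stage gesture recognition pipeline (illustrative sketch).

    extract_features: analysis stage, turns a video frame into image features
    estimate_params:  fits model parameters to those features
    classify:         maps parameters to a gesture description
    """
    features = extract_features(frame)      # analysis
    params = estimate_params(features)      # model parameter estimation
    return classify(params)                 # recognition
```

In a real system each stage would be a substantial piece of code; the value of the decomposition is that the stages can be developed and swapped independently.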

An important task is to detect the relevant features of the image. In general, this cannot be
achieved without hand localization; only after the hand is detected can the application analyze
its features.
One of the popular ways to detect a hand in the image is color segmentation, because of the
hand's distinctive color. When gloves are used, this segmentation is trivial. To achieve better
results, some techniques use motion to detect the hand area in addition to color-based
segmentation; this combined approach is known as fusion.
The application then needs to extract features from the located hand, such as silhouettes,
contours, key points, and distance-transformed images. These features are used to compute model
parameters, or can be used directly as parameters for the recognition phase.
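As a sketch of the contour feature, a boundary pixel can be defined as a foreground pixel with at least one background 4-neighbour. This NumPy illustration is my own, not the paper's implementation:

```python
import numpy as np

def contour_pixels(mask):
    """Return (row, col) coordinates of contour pixels of a binary mask.

    A pixel is on the contour if it is foreground and not fully
    surrounded by foreground 4-neighbours. Illustrative sketch only.
    """
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    # A pixel is "interior" if all four of its neighbours are foreground.
    interior = (padded[1:-1, :-2] & padded[1:-1, 2:] &
                padded[:-2, 1:-1] & padded[2:, 1:-1])
    boundary = m & ~interior
    return np.argwhere(boundary)
```

Connecting these boundary pixels into an ordered chain then gives the contour from which a blob representation can be derived.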

Proposed Methodology
First, we need to locate the hand region. We then use skin color to distinguish the hand,
because color does not change while the hand is moving and making gestures. In the next step, we
use the estimated hand state to extract several features that define a deterministic process of
finger recognition.
The simple skin-color model indicates that the red component of skin lies in the range of 37
to 60, whereas the green component lies between 28 and 34. The one disadvantage of the RGB color
space is that it is sensitive to brightness and intensity; the HSI and YUV color spaces, on the
other hand, can separate out the intensity component, so they are used more often. Because video
capture devices mostly deliver RGB, a software conversion is needed.
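One way to make such a skin model less brightness-sensitive without leaving RGB is to threshold normalized chromaticity. The sketch below interprets the quoted ranges (red 37–60, green 28–34) as percentages of R+G+B; that interpretation, and the function itself, are my assumptions, not the paper's stated method:

```python
import numpy as np

def skin_mask(frame_rgb):
    """Binary skin mask via normalized-rgb (chromaticity) thresholding.

    ASSUMPTION: the ranges 37-60 (red) and 28-34 (green) are read here
    as percentage shares of R+G+B. Normalizing by the pixel's total
    brightness reduces, but does not remove, sensitivity to illumination.
    """
    rgb = frame_rgb.astype(np.float64)
    total = rgb.sum(axis=2) + 1e-9           # avoid division by zero
    r = 100.0 * rgb[..., 0] / total          # red share, in percent
    g = 100.0 * rgb[..., 1] / total          # green share, in percent
    return (r >= 37) & (r <= 60) & (g >= 28) & (g <= 34)
```

A pixel such as RGB (150, 90, 60) has chromaticity r = 50 %, g = 30 % and is accepted, while a neutral gray pixel (r = g ≈ 33 %) is rejected.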
By locating the pixel values on the object's edges and connecting them together, we obtain
contours, from which we can derive a blob representation of the hand.
In the original paper, locating all fingertips was not implemented: the application simply
searched for the highest point of the blob, which only works for a single finger pointing upwards.
However, by subtracting the morphological opening of the hand object from the hand object itself,
we should be able to obtain a blob representation of the fingers.
This representation then only needs to be divided into separate fingers before the highest-point
method can be applied, but that still fails for fingers pointing in any direction other than
upwards. I suggest several methods that can be used, depending on the application. If mainly
pointing fingers are to be recognized, fitting an ellipse into each blob can give the approximate
position of the finger, and selecting the point most distant from its center (opposite the
morphological-opening position) leads to the fingertip. If we need to recognize gestures with bent
fingers, this approach would require a more complex methodology to reach the desired result;
something similar to the skeletal model might work better. Finding a line through the center of
the finger and then locating its far end can yield the fingertip of a finger blob that is bent or
rotated in any way.
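A direction-independent variant of these ideas can be sketched as follows: approximate the palm center by the centroid of a heavily eroded mask (thin finger blobs erode away first, much like a morphological opening), then take the hand pixel farthest from that center as the fingertip. The erosion depth and the functions themselves are illustrative choices, not the paper's method:

```python
import numpy as np

def erode(mask, n=1):
    """n rounds of 4-neighbour binary erosion (sketch, not optimized)."""
    m = mask.astype(bool)
    for _ in range(n):
        p = np.pad(m, 1, constant_values=False)
        m = (p[1:-1, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:] &
             p[:-2, 1:-1] & p[2:, 1:-1])
    return m

def fingertip(mask, erosions=3):
    """Estimate a fingertip as the hand pixel farthest from the palm.

    The palm center is approximated by the centroid of the eroded
    mask (fingers vanish under erosion, the palm core survives).
    Unlike a highest-point search, this works for a finger pointing
    in any direction. Single-pointing-finger case only.
    """
    core = erode(mask, erosions)
    if not core.any():                      # hand too small: fall back
        core = mask.astype(bool)
    cy, cx = np.argwhere(core).mean(axis=0)
    pts = np.argwhere(mask.astype(bool))
    d2 = (pts[:, 0] - cy) ** 2 + (pts[:, 1] - cx) ** 2
    return tuple(pts[d2.argmax()])
```

For multiple extended fingers, the same farthest-point rule would be applied per finger blob after the opening-subtraction step described above.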
The process of finger recognition starts the moment a hand is placed in front of the camera.
For this application, a model with 12 gestures was used, as shown in the figure to the right. To
avoid reacting to rapid finger-gesture changes or unintentional gestures, each change should be
held fixed for approximately 5 frames before it is recognized. This model was used for a
hand-operated calculator, and the proposed methodology was notably successful.
If we need other, or perhaps more, gestures, we can use two hands: one for a cursor and one
set of gestures, and the other for additional commands and combinations of gestures, though this
might lead to a more complex model. Although the paper only briefly mentioned this approach, using
a finger as a pointing device and the other hand as an "action activator" might be a good idea for
future development. This way we do not need to learn complex gestures to represent actions with
one hand only; we can use a more natural representation with two hands, for instance showing the
number 8 with two hands rather than as in the model above, which looks and feels much more
natural.
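The 5-frame holding rule can be implemented as a small debouncer that only accepts a gesture once the per-frame classifier has reported it for a number of consecutive frames. The class name and threshold parameter are illustrative:

```python
class GestureDebouncer:
    """Accept a gesture only after it is seen for `hold` consecutive frames.

    Mirrors the rule of keeping a gesture fixed for roughly 5 frames
    to filter out accidental or transitional finger poses.
    """

    def __init__(self, hold=5):
        self.hold = hold          # frames required before acceptance
        self.candidate = None     # gesture currently being counted
        self.count = 0            # consecutive frames of the candidate
        self.current = None       # last accepted (stable) gesture

    def update(self, gesture):
        """Feed one per-frame classification; return the stable gesture."""
        if gesture == self.candidate:
            self.count += 1
        else:
            self.candidate = gesture
            self.count = 1
        if self.count >= self.hold:
            self.current = gesture
        return self.current
```

Short flickers of a different gesture are ignored, because the counter resets before the new candidate reaches the threshold.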

In conclusion, there are many methods that can provide similar results, and each suits
different circumstances, but recognizing the hand with a color-detection algorithm is very
successful, because hand deformation does not affect its result and it can operate in real time.
After adding an algorithm for detecting all fingertips and creating a two-hand gesture model, this
approach could be widely used with many workstations and devices in our households.
