
Final Project Report

Superpixel-Based Human Computer Interface Using
Hand-Gesture Recognition
Abhishek Maheshwari, Anurag Semwal, Susmit Wagle
161110{01,33,52}@iitk.ac.in
Indian Institute of Technology, Kanpur
April 28, 2017

Problem Description
Hand gestures play an important role in improving the ease of use of computers. As stated in [2], among all means of gesturing (fingers, head, body, multiple hands), the hands are the most heavily involved. Hand gestures can be classified into two main kinds:
(1) Static hand gestures: the hand position and finger configuration are almost static with respect to the body within a specified time frame.
(2) Dynamic hand gestures: the hand position or finger configuration is in motion during a specified time frame, hence the name dynamic.

General Solutions: Multiple techniques are used to capture the true sense behind a gesture. A few require users to wear an electronic glove so that the key features of the hand can be measured accurately, but such devices are somewhat costly and inconvenient for domestic applications [4]. Optical markers, skin colour models and hand shape models have also been proposed, as stated in [4]. After locating the hand, hand gesture recognition aims to classify gestures into multiple signs using classifiers such as HMMs (Hidden Markov Models), SVMs (Support Vector Machines), k-NN (k-Nearest Neighbours), etc.

Previous Work
Chong Wang et al. proposed a method that uses a superpixel earth mover's distance (SP-EMD) to measure the dissimilarity between hand gestures [4]. This measure is claimed to be not only robust to distortion and articulation, but also invariant to scaling, translation and rotation with proper preprocessing. Another effective gesture recognition algorithm, proposed in [3], is based on the Finger Earth Mover's Distance (FEMD) and a Template Matching Method (TMM); it showed promising performance and is considered state of the art. One gesture recognition proposal [1] used recurrent neural networks in a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which was shown to be suitable for multi-modal wearable sensors and requires no expert knowledge in designing features.

What We Implemented
We have implemented the method of [4]. It consists of the following steps (minimal Python sketches of the steps follow the list):

• Hand Localization and Segmentation
The Kinect skeleton joints are used directly to locate the hands, wrists and elbows. Assuming the hand is visible to the camera without any occlusion, we can quickly separate the hands from background objects using depth information alone: since the Kinect gives us the depth of the wrist joint, we take that depth as a threshold to filter the hand out of the background (see the first sketch after these steps).

• Shape Representation Using Joint Color-Depth Superpixels
Instead of representing the hand shape by its contour, we use SLIC superpixels to simplify the hand shape while retaining as much information as possible. The authors of the paper modified the Simple Linear Iterative Clustering (SLIC) algorithm to cluster in a 6-D space consisting of the (L, a, b) values of the CIELAB colour space and the (x, y, d) pixel coordinates, where d is the depth value at pixel location (x, y). With depth information included, SLIC superpixels carry not only the 2D shape but also the corresponding texture and depth, which can be jointly utilized in hand gesture recognition. We tried to modify the original SLIC to incorporate the additional depth information but achieved only partial success, so we used the original SLIC along with the average depth of the pixels in each superpixel (see the second sketch after these steps). This representation is robust to distortions of the hand contours, since the centroids of the superpixels are not determined by the contour alone.

• Depth Normalization
we have taken the minimum depth from superpixels of the image and then substracted from
depth of each superpixels so that to normalize for different depths in different images

• SPEMD
SPEMD is modification of ubiquitous distance measure FEMD. Super-Pixel based version
of EMD is expressed in below equation. It takes flow and corresponding cost of edges from
minimum flow calculated using Min-Flow algorithm e.g. Ford-Fulkerson Algorithm. So it
calculates a distance metric between two graphs with normalization over total flow.
Pk Pl
i=0 j=0 cij fij
SP EM D(P, Q) = Pk Pl
i=0 j=0 fij

where
cij = [xpi − xqj + ypi − yqj + α(dpi − dqj )]β

α, β are hyper parameters.

• Template Matching
Template matching is used for hand gesture recognition based on the SP-EMD. In particular, the input hand gesture is recognized as the class g^{*} with the minimum dissimilarity distance (see the final sketch after these steps):

g^{*} = \arg\min_{g} SPEMD(H, T_g)

where H is the input gesture and T_g is the template of class g.
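
The code below gives minimal Python sketches of the steps above. They are illustrations under our own naming, not the reference implementation: function and variable names (segment_hand, depth_map, wrist_depth, the margin parameter) are our assumptions. The first sketch is the depth-threshold segmentation of step 1:

import numpy as np

def segment_hand(depth_map, wrist_depth, margin=20):
    # Keep pixels closer to the camera than the wrist joint.
    # depth_map   : 2-D array of Kinect depth values (in mm).
    # wrist_depth : depth of the wrist joint from the Kinect skeleton.
    # margin      : tolerance (mm) so fingers slightly behind the wrist survive.
    # A zero depth means "no reading" on the Kinect; exclude those pixels.
    return (depth_map > 0) & (depth_map < wrist_depth + margin)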
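The second sketch covers steps 2 and 3: since our 6-D modification was only partially successful, it uses the standard SLIC from scikit-image (colour only) and attaches the average depth per superpixel afterwards, followed by the minimum-depth normalization. The function name and the choices of n_segments and compactness are assumptions:

import numpy as np
from skimage.segmentation import slic

def superpixel_signature(rgb, depth_map, mask, n_segments=100):
    # Original SLIC on the colour image (CIELAB conversion happens internally);
    # the mask restricts superpixels to the segmented hand.
    labels = slic(rgb, n_segments=n_segments, compactness=10, mask=mask)
    pts, weights = [], []
    for sp in np.unique(labels[mask]):
        ys, xs = np.nonzero(labels == sp)
        d = depth_map[ys, xs].mean()           # average depth of the superpixel
        pts.append((xs.mean(), ys.mean(), d))  # centroid (x, y) plus depth d
        weights.append(len(xs))                # weight = superpixel size
    pts = np.asarray(pts, dtype=float)
    # Depth normalization (step 3): subtract the minimum superpixel depth so
    # that gestures captured at different distances become comparable.
    pts[:, 2] -= pts[:, 2].min()
    weights = np.asarray(weights, dtype=float)
    return pts, weights / weights.sum()        # weights normalized to sum to 1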
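Step 4 can be sketched by posing the transportation problem behind SP-EMD as a linear program and solving it with SciPy; this is a plain LP formulation for clarity, not an optimized min-cost-flow solver, and the default alpha and beta values are placeholders rather than tuned hyperparameters:

import numpy as np
from scipy.optimize import linprog

def sp_emd(P, Q, wp, wq, alpha=0.5, beta=1.0):
    # P, Q  : (k, 3) and (l, 3) arrays of (x, y, d) superpixel signatures.
    # wp, wq: superpixel weights, each summing to 1.
    k, l = len(P), len(Q)
    # Ground distance c_ij: depth-weighted L1 distance raised to beta.
    diff = np.abs(P[:, None, :] - Q[None, :, :])             # (k, l, 3)
    c = (diff[..., 0] + diff[..., 1] + alpha * diff[..., 2]) ** beta
    # Transportation LP over flows f >= 0 (flattened row-major):
    # row sums <= wp, column sums <= wq, total flow fixed.
    A_ub = np.zeros((k + l, k * l))
    for i in range(k):
        A_ub[i, i * l:(i + 1) * l] = 1.0   # flow out of superpixel i of P
    for j in range(l):
        A_ub[k + j, j::l] = 1.0            # flow into superpixel j of Q
    total = min(wp.sum(), wq.sum())
    res = linprog(c.ravel(), A_ub=A_ub, b_ub=np.concatenate([wp, wq]),
                  A_eq=np.ones((1, k * l)), b_eq=[total],
                  bounds=(0, None), method="highs")
    f = res.x
    return (c.ravel() @ f) / f.sum()       # SP-EMD: cost normalized by flow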
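Finally, the template matching of step 5 reduces to a nearest-template search; here templates is assumed to map each class label to a (pts, weights) signature:

def classify(hand, templates, alpha=0.5, beta=1.0):
    # Recognize the input gesture as the class whose template has the
    # minimum SP-EMD dissimilarity, i.e. g* = argmin_g SPEMD(H, T_g).
    pts, w = hand
    return min(templates, key=lambda g: sp_emd(pts, templates[g][0],
                                               w, templates[g][1],
                                               alpha=alpha, beta=beta))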

Results
Performance Table: As in the cited paper, we computed percentage accuracy over multiple runs. Each run is characterized by a different test setting, e.g. LOO (Leave-One-Out), L2O (Leave-Two-Out), etc. As explained in the dataset section, we have hand gestures from 5 different subjects; in Leave-One-Out we therefore build templates from 4 subjects and test on the one left out, and similarly for L2O and the remaining settings. We also randomized the runs within each setting: for example, in LOO we ran the program with a randomly chosen left-out subject and computed performance by averaging over all runs (a sketch of this protocol appears after the table). The following table summarizes our test results:

S No   Test Setting   Approx. Accuracy
 1     LOO            41%
 2     L2O            38%
 3     L3O            37%

So on average we obtain approximately 38.7% accuracy.
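
For concreteness, here is a minimal sketch of the leave-one-out protocol; the gestures structure (subject -> label -> list of signatures) and the single-sample template choice are assumptions, and our actual runs also randomized the left-out subject as described above:

def leave_one_out_accuracy(gestures, subjects):
    # gestures[subject][label] -> list of (pts, weights) signatures.
    correct = total = 0
    for left_out in subjects:
        train = [s for s in subjects if s != left_out]
        labels = gestures[train[0]].keys()
        # Simplified template choice: first training sample of each class.
        templates = {g: gestures[train[0]][g][0] for g in labels}
        for g in labels:
            for sample in gestures[left_out][g]:
                correct += (classify(sample, templates) == g)
                total += 1
    return correct / total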

Future Work
For now we are yet to handle real-life intricacies such as invariance to in-plane hand rotation, out-of-plane rotation, etc. We also intend to add recognition of dynamic hand gestures using the skeleton-joint information provided by the Kinect depth camera.

References
[1] Francisco Javier Ordóñez and Daniel Roggen. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors, 16(1):115, 2016.

[2] Siddharth S Rautaray and Anupam Agrawal. Vision based hand gesture recognition for human
computer interaction: a survey. Artificial Intelligence Review, 43(1):1–54, 2015.

[3] Zhou Ren, Junsong Yuan, Jingjing Meng, and Zhengyou Zhang. Robust part-based hand gesture recognition using Kinect sensor. IEEE Transactions on Multimedia, 15(5):1110–1120, 2013.

[4] Chong Wang, Zhong Liu, and Shing-Chow Chan. Superpixel-based hand gesture recognition with Kinect depth camera. IEEE Transactions on Multimedia, 17(1):29–39, 2015.
