A Gesture Learning and Recognition System for Multitouch Interaction Design

Sashikanth Damaraju and Andruid Kerne
Interface Ecology Lab at Texas A&M University
{damaraju, andruid}

Abstract

To design interfaces for more embodied interaction, we build a system that learns and recognizes multitouch gestures – movements of human fingers on a multitouch surface. We form an example vocabulary of gestures that demonstrates the subtleties of finger movements that can be recognized. A video processing system extracts a feature set representing these movements. Hidden Markov Models (HMMs) are used to learn gestures by example, and consistently recognize them. We present an evaluation that demonstrates the robustness of the methodology. Implications for interaction design are presented.

1. Introduction
As digital systems become more ubiquitous, more integrated into the fabric of human life, we need to discover more human-centered forms of interaction. Multitouch systems capture movements of multiple human fingers on an interactive surface, enabling direct hands-on interaction. This modality of input enlarges the dimensionality of interaction in comparison with the conventional mouse-pointer interface. Human hands afford more degrees of freedom due to the dexterity inherent in their operation. Recent developments facilitate access to the required multitouch hardware, bringing the potential for supporting new forms of human-centered experience; this potential is as yet hardly realized. Our contribution is a system that understands human expressions of intent by learning and recognizing gestures performed on a multitouch surface.

Researchers have investigated multitouch interaction for decades [1]. Multitouch gestures have generally been limited to coarse movements such as the ‘wipe’ and ‘pile-n-browse’ techniques [8], simple primitives such as ‘flicks’ [5] or fixed combinations of ‘chords’ [6]. We present a technique that learns and recognizes sophisticated hand gestures on a multitouch surface, providing end-user customizability. Our long-term objective is to give human participants an interactive experience in which embodied gestures performed by the human hand are mapped to actions in ways that are natural, meaningful and intuitive.

2. Learning and Recognition Pipeline
Computationally, we define a gesture as a human action that begins with placing one or more fingers on the interactive surface, and ends when those fingers are lifted off the surface. A sequence of frames from a camera is captured to record each gesture. For each frame of the sequence, a feature set is derived by an
open source video processing and feature acquisition toolkit [3], and used to recognize the gesture through a pipeline of processing stages. Figure 1 shows the stages involved in the learning and recognition pipeline.

Figure 1. Multitouch Gesture Learning and Recognition Pipeline. Identification of spatial arrangement in the learning phase feeds back to feature pre-processing.

2.1. Training Set
Our goal is to build a practical system that uses natural movements of the hand to perform complex actions within an application. To start, we developed a vocabulary of 40 different gestures for testing the current learning and recognition system (Figure 2). The gestures were formed to exemplify subtle differences in initial positions of the fingers and directions of movement. They are grouped by the number of fingers used to perform each one. Twenty samples of each gesture were collected from each of 10 users, 9 male and 1 female, all of whom are right-handed. We expect that the ability to recognize a vocabulary of gestures with subtle variations, such as the direction of movement of the thumb in gestures 3a and 3b, will be critical for developing expressive applications.

Figure 2. Vocabulary of gestures demonstrating the subtleties that can be recognized. The starting positions of the fingers are shown in red; the blue trails show the path during the gesture.

2.2. Feature Preprocessing
The feature preprocessing stage normalizes the data in the X, Y feature space to allow the gesture to be recognized irrespective of where it was performed on the screen. The samples of the training set are also scaled to a constant size for improved recognition.

The ordering of the fingers in the feature set provided by video processing is determined by the temporal order of placement of the fingers on the surface in the first frame of a gesture. Without preprocessing, this ordering is inconsistent across different samples of the same gesture, which is a problem for the learning and recognition system. We develop a solution to this problem of consistent feature ordering, which identifies the spatial arrangement of fingers as horizontal, vertical, or radial. The spatial arrangement is identified for the first frame of each gesture during the learning stage, and then applied during preprocessing of subsequent frames (see feedback line at the top of Figure 1).

2.3. Learning Stage
To learn gestures, we have randomly selected 20 samples of each gesture from across all users. The remaining samples are used to determine the accuracy of classification of the recognition system. During the learning phase, after the observations have been collected and preprocessed, they are passed to a K-means clustering algorithm. The K-means clustering
efficiently forms an initial estimate of the HMM parameters. The estimate is, in turn, passed to the Baum-Welch algorithm, which tunes these parameters to return a high probability for only the given training sample sequences.

2.4. Recognition Stage
For each sample to be recognized, we pass the sample into each trained HMM. The probabilities output from each HMM are collected and compared. We label the sample as the multitouch gesture whose model returns the highest probability.

3. Results
A random selection of 20 training samples for each gesture from the entire set of all 10 users is used to train HMMs, and the remaining samples are used to determine the overall classification rate of the system. The classification rates shown in Figure 3 for the gestures using two, three, four and five fingers are 97.0%, 95.4%, 96.9% and 92.4%, for an average gesture recognition rate of 91.5%, and the result is statistically significant (p < .0001). These results show that our technique classifies gestures within a reasonable accuracy for practical use. As the HMM can be trained using samples from more than one user, it can also be stored within an application or shared across users to avoid re-training.

Figure 3. Classification accuracy results for 20 runs of learning gestures with a random training set of 20 samples, and testing with the remaining samples. The gesture vocabulary of 40 gestures is grouped by the number of fingers in each gesture. [Bar chart: gesture classification accuracy (%) for gestures using 2, 3, 4 and 5 fingers.]

4. Conclusion & Future Work
We have presented a system that can successfully learn and recognize gestures performed by the human hand on multitouch surfaces. A vocabulary of gestures was built to test the system’s ability to distinguish the subtle differences between the gestures. Issues affecting consistency between samples of a gesture were raised, and solutions were developed to address them. The system can recognize samples of gestures.

This gesture recognition system will serve as a foundation for advanced interactive visual systems. Gestures will be used to design interaction techniques for performing complex actions. Future work will develop interaction techniques mapping gestures to actions of searching, browsing, collecting and organizing within combinFormation, a mixed-initiative information composition platform [2]. The system's agents process documents from the internet and form image-text surrogates, placing them on a composition space. combinFormation provides the user with the ability to perform a large number of operations, such as expressing interest to direct the agent, navigating to the source document, image manipulation and text editing. Inspired by prior work on choreographing human movements for human-computer interaction [6], we will design interaction techniques that enable participants to perform these complex tasks with well-choreographed and embodied gestures. We hypothesize that interfaces with intuitively designed gesture-action mappings will result in a more expressive, engaging and fluid human-computer interaction.

5. References
[1] Buxton, W., Multitouch Systems That I Have Known And Loved. 2008.
[2] Kerne, A., Koh, E., et al., A Mixed-Initiative System for Representing Collections as Compositions of Image and Text Surrogates, Proc JCDL 2006.
[3] Nuigroup, Touchlib: A Multi-Touch development kit
[4] Rabiner, L. R., A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Feb 1989, 77 (2), 257–286.
[5] Tse, E., Shen, C., Greenberg, S., Forlines, C., Enabling interaction with single user applications through speech & gestures on a multi-user tabletop. Proc ACM AVI 2006.
[6] Webb, A., Kerne, A., Koh, E., Joshi, P., Park, Y., Graeber, R., Choreographic Buttons: Promoting Social Interaction through Human Movement and Clear Affordances, Proc ACM Multimedia 2006.
[7] Westerman, W., Hand Tracking, Finger Identification, and Chordic Manipulation on a Multi-Touch Surface, PhD Diss
[8] Wu, M., Shen, C., Ryall, K., Forlines, C., Balakrishnan, R., Gesture Registration, Relaxation, & Reuse for Multi-Point Direct-Touch Surfaces, Proc IEEE TableTop 2006, 185-192.