
2012 International Conference on Systems and Informatics (ICSAI 2012)

A Multi-Touch Natural User Interface Framework

Jialiang Yao (Thinklab, University of Salford, Salford, UK)
Terrence Fernando (Thinklab, University of Salford, Salford, UK)
Hongxia Wang (School of Built Environment, University of Salford, Salford, UK)

Abstract—This paper presents the design and implementation of a multi-touch gesture interface framework for collaborative urban planning. The user gestures required for supporting collaborative urban planning have been defined by analysing a well-established urban planning environment previously developed by the authors. Although urban planning has been considered as the application context, the overall multi-touch gesture interface framework has been designed and implemented as a layered architecture so that it can support any high-level application. This layered gesture architecture comprises a multi-touch raw data layer, a basic gesture layer and an application-specific gesture layer.

Keywords: Multi Touch; NUI; gesture design; gesture framework

I. INTRODUCTION

The term natural user interface (NUI) refers to interfaces which allow the user to interact with a system based on the knowledge learnt from using other systems [1]. Typically, a NUI is aided by technologies that allow users to carry out natural motions, movements or gestures to control the computer application or manipulate on-screen content. A multi-touch device is one of the technologies that have emerged recently and is widely used in creating NUIs for interactive computer applications.

Multi-touch devices enable users to interact directly with information on screens, using their fingers as input devices. This gives users a stronger feeling of control over their interactions, rather than being controlled by the system [2]. Multi-touch interaction is becoming a common type of user interface for many software applications running on mobile devices such as tablets and smart phones. Furthermore, due to its popularity and ease of use, the multi-touch input method opens up possibilities for a broader range of applications where users need to participate and collaborate to explore alternative solutions and build consensus.

This research explores how a multi-touch interface framework can be developed to support intuitive user interaction during collaborative urban planning discussions. Since many stakeholders, with varying levels of computing knowledge, are involved in collaborative urban regeneration projects, this research explores the development of a multi-touch interface based on natural gestures that requires little training for exploring urban spaces and proposed designs.

This research builds on the authors' previous research on the COllaborative Planning Environment (COPE), which exploited the power of PowerWall technology [3] for stakeholder engagement in urban planning. Interaction and team collaboration within COPE is currently supported through a gamepad interface. This paper presents the authors' approach for enhancing the current user interaction capability by exploiting the power of natural hand gestures. Furthermore, it presents a generic multi-touch based natural user interface framework that provides a range of high-level gestures that can be used to develop interactive information exploration in applications such as urban planning.

II. REVIEW

During recent years, there have been many research and development efforts towards creating multi-touch natural user interfaces to enhance user interaction and collaboration. However, designing and implementing a multi-touch natural user interface is still a challenging task. In this research the authors' main concerns are the selection of a set of natural gestures appropriate for urban planning tasks and the development of a multi-touch gesture framework that can easily be deployed in supporting geo-spatial applications such as urban planning.

The design of multi-touch gesture sets has been investigated by several researchers, such as Ringel et al. and Wu et al. [4, 5]. Hinrichs et al. observed many different gesture instances during a field study and categorised them into a group of low-level actions: drag/move, enlarge/shrink, rotate, tap, sweep, flick, and hold [6]. Existing multi-touch based systems, such as Apple iOS and Android, also influence gesture set design; due to their increasing popularity, certain gestures have become widely known by the general public. Other related work, such as GestureWorks [7], has also influenced gesture sets and their possible meanings. Although the selection of a gesture set mainly depends on the application requirements, all of these efforts have played a significant role in shaping the nature of multi-touch based applications and the design features of multi-touch based user interfaces.

Some multi-touch gesture frameworks have been developed to provide a development platform for various applications. Three of the most popular are the open source Tangible User Interface Objects (TUIO) framework, the Windows 7 SDK from Microsoft [8], and the PQ Labs multi-touch SDK (http://multi-touch-screen.com/sdk.html). TUIO is an open framework that defines a common protocol and API for tangible multi-touch surfaces. The TUIO protocol allows the transmission of an abstract description of interactive



surfaces, including touch events and tangible object states. However, TUIO does not provide gesture analysis; as a result, gesture recognition has to be implemented by users. A wide range of applications that have used TUIO is listed on the TUIO web site (http://www.tuio.org/). To the best of the authors' knowledge, no existing TUIO-based framework provides the desired support for applications within the context of collaborative urban planning. Both the Windows 7 SDK and PQ Labs provide either raw touch data or very basic gesture support, which is too limited for complex applications such as urban planning; developers need to further develop gesture operations for their own applications using these SDKs.

III. NATURAL USER INTERFACE DESIGN FOR URBAN PLANNING

The design and implementation of the natural interface is being carried out as an iterative process, and this paper presents the outcome of the first cycle of that process. Seven subjects (three academics and four researchers), who work closely with urban planning teams in using the COPE technology platform, were used to analyse the typical interaction tasks during urban planning and to map them to natural gestures.

A. Functional Analysis of the COPE Environment

At present, the COPE urban planning environment allows users to load a particular urban space, created by combining terrain data, aerial photographs, building models and social data sets, and to explore the space during urban planning discussion meetings. The key functions that are supported by the COPE environment can be summarised as:

• Navigation (with both exo-centric and ego-centric modes) – this allows the user to navigate
• Object and menu selection/query
• Area selection

Table I below shows the task analysis conducted by researchers to identify the key interaction techniques used during a typical urban analysis session.

TABLE I. KEY UI FUNCTIONS USED IN COPE FOR URBAN PLANNING

Key function: Exo-centric navigation
  Purpose: To allow users to fly around the urban space to explore green spaces, road layout and other urban data sets such as crime, unemployment and health.
  Interaction features: Move: left, right, up, down. Rotate: pitch, yaw. Zoom: in, out.

Key function: Ego-centric navigation
  Purpose: To allow users to walk along a street or an urban space to understand the proposed spatial changes to the space or to explore social challenges in a given neighbourhood.
  Interaction features: Walk / drive along a pointed direction.

Key function: Area selection
  Purpose: To allow the users to specify an area of interest with a view to uploading physical data such as 3D buildings, and social data such as crime and unemployment.
  Interaction features: Ability to mark a boundary.

Key function: Select objects
  Purpose: To allow users to interrogate the data attached to a building (such as address, energy, ownership) or interrogate detailed data behind visual icons.
  Interaction features: Mouse click, virtual pointer.

Key function: Select menu widgets
  Purpose: To allow users to select and populate the urban space with a particular type of social data from a menu.
  Interaction features: Mouse click, virtual pointer.

Key function: Remove data
  Purpose: To allow users to delete uploaded data (buildings or social data) to bring in new data sets.
  Interaction features: Menu option to remove data.

At present, the above actions are supported through a gamepad, and therefore the challenge was to identify a natural interface based around the multi-touch display. The next section explains the authors' approach for prototyping a gesture-based interface to support the above functions.

B. Defining Gestures

In order to identify natural gestures for developing the first prototype, a workshop was organised involving subjects who are usually involved in driving the COPE environment or working with urban planning teams.

Prior to the workshop, the authors completed a literature review of gesture applications. This review covered the common gestures discussed and designed by other research projects (such as GestureWorks.com) as well as basic gesture design guidelines which emphasise key aspects such as familiarity, searchability, expressivity [9] and simplicity [6]. Important suggestions by Hinrichs [6], that gesture types should be classified based on the relation of the touching fingers and their movement instead of the number of fingers, were identified. Real-world examples researched by the authors included applications on the Microsoft Surface table, demonstration programmes from PQ Labs, as well as applications on Apple iOS.

During the workshop, the literature review results, the gesture design guidelines and real-world examples, as well as the outcome of the functional analysis of the COPE environment, were presented and discussed by the participants. Since the idea of this research was to develop a natural interface that is familiar to users, it was decided to use the user interaction with a paper map on a fixed table as the basis for developing a gesture interface for the urban planning environment. This physical interaction metaphor was considered the closest to the manipulation of urban maps on a multi-touch table. The participants were asked to consider the following questions:

• How would you perform the interaction operations described in Table I on a physical map?
• If the paper is an e-paper, what additional gestures would you use to interact with the map?

Table II below summarises the proposals derived by the participants in mapping the key interaction operations into gestures.

The above user feedback was used as the basis for creating a natural gesture interface to support the interaction with the

urban planning environment. The following section explains the final gestures designed for the application.

TABLE II. GESTURE RECOMMENDATION

Interaction task: Move: left, right, up, down
  Recommended natural gesture (based on the paper metaphor): Move the paper using one or more fingers.

Interaction task: Rotate: about the axis perpendicular to the paper
  Recommended natural gesture: Rotate the paper by touching it and then rotating with two or more fingers.

Interaction task: Rotate: about a given point
  Recommended natural gesture: Pin down the paper with one or more fingers from one hand and then rotate the paper using the fingers (one or more) of the other hand.

Interaction task: Rotate (tilt): tilt the paper about a point
  Recommended natural gesture: Pin down one point using a finger from one hand and tilt the plane by applying a directional movement with the fingers of the other hand.

Interaction task: Mark a boundary
  Recommended natural gesture: Pin down the paper with one or two fingers from one hand and use one finger from the other hand as a pen to draw the boundary.

Interaction task: Delete data
  Recommended natural gesture: Use the palm with the fingers together as an eraser to delete data.

Interaction task: Walk / drive along a pointed direction
  Recommended natural gesture: Move a finger along the path that the user wants to travel.

Interaction task: Select object or menu widget
  Recommended natural gesture: Touch the object using any finger.

Interaction task: Zoom (in and out)
  Recommended natural gesture: Use the pinch motion and its reverse to zoom in and out. This gesture was heavily influenced by similar gestures used on many mobile phones.

C. Gesture design

Gestures identified in the previous section were analysed and designed to support the functions presented in Table I, ensuring that each gesture is mapped to only one type of operation.

1) Navigation

a) Exo-centric navigation

In the exo-centric navigation mode, the typical navigation functions are move, zoom and rotate (Table I). The gestures for these operations are defined as moving fingers in parallel, moving fingers towards or apart from each other, and rotating fingers around a point, respectively. In this design the number of fingers is not prescribed, which means the user can use any number of fingers for these operations. However, the zoom and rotate operations cannot be performed with only one finger; they require a minimum of two fingers (Fig. 1). (Note: some of the gesture figures are taken from GestureWorks [7]. When a full hand is used in a gesture in these figures, it indicates that any number of fingers can be used; otherwise, the figure indicates the exact number of fingers used to express the gesture.)

Figure 1. Examples of navigation gestures: (a) and (b) move gestures, (c) and (d) zoom gestures, (e) to (g) rotate gestures.

Although the forwards and backwards tilting of the map is a rotational operation, a different gesture is required to allow the user to rotate the normal of the map away from the screen. For this special action, when the user pins down the map using one finger, a vertical scroll bar is prompted on the screen; moving another finger within this bar tilts the map (Fig. 2).

Figure 2. Tilt gesture.

However, the above navigation operations are not mutually exclusive. Sometimes compound actions, such as move-and-zoom or move-and-rotate, happen at the same time, as shown in Fig. 3. In this case, all actions are performed in this design.

Figure 3. Example of a move-and-zoom gesture happening at the same time.

b) Ego-centric navigation

In ego-centric navigation mode, only the "driving/walking" function is required. This function is mainly used to simulate someone walking (moving and rotating) along a street. In this mode, the gesture is designed to use a single touch to control both the navigation and the rotation. When a finger is placed on the screen, the first touch point is used as the reference point of navigation. The finger then moves around the reference point to control the navigation (Fig. 4). The distance between the current touch point and the reference point indicates the moving velocity. The angle between the "up vector" and the vector from the reference point to the touch point defines the rotation factor.

Figure 4. Ego-centric navigation control.
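As an illustration of this mapping, the sketch below (in C++) computes a forward velocity from the distance between the reference point and the current touch point, and a turn rate from the angle between the screen's up vector and the reference-to-touch vector. The function names, gain and dead-zone values are assumptions made for this sketch, not values reported by the authors.

```cpp
#include <cmath>

struct TouchPoint { float x; float y; };   // screen coordinates in pixels

struct EgoNavCommand {
    float velocity;   // forward speed, arbitrary units per second
    float turnRate;   // signed rotation, radians per second
};

// Hypothetical mapping from the reference touch point (first contact) and the
// current touch point to a walking/driving command, following the scheme of
// Fig. 4: distance controls velocity, angle to the "up vector" controls rotation.
EgoNavCommand egoNavigation(const TouchPoint& reference, const TouchPoint& current)
{
    const float dx = current.x - reference.x;
    const float dy = reference.y - current.y;   // screen y grows downwards; flip so "up" is positive
    const float distance = std::sqrt(dx * dx + dy * dy);

    EgoNavCommand cmd{0.0f, 0.0f};
    const float deadZone = 10.0f;               // pixels; assumed, avoids jitter near the reference
    if (distance < deadZone) return cmd;

    const float speedScale = 0.05f;             // assumed gain: pixels to speed units
    cmd.velocity = speedScale * (distance - deadZone);

    // Signed angle between the up vector (0, 1) and the reference-to-touch vector:
    // zero when the finger is straight above the reference, +/- pi when below it.
    cmd.turnRate = std::atan2(dx, dy);
    return cmd;
}
```

In this sketch the turn rate saturates at plus or minus pi when the finger is dragged below the reference point; the actual gains used in the COPE prototype are not reported in the paper.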

2) Undo or erase

This gesture imitates the action of erasing drawings from a piece of paper. Touching the screen with the palm and moving the palm left and right carries out the undo or erase operation (Fig. 5). This operation undoes one action; to undo again, the palm must be lifted off the screen before the gesture is repeated.

Figure 5. Undo or erase gesture.

3) Area selection

Like a pin attaching paper to a board, the paper can rotate about the pin but cannot move. However, if more than two pins are used to attach the paper to the board, the paper is fixed on the board. Based on this idea, the gesture for area selection also comes from the way paper is handled on a board. Users can use two or more fingers to hold the map, and then use another finger moving on the screen to draw the boundary of an area (Fig. 6). The end of the gesture is defined as the moment when the user's hand is removed from the screen. However, if the user changes his/her mind, he/she can use the undo/erase gesture to cancel the area selection operation before releasing the 'holding' gesture from the screen.

Figure 6. Line-drawing gesture.

4) Object selection / query

(One-finger) single-tap and (one-finger) double-tap gestures are used to carry out object selection and query tasks. The experience of tap and double tap comes from using a mouse's click and double-click actions in a GUI (Graphical User Interface): a single click usually selects (or highlights) an object, while a double-click usually executes or opens the object in many commonly used graphical user interfaces, such as the GUIs of Windows and Mac OS.

Similar to the mouse actions of 'click' and 'double click', the single-tap and double-tap gestures were designed with the following in mind (an illustrative recognition sketch is given after the list):

• Single-tap is the action of placing a finger on the screen and then removing it within a certain time interval, similar to the action of a mouse 'click'.
• Double-tap is the action of placing a finger on the screen and removing it twice within certain spatial and temporal intervals, similar to the action of a mouse 'double-click'.
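A minimal sketch of how such tap and double-tap recognition could be implemented is shown below; the class name, the 300 ms and 500 ms intervals and the pixel tolerances are assumptions for illustration, not values reported by the authors.

```cpp
#include <cmath>
#include <cstdint>

enum class TapEvent { None, SingleTap, DoubleTap };

// Hypothetical recogniser: a touch that goes down and up within maxTapMs and moves
// less than maxTapMovePx is a tap; two taps close enough in time and space form a double-tap.
class TapRecognizer {
public:
    void onFingerDown(float x, float y, std::int64_t timeMs) {
        downX_ = x; downY_ = y; downTimeMs_ = timeMs;
    }

    TapEvent onFingerUp(float x, float y, std::int64_t timeMs) {
        const bool shortPress = (timeMs - downTimeMs_) <= maxTapMs_;
        const bool smallMove  = distance(x, y, downX_, downY_) <= maxTapMovePx_;
        if (!shortPress || !smallMove) { hasPendingTap_ = false; return TapEvent::None; }

        if (hasPendingTap_ &&
            (timeMs - lastTapTimeMs_) <= doubleTapMs_ &&
            distance(x, y, lastTapX_, lastTapY_) <= doubleTapDistPx_) {
            hasPendingTap_ = false;                 // second tap completes a double-tap
            return TapEvent::DoubleTap;
        }

        hasPendingTap_ = true;                      // remember this tap; it may become a double-tap
        lastTapX_ = x; lastTapY_ = y; lastTapTimeMs_ = timeMs;
        return TapEvent::SingleTap;
    }

private:
    static float distance(float x1, float y1, float x2, float y2) {
        return std::sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
    }

    float downX_ = 0, downY_ = 0;
    std::int64_t downTimeMs_ = 0;
    bool hasPendingTap_ = false;
    float lastTapX_ = 0, lastTapY_ = 0;
    std::int64_t lastTapTimeMs_ = 0;

    const std::int64_t maxTapMs_ = 300;       // assumed press-duration threshold
    const std::int64_t doubleTapMs_ = 500;    // assumed interval between taps
    const float maxTapMovePx_ = 20.0f;        // assumed movement tolerance
    const float doubleTapDistPx_ = 30.0f;     // assumed distance between the two taps
};
```

In this sketch a single-tap is reported immediately and the double-tap arrives as a separate event on the second tap; an application that must distinguish the two exclusively would instead delay the single-tap decision until the double-tap interval has elapsed.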

By using these four types of gestures, the multi-touch device can control the digital map like a floating paper map being handled with two hands.

IV. THREE-LAYER FRAMEWORK FOR MULTI-TOUCH UI

A. Framework design

One of the objectives of this work is to explore a general-purpose multi-touch framework to support NUIs for different applications. Rather than developing all possible gestures, this framework provides the possibility of aggregating low-level gestures into appropriate high-level gestures, depending on the application and user requirements. The authors' solution is to first break gestures into a limited number of low-level gestures; higher-level gestures are then recognised by combining a sequence of low-level gestures.

In this work, the application-level gestures are broken into several gesture "atoms". The gesture atoms are basic "pure" gestures. A series of atom gestures can be grouped together to form an application gesture that represents a certain functional meaning.

The basic gesture information is classified into two categories: dynamic gestures and touch information. Dynamic gestures are formed by moving touches, such as finger-down, finger-up, move, zoom, rotate, etc. A touch information set provides all the touch points' information (e.g. position, moving/non-moving, moved before, touch area, etc.). The two types of information are used to define application gestures.

Therefore, the multi-touch framework is organised into three layers, as illustrated in Fig. 7. The left part of Fig. 7 shows the structure of the framework; the right part shows its relation to a COPE-based application on a Windows platform. The three layers are described below, followed by an illustrative interface sketch.

Figure 7. The structure of the multi-touch framework: the application gesture layer, the basic gesture layer and the multi-touch RAW layer, alongside an OSG application that receives multi-touch events through the OSG event queue and other events through the Windows event queue.

• The 1st layer (the lowest layer) is the RAW layer, which represents the raw information from the multi-touch device. It is usually the interface that the device driver provides, such as PQ Labs or TUIO (3rd-party libraries).
• The 2nd layer is the basic gesture recognition layer, which provides basic gesture atoms/elements, together with touch information, for the upper layer.
• The 3rd layer is the application layer. This layer provides the final user interface based on the gestures captured by the 2nd layer. It uses the gestures to interact with the context of the application and provides the special user interface functions required by the application. Therefore, the 3rd layer is application-related and cannot typically be used for general purposes.
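To make the layering concrete, the following sketch outlines one possible set of C++ interfaces corresponding to the three layers. All type and method names here are illustrative assumptions, not the authors' actual class design.

```cpp
#include <vector>

// --- RAW layer: normalised touch points coming from a device driver or 3rd-party library ---
struct RawTouch {
    int id;                 // touch identifier assigned by the device
    float x, y;             // position in screen coordinates
    bool down;              // true while the finger is on the surface
};

// --- Basic gesture layer: gesture "atoms" derived from the raw touches ---
enum class GestureAtom { FingerDown, FingerUp, SingleTap, DoubleTap,
                         ParallelMove, Split, Rotate, Begin, End, InertiaBegin };

struct BasicGesture {
    GestureAtom atom;
    std::vector<RawTouch> touches;   // touch information accompanying the atom
};

class BasicGestureLayer {
public:
    // Consumes a frame of raw touches and emits the gesture atoms recognised in it.
    std::vector<BasicGesture> update(const std::vector<RawTouch>& frame) {
        (void)frame;                 // gesture-atom recognition would go here
        return {};
    }
};

// --- Application gesture layer: maps sequences of atoms onto application actions ---
class ApplicationGestureLayer {
public:
    virtual ~ApplicationGestureLayer() = default;
    // Combines atoms with the application context (e.g. what lies under the touches)
    // and triggers actions such as navigation, area selection or undo.
    virtual void onBasicGesture(const BasicGesture& gesture) = 0;
};
```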
B. Gesture definition within each layer

1) RAW layer

The RAW layer data are the initial touch information, such as the touch id, orientation, touch size and shape, and touch image for each touch on the screen surface. The details of the data are usually determined by the touch screen's capability and are provided to the application by the device driver or 3rd-party libraries.

Because different libraries use different data structures, the data may need to be restructured to fit into the higher layers.
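The sketch below illustrates this kind of restructuring: driver-specific touch records are converted into a common structure before being handed to the basic gesture layer. The vendor-style input struct and its state encoding are invented for illustration and do not reproduce any real SDK's types; the RawTouch structure from the earlier sketch is repeated so that the example stays self-contained.

```cpp
#include <vector>

// Hypothetical vendor-specific record, standing in for whatever a driver such as
// the PQ Labs SDK or a TUIO client actually delivers (their real types differ).
struct VendorTouchRecord {
    unsigned int pointId;
    float normX, normY;      // normalised 0..1 coordinates
    float width, height;     // touch area in device units
    int   state;             // 0 = down, 1 = move, 2 = up (assumed encoding)
};

// Common structure used by the higher layers.
struct RawTouch {
    int id;
    float x, y;              // screen coordinates in pixels
    bool down;
};

// Restructure a frame of vendor records into the framework's common representation.
std::vector<RawTouch> adaptFrame(const std::vector<VendorTouchRecord>& in,
                                 float screenWidth, float screenHeight)
{
    std::vector<RawTouch> out;
    out.reserve(in.size());
    for (const VendorTouchRecord& r : in) {
        RawTouch t;
        t.id   = static_cast<int>(r.pointId);
        t.x    = r.normX * screenWidth;       // scale normalised coordinates to pixels
        t.y    = r.normY * screenHeight;
        t.down = (r.state != 2);              // anything but "up" counts as touching
        out.push_back(t);
    }
    return out;
}
```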
2) Basic gesture layer

The basic gesture layer captures gestures from the raw touch information and generates basic gesture information for the application layer.

This layer first generates statistical information from the raw touch information, such as the number of moving touches, the number of non-moving touches and each touch's history. It then captures the basic gestures based only on the moving touches.

The gestures that the 2nd layer provides include:

• Parallel move
• Split
• Rotate
• Finger-down, finger-up, single tap and double tap for each touch
• Begin, end and inertia begin

Because most current multi-touch devices cannot distinguish touches from different people, the layer assumes that all touches come from the same person. In general, the framework should also provide touch-grouping functions to support collaboration.
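One simple way to separate these atoms for the moving touches is to compare how the distance and orientation between touch points change between frames, as in the hedged sketch below. The thresholds and the two-point simplification are assumptions for illustration; the paper does not give the authors' actual recognition rules.

```cpp
#include <cmath>
#include <vector>

struct Point { float x, y; };

enum class MovingGesture { None, ParallelMove, Split, Rotate };

// Classify the dominant gesture of the moving touches by comparing the previous
// and current positions of the first two moving touch points.
MovingGesture classifyMovingTouches(const std::vector<Point>& prev,
                                    const std::vector<Point>& curr)
{
    if (prev.size() < 2 || curr.size() < 2) {
        // With a single moving finger only a parallel move is possible.
        return (prev.empty() || curr.empty()) ? MovingGesture::None
                                              : MovingGesture::ParallelMove;
    }

    const float dxPrev = prev[1].x - prev[0].x, dyPrev = prev[1].y - prev[0].y;
    const float dxCurr = curr[1].x - curr[0].x, dyCurr = curr[1].y - curr[0].y;

    const float distPrev  = std::sqrt(dxPrev * dxPrev + dyPrev * dyPrev);
    const float distCurr  = std::sqrt(dxCurr * dxCurr + dyCurr * dyCurr);
    const float anglePrev = std::atan2(dyPrev, dxPrev);
    const float angleCurr = std::atan2(dyCurr, dxCurr);

    const float distChange  = std::fabs(distCurr - distPrev);
    const float angleChange = std::fabs(angleCurr - anglePrev);

    const float splitThreshold  = 5.0f;     // pixels; assumed
    const float rotateThreshold = 0.05f;    // radians; assumed

    if (distChange > splitThreshold)   return MovingGesture::Split;   // pinch / spread, i.e. zoom
    if (angleChange > rotateThreshold) return MovingGesture::Rotate;  // fingers turning about a point
    return MovingGesture::ParallelMove;                               // fingers translating together
}
```

A production recogniser would also handle angle wrap-around and report compound move-and-zoom or move-and-rotate gestures simultaneously, as the design in Section III requires.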

3) Application layer

The 3rd layer provides the functions that drive the multi-touch user interface described in Section III. It receives basic gestures from the 2nd layer and combines them with the context underlying the touches to generate the proper actions for the application. The gesture set defined in Section III is recognised in this layer, and the intended user action is passed to the application and executed in response to the gesture.

V. INITIAL NUI PROTOTYPE IMPLEMENTATION

To test the feasibility of the multi-touch NUI framework, the authors have implemented the framework and integrated it with the COPE environment.

The COPE implementation is based on OpenSceneGraph (OSG) 2.8 on a Microsoft Windows platform. Since OSG does not process multi-touch events, one of the key tasks of the prototype is to act as a plug-in for the OSG-based COPE architecture so that multi-touch events are accepted without modifying any code of OSG itself.

In order to implement the NUI plug-in for COPE, the multi-touch events need to be captured. At the same time, the NUI prototype also needs to distinguish multi-touch events from other types of events, such as mouse events, so that the original interface still works.

The implementation approach is based on two steps. The first step is to capture multi-touch events and place them into OSG's event queue. The events cannot be placed into the Windows event queue because OSG only forwards a very limited set of Windows events into its event queue and discards all other types of events. The second step is to process those multi-touch events in an extended manipulator based on OSG's event manipulator. In the implementation of the COPE prototype, a new manipulator, derived from OSG's matrix manipulator, has been implemented so that a specially designed user interface could be realised. It supports a plug-in structure by implementing an event-manipulating chain which dispatches events first to the plug-ins' manipulator; if they are not processed there, it then dispatches them to the original event manipulator.
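A condensed sketch of this dispatch chain is shown below, written against the osgGA classes of the OSG 2.x series that the paper uses (MatrixManipulator was later renamed CameraManipulator in OSG 3.x). It is a simplified illustration of the idea described above, not the authors' actual plug-in code, and the class name and plain handler list are assumptions.

```cpp
#include <osgGA/MatrixManipulator>
#include <osgGA/GUIEventHandler>
#include <osg/ref_ptr>
#include <vector>

// Wraps the original camera manipulator and dispatches every event first to a chain
// of plug-in handlers (e.g. the multi-touch gesture handler). Only events that no
// plug-in consumes reach the original manipulator, so the existing mouse/keyboard
// interface keeps working unchanged.
class MultiTouchChainManipulator : public osgGA::MatrixManipulator
{
public:
    explicit MultiTouchChainManipulator(osgGA::MatrixManipulator* original)
        : _original(original) {}

    void addPlugin(osgGA::GUIEventHandler* handler) { _plugins.push_back(handler); }

    virtual bool handle(const osgGA::GUIEventAdapter& ea, osgGA::GUIActionAdapter& aa)
    {
        for (unsigned int i = 0; i < _plugins.size(); ++i)
        {
            if (_plugins[i]->handle(ea, aa))
                return true;                     // consumed by a plug-in (e.g. a multi-touch gesture)
        }
        return _original->handle(ea, aa);        // fall back to the original behaviour
    }

    // MatrixManipulator's pure virtual interface is forwarded to the wrapped manipulator.
    virtual void setByMatrix(const osg::Matrixd& m)        { _original->setByMatrix(m); }
    virtual void setByInverseMatrix(const osg::Matrixd& m) { _original->setByInverseMatrix(m); }
    virtual osg::Matrixd getMatrix() const                 { return _original->getMatrix(); }
    virtual osg::Matrixd getInverseMatrix() const          { return _original->getInverseMatrix(); }

private:
    osg::ref_ptr<osgGA::MatrixManipulator> _original;
    std::vector< osg::ref_ptr<osgGA::GUIEventHandler> > _plugins;
};
```

In the actual prototype the multi-touch gesture processing itself would live in the plug-in handler, with the captured touch events injected into the OSG event queue (for example as custom events) before reaching this chain.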
The results show that this plug-in type of implementation can add a new interface to a software system without affecting any existing user interface. The physical look and feel of the interface is shown in Fig. 8.

Figure 8. The COPE multi-touch user interface.

VI. DISCUSSION AND FUTURE WORK

This paper presented the design of a gesture set for urban planning applications and the development of a multi-touch natural user interface framework that could support high-level application gestures. A prototype has been implemented to

transform the COPE urban planning environment into a multi-touch supported application.

The three-layer framework provides a generic structure for the development of a multi-touch user interface, and the implementation of the COPE-based prototype has demonstrated the capability of the framework.

The prototype has been informally tested by a small group of end-users. During the tests, the authors intentionally asked users to use the system without offering them prior knowledge of the multi-touch gestures. The initial results show that these users had little difficulty navigating (in the exo-centric mode) with multiple fingers, which includes move, zoom and rotate. Object selection with the tap gesture was also carried out without difficulty. However, the less commonly used gestures, such as the hold gesture, the tilt operation, the erase and the area selection, required prompting and training; many users were simply unaware of the presence of such functions. After these gestures were introduced, the users could use them with ease.

The next step of the authors' research is to carry out a formal usability evaluation and to integrate the framework with other applications, in order to validate the framework as a general-purpose system and enhance its usability.

REFERENCES

[1] D. A. Norman, "Natural user interfaces are not natural," Interactions, vol. 17, pp. 6-10, 2010.
[2] S. Bachl, M. Tomitsch, C. Wimmer, and T. Grechenig, "Challenges for designing the user experience of multi-touch interfaces," presented at the Engineering Patterns for Multi-Touch Interfaces Workshop (MUTI'10) of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Berlin, Germany, 2010.
[3] J. Yao, T. Fernando, H. Tawfik, R. Armitage, and I. Billing, "A VR-centred workspace for supporting collaborative urban planning," in 9th International Conference on Computer Supported Cooperative Work in Design, Coventry, UK, 2005.
[4] M. Ringel, K. Ryall, C. Shen, C. Forlines, and F. Vernier, "Release, relocate, reorient, resize: fluid techniques for document sharing on multi-user interactive tables," presented at the CHI '04 Extended Abstracts on Human Factors in Computing Systems, Vienna, Austria, 2004.
[5] M. Wu, C. Shen, K. Ryall, C. Forlines, and R. Balakrishnan, "Gesture registration, relaxation, and reuse for multi-point direct-touch surfaces," presented at the First IEEE International Workshop on Horizontal Interactive Human-Computer Systems, 2006.
[6] U. Hinrichs and S. Carpendale, "Gestures in the wild: studying multi-touch gesture sequences on interactive tabletop exhibits," presented at the 2011 Annual Conference on Human Factors in Computing Systems, Vancouver, BC, Canada, 2011.
[7] GestureWorks - Multitouch Framework - Build Gesture-Driven Apps. http://gestureworks.com.
[8] Y. Kiriaty, "MultiTouch capabilities in Windows 7," MSDN Magazine, http://msdn.microsoft.com/en-us/magazine/ee336016.aspx, 2009.
[9] A. Bragdon, R. Zeleznik, B. Williamson, T. Miller, and J. J. LaViola Jr., "GestureBar: improving the approachability of gesture-based interfaces," presented at the 27th International Conference on Human Factors in Computing Systems, Boston, MA, USA, 2009.

