
The Gesture Watch: An alternate gesture based interface for remote control of devices

Gaurav Gupta Georgia Inst of Tech Atlanta GA 30332


Jiasheng He Georgia Inst of Tech Atlanta GA 30332

Jung Soo Kim Georgia Inst of Tech Atlanta GA 30332

This paper describes the Gesture Watch, a gesture-based interface for the remote control of devices. By subscribing to this interface, any device can become gesture-smart and thus will not need a separate physical interface with the user. The Gesture Watch is designed to be embedded inside the user's wristwatch, making it unobtrusive. It takes advantage of developments in infrared technology and machine learning algorithms, allowing robust sensing and recognition of gestures.
Author Keywords
The Gesture Watch, gesture, interface, control, remote

With advances in micro-technology, mobile devices such as cell phones and MP3 players are growing smaller and more lightweight. An MP3 player could conceivably become the size of a hearing aid that fits snugly inside your ear. However, the problem then becomes input: how do you advance to the next song when the MP3 player is so small that it sits inside your ear? In such a case, attaching buttons or a scroll wheel to the MP3 player would be impractical because it would make the device too big.

Figure 1. The size of the human hand is what keeps the new iPod Shuffle from getting any smaller.

One solution to this problem is to use remote controls. With the first television remotes appearing around 60 years ago, the idea was that a person should not have to physically reach out and touch the television to control it. Many novel ideas and concepts based on the remote control of devices have since made parts of everyday life easier. But remotes had one inherent flaw from the start: to remotely control an appliance, a person has to physically hold another device (i.e. the remote). Coupled with the fact that remote control interfaces were not (and still are not) standardized, the average person in today's world may find himself overloaded by remotes. At any given moment, a person may have two or more remotes nearby while needing only one of them. The problem thus becomes: can we design a universal remote interface that serves as a common point of control for a variety of appliances?

In the past, many researchers have developed clever interfaces that allow a person to control all the devices around him using a single interface. These include XWeb [1], Light Widgets [2], Magic Wand [3], the FreeDigiter [4], the Gesture Pendant [5], Smart Snakes [6], and the Gesture Pendant II. Of these, the FreeDigiter, the Gesture Pendant, and the Gesture Pendant II were developed at the Georgia Institute of Technology and rely on infrared; the others vary in implementation. Light Widgets used two cameras to detect the position and movement of the hand on any surface, and these hand gestures served as control input to the XWeb platform. Another system built on the XWeb cross-modal platform was the Magic Wand, which used lasers and an inexpensive camera to generate control input. Smart Snakes, meanwhile, recognized hand gestures from a 15-bit color video stream using genetic algorithms. Of the Georgia Tech systems, the FreeDigiter was worn on the ear and could count the number of fingers moving past its sensor, whereas the Gesture Pendant was worn around the neck and had a camera ringed by infrared LEDs that illuminated the user's hand, allowing the system to recognize the gesture being made. Our previous work, the Gesture Pendant II, uses four proximity sensors arranged in a cross formation instead of a camera to detect hand gestures.

The Gesture Watch is designed to allow the control of different devices through hand gestures. A user wears the device on his left arm just like a wristwatch and uses his right hand to perform gestures in the air, approximately 5 to 20 centimeters above the watch face. The Gesture Watch detects these gestures, and an appropriate control signal is sent to the appliance being controlled. In contrast to the FreeDigiter, which could only count the number of fingers that went past it, the Gesture Watch is specifically designed to recognize complex gestures made by the user. It enables the user to build up a rich library of different gestures that can then be used for a variety of actions.


Figure 3. Block Diagram of the Gesture Watch

Figure 2. 3D model of the Gesture Watch

The Gesture Watch consists of five infrared sensors, each of which detects the proximity of the user's hand (Figure 2). Each sensor emits a positive signal when it senses a hand in front of it and no signal otherwise. Viewed over a period of time, the sequence of the sensors' combined outputs specifies a gesture, with different sequences mapping to different gestures. In principle, therefore, each distinct sequence can indicate a distinct gesture. However, because of imperfections in the placement and sensing of the sensors, coupled with the conical viewing volume of infrared, some gestures may be ambiguous, so the user may not be able to use the complete space of all possible combinations.

Also, since the infrared sensors gather data continuously, a major consideration is segmenting the true data (when a user is gesturing) from invalid data (infrared signals bouncing off walls, furniture, books, and other obstacles) to avoid false positives. Our system achieves this by using one of the sensors as a trigger switch. This sensor is mounted vertically on the front edge of the watch so that it sweeps in a horizontal direction. The user flicks his wrist up to cover this sensor whenever he wants to make a gesture. Keeping his triggering wrist up, he then gestures with his other hand over the four remaining sensors, which are mounted in the horizontal plane of the user's left arm and face upward. To signal the end of the gesture, the user flicks his wrist back down, and the sensor data from this interval is sent for recognition. To prevent false triggering by stray objects, a time-windowing scheme is employed: if the trigger interval lasts between 1 and 5 seconds, the sensor output is treated as valid and sent for gesture recognition; otherwise it is ignored.
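The trigger-based segmentation and the 1 to 5 second window described above can be sketched as follows. This is a minimal illustration, not the Gesture Watch software: it assumes the receiver sees the five sensor states as boolean frames sampled at a fixed, known rate, with the trigger sensor first. The frame layout and the function name `segment_gestures` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical frame layout: (trigger, s1, s2, s3, s4); True means a hand is detected.
Frame = Tuple[bool, bool, bool, bool, bool]

MIN_SECONDS = 1.0  # shortest trigger interval accepted as a gesture
MAX_SECONDS = 5.0  # longest trigger interval accepted as a gesture


def segment_gestures(frames: Sequence[Frame], sample_rate_hz: float) -> List[List[Tuple[bool, ...]]]:
    """Collect the four gesture-sensor readings seen while the trigger sensor is covered.

    A segment is kept only if the trigger interval lasts between MIN_SECONDS and
    MAX_SECONDS; a trigger still covered at the end of the stream is discarded.
    """
    segments: List[List[Tuple[bool, ...]]] = []
    current: List[Tuple[bool, ...]] = []
    for trigger, *gesture_sensors in frames:
        if trigger:
            # Wrist is up: record the readings of the four upward-facing sensors.
            current.append(tuple(gesture_sensors))
        elif current:
            # Wrist flicked back down: apply the time window to the finished interval.
            duration = len(current) / sample_rate_hz
            if MIN_SECONDS <= duration <= MAX_SECONDS:
                segments.append(current)
            current = []
    return segments
```

Measuring the interval by sample count keeps the check independent of wall-clock timing on the receiver; timestamping the trigger edges would work equally well.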
Example: A man walks into his living room, triggers his watch, and holds his hand over the left sensor for two seconds. This signals that he wants to control his room's lighting system. He then makes an "up" gesture, moving his hand over the sensors perpendicular to his arm and away from his body, which turns on the room lights. Wanting to listen to music next, he holds his hand over the right sensor, activating his home entertainment system. With a clockwise circular gesture, he can then increase the volume of his MP3 player.
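The two-step interaction in this example (first select a device by holding a hand over a sensor, then issue a command gesture) can be modelled as a small state machine on the receiver. The sketch below assumes gestures have already been recognized and named; the gesture names, device names, and command strings are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical vocabulary: "hold" gestures select a device, the rest act on it.
SELECT_DEVICE: Dict[str, str] = {
    "hold_left": "lighting",
    "hold_right": "entertainment",
}
DEVICE_COMMANDS: Dict[str, Dict[str, str]] = {
    "lighting": {"up": "lights_on", "down": "lights_off"},
    "entertainment": {"clockwise": "volume_up", "counterclockwise": "volume_down"},
}


class GestureCommander:
    """Remembers the selected device and turns recognized gestures into commands."""

    def __init__(self) -> None:
        self.device: Optional[str] = None

    def handle(self, gesture: str) -> Optional[str]:
        """Return a "device:command" string, or None if the gesture issues no command."""
        if gesture in SELECT_DEVICE:
            self.device = SELECT_DEVICE[gesture]
            return None  # selecting a device does not issue a command by itself
        if self.device is None:
            return None  # no device selected yet, so the gesture is ignored
        command = DEVICE_COMMANDS[self.device].get(gesture)
        return f"{self.device}:{command}" if command is not None else None
```

Keeping the selected device as state lets the same small gesture set (up, down, clockwise, and so on) be reused across several appliances.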

Figure 3 shows a high-level block diagram of the whole system. The output of the infrared sensors goes to a PIC microcontroller, which passes it to the Bluetooth chip. A remote application listening for new gesture data receives it wirelessly over Bluetooth. This data is then processed by a GT2K-enabled application [7] that emits the gesture corresponding to the sensor outputs. Finally, this gesture is passed to the control circuitry that performs the related action. The whole interface is designed to scale both in the number of devices that can be controlled and in the vocabulary of gestures that can be tagged. Because communication is wireless, multiple receivers can listen to the same data stream at the same time and make decisions independently. Also, instead of sending high-level gesture tags, the Gesture Watch sends raw data over the channel so that each device can map gestures according to its own needs. Transmitting high-level gesture information would restrict every device's control signals to the Gesture Watch's own gesture vocabulary and thus would not scale well to multiple heterogeneous devices.
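The design choice of broadcasting raw data, so that each receiver maps gestures itself, can be illustrated with an in-process stand-in for the wireless link. In this sketch, `RawBroadcast` and `Receiver` are hypothetical names, and each receiver's recognizer is only a callable placeholder for something like a GT2K model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

RawSegment = Sequence[Tuple[bool, ...]]  # raw four-sensor frames for one gesture


class RawBroadcast:
    """Stand-in for the wireless link: every subscriber receives the same raw segment."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[RawSegment], None]] = []

    def subscribe(self, callback: Callable[[RawSegment], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, segment: RawSegment) -> None:
        for callback in self._subscribers:
            callback(segment)


class Receiver:
    """A device that interprets raw data with its own recognizer and command table."""

    def __init__(self, name: str, recognize: Callable[[RawSegment], str],
                 commands: Dict[str, str]) -> None:
        self.name = name
        self.recognize = recognize  # placeholder for a per-device recognizer
        self.commands = commands    # this device's own gesture-to-command mapping
        self.issued: List[str] = []

    def on_segment(self, segment: RawSegment) -> None:
        command = self.commands.get(self.recognize(segment))
        if command is not None:  # gestures outside this device's vocabulary are ignored
            self.issued.append(command)
```

Because the watch never commits to a gesture name, a television and a music player can give the same motion different meanings, and a device that does not understand a gesture simply ignores it.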

Figure 4. Prototype of the Gesture Watch

The Gesture Watch uses SHARP GP2Y0D340K proximity sensors, which are rated to detect objects in the range of 10 to 60 cm; in practice, they work best from 5 to 20 cm. This is the digital version of the SHARP proximity sensors: it outputs low when it detects an object and stays high otherwise. It requires a supply voltage (Vcc) of around 4.5 to 5 V and measures 15 x 9.6 x 8.85 mm. The data from these sensors is sent to a PIC16LF873 microcontroller, which packetizes the data and hands the packets to a Taiyo Yuden EYMF2CAMM Bluetooth module. These packets are then transferred via Bluetooth to a remote receiver. Both the PIC and the Taiyo Yuden Bluetooth module run on 3.6 V, so because of the difference between the voltage requirements of the infrared sensors and the PIC module, some voltage regulation circuitry is also needed.
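Because the sensors are active-low, the raw pin level has to be inverted before it can be treated as "hand present". The sketch below models that step, plus a simple packet, on the host side in Python. The bit layout and the packet format (sequence number, sample count, one byte per sample) are assumptions for illustration, not the format used by the PIC16LF873 firmware.

```python
from typing import List, Sequence

NUM_SENSORS = 5  # trigger sensor plus the four gesture sensors


def pack_sample(pin_levels: Sequence[int]) -> int:
    """Pack one sample of raw pin levels (1 = high, 0 = low) into a single byte.

    The sensors are active-low, so a low pin (object detected) becomes a 1 bit.
    Bit i holds sensor i; this layout is an illustrative assumption.
    """
    if len(pin_levels) != NUM_SENSORS:
        raise ValueError(f"expected {NUM_SENSORS} pin levels, got {len(pin_levels)}")
    byte = 0
    for i, level in enumerate(pin_levels):
        if level == 0:  # low output: a hand is in front of sensor i
            byte |= 1 << i
    return byte


def unpack_sample(byte: int) -> List[bool]:
    """Recover per-sensor detections (True = hand present) from a packed byte."""
    return [bool((byte >> i) & 1) for i in range(NUM_SENSORS)]


def make_packet(seq: int, samples: Sequence[Sequence[int]]) -> bytes:
    """Hypothetical packet: sequence number, sample count, then one byte per sample."""
    if len(samples) > 255:
        raise ValueError("at most 255 samples fit in one packet")
    return bytes([seq & 0xFF, len(samples)] + [pack_sample(s) for s in samples])
```

Five sensors fit in one byte per sample, which keeps the Bluetooth payload small while still transmitting raw data rather than gesture tags.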


The software running on the receiver side does the actual work of tagging the sensor data with its corresponding gesture. In our reference implementation, this software runs on an Apple MacBook with Mac OS X. It employs the Georgia Tech Gesture Toolkit (GT2K) to recognize the gestures. As stated on the GT2K website: "GT2K leverages Cambridge University's speech recognition toolkit, HTK, to provide tools that support gesture recognition research. GT2K provides capabilities for training models and allows for both real-time and off-line recognition." HTK employs Hidden Markov Models (HMMs) for recognition. Our software passes raw segmented data to the GT2K engine, which emits the corresponding gesture name; that name is then used to perform the related action. In our implementation, a user can control the iTunes software with gestures. Table 1 shows a subset of these gestures and the corresponding actions.

Gesture             Command
Up                  Go to previous track
Down                Go to next track
Clockwise           Increase volume
Counter-Clockwise   Decrease volume

Figure 6. The Gesture Pendant II has four proximity sensors placed diagonally and a Bluetooth module

Table 1. Subset of Gestures for iTunes control
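GT2K hides the HMM machinery behind HTK, but the underlying idea of recognition by model likelihood can be shown in a few lines. The sketch below is a toy discrete HMM classifier, not GT2K's or HTK's API: each frame of the four gesture sensors is encoded as a symbol from 0 to 15, each gesture has its own HMM, and the gesture whose model gives the highest log-likelihood under the scaled forward algorithm wins. The two hand-set models, and the assumed sensor order along the "up" direction, are illustrative rather than trained.

```python
import math
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Discrete-observation HMM with initial probabilities pi[i],
    transitions A[i][j], and emissions B[i][symbol]."""

    def __init__(self, pi: List[float], A: List[List[float]], B: List[List[float]]) -> None:
        self.pi, self.A, self.B = pi, A, B

    def log_likelihood(self, observations: Sequence[int]) -> float:
        """Return log P(observations | model) using the scaled forward algorithm."""
        if not observations:
            raise ValueError("need at least one observation")
        n = len(self.pi)
        alpha = [self.pi[i] * self.B[i][observations[0]] for i in range(n)]
        log_prob = 0.0
        for t, symbol in enumerate(observations):
            if t > 0:
                alpha = [sum(alpha[i] * self.A[i][j] for i in range(n)) * self.B[j][symbol]
                         for j in range(n)]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # the sequence is impossible under this model
            log_prob += math.log(scale)
            alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
        return log_prob


def frame_to_symbol(frame: Sequence[bool]) -> int:
    """Encode the four gesture-sensor states of one frame as a symbol 0..15."""
    return sum(1 << i for i, active in enumerate(frame) if active)


def classify(models: Dict[str, DiscreteHMM], frames: Sequence[Sequence[bool]]) -> str:
    """Pick the gesture whose model assigns the segment the highest likelihood."""
    symbols = [frame_to_symbol(f) for f in frames]
    return max(models, key=lambda name: models[name].log_likelihood(symbols))


def peaked(symbol: int, p: float = 0.85) -> List[float]:
    """Emission row that favours one symbol and spreads the rest uniformly."""
    row = [(1.0 - p) / 15] * 16
    row[symbol] = p
    return row


# Toy two-state left-to-right models: sensor 0 is assumed nearest the body and
# sensor 2 farthest from it, so "up" sees symbol 1 first and then symbol 4.
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "up": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [peaked(1), peaked(4)]),
    "down": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [peaked(4), peaked(1)]),
}
```

In a real system the parameters would be estimated from training samples (as GT2K does through HTK) rather than set by hand.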

The Gesture Pendant II is a prior version of the Gesture Watch. It uses four proximity sensors similar to those of the Gesture Watch in an analogous configuration, but its form is different: it is shaped like a pendant hung from the neck rather than a wristwatch-like interface. Figure 6 shows the Gesture Pendant II prototype. It combines these four infrared sensors with a Bluetooth platform to detect a fixed set of gestures. In addition, the Gesture Pendant II uses an on/off switch to segment each gesture instead of a separate trigger proximity sensor like the one the Gesture Watch uses. Users can switch between the Gesture Watch and the Gesture Pendant II because they share the same platform and data interface. For example, one can wear the Gesture Watch while jogging to control an MP3 player, or wear the Gesture Pendant II to control various appliances at home.
An Infrared Remote Control System

Figure 5. A user gesturing on the Gesture Watch


A gesture-based remote control device has many possible applications. In addition to the iTunes control software, we were also able to control a television set with the Gesture Pendant II, the prior iteration of the Gesture Watch described above. With the Bluetooth module alone, one can only control computers or mobile devices that support Bluetooth; Bluetooth cannot be used to control other home appliances such as televisions, DVD players, or radios. This goal can instead be achieved with an infrared control system that sends IR commands to home appliances that use infrared remote controls. Figure 7 shows infrared codes recorded from a Samsung SRC1000 remote control. The remote control system can record IR commands from various remote controls and later send those recorded commands back out. We used the USB-UIRT (Universal Infrared Receiver and Transmitter) infrared module, which can detect signals in the 36 to 40 kHz frequency range; this covers the 38 kHz carrier frequency used by most remote controls. Moreover, by using EventGhost, an open-source automation tool, we could also control Windows applications with gestures. Table 2 shows a set of gestures that we use to control a Windows movie player. Since the Gesture Watch and the Gesture Pendant II share a similar platform, their gestures are very similar, and the same gestures can be reused to control any application.
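Recording a remote's codes once and replaying them on demand can be sketched as a small lookup table from gestures to learned codes. In this sketch, the code representation (carrier frequency plus mark/space timings), the class and method names, and the injected `transmit` callback are all hypothetical; it does not use the actual USB-UIRT driver or EventGhost APIs. The band check mirrors the 36 to 40 kHz detection range quoted above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

RECEIVER_BAND_HZ = (36_000, 40_000)  # detection band quoted for the IR module


@dataclass(frozen=True)
class IrCode:
    carrier_hz: int               # modulation frequency, typically 38 kHz
    timings_us: Tuple[int, ...]   # alternating mark/space durations in microseconds


@dataclass
class IrCodeLibrary:
    """Stores learned IR codes and replays them when a gesture is recognized."""
    codes: Dict[Tuple[str, str], IrCode] = field(default_factory=dict)
    gesture_map: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def learn(self, device: str, command: str, code: IrCode) -> None:
        """Record a code captured from a physical remote, rejecting out-of-band carriers."""
        low, high = RECEIVER_BAND_HZ
        if not low <= code.carrier_hz <= high:
            raise ValueError(f"{code.carrier_hz} Hz is outside the {low}-{high} Hz band")
        self.codes[(device, command)] = code

    def bind(self, gesture: str, device: str, command: str) -> None:
        """Associate a recognized gesture name with a learned (device, command) pair."""
        self.gesture_map[gesture] = (device, command)

    def on_gesture(self, gesture: str, transmit: Callable[[IrCode], None]) -> bool:
        """Send the code bound to a gesture; return False if nothing is bound or learned."""
        key = self.gesture_map.get(gesture)
        if key is None or key not in self.codes:
            return False
        transmit(self.codes[key])
        return True
```

Injecting `transmit` keeps the lookup logic independent of the IR hardware, so the same table could drive a real transmitter or a test stub.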


Figure 7. Recorded infrared remote control codes

Figure 8. Simple Gesture Pendant II GT2K application

Motion                                                          Gesture Name   Command
A hand rotating over the sensors in a clockwise pattern         FF             Fast Forward
A hand rotating over the sensors in a counterclockwise pattern  -              -
A hand blocks two bottom sensors                                -              Turn on movie player
A hand blocks a right-bottom sensor                             -              Play/Pause movie
A hand moves from bottom to top                                 -              Increase volume
A hand moves from top to bottom                                 -              Decrease volume

Figure 9. A confusion matrix from the training results, with 10 samples of each gesture

1. Olsen Jr., Jefferies, Nielsen, Moyes, and Fredrickson. Cross-modal Interaction using XWeb.
2. Fails and Olsen Jr. Light Widgets: Interacting in Every-day Spaces.
3. Fails and Olsen Jr. MagicWand: The True Universal Remote Control.
4. Metzger, Anderson, and Starner (2004). FreeDigiter: A Contact-free Device for Gesture Control.

Table 2. A subset of gestures for Gesture Pendant II

Gesture Recognition

We used GT2K to train and recognize hand gestures. Figure 8 shows a simple GT2K program that displays the output of each of the four sensors in its own column. Figure 9 shows a confusion matrix from the training. We collected ten samples of each gesture; of these ten, seven were used for training and three for testing. The matrix shows that all three test samples of each gesture were classified correctly. After training with GT2K, a user can switch to recognition mode. In recognition mode, whenever the user sends a sample, the system returns the classified result and sends the corresponding IR command listed in Table 2. With this IR remote system, we could successfully control both computer applications and home appliances. The infrared remote control system thus expands the ability of the Gesture Watch and the Gesture Pendant II to control home appliances.
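The evaluation protocol described here (ten samples per gesture, seven for training and three held out, summarized in a confusion matrix) can be sketched independently of GT2K. The helper names below are hypothetical, taking the first seven samples for training is an assumption (the text does not say which samples were held out), and the predictions would come from whatever recognizer is being evaluated.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_per_gesture(samples: Dict[str, List[T]],
                      n_train: int = 7) -> Tuple[Dict[str, List[T]], Dict[str, List[T]]]:
    """Split each gesture's samples into the first n_train for training and the rest for testing."""
    train = {gesture: items[:n_train] for gesture, items in samples.items()}
    test = {gesture: items[n_train:] for gesture, items in samples.items()}
    return train, test


def confusion_matrix(labels: Sequence[str], actual: Sequence[str],
                     predicted: Sequence[str]) -> List[List[int]]:
    """Rows are true gestures and columns are recognized gestures, both in `labels` order."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(actual, predicted):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of test samples on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

A result in which every held-out sample is recognized correctly corresponds to a matrix whose off-diagonal entries are all zero and an accuracy of 1.0.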

5. Starner, Auxier, Ashbrook, and Gandy (2000). The gesture pendant: A self-illuminating, wearable, infrared computer vision system for home automation control and medical monitoring.
6. Heap and Samaria (1995). Real-Time Hand Tracking and Gesture Recognition Using Smart Snakes.
7. Westeyn, Brashear, Atrash, and Starner (2003). Georgia Tech Gesture Toolkit: Supporting Experiments in Gesture Recognition.