
Proceedings of the 1st Augmented Human International Conference

2010, Megève, France


AH ‘10

General Co-Chairs: Hideo Saito, Keio University, Japan & Jean-Marc Seigneur, University of
Geneva, Switzerland
Program Co-Chairs: Guillaume Moreau, Ecole Centrale de Nantes, France & Pranav Mistry, MIT
Media Lab, USA
Organisation Chair: Jean-Marc Seigneur, University of Geneva, Switzerland
Augmented/Mixed Reality Co-Chairs: Guillaume Moreau, Ecole Centrale de Nantes, France &
Masahiko Inami, Keio University, Japan
Brain Computer Interface Co-Chairs: Karla Felix Navarro, University of Technology Sydney,
Australia and Ed Boyden, MIT Media Lab, USA
Biomechanics and Human Performance Chair: Guillaume Millet, Laboratoire de Physiologie de
l'Exercice de Saint-Etienne, France
Wearable Computing Chair: Bruce Thomas, University of South Australia
Security and Privacy Chair: Jean-Marc Seigneur, University of Geneva, Switzerland
Program Committee:
Peter Froehlich, Forschungszentrum Telekommunikation Wien, Austria
Pranav Mistry, MIT Media Lab, USA
Jean-Marc Seigneur, University of Geneva, Switzerland
Guillaume Moreau, Ecole Centrale de Nantes, France
Guillaume Millet, Laboratoire de Physiologie de l'Exercice de Saint-Etienne, France
Jacques Lefaucheux, JLX3D, France
Christian Jensen, Technical University of Denmark
Jean-Louis Vercher, CNRS et Université de la Méditerranée, France
Steve Marsh, National Research Council Canada
Didier Seyfried, INSEP, France
Hideo Saito, Keio University, Japan
Narayanan Srinivasan, University of Allahabad, India
Qunsheng Peng, Zhejiang University, China
Karla Felix Navarro, University of Technology Sydney, Australia
Brian Caulfield, University College Dublin, Ireland
Masahiko Inami, Keio University, Japan
Ed Boyden, MIT Media Lab, USA
Bruce Thomas, University of South Australia
Franck Multon, Université de Rennes 2, France
Yanjun Zuo, University of North Dakota, USA

Sponsors: Sporaltec, Megève

ACM International Conference Proceedings Series

ACM Press
The Association for Computing Machinery
2 Penn Plaza, Suite 701
New York New York 10121-0701

ACM COPYRIGHT NOTICE. Copyright © 2010 by the Association for Computing Machinery,
Inc. Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed for profit
or commercial advantage and that copies bear this notice and the full citation on the first
page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to
redistribute to lists, requires prior specific permission and/or a fee. Request permissions from
Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or permissions@acm.org.

For other copying of articles that carry a code at the bottom of the first or last page,
copying is permitted provided that the per-copy fee indicated in the code is paid
through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923,
+1-978-750-8400, +1-978-750-4470 (fax).

Notice to Past Authors of ACM-Published Articles


ACM intends to create a complete electronic archive of all articles and/or
other material previously published by ACM. If you have written a work
that was previously published by ACM in any journal or conference proceedings
prior to 1978, or any SIG Newsletter at any time, and you do NOT want this work
to appear in the ACM Digital Library, please inform permissions@acm.org,
stating the title of the work, the author(s), and where and when published.

ACM ISBN: 978-1-60558-825-4


Introduction
The first Augmented Human International Conference (AH'10) gathered scientific
papers from many different disciplines: information technology, human-computer
interaction, brain-computer interfaces, sport and human performance, augmented reality, and more.
This first edition is quite multidisciplinary for a research domain that calls for even
greater interdisciplinarity, since it directly concerns the human person. Many papers concentrated on building
human augmentation technologies, which is necessary for them to emerge in the real
world. However, few papers investigated the ethical or safety issues of augmented
human technologies. The next edition may bring more papers on this essential aspect,
which must be addressed for the long-term success of these technologies.

Acknowledgments
Many thanks to: the eHealth division of the European Commission, which circulated the
call for papers in its official lists of events; the municipality of Megève and Megève
Tourisme, who helped organise the conference; the EU-funded FP7-ICT-2007-2-
224024 PERIMETER project, which partially funds the organisation chair, as well as the
University of Geneva, where he is affiliated; the French association for virtual reality
(AFRV), which organised the industrial and scientific session; the ACM, which published the
proceedings of the conference in its online library; the French "pôle de compétitivité"
Sporaltec, which sponsored the best paper award; and all the program committee members
who reviewed the submitted papers and circulated the CFP to their contacts.
Table of Contents
Article 1: “ExoInterfaces: Novel Exosceleton Haptic Interfaces for Virtual Reality,
Augmented Sport and Rehabilitation”, Dzmitry Tsetserukou, Katsunari Sato and Susumu
Tachi.

Article 2: “PossessedHand: A Hand Gesture Manipulation System using Electrical
Stimuli”, Emi Tamaki, Takashi Miyaki and Jun Rekimoto.

Article 3: “A GMM based 2-stage Architecture for Multi-Subject Emotion Recognition
using Physiological Responses”, Yuan Gu, Su Lim Tan, Kai Juan Wong, Moon-Ho
Ringo Ho and Li Qu.

Article 4: “Gaze-Directed Ubiquitous Interaction Using a Brain-Computer Interface”,
Dieter Schmalstieg, Alexander Bornik, Gernot Mueller-Putz and Gert Pfurtscheller.

Article 5: “Relevance of EEG Input Signals in the Augmented Human Reader”, Inês
Oliveira, Ovidiu Grigore, Nuno Guimarães and Luís Duarte.

Article 6: “Brain Computer Interfaces for Inclusion”, Paul McCullagh, Melanie Ware,
Gaye Lightbody, Maurice Mulvenna, Gerry McAllister and Chris Nugent.

Article 7: “Emotion Detection using Noisy EEG Data”, Mina Mikhail, Khaled El-Ayat,
Rana El Kaliouby, James Coan and John J.B. Allen.

Article 8: “World’s First Wearable Humanoid Robot that Augments Our Emotions”,
Dzmitry Tsetserukou and Alena Neviarouskaya.

Article 9: “KIBITZER: A Wearable System for Eye-Gaze-based Mobile Urban
Exploration”, Matthias Baldauf, Peter Fröhlich and Siegfried Hutter.

Article 10: “Airwriting Recognition using Wearable Motion Sensors”, Christoph Amma,
Dirk Gehrig and Tanja Schultz.

Article 11: “Augmenting the Driver’s View with Real-Time Safety-Related Information“,
Peter Fröhlich, Raimund Schatz, Peter Leitner, Stephan Mantler and Matthias Baldauf.

Article 12: “An Experimental Augmented Reality Platform for Assisted Maritime
Navigation”, Olivier Hugues, Jean-Marc Cieutat and Pascal Guitton.

Article 13: “Skier-ski System Model and Development of a Computer Simulation Aiming
to Improve Skier’s Performance and Ski”, François Roux, Gilles Dietrich and Aude-
Clémence Doix.
Article 14: “T.A.C: Augmented Reality System for Collaborative Tele-Assistance in the
Field of Maintenance through Internet.” Sébastien Bottecchia, Jean Marc Cieutat and
Jean Pierre Jessel.

Article 15: “Learn complex phenomenon and enjoy interactive experiences in a
Museum!”, Benedicte Schmitt, Cédric Bach and Emmanuel Dubois.

Article 16: “Partial Matching of Garment Panel Shapes with Dynamic Sketching
Design”, Shuang Liang, Rong-Hua Li, George Baciu, Eddie C.L. Chan and Dejun Zheng.

Article 17: “Fur Interface with Bristling Effect Induced by Vibration”, Masahiro
Furukawa, Yuji Uema, Maki Sugimoto and Masahiko Inami.

Article 18: “Evaluating Cross-Sensory Perception of Superimposing Virtual Color onto
Real Drink: Toward Realization of Pseudo-Gustatory Displays”, Takuji Narumi,
Munehiko Sato, Tomohiro Tanikawa and Michitaka Hirose.

Article 19: “The Reading Glove: Designing Interactions for Object-Based Tangible
Storytelling”, Joshua Tanenbaum, Karen Tanenbaum and Alissa Antle.

Article 20: “Control of Augmented Reality Information Volume by Glabellar Fader”,
Hiromi Nakamura and Homei Miyashita.

Article 21: “Towards Mobile/Wearable Device Electrosmog Reduction through Careful
Network Selection”, Jean-Marc Seigneur, Xavier Titi and Tewfiq El Maliki.

Article 22: “Bouncing Star Project: Design and Development of Augmented Sports
Application Using a Ball Including Electronic and Wireless Modules”, Osamu Izuta,
Toshiki Sato, Sachiko Kodama and Hideki Koike.

Article 23: “On-line Document Registering and Retrieving System for AR Annotation
Overlay”, Hideaki Uchiyama, Julien Pilet and Hideo Saito.

Article 24: “Augmenting Human Memory using Personal Lifelogs”, Yi Chen and Gareth
Jones.

Article 25: “Aided Eyes: Eye Activity Sensing for Daily Life”, Yoshio Ishiguro, Adiyan
Mujibiya, Takashi Miyaki and Jun Rekimoto.
ExoInterfaces: Novel Exosceleton Haptic Interfaces for
Virtual Reality, Augmented Sport and Rehabilitation

Dzmitry Tsetserukou, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi, 441-8580 Japan, dzmitry.tsetserukou@erc.tut.ac.jp
Katsunari Sato, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan, Katsunari_Sato@ipc.i.u-tokyo.ac.jp
Susumu Tachi, Keio University, 4-1-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8526 Japan, tachi@tachilab.org

ABSTRACT
We developed novel haptic interfaces, FlexTorque and FlexTensor, that enable realistic physical interaction with real and Virtual Environments. The idea behind FlexTorque is to reproduce human muscle structure, which allows us to perform dexterous manipulation and safe interaction with the environment in daily life. FlexTorque suggests new possibilities for highly realistic, very natural physical interaction in virtual environments. There are no restrictions on the arm movement, and it is not necessary to hold a physical object during interaction with objects in virtual reality. Because the system can generate strong forces, even though it is light-weight, easily wearable, and intuitive, users experience a new level of realism as they interact with virtual environments.

ACM Classification Keywords
H5.2. Information interfaces and presentation: User Interfaces – haptic I/O, interaction styles, prototyping.

General Terms
Design, Experimentation, Performance.

Keywords
Exoskeleton, haptic display, haptic interface, force feedback, Virtual Reality, augmented sport, augmented games, rehabilitation, game controller.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Augmented Human Conference, April 2–3, 2010, Megève, France. Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.

1. INTRODUCTION
In order to realize haptic interaction (e.g., holding, pushing, and contacting an object) in a virtual environment and mediated haptic communication with human beings (e.g., handshaking), force feedback is required. Recently there has been a substantial need and interest in haptic displays that can provide realistic and high-fidelity physical interaction in a virtual environment. The aim of our research is to implement a wearable haptic display for the presentation of realistic feedback (kinesthetic stimulus) to the human arm. We developed a wearable device, FlexTorque, that induces forces on the human arm and does not require holding any additional haptic interface in the hand. It is a completely new technology for Virtual and Augmented Environments that allows the user to explore the surroundings freely. The concept of Karate (empty hand) Haptics that we propose is the opposite of conventional interfaces (e.g., Wii Remote [11], SensAble's PHANTOM [7]), which require holding a haptic interface in the hand and thus restrict the motion of the fingers in midair.

Powered exoskeleton robots such as HAL [3] (weight of 23 kg) and Raytheon Sarcos [8] (weight of about 60 kg), intended for power amplification of the wearer, can be used for force presentation as well. However, they are heavy, require high power consumption, and pose a danger to the user due to their powerful actuators.

Another class of exoskeletons is aimed at teleoperator systems. Most force feedback master devices are similar in size to the slave robot and are equipped with powerful actuators. Such systems are dangerous for the human operator and, in case of failure during bilateral control, can harm the human. In recent years there have been several attempts to make force feedback devices more compact, safe, and wearable.

In [5], an exoskeleton-type master device was designed based on a kinematic analysis of the human arm. Pneumatic actuators generate torque feedback. The authors succeeded in making a lightweight and compact force-reflecting master arm. However, the force-reflection capability of this device is not enough to present contact forces effectively. An artificial pneumatic muscle-type actuator was proposed in [4]. A wearable robotic arm with 7 DOF and high joint torques was developed. The robotic arm uses parallel mechanisms at the shoulder and wrist, similarly to the muscular structure of the human upper limb. It should be noted, however, that the dynamic characteristics of such pneumatic actuators possess strong nonlinearity and load dependency, and thus a number of problems need to be resolved for their successful application.

A compact string-based haptic device for bimanual interaction in a virtual environment was described in [6]. The users of SPIDAR can intuitively manipulate the object and experience 6-DOF force feedback. A human-scale SPIDAR allowing enlargement of the working space was designed [9]. However, the wires moving in front of the user present an obstacle to the user's vision. They also restrict the arm motion in several directions, and the user has to pay attention not to injure himself. Moreover, the user grasps the ball-shaped grip in such a way that the fingers cannot move.

In order to achieve a human-friendly and wearable design of the haptic display, we analyzed the amount of torque to be presented to the operator's arm. Generally, there are three cases when torque feedback is needed. The first case takes place when haptic communication with a remote human needs to be realized. For example, the person shakes hands with the slave robot and joint torques are presented to the operator. Such interaction results in a very small torque magnitude (in the range of 0-1.5 Nm). The second situation takes place when a slave robot transports a heavy object. Here, the torque values are much higher than in the previous case, and the torque magnitude depends on the load weight. However, continuous presentation of high torques to the operator will result in muscle fatigue. We argue that a downscaled torque indicating the direction of the force would be informative enough. The third and worst case of contact state in terms of interactive force magnitude is collision. The result of a collision with a fixed object (as is often the case) is immediate discontinuation of the operator's arm motion. Therefore, the power of the torque display must only be enough to fixate the operator's arm. For the case of a collision with a movable obstacle, the haptic display should induce arm motion in the direction of the impact force, thus decreasing the possible damage.

2. DEVELOPMENT OF THE HAPTIC DISPLAY FlexTorque
The idea behind the novel torque display FlexTorque (a haptic display that generates Flexor and extensor Torque) is to reproduce human muscle structure, which allows us to perform dexterous manipulation and safe interaction with the environment in daily life.

Figure 1. Structure and action of a skeletal muscle (Origin, Muscle, Tendon, Insertion; Tflex = Fflex × dflex, Text = Fext × dext, Tload = Fload × dload, Tnet = Tflex − Text).

The main functions of muscles are contraction for locomotion and skeletal movement. A muscle generally attaches to the skeleton at both ends. The Origin is the muscle attachment point to the more stationary bone. The other muscle attachment point, to the bone that moves as the muscle contracts, is the Insertion. The muscle is connected to the periosteum through a tendon (connective tissue in the shape of a strap or band). The muscle with tendon in series acts like a rope pulling on a lever when pulling tendons to move the skeleton (Figure 1).

When we hold a heavy object in the palm, its weight produces torques in the wrist, elbow, and shoulder joints. Each muscle generates a torque at a joint that is the product of its contractile force and its moment arm at that joint, to balance the gravity force as well as inertial forces and contact forces. Thus, we can feel object weight. Because muscles pull but cannot push, hinge joints (e.g., the elbow) require at least two muscles pulling in opposite directions (antagonistic muscles). The torque produced by each muscle at a joint is the product of the contractile force (F) and the moment arm at that joint (d). The net torque Tnet is the sum of the torques produced by the antagonistic muscles. Movement of human limbs is produced by the coordinated work of muscles acting on skeletal joints. The structure of the developed torque display FlexTorque is presented in Figure 2.

Figure 2. FlexTorque on the human's arm surface.

FlexTorque is made up of two DC motors (muscles) fixedly mounted into a plastic Motor holder unit, Belts (tendons), and two Belt fixators (Insertions). The operation principle of the haptic display is as follows. When a DC motor is activated, it pulls the belt and produces a force Fflex generating the flexor torque Tflex. The oppositely placed DC motor generates the extensor torque Text. Therefore, the couple of antagonistic actuators produces a net torque Tnet at the operator's elbow joint. We defined the position of the Insertion point to be near the wrist joint in order to develop a large torque at the elbow joint. The position of the operator's arm when flexor torque is generated is shown in Figure 3 (where θ stands for the angle of forearm rotation in relation to the upper arm).
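
For readers who prefer a concrete illustration of the antagonistic-torque model above, the following is a minimal sketch (not from the paper) that evaluates Tnet = Tflex − Text = Fflex · dflex − Fext · dext; all numeric values are assumptions chosen only to show the arithmetic.

```python
# Hypothetical illustration of the antagonistic muscle-torque model described above.
# Forces (N) and moment arms (m) are made-up example values, not measurements from the paper.

def net_joint_torque(f_flex, d_flex, f_ext, d_ext):
    """Net torque (Nm) at a hinge joint driven by one flexor and one extensor."""
    t_flex = f_flex * d_flex   # flexor torque,   Tflex = Fflex * dflex
    t_ext = f_ext * d_ext      # extensor torque, Text  = Fext  * dext
    return t_flex - t_ext      # Tnet = Tflex - Text

if __name__ == "__main__":
    # Example: flexor pulls with 60 N at a 0.03 m moment arm, extensor with 20 N at 0.025 m.
    print(net_joint_torque(60.0, 0.03, 20.0, 0.025))  # -> 1.3 Nm of net flexion torque
```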
Figure 3. Positions of the human's arm under flexor torque (θ = 10°, θ = 55°, θ = 110°; arrows indicate the direction of belt tension and the direction of motor shaft rotation).

Let us consider the calculation procedure for the net torque value. The layout of the forces and torques applied to the forearm during flexion is given in Figure 4.

Figure 4. Diagram of applied forces and torques (Tm, Ft, Ftx, Fty, dt, df, l, α, Tn).

The tension force Ft of the belt can be derived from:

Ft = Tm · i / r ,   (1)

where Tm is the motor torque, i is the gear ratio, and r is the shaft radius.

The net torque Tn acting at the elbow joint is:

Tn = Fty · df = Ft · df · cos(α) ,   (2)

where df is the moment arm.

The angle α varies according to the relative position of the forearm and upper arm. It can be found using the following equation:

α = cos⁻¹( (l² + df² − dt²) / (2 · l · df) ) ,   (3)

where dt is the distance from the pivot to the Origin, and l is the length of the belt, which can be calculated from the rotation angle of the motor shaft.

The detailed view of FlexTorque is presented in Figure 5.

Figure 5. 3D exploded view of the driving unit of FlexTorque (DC motor, Motor holder, Stoppers, Pulleys, Timing belt, Supporter, Shaft).

Each unit is compact and light in weight (60 grams). This was achieved due to the use of plastic and duralumin materials in manufacturing the main components. The Supporter surface has a concave profile to match the curvature of the human arm surface (Figure 6).

Figure 6. Driving unit of FlexTorque.

The essential advantage of the structure of the FlexTorque device is that the heaviest elements (DC motors, shafts, and pulleys) are located on the part of the upper arm that is nearest to the shoulder. Therefore, the operator's arm undergoes very little additional loading. The rest of the components (belts, belt fixators) are light in weight and do not load the operator's muscles considerably. We propose the term "Karate (empty hand) Haptics" for such novel devices because they present forces to the human arm without any additional interface in the human hands. The developed apparatus features extremely safe force presentation to the human's arm. In case of overloading, the belt is physically disconnected from the motor and the safety of the human is guaranteed.
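
As a worked illustration of Eqs. (1)–(3), the sketch below computes the belt tension, the angle α, and the net elbow torque for assumed motor and geometry parameters; the function name and all numeric values are illustrative, not taken from the paper.

```python
import math

# Illustrative implementation of Eqs. (1)-(3); parameter values are assumptions, not the authors' data.
def elbow_torque(T_m, i, r, d_f, d_t, l):
    """Return (belt tension Ft [N], angle alpha [rad], net elbow torque Tn [Nm])."""
    F_t = T_m * i / r                                        # Eq. (1): belt tension
    cos_alpha = (l**2 + d_f**2 - d_t**2) / (2.0 * l * d_f)   # Eq. (3): law of cosines
    alpha = math.acos(max(-1.0, min(1.0, cos_alpha)))        # clamp to guard rounding errors
    T_n = F_t * d_f * math.cos(alpha)                        # Eq. (2): net torque at the elbow
    return F_t, alpha, T_n

if __name__ == "__main__":
    # Example: 0.02 Nm motor torque, gear ratio 10, 5 mm shaft radius,
    # moment arm 0.25 m, pivot-to-Origin distance 0.20 m, belt length 0.15 m.
    print(elbow_torque(0.02, 10, 0.005, 0.25, 0.20, 0.15))
```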
Vibration of the human arm (e.g., to simulate driving a heavy truck) can be realized through alternating, repeatable jerks of torque from the antagonistic motors. Thus, the operator can perceive the roughness of the road surface.

FlexTorque also enables the creation of muscle stiffness. By contracting the belts before a perturbation occurs, we can increase the joint stiffness. For example, during a collision of the human hand with a moving object in the Virtual Environment, the tension of the belt of one driving unit drops abruptly and the tension of the belt pulling the forearm in the direction of the impact force increases quickly.

Contact and collision with a virtual object can be presented through FlexTorque as well. In the case of collision, the limb must be at rest. In such a case, the net torque produced by the muscles is opposed by another, equal but opposite, torque Tload. Similarly to the human muscles, the net torque produced by the haptic display restrains further movement of the user's arm.

3. APPLICATIONS
The main features of FlexTorque are: (1) it presents high-fidelity kinesthetic sensation to the user according to the interactive forces; (2) it does not restrict the motion of the human arm; (3) it has a wearable design; (4) it is extremely safe in operation; (5) it does not require a lot of storage space. These advantages allow a wide range of applications in virtual and augmented reality systems and introduce a new way of game playing. Here we summarize the possible applications of the haptic display FlexTorque:

1) Virtual and Augmented Environments (presentation of physical contact to the human's arm, muscle stiffness, object weight, collision, etc.).

2) Augmented Sport and Games (enhancing the immersive experience of sport and games through force feedback).

3) Rehabilitation (a user with physical impairments can easily control the torque applied to the arm/leg/palm while performing therapeutic exercises).

4) Haptic navigation for blind persons (an obstacle detected by a camera is transformed into a force restricting the arm motion in the direction of the object).

A number of games for augmented sport experiences, which provide a natural, realistic, and intuitive feeling of immersion into the virtual environment, can be implemented. The Arm Wrestling game that mimics the real physical experience is currently under development (Figure 7). The user wearing FlexTorque and a head-mounted display (HMD) can play either with a virtual character or with a remote friend for a more personal experience. The virtual representations of the players' arms are shown on the HMD. While playing against a friend, the user sees the motion of the arms and experiences the reaction force from the rival.

Figure 7. Augmented Arm Wrestling and Augmented Collision.

4. USER STUDY AND FUTURE RESEARCH
The FlexTorque haptic interface was demonstrated at SIGGRAPH ASIA 2009 [1,2,10]. To maintain the alignment of the extensor belt on the elbow, thus avoiding slippage, the user wears a specially designed pad equipped with guides.

We designed three games with haptic feedback. We developed the Gun Simulator game with recoil imitation (Figure 8). A quick single jerk of the forearm simulates the recoil force of a gun. A high-frequency series of impulsive forces exerted on the forearm imitates shooting with a machine gun. In this case the upper motor is supplied with short ramp impulses of current.

Figure 8. The Gun Simulator game.

In the Teapot Fishing game the player casts a line by quickly flicking the rod towards the water (Figure 9).

Figure 9. The Teapot Fishing game.
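
The recoil and machine-gun effects above are produced by shaping the current impulses sent to the upper motor. The sketch below is a hypothetical command generator; the motor-driver call, the 50 Hz burst rate and the impulse shape are assumptions for illustration, not details given in the paper.

```python
import time

# Hypothetical pulse-train generator for the recoil effects described above.
# send_motor_current() stands in for whatever motor-driver call a real setup would use.
def send_motor_current(amps):
    print(f"motor current: {amps:.2f} A")   # placeholder for a real driver command

def ramp_impulse(peak_amps=1.0, duration_s=0.05, steps=5):
    """One short ramp impulse of current: rise to the peak, then switch off."""
    for k in range(1, steps + 1):
        send_motor_current(peak_amps * k / steps)
        time.sleep(duration_s / steps)
    send_motor_current(0.0)

def single_recoil():
    """Single jerk of the forearm, as in the gun-shot effect."""
    ramp_impulse()

def machine_gun(shots=10, rate_hz=50):
    """High-frequency series of impulses, as in the machine-gun effect."""
    for _ in range(shots):
        ramp_impulse(duration_s=1.0 / (2 * rate_hz))
        time.sleep(1.0 / (2 * rate_hz))

if __name__ == "__main__":
    single_recoil()
    machine_gun(shots=3)
```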


Once the user feels the tug at the forearm (and sees the float going down), he gives the fishing rod a quick jerk backward and up. When the jerk is late, the fish (teapot) gets off the hook. A ramp impulse of the motor torque generates the jerk of the forearm downward, indicating that a fish has picked up the hook. Such practice can help the user get a feel for real fishing.

With the Virtual Gym game we can do strength training exercises at home in a playful manner (Figure 10). A virtual biceps curl exercise machine was designed. The belt tension creates the resistance force in the direction of the forearm motion. The user can adjust the weight easily.

Figure 10. The Virtual Gym game.

In total, more than 100 persons experienced the novel haptic interface FlexTorque. We got very positive feedback from the users and companies. While discussing possible useful applications with visitors, games for physical sport exercises and rehabilitation were frequently mentioned. The majority of users reported that the device presented force feedback in a very realistic manner.

5. DESIGN OF THE MULTIPURPOSE HAPTIC DISPLAY FlexTensor
The motivation behind the development of FlexTensor (a haptic display that uses a Flexible belt to produce Tension force) was to achieve realistic feedback by using a simple and easy-to-wear haptic display.

The multipurpose application is realized by fixating different elements of FlexTensor (i.e., the middle of the belt or the Origin/Insertion points) in the particular application. The structure of FlexTensor is similar to the flexor part of the FlexTorque haptic display. The main differences are: (1) the belt connects movable points on the human arms; (2) both attachment points of the belt have embedded DC motors.

In the haptic display FlexTorque the function of each attachment point is predetermined (Figure 11). The configuration of FlexTensor allows each point to perform the function of Insertion or Origin depending on the purpose of the application (Figure 12). This greatly enlarges the range of FlexTensor applications in Virtual Reality.

Figure 11. Kinematic diagram of FlexTorque and human arm (Tm, Ft, Origin, Insertion, Upper arm, Forearm).

Figure 12. Kinematic diagram of FlexTensor and human arm (Hands, Origin/Insertion points, Ft1, Ft2, Tm1, Tm2, Left Arm, Right Arm, Shoulder joints).

In the configuration where the middle of the belt is not fixed, FlexTensor presents an external force resisting the spreading of the human arms (basic configuration). This action can be used for simulation of the breaststroke swimming technique, when the swimmer sweeps the hands out in the water to their widest point (Figure 13).

The configuration in which the middle of the belt is fixed by the user standing on the band with both (or one) feet enables presentation of object weight (Figure 14). The tension of the belt represents the magnitude of the gravity force acting on the human arms. The fixation point of the middle of the belt can also be positioned on the human neck (for simulation of arm lifting) or on the waist (for simulation of resistance of the environment in the direction of arm stretching, e.g., in the case of contact with a virtual wall).

Figure 13. Application of FlexTensor for swimming training (FEXT, Belt; breaststroke technique).
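
To make the weight-presentation mode above concrete: with the middle of the belt fixed under the feet, the belt tension commanded to the motors simply has to match the gravity force of the virtual object. A minimal sketch follows; the function and the even load split are assumptions, not the authors' implementation.

```python
G = 9.81  # gravitational acceleration, m/s^2

# Hypothetical mapping from a virtual object's mass to the belt tension used in
# FlexTensor's weight-presentation configuration (middle of the belt fixed under the feet).
def belt_tension_for_weight(mass_kg, arms=2):
    """Tension (N) each motor should hold to render the virtual object's weight."""
    # Assumption: the load is split evenly across the arms holding the object.
    return mass_kg * G / arms

if __name__ == "__main__":
    print(belt_tension_for_weight(4.0))  # a 4 kg virtual dumbbell -> about 19.6 N per arm
```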


Figure 14. Application of FlexTensor for weight presentation and strength training exercise (Object, mg, Motor, Belt; biceps curl exercise with the middle of the belt fixed).

In the case when the palm of one arm is placed on some part of the body (e.g., waist, neck), this attachment point becomes the Origin. An action such as unsheathing a sword can then be simulated by stretching out the unfixed arm. FlexTensor can also interestingly augment a 3D archery game by presenting the tension force between the arms.

The illusion of simultaneous pulling of both hands can be implemented by exerting different values of the forces Ft1 and Ft2 in the basic configuration (see Figure 12). The illusion of being pulled to the left side or to the right side can be achieved when Ft1 > Ft2 and Ft1 < Ft2, respectively.

The developed apparatus features extremely safe force presentation to the human's arm. In case of overloading, physical disconnection of the belt from the motor protects the user from injury.

6. CONCLUSIONS
The novel haptic interfaces FlexTorque and FlexTensor suggest new possibilities for highly realistic, very natural physical interaction in virtual environments, augmented sport, and augmented game applications.

A number of new games for sport experiences, which provide a natural, realistic, and intuitive feeling of physical immersion into the virtual environment, can be implemented (such as skiing, biathlon (skiing with rifle shooting), archery, tennis, sword dueling, a driving simulator, etc.).

The future goal is the integration of an accelerometer and MEMS gyroscopes into the holder and fixator of FlexTorque and into FlexTensor for capturing complex movements and recognizing the user's gestures. The new version of FlexTorque and FlexTensor (ExoInterface) will take advantage of exoskeletons (strong force feedback) and the Wii Remote interface (motion-sensing capabilities).

We expect that FlexTorque and FlexTensor will support future interactive techniques in the fields of robotics, virtual reality, sport simulators, and rehabilitation.

7. ACKNOWLEDGMENTS
The research is supported in part by the Japan Science and Technology Agency (JST) and the Japan Society for the Promotion of Science (JSPS). We would also like to acknowledge and thank Alena Neviarouskaya for valuable contributions and advice.

8. REFERENCES
[1] FlexTorque. Games presented at SIGGRAPH Asia 2009. http://www.youtube.com/watch?v=E6a5eCKqQzc
[2] FlexTorque. Innovative Haptic Interface. 2009. http://www.youtube.com/watch?v=wTZs_iuKG1A&feature=related
[3] Hayashi, T., Kawamoto, H., and Sankai, Y. 2005. Control method of robot suit HAL working as operator's muscle using biological and dynamical information. In Proceedings of the International Conference on Intelligent Robots and Systems (Edmonton, Canada, August 02 - 06, 2005). IROS '05. IEEE Press, New York, 3063-3068.
[4] Jeong, Y., Lee, Y., Kim, K., Hong, Y-S., and Park, J-O. 2001. A 7 DOF wearable robotic arm using pneumatic actuators. In Proceedings of the International Symposium on Robotics (Seoul, Korea, April 19-21, 2001). ISR '01. 388-393.
[5] Lee, S., Park, S., Kim, W., and Lee, C-W. 1998. Design of a force reflecting master arm and master hand using pneumatic actuators. In Proceedings of the IEEE International Conference on Robotics and Automation (Leuven, Belgium, May 16-20, 1998). ICRA '98. IEEE Press, New York, 2574-2579.
[6] Murayama, J., Bougrila, L., Luo, Y., Akahane, K., Hasegawa, S., Hirsbrunner, B., and Sato, M. 2004. SPIDAR G&G: a two-handed haptic interface for bimanual VR interaction. In Proceedings of EuroHaptics (Munich, Germany, June 5-7, 2004). Springer Press, Heidelberg, 138-146.
[7] PHANTOM OMNI haptic device. SensAble Technologies. http://www.sensable.com/haptic-phantom-omni.htm
[8] Raytheon Sarcos Exoskeleton. Raytheon Company. http://www.raytheon.com/newsroom/technology/rtn08_exoskeleton/
[9] Richard, P., Chamaret, D., Inglese, F-X., Lucidarme, P., and Ferrier, J-L. 2006. Human scale virtual environment for product design: effect of sensory substitution. The International Journal of Virtual Reality, 5(2), 37-34.
[10] Tsetserukou, D., Sato, K., Neviarouskaya, A., Kawakami, N., and Tachi, S. 2009. FlexTorque: innovative haptic interface for realistic physical interaction in Virtual Reality. In Proceedings of the 2nd ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Technologies in Asia (Yokohama, Japan, December 16-19, 2009), Emerging Technologies. ACM Press, New York, 69.
[11] Wii Remote. Nintendo Co. Ltd. http://www.nintendo.com/wii/what/accessories
PossessedHand: A Hand Gesture Manipulation System
using Electrical Stimuli

Emi Tamaki, Interdisciplinary Information Studies, The University of Tokyo, Japan, hoimei@acm.org
Takashi Miyaki, Interfaculty Initiative in Information Studies, The University of Tokyo, Japan, miyaki@acm.org
Jun Rekimoto, Interfaculty Initiative in Information Studies, The University of Tokyo, Japan, rekimoto@acm.org

ABSTRACT
Acquiring knowledge about the timing and speed of hand
gestures is important to learn physical skills, such as play-
ing musical instruments, performing arts, and making hand-
icrafts. However, it is difficult to use devices that dynam-
ically and mechanically control a user’s hand for learning
because such devices are very large, and hence, are unsuit-
able for daily use. In addition, since groove-type devices
interfere with actions such as playing musical instruments,
performing arts, and making handicrafts, users tend to avoid
wearing these devices. To solve these problems, we propose
PossessedHand, a device with a forearm belt, for controlling
a user’s hand by applying electrical stimulus to the muscles
around the forearm of the user. The dimensions of Pos-
sessedHand are 10 × 7.0 × 8.0 cm, and the device is portable
and suited for daily use. The electrical stimuli are gener-
ated by an electronic pulse generator and transmitted from
14 electrode pads. Our experiments confirmed that Pos-
sessedHand can control the motion of 16 joints in the hand.
We propose an application of this device to help a beginner
learn how to play musical instruments such as the piano and
koto.

Categories and Subject Descriptors


B4.2 [Input/output and data communications]: In-
put/Output Devices

General Terms
Design

Keywords
interaction device, output device, wearable, hand gesture, electrical stimuli

Figure 1: Interaction examples of PossessedHand. (a) A feedback system. (b) A navigation system.
1. INTRODUCTION
Although a number of input systems for hand gestures
have been proposed, very few output systems have been pro-
posed for hand gestures. If a computer system controls a
user’s hand, the system can also be used to provide feed-
backs to various interaction systems such as systems for rec-
ognizing virtual objects (Fig. 1-a) and navigation (Fig. 1-b),
assistant systems for playing musical instruments, and a
substitute sensation system for the visually impaired and
hearing impaired. In this paper, we propose PossessedHand,
a device with a forearm belt, for controlling a user's hand
by applying electrical stimulus to the muscles around the
forearm.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
Augmented Human Conference, April 2–3, 2010, Megève, France.
Copyright 2010 ACM 978-1-60558-825-4/10/04 ...$10.00.
2. PHASE OF DEVELOPMENT
There are four phases for controlling the hand posture. In
this research, we confirm the phase for which PossessedHand
can be used. Thereafter, we propose interaction systems
based on PossessedHand.

• Phase 1: Although the user cannot visually confirm the hand motion, he/she feels the motion owing to his/her somatic sense. (e.g., Providing feedback for recognizing virtual objects)

• Phase 2: User can visually confirm the motion. (e.g., Learning systems for performing arts)

• Phase 3: User's fingers can be independently controlled to achieve grasping and opening motions. (e.g., Assistant systems for musical performances and sport activities, navigation systems, and sensory substitution systems for the visually impaired and the hearing impaired)

• Phase 4: User's hand can be controlled to achieve fine motions such as pinching using the thumb and index finger. (e.g., Learning systems for finger languages and for making handicrafts)

Figure 2: Area A: This area is involved in pinching, gripping, and holding motions. Area B: Electric stimuli are given in this area.

Many devices that directly stimulate a user's fingers[6]
are proposed. However, users tend to avoid wearing devices
placed on area A, which is shown in Figure 2; this is because
area A is used to touch, hold, and pinch real objects. Area A
interferes with playing musical instruments, performing arts,
and making handicrafts. Groove-type devices that dynam-
ically and mechanically control a user’s hand are available.
Such devices can control a user’s hand for the phases 1-4.
However, such devices cover most of area A. Although a de-
vice that can be worn on the forearm is proposed[16], it is
too large for daily use. We propose a small device that can
control a user’s hand and avoid covering area A.

3. RELATED WORK
Electrical muscle stimulation (EMS) has several applications. EMS is widely used in low-frequency therapeutic equipment and in devices for ergotherapy[9]. Akamatsu et al. applied EMS for performing arts[18].

Our goal is to control a user's hand by EMS, which is similar to functional electrical stimulation (FES)[8], [7], [17], [14]. In FES, electric currents are used to activate nerves innervating extremities that are affected by paralysis resulting from stroke or other neurological disorders and injuries of the spinal cord or head; FES can be used to restore functions in people with disabilities[19].

Watanabe et al. and Kruijff et al. proposed a technique in which a user's wrist can be controlled with two degrees of freedom by stimulating four muscles[14], [5]. They confirmed that they could control wrist motion by electrically stimulating a muscle because such a stimulation results in the motion of the tendon connected to the wrist. However, they did not consider the motion of finger joints; this motion is important for controlling the hand posture. Moreover, they used invasive electrodes embedded under the skin; such electrodes are not suitable for daily use. For enabling daily use, we need to use noninvasive electrodes. In addition, we need to avoid placing electrodes on hands or fingers because they are used to hold or touch objects.

Figure 3: A prototype of PossessedHand (electronic pulse generator and electric pads).

In this paper, we propose PossessedHand, a device used for controlling a user's hand by applying an electrical stimulus to the muscles around the forearm with noninvasive electrode pads. Muscles, which are involved in finger motions, are clustered in the forearm[10]. PossessedHand has 14 electrode pads placed on the forearm to stimulate these muscles. The tendons that are connected to the muscles move the finger joints. There is no precedent research on the manner in which hand posture can be controlled by providing only electrical stimulation to the forearm. First, we conducted an experiment to identify which and how many finger joints can be controlled by PossessedHand. In this paper, we discuss the results on the basis of the phases 1-4 discussed above. Thereafter, we propose interaction systems that can be realized by using PossessedHand.
Figure 4: Configuration.

Figure 5: Operable joints. Arrows and squares indicate independently operable joints. Circles indicate ganged operable joints.

4. SYSTEM CONFIGURATION
4.1 Muscles and Stimulations for Making Hand Postures
We use EMS[2], in which muscle contraction is achieved by using electric impulses, to control a user's hand. The impulses are generated by PossessedHand and are transmitted through electrode pads placed on the skin to the muscles that are to be stimulated. A PossessedHand with the desired output energy and compact size can be realized by using EMS[12].

An electrical stimulus of PossessedHand is applied to the muscles in the forearm of a user because many muscles that control the fingers and the wrist are located here. We adopt a forearm belt for PossessedHand. The electrical stimuli are generated by an electronic pulse generator and transmitted from 14 electrode pads. The pads are arranged on the upper and lower parts of the forearm of a user (Fig. 3); eight pads are needed to stimulate the muscles that are used to bend the joint in a finger, and six other pads are needed to stimulate finger extension and wrist flexion. PossessedHand stimulates seven muscles (superficial flexor muscle, deep flexor muscle, long flexor muscle of the thumb, common digital extensor muscle, flexor carpi radialis muscle, long palmar muscle, and flexor carpi ulnaris muscle). These muscles are shown in area B in Figure 2. We can select a channel between a pad on the upper portion and one on the lower portion of the forearm. Thus, 7 × 7 channels are available.

4.2 A Prototype of PossessedHand
We built a prototype of PossessedHand using a pulse generator, a channel selector (Photo-MOS Relays Series AQV253), and 14 electrode pads (Fig. 3). The dimensions of PossessedHand are 10.0 × 7.0 × 8.0 cm, and it is portable and suited for daily use. Its configuration is shown in Figure 4. The pulse width is 0.2 ms, and the voltage is in the range 17-29 V.

5. EXPERIMENTS
We confirmed that PossessedHand can control the motion of 16 joints in the hand. We conducted an experiment to confirm whether the finger joints can be appropriately moved to achieve desired hand postures. We selected an anode from the seven electrodes placed on the upper arm, and a ground electrode from the seven electrodes placed on the hand side. We tested 7-by-7 patterns of the electronic paths corresponding to each of three peak values of the pulse (17 V, 23 V, and 29 V); in other words, we performed 147 stimulations. We asked the subjects to eliminate strain in the hand.

We have confirmed that PossessedHand can control seven independent and nine linked joints, i.e., a total of 16 joints. We have also confirmed that a clasped hand can be opened by stimulating the common digital extensor muscle. Further, we have confirmed that users can recognize the motion of their hands even with closed eyes. Figure 5 shows the results of our experiment. These results suggest that PossessedHand can control hand postures in phases 1-3 as discussed above. In the next section, we introduce the three interaction systems of PossessedHand, namely, the navigation system, the feedback system for recognizing virtual objects, and the assistant system for musical performance. These systems correspond to phases 1-3 of the hand posture, respectively.

6. INTERACTION SYSTEMS OF POSSESSEDHAND

6.1 Navigation System (Using Phases 1, 2, and 3)
We propose a navigation system for PossessedHand (Fig. 1-b). PossessedHand can be used to make hand gestures to point to the user's destination. This is advantageous because maps or announcements are not required when using PossessedHand. Watanabe et al. proposed a navigation system in which galvanic vestibular stimulation (GVS) [15], [20], [3] is used. Since GVS affects the user's sense of acceleration, the user's walking direction can be controlled by the proposed system. However, this system cannot provide detailed information such as direction and distance. We propose a navigation system that controls wrist flexion and hand posture and provides detailed information about direction and distance.

Figure 6: Manipulator.

Figure 7: Hand postures for musical performances. (a) An incorrect posture for playing the piano. (b) A correct posture for playing the piano. (c) An incorrect posture for playing the koto. (d) A correct posture for playing the koto.
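
The 7 × 7 channel layout and the three pulse peak voltages in Sections 4-5 define the 147-pattern search space that the experiment swept. The following is a minimal sketch of that enumeration; the pad indices, constant names and ordering are illustrative assumptions, not PossessedHand's actual firmware.

```python
from itertools import product

# Sketch of the stimulation-pattern sweep described in Sections 4-5:
# 7 upper-arm anode pads x 7 hand-side ground pads x 3 peak voltages = 147 patterns.
ANODE_PADS = range(7)          # pads on the upper-arm side
GROUND_PADS = range(7)         # pads on the hand side
PEAK_VOLTS = (17, 23, 29)      # peak pulse values used in the experiment (V)
PULSE_WIDTH_MS = 0.2           # pulse width reported for the prototype

def stimulation_patterns():
    """Yield every (anode, ground, voltage) combination to be tested."""
    yield from product(ANODE_PADS, GROUND_PADS, PEAK_VOLTS)

if __name__ == "__main__":
    patterns = list(stimulation_patterns())
    print(len(patterns))  # -> 147
```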

6.2 Feedback System for Recognizing Virtual Objects (Using Phase 1)
PossessedHand can be used as a feedback system that conveys the existence of a 3D virtual object in the real world (Fig. 1-a). Haptic feedback is necessary for receiving information on virtual objects in augmented reality and mixed reality spaces. PossessedHand provides haptic feedback by controlling hand posture, in addition to the visual feedback[4],[11] obtained using head-mounted displays or 3D displays.

6.3 Assistant System for Musical Performance (Using Phases 1, 2, and 3)
We propose an application of PossessedHand that helps a beginner learn how to play musical instruments such as the piano and koto. In such musical instruments, subtle differences in tones are achieved by fine finger movements. The koto is a traditional Japanese stringed musical instrument. A koto player uses three finger picks (on the thumb, index finger, and middle finger) to pluck the strings. An appropriate hand posture is important for playing such instruments well (Fig. 6). PossessedHand can assist the beginner in acquiring proper hand positions and postures. A hand-gesture recognition system with a camera[1] can also be used to identify whether the hand positions and postures are appropriate for the instrument. PossessedHand can help the beginner learn professional techniques, which cannot be written in scores (Fig. 7). Furthermore, PossessedHand can help a distant learner learn to move the fingers appropriately when playing musical instruments.

7. DISCUSSION
To extend the use of PossessedHand, we have to consider reaction rates, accuracy, and muscle fatigue[13] and realize automatic setup systems to control the voltage and the positions of the electrode pads. It takes 5 min to manually set the position of the pads and the voltage value. We have to develop an automatic setup system that is based on neural network systems, which provide rapid feedback on the position of the pads, voltage value, and joint angles. Thereafter, the use of PossessedHand can be extended to performing sports, learning finger languages, performing arts, and making handicrafts.

8. CONCLUSION
In this paper, we proposed the use of PossessedHand, a device used to control hand postures by an electrical stimulation technique. The electrical stimuli are transmitted from the 14 noninvasive electrode pads placed on the forearm muscles of the user; these stimuli control the motions of a user's hand. Our experiments confirmed that PossessedHand can control the motion of 16 joints in the hand. The device can control the motion of seven independent joints and nine joints whose motions are linked with those of other joints. We confirmed that a clasped hand can be opened by stimulating the common digital extensor muscle. We also confirmed that users can recognize the motion of their hand even with their eyes closed. On the basis of the results of the experiments, we proposed three interaction systems, namely, a navigation system, a feedback system for recognizing virtual objects, and an assistant system for aiding musical performance.

9. ACKNOWLEDGMENTS
We thank Ken Iwasaki, who has contributed time to this research.

10. REFERENCES
[1] T. Emi, M. Takashi, and R. Jun. A robust and accurate 3d hand posture estimation method for interactive systems. IPSJ, 51(2):1234–1244, 2010.
[2] H. Hummelsheim, M. Maier-Loth, and C. Eickhof. The functional value of electrical muscle stimulation for the rehabilitation of the hand in stroke patients. Scandinavian journal of rehabilitation medicine, 29(1):3, 1997.
[3] J. Inglis, C. Shupert, F. Hlavacka, and F. Horak. Effect of galvanic vestibular stimulation on human postural responses during support surface translations. Journal of neurophysiology, 73(2):896, 1995.
[4] D. Jack, R. Boian, A. Merians, S. V. Adamovich, M. Tremaine, M. Recce, G. C. Burdea, and H. Poizner. A virtual reality-based exercise program for stroke rehabilitation. In Assets '00: Proceedings of the fourth international ACM conference on Assistive technologies, pages 56–63, New York, NY, USA, 2000. ACM.
[5] E. Kruijff, D. Schmalstieg, and S. Beckhaus. Using neuromuscular electrical stimulation for pseudo-haptic feedback. In VRST '06: Proceedings of the ACM symposium on Virtual reality software and technology, pages 316–319, New York, NY, USA, 2006. ACM.
[6] S. Kuroki, H. Kajimoto, H. Nii, N. Kawakami, and S. Tachi. Proposal for tactile sense presentation that combines electrical and mechanical stimulus. In WHC '07: Proceedings of the Second Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pages 121–126, Washington, DC, USA, 2007. IEEE Computer Society.
[7] M. Poboroniuc and C. Stefan. A method to test fes-based control strategies for neuroprostheses. In ICAI'08: Proceedings of the 9th WSEAS International Conference on International Conference on Automation and Information, pages 344–349, Stevens Point, Wisconsin, USA, 2008. World Scientific and Engineering Academy and Society (WSEAS).
[8] Y. Ryo, S. Yoshihiro, N. Yukio, H. Yasunobu, Y. Shimada, K. Shigeru, N. Akira, I. Masayoshi, and H. Nozomu. Analysis of hand movement induced by functional electrical stimulation in tetraplegic and hemiplegic patients. The Japanese Journal of Rehabilitation Medicine, 21(4):235–242, 1984.
[9] S. S and V. Gerta. Science and practice of strength training - ems. The Journal of Physiology.
[10] M. Schuenke, U. Schumacher, E. Schulte, and et al. Atlas of Anatomy: General Anatomy and Musculoskeletal System (Prometheus). Georg Thieme Verlag, 2005.
[11] Y. Shen, S. K. Ong, and A. Y. C. Nee. Hand rehabilitation based on augmented reality. In i-CREATe '09: Proceedings of the 3rd International Convention on Rehabilitation Engineering & Assistive Technology, pages 1–4, New York, NY, USA, 2009. ACM.
[12] S. Tachi, K. Tanie, and M. Abe. Effects of pulse height and pulse width on the magnitude sensation of electrocutaneous stimulus. Japanese journal of medical electronics and biological engineering, 15(5):315–320, 1977.
[13] S. Takahiro, K. Toshiyuki, and I. Koji. Lower-limb joint torque and position controls by functional electrical stimulation (fes). IEICE technical report. ME and bio cybernetics, 104(757):25–28, 2005.
[14] W. Takashi, I. Kan, K. Kenji, and H. Nozomu. A method of multichannel pid control of 2-degree of freedom of wrist joint movements by functional electrical stimulation. The transactions of the Institute of Electronics, Information and Communication Engineers, 85(2):319–328, 2002.
[15] Y. Tomofumi, A. Hideyuki, M. Taro, and W. Junji. Externalized sense of balance using galvanic vestibular stimulation. Association for the Scientific Study of Consciousness 12th Annual Meeting.
[16] D. Tsetserukou, K. Sato, A. Neviarouskaya, N. Kawakami, and S. Tachi. Flextorque: innovative haptic interface for realistic physical interaction in virtual reality. In SIGGRAPH ASIA '09: ACM SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation, pages 69–69, New York, NY, USA, 2009. ACM.
[17] S. H. Woo, J. Y. Jang, E. S. Jung, J. H. Lee, Y. K. Moon, T. W. Kim, C. H. Won, H. C. Choi, and J. H. Cho. Electrical stimuli capsule for control moving direction at the small intestine. In BioMed'06: Proceedings of the 24th IASTED international conference on Biomedical engineering, pages 311–316, Anaheim, CA, USA, 2006. ACTA Press.
[18] N. Yoichi, A. Masayuki, and T. Masaki. Development of bio-feedback system and applications for musical performances. IPSJ SIG Notes, 2002(40):27–32, 2002.
[19] D. Zhang, T. H. Guan, F. Widjaja, and W. T. Ang. Functional electrical stimulation in rehabilitation engineering: a survey. In i-CREATe '07: Proceedings of the 1st international convention on Rehabilitation engineering & assistive technology, pages 221–226, New York, NY, USA, 2007. ACM.
[20] R. Zink, S. Steddin, A. Weiss, T. Brandt, and M. Dieterich. Galvanic vestibular stimulation in humans: effects on otolith function in roll. Neuroscience letters, 232(3):171–174, 1997.
A GMM based 2-stage architecture for multi-subject
emotion recognition using physiological responses

Gu Yuan, School of Computer Engineering, Nanyang Technological University, Singapore 639798, guyu0006@ntu.edu.sg
Tan Su Lim, School of Computer Engineering, Nanyang Technological University, Singapore 639798, ASSLTan@ntu.edu.sg
Wong Kai Juan, School of Computer Engineering, Nanyang Technological University, Singapore 639798, ASKLWong@ntu.edu.sg
Ho Moon-Ho Ringo, School of Humanities and Social Science, Nanyang Technological University, Singapore 639798, HOmh@ntu.edu.sg
Qu Li, School of Humanities and Social Science, Nanyang Technological University, Singapore 639798, QuLi@ntu.edu.sg

ABSTRACT
There is a trend these days to add emotional characteristics as new features into human-computer interaction to equip machines with more intelligence when communicating with humans. Besides traditional audio-visual techniques, physiological signals provide a promising alternative for automatic emotion recognition. Ever since Dr. Picard and colleagues brought forward the initial concept of physiological-signal-based emotion recognition, various studies have been reported following the same system structure. In this paper, we implemented a novel 2-stage architecture of the emotion recognition system in order to improve the performance when dealing with a multi-subject context, which is the more realistic practical setting. Instead of directly classifying data from all the mixed subjects, one step was added ahead to transform a traditional subject-independent case into several subject-dependent cases by classifying each new incoming sample into an existing subject model using a Gaussian Mixture Model (GMM). For simultaneous classification of four affective states, the correct classification ratio (CCR) shows a significant improvement from 80.7% to over 90%, which supports the feasibility of the system.

Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems—Human information processing; G.3 [Probability and Statistics]: Multivariate statistics

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Augmented Human Conference, April 2-3, 2010, Megève, France. Copyright © 2010 ACM 978-1-60558-825-4/10/04... $10.00.

1. INTRODUCTION
Emotion awareness has become one of the most innovative features in human-computer interaction in order to achieve more natural and intelligent communications. Towards various measures of automatic emotion recognition in the engineering way, numerous efforts have been devoted to the audiovisual channels such as facial expressions [4, 6] or speech [2, 12, 16]. Recently, physiological signals, as an alternative channel for emotional communication, have gradually earned attention in the field of emotion recognition. Starting from the series of publications authored by Dr. Picard and her colleagues in the Massachusetts Institute of Technology (MIT) Laboratory [17, 18, 19], several interesting findings have been reported indicating that certain affective states can be recognized by means of heart rate (HR), skin conductivity (SC), temperature (Tmp), muscle activity (EMG), and respiration velocity (Rsp). They also elaborated a complete physiological-signal-based emotion recognition procedure which gave great inspiration to the followers [15, 9, 24, 7, 10].

There is one particular issue, first noted during the description of the affective data collection in [19], that turns out to be a major obstacle to the development of a general methodology for multi-subject emotion recognition using physiological signals. This issue, which we call "individual differences", can be briefly explained as the intricate variety of individual behaviors among subjects. On one hand, the problem shows the concern of different interpretations of emotions across individuals within the same culture [19]. It may therefore complicate the signal processing and classification procedures when the goal is to examine whether subjects elicit similar physiological patterns for the same emotion. Fortunately, thanks to the vast body of studies in psychology, such as the proposal of six basic emotions by Ekman [4, 5] or the development of the International Affective Picture System (IAPS), this aspect of "individual differences" has been to some extent alleviated by the employment of a scientific categorization of emotions and the usage of standardized emotion elicitation facilities.

Figure 1: Experimental procedure of the actual test session.

On the other hand, the problem, explained as the possibility of significant physiological signal variation across individuals, can be quite difficult to solve. The basic intent of using physiological signals for emotion recognition is to discover the inner trend of signal variation during a human's emotional variation. This in fact relies on the fundamental assumption that the physical properties of all the signals should at least follow the same pattern. In other words, if signals from different subjects differ too much, the exact pattern of signal changes during emotional variation that we are looking for could be buried by the other distinct patterns brought in by the "individual differences". That is why
some studies limited the experimental subject to a single person [19, 9, 7], to potentially remove the variability. This compromise of a single-subject approach might be acceptable in the early stages of affective recognition, since it is valuable to develop subject-dependent methods. However, the specific features and recognition results obtained from one single person may not be the same for other subjects [19]. Hence, the single-subject approach is always argued to show the fatal weakness that any recognition methods developed this way are not generally applicable. Therefore, some other researchers implemented various ways to realize the subject-independent approach (meaning all signals mixed together for different subjects) [15, 24, 10, 8]. After a few not-so-successful attempts (a comparison of similar studies is shown in Table 1), it is commonly accepted that the subject-independent approach tends to perform worse than the subject-dependent (single subject) approach due to the influence of "individual differences" [10]. Hence Kim and André briefly suggested that it could be possible to improve the recognition rate by identifying each individual prior to the recognition phase, and then conducting the emotion classification in a subject-dependent way. However, they did not experimentally elaborate this issue. Besides, they were also concerned that this kind of recognition system may only be feasible for a limited number of subjects, who are supposed to be "known" to the system (corresponding data of each subject are cumulatively collected in a learning phase) [10].

In this paper, we introduce a complete physiological-signal-based 2-stage emotion recognition system using a Gaussian Mixture Model (GMM) for the 1st stage and Sequential Floating Forward Search with a kNN classifier (SFFS-kNN) for the 2nd stage. In the 1st stage, data from each subject are trained into separate GMM models. A new incoming sample is classified to one subject model using the Maximum Posterior Probability (MAP) rule, and then follows a traditional subject-dependent procedure, which is the 2nd stage. Note that a major difference in our understanding is that the 1st stage is not treated as a subject identification process, but rather as a similarity classification based on the "known" subject models prepared by the system. Suppose there are c subject models M1...Mj...Mc stored in the system; a new incoming sample xi will be classified to model Mj. This situation should be read as: data xi shows the most similar characteristics to model Mj, but it may not necessarily come from subject j. In other words, the 2-stage system does not necessarily require the test subjects to be "known" to the system. Further elaboration will be presented in a later section.

2. DATA COLLECTION
One ProComp Infiniti unit from Thought Technology was employed as the data acquisition system. Two high speed channels at 2048 Hz were used for the electrocardiogram (ECG) and blood volume pulse (BVP) sensors. Four low speed channels at 256 Hz were occupied by the skin conductivity (SC) sensor, the respiration (Resp) sensor and two electromyography (EMG) sensors, one for the corrugator (EMGc) and one for the zygomaticus (EMGz). All the data collection was carried out in the same environment using the same sets of equipment.

28 pictures from the International Affective Picture System (IAPS) were chosen for emotion elicitation. The IAPS is a standard emotion induction procedure developed by Lang et al. [11], which has been rated by a large number of participants in terms of valence and arousal. The pictures were selected based on the criterion that the distribution of ratings along pleasant/unpleasant (valence) and excitement/calm (arousal) should be relatively balanced. Since it is still unclear whether people from different cultural backgrounds would respond to the same emotional stimuli similarly, instead of following the original IAPS ratings from Lang's experiments, participants were required to rate pleasantness and emotional intensity on a nine-point scale.

Figure 1 shows the detailed process of the experiment. Before and after the 28 trials of the actual emotional induction phase, there were a PANAS questionnaire (to choose the words that best described the present mood state) and a 3-minute recording session of the physiological levels, respectively. Each trial consisted of displaying a fixation point "+" for 6 secs, a picture (randomly chosen from the 28 pre-selected IAPS pictures) for 6 secs, and a black screen for 6 secs. The participant was asked to rate the viewed picture in terms of its arousal level and valence on a scale from 1-9 and to verbally speak out a single word of emotion that best described their feelings after viewing the picture. Then, the participant chose, from a list of emotional descriptors extracted from Tomkins's concept of eight basic emotions, "anger", "interest", "contempt", "disgust", "distress", "fear", "joy", "shame", "surprise" [22, 23], plus "nothing", the one that best described their feelings after viewing the picture. Each trial was concluded with solving 5 simple mathematical problems so as to "wash out" the effect of the viewed picture on the subject before the next trial was administered.

3. METHODOLOGY
3.1 Signal Processing and Feature Extraction
Table 1: Comparison with similar studies (Exp: experiment settings; Classi: classifiers; Sel/Red: feature selection/reduction algorithms)

Author | Exp | Classi. | Sel/Red | Results
Picard et al. [19] | single-sub; 8 emotions using guided imagery | DFA and QDF | SFFS and Fisher | all classes: 81.25%
Haag et al. [9] | single-sub; aro/val using IAPS | MLP | none | aro: 96.58%, val: 89.93%
Gu et al. [7] | single-sub; aro/val using IAPS | SVM | none | aro: 85.71%, val: 78.57%
Wagner et al. [24] | single-sub; 6 emotions using music | kNN, LDF and MLP | SFFS, Fisher and ANOVA | no feat. red.: 80%; with feat. red.: 92%
Nasoz et al. [15] | multi-sub; 6 emotions using movie clips | kNN, DFA and MBG | none | kNN: 71.6%, DFA: 74.3%
Gu et al. [8] | multi-sub; aro/val using IAPS | kNN, fkNN, LDF and QDF | GA | no feat. red.: val 64.2%, aro 62.8%; with feat. red.: val 76.1%, aro 78%
Kim and André [10] | multi-sub; 4 EQs on aro/val plane using music | pLDA | SBS | for 4 classes: sub-indep 70%, sub-dep 95%
Collected data samples were first segmented into 28 data entries per subject. Each data entry covered 12 seconds of signal, trimmed from the beginning of the display of the picture stimulus and ending after the black screen was shown. Raw ECG signals were preprocessed with a series of high and low pass filters to remove noise and then down-sampled, together with the remaining signals (BVP, SC, EMGz, EMGc and Rsp), by a factor of 8 for further feature evaluation. Instead of directly using the ECG signals, the HR (heart rate) information was deduced from the intervals between successive QRS complexes (the most striking waveform within an ECG signal). In this study, QRS complexes were detected using a derivative-based algorithm and a moving-average filter for smoothing the output [21]. Subsequently, a simple peak-searching method was applied to locate the peak points which indicate the heart beats. The detailed procedure of the QRS detection can be found in our previous report [7].

For a preliminary study of the proposed 2-stage emotion recognition system, we adopted the time-domain statistical feature set proposed by Picard et al. [19], because these features have appeared in several previous studies and have shown the ability to classify affective states [19, 9, 7, 8, 14]. 6 features were extracted from each physiological signal (HR, BVP, SC, EMGz, EMGc and Rsp) using the formulas depicted in Table 2. In all, there were 36 features prepared for each data entry.

Table 2: Formulas for feature extraction, where $\tilde{x}(n) = \frac{x(n)-\mu}{\sigma}$ refers to the normalized signal of $x(n)$ (std: standard deviation; abs: absolute values)

The mean of $x(n)$: $\mu = \frac{1}{N}\sum_{n=1}^{N} x(n)$
The std of $x(n)$: $\sigma = \sqrt{\frac{1}{N-1}\sum_{n=1}^{N}\left(x(n)-\mu\right)^2}$
The mean of the abs of the 1st differences of $x(n)$: $\delta = \frac{1}{N-1}\sum_{n=1}^{N-1}\left|x(n+1)-x(n)\right|$
The mean of the abs of the 1st differences of $\tilde{x}(n)$: $\tilde{\delta} = \frac{1}{N-1}\sum_{n=1}^{N-1}\left|\tilde{x}(n+1)-\tilde{x}(n)\right|$
The mean of the abs of the 2nd differences of $x(n)$: $\gamma = \frac{1}{N-2}\sum_{n=1}^{N-2}\left|x(n+2)-x(n)\right|$
The mean of the abs of the 2nd differences of $\tilde{x}(n)$: $\tilde{\gamma} = \frac{1}{N-2}\sum_{n=1}^{N-2}\left|\tilde{x}(n+2)-\tilde{x}(n)\right|$
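For illustration, a minimal NumPy sketch of the Table 2 statistics follows; it is not the authors' code, and the `segment` variable and library choice are assumptions of this example.

```python
import numpy as np

def picard_features(segment):
    """Compute the six time-domain statistics of Table 2 for one preprocessed signal segment."""
    x = np.asarray(segment, dtype=float)
    mu = x.mean()                                        # mean of x(n)
    sigma = x.std(ddof=1)                                # standard deviation with N-1
    x_norm = (x - mu) / sigma                            # normalized signal x~(n)
    delta = np.abs(np.diff(x)).mean()                    # mean abs 1st differences of x(n)
    delta_norm = np.abs(np.diff(x_norm)).mean()          # ... of x~(n)
    gamma = np.abs(x[2:] - x[:-2]).mean()                # mean abs 2nd differences of x(n)
    gamma_norm = np.abs(x_norm[2:] - x_norm[:-2]).mean() # ... of x~(n)
    return np.array([mu, sigma, delta, delta_norm, gamma, gamma_norm])

# 36-feature vector for one data entry: 6 statistics for each of the 6 signals
# (HR, BVP, SC, EMGz, EMGc, Rsp), e.g.:
# feature_vector = np.concatenate([picard_features(s) for s in six_signals])
```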
3.2 Proposed 2-stage Emotion Recognition System
As discussed above, the basic idea of our recognition system is to transform a general subject-independent case into separate subject-dependent cases by classifying the new incoming data into existing subject models prior to the actual emotion recognition procedure. Figure 2 illustrates the overall structure of the system.

Figure 2: Diagram of the 2-stage emotion recognition system.

Figure 3: Categories of affective states on the 2D emotion model: EQ1 = val rating > 5 and aro rating > 5, EQ2 = val rating <= 5 and aro rating > 5, EQ3 = val rating <= 5 and aro rating <= 5, and EQ4 = val rating > 5 and aro rating <= 5.

3.2.1 The 1st stage
The process starts by learning a GMM probability distribution for each subject using only the 6 features from HR. By definition, a GMM is expressed as follows:

$$p(x) = \sum_{g=1}^{G} \pi_g\, p_g(x) = \sum_{g=1}^{G} \pi_g\, \mathcal{N}(x\,|\,\mu_g, \Sigma_g), \qquad (1)$$

where $G$ is the number of Gaussian components, $\mathcal{N}(x|\mu_g,\Sigma_g)$ is a normal distribution with mean $\mu_g$ and covariance matrix $\Sigma_g$, and $\pi_g$ is the weight of each component with the constraint $\sum_g \pi_g = 1$. The parameters of the GMM are estimated using the Expectation-Maximization (EM) algorithm [1], which yields a Maximum Likelihood (ML) estimate. Each iteration of the EM algorithm consists of the E-step (Expectation) and the M-step (Maximization). During the E-step, the missing data are estimated given the observed data and the current estimate of the model parameters. In the M-step, the likelihood function is maximized using the estimated missing data from the E-step in lieu of the actual missing data. Eq. (2) depicts the estimation formulas of the model parameters:

$$\mu_m' = \frac{\sum_{t=1}^{T} p_m(x_t)\, x_t}{\sum_{t=1}^{T} p_m(x_t)}, \qquad
\Sigma_m' = \frac{\sum_{t=1}^{T} p_m(x_t)\,(x_t-\mu_m)(x_t-\mu_m)^T}{\sum_{t=1}^{T} p_m(x_t)}, \qquad
\omega_m' = \frac{\sum_{t=1}^{T} p_m(x_t)}{\sum_{t=1}^{T}\sum_{g=1}^{G} p_g(x_t)}. \qquad (2)$$

Let $c$ represent the number of subject classes; the posterior probabilities of the input data sample, $P = \{p_j\}_{j=1,2,\ldots,c}$, are calculated based on the GMM generated for each corresponding subject. According to the MAP rule, the data sample is assigned to the subject class with the highest posterior probability, $S_j = \arg\max(P)$. Onwards, the original subject-independent problem is transformed into a normal within-subject case for classification of the affective states.

3.2.2 The 2nd stage
This stage follows a general hybrid feature selection and classification method. In this study, Sequential Floating Forward Search (SFFS) and the k-Nearest Neighbor (kNN) rule are employed.

SFFS [20] is one of the frequently used feature set search methods in the area of physiological-signal-based emotion recognition. It is an improved version of the traditional Sequential Forward Search (SFS) [25], following a "bottom up" procedure but introducing a "floating" characteristic, and it first appeared in affective computing in Picard et al. [19]. Serving as a wrapper mode of feature selection [13], SFFS is commonly applied with a pre-defined learning algorithm (kNN) and uses its performance as the evaluation criterion.

The kNN rule [3] classifies a data sample by assigning it the label most frequently represented among the k nearest samples. In other words, a decision is made by examining the labels of the k nearest neighbors (by Euclidean distance) and taking a vote. The main reason to choose kNN as the classification method in this study lies in the fact that kNN is able to achieve simultaneous multi-class classification.

4. CLASSIFICATION RESULTS
As a preliminary study of the proposed system, we design and implement two classification tasks using data from 5 subjects. The first task, called the close-set experiment, learns GMM models based on a training data set coming from all the 5 subjects, so the 1st stage of the proposed system actually becomes a subject identification process. The second task, the open-set experiment, trains only 3 GMM subject models, but tests using data from all the 5 subjects. This experiment intends to support the idea that the new incoming data are not required to be known to the system.
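To make the two stages described above concrete, the following sketch shows one possible implementation using scikit-learn; the library choice, the dictionary-based bookkeeping and the omission of the SFFS step are assumptions of this illustration, not part of the original system.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier

# --- Stage 1: one GMM per "known" subject, trained on the 6 HR features ---
def train_subject_models(hr_features_by_subject, n_components=2):
    # hr_features_by_subject: {subject_id: array of shape (n_samples, 6)}
    return {s: GaussianMixture(n_components=n_components).fit(X)
            for s, X in hr_features_by_subject.items()}

def assign_subject(models, hr_sample):
    # Pick the subject model with the highest log-likelihood; with equal priors
    # this coincides with the maximum-posterior (MAP) assignment.
    scores = {s: m.score_samples(hr_sample.reshape(1, -1))[0] for s, m in models.items()}
    return max(scores, key=scores.get)

# --- Stage 2: a subject-dependent kNN classifier on the full 36-feature vectors ---
def train_emotion_classifiers(features_by_subject, labels_by_subject, k=7):
    return {s: KNeighborsClassifier(n_neighbors=k).fit(features_by_subject[s],
                                                       labels_by_subject[s])
            for s in features_by_subject}

def classify(models, knn_by_subject, hr_sample, full_sample):
    subject = assign_subject(models, hr_sample)                             # 1st stage
    return knn_by_subject[subject].predict(full_sample.reshape(1, -1))[0]   # 2nd stage
```

In the close-set task every test subject has its own model; in the open-set task the same assignment step simply routes a sample to the most similar of the available models.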
Table 3: Recognition results of the close-set experiment using the traditional subject-independent procedure and the proposed 2-stage system (k = 7 for kNN)

sub-indep | 0.807
2-stage system | sub A: 0.889, sub B: 1, sub C: 0.9, sub D: 0.891, sub E: 0.975 | average: 0.931

Table 4: Confusion matrix for results of subject recognition (CCR % = 98.21%)

sub | A | B | C | D | E | total | error
A | 13 | 5 | 0 | 9 | 1 | 28 | 0.536
B | 6 | 20 | 0 | 2 | 0 | 28 | 0.286
C | 0 | 0 | 19 | 0 | 9 | 28 | 0.321
D | 13 | 3 | 0 | 12 | 0 | 28 | 0.571
E | 0 | 0 | 20 | 1 | 7 | 28 | 0.75

Table 5: Recognition results of the open-set experiment using the proposed 2-stage system (k = 7 for kNN)

Experiment I | sub A: 0.832, sub B: 1, sub C: 0.855 | average: 0.896
Experiment II | sub B: 0.963, sub C: 0.859, sub D: 0.886 | average: 0.903

The classification goal is to simultaneously differentiate four types of affective states (EQ1 to EQ4 in Figure 3). All of the classification procedures are conducted under 10-fold cross-validation.

4.1 Task One: Close-set Experiment
Table 3 presents the correct classification ratio (CCR) using the
traditional subject-independent procedure (mixing all the data together) with SFFS-kNN directly, and the proposed 2-stage system. Since the proposed system turns the subject-independent task into separate subject-dependent cases, the final result of 0.931 is the average value taken over the individual CCRs of subjects A to E. It clearly shows that by using the 2-stage system, the CCR is raised by about 13% over the result of the traditional procedure, which obtains 80.7% correctness.

Besides the significant improvement in CCR, an interesting phenomenon also appears in Table 4, where the confusion matrix of the results from the 1st-stage process is presented. Notice that roughly 70% of the data samples from subjects B and C respectively are correctly classified into "sub B" and "sub C", while the other three, especially subject E, show much higher classification error rates. For example, among all the 28 samples from subject E, a major part of them (75%) is wrongly recognized as "sub C". One explanation of this situation is that the inner properties of the data from subjects C and E are so similar that using only the "sub C" model is representative enough for both subject C and E.

Hence the following open-set tasks are designed in this way. The first open-set experiment uses data from subjects A, B and C for training the GMM subject models, and the second uses data from subjects B, C and D. Both experiments are tested using data from all the 5 subjects.

4.2 Task Two: Open-set Experiment
Table 5 compares the results from the two experiments conducted for the open-set task. Testing data samples are classified into the three subject models learnt by the system, and then fed into the SFFS-kNN procedure. Both CCRs show about a 10% improvement over the traditional subject-independent method (80.7%), though a bit less than the 93.1% obtained in the close-set case.

The performance of the open-set task actually suggests that it is not required for the 2-stage system to learn all the subjects that the testing data are coming from, given the fact that the two open-set experiments can also achieve quite comparable results with the close-set task. However, the system should at least know certain subject models that are representative enough for the "unknowns". In our case, when switching the subject models from "sub A" to "sub D", the performance rises by about 0.7% (from 0.896 to 0.903), which indicates that it is rather important to select the "right" subjects for the system to learn.

5. CONCLUSION
This paper introduced a novel 2-stage system for physiological-signal-based emotion recognition. The 1st stage creates a GMM model for each "known" subject and classifies the incoming data sample into one of the subject models, so that a subject-independent case is transformed into several subject-dependent cases. The 2nd stage then follows a general hybrid feature selection and classification method to simultaneously classify four affective states.

As a preliminary study, we designed both close-set and open-set tasks using data from 5 subjects to investigate the effectiveness of the proposed system. The overall results show significant improvements over the traditional subject-independent procedure (CCR improved from 80.7% to over 90%). Besides, the comparison between the close-set and open-set experiments actually suggests that it is not required for the 2-stage system to learn all the subjects beforehand, as long as there are enough representative models known by the system. Hence, it is rather critical to define a criterion that can properly choose the "representative" learning data for the system. Also, since we only used 5 subjects to test the system, it is necessary to expand the data to a larger data pool to further enhance the performance of the system. We believe more interesting findings will be discovered by focusing on these issues in the future.

6. REFERENCES
[1] J. A. Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, International Computer Science Institute, U.C. Berkeley, 1998.
[2] Z. J. Chuang and C. H. Wu. Emotion recognition using acoustic features and textual content. In 2004 IEEE International Conference on Multimedia and Expo, pages 53-56, 2004.
[3] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[4] P. Ekman. Are there basic emotions? Psychological Review, 99(3):550-553, 1992.
[5] P. Ekman. An argument for basic emotions. Cognition and Emotion, 6(3/4):169-200, 1992.
[6] P. Ekman. Emotions revealed: recognizing faces and feelings to improve communication and emotional life. New York: Henry Holt and Company, 2003.
[7] Y. Gu, S. L. Tan, K. J. Wong, M. H. R. Ho, and L. Qu. Emotion-aware technologies for consumer electronics. In IEEE International Symposium on Consumer Electronics, pages 1-4, Portugal, 2008.
[8] Y. Gu, S. L. Tan, K. J. Wong, M. H. R. Ho, and L. Qu. Using GA-based feature selection for emotion recognition from physiological signals. In International Symposium on Intelligent Signal Processing and Communication Systems, pages 1-4, Thailand, 2008.
[9] A. Haag, S. Goronzy, P. Schaich, and J. Williams. Emotion recognition using biosensors: first step towards an automatic system. In Affective Dialogue Systems, Tutorial and Research Workshop, pages 36-48, Kloster Irsee, Germany, June 2004.
[10] J. Kim and E. André. Emotion recognition based on physiological changes in music listening. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(12):2067-2083, 2008.
[11] P. J. Lang, M. M. Bradley, and B. N. Cuthbert. International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical Report A-6, University of Florida, Gainesville, FL, 2005.
[12] Y. L. Lin and G. Wei. Speech emotion recognition based on HMM and SVM. In Proceedings of the 4th International Conference on Machine Learning and Cybernetics, pages 4898-4901, 2005.
[13] H. Liu and L. Yu. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. on Knowledge and Data Engineering, 17(4), 2005.
[14] K. Mera and T. Ichimura. Emotion analyzing method using physiological state. In Knowledge-Based Intelligent Information and Engineering Systems, pages 195-201. Springer Berlin / Heidelberg, 2004.
[15] F. Nasoz, K. Alvarez, C. L. Lisetti, and N. Finkelstein. Emotion recognition from physiological signals for presence technologies. International Journal of Cognition, Technology and Work, Special Issue on Presence, 6(1), 2003.
[16] J. Nicholson, K. Takahashi, and R. Nakatsu. Emotion recognition in speech using neural networks. In Proceedings of the 6th International Conference on Neural Information Processing, pages 495-501, 1999.
[17] R. W. Picard. Affective computing. Technical Report No. 321, MIT Media Laboratory Perceptual Computing Section, 1995.
[18] R. W. Picard. Affective Computing. Cambridge, Mass: The MIT Press, 1997.
[19] R. W. Picard, E. Vyzas, and J. Healey. Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1175-1191, 2001.
[20] P. Pudil, J. Novovičová, and J. Kittler. Floating search methods in feature selection. Pattern Recognition Letters, 15:1119-1125, 1994.
[21] R. M. Rangayyan. Biomedical Signal Analysis: A Case-Study Approach (IEEE Press Series on Biomedical Engineering). Wiley-IEEE Press, 2001.
[22] S. S. Tomkins. Affect, Imagery, Consciousness, Volume I: The Positive Affects. New York: Springer Publishing Company, Inc., 1962.
[23] S. S. Tomkins. Affect, Imagery, Consciousness, Volume II: The Negative Affects. New York: Springer Publishing Company, Inc., 1963.
[24] J. Wagner, J. Kim, and E. André. From physiological signals to emotions: implementing and comparing selected methods for feature extraction and classification. In Proceedings of IEEE ICME International Conference on Multimedia and Expo, pages 940-943, 2005.
[25] A. W. Whitney. A direct method of nonparametric measurement selection. IEEE Transactions on Computers, 20:1100-1103, 1971.
Gaze-Directed Ubiquitous Interaction Using a
Brain-Computer Interface

Dieter Schmalstieg, Graz University of Technology, Inffeldgasse 16, A-8010 Graz, Austria, schmalstieg@icg.tugraz.at
Alexander Bornik, Ludwig Boltzmann Institute for Clinical-Forensic Imaging, Universitätsplatz 4, 2. Stock, A-8010 Graz, Austria, bornik@icg.tugraz.at
Gernot Müller-Putz, Graz University of Technology, Krenngasse 37/IV, A-8010 Graz, Austria, gernot.mueller@tugraz.at
Gert Pfurtscheller, Graz University of Technology, Krenngasse 37/IV, A-8010 Graz, Austria, gert.pfurtscheller@tugraz.at

ABSTRACT
In this paper, we present a first proof-of-concept for using a mobile Brain-Computer Interface (BCI) coupled to a wearable computer as an ambient input device for a ubiquitous computing service. BCI devices, such as electroencephalogram (EEG) based BCI, can be used as a novel form of human-computer interaction device. A user can log into a nearby computer terminal by looking at its screen. This feature is enabled by detecting a user's gaze through the analysis of the brain's response to visually evoked patterns. We present the experimental setup and discuss opportunities and limitations of the technique.

Keywords
Brain computer interface, gaze tracking, electroencephalogram, biometrics, object selection, authentication.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces; K.6.5 [Management of Computing and Information Systems]: Security and Protection—Authentication

1. INTRODUCTION
As suggested in [4], brain-computer interface (BCI) technology can become a useful device for human-computer interaction. BCI is able to capture ambient properties of human activity rather than requiring active operation. This is not only useful for assistive technology, but also allows input for a computer system to be gathered without inducing cognitive load on the user. It is therefore suitable for contextual computing, such as activity recognition. We suggest the integration of BCI into the toolset of user interface designers, assuming that BCI will soon become sufficiently accessible and inexpensive.

In this paper, an electroencephalogram (EEG) based BCI is used to capture brain activity with a wearable computer. Unlike typical laboratory experiments, this wearable hardware setup allows a user's brain activities to be monitored whilst freely roaming an environment. The wearable device therefore enables the prototyping of ubiquitous computing services based on BCI.

To illustrate the potential of mobile BCI as an input device, we have created a system for secure login to a computer terminal within visual proximity of the user by detecting the characteristic brain patterns evoked when looking at the blinking screen of the computer terminal. A proof-of-concept implementation of this type of interaction using real world, secure remote desktop software was implemented. We present first experiences of how to build and operate a ubiquitous computing system involving BCI as a personal input modality. While our approach does not yet fully qualify as biometric identification in the sense that a user's identity is uniquely verified through physical means, it does provide verification of the presence of a digitally authorized user at a particular task location, and has potential to be upgraded to a full biometric identification system with enhanced BCI technology.
2. RELATED WORK
There are a large number of localization and object identification systems, using GPS for outdoor applications, or indoor beacon systems such as RFID, Bluetooth, infrared or ultrawideband radio (e.g., http://www.ubisense.net). Most of these wide-area systems can only determine position, but not viewing direction. In contrast, the ID CAM by Matsushita et al. [7] determines
frequency-encoded patterns from blinking beacons in the environment observed by a camera. In this work, we present a related approach using the human visual system. Unlike location systems, the origins of BCI are not in human-computer interaction, but in assistive technology, and there has been a lot of work on using BCI as assistive technology.

Recently there has been some interest in using variants of BCI technology for non-handicapped people, in order to control aspects of a user interface with little or no attention required from the user. For example, Mann [5] uses biosignal feedback processing for controlling the brightness of a head-mounted display, while Lee and Tan [4] use EEG for task classification. Chen and Vertegaal [2] use EEG for determining mental load with the aim of managing interruptions.

A similar goal is pursued by Vertegaal et al. [17]. They use a different sensor type, namely eye contact sensors, to control interruptions from cell phones. This work exploits gaze direction to derive information, an aspect that is shared with the work presented in this paper. Velichkovsky and Hansen [16] suggest a combination of eye sensing and BCI to control electronic devices. They state their paradigm as "Point with your eye and click with your mind". This suggestion is actually surprisingly close to our intention, and we believe that in this paper we present one of the first practical implementations of such control.

Using the EEG as a biometric is relatively new compared to other methods. Various types of signals can be measured from the EEG, and consequently several aspects have been investigated in terms of user recognition or authentication. Poulos et al. [15] used autoregressive parameters which were estimated from EEG signals containing only the alpha rhythm (eyes closed). Learning Vector Quantization neural networks were used for classification with a success rate of 72-80%. A similar approach was taken by Paranjape et al. [14], who also used autoregressive modeling. They applied discriminant analysis with a classification accuracy of 49% to 85%; here, subjects were tested with both eyes open and closed. Visual evoked potentials (VEP) were used for biometrics by Palaniappan et al. [13][11][12]. In these studies, the authors investigated the gamma band range of 30-50 Hz from VEPs after visual stimuli. In the work by Marcel and Millan [6], the power spectral density (PSD) from 8-30 Hz was used for analyzing the repetitive imagination of either left hand movement, right hand movement or the generation of words beginning with the same random letter. A statistical framework based on Gaussian mixture models and maximum a posteriori model adaptation was used in these experiments. In this work the authors conclude that some mental tasks are more appropriate than others, that the performance degrades over days, and that using training data from two days increases the performance.

3. BCI FOR GAZE DIRECTED OBJECT SELECTION

3.1 Background
Biosignals, such as EEG, can be used to detect human gaze for object selection in the physical environment. This approach is similar to RFID tags or the ID CAM, but the mobile scanner is replaced by human perception. Therefore the approach is essentially a form of gaze tracking. Gaze tracking is normally accomplished by observing a user's pupils with a computer-vision system. In our case, a user's gaze is tracked by detecting the activation patterns triggered in the brain when gazing at a specific object. Compared to beacon based location determination, such as RFID, gaze based selection has a wider range of operation and allows close objects to be distinguished based on their bearing.

One mental strategy for operating an EEG-based BCI is motor imagery; another is to focus gaze and/or visual attention on a flickering light source. In the latter case, either a late cognitive component with a latency of 300 ms (P300) after a rare or significant visual stimulus has to be detected, or the amplitude of the steady state visual evoked potential (SSVEP) has to be measured. The SSVEP is a natural response of the brain evoked by flashing visual stimuli at specific frequencies between 6-30 Hz. SSVEP signals are enhanced when the user focuses selective attention (focuses gaze) on a specific flashing light source [10].

While the P300-based BCI needs complex pattern recognition algorithms to check the absence or presence of the P300 component, the SSVEP-based BCI is simpler and can use a linear threshold algorithm to detect an amplitude increase of the SSVEP signal. A further advantage of the SSVEP-based BCI is its ease of use and the relatively short training time.

Today, SSVEP-based BCI is used to control a robotic hand [9], secondary cockpit functions [8], the display of geographic maps, or communication (spelling) systems [3]. The highest information transfer rate reported is between 60-70 bits/minute.

3.2 Gaze Tracking Procedure

Figure 1: Wearable BCI setup consisting of an EEG helmet and a mobile EEG amplifier, both connected to a UMPC. In the experiment, test screens show a blinking window as a screensaver and the desktop of the UMPC after a successful BCI-triggered login.

In our setup, a mobile user is equipped with a wearable computer (Sony Vaio UX280p) and a portable EEG amplifier (g.tec mobilab, http://www.gtec.at) as shown in Figure 1. Wearable com-
puter and EEG communicate via a Bluetooth personal area
network. The user wears a cap fitted with electrodes.
A characteristic blinking frequency of an observed object
can be determined with the EEG. This allows multiple fre-
quencies to be distinguished within a few seconds. By set-
ting up physical objects to emit blinking patterns, for ex-
ample using LEDs or computer screens in the environment,
it is possible to identify these objects.
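As a rough illustration of how such a blinking frequency can be read out of the EEG, the sketch below compares spectral amplitude at candidate stimulation frequencies and their harmonics; it is a generic FFT-based stand-in under assumed parameters, not the lock-in amplifier system used in the experiments of Section 5.

```python
import numpy as np

FS = 256.0                    # sampling rate in Hz, as used in the experiments
CANDIDATES = [6.25, 8.0]      # candidate blinking frequencies in Hz (assumed values)

def ssvep_amplitude(eeg, freq, fs=FS, harmonics=3):
    """Sum the FFT magnitude at a stimulation frequency and its harmonics."""
    spectrum = np.abs(np.fft.rfft(eeg * np.hanning(len(eeg))))
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    total = 0.0
    for h in range(1, harmonics + 1):
        bin_idx = np.argmin(np.abs(freqs - h * freq))   # nearest FFT bin
        total += spectrum[bin_idx]
    return total

def detect_blink_frequency(eeg, threshold):
    """Return the candidate frequency whose SSVEP amplitude exceeds the threshold, if any."""
    amps = {f: ssvep_amplitude(eeg, f) for f in CANDIDATES}
    best = max(amps, key=amps.get)
    return best if amps[best] > threshold else None
```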
The most obvious way is to directly encode the id of the
perceived object using any combination of frequency mul-
tiplexing and time multiplexing. For example, an IPv4 ad-
dress has 32 bits, which once transmitted and decoded could
be used to access a web service. However, using current BCI
technology, the achievable bit rate is very low, requiring the
user to wait too long for the data to be transmitted.
Therefore, the observed characteristic frequency is used as
an index into a central directory service accessed wirelessly
from the wearable computer. The directory server returns
the actual object id or network id (IP address in case of a
computer terminal).
To increase the number of addressable objects, the search
space is organized hierarchically using a second, complemen-
tary sensor system besides EEG. A sensor system (in our
case Ubisense) provides coarse wide-area location. The lo-
cation system is used to limit the search space to one room,
and the BCI gaze detection selects one computer terminal
within this room.
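A minimal sketch of this two-level resolution is shown below; the directory contents and function names are hypothetical stand-ins for illustration, not the actual translation-service interface.

```python
# Hypothetical in-memory stand-in for the central directory service:
# room id -> {blinking frequency in Hz -> object/terminal id}
DIRECTORY = {
    "room_42": {6.25: "terminal-A", 8.0: "terminal-B"},
    "room_43": {6.25: "terminal-C", 9.0: "projector-1"},
}

def resolve_object(room_id, detected_frequency, tolerance=0.2):
    """Map (room from the location system, frequency from the BCI) to an object id."""
    for freq, object_id in DIRECTORY.get(room_id, {}).items():
        if abs(freq - detected_frequency) <= tolerance:
            return object_id
    return None

# Example: the location system reports room_42, the EEG detects roughly 8 Hz blinking.
print(resolve_object("room_42", 7.9))   # -> "terminal-B"
```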

4. REMOTE, SECURE DESKTOP ACCESS


In [1], a location system based on ultrasonic sensors worn
by the users of a large office environment is used to imple-
ment various ubiquitous computing services based on the
observation of user location.
For example, every computer terminal can be remotely ac-
cessed using the Virtual Network Computer (VNC) service.
Likewise, incoming calls can be automatically routed by the
telecom system to the office phone nearest to a roaming user.
In our setup, the wearable computer can be used to directly connect to a local computer terminal to use the input and output peripherals, while the displayed applications actually run on the wearable computer. In a conference or seminar room, the wearable computer could also be connected to a video projector to give a presentation. The overall procedure is shown on the right in Figure 2. For secure remote desktop access, the user is interested in determining and verifying the identity of the selected object at a particular location, to establish a secure communication channel.

Figure 2: Workflow for establishing a secure VNC connection to a computer terminal after the terminal has been identified using BCI: (1) Localization service determines position, (2) user observes characteristic screen blinking, (3) the code is detected from the EEG signal, (4) code and position are transmitted to the translation server, (5) translation server returns terminal id, (6) terminal id sent to CSpace directory, (7) CSpace directory returns public key of terminal, (8) secure VNC session established with terminal using public key.
The secure channel is based on CSpace (http://www.cspace.in), an open source
secure communication framework. It uses public key cryp-
tography (PKC) to allow distributed applications to com-
municate securely without burdening application developers
with the details of establishing secure connections. CSpace
registers a unique id, public key and current IP address in a
global directory.
An application uses a local CSpace proxy object to ob-
tain the user’s public key and IP address from the global
directory. Then a secure connection tunnel is established to
the destination, which looks to the client application like an
ordinary TCP connection.
In our implementation, a user can connect the VNC ses-
sion originating at the wearable computer to the computer terminal selected by gazing. The current position is determined from a Ubisense indoor location system which covers a large portion of our office space, and the screens of the computer terminals have been set up to run a screen saver emitting characteristic blink patterns picked up through the EEG. The combined code position/frequency is transmitted to a global translation service, which translates the code to a CSpace id. This step is necessary because CSpace ids are globally unique and cannot be chosen arbitrarily.

Since the CSpace directory itself is based on an existing peer-to-peer infrastructure (Kademlia), it cannot be extended with the map service directly. Therefore the translation service was implemented as a new service using the CSpace communication infrastructure. Read access works as described above, while write access for updating the position or frequency of a particular computer terminal is secured using the private key associated with the terminal's CSpace id. A client tool for the translation service can be used to connect securely to the map service for updating the map, and also launches the appropriate blinking widget used to trigger the EEG.

5. EXPERIMENTS
SSVEP was first tested using a setup consisting of two similar screens placed in front of the subjects, each presenting an individual stimulation pattern (flickering). Screen 1 repetitively showed code 1: pause (6s) - f1 (4s) - pause (1s) - f2 (4s), whereas screen 2 presented code 2: pause (6s) - f2 (4s) - pause (1s) - f1 (4s).

EEG was bipolarly recorded from one occipital position (O1 or O2, subject-specific) and digitized with a sampling frequency of 256 Hz. A lock-in amplifier system (LAS) was used to extract the SSVEP amplitudes of 2 specific frequencies (f1 = 6.25 Hz and f2 = 8.0 Hz) and their harmonics (up to 3). A simple one-versus-rest classifier was used to distinguish between those frequencies [7]. A correct login was performed when the pattern of the detected frequencies represented either code 1 or 2 (C1 or C2).

Four different runs (lasting max. 5 min, with pauses defined as 30s) were performed to validate the functionality of the login classification:

- run1: pause-C1-C2-pause-C2-C1-pause-C2-pause-C1
- run2: pause-C2-C1-pause-C1-C2-pause-C1-pause-C2
- run3: lasted 2 min, reading newspaper, no login
- run4: pause-C1-C2-C2-C1-pause-C2-C1-C1-C2-pause

The results of SSVEP-based login are shown in Table 1: TP (true positives, correctly logged in, max. 20 TPs in the whole experiment), FN (false negatives, incorrect logins), FP (false positives, logged in although no login was required).

A more practical experiment was carried out using the mobile setup from Figure 1. In this setup the UMPC was running a modified version of CSpace to connect to three different PCs. The screens of the test PCs showed a blinking pattern transmitting a 2-bit code registered with the translation service as shown in Figure 2, while the UMPC was running the signal processing routines for frequencies of 6 and 9 Hz. Whenever a valid machine code was detected by the BCI software, a remote login to the corresponding test PC found using the translation service was initiated.

The setup was tested with 2 participants. Participants had to perform the following sequence of remote login tasks: pause (no login, 30s) - login P0 - login P2 - login P1 - pause (30s) - login P2 - login P0 - login P1 - pause (30s). Table 2 shows the results. Both users could successfully complete the tasks. However, we noticed errors. Login times ranged from 15s up to three minutes with a median of 27.5s and a standard deviation of 51.2s.

Table 1: Results of SSVEP-based login.

Subject | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
TP | 20 | 9 | 20 | 16 | 19 | 17 | 14 | 20 | 12
FN | 6 | 11 | 1 | 11 | 11 | 12 | 9 | 1 | 9
FP | 9 | 5 | 0 | 2 | 6 | 5 | 7 | 0 | 3

Table 2: True positives (TP), false negatives (FN) and false positives (FP) measured for subjects T1 and T2.

Subject | TP (max. 6) | FN | FP | Total time [m:s]
T1 | 6 | 9 | 1 | 7:46
T2 | 6 | 3 | 1 | 5:25

6. CONCLUSIONS
We have shown that it is possible in principle to use BCI for biometric communication useful in deploying ubiquitous computing services. However, significant improvements are required to make such services practical in terms of robustness and information transfer rate, which are currently very low. Higher rates can be achieved by more efficient BCI (such as the laser based BCI currently being developed), reducing the intervals of blinking stimuli/pauses, exploiting phase as well as amplitude information, and using more than two blinking frequencies. Even more exciting is the possibility of using subject-specific frequencies and sample positions, which may yield better efficiency but also allow the creation of unique signatures per human which cannot easily be forged and allow two-way authorization between human and environment.

7. ACKNOWLEDGMENTS
This work was sponsored by the European Union contract FP6-2004-IST-4-27731 (PRESENCCIA) and the Austrian Science Fund FWF contract Y193. Special thanks to g.tec for the equipment loan and assistance.

8. REFERENCES
[1] M. Addlesee, R. Curwen, S. Hodges, J. Newman, P. Steggles, A. Ward, and A. Hopper. Implementing a sentient computing system. Computer, 34(8):50-56, 2001.
[2] D. Chen and R. Vertegaal. Using mental load for managing interruptions in physiologically attentive user interfaces. In CHI '04: CHI '04 extended abstracts on Human factors in computing systems, pages 1513-1516, New York, NY, USA, 2004. ACM.
[3] X. Gao, D. Xu, M. Cheng, and S. Gao. A BCI-based environmental controller for the motion-disabled. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 11(2):137-140, June 2003.
[4] J. C. Lee and D. S. Tan. Using a low-cost electroencephalograph for task classification in HCI research. In UIST '06: Proceedings of the 19th annual ACM symposium on User interface software and technology, pages 81-90, New York, NY, USA, 2006. ACM.
[5] S. Mann, D. Chen, and S. Sadeghi. Hi-cam: Intelligent biofeedback processing. Wearable Computers, IEEE International Symposium, 0:178, 2001.
[6] S. Marcel and J. d. R. Millan. Person authentication using brainwaves (EEG) and maximum a posteriori model adaptation. IEEE Trans. Pattern Anal. Mach. Intell., 29(4):743-752, 2007.
[7] N. Matsushita, D. Hihara, T. Ushiro, S. Yoshimura, J. Rekimoto, and Y. Yamamoto. ID CAM: A smart camera for scene capturing and ID recognition. In ISMAR '03: Proceedings of the 2nd IEEE/ACM International Symposium on Mixed and Augmented Reality, page 227, Washington, DC, USA, 2003. IEEE Computer Society.
[8] M. Middendorf, G. McMillan, C. G., and J. K. Brain-computer interfaces based on the steady-state visual-evoked response. IEEE Trans. Rehabil. Eng., 8:211-214, 2000.
[9] G. Mueller-Putz and G. Pfurtscheller. Control of an electrical prosthesis with an SSVEP-based BCI. IEEE Transactions on Biomedical Engineering, 55:361-364, 2008.
[10] G. Mueller-Putz, R. Scherer, C. Brauneis, and G. Pfurtscheller. Steady-state visual evoked potential (SSVEP)-based communication: impact of harmonic frequency components. Journal of Neural Engineering, 2:123-130, 2005.
[11] R. Palaniappan. Utilizing gamma band to improve mental task based brain-computer interface design. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 14(3):299-303, Sept. 2006.
[12] R. Palaniappan and D. Mandic. Biometrics from brain electrical activity: A machine learning approach. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(4):738-742, April 2007.
[13] R. Palaniappan, R. Paramesran, S. Nishida, and N. Saiwaki. A new brain-computer interface design using fuzzy ARTMAP. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 10(3):140-148, Sept. 2002.
[14] R. Paranjape, J. Mahovsky, L. Benedicenti, and Z. Koles. The electroencephalogram as a biometric. In Electrical and Computer Engineering, 2001. Canadian Conference on, volume 2, pages 1363-1366, 2001.
[15] M. Poulos, M. Rangoussi, V. Chrissikopoulos, and A. Evangelou. Parametric person identification from the EEG using computational geometry. In Electronics, Circuits and Systems, 1999. Proceedings of ICECS '99. The 6th IEEE International Conference on, volume 2, pages 1005-1008, Sep 1999.
[16] B. M. Velichkovsky and J. P. Hansen. New technological windows into mind: there is more in eyes and brains for human-computer interaction. In CHI '96: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 496-503, New York, NY, USA, 1996. ACM.
[17] R. Vertegaal, C. Dickie, C. Sohn, and M. Flickner. Designing attentive cell phone using wearable eyecontact sensors. In CHI '02: CHI '02 extended abstracts on Human factors in computing systems, pages 646-647, New York, NY, USA, 2002. ACM.
Relevance of EEG Input Signals
in the Augmented Human Reader
Inês Oliveira, CICANT, University Lusófona, Campo Grande, 376, 1749-024, Lisbon, PORTUGAL, ines.oliveira@ulusofona.pt
Ovidiu Grigore, Nuno Guimarães, Luís Duarte, LASIGE/FCUL, University of Lisbon, Campo Grande, 1749-016, Lisbon, PORTUGAL, {ogrigore | nmg}@di.fc.ul.pt

ABSTRACT
This paper studies the discrimination of electroencephalographic (EEG) signals based on their capacity to identify silent attentive visual reading activities versus non reading states.
The use of physiological signals is growing in the design of interactive systems due to their relevance in the improvement of the coupling between user states and application behavior.
Reading is pervasive in visual user interfaces. In previous work, we integrated EEG signals in prototypical applications designed to analyze reading tasks. This work searches for the signals that are most relevant for reading detection procedures. More specifically, this study determines which features, input signals, and frequency bands are more significant for discrimination between reading and non-reading classes. This optimization is critical for an efficient and real-time implementation of EEG processing software components, a basic requirement for the future applications.
We use probabilistic similarity metrics, independent of the classification algorithm. All analyses are performed after determining the power spectrum density of the delta, theta, alpha, beta and gamma rhythms. The results about the relevance of the input signals are validated against functional neurosciences knowledge.
The experiments have been performed in a conventional HCI lab, with non-clinical EEG equipment and setup. This is an explicit and voluntary condition. We anticipate that future mobile and wireless EEG capture devices will allow this work to be generalized to common applications.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces – user centered design, evaluation, interaction styles.

General Terms
Design, Experimentation, Human Factors, Measurement

Keywords
Reading Detection, HCI, EEG Processing and Classification, Similarity Metrics, Feature Relevance Measurement.

1. INTRODUCTION
The understanding and use of human physical and physiological states in computational systems increases the coupling between the user and the application behavior. The integration of physiological signals in applications is relevant in the design of universally-accessible interactive systems and will become more relevant as new computing paradigms such as ubiquitous computing [7] and ambient intelligence [1],[14] develop.

The use of neurophysiological signals, and in particular electroencephalograms (EEG), has been widely reported in the context of an important example of coupled interaction systems: BCIs [4],[5],[16]. These interfaces explore the information at its source, the brain. EEG signals are frequently chosen because of their fine temporal resolution and non-invasiveness [9], and also due to the relatively low cost of the capture device setup.

Visual user interfaces often require reading skills. The users' reading flow is highly influenced by their concentration and attention while interacting with applications. The application's visual characteristics and the users' cognitive state can decrease readability and degrade the interaction.

Augmented reading applications should adapt to the user's reading flow through the detection of reading and non-reading states. Reading flow analysis also improves the understanding of the users' cognitive state while interacting with the applications and improves the current empirical style of usability testing [9]. In previous work, we integrated EEG signals in two prototypical applications designed to analyze and assist reading tasks. These applications are briefly described further down in this paper.

This paper focuses on the discrimination of EEG signals based on their relevance with respect to the identification of silent attentive reading versus non reading tasks, therefore finding the importance of each EEG signal for the reading detection procedure. The ultimate goal of this study is to allow a robust selection and weighting of input signals, which we deem critical for a feasible, efficient, and real-time implementation of EEG processing software components, our augmentation approach.

The EEG processing literature generally works with feature vectors of considerable size. We have dealt with data dimensionality reduction in the processing pipeline by using Principal Component Analysis [9]. PCA does not consider the spatial distribution of the input signals nor functional neurosciences knowledge. Neurosciences map cognitive processes onto skull areas. Quantifying the importance of each input signal in relation to reading detection will help to verify which electrodes and frequency bands are more involved in the reading cognitive process, and builds on functional neurosciences knowledge.
The analysis of EEG signal relevance is performed after determining the power spectrum density (PSD) of the delta, theta, alpha, beta, and gamma rhythms (the known EEG frequency bands) in each of the captured EEG streams. We then apply probabilistic similarity measures [10], which are independent of the classification algorithm, to each of these streams to detect the main differences and to discriminate between visual reading and non reading activities. All results obtained about the importance of the input signals are provided and crossed against functional neurosciences knowledge.

Our experiments were performed in a conventional HCI lab, with non-clinical EEG capture equipment. This is not a limitation to overcome but rather a feature and an a priori requirement of our design. Even if the results can be further validated in clinical settings (in vitro), our goal is to address real life situations (in vivo), which have harsher stability, noise and artifact conditions. We predict future mobile and wireless EEG capture devices will allow the generalization and extension of this work to common tools and applications. The broader goal of this work is to design and develop usable and robust software components for integration in interactive systems that reach higher adaptation levels through this augmentation approach.

2. EXPERIMENTAL SETTINGS
EEG signals were captured using MindSet-1000, a simple digital system for EEG mapping with 16 channels, connected to a PC using a SCSI interface. These channels are connected through pure tin electrodes (sensors) to a cap made of elastic fabric, produced by Electro-Cap International.

Figure 1. MindSet-1000 and Electro-Cap Intl Cap.

Figure 2 shows the electrode mapping used in our study. The EEG signals are amplified in a differential manner relative to the ear electrodes and are sampled at a 256Hz frequency. All requirements indicated by suppliers and technicians were fulfilled [9]. These included "grounding" the subjects and keeping the impedance in each electrode below 6000Ω, through the thorough application of conductive gel.

Figure 2. Mapping of used EEG electrodes (Int 10-20 method).

The first 5000ms and the last 3000ms of each trial are discarded to avoid possible artifacts caused by the start and end of the recording process. To assure the reliability of the capture procedure, the experiment was also tested using a professional medical capture device, in use in a hospital, whose setup was entirely prepared and tuned by expert technicians [9]. The results obtained with both capture devices were validated by an EEG specialist and a consistent set of sample results was produced.

2.1 Read and Not Read Experience
The capture experiments, the object of the relevance analysis described in this paper, were based on the presentation of alternating blank and text screens containing about 40 lines of daily news text. The duration of these screens differed according to the ability to keep subjects concentrated on the task [9]. Text screens were presented for longer periods (30s) than blank screens (20s). These types of periods were interlaced: one reading text sample, followed by 2 watch-only blank screens, and again back to reading. All these periods were captured separately, allowing a small resting period, where the signal was not recorded.
Each capture trial included approximately 120s of both sample classes. All data was recorded, without any previous or special training, from a right handed female subject, mid thirties and without known vision disabilities (see the discussion on this choice in the final section).

2.2 Assisted Reading Prototypes
In the context of these experiences, we designed simple prototype tools. ReadingTester tests in real time "reading event scripts", sequences of events with a certain duration that are generated by the application. The subject is exposed to these events, and simultaneously the EEG is captured and analyzed. A detection performance report is built when the detection process stops.

Figure 3. Assisted Reading Prototypes.

ReadingScroller aims at controlling text scrolling through EEG signals: while the user is reading, the scrolling should occur; if the user stops reading, the scrolling should also stop. This is a trivial (from the functionality's viewpoint) Brain Computer Interface exploiting the reading detection capability, but with non-trivial design challenges not reported here.
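The relevance analysis in the next section operates on band power spectral densities computed per electrode. A minimal sketch of this preprocessing step follows, assuming SciPy's Welch estimator and conventional band limits; the exact limits and estimator used in the study are not stated here and are therefore assumptions.

```python
import numpy as np
from scipy.signal import welch

FS = 256                 # sampling frequency of the capture setup (Hz)
BANDS = {                # assumed band limits in Hz
    "delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
    "beta1": (13, 20), "gamma": (30, 45),
}

def band_powers(channel):
    """Return the PSD power of each rhythm for one electrode's signal."""
    freqs, psd = welch(channel, fs=FS, nperseg=FS * 2)
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = np.trapz(psd[mask], freqs[mask])   # integrate PSD over the band
    return powers

# 16 electrodes x 5 bands -> the 16x5 feature streams analyzed in Section 3, e.g.:
# features = [band_powers(eeg[ch]) for ch in range(16)]
```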
3. RELEVANCE ANALYSIS Kullback-
Leibler
Relevance analysis is performed after determining the PSD of
delta (δ), theta (θ), alpha (α), beta1 (β1), and gamma (γ) rhythms
in each of the 16 electrodes’ input streams. This results in 16x5
PSD features streams, each with reading and non-reading J-Coefficient
samples. We then determined probabilistic dissimilarity measures
separately in each of these streams, in order to quantify the
dissimilarity between these two sample classes. The most relevant
streams are those revealing larger significant differences between Information
the reading and non-reading classes. Radius
3.1 Probabilistic Dissimilarity Measures
Relative similarity is the relationship between two entities that χ2 Divergence
share common characteristics with different degrees [10]. The
larger it is, the greater the resemblance between the compared
objects. Relative dissimilarity, on the other hand, focuses on the
differences: the smaller it is, the greater the resemblance between
the compared objects [10]. In our work, we compare the
Hellinger
dissimilarity between reading and non reading samples sets. Both Coefficient
sets were approximated through Normal probability functions,
since their samples result from discrete observations belonging to
a large vector space.
Table 1 summarizes the probabilistic dissimilarity measures that
were tested [2]. µi and Σi are respectively the mean vector and
Chernoff
covariance matrix of the Normal distribution, noted as Ni, which
Coefficient
approximates class i samples set. DM is the squared Mahalanobis1
distance between their means. In all the presented formulas, we Bhatthacharyy
assume Σ1 ≠ Σ2. a Coefficient

For the sake of reproducibility of this work, the remainder of this


section briefly describes each one of these measures.
3.1.1 Kullback-Leibler (KL) Divergence Based Distance Measures
Kullback-Leibler divergence is an asymmetric measure, also known as Relative Entropy or Information Gain. It quantifies, in bits, how close a distribution F1 is to a (model) distribution F2 [12] or, more precisely, the loss of information we incur if we take F1 instead of F2 [8]. By definition, this measure between probability distributions p1(x) and p2(x) is determined by [10],[15],[8]:
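In its usual form (expressed in bits when the logarithm is taken to base 2):

\[ KL(p_1 \| p_2) = \int p_1(x)\,\log\frac{p_1(x)}{p_2(x)}\,dx \]

and, for two d-dimensional Normal distributions N1(µ1, Σ1) and N2(µ2, Σ2), the standard closed form (written in the µi, Σi, DM notation introduced above) is

\[ KL(N_1 \| N_2) = \frac{1}{2}\left[ \ln\frac{|\Sigma_2|}{|\Sigma_1|} + \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2-\mu_1)^{T}\Sigma_2^{-1}(\mu_2-\mu_1) - d \right], \]

where the quadratic term is the squared Mahalanobis distance DM between the means.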

For two Normal distributions N1(x) and N2(x) it becomes the closed form displayed in Table 1.

Table 1. Probabilistic dissimilarity measures.

KL divergence cannot be considered a metric because it is asymmetric, that is, KL(p1, p2) ≠ KL(p2, p1) [12],[8]. There are, however, measures such as the J-Coefficient and the Information Radius, which are symmetric versions of the KL divergence. The J-Coefficient (JC) [2] is calculated by applying the KL formula symmetrically, and the Information Radius (IR), also known as Jensen-Shannon divergence, is a smoothed symmetric version that is the average of the KL distances to the average distribution [15]:
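Using the usual definitions of these two symmetrized variants (up to constant factors, which differ between authors):

\[ JC(p_1, p_2) = KL(p_1 \| p_2) + KL(p_2 \| p_1) \]

\[ IR(p_1, p_2) = \frac{1}{2} KL\!\left(p_1 \,\middle\|\, \frac{p_1+p_2}{2}\right) + \frac{1}{2} KL\!\left(p_2 \,\middle\|\, \frac{p_1+p_2}{2}\right) \]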
3.1.2 χ2 Divergence
χ2 divergence is an asymmetric measure between probability distributions p1(x) and p2(x), and is determined by [8]:
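In its usual form:

\[ \chi^2(p_1, p_2) = \int \frac{\left(p_1(x) - p_2(x)\right)^2}{p_2(x)}\,dx \]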

Convergence in χ2 divergence implies convergence in KL divergence, but the converse is not true [8]. This is because χ2 divergence is topologically strictly stronger than KL divergence, since KL(P,Q) ≤ χ2(P,Q).
3.1.3 Hellinger Coefficient (HC) Based Measures
The Hellinger Coefficient (HC) of order t is a similarity measure between probability distributions p1(x) and p2(x), defined in [8] (see the formulas below). From this similarity-like measure, several dissimilarity coefficients have been derived. The Chernoff coefficient (CC) of order t, defined in [5], is related to the KL divergence through its slope at t = 0; it is smaller than the KL divergence and less sensitive than the KL divergence to outlier values [8]. There is also a special symmetric case for t = 1/2, named the Bhattacharyya Coefficient (BC), defined in [10]. BC measures the amount of overlap between two probability distributions.
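Using the usual definitions of these three quantities (the exact normalization used in [5],[8],[10] may differ):

\[ HC_t(p_1, p_2) = \int p_1(x)^{t}\, p_2(x)^{1-t}\,dx \]

\[ CC_t(p_1, p_2) = -\ln \int p_1(x)^{t}\, p_2(x)^{1-t}\,dx \]

\[ BC(p_1, p_2) = \int \sqrt{p_1(x)\,p_2(x)}\,dx = HC_{1/2}(p_1, p_2), \qquad D_B = -\ln BC(p_1, p_2), \]

where D_B is the associated Bhattacharyya distance.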
3.1.4 Minkowski Based Measures
The Minkowski Lp distance, with p = 1, 2, 3, …, is defined in [2],[5] (see below). All Minkowski measures are symmetric and differ only in the way they amplify the effect of outlier values. The Minkowski distances of first and second order, the L1 and L2 distances, are also known as the Manhattan and Euclidean distances, respectively. The L2 measure [10] defines the distance between two points in a Euclidean n-space, a real coordinate space with n coordinates, in this case our samples.
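For two n-dimensional feature vectors x and y, the usual forms are:

\[ L_p(x, y) = \left(\sum_{i=1}^{n} \left|x_i - y_i\right|^{p}\right)^{1/p}, \qquad L_2(x, y) = \sqrt{\sum_{i=1}^{n} \left(x_i - y_i\right)^{2}} \]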
3.2 Relevance Measurement Method
We assume that relevance is directly proportional to the differences determined by the dissimilarity measures, so our procedure is based on ordering the 16x5 feature streams according to the calculated dissimilarities.
The first step is applying all the dissimilarity measures to the feature streams. This results in 16x5 (80) real values per measure, one for each stream, corresponding to the measured difference. In order to compare all these values, all streams are normalized and turned into percentages by applying the following formulas:
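One plausible reading of this description is a min-max normalization per measure followed by a weighting relative to the measure's total; the notation d(m,s) for the raw difference of measure m on stream s is ours, not the paper's:

\[ \hat{d}_{m,s} = \frac{d_{m,s} - \min_{k} d_{m,k}}{\max_{k} d_{m,k} - \min_{k} d_{m,k}}, \qquad w_{m,s} = 100\% \times \frac{\hat{d}_{m,s}}{\sum_{k} \hat{d}_{m,k}} \]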
The first equation normalizes the range of each difference to the interval [0,1]. The second weights it in relation to the overall results obtained with the measure.
At this stage, after observing all the produced graphics, the Minkowski based measures were discarded: they showed results that were too divergent from the ones provided by the rest of the metrics.
The final 16x5 weights, which we use to quantify the importance of each of the 16x5 streams, are the average over all the measures. These weights are then ranked from minimum (1) to maximum (80) importance, and these ranks are the results analyzed in order to determine signal relevance.
3.3 ANOVA Analysis
To statistically validate our conclusions we performed Analysis of Variance (ANOVA). It analyses the variation present in our experiments by statistically testing whether the statistical parameters of our groups of measures (bands, electrodes, etc.) are consistent, assuming that the sampled populations are normally distributed. If, for instance, this consistency holds for two electrodes or bands, then we can safely consider them correctly ranked.
ANOVA results are presented as a graphic or a table (Figure 4). The center line in the graphic represents the mean of each group, the polygon lines above and below it show the mean +/- variance, and the line segments delimit the confidence interval.

SV              SS     DF   MS     F      P         Crit. F
Between Groups  167.1   1   167.1  107.3  6.02E-08  4.6
Within Groups    21.8  14     1.6
Total           188.8  15

Figure 4. ANOVA for left (1) vs. right (2) hemispheres.

The main ANOVA formula is given by:
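For a one-way analysis with k groups and N observations in total, the standard quantities are:

\[ F = \frac{MS_{between}}{MS_{within}}, \qquad MS_{between} = \frac{SS_{between}}{k-1}, \qquad MS_{within} = \frac{SS_{within}}{N-k} \]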
where the numerator is the variance between groups and the denominator is the variance within groups. The numerators of these quantities are reported in the Sum of Squares (SS) column of the table in Figure 4, while the Degrees of Freedom (DF) column contains the denominators; the Total row is the sum of the columns, and the Mean Squares (MS) column is SS/DF. The critical F is obtained from the F distribution table, and P is the probability of obtaining an F value at least as large as the observed one. In the case of the values above, F is much greater than the critical F and P is very small, so we can state that the statistical parameters of our groups of measures are consistent.
4. PROCESSING AND ANALYSIS FRAMEWORK
All the processing functionalities are encapsulated in the EEGLib framework, an object-oriented toolkit implemented in C++ and MATLAB [9],[3]. This framework provides tools for feature extraction and classification, as well as components for data modeling, such as EEG streams, frames, and iterators.
EEGLib includes several common EEG feature extraction procedures, including wavelets, power spectrum density (PSD), Event Related Synchronization (ERS) and other statistical measures. In the work described here, we use the mean PSD of the Delta (δ, 1-4 Hz), Theta (θ, 4-8 Hz), Alpha (α, 8-13 Hz), Beta1 (β1, 13-17 Hz), and Gamma (γ, 51-99 Hz) rhythms in all 16 electrodes. The analysis thus considers feature vectors composed of 16x5 real values. The mean PSD is determined in 1000 ms frames with a 500 ms overlap.
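The following sketch illustrates this feature extraction step, using SciPy's Welch estimator as a stand-in for EEGLib's PSD routine (an assumption); the band limits, frame length, overlap and sampling rate are the ones stated above:

import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta1": (13, 17), "gamma": (51, 99)}

def band_psd_features(eeg, fs=256, frame_ms=1000, overlap_ms=500):
    """eeg: array (n_channels, n_samples). Returns (n_frames, n_channels*5)
    mean-PSD feature vectors, mirroring the 16x5 values used here."""
    frame = int(fs * frame_ms / 1000)
    step = int(fs * (frame_ms - overlap_ms) / 1000)
    feats = []
    for start in range(0, eeg.shape[1] - frame + 1, step):
        seg = eeg[:, start:start + frame]
        f, pxx = welch(seg, fs=fs, nperseg=frame)         # PSD per channel
        row = [pxx[:, (f >= lo) & (f < hi)].mean(axis=1)  # mean PSD per band
               for lo, hi in BANDS.values()]
        feats.append(np.concatenate(row))
    return np.array(feats)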
Our framework also has tools that support various standard learning methods, including neural networks, K-Nearest Neighbors (KNN), AdaBoost and Support Vector Machines. We have tested all of these tools but, for simplicity, the current reading processing procedures use the KNN classifier provided in the SPRTOOL MATLAB toolbox.
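For illustration, an equivalent classification step with scikit-learn's KNN in place of the SPRTOOL MATLAB classifier actually used:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def evaluate_reading_classifier(features, labels, k=5):
    """features: (n_frames, 80) band-PSD vectors; labels: 1 = reading, 0 = non-reading."""
    knn = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(knn, features, labels, cv=5).mean()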
5. RESULTS AND DISCUSSION
This section presents and discusses the results of the relevance ordering of input signals and bands.
5.1 EEG Signals Relevance Ordering
The relevance measurement ranks of all bands were averaged for each electrode. Figure 5 below presents the average values determined over all sample sets. The y-axis represents the average importance rank of all the features of each electrode. For instance, the features relative to the O1 electrode have an average rank of 60 out of 80.
Figure 5. Average input signal relevance (ranks). (Bar chart of the average importance rank, 0-70, per electrode: FP1, FP2, F7, F3, F4, F8, T3, C3, C4, T4, T5, P3, P4, T6, O1, O2.)

Figure 6 shows the locations of the highest ranked electrodes. The electrodes that are not signaled have a rank below 27.

Figure 6. Average input signal relevance (locations). (Scalp map; shading thresholds: rank >55, >47, >42, >38.)

It is clear that the main differences are dominant in the left hemisphere. This is in agreement with the study on reading tasks conducted by Bizas et al. [2], whose findings suggested that changes in PSD between reading tasks are restricted to the left hemisphere. This hemisphere specialization is also confirmed by functional neuroscience experiments: about 90% of the adult population has left-hemisphere dominance for language [13]. Broca's and Wernicke's regions, which are respectively responsible for speech and for language understanding, are located in the left hemisphere [13]. Wernicke's area influence is clear in our study: there is a visible importance elevation near electrodes T5, P3 and O1. We also expected a more visible influence of Broca's area in our results, near F7, C3 and T3, but this was inconclusive.
The highest ranked electrodes are the frontal polar and occipital electrodes. The occipital lobe is where visual processing occurs [13], which supports our results regarding reading versus non-reading cognitive tasks. The frontal lobe is responsible for higher level processes, but we believe it is more likely that the frontal differences are due to eye artifacts.
5.2 Bands Relevance Ordering
The relevance measurement ranks of all electrodes were averaged for each band. Figure 7 shows the average results determined over all samples. The y-axis indicates the average importance rank of all the features of each band. For instance, the features relative to the α band have an average rank of 50 out of 80.

Figure 7. Average band relevance. (Bar chart of the average importance rank, 0-60, per band.)
γ and δ bands are visibly less ranked than θ, α, and β1. Due to previous related work, we were expecting more relevant differences between these two groups of bands. The γ rhythm is considered an important marker for attention [6]: it appears that the visual presentation of attended words induces a γ rhythm in the major brain regions associated with reading, and that this effect is significantly attenuated for unattended words. A possible justification for the poor γ band performance in our work is that our experiments focus on reading instead of stressing attention. These results show the difference between visual reading and attention in cortical activity.
Left hemisphere δ, θ, and β1 rhythms were already used for differentiating reading tasks [13]. Significant differences in these rhythms between semantic tasks, the ultimate attentive reading activity, and visual, orthographic, and phonological tasks have been reported. The differentiation of the θ rhythm was confirmed in our study, but this did not hold for the δ and β1 bands. This cannot be due to the averaging effect, since these results are consistent with the non-averaged values (see the next section).
The α rhythm, related to the resting condition, demonstrated good performance in our study, in spite of not being referenced in related work as much as the other bands. Probably our non-reading task behaves like a mental resting activity when compared to the reading task, thus causing the differentiation between the α bands of the sample sets of the two classes.
5.3 Total Features Ordering
The importance measurement values were averaged for each of the 16x5 features. Table 2 below displays the 10 highest average results, determined over all sample sets for each feature.

Rank   Average Relevance   Electrode   Band
1      79.4                O1          Alpha
2      77.8                P3          Alpha
3      74.9                O1          Beta1
4      74.8                P3          Theta
5      73.9                O2          Alpha
6      73.5                O1          Theta
7      72.4                T5          Alpha
8      69.0                O2          Beta1
9      68.1                O2          Theta
10     66.3                T5          Theta

Table 2. 10 highest average feature relevance.
This ranking reinforces all the previous discussion, because all
these values are located in the left hemisphere, and α and θ are the
most frequent bands. It also shows that the averaging introduced
in the previous analyses may minimize the importance of certain
electrodes, namely P3 that appears twice in the top 10.
5.4 ANOVA Analysis Results
We performed several ANOVA test runs with different groups of measures, namely: left versus right hemisphere, skull areas, bands, electrodes, and features.
Figure 4 above (section 3) presented the ANOVA graphic and table for the left and right hemispheres. These calculations were performed after averaging the ranks of all features related with each hemisphere. As we stated before, the results in the table indicate that the statistical parameters of the analyzed groups are consistent. This conclusion is reinforced by the graphic, which shows that the average ranks of both groups are statistically distinct with no possible overlap. We can also see that the left hemisphere importance is significantly higher than that of the right hemisphere.
The next figure shows the ANOVA result taking the five tested bands as groups: δ, θ, α, β1, and γ (in this order). These calculations were also performed after averaging the ranks of all features related with each band.

SV              SS      DF   MS      F    P          Crit. F
Between Groups  4158.5   4   1039.6  9.2  3.372E-05  2.6
Within Groups   3944.6  35    112.7
Total           8103.1  39

Figure 8. ANOVA for δ(1), θ(2), α(3), β1(4) and γ(5) bands.

The average ranks of θ and α were relatively higher and differentiated from the rest of the bands. The γ band performed poorly, showing the lowest rank and the widest variation. According to the previous reasoning about the ANOVA table results, we can also state that the statistical parameters of these groups are consistent, in spite of F being close to its critical value.
To further detail this analysis we performed Multiple Comparisons, a technique that complements ANOVA and looks for specific significant differences between pairs of groups by comparing their means. Figure 9 contains the multiple comparison results for delta, theta, alpha, and beta1. Each line segment represents the comparison interval of each group.

Figure 9. Multiple Comparison for δ(1), θ(2), α(3) and β1(4).

The δ and β1 band comparison intervals were significantly different from the ones determined for the θ and α rhythms. This also means that the θ and α bands were significantly higher and distinct from the rest of the rhythms and, for this reason, they appear to be more relevant for classifying reading versus non-reading tasks.
Figure 10 below displays the ANOVA result for specific skull areas: frontal polar, frontal, central, temporal, occipital, and parietal regions. These calculations were performed after averaging the ranks of all features related with each area.

SV              SS      DF   MS     F     P        Crit. F
Between Groups  4893.1   5   978.6  50.9  9.5E-17  2.4
Within Groups    807.4  42    19.2
Total           5700.5  47

Figure 10. ANOVA for frontal polar (1), frontal (2), central (3), temporal (4), occipital (5) and parietal (6) areas.
These groups' statistical parameters are also consistent, as in the previous tables, since F is significantly higher than its critical value and P is extremely small. In accordance with our previous results, we obtained average ranks relatively higher and distant from the remaining regions for the frontal polar and occipital areas.
We then repeated the ANOVA process for all input signals, using the average ranks of all features related with each electrode (see Figure 11). We did not discard any input signal at this stage, in order to verify the averaging effect that we could get in the previous calculations. These results confirmed the previous discussion about areas: frontal polar and occipital electrodes revealed higher ranks than the remaining electrodes, in spite of not being distant enough, especially the frontal polar ones.

SV              SS       DF    MS      F     P        Crit. F
Between Groups  15849.8   15   1056.7  31.1  5.9E-33  1.8
Within Groups    3810.4  112     34.0
Total           19660.2  127

Figure 11. ANOVA for FP1(1), FP2(2), F7(3), F3(4), F4(5), F8(6), T3(7), C3(8), C4(9), T4(10), T5(11), P3(12), P4(13), T6(14), O1(15) and O2(16).

The values in the table also confirm these rankings as statistically consistent: F is once more greater than its critical value and P is very small. We then applied multiple comparisons to better analyze the differences among electrodes (see Figure 12), and obtained approximately three groups: occipital, frontal polar, and the remaining electrodes. Only for the occipital electrodes was the comparison interval significantly different from the remaining electrodes group.

Figure 12. Multiple Comparison for FP1(1), FP2(2), F7(3), F3(4), F4(5), F8(6), T3(7), C3(8), C4(9), T4(10), T5(11), P3(12), P4(13), T6(14), O1(15) and O2(16).

Finally, we applied ANOVA to individual features, reducing their number to 16 by applying the previous conclusions (see Figure 13): features were restricted to the frontal polar and occipital areas, and we also discarded the γ band.
The table supports that these rankings are statistically consistent, although we obtained here the lowest F value. However, F is still greater than its critical value and P is very small.

SV              SS       DF    MS      F    P         Crit. F
Between Groups  17535.6   15   1169.0  6.7  4.59E-10  1.8
Within Groups   19584.3  112    174.9
Total           37119.9  127

Figure 13. ANOVA for FP1(1 to 4), FP2(5 to 8), O1(9 to 12) and O2(13 to 16) with bands δ, θ, α and β1 respectively.

δ band features from both occipital electrodes (9 and 13) performed poorly and showed great variability, but the remaining features of these input signals were very concentrated and showed a relative distance with respect to the rest of the groups. The variation of the frontal polar related features (1 to 8) was more significant, especially for the δ and β1 bands.
Figure 14. Multiple Comparison for FP1(1 to 4), FP2(5 to 8), O1(9 to 12) and O2(13 to 16) with bands δ, θ, α and β1 respectively.

Multiple comparisons (see Figure 14) revealed approximately three groups of comparison intervals: (I) features 10 to 12 and 15, with higher ranks and significantly different from the next group; (II) features 1, 4, 5, 8, 9 and 13, with lower ranks and significantly different from the previous group; and (III) the remaining features. Table 3 shows more detailed data about the first two groups.

Group   Feature   Electrode   Band
1       10        O1          Theta (θ)
1       11        O1          Alpha (α)
1       12        O1          Beta1 (β1)
1       15        O2          Alpha (α)
2       1         FP1         Delta (δ)
2       4         FP1         Beta1 (β1)
2       5         FP2         Delta (δ)
2       8         FP2         Beta1 (β1)
2       9         O1          Delta (δ)
2       13        O2          Delta (δ)

Table 3. Details of the two significantly different groups of comparison intervals (feature numbering as in Figure 14).

Almost all O1 electrode comparison intervals were situated in the higher ranked group, revealing that this electrode appears to be consistently different between the reading and non-reading classes, since most of its bands were affected. δ band intervals were also consistently located in the lower ranked group, showing that this band seems to be less relevant than the remaining rhythms.
6. CONCLUSIONS AND FUTURE WORK
This paper presented a study about the discrimination between the relevance of different types of EEG input signals with respect to their ability to identify silent attentive visual reading versus non-reading cognitive tasks.
We have demonstrated that EEG input signals are not equally significant, and that we can quantify their contributions to the distinction between reading and non-reading cognitive tasks. More than that, we outlined a systematic and quantitative method for relevance determination that can be applied to other cognitive tasks.
We presented results that reinforce that the left hemisphere is dominant regarding reading tasks: we showed that its input signals consistently revealed higher dissimilarities between reading and non-reading samples than their homologues in the right hemisphere. The results also indicated the frontal polar and occipital area related features, especially the latter, as well as the α and θ band related features, as being more relevant than the remaining values. In opposition to some related work [12],[13], the γ and δ band results consistently performed poorly. In summary, we can state that:
For EEG-based silent reading detection, use mainly O1 (θ, α, β1) and O2 (α).
With this method, we can now proceed to the design of focused applications that exploit this significantly reduced set of human physiological features. The above specific conclusions are a first step towards the exploitation of this reduced set of signals in interactive applications targeted at assisted reading (such as ReadingScroller, briefly mentioned above). Having a reduced set of signals, optimized for the cognitive task at hand, is a critical requirement for the optimization of the real-time processing and for the use of the future light and portable EEG devices, for which results are being reported that justify our expectations [11].
Our work elicits the following additional requirements and ideas that should be explored in sequence.
Calibration Procedures Design
Although our results were consistent with neuroscience knowledge and some of the existing related work, the presented analysis was performed with a single subject and a limited set of samples. This was a conscious choice at this stage, made to minimize the set of variables and tune the method. The repetition of the procedure with a larger number of subjects will now evaluate the degree of generalization of these results (we referred above that around 90% of the population shows left-hemisphere dominance for language [13]).
Our experience indicates that user differences, such as skin conductance or hair type, will introduce some degree of diversity. In any case, differences are to be expected even when the subject is the same, due to biorhythmic cycles, sleepiness, or environmental conditions. We aim to compensate for these differences by designing adequate calibration procedures that adapt to individual user profiles and conditions.
Dimensionality Reduction
As we said before, the ordering of EEG signal relevance with respect to the ability to distinguish reading and non-reading mental activities is indispensable for the use of future light and portable EEG devices. Signal ranking will allow a reduction of the number of sensors and will make the way users interact with augmented reading applications simpler and more natural.
In this context, we aim to include this knowledge in the current signal processing chain. A serious analysis of the impact of removing some of the less relevant features must be done. Reducing the feature vector dimensionality will ultimately reduce processing time and allow the development of more effective real-time applications.
Opportunities for Gamma Band Analysis
As we said before, the γ rhythm is considered an important marker for attention [13]. However, it performed poorly in our study. Possible reasons for these results, in relation to the ones suggested in related work, are the use of a different type of features (PSD instead of wavelets) or distinct cognitive goals (reading versus non-reading instead of attentive versus non-attentive reading). A better understanding of this effect may be achieved through the use of wavelet coefficients for analyzing the γ band patterns in our experiments.

7. ACKNOWLEDGMENTS
This work was partially supported by Fundação para a Ciência e Tecnologia (FCT), Portugal, Grant SFRH/BD/30681/2006, and the Ciência 2007 Program.

8. REFERENCES
[1] Aarts, E., Encarnação, J., True Visions, The Emergence of Ambient Intelligence, Springer, 2006.
[2] Bizas, E., Simos, G., Stam, C.J., Arvanitis, S., Terzakis, D., Micheloyannis, S., EEG Correlates of Cerebral Engagement in Reading Tasks, Brain Topography, Vol. 12, 1999.
[3] Oliveira, I., Lopes, R., Guimarães, N. M., Development of a Biosignals Framework for Usability Analysis (Short Paper), ACM SAC'09 HCI Track, 2009.
[4] Wolpaw, J. R. et al., "Brain-Computer Interface Technology: A Review of the First International Meeting", IEEE Transactions on Rehabilitation Engineering, Vol. 8, 2000.
[5] Millán, J.R., "Adaptive Brain Interfaces", Communications of the ACM, 2003.
[6] Jung, J., Mainy, N., Kahane, P., Minotti, L., Hoffmann, D., Bertrand, O., Lachaux, J., "The Neural Bases of Attentive Reading", Human Brain Mapping, Vol. 29, Issue 10, pp. 1193-1206, 2008.
[7] Krumm, J. (ed.), Ubiquitous Computing Fundamentals, CRC Press, 2010.
[8] Malerba, D., Esposito, F., Monopoli, M., Comparing dissimilarity measures for probabilistic symbolic objects, Data Mining III, Series Management Information Systems, WIT Press, Vol. 6, pp. 31-40, 2002.
[9] Oliveira, I., Grigore, O., Guimarães, N., Reading detection based on electroencephalogram processing, Proceedings of the WSEAS 13th International Conference on Computers, Rhodes, Greece, 2009.
[10] Pekalska, E., Duin, R., "The Dissimilarity Representation for Pattern Recognition: Foundations and Applications", Machine Perception and Artificial Intelligence, World Scientific Publishing Company, Ch. 5, pp. 215-254, 2005.
[11] Popescu, F., Siamac, F., Badower, Y., Blankertz, B., Müller, K., "Single Trial Classification of Motor Imagination Using 6 Dry EEG Electrodes", PLoS ONE 2(7): e637, 2007.
[12] Shlens, J., "Notes on Kullback-Leibler Divergence and Likelihood Theory", Systems Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, 2007.
[13] Sternberg, R. J., Cognitive Psychology, Thomson Wadsworth, 2003.
[14] Streitz, N., Kameas, A., Mavrommati, I., The Disappearing Computer: Interaction Design, System Infrastructures and Applications for Smart Environments, Springer, 2007.
[15] Topsøe, F., Jensen-Shannon Divergence and norm-based measures of Discrimination and Variation, Technical report, Department of Mathematics, University of Copenhagen, 2003.
[16] Keirn, Z.A., Aunon, J.I., "A New Mode of Communication between Man and His Surroundings", IEEE Transactions on Biomedical Engineering, Vol. 37, 1990.
Brain Computer Interfaces for Inclusion
P. J. McCullagh, M.P. Ware, G. Lightbody
Computing & Engineering, University of Ulster
Shore Road, Jordanstown, Co. Antrim BT37 0QB, UK
Tel: +44 (0)2890368873, +44 (0)28 90366045
pj.mccullagh@ulster.ac.uk, mp.ware@ulster.ac.uk, g.lightbody@ulster.ac.uk

ABSTRACT
In this paper, we describe an intelligent graphical user interface (IGUI) and a User Application Interface (UAI) tailored to Brain Computer Interface (BCI) interaction, designed for people with severe communication needs. The IGUI has three components: a two-way interface for communication with BCI2000 concerning user events and event handling; an interface to user applications concerning the passing of user commands and associated device identifiers, and the receiving of notification of device status; and an interface to an extensible mark-up language (xml) file containing menu content definitions. The interface has achieved control of domotic applications. The architecture, however, permits control of more complex 'smart' environments and could be extended further for entertainment by interacting with media devices. Using components of the electroencephalogram (EEG) to mediate expression is also technically possible, but is much more speculative, and without proven efficacy. The IGUI-BCI approach described could potentially find wider use in the augmentation of the general population, to provide alternative computer interaction, an additional control channel and experimental leisure activities.

General Terms
Experimentation

Keywords
Brain Computer Interfaces, user interface, domotic control, entertainment.

1. INTRODUCTION
Degenerative diseases or accidents can leave a person paralyzed yet with full mental function. There has been significant research into creating brain mediated computer control [1] and assistive equipment that can be controlled by the brain, such as a wheelchair mounted robotic arm system [2].
A Brain-Computer Interface (BCI) may be defined as a system that should translate a subject's intent (thoughts) into a technical control signal without resorting to the classical neuromuscular communication channels [3]. The key components are signal acquisition to acquire the electroencephalogram (EEG), signal processing to extract relevant features, and translation software to provide appropriate commands to an application. Applications include computer and environmental control, but entertainment applications are also under investigation. It is of course possible that the application could provide some opportunity for self expression and creativity. Figure 1 illustrates some possibilities of BCI for augmenting the human: listening to music, controlling photographs, watching films, or influencing music or visual arts.
The ability to apply a BCI to the control of multiple devices has previously been explored [4]. It has also been demonstrated that BCI technology can be applied to many assistive technologies: wheelchair control [4],[5],[6], computer spellers [7],[8],[4], web browsers [9], environment control [10],[11],[12] and computer games [13],[14]. Smart homes technology is also an active area of research [15],[16], both for assistive applications and to enhance 'lifestyle'. This BCI paradigm offers the opportunity for automated control of domotic devices and sensor interaction, for the purpose of providing an integrated ambient assistive living space.

Figure 1: Uses of BCI for inclusion and augmentation of disadvantaged citizens. (Diagram: the BCI channel controls the interface while the EEG channel modulates brain activity; examples include personalisation of photo albums, listening to music, watching films, BCI and multimodal control of music derived from sonified EEG, BCI and multimodal augmentation of visual textures, and an engaging environment.)
The merging of these complex and emerging technologies is not concern the issuing of temporally continuous and progressive
a trivial issue. The European Union has addressed this with commands for the purpose of adjusting environmental controls,
BRAIN (FP7-2007-ICT-2-7.2), and other BCI projects, which e.g. to increase volume on a media device; these interactions are
come under the “Future BNCI” umbrella [17]. BRAIN is vulnerable to latency. A hybrid command sequence achieves
dedicated to providing a solution for controlling multiple the intent of an analogue command sequence but is implemented
domotic devices. The project adopts a multi-disciplinary using a binary style of interaction. It offers pre-defined options:
approach to address issues ranging from improving BCI high, medium, low; and is enacted via a single command. It is
interaction [18],[19]; to integrating smart homes solutions. not suitable for fine-tuning.
BRAIN’s focus concerns the application architecture and modes
of user interaction. It offers an application interface with a
wrapper for integrating domotic standards and protocols, if
required, known as the universal application interface (UAI).
An intuitive graphical user interface (IGUI) provides an on
screen menu structure which interacts with the UAI to achieve
device control. The IGUI also interacts with BCI components,
currently implementing a high-frequency steady state visual
evoked potential (SSVEP) BCI paradigm.

2. ARCHITECTURE DESIGN

Blankertz [7] comments that a ‘major challenge in BCI research


is intelligent front-end design’. To be effective a BCI used
within a domestic setting has to interact with multiple devices Figure 2: Operation of Binary, Analogue and Hybrid
and software packages. Within BRAIN we have designed an Command Sequences
architecture that accommodates modification, facilitates
substitution of existing components and to which functionality
2.1 Intuitive graphical user interface (IGUI)
An intuitive graphical user interface (IGUI) provides on screen
can be added. This facilitates emerging domotic devices or
menu display of application content. This interface is suited to
upgrades in BCI technologies. The architecture is modular in
operate in conjunction with BCI peripherals, transducers and the
approach with each component being highly cohesive and
control interface. Furthermore, conforming to specified user
exhibiting loose coupling to other modules each with clearly
requirements the interface is defined in such a way as to be
defined interfaces. The human interface is intuitive, to unify
suited to operation under various BCI paradigms whilst
principles of operation across devices thereby reducing learning
maintaining common principles of operation. It is applicable to
and overhead of operation. This is crucial with BCI, which
all devices in the device controller module, i.e. universal
suffers very slow communication rates, with user invoked error
application interface (UAI), or which may be envisaged at a
choices posing a major problem. The architecture of the
future point. It is capable of updating content as the state or
application and the interface must also be capable of supporting
availability of the applications or devices changes over time.
context aware technologies such as predictive commands and
The user interface is also capable of handling modifications in
command sets based upon models of previous interactions.
display or operation according to user defined preferences. It is
The concept of a separate application working in conjunction anticipated that modules will be added to the interface
with BCI technologies was proposed by Mason et al [20]. This architecture which will provide predictive capabilities with
conforms to the end user elements of the controller interface and respect to user menu selections based upon context and passed
assistive devices as defined in the framework. It is also keeping choices. The UAI acts as wrapper for multiple device
with the design philosophy of the BCI2000 [21]. The BRAIN interaction standards and protocols; it provides a single control
architecture incorporates the flexible approach to the brain- interface to the IGUI, hiding the complexity of interaction. It is
computer interface of the BCI2000 general purpose platform. expected that the UAI will facilitate a variety of protocols and
The platform is designed to incorporate signals, signal standards with regard to domotic devices and that the number of
processing methods, output devices and operating protocols. available devices will expand.
The BCI2000 interface is minimal, consisting of Universal Data
The IGUI has three major interfaces. The first is a two way
Packets (UDP) the content of which can be specified in
interface for communication with BCI2000 concerning user
accordance with the signal processing being performed. This
events and event handling. The second, a two way interface to
approach employs the packets which originate from signal
the UAI concerning the passing of user commands and
processing activity [19]. Three main types of command
associated device identifiers, and the receiving of notification of
interaction sequence have been identified: binary, analogue and
device status. The third is an interface to an extensible mark-up
hybrid, see Figure 2. Binary command interaction concerns the
language (xml) file containing menu content definitions. This
issuing of a single command to the UAI in a single instance.
file although initially defined ‘off-line’ is thereafter written to by
This is used to navigate menu structures, and issue single
the UAI and read by the IGUI, dynamically. Should the need
autonomous instructions Analogue command interactions
arise substitution of packages is possible provided that the User commands passed through BCI2000 can either affect the
interface definitions are adhered to or an additional interface menu items displayed or can initiate the issuing of a device
wrapper is implemented for the purpose of ensuring interface command through a call to the UAI. In either case, a data
compatibility. In this manner, should the need arise; the IGUI packet is sent to BCI2000 indicating that LED operation should
can support an alternative signal processing mechanism to be suspended. The appropriate processing is performed and a
BCI2000, and has already been interfaced to an ‘OpenBCI’ further data packet is sent to BCI2000 re-initiating LED
platform [18]. The IGUI can potentially support other forms of operation. Where the user has indicated the issuing of a device
device interaction instead of the UAI, for instance: a dedicated command, the appropriate device identifier is read from the xml
navigational application for a wheelchair, with distributed menu definition file with the associated command. A command
control, executing macro commands and feeding back state notification is raised against the UAI using these two
information. The IGUI can support substitution of menu parameters. The UAI returns a device status indicator to the
content by changing the definitions in the xml file. IGUI.
The following sequence represents IGUI-UAI device interaction The purpose of the interface is to offer the ability to manipulate
menu context and issue simple instructions as part of an ongoing
1. BCI2000 raises user commands to the IGUI via an
sequence of communication. BCI is a low bandwidth form of
incoming data packet. The command is given significance
communication; the interface must offer the maximum amount
based upon the context of the user interface which is
of control for the minimum amount of interaction. The interface
obtained from the xml menu file.
must optimise engagement for the user, giving them a sense of
2. The specific command and associated device identification grounding with the application domain, offering a pathway
is passed to the UAI which handles commands according to towards task completion and giving a sense of accomplishment
the appropriate protocol or standard and instigates actual and progression as each step in a sequence of actions is
device interaction. achieved.

3. The status of devices as they join or leave the smart homes The IGUI and the UAI share access to a common menu
network is updated via the xml menu definition file; where structure. This menu is implemented in static xml with a
devices are either enabled or disabled as appropriate. The separate parsing module. The structure as implemented is
IGUI is informed as device status is modified so that the hierarchical, however, for future implementations, it is possible
menu can be re-parsed and the display updated accordingly. to declare traversal paths in-order to provide short-cuts, for
instance return to a high level menu on completion of a
4. For the purpose of receiving incoming messages the IGUI sequence of tasks. The current menu details several locations:
implements two listening threads. One dedicated to back garden, two bedrooms, bathroom, dining room, front
listening for incoming BCI2000 data packets – on thread garden, hall, kitchen, living room. Devices are grouped
UDPListener and one dedicated to listening for according to each room. Where devices are in a communal area,
incoming UAI redisplay events and unpackaged BCI2000 the user’s menu declaration lists the communal room and the
communications on thread EventQueueListener. device. A user’s menu declaration will not normally list menu
Clearly, it does not make sense to allow the user to issue a items which are only of significance to another user. When the
command as the menu display is being up-dated. The UAI detects that a device is available to the network, the device
device that the user may wish to interact with may no status on the menu declaration will be updated to ‘enabled’.
longer be available; neither does it make sense for the menu Provision within the xml declaration has also been made such
to be redisplayed at the same time a user command is being that, should a device be judged to be sufficiently significant it
processed as the outcome may affect the menu display. For can be made constantly available, through the use of a ‘sticky’
this reason mediation has to take place between events status to indicate permanent on screen display. It is also
raised on either thread and each event is processed possible to use a non location based groupings such as ‘general’
sequentially. to collect devices and applications together which do not have a
Interaction with BCI2000 is based upon the reception and single location, for example spellers and photo albums. Should
sending of data packets. The Internet Protocol (IP) address and the interface be used for some other purpose it is possible to
communication port of a computer supporting BCI2000 is implement a different classification mechanism, for instance
known to the IGUI. Using these, a thread is initiated for the grouping by functionality, or if necessary no classification
purpose of listening for incoming packets. On packet reception, mechanism. This is done by simply using replacing the xml
the data is unpacked and the nature of the incoming user declaration. Where devices or locations are to be added, the xml
command determined. The appropriate message is placed on the declaration can be expanded accordingly.
EventQueue. The UAI may also write an event to the The sample xml declaration (below) lists two forms of item.
EventQueue, essentially the UAI indicates when and how the ‘Node’ items have sub-item declarations (e.g., Bedroom1).
menu should be re-parsed and redisplayed. The IGUI ‘Leaf’ items are used to associate a specific physical device or
EventQueueListener monitors the EventQueue length. software package to a device/package interface command, e.g.
If an event is detected in the queue the IGUI reads the event, x10Light1. All menu items have an associated graphical
instigates the appropriate processing and removes the event from display. The location of the graphics file is declared in the icon
the queue. tag of the menu item in the xml declaration. Currently the menu
implementation uses static xml. Provision has been made in the
IGUI interface for the passing of an object containing a similar command arrows presented on the screen point to four
xml declaration for dynamic content. Dynamic content of the peripheral LEDs. For the purpose of interface testing and to
same format can be parsed and displayed using existing provide a potential support facility for the carers of users, the
mechanisms. Dynamic xml is relevant where content may be four command arrows can be activated using a standard mouse.
subject to frequent change, such as listing available files on a Under different BCI paradigms arrows would still be present on
media server (e.g. movie titles). the screen but they would function in a slightly different
manner. Under P300 it is anticipated that an additional module
<menu_list_item>
  <label>Bedroom1</label>
  <enabled>True</enabled>
  <sticky>False</sticky>
  <icon>Bedroom1.jpg</icon>
  <on_selection>
    <menu_list_item>
      <label>Lighting</label>
      <device_Id>x10Light1</device_Id>
      <enabled>True</enabled>
      <sticky>False</sticky>
      <icon>Bedroom1/Lighting.jpg</icon>
      <on_selection>
        <command>BinaryLight.toggle_power</command>
      </on_selection>
    </menu_list_item>
  </on_selection>
</menu_list_item>
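For illustration, a declaration of this form can be read with a few lines of standard XML parsing; this sketch assumes only the tags shown in the sample (label, enabled, sticky, icon, device_Id, on_selection, command) and is not part of the BRAIN IGUI code:

import xml.etree.ElementTree as ET

def parse_menu_item(node):
    """Recursively convert a <menu_list_item> element into a nested dict."""
    return {
        "label":    node.findtext("label"),
        "enabled":  node.findtext("enabled") == "True",
        "sticky":   node.findtext("sticky") == "True",
        "icon":     node.findtext("icon"),
        "device":   node.findtext("device_Id"),           # None for pure menu nodes
        "command":  node.findtext("on_selection/command"),
        "children": [parse_menu_item(child)
                     for child in node.findall("on_selection/menu_list_item")],
    }

root = ET.fromstring(open("menu.xml").read())   # "menu.xml": example name for the declaration file
menu = parse_menu_item(root)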
Figure 3: Intuitive Graphical Command Interface for
BRAIN Project Application
The xml menu declaration represents menu content. The user
needs a mechanism for manipulating content in as an effective The screen displays location level menu items. At this menu
manner as possible, in order to traverse the menu hierarchy and level the icons are photographs which in the real world context
to pass correctly formulated commands to the UAI. Currently could relate to the users own home. The use of photographs and
the supported BCI paradigm implements high-frequency SSVEP real world images are intended to make the interface use more
as a mechanism for user interaction, but it is anticipated that intuitive to the user and to reduce the cognitive load of
other BCI paradigms (‘oddball’ stimulus and intended interacting with the menu structure. At a lower menu level
movement) will be supported over time. Studies have reported general concept images have been used, albeit still in a
that up to 48 LEDs can be used. These operated between 6- photographic format. Specifically, individual lights are not
15Hz at increments of 0.195Hz [22]. However, making this represented; instead a universal image of a light bulb is used. It
many signals available to the user in a meaningful manner using is felt that at this menu level the user will already have grasped
a conventional screen interface requires a degree of mapping the intent of the interface. Furthermore, the concept of device
which may be beyond both the interface and beyond the user’s interaction at the command level is made universal by this
capabilities and inclinations. Similarly, many devices (cameras, approach, such as a tangibly visible interaction of turning a light
mp3 players, printers) with restricted input capabilities use a on and an invisible interaction such as turning the volume of a
four-way command mapping as an interface of choice. Using device up. Once again the flexibility of the application is
such a command interface it is possible to cycle through lists of demonstrated. The intuitive feel of the interface can be
menu items (left/right), to select commands or further menu modified by simply replacing the graphics files. For instance, a
option (down), and to reverse selections or exit the system (up). ‘younger’ look and feel can be obtained by replacing
Using less than four commands produces an exponential photographic representations with cartoon style drawings. On
command burden upon the user as cycle commands (left/right) screen display of icons are supported by associated labels, these
increase and selection commands (down) cannot be applied in a are represented by tags in the xml menu declaration. The labels
single action. It was decided that a four-way command interface are used to make the meaning explicit; however the interface has
would be optimal. The LEDs are placed at the periphery of the been devised in such a way as to ensure that literacy is not
screen with command icons central to the display [23]. required.

3. Application Interface
The command interface Figure 3, displays the icons relating to
three menu items central to the screen. The central icon
Smart Homes are environments facilitated with technology that
represents the current menu item; as such it is highlighted using
act in a protective and proactive function to assist an inhabitant
a lighter coloured frame. Icons to either side, provide list
in managing their daily lives specific to their individual needs.
orientation to the user, to suggest progression and to suggest
Smart homes technology has been predominately applied to
alternative options. Under the current SSEVP paradigm the four
assist with monitoring vulnerable groups such as people with specification, offering the possibility of ‘wrapping’ other
early stage dementia,[24] and older people in general [25] by technologies (e.g. where a device is not UPnP compliant). UPnP
optimising the environment for safety. The link between BCI enables data communication between any two devices under the
and Smart homes is obvious, as it provides a way to interact command of any control device on the network. UPnP
with the environment using direct brain control of actuators. Our technology can run on any medium (category 3 twisted pairs,
contribution uses a software engineering approach, building an power lines (PLC), Ethernet, Infra-red (IrDA), Wi-Fi,
architecture which connects the BCI channel to the standard Bluetooth). No device drivers are used; common protocols are
interfaces used in Smart Homes so that control, when used instead.
established, can be far reaching and tuned to the needs of the
The UPnP architecture supports zero-configuration, invisible
individual, be it for entertainment, assistive devices or
networking and automatic discovery, whereby a device can
environmental control. Thus a link to the standards and
dynamically join a network, obtain an IP address, announce its
protocols used in Smart Home implementations is important.
name, convey its capabilities upon request, and learn about the
A BCI-Smart Home channel could allow users to realize several presence and capabilities of other devices. Dynamic Host
common tasks. For instance, to switch lights or devices on/off, Configuration Program (DHCP) and Domain Name servers
adjust thermostats, raise/lower blinds, open/close doors and (DNS) are optional and are only used if they are available on the
windows. Video-cameras could be used to identify a caller at the network. A device can leave a network smoothly and
front door, and to grant access, if appropriate. The same automatically without leaving any unwanted state information
functions achieved with a remote control could be realized (i.e. behind. UPnP networking is based upon IP addressing. Each
for a television, control the volume, change the channel, ‘mute’ device has a DHCP client and searches for a server when the
function). In a media system, the user could play desired music device is first connected to the network. If no DHCP server is
tracks. available, that is, the network is unmanaged, the device assigns
itself an address. If during the DHCP transaction, the device
3.1 Standards and Protocols obtains a domain name, the device should use that name in
subsequent network operations; otherwise, the device should use
its IP address.
While the underlying transmission media and protocols are
largely unimportant from a BCI user perspective, the number of Open Source Gateway Interface (OSGi ) is middleware for the
standards provides an interoperability challenge for the software Java platform. OSGi technology provides a service-oriented,
engineer. Open standards are preferred. A number of standards component-based environment for developers and offers a
bodies are involved; the main authorities are Institute of standardized ways to manage the software lifecycle. The OSGi
Electrical and Electronics Engineers (IEEE), International platform allows building applications from components. Two
Telecommunication Union (ITU-home networking) and (or more components) can interact through interfaces explicitly
International Standards Organisation (ISO). Industry provides declared in configuration files (in xml). In this way, OSGi is an
additional de-facto standards. Given the slow ‘user channel’, enabler of expanded modular development at runtime. Where
BCI interaction with the control aspects of domotic networks modules exist in a distributed environment (over a network),
requires high reliability with available bit-rate transmission, web services may be used for implementation. The OSGi UPnP
being of much lesser importance. Service maps devices on a UPnP network to the Service
Registry.
Domotic standards for home automation are based on either
wired or wireless transmission. Wired is the preferred mode for 3.2 Interoperability with existing smart
‘new’ build Smart Homes, where an information network may
be installed as a ‘service’ similar to electricity or mains water home interface
supply. Wireless networks can be used to retrofit existing It is important that the architecture developed can interoperate
buildings, are more flexible, but are more prone to domestic with existing and future assistive technology. A BRAIN partner,
interference, overlap and ‘black spots’, where communication is The Cedar Foundation, has sheltered apartments (Belfast) which
not possible. Wireless networks normally use work using radio are enabled for non-BCI Smart Home control. Each apartment is
frequency (RF) transmission and can use Industrial Scientific fully networked with the European Installation Bus (EIB) for
and Medicine (ISM) frequencies (2.4GHz band) or proprietary home and building automation [26]. Into this, peripherals are
frequencies and protocols. Infra-red uses higher frequencies connected which can be operated via infra-red remote control
which are short range and travel in straight lines (e.g. remote [27]. These peripherals when activated carry tasks that tenants
control for television control). are not physically able to perform. Examples include door
access, window and blind control, heating and lighting control
The Universal Plug and Play (UPnP) architecture offers and access to entertainment. Whilst this was ‘state of the art’
pervasive peer-to-peer network connectivity of PCs, intelligent technology at the time of development, KNX has replaced EIB
appliances, and wireless devices. UPnP is a distributed, open as the choice for open standard connectivity. This reinforces the
networking architecture that uses TCP/IP and HTTP protocols need for interoperability within a modular architecture, if BCI is
to enable seamless proximity networking in addition to control to be introduced to the existing configuration.
and data transfer among networked devices in the home. UPnP
does not specify or constrain the design of an API for
applications running on control points. A web browser may be
used to control a device interface. UPnP provides interoperable
3.3 A Universal Application Interface replacing a communication wrapper the IGUI could interface
The Universal Application Interface (UAI) aims to interconnect with a different BCI package, or by replacing the UAI the IGUI
heterogeneous devices from different technologies integrated in could be harnessed for other control purposes, for example
the home network, and to provide common controlling interface driving a robot. It is also possible for the IGUI to be substituted
for the rest of the system layers. Figure 4 illustrates how the and for a different command interface to call the services of the
interfaces between BCI2000, IGUI, menu definition and UAI. UAI.

UAI control is based on UPnP specification, which provides The UAI is also flexible, additional standards can be added
protocols for addressing and notifying, and provides without modifying the core command processing and device
transparency to high level programming interfaces. The UAI handling modules. By incorporating new standards it is possible
maps requests to events, generates the response to the user’s to interact with an increasing number of devices without
interaction, and advertises applications according device. The radically modifying other aspects of the application device
UAI infers the device type and services during the discovery architecture. By presenting an architecture which facilitates the
process, including the non-UPnP devices, which can be wrapped up-grading of existing standards it is also possible to interact
as UPnP devices with the automatic deployment of device with existing devices in a more efficient manner.
proxies.

Figure 4: Interaction of the Intuitive Graphical Command Interface and Universal Application Interface.

The UAI is divided into three modules: 1) the Devices Control Module, which interacts directly with the UPnP devices; it consists of a Discovery Point and several Control Points for the different types of UPnP services, and it also includes the UPnP wrappers needed to access the non-UPnP devices; 2) the Context Aware Module, which is capable of triggering automatic actions when certain conditions are met, without the need for user intervention; the module receives events from the applications and devices and invokes the actions resulting from the evaluation of a set of predefined rules; and 3) the Applications Layer, which provides the interactive services the user will control through the BCI. Applications include domotic control, which allows the user to control simple devices, and entertainment, which allows the user to control a multimedia server, e.g. to watch movies.

The purpose of the UAI is to provide a uniform platform for device interaction by masking the complexity of numerous domotic device standards and communication protocols. Flexibility and robustness in the design is evidenced by the fact that it is possible to substitute the two core components, or for the core components to interact with other hardware and software configurations if necessary, for instance by simply driving a robot. It is also possible for the IGUI to be substituted and for a different command interface to call the services of the UAI.

The UAI is also flexible: additional standards can be added without modifying the core command processing and device handling modules. By incorporating new standards it is possible to interact with an increasing number of devices without radically modifying other aspects of the application device architecture. By presenting an architecture which facilitates the upgrading of existing standards it is also possible to interact with existing devices in a more efficient manner.

4. BCI FOR CREATIVE EXPRESSION
The link between EEG and performance has been established by Miranda and Brouse [28], who used a BCI to play notes on an electric piano. The piano did not play specific songs as such, but rather rhythms of similar notes put together. There was not much choice in what the application could do, but it created music purely through the power of the mind. The system was difficult to use but provided an outlet for disabled users. Even if not particularly creative, music and art therapy can be used to improve the quality of life of severely disabled individuals.

Within the area of EEG there has been a level of interest in the sonification of the different frequency bands within the waveforms. For some, the objective has been to create an "auditory display" of the waveforms as a method of portraying information. For example, a sonic representation of the heart rate can be a powerful mechanism for conveying important information, complementing the visualization of the data.

Biosignals such as EEG and the electromyogram (EMG), or muscle activity, can also be used to generate music. Using EEG as a forum for musical expression was first demonstrated in 1965 by the composer Alvin Lucier through his recital named "Music for Solo Performer" [29], in which manipulation of alpha waves was used to resonate percussion instruments. Rosenboom [30] used biofeedback as a method of artistic exploration by composers and performers.

There has been recent work on BCI applications for music creation by the researchers Miranda, Arslan, Brouse, Knapp and Filatriau. Brouse and Miranda [31] developed the eNTERFACE (http://www.enterface.net/) initiative, resulting in two types of instrument. The first was referred to as the BCI-Piano and used a music generator driven by the parameters of the EEG. The second instrument was the InterHarmonium, which enabled geographically separated performers to combine the sonification of their EEG in real time. In addition to generating music, EEG synthesis of visual textures has also been investigated [32].

The importance of enabling a level of creativity and self-expression for highly physically disabled people through the use of EEG and bio-signal sonification is highlighted by the Drake music project [33]. Assistive technology is used to enable compositions by musicians with only a limited level of motor movement. Incorporating BCI within such a framework would extend this ability for self-expression to people with much more severe physical disabilities. Multi-modal interaction, allowing EMG for example to augment the EEG signal, is the basis of the BioMuse system. In addition, the combination of enabling the generation of both music and a visual display, as demonstrated by Arslan et al [34], creates enhanced opportunity for self-expression.
5. CONCLUSIONS
We describe an intelligent graphical user interface (IGUI) tailored to Brain Computer Interface (BCI) interaction, designed for people with severe communication needs. The interface has achieved control of simple domotic applications via a user applications interface (UAI). The architecture described, however, permits control of more complex 'smart' environments and will be extended further for entertainment by interacting with media devices. While the use and efficacy of BCI for creative expression is still highly speculative, the technical approach adopted with the IGUI can be easily adapted to the generation of relevant features from the EEG. All that is required is agreement upon the syntax and content of the UDP packets and additional signal processing. The IGUI approach described could potentially find wider use in the general population, to provide alternative computer interaction, an additional control channel and experimental leisure activities.
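The packet exchange referred to above is not specified in the paper, so the snippet below is a purely hypothetical illustration of how an EEG feature extractor might hand values to the IGUI over UDP; the field names, port number and feature identifier are invented for the example.

```python
# Hypothetical example of the kind of UDP message the conclusion refers to;
# the JSON layout and the address are illustrative, not the BRAIN format.
import json
import socket

IGUI_ADDR = ("127.0.0.1", 20320)   # assumed address of an IGUI listener

def send_feature(feature_name, value, sock=None):
    """Send one EEG-derived feature to the IGUI as a small JSON datagram."""
    sock = sock or socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    packet = json.dumps({"feature": feature_name, "value": value}).encode()
    sock.sendto(packet, IGUI_ADDR)

# e.g. a creative-expression module could forward band-power estimates:
send_feature("alpha_power_Cz", 4.7)
```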
6. ACKNOWLEDGMENTS
The BRAIN consortium gratefully acknowledges the support of the European Commission's ICT for Inclusion Unit, under grant agreement No. 224156.

7. REFERENCES
[1] Wolpaw J.R, McFarland D.J, 'Control of a Two-Dimensional Movement Signal by a Non-Invasive Brain-Computer Interface in Humans', Proceedings of the National Academy of Sciences of the United States of America, Vol 101, No 51, December 21, 2004.
[2] Laurentis K.J, Arbel Y, Dubey R, Donchin E, 'Implementation of a P-300 Brain Computer Interface for the Control of a Wheelchair Mounted Robotic Arm System', Proceedings of the ASME 2008 Summer Bioengineering Conference (SBC2008), June 25-29, 2008, pp1-2.
[3] Blanchard G, Blankertz B, 'BCI Competition 2003—Data Set IIa: Spatial Patterns of Self-Controlled Brain Rhythm Modulations', IEEE, 2003.
[4] Millan J.R, 'Adaptive Brain Interfaces', Communications of the ACM, March 2003, Vol 46, No 3, pp75-80.
[5] Valsan G, Grychtol B, Lakany H, Conway B.A, 'The Strathclyde Brain Computer Interface', IEEE 31st Annual International Conference on Engineering in Medicine and Biology Society, Sept 2009, pp606-609.
[6] Leeb R, Friedman D, Müller-Putz G.R, Scherer R, Slater M, Pfurtscheller G, 'Self-Paced (Asynchronous) BCI Control of a Wheelchair in Virtual Environments: A Case Study with a Tetraplegic', Journal of Computational Intelligence and Neuroscience, Vol 2007.
[7] Blankertz B, Krauledat M, Dornhege G, Williamson J, Murray-Smith R, Müller K.R, 'A Note on Brain Actuated Spelling with the Berlin Brain-Computer Interface', Lecture Notes in Computer Science, 2007, Vol 4555, pp759-768, Heidelberg, Springer.
[8] Felton E, Lewis N.L, Willis S.A, Radwin R.G, 'Neural Signal Based Control of the Dasher Writing System', IEEE 3rd International EMBS Conference on Neural Engineering, May 2007, pp366-370.
[9] Bensch M, Karim A.A, Mellinger J, Hinterberger T, Tangermann M, Bogdan M, Rosenstiel W, Birbaumer N, 'Nessi: An EEG-Controlled Web Browser for Severely Paralyzed Patients', Journal of Computational Intelligence and Neuroscience, Vol 2007.
[10] Piccini L, Parini S, Maggi L, Andreoni G, 'A Wearable Home BCI System: Preliminary Results with SSVEP Protocol', Proceedings of the IEEE 27th Annual Conference on Engineering in Medicine and Biology, 2005, pp5384-5387.
[11] Teo E, Huang A, Lian Y, Guan C, Li Y, Zhang H, 'Media Communication Centre Using Brain Computer Interface', Proceedings of the IEEE 28th Annual Conference on Engineering in Medicine and Biology, 2006, pp2954-2957.
[12] Bayliss J, 'Use of the Evoked Potential P3 Component for Control in a Virtual Apartment', IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol 11, No 2, 2003, pp113-116.
[13] Martinez P, Bakardjian H, Cichocki A, 'Fully Online Multicommand Brain-Computer Interface with Visual Neurofeedback Using SSVEP Paradigm', Journal of Computational Intelligence and Neuroscience, Vol 2007.
[14] 'Berlin Brain-Computer Interface – The HCI Communication Channel for Discovery', International Journal of Human-Computer Studies, Vol 65, 2007, pp460-477.
[15] Chan M, Estève D, Escriba C, Campo E, 'A Review of Smart Homes – Present State and Future Challenges', Computer Methods and Programs in Biomedicine, Vol 91, 2008, pp55-81.
[16] Poland M.P, Nugent C.D, Wang H, Chen L, 'Smart Home Research: Projects and Issues', International Journal of Ambient Computing and Intelligence, Vol 1, No 4, 2009, pp32-45.
[17] Progress in Brain/Neuronal Computer Interaction (BNCI), http://hmi.ewi.utwente.nl/future-bnci, accessed Jan 2010.
[18] Durka P, Kus R, Zygierewicz J, Milanowski P, Garcia G, 'High-frequency SSVEP responses parametrized by multichannel matching pursuit', Frontiers in Neuroinformatics, Conference Abstract: 2nd INCF Congress of Neuroinformatics, 2009.
[19] Garcia G, Ibanez D, Mihajlovic V, Chestakov D, 'Detection of High Frequency Steady State Visual Evoked Potentials for Brain-Computer Interfaces', 17th European Signal Processing Conference, 2009.
[20] Mason S.G, Moore Jackson M.M, Birch G.E, 'A General Framework for Characterizing Studies of Brain Interface Technology', Annals of Biomedical Engineering, Vol 33, No 11, November 2005, pp1653-1670.
[21] Schalk G, McFarland D, Hinterberger T, Birbaumer N, Wolpaw J.R, 'BCI2000: A General-Purpose Brain-Computer Interface (BCI) System', IEEE Transactions on Biomedical Engineering, Vol 51, No 6, June 2004, pp1034-1043.
[22] Gao X, Xu D, Cheng M, Gao S, 'A BCI-Based Environmental Controller for the Motion-Disabled', IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol 11, No 2, June 2003, pp137-140.
[23] Parini S, Maggi L, Turconi A, 'A Robust and Self-Paced BCI System Based on a Four Class SSVEP Paradigm: Algorithms and Protocols for a High-Transfer-Rate Direct Brain Communication', Journal of Computational Intelligence and Neuroscience, 2009.
[24] ENABLE project (2008), 'Can Technology Help People with Dementia?', retrieved July 7 from http://www.enableproject.org/
[25] Kidd C.D, Orr R.J, Abowd G.D, Atkeson C.G, Essa I.A, MacIntyre B, Mynatt E, Starner T.E, Newstetter W, 'The Aware Home: A Living Laboratory for Ubiquitous Computing Research', Proceedings of the 2nd International Workshop on Cooperative Buildings Integrating Information, Organization, and Architecture, 1999, pp191-198.
[26] Kolger M, 2006, 'Free Development Environment for Bus Coupling Units of the European Installation Bus', available at http://www.auto.tuwien.ac.at/~mkoegler/eib/sdkdoc-0.0.2.pdf [Accessed on 21.07.09].
[27] SICARE SENIOR PILOT, 2008, available at http://www.reflectivepractices.co.uk/cms/index.php?option=displaypage&Itemid=50&op=page&SubMenu= [Accessed on 21.07.09].
[28] Miranda E.R, Brouse A, 'Interfacing the Brain Directly with Musical Systems: On Developing Systems for Making Music with Brain Signals', Leonardo, 38(4), 2005, pp331-336.
[29] Lucier A, 'Music for the Solo Performer', 1965, http://www.emfinstitute.emf.org/exhibits/luciersolo.html, last accessed July 2007.
[30] Rosenboom D, 'Propositional Music from Extended Musical Interface with the Human Nervous System', Annals of the New York Academy of Sciences, Vol 999, 2003, p263.
[31] Brouse A, Filatriau J.J, Gaitanis K, Lehembre R, Macq B, Miranda E, Zenon A, 'An Instrument of Sound and Visual Creation Driven by Biological Signals', Proceedings of eNTERFACE06, Dubrovnik, Croatia, 2006 (not peer-reviewed report).
[32] Filatriau J.J, Lehembre R, Macq B, Brouse A, Miranda E.R, 'From EEG Signals to a World of Sound and Visual Textures', submitted to ICASSP 2007.
[33] Drake music project, 2007, http://www.drakemusicproject.org/, last accessed July 2007.
[34] Arslan B, Brouse A, Castet J, Lehembre R, Simon C, Filatriau J.J, Noirhomme Q, 'A Real Time Music Synthesis Environment Driven with Biological Signals', IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), Vol 2, 2006.

Additional authors: M.D. Mulvenna, H.G. McAllister, C.D. Nugent, Faculty of Computing and Engineering, University of Ulster, Shore Road, Jordanstown, Co. Antrim BT37 0QB, UK.
Emotion Detection using Noisy EEG Data

Mina Mikhail, Computer Science and Engineering Department, American University in Cairo, 113 Kasr Al Aini Street, Cairo, Egypt, minamikhail@acm.org
Khaled El-Ayat, Computer Science and Engineering Department, American University in Cairo, 113 Kasr Al Aini Street, Cairo, Egypt, kelayat@aucegypt.edu
Rana El Kaliouby, Media Laboratory, Massachusetts Institute of Technology, 20 Ames Street, Cambridge MA 02139 USA, kaliouby@media.mit.edu
James Coan, Department of Psychology, University of Virginia, 102 Gilmer Hall, Charlottesville, VA 22904-4400, jcoan@virginia.edu
John J.B. Allen, Department of Psychology, University of Arizona, 1503 E University Blvd., Tucson, AZ 85721-0068, John.JB.Allen@arizona.edu

ABSTRACT
Emotion is an important aspect of the interaction between humans. It is fundamental to human experience and rational decision-making. There is great interest in detecting emotions automatically. A number of techniques have been employed for this purpose using channels such as voice and facial expressions. However, these channels are not very accurate because they can be affected by users' intentions. Other techniques use physiological signals along with electroencephalography (EEG) for emotion detection. However, these approaches are not very practical for real time applications because they either ask the participants to reduce any motion and facial muscle movement or reject EEG data contaminated with artifacts. In this paper, we propose an approach that analyzes highly contaminated EEG data produced from a new emotion elicitation technique. We also use a feature selection mechanism to extract features that are relevant to the emotion detection task based on neuroscience findings. We reached an average accuracy of 51% for the joy emotion, 53% for anger, 58% for fear and 61% for sadness.

Categories and Subject Descriptors
I.5.2 [Pattern Recognition]: Design Methodology—Classifier design and evaluation, Feature evaluation and selection

Keywords
Affective Computing, Brain Signals, Feature Extraction, Support Vector Machines

1. INTRODUCTION
Over the past two decades, there has been an increasing interest in developing systems that will detect and distinguish people's emotions automatically. Emotions are fundamental to human experience, influencing cognition, perception, and everyday tasks such as learning, communication, and even rational decision-making. However, studying emotions is not an easy task, as emotions are both mental and physiological states associated with a wide variety of feelings, thoughts, and behaviors [15].

Many have attempted to capture emotions automatically. Developing computerized systems and devices that can automatically capture human emotional behavior is the purpose of affective computing. Affective computing attempts to identify physiological and behavioral indicators related to, arising from or influencing emotion or other affective phenomena [14]. It is an interdisciplinary field that requires knowledge of psychology, computer science and cognitive sciences.

Because of its many potential applications, affective computing is a rapidly growing field. For example, emotion assessment can be integrated in human-computer interaction systems in order to make them more comparable to human-human interaction. This could enhance the usability of systems designed to improve the quality of life of disabled people who have difficulty communicating their affective states. Another emerging application that makes use of emotional responses is quantifying customers' experiences. Automated prediction of the customer's experience is important because current evaluation methods, such as relying on customers' self reports, are very subjective. People do not always feel comfortable revealing their true emotions; they may inflate their degree of happiness or satisfaction in self reports [21].

There are two main approaches to eliciting participants' emotions. The first method presents a provoking auditory or visual stimulus to elicit specific emotions. This method is used by almost all studies in the literature [9, 17, 2, 1, 18, 13, 19]. The second approach builds on the facial feedback paradigm, which shows that facial expressions are robust elicitors of emotional experiences. In the famous study by Strack, Martin & Stepper [20], the authors attempted to provide a clear assessment of the theory that voluntary facial expressions can result in an emotion. Strack, Martin & Stepper [20] devised a cover story that would ensure the participants adopted the desired facial posing without being able to perceive either the corresponding emotion or the researchers' real motive. Each participant was asked to hold a pen in his mouth in different ways that result in different facial poses. Participants who held the pen in a way that resulted in a smile reported a more positive experience than those who held the pen in a position that resulted in a frown. This study was followed up by different psychologists, including Ekman et al. [6], who found that emotions generated with a directed facial action task result in a finer distinction between emotions. However, this approach contaminates brain signals with facial muscle artifacts, which is why it has not been embraced by computer scientists.

We decided to explore this second approach because it helps make our system closer to actual real time emotion detection systems, since there will be many facial muscle movements and other artifacts that contaminate the EEG data.

Our work extends existing research in three principal ways. First, we are the first in the computer science field to use voluntary facial expression as a means for enticing emotions. Although this contaminates the EEG with noise, it helps to test our approach in an unconstrained environment where the users were not given any special instructions about reducing head motions or facial expressions, which makes our dataset close to a real time application. Second, we are using a new technique for selecting features that are relevant to the emotion detection task, based on neuroscience findings. Finally, we tested our approach on a large dataset of 36 subjects and we were able to differentiate between four different emotions with an accuracy that ranges from 51% to 61%, which is equal to or higher than other related works.

The paper is organized as follows: section 2 surveys related work on different channels used for emotion detection, especially those that use EEG. Section 3 discusses the corpus of EEG we are using and how different emotions are elicited. Section 4 shows the different noise sources that contaminate EEG signals. Section 5 gives an overview of our methodology for emotion detection using EEG. Experimental evaluation and results are presented in section 6. Section 7 concludes the paper and outlines future directions in the area of emotion detection using EEG.

2. RELATED WORKS
There is much work in the field of emotion and cognitive state detection by analyzing facial expressions and/or speech. Some of these systems showed a lot of success, such as those discussed in [7, 10]. The system proposed by El Kaliouby and Robinson [7] uses automated inference of cognitive mental states from observed facial expressions and head gestures in video, whereas the system proposed by Kim et al [10] makes use of multimodal fusion of different timescale features of the speech. They also make use of the meaning of the words to infer both the angry and neutral emotions. Although facial expressions are considered to be a very powerful means for humans to communicate their emotions [21], the main drawback of using facial expressions or speech recognition is the fact that they are not reliable indicators of emotion, because they can either be faked by the user or may not be produced as a result of the detected emotion.

Based on the cognitive theory of emotion, the brain is the center of every human action [17]. Consequently, emotions and cognitive states can be detected by analyzing physiological signals that are generated from the central nervous system, such as brain signals recorded using EEG. However, there is not much work done in this area of research. Thanks to the success of brain computer interface systems, a few new studies have been done to find the correlation between different emotions and EEG signals. Most of these studies combine EEG signals with other physiological signals generated from the peripheral nervous system [1, 2].

One of the earliest attempts to prove that EEG signals can be used for emotion detection was proposed by Chanel et al [2], who tried to distinguish among excitement, neutral and calm signals. They compared the results of three emotion detection classifiers. The first one was trained on EEG signals, the second classifier was trained on peripheral signals such as body temperature, blood pressure and heart beat, and the third classifier was trained on both EEG and peripheral signals. In order to stimulate the emotion of interest, the user is seated in front of a computer and shown an image to inform him/her which type of emotion s/he has to think of. They then captured the signals from 64 different channels that cover the whole scalp in order to capture signals in all the rhythmic activity of the brain neurons. For feature extraction, they transformed the signal into the frequency domain and used the power spectrum as the EEG features. Finally, they used a Naive Bayes classifier, which resulted in an average accuracy of 54%, compared to only 50% for a classifier trained on physiological signals. Combining both types of signals resulted in a boost of accuracy that reached up to 72%. The problem with the research done by Chanel et al [2] is that using 64 channels for recording EEG, as well as other electrodes to capture physiological signals, makes this approach impractical to use in a real time situation.

Ansari et al [1] improved on the work done by Chanel et al [2]. They proposed using the Synchronization Likelihood (SL) method as a multichannel measurement, which, along with anatomical knowledge, allowed them to reduce the number of channels from 64 to only 5 with a slight decrease in accuracy and a huge improvement in performance. The goal was to distinguish between three emotions: exciting-positive, exciting-negative and calm. For signal acquisition, they acquired the signal from (AFz, F4, F3, CP5, CP6). For feature extraction, they used sophisticated techniques such as Hjorth Parameters and Fractal Dimensions, and they then applied Linear Discriminant Analysis (LDA) as their classification technique. The results showed an average accuracy of 60% in the case of using 5 channels compared to 65% in the case of using 32 channels.

A different technique was taken by Musha et al [13]. They used 10 electrodes (FP1, FP2, F3, F4, T3, T4, P3, P4, O1, and O2) in order to detect four emotions: anger, sadness, joy and relaxation. They rejected frequencies lower than 5 Hz because they are affected by artifacts, and frequencies above 20 Hz because they claim that the contributions of these frequencies to detecting emotions are small. They then collected their features from the theta, alpha and beta ranges. They performed cross correlation on each channel pair. The output of this cross correlation is a set of 65 variables that is linearly transformed to a vector of 1x4 using a transition matrix. Each value indicates the magnitude of the presence of one of the four emotions. This means that any testing sample is a linear combination of the four emotions. After that they applied a certain threshold to infer the emotion of interest.

A good step towards real time applications that depend on EEG for emotion recognition is proposed by Schaaff and Schultz [19], who used only 4 electrodes (FP1, FP2, F7, F8) for EEG recording. The main purpose of this research is to classify between positive, negative and neutral emotions. To reach their goal, they selected the peak alpha frequency, alpha power, cross-correlation features and some statistical features such as the mean of the signal and the standard deviation. They used Support Vector Machines for classification and reached an average accuracy of 47%.

Other research applies a multimodal technique for emotion detection. One of these studies was done by Savran et al [18]. They propose using EEG, functional near-infrared imaging (fNIRS) and video processing. fNIRS detects the light that travels through the cortex tissues and is used to monitor the hemodynamic changes during cognitive and/or emotional activity. Savran et al [18] combined EEG with fNIRS along with some physiological signals in one system, and fNIRS with video processing in another system. They decided not to try video processing with EEG because facial expressions inject noise into EEG signals. Also, when they recorded both EEG and fNIRS, they excluded the signals captured from the frontal lobe because of the noise produced by the fNIRS recordings. For the experiments, they showed the participant images intended to induce the emotion of interest and then recorded fNIRS, EEG and video after showing these images. The fusion among the different modalities is done on the decision level and not on the feature level. The problem with this research is that it excludes data that are contaminated with facial expressions, which is not practical since most emotions are accompanied by some sort of facial expression. This makes the approach impractical in real-life situations.
3. EEG STUDY
In this research, we are using the database of EEG signals collected at the University of Arizona by Coan et al. [4]. Tin electrodes in a stretch-lycra cap (Electrocap, Eaton, Ohio) were placed on each participant's head. EEG was recorded at 25 sites (FP1, FP2, F3, F4, F7, F8, Fz, FTC1, FTC2, C3, C4, T3, T4, TCP1, TCP2, T5, T6, P3, P4, Pz, O1, O2, Oz, A1, A2) and referenced online to Cz.

3.1 Participants
This database contains EEG data recorded from thirty-six participants (10 men and 26 women). All participants were right handed. The age of the participants ranged from 17 to 24 years, with a mean age of 19.1. The ethnic composition of the sample was 2.7% African American, 2.7% Asian, 18.9% Hispanic, and 75.7% Caucasian.

3.2 Procedure
According to Coan et al. [4], the experimenter informed participants that they were taking part in a methodological study designed to identify artifacts in the EEG signal introduced by muscles on the face and head. Participants were further told that accounting for these muscle movement effects would require them to make a variety of specific movements designed to produce certain types of muscle artifact. The presence of such muscle artifacts makes the problem of emotion detection using EEG very difficult, because the EEG signals are contaminated with muscle artifacts; this brings the data close to real time applications, where there is no control over the facial muscles or other sources of noise.

Participants were led to believe that they were engaged in purposely generating error-muscle artifact. It was hoped that although participants might detect the associations between the directed facial action tasks and their respective target emotions, they would not think of the target emotions as being of interest to the investigators. After participants were prepared for psychophysiological recording with EEG and facial EMG electrodes, they sat quietly for 8 min, during which resting EEG was recorded during a counterbalanced sequence of minute-long eyes-open and eyes-closed segments.

For the facial movement task, participants were seated in a sound-attenuated room, separate from the experimenter. The experimenter communicated with participants via microphone, and participants' faces were closely monitored at all times via video monitor. Participants' facial expressions were recorded onto videotape, as were subsequent verbal self-reports of experience. The experimenter gave explicit instructions to participants concerning how to make each facial movement, observing participants on the video monitor to ensure that each movement was performed correctly. Participants were asked to perform relatively simple movements first, moving on to more difficult ones. For example, the first movement participants were asked to perform is one that is part of the expression of anger. This movement engages the corrugator muscle in the eyebrow and forehead, drawing the eyebrows down and together. Subjects were asked to make the movement in the following manner: "move your eyebrows down and together." This was followed by two other partial faces, making three partial faces in all. No counterbalancing procedure was used for the control faces, as they were all considered to be a single condition.

One of the approaches that describes facial movements and their relation to different emotions is the Facial Action Coding System (FACS) [5], a catalogue of 44 unique action units (AUs) that correspond to each independent motion of the face. It also includes several categories of head and eye movements. FACS enables the measurement and scoring of facial activity in an objective, reliable and quantitative way. The expressions included joy (AUs 6 + 12 + 25), anger (AUs 4 + 5 + 7 + 23/24), fear (AUs 1 + 2 + 4 + 5 + 15 + 20) and sadness (AUs 1 + 6 + 15 + 17), as shown in Fig. 1. These action units are used to entice the corresponding emotions.

Figure 1: Muscle movements in the full face conditions: (a) joy (b) fear (c) anger (d) sadness [4].

After completing the directed facial action task for a particular face, each participant was asked each of the following: (1) While making that face, did you experience any thoughts? (2) While making that face, did you experience any emotions? (3) While making that face, did you experience any sensations? (4) While making that face, did you feel like taking any kind of action, like doing anything? If anything was reported, participants were then asked to rate its intensity on a scale of 1 to 7 (1 = no experience at all; 7 = an extremely intense experience).

For each participant, we have four files, each indicating one of the four emotions. Each file has two minutes of recording. These two minutes do not fully represent the emotions, so human coders were used to code the start and the end of each emotion. We used one minute of recording between the start and the end. In order to have more than one file per emotion for each participant, we worked on two 30-second epochs.

4. TYPES OF NOISE
4.1 Technical Artifacts
Technical artifacts are usually related to the environment where the signals are captured. One source of technical noise is the electrodes themselves [12]. If the electrodes are not properly placed over the surface of the scalp, or if the resistance between the electrode and the surface of the scalp exceeds 5 kohm, this will result in huge contamination of the EEG. Another source of technical artifact is line noise. This noise occurs due to A/C power supplies, which may contaminate the signal with 50/60 Hz if the acquisition electrodes are not properly grounded. Our EEG database is contaminated with the 60 Hz frequency.

4.2 Physiological Artifacts
Other sources of noise are the physiological artifacts. These include eye blinking, eye movements, electromyography (EMG), motion, pulse and sweat artifacts [12].
The problem with eye blinks is that they produce a signal with a high amplitude that is usually much greater than the amplitude of the EEG signals of interest. Eye movements are similar to or even stronger than eye blinks.
The EMG, or muscle activation, artifact can happen due to muscle activity such as movement of the neck or some facial muscles. This can affect the data coming from some channels, depending on the location of the moving muscles.
As for the motion artifact, it takes place if the subject is moving while EEG is being recorded. The data obtained can be corrupted due to the signals produced while the person is moving, or due to the possible movement of electrodes.
Other involuntary types of artifacts are pulse and sweat artifacts. The heart is continuously beating, causing the vessels to expand and contract, so if the electrodes are placed near blood vessels, the data coming from them will be affected by the heartbeat. Sweat artifacts can affect the impedance of the electrodes used in recording the brain activity; subsequently, the data recorded can be noisy or corrupted. These different types of noise make the processing of EEG a difficult task, especially in a real time environment where there is no control over the environment or the subject.
Our dataset is largely contaminated with facial muscle artifacts. Despite this highly noisy dataset, we are trying to achieve reasonable detection accuracy for the four emotions (anger, fear, joy and sadness) and a low false positive rate, so that we can integrate our emotion detection approach into real time affective computing systems.

Figure 2: Multistage approach for emotion detection using EEG (voluntary facial expression → EEG recording → signal preprocessing: offline average reference, downsampling to 256 Hz, bandpass filter [3-30] Hz → feature extraction: FFT, alpha band extraction → classification).

5. APPROACH FOR EMOTION DETECTION USING EEG
As shown in Fig. 2, we use a multilevel approach for analyzing EEG to infer the emotions of interest. The recorded EEG signals are first passed through the signal preprocessing stage, in which the EEG signals are passed through a number of filters for noise removal. After that, relevant features are extracted from the signals, and finally we use support vector machines for classification.

5.1 Signal Preprocessing
Fig. 2 shows the three stages that the EEG data passes through during the signal preprocessing stage. Our EEG data are referenced online to Cz. Following the recommendation of Reid et al [16], who pointed out that this reference scheme did not correlate particularly well, an offline average reference is computed for the data by subtracting from each site the average activity of all scalp sites. To reduce the amount of data to be analyzed, our data is downsampled from 1024 Hz to 256 Hz.
Our dataset is largely contaminated with facial muscle and eye blink artifacts. Moreover, there are segments that are highly contaminated with artifacts and are marked for removal. Instead of rejecting such segments, we included them in our analysis so that our approach can be generalized to real time applications. Since most of the previously mentioned artifacts appear at low frequencies, we used a band pass finite impulse response filter that removed the frequencies below 3 Hz and above 30 Hz.
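The preprocessing chain just described (offline average reference, 4:1 downsampling from 1024 Hz to 256 Hz, and a 3-30 Hz FIR band-pass) can be sketched in a few lines. The paper does not name a toolbox, so the NumPy/SciPy snippet below is only an illustration of the stated steps; the function name and the filter length are our own choices.

```python
# Sketch of the preprocessing stage: average reference, downsampling and
# a zero-phase FIR band-pass keeping 3-30 Hz. Illustrative, not the authors' code.
import numpy as np
from scipy.signal import decimate, filtfilt, firwin

def preprocess(eeg, fs_in=1024, fs_out=256, band=(3.0, 30.0), numtaps=257):
    """eeg: array of shape (n_channels, n_samples), referenced online to Cz."""
    # 1) offline average reference: subtract the mean over all scalp sites
    eeg = eeg - eeg.mean(axis=0, keepdims=True)
    # 2) downsample 1024 Hz -> 256 Hz (factor 4, with anti-aliasing)
    eeg = decimate(eeg, q=fs_in // fs_out, axis=1, zero_phase=True)
    # 3) zero-phase FIR band-pass that removes <3 Hz and >30 Hz content
    taps = firwin(numtaps, band, pass_zero=False, fs=fs_out)
    return filtfilt(taps, [1.0], eeg, axis=1)
```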
Figure 3: Applying FFT to overlapping windows (2-second windows with 1-second overlap over a 30-second epoch).

5.2 Feature Extraction
Our approach divides each 30 sec data epoch into 29 windows, 2 seconds wide with a 1 sec overlap. Each window is converted into the frequency domain using the Fast Fourier Transform (FFT), as shown in Fig. 3. The frequency descriptors of the power bands (the theta, alpha and beta rhythms) are extracted. This resulted in a huge feature set of 146025 features.

5.2.1 Feature Reduction Using Alpha Band
We made use of the study by Kostyunina et al. [11] in order to reduce our feature set. Kostyunina et al. [11] showed that emotions such as joy, aggression and intention result in an increase in the alpha power, whereas emotions such as sorrow and anxiety result in a decrease in the alpha power. As a result of this conclusion, we focused our feature extraction on the power and phase of the alpha band only, which ranges from 8 Hz to 13 Hz, for the 25 channels. We used other features such as the mean phase, the mean power, the peak frequency, the peak magnitude and the number of samples above zero. Making use of the study by Kostyunina et al. [11] helped decrease the number of features from 146025 to 10150.

5.2.2 Feature Reduction Using EEG Scalp Asymmetries
Another important piece of research that we made use of in order to reduce our feature set is the work done by Coan et al. [4], who showed that positive emotions are associated with relatively greater left frontal brain activity, whereas negative emotions are associated with relatively greater right frontal brain activity. They also showed that the decrease in activation in other regions of the brain, such as the central, temporal and mid-frontal regions, was less than in the frontal region. This domain-specific knowledge helped us decrease the number of features from 10150 to only 3654.
The asymmetry features between electrodes i and j at frequency n are obtained using the following equation:

c(n, i, j) = Xi(fn) − Xj(fn)

in which Xi(fn) is the frequency power at electrode i and the nth bin. This equation is applied to scalp-symmetric electrodes only, such as (C3, C4), (FP1, FP2), etc.
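As a rough illustration of Sections 5.2-5.2.2, the sketch below computes windowed FFTs over one 30 s epoch, keeps 8-13 Hz alpha descriptors, and adds difference features for symmetric electrode pairs following c(n, i, j) = Xi(fn) − Xj(fn). It is a reconstruction under our own naming and channel indices, not the authors' code, and it omits some of the listed summary features (e.g., the number of samples above zero).

```python
# Windowed FFT / alpha-band / asymmetry feature sketch for one 30 s epoch
# of shape (25, 30 * 256). Illustrative only; indices of symmetric pairs
# are placeholders for the real montage.
import numpy as np

FS, WIN, STEP = 256, 2 * 256, 1 * 256          # 2 s windows, 1 s overlap
SYMMETRIC_PAIRS = [(2, 3), (0, 1)]              # e.g. (F3, F4), (FP1, FP2) indices

def alpha_features(epoch):
    feats = []
    n_windows = (epoch.shape[1] - WIN) // STEP + 1      # 29 windows per epoch
    for w in range(n_windows):
        seg = epoch[:, w * STEP: w * STEP + WIN]
        spec = np.fft.rfft(seg, axis=1)
        freqs = np.fft.rfftfreq(WIN, d=1.0 / FS)
        alpha = (freqs >= 8) & (freqs <= 13)            # alpha band 8-13 Hz
        power, phase = np.abs(spec[:, alpha]), np.angle(spec[:, alpha])
        # per-channel summary features from Section 5.2.1
        feats += [power.mean(axis=1), phase.mean(axis=1),
                  freqs[alpha][power.argmax(axis=1)],    # peak frequency
                  power.max(axis=1)]                     # peak magnitude
        # scalp-asymmetry features c(n, i, j) = Xi(fn) - Xj(fn) from 5.2.2
        for i, j in SYMMETRIC_PAIRS:
            feats.append(power[i] - power[j])
    return np.concatenate(feats)
```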
5.3 Support Vector Machines (SVMs)
For classification, we used support vector machines (SVMs). SVM is a supervised learning technique. Given a training set of feature vectors, SVMs attempt to find a hyperplane such that the two classes are separable, and given a new feature vector, SVMs try to predict to which class this new feature vector belongs.
SVMs view the input data (the FFT features) as two sets of vectors in an n-dimensional space. The SVM constructs a separating hyperplane in that space that maximizes the margin between the two data sets. A good hyperplane will be the one that has the highest distance to the points of the different classes [8]. We built eight different binary classifiers. For each emotion, we used two different classifiers: the first classifier is trained on the alpha band extracted features only, and the second classifier is trained on the scalp asymmetry extracted features. For each classifier, we used linear, polynomial and radial kernels.
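The evaluation protocol used in the next section (a linear-kernel SVM scored over 20 random 90%/10% train/test splits, reporting presence, absence and overall accuracy) could be reproduced along the following lines. scikit-learn is our choice of library here purely for illustration; the paper does not state which SVM implementation was used.

```python
# Sketch of one binary emotion classifier evaluated over 20 random splits.
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import SVC

def evaluate(X, y, runs=20, test_size=0.10, seed=0):
    """X: (n_samples, n_features); y: 1 for the target emotion, 0 otherwise."""
    presence, absence, overall = [], [], []
    splitter = ShuffleSplit(n_splits=runs, test_size=test_size, random_state=seed)
    for train, test in splitter.split(X):
        clf = SVC(kernel="linear").fit(X[train], y[train])
        pred = clf.predict(X[test])
        overall.append((pred == y[test]).mean())
        presence.append((pred[y[test] == 1] == 1).mean())   # e.g. "presence of joy"
        absence.append((pred[y[test] == 0] == 0).mean())    # "absence of joy"
    return np.mean(presence), np.mean(absence), np.mean(overall)
```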
6. EXPERIMENTAL RESULTS
The experiment included 36 participants with 265 samples (66 samples representing joy, 64 representing sadness, 65 representing fear and 70 representing anger). We started by building a joy emotion classifier in which all the samples representing joy are considered positive samples and all other samples represent negative samples.
Six different classifiers were built: two classifiers with a linear kernel (one for each set of features), two with a radial kernel and two with a polynomial kernel. The SVM classifiers with the polynomial kernel did not converge, whereas the classifiers with the radial kernel resulted in a very low accuracy of almost 0%.
To test our classifiers, we used 20-fold cross validation, in which we divided our 265 samples into testing samples (10%) and training samples (90%), which means that the samples we used for training are different from those used for testing. We repeated this approach 20 times, during which the testing and training samples were selected randomly, and we made sure that the training and testing samples were different in the 20 trials. Since we have only two samples per emotion and per subject, using random selection for training and testing samples makes our approach user independent. Fig. 4 compares the true positive, false negative and overall detection accuracy results of two different classifiers. We found that the use of the alpha band combined with the EEG scalp difference resulted in better detection accuracy than using the alpha band only. This again shows that using neuroscience findings in feature selection helps decrease the size of the feature set and results in better classification accuracies. We also found that the radial kernel for both types of features resulted in 0% accuracy for joy and a very high classification accuracy of almost 100% for the not-joy class. The average detection accuracy is 51% and 83% for the presence of joy and not joy, respectively, using the linear kernel.

Figure 4: A comparison of the classification accuracy of the joy emotion using a linear SVM kernel on two different feature selection criteria (alpha only vs. alpha + asymmetry, runs 1-20).

Fig. 5 shows the average overall detection accuracy, the average detection accuracy of the presence of joy and the average detection accuracy of the absence of joy using a linear SVM kernel. The average overall detection accuracy represents the number of correctly classified samples (joy or not joy) divided by the total number of testing samples, which is 27 samples, 10% of the total number of samples. The average detection accuracy of the presence of joy is the number of correctly classified samples that represent joy divided by the total number of joy samples in the testing set. Finally, the average detection accuracy of the absence of joy is the number of correctly classified samples that represent not joy divided by the total number of not-joy samples in the testing set. From the graph, it can be deduced that the accuracy for the absence of joy is in the range of 77% to 95%, which means that the false positive rate is very low, in the range of 5% to 23%.

Figure 5: A comparison of the classification accuracy of the joy emotion using a linear SVM kernel on the second set of features (alpha band + asymmetry), runs 1-20.

We applied the same approach to building classifiers for the anger, fear and sad emotions. Fig. 6, Fig. 7 and Fig. 8 show the classification accuracies of the linear SVM kernel for the second set of features (alpha + asymmetry) for the anger, fear and sad emotions respectively.

Figure 6: A comparison of the classification accuracy of the anger emotion using a linear SVM kernel on the second set of features (alpha band + asymmetry).

Figure 7: A comparison of the classification accuracy of the fear emotion using a linear SVM kernel on the second set of features (alpha band + asymmetry).

Figure 8: A comparison of the classification accuracy of the sad emotion using a linear SVM kernel on the second set of features (alpha band + asymmetry).

The reason why the accuracies of anger, fear, joy and sadness range from 30% to 72.6% can be explained by the fact that voluntary facial expressions may affect the emotional state of people differently and with different intensities. Coan and Allen [3], who experimented on the same dataset, reported that the dimensions of experience vary as a function of specific emotions and individual differences when self reports are compared against the emotions intended to be elicited with certain facial expressions. Table 1 shows the report rates for the different emotions; it shows that self reports were different from the intended emotions in 48% of the samples. In this work, we did not ignore the samples for which self reports did not match the elicited emotions. The accuracy might have increased if we had used only the samples for which the participants felt and reported the same emotion as the intended one. Also, the accuracy may be affected if the samples used are the ones for which the participants reported the emotions with high intensities.

Table 1: Self Report Rates by Emotion. The rate column reflects the percentage of self reports that were the same as the target emotion.
Emotion          Rate
Anger            65.7%
Fear             61.8%
Joy              50.0%
Sadness          30.6%
Overall Average  52.0%

Table 2 shows a comparison of the average detection accuracy for the four emotions. For each emotion, we report the results of the linear SVM kernel on two feature sets: using the alpha band only, and using the alpha band along with scalp asymmetries. For each feature set, the percentage of presence of joy, for instance, is computed as (Σ_{i=1}^{N} F(i)) × 100 / N, where F(i) is 1 if joy sample number i was correctly classified and 0 otherwise, and N is the number of all the joy samples in the 20 different runs. The overall accuracy is the number of samples, whether joy or not joy, that are correctly classified, divided by the total number of samples in the 20 different runs.

Table 2: Results of emotion classification using linear SVM kernels on two different feature sets: using the alpha band only and using scalp asymmetries.
Emotion   Alpha presence   Alpha overall   Alpha + asymmetry presence   Alpha + asymmetry overall
Anger     38%              73%             53%                          74%
Fear      58%              79%             38%                          77%
Joy       38%              73%             51.2%                        74%
Sadness   48%              77%             61%                          79%

It is observed that the accuracy of the linear kernel for the second feature set (alpha + asymmetry) is higher than that of the linear kernel for the first feature set (alpha band only) for the joy, anger and sad emotions, whereas the detection accuracy of the linear kernel for the first feature set (alpha band only) is higher for the fear emotion than that of the linear kernel for the second feature set (alpha + asymmetry).
7. CONCLUSION
The goal of this research is to study the possibility of classifying four different emotions from brain signals that were elicited by voluntary facial expressions. We proposed an approach that is applied to a noisy database of brain signals. Testing on a large corpus of 36 subjects and using two different techniques for feature extraction that rely on domain knowledge, we reached an accuracy of 53%, 58%, 51% and 61% for the anger, fear, joy and sadness emotions respectively.

7.1 Future Directions
One of the areas where we can enhance this study is reducing the number of features. This can be done by reducing the number of channels. We will work on studying the effect of reducing the number of channels on the classification accuracy. Reducing the number of channels will help us reduce the processing time and make the classification task more portable; hence, it can be used in real time applications.
Another way to achieve better classification results is to improve our preprocessing stage. This can be done by using Independent Component Analysis (ICA). ICA is a computational model that can extract the different components of the signals. For instance, ICA can separate EEG and physiological noise from the recorded signals.
Finally, it will be interesting to compare the results achieved with our methodology against an emotion detection system that relies on facial expressions for emotion detection.

8. REFERENCES
[1] K. Ansari-Asl, G. Chanel, and T. Pun. A channel selection method for EEG classification in emotion assessment based on synchronization likelihood. In Eusipco 2007, 15th Eur. Signal Proc. Conf.
[2] G. Chanel, J. Kronegg, D. Grandjean, and T. Pun. Emotion assessment: Arousal evaluation using EEG's and peripheral physiological signals. Lecture Notes in Computer Science, 4105:530, 2006.
[3] J. Coan and J. Allen. Varieties of emotional experience during voluntary emotional facial expressions. Ann. NY Acad. Sci., 1000:375–379, 2003.
[4] J. Coan, J. Allen, and E. Harmon-Jones. Voluntary facial expression and hemispheric asymmetry over the frontal cortex. Psychophysiology, 38(06):912–925, 2002.
[5] P. Ekman, W. Friesen, and J. Hager. Facial Action Coding System. 1978.
[6] P. Ekman, R. Levenson, and W. Friesen. Autonomic nervous system activity distinguishes among emotions. Science, 221(4616):1208–1210, 1983.
[7] R. El Kaliouby and P. Robinson. Mind reading machines: automated inference of cognitive mental states from video. In 2004 IEEE International Conference on Systems, Man and Cybernetics, volume 1.
[8] S. Gunn. Support Vector Machines for Classification and Regression. ISIS Technical Report, 14, 1998.
[9] K. Kim, S. Bang, and S. Kim. Emotion recognition system using short-term monitoring of physiological signals. Medical and Biological Engineering and Computing, 42(3):419–427, 2004.
[10] S. Kim, P. Georgiou, S. Lee, and S. Narayanan. Real-time emotion detection system using speech: Multi-modal fusion of different timescale features. In IEEE 9th Workshop on Multimedia Signal Processing (MMSP 2007), pages 48–51, 2007.
[11] M. Kostyunina and M. Kulikov. Frequency characteristics of EEG spectra in the emotions. Neuroscience and Behavioral Physiology, 26(4):340–343, 1996.
[12] J. Lehtonen. EEG-based brain computer interfaces. Helsinki University of Technology, 2002.
[13] T. Musha, Y. Terasaki, H. Haque, and G. Ivamitsky. Feature extraction from EEGs associated with emotions. Artificial Life and Robotics, 1(1):15–19, 1997.
[14] R. Picard. Affective Computing. MIT Press, 1997.
[15] R. Plutchik. A general psychoevolutionary theory of emotion. Theories of Emotion, 1, 1980.
[16] S. Reid, L. Duke, and J. Allen. Resting frontal electroencephalographic asymmetry in depression: Inconsistencies suggest the need to identify mediating factors. Psychophysiology, 35(04):389–404, 1998.
[17] D. Sander, D. Grandjean, and K. Scherer. A systems approach to appraisal mechanisms in emotion. Neural Networks, 18(4):317–352, 2005.
[18] A. Savran, K. Ciftci, G. Chanel, J. Mota, L. Viet, B. Sankur, L. Akarun, A. Caplier, and M. Rombaut. Emotion detection in the loop from brain signals and facial images. Proc. of the eNTERFACE 2006, 2006.
[19] K. Schaaff and T. Schultz. Towards an EEG-based emotion recognizer for humanoid robots. The 18th IEEE International Symposium on Robot and Human Interactive Communication, pages 792–796, 2009.
[20] F. Strack, L. Martin, and S. Stepper. Inhibiting and facilitating conditions of the human smile: A nonobtrusive test of the facial feedback hypothesis. Journal of Personality and Social Psychology, 54(5):768–777, 1988.
[21] F. Strack, N. Schwarz, B. Chassein, D. Kern, and D. Wagner. The salience of comparison standards and the activation of social norms: consequences for judgments of happiness and their communication. 1989.
World’s First Wearable Humanoid Robot that
Augments Our Emotions
Dzmitry Tsetserukou, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi, 441-8580 Japan, dzmitry.tsetserukou@erc.tut.ac.jp
Alena Neviarouskaya, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan, lena@mi.ci.i.u-tokyo.ac.jp

ABSTRACT
In the paper we are proposing a conceptually novel approach to
reinforcing (intensifying) own feelings and reproducing
(simulating) the emotions felt by the partner during online
communication through wearable humanoid robot. The core
component, Affect Analysis Model, automatically recognizes nine
emotions from text. The detected emotion is stimulated by
innovative haptic devices integrated into the robot. The
implemented system can considerably enhance the emotionally
immersive experience of real-time messaging. Users can not only
exchange messages but also emotionally and physically feel the
presence of the communication partner (e.g., family member,
friend, or beloved person).

ACM Classification Keywords


H5.3 [Information interfaces and presentation (e.g., HCI)]: Group and organization interfaces – web-based interaction. H5.2 [Information interfaces and presentation (e.g., HCI)]: User Interfaces – haptic I/O, interaction styles, prototyping. I.2.7 [Artificial Intelligence]: Natural Language Processing – language parsing and understanding, text analysis. I.2.9 [Artificial Intelligence]: Robotics.

General Terms
Design, Experimentation.

Keywords
Wearable humanoid robot, affective user interfaces, haptic display, tactile display, haptic communication, online communication, Instant Messaging, 3D world.

1. INTRODUCTION
"All emotions use the body as their theater…" (Antonio Damasio)
Computer-mediated communication allows interaction between people who are not physically sharing the same space. Nowadays, companies providing media for remote online communication place great importance on live communication and immersive technologies. Along with widely used Instant Messengers (such as Yahoo IM, AOL AIM, Microsoft Windows Live Messenger, Google Talk), new web services such as Twitter and Google Wave are gaining notability and popularity worldwide. Such applications allow keeping in touch with friends in real time over multiple networks and devices. Recently, mobile communication companies launched Instant Messenger services on cellular phones (e.g., AIM on iPhone). 3D virtual worlds (e.g., Second Life, OpenSim) also embed chat and instant messaging. Such systems encourage people to establish or strengthen interpersonal relations, to share ideas, to gain new experiences, and to feel genuine emotions accompanying all the adventures of virtual reality.

Figure 1. User communicating through iFeel_IM!. The devices worn on the body enhance experienced emotions.

However, conventional mediated systems usually (1) support only simple textual cues like emoticons; (2) lack visual emotional signals such as facial expressions and gestures; (3) support only manual control of the expressiveness of graphical representations of users (avatars); and (4) completely ignore such an important channel of social communication as the sense of touch.
Tactile interfaces could allow users to enhance their emotional communication abilities by adding a whole new dimension to mobile communication [4,15]. Besides emotions conveyed through text, researchers have developed an additional modality for communicating emotions in Instant Messenger (IM) through tactile interfaces with vibration patterns [18,24,26]. However, in
the proposed methods users have to memorize the vibration or pin matrix patterns and cognitively interpret the communicated emotional state. Demodulation of haptically coded emotion is not natural for human-human communication, and direct evocation of emotion cannot be achieved in such systems. Moreover, users have to place their fingers on the tactile display in order to maintain contact with the tactors, which interrupts the typing process while Instant Messaging. The rest of the shortcomings of conventional communication systems hold true.

2. AFFECTIVE HAPTICS: EMERGING FRONTIER
Everything we know about the world entered our minds through the senses of sight, hearing, taste, touch, and smell. Our feelings are a rich and continuous flow of changing percepts. All our senses play a significant role in the recognition of the emotional state of a communication partner. Human emotions can be easily evoked by different cues, and the sense of touch is one of the most emotionally charged channels.
Affective Haptics is the emerging area of research which focuses on the design of devices and systems that can elicit, enhance, or influence the emotional state of a human by means of the sense of touch. We distinguish four basic haptic (tactile) channels governing our emotions: (1) physiological changes (e.g., heart beat rate, body temperature, etc.), (2) physical stimulation (e.g., tickling), (3) social touch (e.g., hug), and (4) emotional haptic design (e.g., shape of device, material, texture).
Driven by the motivation to enhance social interactivity and the emotionally immersive experience of real-time messaging, we pioneered the idea of reinforcing (intensifying) one's own feelings and reproducing (simulating) the emotions felt by the partner through the specially designed wearable humanoid robot iFeel_IM! (Figure 1). The philosophy behind iFeel_IM! (intelligent wearable humanoid robot for Feeling enhancement powered by affect-sensitive Instant Messenger) is "I feel [therefore] I am!". The emotion elicited by physical stimulation might imbue our communication with passion and increase emotional intimacy, the ability to be close, loving, and vulnerable. Interpersonal relationships and the ability to express empathy grow strongly when people become emotionally closer through disclosing thoughts, feelings, and emotions for the sake of understanding.
In this work, we focus on the implementation of an innovative system which includes haptic devices for the generation of physical stimulation aimed at conveying the emotions experienced during online conversations. We attempt to influence human emotions through physiological changes, physical stimulation, social touch, and emotional design.

3. ARCHITECTURE OF WEARABLE HUMANOID ROBOT
A humanoid robot is an electro-mechanical machine with its overall appearance based on that of the human body and artificial intelligence allowing complex interaction with tools and the environment. The field of humanoid robotics is advancing rapidly (e.g., ASIMO, HRP-4C). However, such robots have not yet found practical applications in our homes (high price, large size, safety problems, etc.). Recent science fiction movies explore a future vision of the co-existence of human beings and robots. In the movie "Surrogates", humans have withdrawn from everyday life almost completely; instead, they stay at home safely and comfortably and control robotic replicas of themselves. The human is able to feel and to see what the surrogate feels and sees. However, surrogates are physically stronger, move faster, and look more beautiful than human beings. "Terminator Salvation" offers us a definition of what separates man from machine: it is the human heart (repository and indicator of our emotions). "Avatar" suggests the opportunity of controlling the behavior of a sapient humanoid by the brain waves of a genetically matched human operator. Despite the different ideas presented in those movies, the main message is that humans must not rely heavily on their substitutes (surrogates, cyborgs, avatars, etc.) and should live their own lives.
The paradigm of wearable humanoid robotics is to augment human beings' abilities rather than substitute for them. Such robots are characterized by a wearable design, a structure based on that of the human body, embedded devices for influencing and enhancing our emotional state, health, physical strength, etc., and artificial intelligence allowing communication with, and sensing of, the user and the environment.

Figure 2. Wearable humanoid robot iFeel_IM! (devices placed at the head, brain, heart, hands, chest, back, sides and abdomen).

The structure of the wearable humanoid robot iFeel_IM! is shown in Figure 2. As can be seen, the structure is based on that of the human body and includes such parts as the head, brain, heart, hands, chest, back, abdomen, and sides.
In the iFeel_IM!, great importance is placed on the automatic sensing of emotions conveyed through textual messages in the 3D virtual world Second Life (artificial intelligence), the visualization of the detected emotions by avatars in the virtual environment, the enhancement of the user's affective state, and the reproduction of the feeling of social touch (e.g., hug) by means of haptic stimulation in the real world. The architecture of the iFeel_IM! is presented in Figure 3.
Figure 3. Architecture of the iFeel_IM! (chat text in the 3D world Second Life → Affect Analysis Model → emotion: intensity → chat log file on the PC → Haptic Devices Controller → D/A → Driver Box → HaptiHeart, HaptiHug, HaptiTickler, HaptiButterfly, HaptiTemper, and HaptiShiver). In order to communicate through iFeel_IM!, users have to wear the innovative affective haptic devices (HaptiHeart, HaptiHug, HaptiButterfly, HaptiTickler, HaptiTemper, and HaptiShiver) developed by us.
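The data flow in Figure 3 (chat log entries carrying an emotion label and intensity, dispatched by the Haptic Devices Controller to the worn devices) could be illustrated with a minimal, hypothetical Python loop. The log-line format, the emotion-to-device mapping, and the send_to_driver helper below are assumptions for illustration only, not the authors' implementation.

    import time

    # Hypothetical mapping from sensed emotion to the devices that render it
    # (loosely following the device/emotion affiliation in Table 3).
    EMOTION_TO_DEVICES = {
        "joy":     ["HaptiButterfly", "HaptiTickler", "HaptiTemper"],
        "sadness": ["HaptiHeart"],
        "anger":   ["HaptiHeart", "HaptiTemper"],
        "fear":    ["HaptiHeart", "HaptiShiver", "HaptiTemper"],
        "hug":     ["HaptiHug"],
    }

    def send_to_driver(device, intensity):
        """Placeholder for the D/A converter and Driver Box command (assumption)."""
        print(f"activate {device} at intensity {intensity:.2f}")

    def follow_chat_log(path):
        """Poll the chat log file and dispatch haptic commands for new entries."""
        with open(path, "r", encoding="utf-8") as log:
            log.seek(0, 2)                      # start at the end of the file
            while True:
                line = log.readline()
                if not line:
                    time.sleep(0.1)             # wait for the next chat message
                    continue
                # Assumed log format: "<emotion>:<intensity>" per message.
                emotion, _, value = line.strip().partition(":")
                intensity = float(value or 0.0)
                for device in EMOTION_TO_DEVICES.get(emotion, []):
                    send_to_driver(device, intensity)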

As a medium for communication we employ Second Life, which allows users to flexibly create their online identities (avatars) and to play various animations of the avatars (e.g., facial expressions and gestures) by typing special abbreviations in a chat window.

The control of the conversation is implemented through a Second Life object called EmoHeart (invisible in the case of a 'neutral' state) attached to the avatar's chest. In addition to communicating with the system for textual affect sensing (the Affect Analysis Model), EmoHeart is responsible for sensing symbolic cues or keywords of the 'hug' communicative function conveyed by text, and for visualizing 'hugging' in Second Life (triggering the related animation). The results from the Affect Analysis Model (dominant emotion and intensity) and from EmoHeart ('hug' communicative function) are stored along with the chat messages in a file on the local computer of each user.

The Haptic Devices Controller analyses these data in real time and generates control signals for the Digital/Analog converter (D/A), which then feeds the Driver Box for the haptic devices with control cues. Based on the transmitted signal, the corresponding haptic device (HaptiHeart, HaptiHug, HaptiButterfly, HaptiTickler, HaptiTemper, or HaptiShiver) worn by the user is activated.

4. AFFECT RECOGNITION FROM TEXT
The Affect Analysis Model [20] senses nine emotions conveyed through text ('anger', 'disgust', 'fear', 'guilt', 'interest', 'joy', 'sadness', 'shame', and 'surprise'). The affect recognition algorithm, which takes into account the specific style and evolving language of online conversation, consists of five main stages: (1) symbolic cue analysis; (2) syntactical structure analysis; (3) word-level analysis; (4) phrase-level analysis; and (5) sentence-level analysis. Our Affect Analysis Model was designed based on the compositionality principle, according to which we determine the emotional meaning of a sentence by composing the pieces that correspond to lexical units or other linguistic constituent types, governed by the rules of aggregation, propagation, domination, neutralization, and intensification at various grammatical levels. Analyzing each sentence in sequential stages, this method is capable of processing sentences of different complexity, including simple, compound, complex (with complement and relative clauses), and complex-compound sentences.
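As a toy illustration of this compositional idea (not the AAM itself, whose lexicons and rule set are far richer), a word-level emotion estimate can be modified by intensifiers and negation and then aggregated into a dominant sentence-level emotion and intensity. All word lists and scores below are invented for the example.

    # Toy compositional affect estimation; every lexicon entry here is invented.
    AFFECT_LEXICON = {"great": ("joy", 0.7), "awful": ("disgust", 0.6),
                      "scary": ("fear", 0.8), "miss": ("sadness", 0.5)}
    EMOTICONS = {":)": ("joy", 0.6), ":(": ("sadness", 0.6)}   # symbolic cue stage
    INTENSIFIERS = {"very": 1.5, "really": 1.3}                # intensification
    NEGATIONS = {"not", "never"}                               # neutralization

    def analyse_sentence(sentence):
        tokens = sentence.lower().split()
        scores = {}
        boost, negate = 1.0, False
        for token in tokens:
            if token in INTENSIFIERS:
                boost = INTENSIFIERS[token]
                continue
            if token in NEGATIONS:
                negate = True
                continue
            hit = EMOTICONS.get(token) or AFFECT_LEXICON.get(token.strip(".,!?"))
            if hit:
                emotion, intensity = hit
                intensity = 0.0 if negate else min(1.0, intensity * boost)
                scores[emotion] = max(scores.get(emotion, 0.0), intensity)
            boost, negate = 1.0, False
        if not scores:
            return "neutral", 0.0
        # Dominant emotion = highest composed intensity (a crude 'domination' rule).
        return max(scores.items(), key=lambda item: item[1])

    print(analyse_sentence("I really miss you :("))   # ('sadness', 0.65)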
To measure the accuracy of the proposed emotion recognition algorithm, we extracted 700 sentences from a collection of diary-like blog posts provided by BuzzMetrics (http://www.nielsenbuzzmetrics.com). We focused on online diary or personal blog entries, which are typically written in a free style and are rich in emotional colourations. Three independent annotators labelled the sentences with one of nine emotions (or neutral) and a corresponding intensity value.

We developed two versions of the Affect Analysis Model (AAM) differing in the syntactic parser employed during the second stage of the affect recognition algorithm: (1) AAM with the commercial parser Connexor Machinese Syntax (http://www.connexor.eu) (AAM-CMS); (2) AAM with the GNU GPL licensed Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml) (AAM-SP). The performance of AAM-CMS and AAM-SP was evaluated against two 'gold standard' sets of sentences: (1) 656 sentences on which two or three human raters completely agreed; (2) 249 sentences on which all three human raters completely agreed. An empirical evaluation of the AAM algorithm showed promising results regarding its capability to accurately classify affective information in text from an existing corpus of informal online communication (AAM-CMS achieves an accuracy of 81.5 %).

5. EmoHeart
Figure 4. Avatar facial expressions and EmoHeart textures (panels: Joy, Sadness, Anger, Fear).

The motivation behind using the heart-shaped object as an additional channel for visualization was to represent the communicated emotions in a vivid and expressive way.

Once attached to the avatar in Second Life, the EmoHeart object (1) listens to each message of its owner, (2) sends it to the web-based interface of the AAM, (3) receives the result (dominant emotion and intensity), and (4) visually reflects the sensed affective state through the animation of the avatar's facial expression, the EmoHeart texture (indicating the type of emotion), and the size of the texture (indicating the strength of emotion, namely 'low', 'middle', or 'high'). If no emotion is detected in the text, the EmoHeart remains invisible and the avatar's facial expression remains neutral. Examples of avatar facial expressions and EmoHeart textures are shown in Figure 4.

During a two-month period (December 2008 – January 2009), 89 Second Life users became owners of EmoHeart, and 74 of them actually communicated using it. Text messages along with the results from the AAM were stored in an EmoHeart log database. Of all sentences, 20 % were categorized as emotional by the AAM and 80 % as neutral (Figure 5). We observed that the percentage of sentences annotated with positive emotions ('joy', 'interest', 'surprise') essentially prevailed (84.6 %) over sentences annotated with negative emotions ('anger', 'disgust', 'fear', 'guilt', 'sadness', 'shame'). We believe that this dominance of positivity expressed through text is due to the nature and purpose of online communication media.

Figure 5. Percentage distribution of sentences (per cent of all sentences: neutral vs. emotional; per cent of emotional sentences: negative vs. positive).

We analysed the distribution of emotional sentences from the EmoHeart log data according to the fine-grained emotion labels from our Affect Analysis Model (Figure 6).

Figure 6. Distribution of emotional sentences over the nine fine-grained emotion labels.

We found that the most frequent emotion conveyed through text messages is 'joy' (68.8 % of all emotional sentences), followed by 'surprise', 'sadness' and 'interest' (9.0 %, 8.8 %, and 6.9 %, respectively).

6. AFFECTIVE HAPTIC DEVICES
According to the James-Lange theory [13], the conscious experience of emotion occurs after the cortex receives signals about changes in physiological state. These researchers argued that feelings are preceded by certain physiological changes: when we see a venomous snake, we feel fear because our cortex has received signals about our racing heart, knocking knees, etc. Damasio [5] distinguishes primary and secondary emotions.
Both involve changes in bodily states, but secondary emotions are evoked by thoughts. Recent empirical studies support non-cognitive theories of the nature of emotions. It has been shown that we can easily evoke our emotions by something as simple as changing facial expression (e.g., a smile brings on a feeling of happiness) [29]. Moreover, it is believed that some of our emotional responses are mediated by direct pathways from perceptual centers in the temporal cortex and the thalamus to the amygdala [17].

In order to support affective communication, we implemented several novel haptic gadgets embedded in iFeel_IM!. They make up three groups: the first group is intended to elicit emotion implicitly (HaptiHeart, HaptiButterfly, HaptiTemper, and HaptiShiver), the second evokes affect in a direct way (HaptiTickler), and the third uses the sense of social touch (HaptiHug) to influence mood and to provide some sense of physical co-presence. These devices produce different senses of touch involving both the kinesthetic and cutaneous channels [14]. Kinesthetic stimulations, which are produced by forces exerted on the body, are sensed by mechanoreceptors in the tendons and muscles. This channel is highly involved in sensing the stimulus produced by the HaptiHug device. On the other hand, mechanoreceptors in the skin layers are responsible for the perception of cutaneous stimulation. Different types of tactile corpuscles allow us to sense the thermal properties of an object (HaptiTemper), pressure (HaptiHeart, HaptiHug), vibration frequency (HaptiButterfly, HaptiTickler, and HaptiShiver), and stimulus location (the localization of the stimulating device enables association with a particular physical contact).

The affective haptic devices worn on a human body and their 3D models are presented in Figure 7.

Figure 7. Affective haptic devices worn on a human body.

6.1 HaptiHug: Realistic Hugging Over a Distance
6.1.1 Development of the haptic hug display
On-line interactions rely heavily on the senses of vision and hearing, and there is a substantial need for mediated social touch [9]. Among the many forms of physical contact, the hug is the most emotionally charged one; it conveys warmth, love, and affiliation. DiSalvo et al. [7] introduced "The Hug" interface. When a person desires to communicate a hug, he/she can squeeze the pillow, and this action results in vibration and temperature changes in the partner's device. The Hug Shirt allows people who are missing each other to send the physical sensation of a hug over a distance [12]. The user can wear this shirt, embedded with actuators and sensors, in everyday life.

However, these interfaces suffer from an inability to resemble the natural hug sensation and, hence, to elicit a strong affective experience (only slight pressure is generated by vibration actuators) [10]; they lack a visual representation of the partner, which adds ambiguity (hugging in real life involves both visual and physical experience); and they do not consider the power of the social pseudo-haptic illusion (i.e., a hugging animation is not integrated).

Recently, there have been several attempts to improve the force feeling of haptic displays. Mueller et al. [19] proposed an air-inflatable vest with an integrated compressor for presenting a hug over a distance. An air pump inflates the vest and thus generates light pressure around the upper torso.

The hug display Huggy Pajama is also actuated by air inflation [27]. Its air compressor is placed outside of the vest, allowing the use of a more powerful actuator. However, pneumatic actuators possess strong nonlinearity, load dependency, and time lag in response, and they produce loud noise [19].

Our goal is to develop a wearable haptic display generating forces that are similar to those of a human-human hug. Such a device should be lightweight and compact, with low power consumption, comfortable to wear, and aesthetically pleasing.

When people hug, they simultaneously generate pressure on the chest area and on the back of each other with their hands. The key feature of the developed HaptiHug is that it physically reproduces a hug pattern similar to that of human-human interaction. The hands for the HaptiHug are sketched from a real human and made from soft material, so that hugging partners can realistically feel the social presence of each other.

A couple of oppositely rotating motors (Maxon RE 10 1.5 with gearhead GP 10 A 64:1) are incorporated into a holder placed on the user's chest area. The Soft Hands, which are aligned horizontally, contact the back of the user. Once the 'hug' command is received, the couple of motors tense the belt, thus pressing the Soft Hands and the chest part of the HaptiHug against the human body (Figure 8).

Figure 8. Structure of the wearable HaptiHug device (Soft Hands on the back, motor holder and couple of motors on the chest; belt tension presses both against the human body).
The duration and intensity of the hug are controlled by the software in accordance with the emoticon or keyword detected from the text. For the presentation of a plain hug level (e.g., '(>^_^)>', '{}', '<h>'), a big hug level (e.g., '>:D<', '{{}}'), and a great big hug level (e.g., 'gbh', '{{{}}}'), different levels of pressure with different durations are applied to the user's back and chest.

The Soft Hands are made from a compliant rubber-sponge material. The contour profile of a Soft Hand is sketched from a male human hand and has a front-face area of 155.6 cm2. Two identical pieces of Soft Hand, 5 mm thick, are sandwiched with narrow belt slots between them and connected by plastic screws. Such a structure provides enough flexibility to fit tightly to the surface of the human back while being pressed by the belt. Moreover, the belt can move loosely inside the Soft Hands during tension. The dimensions and structure of the Soft Hands are presented in Figure 9.

Figure 9. Left: Soft Hand dimensions. Right: sandwiched structure of Soft Hands (cover fabric, belt slots, belt).

6.1.2 Social pseudo-haptic touch
We developed an animation of hugging and integrated it into Second Life (Figure 10).

Figure 10. Snapshots of hugging animation in Second Life.

During the animation the avatars approach and embrace each other with their hands. The significance of our idea of realistically reproducing hugging lies in the integration of the active-haptic device HaptiHug with pseudo-haptic touch simulated by the hugging animation. Thus, a high immersion into the physical contact of partners while hugging is achieved. In [16], the effect of pseudo-haptic feedback on the experienced force was demonstrated. We expect that the hugging animation will likewise increase the force sensation.

6.1.3 Hug measurement
Since so far there have been no attempts to measure the pressure and duration of a hug, we conducted our own experiments. A total of 3 pairs of subjects (3 males and 3 females) with no previous knowledge about the experiment was examined. Their ages varied from 24 to 32. They were asked to hug each other three times with three different intensities (plain hug, big hug, and great big hug levels). The subjects' chest and upper back were covered with a Kinotex tactile sensor, which measures pressure intensity through the amount of backscattered light falling on a photodetector [23]. The taxels, spaced 21.5 mm apart in the X direction and 22 mm apart in the Y direction, make up a 6x10 array. The Kinotex sensitivity range is from 500 N/m2 to 8 000 N/m2.

Examples of the pressure patterns on the back and on the chest of the user are given in Figure 11 and Figure 12, respectively.

Figure 11. Example of pressure distribution on the back of the user. The highest pressure corresponds to 4800 N/m2.

Figure 12. Example of pressure distribution on the chest of the user. The highest pressure corresponds to 5900 N/m2.

The results of the measured average pressure are listed in Table 1. The experimental results show that males produce more force on the partner's back than females. Interestingly, the pressure on the chest changes nonlinearly. The probable cause is that while experiencing the great big hug level, humans protect the heart, a vitally important part of the body, from overloading.

Table 1. Experimental findings (average pressure).
              Plain hug, kN/m2   Big hug, kN/m2   Great big hug, kN/m2
Male back          1.4                2.5               5.05
Female back        1.7                2.9               6.4
Chest              2.3                3.5               5.9

The developed HaptiHug device can achieve the force level of a plain hug (the generated pressure is higher than that of other hug displays).
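A minimal sketch of the cue-to-command mapping described in Section 6.1.1, using the measured chest pressures from Table 1 as illustrative targets. The duration values and the drive_hug helper are assumptions, not the authors' control software.

    # Hypothetical mapping from chat cues to hug levels (cue strings from the paper).
    HUG_CUES = {"(>^_^)>": "plain", "{}": "plain", "<h>": "plain",
                ">:D<": "big", "{{}}": "big",
                "gbh": "great_big", "{{{}}}": "great_big"}

    # Target chest pressure (kN/m2, from Table 1) and an assumed duration per level.
    HUG_LEVELS = {"plain":     (2.3, 1.0),
                  "big":       (3.5, 1.5),
                  "great_big": (5.9, 2.0)}

    def drive_hug(pressure_kn_m2, duration_s):
        """Placeholder for tensioning the HaptiHug belt motors (assumption)."""
        print(f"tension belt to ~{pressure_kn_m2} kN/m2 for {duration_s} s")

    def handle_message(text):
        # Check longer cues first so '{{{}}}' is not mistaken for '{}'.
        for cue in sorted(HUG_CUES, key=len, reverse=True):
            if cue in text:
                drive_hug(*HUG_LEVELS[HUG_CUES[cue]])
                break

    handle_message("missing you {{{}}}")   # -> great big hug command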
We consider that there is no reason to produce very strong forces (which would require more powerful motors and sometimes result in unpleasant sensations). Based on the experimental results we designed the control signals in such a way that the resulting pressure intensity, pattern, and duration are similar to those of human-human hug characteristics.

We summarize the technical specifications of the hug displays in Table 2 (O means the characteristic is present, – that it is absent).

Table 2. Specifications of the hug displays.
                                     HaptiHug    The Hug       Hug Shirt     Hug vest   Huggy Pajama
Weight, kg                           0.146       >1.0          0.160         >2.0       >1.2
Overall size (height x width), m     0.5 x 0.1   0.4 x 0.4     0.6 x 0.5     0.4 x 0.55 0.3 x 0.45
Wearable design                      O           –             O             O          O
Generated pressure, kPa              4.0         –             –             0.5        2.7
Actuators                            DC motors   Vibro-motors  Vibro-motors  Air pump   Air pump
Visual representation of partner     O           –             –             –          –
Social pseudo-touch                  O           –             –             –          –
Based on human-human hug             O           –             –             –          –

The developed HaptiHug is capable of generating strong pressure while being lightweight and compact. Features of the haptic hug display such as the visual representation of the partner, social pseudo-haptic touch, and pressure patterns similar to those of human-human interaction greatly increase the immersion into the physical contact of partners while hugging.

6.2 HaptiHeart: Enhancing User Emotions
Each emotion is characterized by a specific pattern of physiological changes. We selected four distinct emotions having strong physical features [28]: 'anger', 'fear', 'sadness', and 'joy'. The precision of the AAM in recognizing these emotions ('anger' – 92 %, 'fear' – 91 %, 'joy' – 95 %, 'sadness' – 88 %) is considerably higher than for other emotions. Table 3 shows the affiliation of each haptic device with the emotion being induced.

Table 3. Each affective haptic device is responsible for the stimulation of particular emotions.
Haptic device     Joy   Sadness   Anger   Fear   Social touch
HaptiHeart              V         V       V
HaptiButterfly    V
HaptiShiver                               V
HaptiTemper       V               V       V
HaptiTickler      V
HaptiHug          V                              V

Of the bodily organs, the heart plays a particularly important role in our emotional experience. The ability of false heart rate feedback to change our emotional state was reported in [6]. Research on the interplay between heart rate and emotions revealed that different emotions are associated with distinct patterns of heart rate variation [2].

The heart sounds are generated by the beating heart and the flow of blood through it. There are two major sounds heard in the normal heart, often described as a "lub" and a "dub" (the "lub-dub" sound occurs in sequence with each heartbeat). The first heart sound (lub), commonly termed S1, is caused by the sudden block of reverse blood flow due to the closure of the mitral and tricuspid valves at the beginning of ventricular contraction. The second heart tone (dub), or S2, results from the sudden block of reversing blood flow at the end of ventricular contraction [3].

We developed the heart imitator HaptiHeart to produce special heartbeat patterns according to the emotion to be conveyed or elicited (sadness is associated with a slightly intense heartbeat, anger with a quick and violent heartbeat, and fear with an intense heart rate). We take advantage of the fact that our heart naturally synchronizes with the heart of a person we hold or hug. Thus, the heart rate of a user is influenced by the haptic perception of the beat rate of the HaptiHeart. Furthermore, false heart beat feedback can be directly interpreted as a real heart beat, so it can change the emotional perception.

The HaptiHeart consists of two modules: a flat speaker (FPS 0304) and a speaker holder. The flat speaker's size (66.5 x 107 x 8 mm) and rated input power of 10 W allowed us to design a powerful and relatively compact HaptiHeart device that is able to produce a realistic heartbeat sensation with high fidelity. The 3D model of the HaptiHeart is presented in Figure 13.

Figure 13. HaptiHeart layout (heart-shaped speaker case and flat speaker).
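One way to realize such emotion-dependent heartbeat patterns is to schedule a two-pulse "lub-dub" waveform at an emotion-specific rate and amplitude and feed the samples to the flat speaker through the D/A output. The rates, amplitudes, and pulse shapes below are illustrative assumptions, not the authors' recorded signals.

    import numpy as np

    SAMPLE_RATE = 8000  # Hz; the signal contains only low-frequency content

    # Illustrative (assumed) beat rate and loudness per target emotion.
    HEART_PATTERNS = {"sadness": (50, 0.5),   # slow, soft
                      "fear":    (110, 0.7),  # fast
                      "anger":   (120, 1.0)}  # fast and violent

    def pulse(freq_hz, length_s, amplitude):
        """A short decaying sine burst approximating one heart sound."""
        t = np.linspace(0.0, length_s, int(SAMPLE_RATE * length_s), endpoint=False)
        return amplitude * np.sin(2 * np.pi * freq_hz * t) * np.exp(-18 * t)

    def heartbeat_signal(emotion, seconds=5.0):
        bpm, amplitude = HEART_PATTERNS[emotion]
        beat = np.zeros(int(SAMPLE_RATE * 60.0 / bpm))
        s1 = pulse(40, 0.12, amplitude)            # 'lub' (S1)
        s2 = pulse(55, 0.10, 0.7 * amplitude)      # 'dub' (S2), later and softer
        beat[:s1.size] += s1
        offset = int(0.3 * SAMPLE_RATE * 60.0 / bpm)   # S2 placed ~30 % into the beat
        beat[offset:offset + s2.size] += s2
        n_beats = int(seconds * bpm / 60.0)
        return np.tile(beat, n_beats)   # samples to be sent to the flat speaker

    signal = heartbeat_signal("anger")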
The pre-recorded sound signal with low-frequency content generates pressure on the human chest through the vibration of the speaker surface.

6.3 Butterflies in the Stomach (HaptiButterfly) and Shivers on the Body's Spine (HaptiShiver/HaptiTemper)
HaptiButterfly was developed with the aim of evoking the joy emotion. The idea behind this device is to reproduce the effect of "butterflies in the stomach" (the fluttery or tickling feeling in the stomach felt by people experiencing love) by means of arrays of vibration motors attached to the abdomen area of a person (Figure 14).

Figure 14. Structure of HaptiButterfly (plastic frame, revolute joint, vibration motor, holder).

We conducted an experiment aimed at investigating the patterns of vibration motor activation that produce the most pleasurable and natural sensations on the abdomen area. Based on the results, we employ 'circular' and 'spiral' vibration patterns.

Temperature symptoms are good indicators of differences between emotions. Empirical studies [28] showed that (1) fear and, to a lesser degree, sadness are characterized as 'cold' emotions, (2) joy is the only emotion experienced as 'warm', while (3) anger is a 'hot' emotion.

Fear is characterized by the most physiological changes in the human body. Driven by fear, blood that is shunted from the viscera to the rest of the body transfers heat, thus prompting perspiration to cool the body.

In order to boost the fear emotion physically, we designed the HaptiShiver interface, which sends "shivers down/up the body's spine" by means of a row of vibration motors (HaptiShiver), and "chills down/up the body's spine" through both cold airflow from a DC fan and the cold side of a Peltier element (HaptiTemper). The structure of the HaptiShiver/HaptiTemper device is shown in Figure 15.

Figure 15. Structure of HaptiShiver/HaptiTemper (front and back views: DC fan, Peltier element, vibration motor, aluminum plate).

6.4 HaptiTickler: Device for Positive Emotions
Two different types of tickling are recognized. The first type is knismesis, referring to a feather-like (light) type of tickling. It is elicited by a light touch or by a light electrical current at almost any part of the body [25]. It should be emphasized that this type of tickling does not evoke laughter and is generally accompanied by an itching sensation that creates the desire to rub the tickled part of the body. The second type of tickle, called gargalesis, is evoked by a heavier touch to particular areas of the body such as the armpits or ribs. Such stimuli usually result in laughter and squirming. In contrast to knismesis, one cannot produce gargalesis in oneself. Two explanations have been suggested for this inability to self-tickle. Scientists supporting interpersonal explanations argue that tickling is fundamentally an interpersonal experience and thus requires another person as the source of the touch [8]. On the other side of the debate is the reflex view, suggesting that tickle requires an element of unpredictability or uncontrollability. The experimental results from [11] support the latter view and reveal that ticklish laughter evidently does not require that the stimulation be attributed to another person. However, social and emotional factors in ticklishness greatly affect the tickle response.

We developed HaptiTickler with the purpose of evoking positive affect (the joy emotion) in a direct way by tickling the ribs of the user. The device includes four vibration motors reproducing stimuli that are similar to human finger movements during rib tickling (Figure 16).

Figure 16. HaptiTickler device (two large vibration motors for the index and little fingers, two small vibration motors for the middle and ring fingers).

The uniqueness of our approach lies in (1) the combination of the unpredictability and uncontrollability of the tickling sensation through random activation of the stimuli, and (2) the high involvement of social and emotional factors in the process of tickling (a positively charged online conversation potentiates the tickle response).

7. EMOTIONAL HAPTIC DESIGN
Aesthetically pleasing objects appear to the user to be more effective by virtue of their sensual appeal [22]. The affinity the user feels for an object results from the formation of an emotional connection with the object. Recent findings show that attractive things make people feel good, which in turn makes them think more creatively. The importance of tactile experience in producing an aesthetic response is underlined in [21].

We propose the concept of Emotional Haptic Design. The core idea is to make the user feel affinity for the device by means of (1) appealing shapes that evoke the desire to touch and haptically explore them, (2) the use of materials that are pleasurable to touch, and (3) the pleasure anticipated through wearing.
The designed devices are pleasurable to look at and to touch (colorful velvet material was used to decorate the devices) and have personalized features (in particular, the Soft Hands of the HaptiHug can be sketched from the hands of the real communication partner).

The essence of the emotional, moral, and spiritual aspects of a human being has long been depicted using the heart-shaped symbol. The heart-shaped HaptiHeart was designed with the primary objective of emotionally connecting the user with the device, as the heart is mainly associated with love and emotional experience. The HaptiButterfly, its shape, and its activated vibration motors induce the association with a real butterfly lightly touching the human body while spreading its wings.

We paid great attention to the comfortable wearing of the garment. Devices such as HaptiButterfly, HaptiTickler, and HaptiShiver have inner sides made from foam rubber. While contacting the human body, the surface shape adjusts itself to fit the particular contour of the body, and any uncomfortable pressure is therefore avoided. All of the designed devices have a flexible and intuitive system of buckles and fasteners that enables the user to easily adjust the devices to the body shape.

8. USER STUDY
Recently, we demonstrated the wearable humanoid robot iFeel_IM! at such conferences as INTETAIN 2009, ACII 2009, and ASIAGRAPH 2009 (Figure 17). In total, more than 300 persons have experienced our system.

Most of them commented that the haptic modalities (e.g., the heartbeat feeling and the hug sensation) were very realistic. Subjects were highly satisfied with wearing the HaptiHug. The simultaneous observation of the hugging animation and experience of the hugging sensation evoked surprise and joy in many participants. However, there were some remarks about the necessity of designing unique haptic stimuli for a particular user (e.g., the heart rate of each user while experiencing the same emotion is different).

Figure 17. Demonstration of iFeel_IM! at ASIAGRAPH 2009.

From our own observations we noticed that while joy was being stimulated through the HaptiButterfly, many participants were smiling. Participants expressed anxiety when fear was evoked through a fast and intensive heartbeat rate. Taking our observations into account, we will work further to improve the emotionally immersive experience of online communication.

The atmosphere between the participants and the exhibitors became more relaxed and joyful during the iFeel_IM! demonstrations, which indicates that the wearable humanoid robot was successful at emotion elicitation. Also, in spite of the fact that users varied greatly in size, the device was capable of fitting everyone.

9. CONCLUSIONS
While developing the iFeel_IM! system, we attempted to bridge the gap between mediated and face-to-face communication by enabling and enriching the spectrum of senses such as vision and touch along with cognition and inner personal state.

In this paper we described the architecture of iFeel_IM! and the development of novel haptic devices, namely HaptiHeart, HaptiHug, HaptiTickler, HaptiButterfly, HaptiShiver, and HaptiTemper. The emotional brain of our robot, the Affect Analysis Model, can sense emotions and their intensity with high accuracy. Moreover, the AAM is capable of processing messages written in the informal and expressive style of IM (e.g., "LOL" for "laughing out loud", "CUL" for "see you later", "<3" for love, etc.).

The haptic devices were designed with particular emphasis on a natural and realistic representation of the physical stimuli, modular expandability, and an ergonomic, human-friendly design. The user can perceive intensive emotions during online communication, use the desired type of stimuli, and comfortably wear and easily detach the devices from the torso. The significance of our idea of realistically reproducing hugging lies in the integration of the active-haptic device HaptiHug with pseudo-haptic touch simulated by the hugging animation. Thus, a high immersion into the physical contact of partners while hugging is achieved.

Preliminary observation has revealed that the developed devices are capable of intensively influencing our emotional state. Users were captivated by chatting while simultaneously experiencing emotional arousal caused by the affective haptic devices.

Our primary goal for future research is to conduct an extensive user study on the wearable humanoid robot iFeel_IM!. Additional modalities aimed at intensifying affective states will be investigated as well. For example, it is well known that the duration of emotional experiences differs for joy, anger, fear, and sadness. Another embodiment of iFeel_IM! aims at sending emotional messages to the partner in real time, so that he/she can perceive our emotions and feel empathy. The heartbeat patterns of the communicating persons can be recorded in real time and exchanged through conversation.

The iFeel_IM! system has great potential to impact communication in 'sociomental' (rather than 'virtual') online environments, which facilitate contact with others and affect the nature of social life in terms of both interpersonal relationships and the character of community. It is well known that our emotional state and our health are strongly linked. We believe that the integration of Affective Haptics and artificial intelligence technologies into online communication systems will provide such an important channel as the sensual and non-verbal connection of partners along with textual and visual information.

10. ACKNOWLEDGMENTS
The research is supported in part by the Japan Science and Technology Agency (JST) and the Japan Society for the Promotion of Science (JSPS).
11. REFERENCES
[1] Anderson, C.A. 1989. Temperature and aggression: ubiquitous effects of heat on occurrence of human violence. Psychological Bulletin 106, 1, 74-96.
[2] Anttonen, J., and Surakka, V. 2005. Emotions and heart rate while sitting on a chair. In Proceedings of the ACM Conference on Human Factors in Computing Systems (Portland, USA, April 01 - 07, 2005). CHI '05. ACM Press, New York, 491-499.
[3] Bickley, L.S. 2008. Bates' Guide to Physical Examination and History Taking. Philadelphia: Lippincott Williams & Wilkins.
[4] Chang, A., O'Modhrain, S., Jacob, R., Gunther, E., and Ishii, H. 2002. ComTouch: design of a vibrotactile communication device. In Proceedings of the ACM Designing Interactive Systems Conference (London, UK, June 25 - 28, 2002). DIS '02. ACM Press, New York, 312-320.
[5] Damasio, A. 2000. The feeling of what happens: body, emotion and the making of consciousness. London: Vintage.
[6] Decaria, M.D., Proctor, S., and Malloy, T.E. 1974. The effect of false heart rate feedback on self reports of anxiety and on actual heart rate. Behavior Research & Therapy, 12, 251-253.
[7] DiSalvo, C., Gemperle, F., Forlizzi, J., and Montgomery, E. 2003. The Hug: an exploration of robotic form for intimate communication. In Proceedings of the IEEE Workshop on Robot and Human Interactive Communication (Millbrae, USA, Oct. 31 - Nov. 2, 2003). RO-MAN '03. IEEE Press, New York, 403-408.
[8] Foot, H.C., and Chapman, A.J. 1976. The social responsiveness of young children in humorous situations. In Chapman, A.J. & Foot, H.C. (Eds.), Humour and Laughter: Theory, Research, and Applications. London: Wiley, 187-214.
[9] Haans, A., and Ijsselsteijn, W. 2006. Mediated social touch: a review of current research and future directions. Virtual Reality, 9, 149-159.
[10] Haans, A., Nood, C., and Ijsselsteijn, W.A. 2007. Investigating response similarities between real and mediated social touch: a first test. In Proceedings of the ACM Conference on Human Factors in Computing Systems (San Jose, USA, Apr. 28 - May 3, 2007). CHI '07. ACM Press, New York, 2405-2410.
[11] Harris, C.R., and Christenfeld, N. 1999. Can a machine tickle? Psychonomic Bulletin & Review 6, 3, 504-510.
[12] Hug Shirt. CuteCircuit Company. http://www.cutecircuit.com
[13] James, W. 1884. What is an Emotion? Mind, 9, 188-205.
[14] Kandel, E.R., Schwartz, J.H., and Jessell, T.M. 2000. Principles of Neural Science. New York: McGraw-Hill.
[15] Keinonen, T., and Hemanus, J. 2005. Mobile emotional notification application. Nokia Co. US Patent No. 6,959,207.
[16] Lecuyer, A., Coquillart, S., Kheddar, A., Richard, P., and Coiffet, P. 2000. Pseudo-haptic feedback: Can isometric input devices simulate force feedback? In Proceedings of IEEE Virtual Reality (New Brunswick, USA, March 18 - 22, 2000). VR '00. IEEE Press, New York, 403-408.
[17] LeDoux, J.E. 1996. The Emotional Brain. New York: Simon & Schuster.
[18] Mathew, D. 2005. vSmileys: imagine emotions through vibration patterns. In Proceedings of Alternative Access: Feeling and Games.
[19] Mueller, F.F., Vetere, F., Gibbs, M.R., Kjeldskov, J., Pedell, S., and Howard, S. 2005. Hug over a distance. In Proceedings of the ACM Conference on Human Factors in Computing Systems 2005 (Portland, USA, April 01 - 07, 2005). CHI '05. ACM Press, New York, 1673-1676.
[20] Neviarouskaya, A., Prendinger, H., and Ishizuka, M. 2009. Compositionality principle in recognition of fine-grained emotions from text. In Proceedings of the International AAAI Conference on Weblogs and Social Media (San Jose, USA, May 17 - 20, 2009). ICWSM 2009. AAAI Press, Menlo Park, 278-281.
[21] Noe, A. 2004. Action in Perception. Cambridge: MIT Press.
[22] Norman, D.A. 2004. Emotional Design: Why We Love (or Hate) Everyday Things. New York: Basic Books.
[23] Optic Fiber Tactile Sensor Kinotex. Nitta Corporation. http://www.nitta.co.jp/english/product/mechasen/sensor/kinotex_top.html
[24] Rovers, A.F., and Van Essen, H.A. 2004. HIM: a framework for haptic Instant Messaging. In Proceedings of the ACM Conference on Human Factors in Computing Systems 2004 (Vienna, Austria, April 24 - 29, 2004). CHI '04. ACM Press, New York, 1313-1316.
[25] Sato, Y., Sato, K., Sato, M., Fukushima, S., Okano, Y., Matsuo, K., Ooshima, S., Kojima, Y., Matsue, R., Nakata, S., Hashimoto, Y., and Kajimoto, H. 2008. Ants in the pants – ticklish tactile display using rotating brushes. In Proceedings of the International Conference on Instrumentation, Control and Information Technology (Tokyo, Japan, August 20 - 22, 2008). SICE '08, 461-466.
[26] Shin, H., Lee, J., Park, J., Kim, Y., Oh, H., and Lee, T. 2007. A tactile emotional interface for Instant Messenger chat. In Proceedings of the International Conference on Human-Computer Interaction (Beijing, China, July 22 - 27, 2007). HCII '07. Springer Press, Heidelberg, 166-175.
[27] Teh, J.K.S., Cheok, A.D., Peiris, R.L., Choi, Y., Thuong, V., and Lai, S. 2008. Huggy Pajama: a mobile parent and child hugging communication system. In Proceedings of the International Conference on Interaction Design and Children (Chicago, USA, June 11 - 13, 2008). IDC 2008. ACM Press, New York, 250-257.
[28] Wallbott, H.G., and Scherer, K.R. 1988. How universal and specific is emotional experience? Evidence from 27 countries on five continents. In Scherer, K.R. (Ed.), Facets of Emotion: Recent Research. Hillsdale (N.J.): Lawrence Erlbaum Inc., 31-56.
[29] Zajonc, R.B., Murphy, S.T., and Inglehart, M. 1989. Feeling and facial efference: implications of the vascular theory of emotion. Psychological Review, 96, 395-416.
KIBITZER: A Wearable System for Eye-Gaze-based Mobile
Urban Exploration
Matthias Baldauf, Peter Fröhlich, Siegfried Hutter
FTW, Donau-City-Strasse 1, 1220 Vienna, Austria
+43-1-5052830-47 / -85 / -10
baldauf@ftw.at, froehlich@ftw.at, shutter@ftw.at

ABSTRACT
Due to the vast amount of available georeferenced information, novel techniques to interact with such content more intuitively and efficiently are increasingly required. In this paper, we introduce KIBITZER, a lightweight wearable system that enables the browsing of urban surroundings for annotated digital information. KIBITZER exploits its user's eye-gaze as a natural indicator of attention to identify objects-of-interest and offers speech and non-speech auditory feedback. Thus, it provides the user with a 6th sense for digital georeferenced information. We present a description of our system's architecture and the interaction technique and outline experiences from first functional trials.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces – Interaction styles

General Terms
Experimentation, Human Factors

Keywords
Wearable Computing, Mobile Spatial Interaction, Eye-gaze

1. INTRODUCTION
Computers are becoming a pervasive part of our everyday life, and they increasingly provide us with information about the ambient environment. Smartphones guide us through unfamiliar areas, revealing information about the surroundings, and helping us share media with others about certain places. While such location-based information has traditionally been accessed with a very limited set of input devices, usually just a keyboard and audio, multimodal interaction paradigms are now emerging that take better advantage of the user's interactions with space.

The research field of Mobile Spatial Interaction [7] breaks with the conventional paradigm of displaying nearby points-of-interest (POI) icons on 2D maps. Instead, MSI research aims to develop new forms of sensing the user's bodily position in space and to envision new interactions with the surrounding world through gesture and movement.

Key interaction metaphors for recently implemented MSI systems are the 'magic wand' (literally pointing the handheld at objects of interest to access further information [16]), the 'smart lens' (superimposing digital content directly on top of the recognized real-world object [14]), or the 'virtual peephole' (virtual views aligned to the current physical background, e.g. displaying a "window to the past" [15]). These interaction techniques have been successfully evaluated in empirical field studies, and first applications on mass-market handhelds have attracted much end-user demand on the market [20].

Metaphors like the magic wand, smart lens and virtual peephole incorporate the handheld as the gateway for selecting physical objects in the user's surroundings and for viewing related digital content. However, it may not always be preferable to put all attention on the mobile device, such as when walking along a crowded pedestrian walkway or when having both hands busy.

We present KIBITZER, a gaze-directed MSI system that enables the hands-free exploration of the environment. The system enables users to select nearby spatial objects 'with the blink of an eye' and presents related information by using speech and non-speech auditory output.

In the following section, we provide an overview of relevant previous work in the area of eye-based interaction. We then describe KIBITZER's system architecture and the realized user interaction technique, and we demonstrate its operation with photos and screenshots. We conclude with experiences from first functional trials and plans for further research.

2. EYE-BASED APPLICATIONS
Applications making use of eye movement or eye gaze patterns can be broadly categorized as diagnostic and interactive applications [4].

In diagnostic use cases, an observer's eye movements are captured to assess her attentional processes over a given stimulus. User tests including eye tracking techniques are a popular method in HCI, e.g. to evaluate the usability of user interfaces and improve product design. In the 1940s, pioneering work in this field was done by Fitts et al. [6], who were the first to use cameras to capture and analyze observers' eye movements. They collected data from pilots during landing to propose a more efficient arrangement and design of the cockpit instruments.
The investigation of interactive eye tracking applications started in the 1980s. In contrast to diagnostic systems, where the recorded data is mostly evaluated after the test, eye movements in interactive scenarios are exploited as input parameters for a computer interface in real time. Bolt [2] first introduced the idea of eye movements acting as a complementary input possibility for computers. First studies investigating the usage of eye movements for common desktop computer tasks were presented by Ware [19] and Jacob [9]. Ware identified eye input as a fast technique for object selection; Jacob found it to be effective for additional tasks such as moving windows. Recently, eye gaze as natural input has been revisited in the context of so-called attentive user interfaces [18], which generally consider intuitive user reactions such as gestures and body postures to facilitate human-computer interaction. Commercial interactive eye tracking applications are currently focused on military use cases and tools for people with severe physical disabilities [11]. A comprehensive survey of eye tracking applications can be found in [4].

Mobile applications in the field of Augmented or Virtual Reality that allow the visual exploration of real or virtual worlds are mainly restricted to the awareness of head movement. E.g., Feiner et al. [5] and, more recently, Kooper et al. [10] and Reitmayr et al. [13] presented wearable systems that support object selection by gaze estimated from head pose only: bringing the object-of-interest to the center of a head-worn see-through display selects it or triggers another specified action. The exploitation of a user's detailed eye gaze through suitable trackers is a rarely considered aspect in Augmented Reality. Very recent examples include the integration of eye-tracking in a stationary AR videoconferencing system [1] and a wearable AR system combining a head-mounted display and an eye-tracker to virtually interact with a real-world art gallery [12].

Depending on the underlying technology, two basic types of eye tracking systems can be distinguished. Systems based on so-called electro-oculography exploit the electrostatic field around a human's eyes. Electrodes placed next to the observer's eyes measure this field's changes as the eyes move. As the eye's position can only be estimated using this technique, electro-oculography is applied for activity recognition rather than for gaze detection [3].

In contrast, video-based eye tracking approaches make use of one or several cameras recording one or both eyes of a user. By analyzing reflections in the captured eye using computer vision techniques, the eye's position and thus the eye gaze can be determined. The cameras can either be placed near the object-of-interest (usually a computer display) to remotely record the user's head in a non-intrusive way, or mounted on a headpiece worn by the user. Despite this approach's disadvantage of intrusiveness, head-mounted eye trackers offer a higher accuracy than non-intrusive systems and can also be applied in mobile use cases [11] such as our exploration scenario.

3. SYSTEM SETUP
This section introduces the technical setup for the realization of our KIBITZER system. First, we present the hardware used, and then we explain the involved software components and their functionality.

3.1 Mobile Equipment
The core hardware component of our setup is an iView X HED system, a latest-generation mobile eye tracker from Sensomotoric Instruments GmbH (SMI). It includes two cameras to record both an eye's movement and the current scene from the user's perspective. For best possible stability, the equipment is mounted on a bicycle helmet (Figure 1). Via USB the tracker is connected to a laptop computer (worn in a backpack) where the video stream is processed and the current eye-gaze is calculated.

To augment a user's relative gaze direction with her global position as well as her head's orientation and tilt, we use a G1 phone powered by Android (Figure 2). This smartphone contains all necessary sensors such as a built-in GPS receiver, a compass and accelerometers. With a custom-made fixation, the G1 device is mounted on top of the bicycle helmet (Figure 3).

Figure 1. iView X HED, a mobile eye-gaze-tracker [17].
Figure 2. G1 phone powered by Android.
Figure 3. The combined head-mounted device worn during a functional test.

3.2 Software Components
Figure 4 gives an overview of the software components of our system architecture and their communication.
The aforementioned eye tracker system comes with a video analyzer application that is installed on the laptop computer. This iView X HED application offers a socket-based API via Ethernet to inform other applications about the calculated eye data. For our scenario, we implemented a small component that connects to this interface and forwards the fetched gaze position in pixels (with regard to the scene camera's picture) via Bluetooth.

Our mobile application installed on the attached smartphone receives the gaze position and (based on a prior calibration) is able to convert these pixel values into corresponding horizontal and vertical deviations in degrees with regard to a straight-ahead gaze. These values are continuously written to a local log file together with the current location and the head's orientation and tilt. Adding the horizontal gaze deviation to the head's orientation and, respectively, the vertical gaze deviation to the head's tilt results in the global eye-gaze vector. To provide auditory feedback, the mobile application makes use of an integrated text-to-speech engine.

Figure 4. Overview of our system's software components and their communication (gaze detection, Bluetooth sender, location and orientation detection, text-to-speech engine, visibility detection).

The mobile application may invoke a remote visibility detection service via a 3G network. This service takes the user's current view into account: by passing a location and an orientation (in our case the global eye-gaze vector) to this HTTP service, a list of currently visible POIs in this direction is returned. The engine makes use of a 2.5D block model, i.e. each building in the model is represented by a two-dimensional footprint polygon which is extruded by a height value. Based on this model, POIs with a clear line-of-sight to the user and POIs located inside visible buildings can be determined. The resulting list contains the matching POIs' names and locations as well as the relative angles and distances with regard to the passed user position and orientation. More details about the used visibility detection engine can be found in [16].

3.3 Initial Calibration
Before the equipment can be used, it needs to be calibrated. This procedure starts with a nine-point calibration for the eye-tracker using the iView X HED application. For this purpose, nine markers must be arranged on a nearby wall in a 3x3 grid, whereas the marker in the center should be placed at the user's eye height. The calibration points can then be set up in the application via the delivered scene video and mapped point by point to the corresponding gaze direction. The standard procedure is extended with a custom calibration to later map gaze positions in pixels to gaze deviations in degrees. By turning and tilting the head towards the calibration points while now keeping the eye gaze straight ahead, conversion factors for horizontal and vertical gaze positions can be determined based on the fetched compass and accelerometer data.

4. USER INTERACTION
After the calibration process, the presented equipment is ready for outdoor usage. As previously mentioned, our goal is the gaze-sensitive exploration of an urban environment, providing the user with a 6th sense for georeferenced information.

When to trigger which suitable action in an eye-gaze-based system is a commonly investigated and discussed issue known as the 'Midas Touch' problem. A good solution must not render void the intuitive interaction approach of such an attentive interface by increasing the user's cognitive load or disturbing her gaze pattern. At the same time, the unintended invocation of an action must be avoided.

The task of object selection on a computer screen investigated by Jacob [9] might seem related to our scenario of mobile urban exploration, where we want to select real-world objects to learn more about annotated POIs. Jacob suggests either using a keyboard to explicitly execute the selection of a viewed item via a key press or, preferably, applying a dwell time to detect a focused gaze and fire the action thereafter. In Jacob's experiment, users were provided with visual feedback about the current selection and therefore were able to easily correct errors.

Due to our mobile scenario, we want to keep the involved equipment as lightweight as possible, sparing an additional keyboard or screen. Therefore, we rely on an explicit eye-based action to trigger a query for the currently viewed object. As though the user would memorize the desired object, closing her eyes for two seconds triggers the selection. In technical terms, the spatial query is executed for the last known global gaze direction if the user's tracked eye could not be detected during the last two seconds. An invocation of the query engine is marked in the log file with a special status flag.

The names of the POIs returned by the visibility detection service are then extracted and fed into the text-to-speech engine for voice output. If a new query is triggered during the output, the text-to-speech engine is interrupted and restarted with the new results. The auditory output is possible either via the mobile's built-in loudspeakers or via attached earphones.
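The two key computations just described, composing the global gaze direction from the head's orientation and tilt plus the gaze deviations, and firing a spatial query after two seconds without a detected eye, can be summarized in a small sketch. The service call is a stub and the names are illustrative assumptions, not KIBITZER's actual code.

    import time

    BLINK_TRIGGER_S = 2.0   # selection fires after 2 s without a detected eye

    def global_gaze(heading_deg, tilt_deg, gaze_dx_deg, gaze_dy_deg):
        """Combine head orientation/tilt with eye-gaze deviations (in degrees)."""
        azimuth = (heading_deg + gaze_dx_deg) % 360.0
        elevation = tilt_deg + gaze_dy_deg
        return azimuth, elevation

    def query_visible_pois(lat, lon, azimuth, elevation):
        """Stub for the HTTP visibility detection service (assumption)."""
        return ["Example POI"]   # would return POIs visible along the gaze ray

    class GazeSelector:
        def __init__(self):
            self.last_eye_seen = time.time()
            self.last_gaze = None   # (lat, lon, azimuth, elevation)

        def update(self, eye_detected, lat, lon, heading, tilt, dx, dy):
            now = time.time()
            if eye_detected:
                self.last_eye_seen = now
                self.last_gaze = (lat, lon) + global_gaze(heading, tilt, dx, dy)
            elif self.last_gaze and now - self.last_eye_seen >= BLINK_TRIGGER_S:
                pois = query_visible_pois(*self.last_gaze)   # 'blink' selection
                self.last_eye_seen = now                     # avoid re-triggering
                return pois
            return None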
5. TOUR ANALYSIS
During the usage of our KIBITZER, all sensor values are continuously recorded to a log file. These datasets, annotated with corresponding time stamps, enable a complete reconstruction of the user's tour for later analysis.

To efficiently visualize a log file's content, we implemented a converter tool that generates a KML file from a passed log file. KML is an XML-based format for geographic annotations and visualizations with support for animations. The resulting tour video can be played using Google Earth [8] and shows the user's orientation and gaze from an exocentric ('third person') perspective (Figure 5). The displayed human model is oriented according to the captured compass values; its gaze ray is corrected by the calculated gaze deviations. The invocation of the visibility detection service, i.e. the gaze-based selection of an object, is marked by a different-colored gaze ray.

Figure 5. Screenshot of a KML animation reconstructed from the logged tour data.

Figure 6. Screenshot of the video taken by the helmet-mounted scene camera. The red cross represents the current eye gaze.

As the scene camera's video stream can be recorded via the iView X HED application, the reconstructed tour animation can be compared to the actually captured scene video (Figure 6). The scene video is overlaid with a red cross representing the user's current gaze and can thus be used to evaluate our system's accuracy. Furthermore, when combined with the visibility detection engine, the tour reconstruction can be used to automatically identify areas of interest or to compile further statistics.

6. CONCLUSIONS AND OUTLOOK
In this paper, we introduced KIBITZER, a wearable gaze-sensitive system for the exploration of urban surroundings, and presented related work in the field of eye-based applications. Wearing our proposed headpiece, the user's eye-gaze is analyzed to implicitly scan her visible surroundings for georeferenced digital information. Offering speech-auditory feedback via loudspeakers or earphones, the user is unobtrusively informed about POIs in her current gaze direction. Additionally, we offer tools to reconstruct a user's recorded tour, visualizing her eye-gaze. These animations are not only useful for accuracy tests during development but also aim at a later automated tour analysis, e.g. to identify areas of interest.

Experiences from first functional tests and reconstructed tour videos showed that the proposed system's overall accuracy is sufficient for determining POIs in the user's gaze. However, in some trials the built-in compass was heavily influenced by magnetic fields, resulting in wrong POI selections. This problem could be solved by complementing the system with a more robust external compass.

During these tests we observed some minor limitations of the chosen vision-based gaze tracking approach and the blinking interaction. In rare cases, unfavorable reflections caused by direct sunlight prevented a correct detection of the user's pupil and therefore interfered with the gaze tracking. Obviously, at night the usage of such a vision-based system is not feasible without an artificial light source.

Our proposed research prototype is a first step towards the exploitation of a user's eye-gaze in mobile urban exploration scenarios and is therefore deliberately designed for experimentation. The current system, built from off-the-shelf hardware components, provides a complete framework to study possible gaze-based interaction techniques. With the future arrival of smart glasses or even intelligent contact lenses, the required equipment is expected to become more comfortable to wear, if not almost unnoticeable.

Applying the presented system, we will evaluate the usability and effectiveness of eye-gaze-based mobile urban exploration in upcoming user tests. We will set a special focus on the acceptance of the currently implemented 'blinking' action and on the investigation of alternative interaction techniques. Inspired by 'mouse-over' events known from Web sites, such as switching an image when moving the mouse cursor over a sensitive area, implicit gaze feedback is conceivable: when a user glances at an object, she might be notified about the availability of annotated digital information by a beep or tactile feedback. The combination of our gaze-based system with a brain-computer interface to estimate a gaze's intention and thus trigger an according action is another promising direction for future research.
7. ACKNOWLEDGMENTS
This work has been carried out within the projects WikiVienna and U0, which are financed in parts by Vienna's WWTF funding program, by the Austrian Government and by the City of Vienna within the competence center program COMET.

8. REFERENCES
[1] Barakonyi, I., Prendinger, H., Schmalstieg, D., and Ishizuka, M. 2007. Cascading Hand and Eye Movement for Augmented Reality Videoconferencing. In Proc. of 3D User Interfaces, 71-78.
[2] Bolt, R.A. 1982. Eyes at the Interface. In Proc. of Human Factors in Computer Systems Conference, 360-362.
[3] Bulling, A., Ward, J.A., Gellersen, H., and Tröster, G. 2009. Eye Movement Analysis for Activity Recognition. In Proc. of the 11th International Conference on Ubiquitous Computing, 41-50.
[4] Duchowski, A.T. 2002. A breadth-first survey of eye tracking applications. In Behavior Research Methods, Instruments, & Computers (BRMIC), 34(4), 455-470.
[5] Feiner, S., MacIntyre, B., Höllerer, T., and Webster, A. 1997. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment. In Personal and Ubiquitous Computing, Vol. 1, No. 4, 208-217.
[6] Fitts, P.M., Jones, R.E., and Milton, J.L. 1950. Eye movements of aircraft pilots during instrument-landing approaches. In Aeronautical Engineering Review 9(2), 24-29.
[7] Fröhlich, P., Simon, R., and Baillie, L. 2009. Mobile Spatial Interaction. Personal and Ubiquitous Computing, Vol. 13, No. 4, 251-253.
[8] Google Earth. http://earth.google.com. Accessed January 7 2010.
[9] Jacob, R.J.K. 1990. What you look at is what you get: eye movement-based interaction techniques. In Proc. of the SIGCHI Conference on Human Factors in Computing Systems, 11-18.
[10] Kooper, R., and MacIntyre, B. 2003. Browsing the Real-World Wide Web: Maintaining Awareness of Virtual Information in an AR Information Space. In International Journal of Human-Computer Interaction, Vol. 16, No. 3, 425-446.
[11] Morimoto, C.H., and Mimica, M.R.M. 2005. Eye gaze tracking techniques for interactive applications. In Computer Vision and Image Understanding, Vol. 98, No. 1, 4-24.
[12] Park, H.M., Lee, S.H., and Choi, J.S. 2008. Wearable Augmented Reality System using Gaze Interaction. In Proc. of the 7th IEEE/ACM International Symposium on Mixed and Augmented Reality, 175-176.
[13] Reitmayr, G., and Schmalstieg, D. 2004. Collaborative Augmented Reality for Outdoor Navigation and Information Browsing. In Proc. of Symposium on Location Based Services and TeleCartography, 31-41.
[14] Schmalstieg, D., and Wagner, D. 2007. The World as a User Interface: Augmented Reality for Ubiquitous Computing. In Proc. of Symposium on Location Based Services and TeleCartography, 369-391.
[15] Simon, R. 2006. The Creative Histories Mobile Explorer - Implementing a 3D Multimedia Tourist Guide for Mass-Market Mobile Phones. In Proc. of EVA.
[16] Simon, R., and Fröhlich, P. 2007. A Mobile Application Framework for the Geo-spatial Web. In Proc. of the 16th International World Wide Web Conference, 381-390.
[17] SMI iView X™ HED. http://www.smivision.com/en/eye-gaze-tracking-systems/products/iview-x-hed.html. Accessed January 07 2010.
[18] Vertegaal, R. 2002. Designing Attentive Interfaces. In Proc. of the 2002 Symposium on Eye Tracking Research & Applications, 23-30.
[19] Ware, C., and Mikaelian, H.T. 1987. An evaluation of an eye tracker as a device for computer input. In Proc. of the ACM CHI + GI-87 Human Factors in Computing Systems Conference, 183-188.
[20] Wikitude. http://www.mobilizy.com/wikitude.php. Accessed January 07 2010.
Airwriting Recognition using Wearable Motion Sensors

Christoph Amma, Dirk Gehrig, Tanja Schultz


Cognitive Systems Lab (CSL), Karlsruhe Institute of Technology, Germany
(christoph.amma,dirk.gehrig,tanja.schultz)@kit.edu

ABSTRACT line and accessible. Although everybody savours the small


In this work we present a wearable input device which en- size of hand-helds, the operation of these devices becomes
ables the user to input text into a computer. The text is more challenging - only small keys fit on to the device, if at
written into the air via character gestures, like using an all. This makes complex operations like the input of text
imaginary blackboard. To allow hands-free operation, we very cumbersome. it requires good eye-sight and hand-eye
designed and implemented a data glove, equipped with three coordination, it keeps hands and eyes busy, and is difficult
gyroscopes and three accelerometers to measure hand mo- if the device or the person is moving during operation. Fur-
tion. Data is sent wirelessly to the computer via Bluetooth. ther considering future augmented reality applications with
We use HMMs for character recognition and concatenated displays integrated in glasses, there is an obvious need for
character models for word recognition. As features we ap- new input devices offering more natural interaction possibil-
ply normalized raw sensor signals. Experiments on single ities. Future wearable computer systems will loose many of
character and word recognition are performed to evaluate their advantages, if the interface would still rely on an extra
the end-to-end system. On a character database with 10 text input device held in hand.
writers, we achieve an average writer-dependent character
recognition rate of 94.8% and a writer-independent charac- Fortunately, the technology of sensors is advancing signifi-
ter recognition rate of 81.9%. Based on a small vocabulary cantly, allowing for very small, body-worn, wearable sensors
of 652 words, we achieve a single-writer word recognition that foster the design of unobtrusive, mobile, and robust in-
rate of 97.5%, a performance we deem is advisable for many terfaces suitable for intuitive wearable computing systems.
applications. The final system is integrated into an online Traditionally used sensors, which are small and come at very
word recognition demonstration system to showcase its ap- reasonable costs are accelerometers, gyroscopes and magne-
plicability. tometers. Based on the sensor signals, pattern classifica-
tion techniques can be applied to recognize the performed
Categories and Subject Descriptors gestures. Current research on inertial sensor-based gesture
recognition concentrates on the recognition of a small set
H.5.2 [Information Interfaces and Presentation]: In-
of very simple predefined single gestures. These gestures
put Devices and Strategies; I.5.1 [Computing Method-
can then be mapped to certain functions to build gesture
ologies]: Models—Statistical
based interfaces. Kim et al. [7] use a pen-like accelerometer-
based device to recognize single Roman and Hangul charac-
General Terms ters written in the air.
Design, Algorithms, Human Factors
While predefined gestures are easier to recognize for the ma-
1. INTRODUCTION chine, it is more burden for the users since they have to learn
The field of Human-Computer Interaction has a long tradi- and memorize these gestures. Therefore, we aim at an user-
tion of exploring and implementing numerous devices and adaptable approach that does not make any assumptions on
interfaces. However, the requirements and conditions for the used gestures. Furthermore, our approach reaches be-
such interfaces shifted drastically over the past years due to yond [7] by introducing a hands-free device to write Roman
the changing demands of our global and mobile information- characters and even whole words in the air. We developed
based society. Today’s electronic devices and digitial assis- a data glove instead of a pen, which is less obstrusive than
tants fit into the smallest pocket, are ubiquitous, always on- an additional device. We implemented an HMM-based rec-
ognizer for whole words by concatenating character models
allowing complex text input. For the purpose of this study
we limited the input character set to capital Roman letters.
Permission to make digital or hard copies of all or part of this work for However, since we apply statistical models, our system can
personal or classroom use is granted without fee provided that copies are deal with any kind of one-handed gestures.
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific 2. RELATED WORK
permission and/or a fee.
Augmented Human Conference April 2-3, 2010, Megève, France. The field of gesture recognition has been extensively studied
Copyright c 2010 ACM 978-1-60558-825-4/10/04 . . . $10.00. in the past. Two main approaches can be identified, external
strokes and therefore are able to use traditional handwrit-
ing training data for their recognizer. Both works rely on the
reconstruction of the 3D trajectory. Since errors accumulate
quickly over time in this approach, it will only give reason-
able results for short periods of time and is thus difficult
to apply to the continuous recognition of words. There has
also been work on single gesture recognition using a range
of other classification methods, such as Naive Bayes, Mul-
tilayer Perceptrons, Nearest Neighbour classifiers (Rehm et
al. [14]), and SVMs (Wu et al. [15]). Oh et al. [11] use
Fisher Discrimant Analysis to recognize single digits. The
mentioned methods might perform well on single gesture
recognition but are not applicable to recognize a continuous
sequence of gestures.
Figure 1: Prototypical CSL data glove.
2.3 Continuous Gesture Recognition
Hidden Markov Models have been applied to gesture and
systems and internal systems, with the latter using body- handwriting recognition, since they can be easily concate-
mounted sensors, including hand-held devices. The former nated to model continuous sequences. A good tutorial in-
type is traditionally based on non-wearable video based sys- troduction in the context of speech recognition was written
tems and thus is not applicable to our envisioned scenario. by Rabiner [13]. Liang et al. [9] make use of HMMs for con-
tinuous sign language recognition, but they use desk based
Polhemus magnetic sensors. Kim and Chien [8] also used
2.1 Sensors and Devices such a sensor to recognize complex gestures based on con-
Different sensing techniques are applied to gesture recogni- catenated HMMs of single strokes. Starner and McGuire [10]
tion based on body-worn sensors. Brashear et al. [3] use a use HMMs for continuous mobile american sign language
hybrid system, consisting of a head mounted camera and recognition with a 40 word vocabulary, modelling each word
accelerometers worn at the wrist for sign language recogni- by one HMM. Plamondon and Srihari [12] give a survey
tion. While such a hybrid system seems promising for sign on traditional on-line and off-line handwriting recognition,
language recognition, the number and positioning of sensors, where HMMs are applied with success for the on-line case.
particularly those positioned at the head, limit the users’ ac-
ceptance. Kim et al. [7] use a wireless pen device, equipped It seems that few attempts have been made towards contin-
with accelerometers and gyroscopes for airwriting character uous gesture recognition solely with inertial sensors. Fur-
recognition. Their good recognition indicate that airwrit- thermore, the set of gestures is also normally limited to a
ing recognition based on accelerometers and gyroscopes is small number of 10 to 20 gestures, which are often chosen
feasible. and defined for ease of discrimination. In our work, we in-
troduce a system capable to model sequences of primitive
Sensors are often integrated into a device, which has to be gestures and evaluate our system with character gestures.
carried around and held in hand for operation. Therefore,
external devices are more obtrusive. Instead, data gloves 3. AIRWRITING CHALLENGES
have been proposed for gesture recognition, for example by
In comparison to conventional online handwriting recogni-
Hofmann et al. [6] and Kim and Chien [8]. Usually, these
tion, typically used on tablet PC’s, we have to deal with
gloves contain a magnetic sensor to determine the global
some specialities when using inertial sensors for writing in
position through an externally induced magnetic field, pro-
the air. In traditional online handwriting recognition, the
hibiting mobile or outdoor applications. These gloves also
2D trajectory of the written strokes is available and classi-
incorporate many sensors delivering information on finger
fication is done based on this data. In contrast, we get the
movement and position, which makes them quite bulky. Al-
linear acceleration and the angular velocity of the sensor.
ternatively, an accelerometer equipped bracelet for gesture
Figure 2 shows the raw sensor signals received by writing
recognition was proposed by Hein et al. [5] and Amft et al.
the character A. The two main differences of our approach
[1], who used accelerometers included in a watch to control
from traditional handwriting recognition are the absence of
certain functions of the watch by gestures. In our work, we
the trajectory information and the pen-up and pen-down
combine the advantages of data gloves with the convenience
movements. The missing pen-up and down movements re-
of small hand-held devices by designing and implementing a
sult in a single continuous stroke for the whole writing, i.e.
very slim and unobstrusive interfaces based on a data glove,
we get a continuous data stream from the sensors which lacks
as depicted in Figure 1. One should be aware that this is
any segmentation information. If a writer produces several
a first prototype and that current technology already allows
sentences, the result would be one single stroke. This makes
further miniaturization.
the task more difficult, since pen-up and down movements
automatically give a segmentation of the data. While this is
2.2 Single Character Classification not a segmentation of characters it gives useful information,
Cho et al. [4] and Kim et al. [7] use a pen-type device for we are missing. Also the motion between consecutive char-
single digit and single character recognition using Bayesian acters, which would normally be pen-up, will be represented
Networks. Kim et al. [7] introduce a ligature model based in the signals we get from the inertial sensors. We face this
on Bayesian Networks to recognize pen-up and pen-down problem by introducing a repositioning model between the
individual characters of words. The experimental results on
word recognition show that this modification of the HMM 500
is suitable.

Acceleration in mg
0 ay
We also do not get the 3D trajectory easily. While it is
theoretically possible to reconstruct the trajectory from a 6 −500
ax
DOF sensor like the one we use by applying a strapdown
inertial navigation algorithm, it is practically a hard task −1000
az
because of sensor drift and noise. A standard strapdown
algorithm integrates the angular rate once to obtain the at- −1500
0 0.5 1
titude of the sensor, then the gravitational acceleration can Time in s
be subtracted from the acceleration signals and finally dou-
ble integration of the acceleration yields the position. This 150

Angular velocity in deg/s


triple integration leads to summation of errors caused by 100
sensor drift and noise and after a few seconds the error in
50 gx
position is so high that no reconstruction which is close to
the real trajectory is possible. Bang et al. [2] propose an al- 0
gorithm to reconstruct the trajectory of text written in the −50 gz
air, which works well for isolated characters or short words
−100 gy
(in sense of writing time). Since our goal is a system, that
allows for arbitrary long input, we avoid this problem by di- −150
0 0.5 1
rectly working on the sensor signals without estimating the Time in s
3D trajectory. The experimental results show that this is
feasible in practice.
Figure 2: Raw sensor signals for the character A.
Gravitation is also a problem, we have to deal with, even The upper plot shows accelerometer data, the lower
without estimating the trajectory. Earth acceleration is al- plot shows gyroscope data.
ways present in the measured acceleration signals and can
only be subtracted when the exact attitude of the sensor is
known. Due to the mentioned difficulties concerning sen- 5. DATA ACQUISITION
sor drift, gravitational acceleraction is compensated by sub- We have collected character data from 10 subjects, five male
tracting the signal mean. This approximation equals the and five female. One of the subjects was left-handed, the
assumption, that the sensor attitude is constant over the other nine were right-handers. Every subject wrote 25 times
time of one recording. The Variance of the signals is also the alphabet resulting in 650 characters per writer and 6500
normalized to reduce the effect of different writing speeds. characters in the dataset in total. In the first recording
sessions, the alphabet was recorded in alphabetical ordering,
in the latter sessions, every character was recorded in the
4. DATA GLOVE context of every other. For every subject, data was recorded
Our system consists of a thin glove and a wristlet. The in one recording session. Word data was collected from one
glove holds the sensor and the wristlet holds the controller test person which contributed 652 English words to the data
board and the power supply. Figure 1 shows a picture of the set. The words were chosen at random from the list of the
system. A microcontroller reads the data from the sensor “1000 most frequent words in English“2 from the corpus of
and sends it over a bluetooth link to a computer, where the the University of Leipzig. Every word was recorded once.
recognition is performed. The power supply consists of two Table 1 gives a summary of the recorded data.
rechargeable micro cells allowing operation for more than 4
hours. As sensor, we use an Analog Devices ADIS163641 The subjects were sitting on a chair and were asked to write
Inertial Measurement Unit. This sensor contains three or- in front of them like writing on an imaginary blackboard.
thogonal accelerometers for measuring translational accel- The writing hand was approximately at eye level. Writers
eration and three orthogonal gyroscopes for measuring an- were asked to write characters between 10 to 20 cm in height
gular velocity. The sensor is cubic with an edge length of and to keep the wrist fixed while writing. Furthermore, the
23 mm. This is due to the 3-dimensional orientation of the subjects were told to write in place, which means, the hori-
gyroscopes, 3-axis accelerometers are available as standard zontal position should be approximately the same for every
flat integrated curcuits. The measurement range of the ac- character written.
celerometers is -5 g to 5 g with a resolution of 1 mg. The
gyroscope measurement range is ±300 deg/s with a resolu- All writing in our experiments was done in capital block let-
tion of 0.05 deg/s. The sampling rate is 819.2 Hz. The sensor ters. Since every writer has its own writing style even for
data is read out by a Texas Instruments MSP430F2132 mi- block letters, this led to several writing variants for some
crocontroller with 8KB Flash memory and 512 Byte RAM, letters. These variants are referred to as allographs in hand-
operating at up to 16 MHz. The microcontroller sends the writing recognition. For example, the letter E has five main
data over an Amber Wireless AMB2300 class 2 Bluetooth allographs even in our small data set. The subjects were
Module to a laptop. asked to be as consistent as possible in the way they write

1 2
www.analog.com http://www.wortschatz.uni-leipzig.de/html/wliste/html
“A”
“TO”
“T” Repos “O”
Data Sensor Preprocessing Normalized Decoding (HMM + Recognized
Acquisition Values Features Language Model) Character or Word

Figure 3: System Overview: Motion data is gathered by the sensors on the glove. The raw data is sent to a
computer and preprocessed resulting in a feature vector. The feature vectors are classified using an HMM
decoder in combination with a language model. The recognized character or word is the output of the system.

Writer Char. Samples Time 6. RECOGNITION SYSTEM OVERVIEW


Character Recordings Our system consists of a glove equipped with sensors to mea-
A 650 555359 11m 18s sure linear acceleration and angular velocity of hand motion.
B 650 964185 19m 37s Sensor data is sent to a computer via Bluetooth. A user can
C 650 733603 14m 56s perform 3-dimensional gestures, which are recognized by an
D 650 924252 18m 48s HMM classifier based on the deliverd signals. Basically any
E 650 717878 14m 36s kind of gesture sequence can be modeled. We use character
F 650 682580 13m 53s gestures and whole words written in the air to show the po-
G 650 594320 12m 05s tential of our system. Figure 3 shows the basic functional
H 650 1122828 22m 51s blocks in a diagram.
I 650 604499 12m 18s
J 650 779136 15m 51s 6.1 Modeling
Word Recordings The HMM modeling was done using the Janus Recognition
A 3724 2332410 47m 27s Toolkit, developed at the Karlsruhe Institute of Technol-
ogy (KIT) and Carnegie Mellon University in Pittsburgh.
Table 1: Overview of the collected data recordings. As features, we used the accelerometer and gyroscope data,
The samples column gives the total amount of sensor which was normalized by mean and variance resulting in a
data samples and the time column the corresponding six-dimensional feature vector per sample. The sampling
recording time. rate was always set to 819.2 Hz and no filtering was applied
to the signals, since the signals are already very smooth.
We have made experiments with a moving average filter,
the letters, i.e. stick to one writing variant. When making which showed no impact. A 0-gram language model was
mistakes, the subjects were able to correct themselves, they used which allows the recognition of single instances of char-
were not observed all the time. For that purpose, the record- acters or words out of a defined vocabulary.
ing software allows repeating of characters, which were not
written or segmented correctly.
6.2 Character Recognition System
The segmentation of the recordings was done manually by For all characters, HMMs with the same topology were used.
the subject while recording. The subject had to press a key, We always used left-right models with self transitions. Hof-
before starting and after finishing a character or word. This mann et al. [6] use HMMs for accelerometer based gesture
key press was performed with the other hand than the one recognition with a data glove. They evaluate the perfor-
used for writing. All subjects were told to have their writing mance of ergodic and left-right models and can find no signif-
hand in start position for the first stroke of the individual icant difference for the task of gesture recognition. The out-
character or word and to hold it still, when pressing the put probability function is modeled by Gaussian Mixtures.
start key. After pressing the start key, the character or word The number of states and the number of Gaussians per states
was written in the air. When the end point of the writing was varied on a per experiment basis. That means for one
motion was reached, the subjects should hold their hand experiment, every character model has the same number of
still and press the stop key. This kind of segmentation is states and Gaussians. Before training, the Gaussians are
not as accurate as using a video stream recorded in parallel initialized by k-means clustering of the training data. All
but avoids manual postprocessing. The motion between two training samples are linearly split into as many sample se-
characters was also recorded for initialization and training quences of equal length as the HMM model has states. By
of repositioning HMMs used for word modeling. Table 1 that all samples are initially assigned to exactly one HMM
shows large differences in recording time, which is probably state. All samples corresponding to the same state are then
not only caused by differences in writing speed, but also by collected and clustered in as many clusters as the number
differences in segmentation quality. of Gaussians in the Gaussian Mixture. Mean and Variance
of the clusters are then used as initial values for the GMM.
Afterwards the models are trained to maximize their likeli-
hood on the training data using Viterbi training. Different
6 States, 1 GMM 10 States, 2 GMMs 10 States, 5 GMMs

100

T repos O

Recognition Rate in %
95
Figure 4: Context model for the word “TO”. It con-
sists of the independent models for the graphemes
“T” and “O” with 7 states and the 2 state model for
the repositioning in between. 90

writing variants of one and the same character are modeled


by one HMM. There is always one model for one character. 85
A B C D E F G H I J Avg
Writer

6.3 Word Recognition


Word recognition was performed by concatenating charac- Figure 5: Results of the writer-dependent charac-
ter HMMs to form word models. For a given vocabulary, ter recognition for 3 different HMM topologies and
word models were built from the existing character models. number of Gaussians. The worst and best perform-
This enables the use of an arbitrary vocabulary. Normally, ing systems are shown by black and white bars. To
one has to move the hand from the endpoint of a character avoid the risk of overspecification, the system rep-
to the start point of the next character. To model these resented by the grey bar was used for evaluation.
repositioning movements, a repositioning HMM is inserted
between the character models. We use one left-right HMM
to model all repositioning movements. Figure 4 shows an ex- Gaussians per mixture was investigated. In Figure 5 the
ample word model. The initialization and training of these results of three exemplary systems with different number
models is the same as for character models. Training data of states and Gaussians are shown. The worst performing
for the repositioning models was collected along with the system with 6 states and 1 Gaussian per state reached an
data for the character models. average recognition rate of 93.7% and the best performing
system with 10 states and 5 Gaussians per state reached
6.4 Evaluation Method 95.3%. Since the data set is small, overspecification of mod-
On the collected data, two main kinds of experiments have els is an issue. To avoid the possibility of overspecificication,
been performed, writer-dependent recognition and writer- we took a well performing system with few Gaussians in to-
independent recognition. For the experiments on writer- tal. Our final evaluation of the writer-dependent system was
dependent recognition, data from one writer was taken and done with a 10 state model with 2 Gaussians per state. The
divided into a training, development and test set. On the performance of this system is represented by the grey bar
development set, the parameters, namely number of HMM in Figure 5. Table 2 shows the breakdown of results for all
states and number of Gaussians, were optimized. All writer- writers in the column named WD. An average recognition
dependent results given are from the final evaluation on the rate of 94.8% was reached for the writer-dependent case.
test set. Writer-independent recognition was evaluated us-
ing a leave-one-out cross validation. We took the data from
one writer as test set and the data from the other writers Writer-Independent. As written in section 6.4, a leave-
as training data. This is done once for every writer. Again one-out cross validation was performed on the collected char-
different parameters for the number of states and Gaussians acter data set to investigate the performance of a writer-
were used and the parameter set was chosen, which gives the independent system. The cross validation was done on all
best average recognition rate over all writers. No indepen- data and on the right-hander data only. When models are
dent test set was used here, due to the small data collection. trained on solely right-hander data and tested on the left-
The performance of all systems is given by the recognition hander data, results are very poor. We repeated all exper-
rate, which is defined as the percentage of correctly classified iments without the left-hander data to exclude this effect.
references out of all references. In case of character recogni- We used HMMs with 8, 10, 12 and 15 states and 2 to 6 Gaus-
tion, a reference is one character, in case of word recognition sians per state. Parameters were optimized on the data of
a reference is one word. the left out writer in each cross validation fold. Figure 6
shows the recognition rate dependent on the total number
7. EXPERIMENTS AND RESULTS of Gaussians of the system for the right-handers only test. If
more than one parameter combination resulted in the same
7.1 Character Recognition total amount of Gaussians, the range of achieved recognition
rates is illustrated by a vertical bar. One can clearly see the
Writer-Dependent. We evaluated writer-dependent recog- performance gain by increasing the number of Gaussians.
nition performance by testing the recognizer on each writers We used models with 15 states and 6 Gaussians per state
data individually. For each writer, the data was divided for the writer-independent evaluation. Table 2 shows the
randomly in a training (40%), development (30%) and test breakdown of the individual results for the right-hander only
(30%) set. The character frequencies were balanced over the evaluation and the evaluation with all ten writers. A recog-
sets. A number of 6, 8, 10 and 12 HMM states and 1 to 5 nition rate of 81.9% was reached for the writer-independent
Writer Recognition Rate
WD WI(RH) WI(ALL)
A (RH) 97.9 75.4 73.5 82.5
B (RH) 98.5 83.5 84.3
C (LH) 96.4 46.8
D (RH) 98.0 90.5 89.8 82
E (RH) 89.2 86.9 85.4

Recognition Rate in %
F (RH) 91.3 77.2 77.1
G (RH) 94.9 80.6 78.8 81.5
H (RH) 91.8 72.9 73.8
I (RH) 96.4 86.0 87.8
81
J (RH) 93.9 84.3 84.6
Average 94.8 81.9 78.2
80.5
Table 2: Results of character recognition exper-
iments, writer-dependent and writer-independent.
The second column of the table shows the writer- 80
dependent results. The third column shows the 10 20 30 40 50 60 70 80 90 100
Total Number of GMMs
writer-independent results when leaving out the
left-hander, the fourth column shows the writer-
independent results for all writers. Figure 6: Results of the writer-independent charac-
ter recognition on right-handers dependent on the
total amount of Gaussians per HMM. If different
parameter combinations had the same total amount
case on the right-hander data.
of Gaussians, the performance range is shown as ver-
tical bar. A polynomial fitted on the data illustrates
It is not surprising that the recognition performance drops
the tendency.
when testing on the left-handed person, since the writing
style of this person differs in a fundamental way from the
writing of the right-handed test persons. All horizontal
strokes are written in opposite direction and all circular let-
ters are also written in opposite direction.

The main problem of the right-hander only systems are am- hypothesis
biguities in characters and writing variants of different writ- A B C D E F G H I J K L MN O P Q R S T U VWX Y Z
ers. Figure 7 shows the confusion matrix for the cross val- A A
idation on the right-handed writers. First of all, there are B B
C C
problems with similar graphemes like the pairs (P, D) and D D
(X, Y ). The similarities are obvious. In case of P and D, E E
F F
the only difference is the length of the second stroke (the G G
arc). In case of X and Y, depending on how people write H H
I I
it, the only difference is the length of the first stroke (from J J

false negatives
upper left to lower right). One should notice that the writ- K K
reference

L L
ers did not get any kind of visual feedback on their writing. M M
This probably leads to even more ambiguous characters than N N
O O
when writing with visual feedback. The pair (N, W ) is also P P
subject to frequent misclassification. The reason gets obvi- Q Q
R R
ous when considering the way the character N was written S S
by some test persons. Four of the nine right-handers started T T
U U
the N in the upper left corner. They moved down to the V V
lower left corner and up to the upper left corner again be- W W
X X
fore writing the diagonal stroke. An N written this way has Y Y
the same stroke sequence than a W . Figure 8 illustrates this Z Z
ambiguity.
A B C D E F G H I J K L MN O P Q R S T U VWX Y Z
We see that most classification errors arise from the differ- false positives
ences in writing style between the individual writers. The
test persons do write characters in different ways even un- Figure 7: Accumulated confusion matrix for the
der the constraint of block letters. Some of the variants cross validation of the right-handed writers. The
have a very similar motion sequence to variants of different confusion matrices of the tests on each writer were
characters observed by other writers. This leads to more summed together.
ambiguities than in the writer-dependent case. On the sin-
gle character level, it is hard to solve this problem. But
when switching to recognition of whole words, the context
Train Test Word Training Recognition
Iterations Rate
Set A Set B 0 74.2
Set B Set A 0 71.8
Average 0 73.0
Set A Set B 10 97.5
Set B Set A 10 97.5
Average 10 97.5
(a) (b) (c)
Table 4: Results of the writer-dependent word
Figure 8: Writing variants of N (a),(b) compared to recognition. The character models were already
the stroke sequence of W (c). The allographs (b) trained on character data. The number of training
and (c) have the same basic stroke sequence. iterations in the table corresponds only to further
training on words.
Data Features States GMM Rec. Rate
RH axyz 15 6 76.8 is inaccurate and by that also character models can profit
RH axyz ,gxyz 15 6 81.9 from word training. We can see that word recognition per-
formance is in the same range than character recognition
Table 3: Comparison of sensor configurations performance for this writer. Typical misclassifications occur
on writer-independent character recognition. Ac- by confusing words that barely differ, like “as” and “was” or
celerometer and gyroscope features are compared “job” and “jobs”.
to accelerometer only features.
8. DEMONSTRATOR
information should help dealing with these ambiguities. Finally we built an online demo system to showcase the ap-
plicability of the recognizer. The trained models from the
We also investigated the effect of using only accelerometer word recognition experiments were taken and a small vo-
data as features. We would be able to keep the sensor flat cabulary was used. The system can recognize the words
and cheaper if we do not use gyroscopes. We compared the necessary to form the sentences “Welcome to the CSL” and
results of using accelerometers (axyz ) and gyroscopes (gxyz ) “This is our Airwriting System”. The demonstration system
to the results of using only accelerometers. Table 3 shows the uses a laptop with a 2.4 Ghz Intel Core 2 Duo processor
results of this comparison. We see, that accelerometer only and 2 GB RAM for the data recording and the recognition.
performance is worse than with the full sensor setup, but The system runs stable and few recognition errors occur.
depending on the application, this might still be acceptable. A demonstration video of the system can be seen on our
website 3 .

7.2 Word Recognition 9. CONCLUSION AND FUTURE WORK


The word data was recorded from test person A, who also
We designed and implemented a wearable computer device
contributed to the character data set. To build a writer-
for gesture based text input. The device consist of a slim
dependent word recognizer, we took the character data from
data glove with inertial sensors and the ability to trans-
this test person to initialize the character models. We used
fer sensor data wireless. This enables hands-free opera-
the trained models from the experiments on writer-dependent
tion. We made experiments on character and word recog-
character recognition, i.e. HMMs with 10 states and 2 Gaus-
nition. For the task of writer-dependent character recog-
sians per state. The repositioning models were trained on
nition, we reached an average recognition result of 94.8%.
the repositioning data from the character recording session.
For the writer-independent case, a recognition rate of 81.9%
We used models with 7 states and 2 Gaussians per state. We
was reached. We identified ambiguities between characters
randomly split the word data set into two sets (Set A and
caused by individual writing style as main reason for the sig-
Set B) of equal size, taking either of the two once as train-
nificantly lower recognition rate in the writer-independent
ing and once as test set. The vocabulary always contained
case. It will be hard to solve these problems on the single
all 652 words from the recording, that means the recognizer
character level. Since our main goal is not single charac-
had models for all these words and chose one out of this
ter recognition, but recognition of text, context information
set as hypothesis. The words were all recorded only once
should help to solve most of the ambiguities. We imple-
and there were no duplicates in the set. That means, no
mented a word recognizer using the existing character mod-
word in the test set appeared in the training set before. We
els from character recognition and reached an average writer-
evaluated the recognizer without any additional training on
dependent recognition rate of 97.5% for a single test person
word recordings and after 10 training iterations on the word
on a vocabulary of 652 words. We bypass the problems aris-
training set. Table 4 shows the results of the experiments.
ing from sensor inaccuracies when performing a trajectory
Word training boosts performance significantly. The reason
reconstruction by working directly on the acceleration and
is probably two-fold. First, the repositioning models from
angular velocity values. We show that the introduction of
the character experiments are supposingly not very good,
repositioning models between characters is a suitable way to
since they were trained with the movements occuring in the
artificial pauses between consecutive character recordings. 3
http://csl.ira.uka.de/fileadmin/Demo-Videos/airwriting -
Second, the manual segmentation in the character recording chris.mpg
deal with the lack of pen-up and down information. We im- hidden markov models. Applied Intelligence,
plemented a demo system and showed that online operation 15(2):131–143, 2001.
is feasible. [9] R.-H. Liang and M. Ouhyoung. A real-time continuous
gesture recognition system for sign language. In Third
We plan to extend the word recognition system to be writer- IEEE International Conference on Automatic Face
independent by building a word database with different writ- and Gesture Recognition, 1998. Proceedings., pages
ers. Then we will analyze, if context information really 558–567, 1998.
solves most of the ambiguities arising in writer-independent [10] R. McGuire, J. Hernandez-Rebollar, T. Starner,
character recognition. We also plan to use a more complex V. Henderson, H. Brashear, and D. Ross. Towards a
language model, which will enable us to recognize whole sen- one-way american sign language translator. In Sixth
tences. IEEE International Conference on Automatic Face
and Gesture Recognition, 2004. Proceedings., pages
We will further miniaturize the device by reducing power 620–625, 2004.
consumption and with this the size of batteries. We will also [11] J. Oh, S.-J. Cho, W.-C. Bang, W. Chang, E. Choi,
further investigate the ability to abandon the gyroscopes. J. Yang, J. Cho, and D. Kim. Inertial sensor based
recognition of 3-d character gestures with an ensemble
10. ACKNOWLEDGMENTS classifiers. Ninth International Workshop on Frontiers
We would like to thank Michael Mende for developing the in Handwriting Recognition, 2004. IWFHR-9 2004.,
circuit layout of the electronic components for us and Wolf- pages 112–117, 2004.
gang Rihm for the soldering of the components. [12] R. Plamondon and S. Srihari. Online and off-line
handwriting recognition: a comprehensive survey.
IEEE Transactions on Pattern Analysis and Machine
11. REFERENCES Intelligence, 22(1):63–84, 2000.
[1] O. Amft, R. Amstutz, Smailagic, A., D. Siewiorek,
and G. Tröster. Gesture-controlled user input to [13] L. Rabiner. A tutorial on hidden markov models and
complete questionnaires on wrist-worn watches. In selected applications in speech recognition. In
Human-Computer Interaction. Novel Interaction Proceedings of the IEEE, pages 257–286, 1989.
Methods and Techniques, volume 5611 of Lecture [14] M. Rehm, N. Bee, and E. André. Wave like an
Notes in Computer Science, pages 131–140. Springer egyptian: accelerometer based gesture recognition for
Berlin / Heidelberg, 2009. culture specific interactions. In BCS-HCI ’08:
[2] W.-C. Bang, W. Chang, K.-H. Kang, E.-S. Choi, Proceedings of the 22nd British HCI Group Annual
A. Potanin, and D.-Y. Kim. Self-contained spatial Conference on HCI 2008, pages 13–22, Swinton, UK,
input device for wearable computers. Seventh IEEE UK, 2008. British Computer Society.
International Symposium on Wearable Computers, [15] J. Wu, G. Pan, D. Zhang, G. Qi, and S. Li. Gesture
2003. Proceedings., pages 26–34, 2003. recognition with a 3-d accelerometer. In Ubiquitous
[3] H. Brashear, T. Starner, P. Lukowicz, and H. Junker. Intelligence and Computing, volume 5585 of Lecture
Using multiple sensors for mobile sign language Notes in Computer Science, pages 25–38. Springer
recognition. Seventh IEEE International Symposium Berlin / Heidelberg, 2009.
on Wearable Computers, 2003. Proceedings., pages
45–52, 2003.
[4] S.-J. Cho and J. Kim. Bayesian network modeling of
strokes and their relationships for on-line handwriting
recognition. Pattern Recognition, 37(2):253–264, 2004.
[5] A. Hein, A. Hoffmeyer, and T. Kirste. Utilizing an
accelerometric bracelet for ubiquitous gesture-based
interaction. In Universal Access in Human-Computer
Interaction. Intelligent and Ubiquitous Interaction
Environments, volume 5615 of Lecture Notes in
Computer Science, pages 519–527. Springer Berlin /
Heidelberg, 2009.
[6] F. Hofmann, P. Heyer, and G. Hommel. Velocity
profile based recognition of dynamic gestures with
discrete hidden markov models. In Gesture and Sign
Language in Human-Computer Interaction, volume
1371 of Lecture Notes in Computer Science, pages
81–95. Springer Berlin / Heidelberg, 1998.
[7] D. Kim, H. Choi, and J. Kim. 3d space handwriting
recognition with ligature model. In Ubiquitous
Computing Systems, volume 4239/2006 of Lecture
Notes in Computer Science, pages 41–56. Springer
Berlin / Heidelberg, 2006.
[8] I.-C. Kim and S.-I. Chien. Analysis of 3d hand
trajectory gestures using stroke-based composite
AUGMENTING THE DRIVER’S VIEW
WITH REALTIME SAFETY-RELATED INFORMATION
Peter Fröhlich, Raimund Schatz, Stephan Mantler
Peter Leitner, Matthias Baldauf
Telecommunications Research Center (FTW) VRVis Research Center
Donaucity-Str. 1, 1220 Vienna, Austria Donaucity-Str. 1, 1220 Vienna, Austria
{froehlich, schatz, baldauf}@ftw.at mantler@vrvis.at

ABSTRACT INTRODUCTION
In the last couple of years, in-vehicle information systems In-vehicle information systems, such as personal navigation
have advanced in terms of design and technical sophistica- devices, built-in driver assistance units and Smartphones,
tion. This trend manifests itself in the current evolution of have become standard equipment in today’s cars - and their
navigation devices towards advanced 3D visualizations as capabilities are quickly evolving. The most obvious ad-
well as real-time telematics services. We present important vances are related to the visual presentation at the in-
constituents for the design space of realistic visualizations vehicle human-machine interface (HMI). On the consumer
in the car and introduce realization potentials in advanced mass market, we see a clear trend towards increasingly rea-
vehicle-to-infrastructure application scenarios. To evaluate listic representations of the driver’s outside world, includ-
this design space, we conducted a driving simulator study, ing textured 3D renderings of highway junctions, road de-
in which the in-car HMI was systematically manipulated tails, mountains, and buildings [14]. Arrows and icons are
with regard to its representation of the outside world. The exactly overlaid over the virtual representation of the driv-
results show that in the context of safety-related applica- er’s field of view to aid in navigation tasks. This develop-
tions, realistic views provide higher perceived safety than ment towards realistic visualization is further strengthened
with traditional visualization styles, despite their higher by the advent of augmented reality navigation systems on
visual complexity. We also found that the more complex market-available handheld devices (e.g. [12]).
the safety recommendation the HMI has to communicate,
Up to now, such realistic visualizations are mostly applied
the more drivers perceive a realistic visualization as a valu-
to navigation. However, with emerging co-operative ve-
able support. In a comparative inquiry after the experiment,
hicle-to-infrastructure or vehicle-to-vehicle communica-
we found that egocentric and bird’s eye perspectives are
tions technology [4,16,20,18], they will also become rele-
preferred to top-down perspectives for safety-related in-car
vant for delivering more advanced safety-related services.
safety information systems.
For example, drivers could be notified about sudden inci-
dents and provided with recommendations on how to react
Author Keywords
User studies, Telematics, Realistic Visualization accordingly. In this context, the major challenge is the fact
that the driver actions required can be fairly unusual and
ACM Classification Keywords unexpected, and thus might not be adequately understood or
H.5.1. Information Interfaces and Presentation: Multimedia implemented. For example, drivers may be asked to stop
Information Systems—Artificial, augmented, and virtual before a tunnel on the emergency lane due to an accident
realities; H.5.2. Information Interfaces and Presentation: ahead.
User Interfaces—GUI In this application context, realistic visualization could
represent both merit and demerit: information attached to a
quasi-realistic mapping of the outside reality might be rec-
ognized more quickly than with today’s schematic visuali-
Permission to make digital or hard copies of all or part of this work for zations, but on the other hand the wealth of details might as
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
well hamper the identification of task-relevant information.
bear this notice and the full citation on the first page. To copy otherwise, It should be clear that the effects of realistic visualizations
to republish, to post on servers or to redistribute to lists, requires prior on usability and user experience must be fully understood
specific permission and/or a fee. before recommending their use in millions of cars. In order
Augmented Human Conference, April 2–3, 2010, Megève, France. to achieve this goal, systematic and reflective user-oriented
Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.
research is needed.
In this paper, we present an experimental study to evaluate current systems, such additional virtual information typical-
the influence of realistic visualizations on perceived driving ly relates to route indications, congestion information, as
safety and satisfaction. We were interested in finding out well as information on points-of-interest. Typical means of
whether realistic visualizations provide an added value in augmentation are color coded lines or arrows, icons, text
terms of safety and user experience, or whether they are just and numbers. Visualization approaches reach from colored
“eye-candy” that could even endanger the driver and other overlays over the road to virtual “follow-me cars” implicitly
traffic participants. We first specify the basic elements and indicating the speed and direction (compare [13,8]).
characteristics of realistic visualizations. Departing from
this, we formulate a set of research questions that are and Constituents Characteristics
describe the method of an experimental study to address
Map representation Schematic 2D Untextured 3D Textured 3D
them. We finally provide a detailed results description and
provide suggestions for further research. Viewing perspective Top-Down Bird’s Eye Egocentric

SPECIFYING REALISTIC VISUALIZATIONS Envir. objects None Selected All


The extent of realism of an HMI’s real-world representation Augmentation Text Icons Arrows and lines
can be described by a number of constituents (see Figure 1):
map representation, viewing perspective, environmental Match with real view of outside world
objects, and augmentation. The possible properties of these
constituents are now briefly described, and then prototypi- Figure 1: Constituents of Realistic Visualizations
cal combinations are presented.
Prototypical visualization styles
Constituents of realistic visualizations Each one of the aforementioned constituents is necessary to
Map representation: Digital 2D maps have long been the describe the extent of realism of an outside world represen-
standard way to present the outside world on the HMI. tation on the HMI. However, the constituents also must be
Meanwhile, however, the global availability of environmen- seen in combination, as their properties exert mutual influ-
tal models from map providers like Navteq Inc. has moti- ence on each other. The following combinations can be
vated the integration of 3D spatial representations also in regarded as prototypical variants within the design space of
portable navigation devices. Starting from basic 2.5D build- realistic visualizations:
ing representations and schematic landscape models, we
Conventional view: In most of the navigation systems cur-
witness a gradual increase in fidelity towards fully textured
rently available, a schematic 2D map of the outside world is
scenes with complete buildings.
presented from a bird’s eye perspective, with occasional
Viewing perspective: Most HMIs provide dynamic map display of a few important points of interest in the environ-
displays that automatically align themselves towards the ment. The recommended route is visualized by a schematic
driver’s surroundings, based on orientation information overlay over the map, and for recommendations and warn-
derived by a compass or by the sequence of GPS co- ings icons are georeferenced on the map.
ordinates. Another common feature is a “bird’s eye” view
Realistic view: An idealized “reality view” of an in-car
on the road situation, with the camera being positioned
HMI would be a quasi-realistic 3D map representation, dy-
slightly behind and above the virtual vehicle. When 3D
namically presented from the driver’s own viewing pers-
maps with detailed objects are displayed, often also a fully
pective, including all environmental objects. This visualiza-
egocentric view is provided, which matches the driver’s
tion would be augmented by accurately spatially-referenced
field-of-view. This view then can be of use for the display
overlay lines and arrows, as well as by 3D-spatially refe-
of complex junctions, for example.
renced icons. Realistic views may also be realized by aug-
Environmental objects: Due to current developments to- mented reality, provided that accuracy problems with align-
wards realistic visualizations, an increasing amount of envi- ing the virtual guidance information with the real scene are
ronmental objects becomes displayed to the user. This in- solved.
cludes navigation-related objects, such as turn indications
Conventional and realistic interleaved: most contemporary
on the road and direction signs. High fidelity representa-
navigation systems also feature dynamic switching between
tions of surrounding objects are also used to provide loca-
different visualization modes, depending on current context.
tion-based search and purchase, such as when a user is
For example, in non-critical situations, a conventional
looking for the next gas station or shopping mall. Further-
bird’s eye view is presented as default (with varying accu-
more, significant architectural landmarks are shown in more
racy of environmental objects). However, when approach-
detail.
ing critical points (such as highway junctions), the device
Augmented information: To provide the actual recommen- might switch to a realistic view, in order to avoid ambi-
dations on the HMI, the above-described scene representa- guous situations and reduce the potential for misinterpreta-
tions are overlaid with virtual objects and elements that tion and navigation errors by the driver.
indirectly refer or point to aspects of the environment. In
RESEARCH ISSUES A generic approach to evaluate the added value of realistic
The key purpose of realistic visualizations is to reduce the visualizations is to compare the prototypical extreme va-
amount of abstract symbolization. This way, map use is riants of visualizing the real-world (as described in the pre-
reduced to “looking rather than reading” [15]. In the car, vious section) with regard to their support for the driver.
realistic views could potentially make visual processing Based on these considerations, the following research ques-
easier and enable better concentration on the driving task. Inferring from earlier results in cognitive psychology [5], one might argue that the more realistically a virtual representation (of the road situation) is presented, the easier a mapping to the real situation based on perceptual features becomes. Especially in complex driving situations, this could result in increased driving safety. Furthermore, a higher realism of visualizations may promise higher usage satisfaction and appeal more to customers than standard visualizations.

On the other hand, problematic aspects of realistic visualizations in cars also need to be taken into account. First, it may take more time to identify task-relevant information in realistic displays, which would slow down the mapping between virtual and real environment. This may lead to serious restrictions and poor compliance with international car safety standards, such as the 'European Statement of Principles' [6]. Furthermore, it is not clear which features of realistic views really help the user to match the display with the real road situation. They could as well just be "eye candy": nice to look at, but without any major safety benefit.

The general challenge in this regard is therefore to identify the safety impact of an increased realism of visualizations in selected realtime safety-related traffic telematics scenarios. When designing the user interface for such in-vehicle safety information systems, a basic question could be whether or not the visualization capabilities of today's in-car information systems should be exploited.

Recommendations provided in these scenarios vary in their level of urgency. When driving along a prescribed route without any incidents, the information provided by the HMI must be monitored from time to time, but urgent reactions are not necessary. However, when the system calculates a detour (e.g., due to a congestion), the driver needs to be notified and given detailed instructions on how to change direction. Recommendations get more urgent when a user is asked to use a certain lane on the road, due to temporary roadwork.

Unfortunately, existing research studies on the effect of visual presentations in the car may not fully apply, as these are about textual, iconic and simple spatial representations. Realistic representations have not yet been subject to rigorous examination in the open research community. While there are some approaches on the use of augmented head-up displays, the use of reality views on head-down displays is only beginning to be researched (compare [11]).

The following research questions have been formulated:

1. To what extent does any visualization of the real world support drivers while following safety-related HMI recommendations, as compared to no real-world visualization?
2. To what extent does a realistic visualization support drivers while following safety-related HMI recommendations, as compared to a conventional visualization?
3. To what extent does an interleaved presentation of a conventional visualization (in non-critical situations) and a realistic visualization (in critical situations) support drivers while following safety-related HMI recommendations, as compared to the continuous display of a realistic visualization?
4. When considered in isolation, how are the constituents of realistic visualizations (map representation and perspective) likely to support the driver?
5. Does the urgency of the safety scenario influence the extent to which a realistic visualization can support a driver in following an HMI recommendation?

METHOD
We conducted a driving simulator study with potential future users of advanced traffic telematics systems.

Participants
28 participants, 16 male and 12 female, took part in the study. Their mean age was 32.7 years, ranging from 18 to 59. 70% were frequent drivers and 60% owned a navigation device. As remuneration, each subject received a voucher for a consumer electronics store.

Simulation environment
A simulation instead of a field environment was chosen, because the investigated scenarios would be harmful for the involved drivers and impracticable with the currently installed telematics infrastructure. A number of simulation environments for driving have been developed, many of them dedicated to the purpose of in-car HMI evaluation. The fidelity of these simulators varies strongly, ranging from highly advanced moving-base simulators involving physical motion to single computer screens with game-controller settings [3,19]. However, to the best of our knowledge, not even the most advanced prototypes provide dedicated simulation features with regard to realistic HMI visualizations.
To overcome this shortcoming, we have developed a versatile simulation environment that employs highly detailed geospatial models of currently existing and future highways (see Figure 2). These models were originally created for construction planning and for the visualization of construction alternatives to facilitate public discussion. As such, the models may guarantee a higher validity than usual tests with abstracted highway simulations, due to a higher degree of user familiarity with the "look-and-feel" of a country's (here: Austria) road infrastructure.

Figure 2: Realistic visualization rendering environment (diagram: a common 3D visualization and rendering component drives both the driving simulator presentation, i.e. the windscreen simulation with scene representation overlays such as arrows, and the IVIS presentation on the HMI device with texts, icons and controls; cockpit controls: pedals, steering wheel. The embedded HMI screenshot shows a German tunnel-fire warning, "ACHTUNG BRAND IN TUNNEL 2,8 km / NICHT IN TUNNEL EINFAHREN! / FOLGEN SIE DEM PFEIL!": "Attention, fire in tunnel 2.8 km / Do not enter the tunnel! / Follow the arrow!")

Both the "outside reality" (windscreen simulation) and the HMI display are rendered with the same rendering engine and based on the same spatial model. This architecture enables systematic and fine-grained variations of scene representations on the HMI display. Both the windscreen and the HMI simulations were rendered in realtime at 25 fps. Our laboratory simulator was running on a Windows PC with a powerful graphics adaptor. Users were sitting on a driving seat and in front of a dashboard, both taken from a real car. They operated a steering wheel, gas and brake pedals (the clutch was not used; automatic transmission was assumed). The windshield view was displayed on a large 42" TFT screen, covering about 75 degrees of the participants' field of view. Our setup also follows the guidelines for the placement of personal navigation devices [7]: the in-car information system was modeled by a second 8" TFT screen (landscape format) mounted to the lower left side of the windscreen, next to the simulation car's left A-pillar.
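To illustrate the "same spatial model, two presentations" idea described above, the sketch below derives two view matrices, one for the driver's viewpoint on the windscreen and one for an elevated viewpoint feeding the HMI map area, from the same world coordinates. This is a generic formulation for illustration only, not the project's actual rendering code; all names and numbers are assumptions.

```python
import numpy as np

def look_at(eye, target, up=(0.0, 0.0, 1.0)):
    """Right-handed world-to-camera view matrix with z up (generic form)."""
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    f = target - eye
    f /= np.linalg.norm(f)                         # forward direction
    s = np.cross(f, up); s /= np.linalg.norm(s)    # camera right
    u = np.cross(s, f)                             # true up
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view

# One spatial model (here just the car position on the road), two presentations:
car_pos = np.array([120.0, 4.5, 0.0])
windscreen_view = look_at(eye=car_pos + [0.0, 0.0, 1.2],       # driver's eye height
                          target=car_pos + [30.0, 0.0, 1.2])   # looking straight ahead
hmi_view = look_at(eye=car_pos + [-15.0, 0.0, 25.0],           # elevated HMI viewpoint
                   target=car_pos)
```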
The screen layout of the HMI was designed to be consistent with contemporary in-car navigation systems (compare ), involving a large 'map area' to represent the outside world, an information area below that displays textual and iconic recommendations as well as the current speed, and, at the very bottom, navigation elements, which were not active, as their functionality was not needed for the experiment.

Scenarios

Table 1: Typical application scenarios for safety-related traffic telematics services, their urgency and related opportunities for realistic visualization.

Navigation (urgency: low): Driving on the highway and following the instructions of the navigation system. Not far from a highway exit, a new route is recommended, requiring the driver to react and leave the highway. Possibilities for realistic visualization: realistic representations of complex junctions, highlighting of dynamic route changes, e.g. integrated into today's navigation HMI styles.

Lane utilization (urgency: medium): Driving on the highway and following the instructions of the HMI. Suddenly, the system warns the driver of an accident ahead and instructs the driver to use the right (or left) lane. Possibilities for realistic visualization: marking lane utilization information directly on the scene representation, with an overlaid route projection.

Urgent incident (urgency: high): Driving on the highway and following the instructions of the navigation system. Suddenly, the system warns the driver of an accident behind the curve and instructs the driver to stop on the emergency lane at a certain position. Possibilities for realistic visualization: highlighting where to drive or stop in urgent cases, with an overlaid route projection and an arrow indicating the destination.
Experimental application scenarios
The test users were exposed to three safety-related application scenarios, as specified in Table 1: navigation with unexpected route change, lane utilization, and urgent incident warning. The dramaturgical design of these scenarios followed a three-phase structure: the initial phase, the critical moment, and the final phase.

In the initial phase, users were driving for about 1 km along the highway, following the routing instructions of their in-car information system. Then, when entering a predefined zone, a warning was presented to the user, consisting of a short audio signal, a text message and an icon (see Figure 2). The first line of the text message recommends an action to the driver, together with an indication of distance. The second line provides information on the cause for the given recommendation.

The critical phase lay between the point of the warning reception and the point at which the action requested in the respective scenario (the respective turn, lane selection, or emergency stop) should have been performed at the latest.

The final phase mostly served as a way to let users naturally finish their driving task. For example, in the lane change scenarios, the driver passed the partly blocked road section and was then told about the scenario end.

Experimental visualization styles
Four visualization variants were specified: 'none' (as a control condition), 'conventional', 'realistic', and 'interleaved'. Each visualization style was then realized for the three application scenarios, resulting in 12 different combinations. Figure 3 illustrates the realization of the conventional and the realistic view for navigation, lane change, and urgent incident. In the 'none' variant, the map area was filled with grey color. In the interleaved variant, the conventional view was shown in the initial and the final phase, and the realistic view in the critical phase.
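The interleaved variant can be read as a simple selection rule over the three scenario phases. The sketch below spells that rule out for illustration; the function and label names are ours, not taken from the simulator software.

```python
def hmi_view(style: str, phase: str) -> str:
    """Pick the map-area content for one driving condition.

    style: 'none', 'conventional', 'realistic' or 'interleaved'
    phase: 'initial', 'critical' or 'final' (three-phase scenario structure)
    """
    if style == "interleaved":
        # Conventional view in non-critical phases, realistic view in the
        # critical phase, as in the interleaved variant described above.
        return "realistic" if phase == "critical" else "conventional"
    if style == "none":
        return "grey"  # map area filled with grey colour (control condition)
    return style


assert hmi_view("interleaved", "critical") == "realistic"
assert hmi_view("interleaved", "final") == "conventional"
```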
Procedure and measures
The overall duration of the test (from the participant entering to leaving the test room) was approximately two hours. A test assistant was present to conduct the interview, to provide task instructions, and to note specific observations made during the experiment. Each individual test consisted of an introduction phase, in which the test persons were briefed about the goals and procedure of the test, and data on demographics and previous experiences was gathered. Then, the participants were enabled to familiarize themselves with the driving simulator and with the HMI. To minimize a potential habituation effect, it was assured that the users were informed about and had actively used each visualization and each application scenario. The subsequent phases of the study will now be described in detail.

Figure 3: HMI screenshots from the IVIS simulation for the different application scenarios, showing the conventional and the realistic HMI view for each safety scenario: navigation (route following with unexpected route change), lane utilization (use a certain lane because of an accident ahead), and urgent incident (stop on a certain lane at a certain position).
Experimental part
The two independent factors of the experimental part were visualization and safety scenario. Each participant drove 12 conditions, the product of the four visualization variants and the three scenario types. (This way, participants encountered every possible combination of visualization and scenario type.) In order to avoid order effects, the sequence of conditions was varied systematically. At the start of each condition, the car was "parked" on the emergency lane of a highway. The participant was instructed to drive along the highway and to follow the instructions on the HMI as accurately as possible.
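The paper states only that the sequence of the 12 conditions was varied systematically. For illustration, one common way to achieve this is a balanced Latin square over the condition list; the sketch below is an assumption about how such orderings could be generated, not a description of the authors' actual procedure.

```python
from itertools import product

# 4 visualization styles x 3 safety scenarios = 12 conditions (as in the study).
visualizations = ["none", "conventional", "realistic", "interleaved"]
scenarios = ["navigation", "lane utilization", "urgent incident"]
conditions = [f"{v} / {s}" for v, s in product(visualizations, scenarios)]

def balanced_latin_square(items):
    """Return len(items) orderings; each item appears once per ordering and
    immediate precedence is balanced across orderings (valid for even n)."""
    n = len(items)
    rows = []
    for r in range(n):
        row, j, k = [], 0, 0
        for i in range(n):
            if i % 2 == 0:
                idx = (r + j) % n
                j += 1
            else:
                k += 1
                idx = (r + n - k) % n
            row.append(items[idx])
        rows.append(row)
    return rows

orders = balanced_latin_square(conditions)
print(orders[0])  # sequence of the 12 conditions for the first participant
```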
In the critical phase of each driving condition, the experimenter assessed task completion. Task completion was given if the subjects generally followed the system instructions (taking the right exit, selecting the right lane, or making the emergency stop on the right lane). Furthermore, the test facilitator noted incidents that occurred during the driving situation.

Post-condition Questionnaire
To capture the immediate driving- and HMI-related impressions, the participants filled out a questionnaire after each of the 12 conditions. The first question aimed at understanding the general support perceived in the driving situation. The two subsequent questions were designed to understand the visualization's support for identifying the driving-task relevant information (a potential problem area of detailed realistic visualizations) and its support for finding matches between the road situation and the HMI display (a potential advantage of realistic visualizations).

Final interview
The final interview aimed at gathering the participants' overall reflections on the driving situations experienced in the different conditions. The first two questions directly addressed the potential strengths by asking: "Did realistic visualizations support you in finding accordances between the road situation and the HMI display?", and the weaknesses: "Did realistic visualizations deter you from identifying the task-relevant details in the necessary time span?"

Due to the realistic nature of the test, the 12 visualization variants tested represent specific prototypical combinations of constituents. In order to also obtain a rough understanding of the impact of the constituents of realistic visualizations in isolation, a systematic comparison was performed, based on an illustrated questionnaire. Due to their importance, 'map representation' and 'viewing perspective' were selected as the constituents of interest in the interview. Regarding the 'viewing perspective', the users were shown three clusters of screenshots of 2D and 3D views in navigation, lane change and urgent incident warning scenarios, one cluster only including the top-down, the second only the bird's eye, and the third only the egocentric perspective. The participants were then asked to provide a ranking of the three different perspectives, with regard to their assumed support in the driving situations. The same principle was applied for 'map representation'.

RESULTS
In this section, the results from the experimental part, the post-experimental inquiry, and the comparative inquiry are described. The statistical analysis was based on the data from 28 participants. Mean differences were calculated with non-parametrical techniques for dependent samples (Friedman and Wilcoxon tests). In all figures, the error bars represent 95% confidence intervals. Throughout the measures used in the study we did not find age- or gender-specific differences.

Experimental part
Task completion
Our results are characterized by a very high task completion ratio across all test conditions: 99.4% of the navigation, lane utilization and urgent stop recommendations were generally followed. We found no significant differences between the different visualization styles.

Figure 4 presents an overview of participants' mean ratings of the four different visualization styles on three scales: perceived general support in the respective driving situation, support for identifying the relevant details, and support for matching with the outside real world. On all three scales, participants rated the visualizations without a real-world representation worse than all others. Participants consistently judged the realistic view as more supportive than the conventional view (all differences significant, p < .05). On none of the three scales could any difference be found between the realistic and the interleaved visualizations.

Figure 4: Mean post-condition ratings of the visualization styles, with regard to perceived general support in the driving situation, the support for identifying relevant details and for matching the virtual representation with the real world.
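For readers who wish to reproduce this type of analysis, the following is a minimal sketch of the non-parametric tests named at the beginning of this section (a Friedman test across the four visualization styles, followed by a pairwise Wilcoxon signed-rank test), using scipy. The ratings array is randomly generated and merely stands in for the study's per-participant data.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical ratings: 28 participants x 4 visualization styles
# (none, conventional, realistic, interleaved) on one post-condition scale.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 21, size=(28, 4)).astype(float)

# Omnibus test for dependent samples across the four styles.
stat, p = friedmanchisquare(*[ratings[:, i] for i in range(4)])
print(f"Friedman chi2 = {stat:.2f}, p = {p:.3f}")

# Pairwise follow-up, e.g. realistic (column 2) vs. conventional (column 1).
w, p_pair = wilcoxon(ratings[:, 2], ratings[:, 1])
print(f"Wilcoxon W = {w:.1f}, p = {p_pair:.3f}")
```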
Figure 5 again shows the perceived general support in the driving situation, but here separated by the three safety scenarios. The ratings are mostly consistent throughout all safety scenarios. A notable exception was observed when looking at the difference between the conventional and the realistic view: the conventional view was rated significantly lower in the urgent incident and lane utilization scenarios, but not in the navigation scenario (p<.001, p<.004, p=.082). When directly comparing the rating values for the conventional visualizations between the different scenarios, the conventional visualization was rated better in the navigation than in the urgent incident scenario (p<.01). The mean ratings in the lane utilization scenario also tended to be lower, but the difference did not reach significance (p = .065).

Figure 5: Mean post-condition ratings of the visualization styles, with regard to perceived general support in the driving situation, separated by the three safety scenarios.

Observations
The main observations of incidents that had been noted during the test conditions were as follows:

No visualization: When confronted with on-screen navigation instructions, drivers did not encounter notable problems. In the two other scenarios, subjects often appeared to be confused about how they should behave correctly. They were unsure about where exactly to change lanes or where to stop (but as indicated above, the vast majority stopped on the right lane). Several users also got noticeably excited after receiving a warning and looked very attentively at the road situation, to look for the announced incident.

Conventional: During navigation, no notable problems were observed. However, in the other two scenarios many users were unsure about where to stop or which lane to take. This was obviously due to the rather schematic visualization on the 2D map.

Realistic: In the realistic view conditions, we noticed that users tried to follow the indicated arrow as closely as possible. In the urgent incident scenario, this attitude sometimes resulted in driving significantly slower in order to stop exactly at the indicated location. However, this behavior was mostly observed the first and second time a realistic view was used.

Interleaved: The switch from the conventional to the realistic view was noticed well by the drivers. In general, the observations made in the critical moment were similar to the ones made for the realistic visualization.

Participant impressions
The participants' comments provided after using the visualizations were as follows:

No visualization: The vast majority of users stated that without a real-world visualization it was difficult to follow the lane utilization and urgent incident recommendations on the HMI. Real-world visualizations were basically regarded as a standard feature of every form of navigation device. A few participants stated that in principle it could suffice to provide safety warnings without a real-world representation, but that in this case a combination with audio output would be necessary. Furthermore, they wished the icon to be placed at a more prominent position on the screen (interestingly, many participants only took notice of the icon in the no-visualization condition).

Conventional: The majority of participants complained about the experienced difficulties in interpreting the overlaid lines and icons on the schematic 2D map when following lane utilization and emergency stop recommendations. Furthermore, users of the latest navigation systems criticized the relatively low number of displayed details on the map and the lack of a car position item. What was often positively valued was the good foresight provided by the bird's eye perspective.

Realistic: Many participants stated that they felt safe when using the realistic visualization. A very often mentioned reason was that the "1:1" match with the outside world improved orientation. They would have liked to see even more spatially referenced annotations, such as a blocking icon directly placed on the respective lane. The display of many details was not seen as distracting from the relevant information. The few critical remarks were related to less foresight, as compared to the conventional view.

Interleaved: Participants provided similar comments with regard to the interleaved as to the realistic view. The switch was not seen as an added value by the participants. Many stated that they would have preferred a continuously displayed realistic view.
Final Interview
The participants widely stated that realistic visualizations had enabled them to find a match between the HMI and the real road situation (mean rating of 16.11 on a 20-point scale, SD = 3.8). Similarly, many participants stated that realistic visualizations had not hindered them in finding the relevant details on the screen display (mean rating of 5.5 on a 20-point rating scale, SD = 4.4).

Comparative inquiry
Figure 6 shows the ranking results from the comparative inquiry on the perspectives top-down, bird's eye, and egocentric, with regard to their assumed support in the driving situations. Overall, the top-down perspective was rated significantly lower than the other two perspectives (both p<.001). The ratings for the bird's eye and the egocentric perspective did not differ significantly from each other overall; however, the navigation scenario again differed from the other two scenarios: here the bird's eye view was preferred to the egocentric perspective (Z=-2.05, p<.05).

Figure 6: Mean interview rankings of the perspectives, with regard to perceived general support in the driving situation, separated by the three scenarios.

The comparative inquiry on the map representation revealed a strong preference for 3D over 2D (77.4% vs. 22.6%; Z=-3.49, p<.001). Again, the navigation scenario differed from lane utilization and urgent incident: here a difference between 3D and 2D could not be found.

CONCLUSIONS
In the following, the results are summarized with regard to the research questions.

Q1: Real-world visualization in general (baseline)
The results suggest that an HMI is perceived to support a driver better in following safety-related recommendations if it displays a real-world visualization, as compared to a purely textual and iconic message. A map appears to be regarded as a standard HMI feature, and it helps drivers to better orientate themselves. The added value of such a real-world representation is consistently supported by user ratings and comments. On the other hand, our task completion results show that the pure display of text and an icon obviously suffices to correctly follow a recommendation, at least in low-complexity driving situations.

Q2: Realistic vs. conventional visualization
We found that realistic visualizations are perceived as an added value when presenting safety-related recommendations on the HMI, as compared to conventional visualizations. This result was not easily predictable: in principle, the many 'irrelevant' details shown in realistic visualizations could just as well have been assumed to be disturbing. We also found that realistic views do not decrease task completion, at least in simple scenarios.

Q3: Interleaving conventional and realistic visualization
Switching between a conventional visualization (shown in non-critical situations) and a realistic visualization (shown in critical situations) does not provide an added benefit, as compared to the continuous display of a realistic visualization.

Q4: Constituents of realistic visualization
Regarding the main constituents of realistic visualizations, we found that, when considered in isolation, 3D representations are preferred to schematic 2D representations on the HMI. Regarding the viewing perspective, the top-down alternative appears not to be well suited for in-vehicle safety information systems. This is based not only on the comparative inquiry results, but also on frequent comments throughout the test conditions.

Q5: Influence of safety scenarios
Throughout the study, we found that drivers felt even more supported by realistic visualizations when they had to follow urgent and non-standard instructions in the urgent incident and lane utilization scenarios. While drivers in principle followed the general instructions correctly, they often felt insecure when choosing the right lane or place to stop.
DISCUSSION
The experiment presented in this paper is the first comprehensive evaluation of the suitability of different visualization styles and their constituents for safety-related in-car information applications. The goal was to overcome the current scarcity of prescriptive knowledge on this important and safety-relevant topic.

Our simulator study results show that realistic HMI visualization styles have a significant positive impact on the user experience. In comparison to other visualization styles, realistic views provided added value in terms of driver support and perceived safety, beyond a purely aesthetic function as visual enhancement or "eye candy". These utilitarian benefits materialized particularly in the more acute safety-critical scenarios, which required effective and timely action by the driver. Furthermore, we did not find any evidence for a negative impact of realistic views on participants, e.g. in terms of diminished task performance, distraction by visual clutter or reduced safety. Our findings may thus challenge conventional recommendations which postulate the simplification and reduction of visual HMI designs [6]. In the light of our results, the application of realistic views in safety contexts should be reconsidered on a broader level. We therefore suggest further systematic research on the merits and demerits of realistic visualizations for in-vehicle navigation and safety applications.

Our results also show that, compared to traditional navigation, safety scenarios have different properties and consequently different visualization requirements: in the navigation scenario, users saw no additional benefit of realistic views over conventional, schematic ones. However, with rising urgency of the scenarios, participants found realistic views to be significantly more useful. This shows not only that reality views provide tangible benefits for the driver, but also that safety-related HMI represents an application class distinct from pure navigation, requiring dedicated user experience research.

Our study participants were only exposed to relatively simple environments (highway) and tasks (such as stopping on the emergency lane). This may explain the observed insensitivity of users' (near to perfect) task completion rate to the visualization style. Thus, our results should not be generalized towards more challenging, high-complexity scenarios. Under high strain and cognitive load, users might change preferences and perform better with other or even without HMI visualizations. Future studies should extend and validate the design space towards such higher complexity demands.

In this study, we were deliberately interested in understanding the effects of certain prototypical extreme variants (no visualization, conventional, realistic and interleaved views). Obviously, further visualization variants are possible in this context. Most importantly, we want to stress the fact that these three styles represent idealized variants highly suitable for experimental testing, but which in practice are rather encountered as downgraded or simplified implementations. For example, visualizations currently marketed as "reality views" actually still have many aspects of schematic representations: in many cases they do not display the current situation, but only display 3D templates or 2D images of prototypical junctions. To advance towards safe and satisfactory realistic visualizations in the car, the results clearly encourage the scientific advancement and understanding of the design space for realistic visualizations.

ACKNOWLEDGMENTS
This work has been carried out within the projects REALSAFE and U0, which are financed in parts by ASFiNAG AG, Kapsch TrafficCom AG, nast consulting, the Austrian Government and the City of Vienna within the competence center program COMET.

REFERENCES
1. Allen, R. W., Cook, M. L., Rosenthal, T. J. (2007). Application of driving simulation to road safety. Special Issue in Advances in Transportation Studies 2007.
2. Böhm, M., Fuchs, S., Pfliegl, R. (2009). Driver Behavior and User Acceptance of Cooperative Systems based on Infrastructure-to-Vehicle Communication. Proc. TRB 88th Annual Meeting.
3. Burnett, G. (2008). Designing and Evaluating In-Car User Interfaces. In: J. Lumsden (Ed.), Handbook of Research on User Interface Design and Evaluation for Mobile Technology. Idea Group Inc (IGI), 2008.
4. COOPERS project: http://www.coopers-ip.eu/
5. Crampton, J. (1992). A cognitive analysis of wayfinding expertise. Cartographica 29(3): 46-65.
6. European Commission. Commission Recommendation of 22 December 2006 on safe and efficient in-vehicle information and communication systems: Update of the European Statement of Principles (ESOP) on human machine interface. Commission document C (2006) 7125 final, Brussels.
7. Janssen, W. (2007). Proposal for common methodologies for analysing driver behavior. EU-FP6 project HUMANIST, Deliverable 3.2.
8. Levy, M., Dascalu, S., Harris, F.C. ARS VEHO: Augmented Reality System for VEHicle Operation. Proc. Computers and Their Applications, 2005.
9. Martens, M.H., Oudenhuijzen, A.J.K., Janssen, W.H., and Hoedemaeker, M. (2006). Expert evaluation of the TomTom device: location, use and default settings. TNO memorandum TNO-DV3 2006 M048.
10. McDonald, M., Piao, J., Fisher, G., Kölbl, R., Selhofer, A., Dannenberg, S., Adams, C., Richter, T., Leonid, E., Bernhard, N. (2007). Summary Report on Safety Standards and Indicators to Improve the Safety on Roads, Report D5-2100. COOPERS project.
11. Medenica, Z., Palinko, O., Kun, O., and Paek, T. (2009). Exploring In-Car Augmented Reality Navigation Aids: A Pilot Study. EA Ubicomp.
12. Mobilizy: http://www.mobilizy.com/drive
13. Narzt, W., Pomberger, G., Ferscha, A., Kolb, D., Müller, R., Wieghardt, J., Hörtner, H., Lindinger, C. (2006). Augmented reality navigation systems. Universal Access in the Information Society 4(3): 177-187.
14. Navigon: www.navigon.com
15. Patterson, T. (2002). Getting Real: Reflecting on the New Look of National Park Service Maps. Proc. Mountain Cartography Workshop of the International Cartographic Association. www.mountaincartography.org/mt_hood/pdfs/patterson1.pdf
16. REALSAFE project: https://portal.ftw.at/projects/all/realsafe/
17. Ruddle, R.A., Payne, S.J., Jones, D.M. (1997). Navigating buildings in "desk-top" virtual environments: Experimental investigations using extended navigational experience. Journal of Experimental Psychology: Applied 3(2): 143-159.
18. TomTom HD Traffic: www.tomtom.com
19. Wang, Y., Zhang, W., Wu, S., and Guo, Y. (2007). Simulators for Driving Safety Study – A Literature Review. In: R. Shumaker (Ed.): Virtual Reality, Proc. HCII 2007, LNCS 4563, pp. 584–593, 2007.
20. Vehicle Infrastructure Integration (VII) initiative: http://www.vehicle-infrastructure.org
An Experimental Augmented Reality Platform
for Assisted Maritime Navigation
Olivier Hugues, MaxSea – ESTIA Recherche, Bidart, France, +33 5 59 41 70 96, o.hugues@net.estia.fr
Jean-Marc Cieutat, ESTIA Recherche, Bidart, France, +33 5 59 43 84 75, j.cieutat@estia.fr
Pascal Guitton, University Bordeaux 1 (LaBRI) & INRIA, Bordeaux, France, +33 5 40 00 69 18, guitton@labri.fr

ABSTRACT
This paper deals with integrating a vision system, combining an efficient thermal camera and a classical one, into maritime navigation software based on a virtual environment (VE). We then present an exploratory field of augmented reality (AR) in situations of mobility and the different applications linked to work at sea enabled by adding this functionality. This work was carried out thanks to a CIFRE agreement within the company MaxSea Int.

Categories and Subject Descriptors
H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems — Artificial, augmented and virtual realities; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism — Virtual reality.

General Terms
Experimentation, Human Factors, Security.

Keywords
Augmented Reality, Mixed Environment, Image Processing, Human Factor, Combining exteroceptive data.

1. INTRODUCTION
The continuous progress of new technologies has led to a proliferation of increasingly smart and powerful portable devices. The capabilities of devices on board a ship now enable crews to be offered a processing quality and volume of information until now unrivalled. In a hostile environment such as the sea, users need a relevant flow of information. Computer-assisted vessel management is therefore increasingly widespread and digitalisation is an inescapable development. The three main aims are as follows:
1. Improved safety (property, environment and people)
2. Increased gains in productivity (fishing, etc.)
3. The representations required for environmental control (orientation, location and direction)

These aims have led maritime software publishers to develop increasingly sophisticated platforms, offering very rich virtual environments and real-time information updates. There are many companies on the embedded maritime navigation software market. They can be separated into two categories. The first includes those which develop applications that take advantage of embedded sensors (radar, depth-finder, GPS, etc.), such as Rose Point [4], publisher of the Coastal Explorer software (Figure 3), and MaxSea International [14], publisher of the MaxSea TimeZero software (Figure 2). Other companies offer hardware platforms in addition to their software applications, like Furuno [10] (Figure 4) and Garmin [11] (Figure 1).

Figure 1. Garmin   Figure 2. MaxSea   Figure 3. Coastal Explorer   Figure 4. Furuno

These environments enable navigation to be greatly improved by only showing the necessary information, e.g. by combining satellite photos of the earth and nautical charts, like the PhotoFusion in Figure 5 proposed by MaxSea [14].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Augmented Human Conference, April 2–3, 2010, Megève, France.
Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.
Figure 5. PhotoFusion: mixture of satellite photos and nautical charts (MaxSea) [14]

There are many different types of data whose relevance depends on the context in which they are used. Furthermore, the loss of reference points due to bad weather also has to be taken into account when weather conditions deteriorate (mist, fog, rough sea, etc.). Emotional reactions are even more intense in extreme conditions, whether at sea or in the mountains.
Firstly, we describe how we incorporated a vision system into maritime navigation software and the various ways of remotely controlling the camera. Consequently, we are able to augment the video flow using information from the assisted maritime navigation application. We used work on the LookSea project by "Technology Systems, Inc" [20] as a basis and did not propose a system with a Head Mounted Display (HMD). The proposed augmented reality functionalities open up new exploratory fields for GPS, like handling virtual entities in real life.
On the other hand, the concept of augmented reality itself is not only purely technological: it is also based on aspects linked to human perception, which also depends on the genetic and cultural heritage of individuals [6]. It is therefore necessary to take into account the human factor and the stress caused by difficult sea conditions, which are likely to increase the risk of accidents [12], and to adapt to both users and context of use [18, 7].
This paper presents one of the platform's major applications, at the crossroads between various different technologies, i.e. following targets.

2. VIRTUAL ENVIRONMENT
The aim of virtual reality is to provide a person (or people) with a sensory-motor and cognitive activity in a digitally created artificial world, which may be imaginary, symbolic or a simulation of certain aspects of the real world [16]. Moreover, in [15], in 1994 Paul Milgram defined the concept of Mixed Reality (Figure 6), which provides a continuum linking the real world (RW) to Virtual Reality (VR) via Augmented Reality (AR) and Augmented Virtuality (AV).

Figure 6. Mixed Reality Continuum [15] (from Real Environment (RE) through Augmented Reality (AR) and Augmented Virtuality (AV) to Virtual Environment (VE))

In our context, users on board vessels find themselves alternately in three of the four situations of Mixed Reality:
1. The Real World, in which they are naturally immersed.
2. Augmented Reality, by adding useful "field specific information" once our vision system, detailed later on, is added.
3. The Virtual Environment, adapted to the use explained hereafter.

2.1 The navigation tool
The main navigation tool centralizes the data produced by all the embedded hardware (radar, depth-sounder, GPS, etc.) and combines it with nautical, coastal or river chart databases.

Figure 7. Information flow

The navigation tool informs the user of the state of the environment thanks to a visualization which combines this different information according to the user's commands via the Human Computer Interface (Figure 7). Although based on 2D nautical charts, current software very often proposes 3D views in order to increase realism and thus optimize the orientation required by the user. In this tool, we commonly find the following objects, directly derived from the field of navigation:
1. WayPoints (WP): an object representing a buoy signalling a specific geographical position
2. Route: a succession of points that the user needs to plan a route
3. Trace: a succession of points where the vessel has already sailed
4. Targets: two major families of targets, ARPA and AIS:
   o ARPA: Automatic Radar Plotting Aids (moving or not), from an echo radar.
   o AIS: Automatic Identification System, a system for automatically exchanging messages between vessels.
   o Man Overboard.
All these objects are present in navigation software and are displayed on charts. In Figure 8, we can see a vessel (red), two WayPoints (yellow with a black star) and a Route (red) added to the chart's display. These objects, drawn vectorially, can be moved by the user (except the boat, whose location depends on the GPS position).

Figure 8. View of the Virtual Environment

3. PILOTING THE VISION SYSTEM
We propose, in a single application, two distinct but complementary presentations (AR and VE) for the user.
We placed a motorized video camera on a vessel. This camera is motorized on two axes: it has a 360° azimuth rotation axis and an elevation rotation axis of approximately 90°. It includes a dual-axis gyroscope to compensate for the boat's movements caused by floating. This camera also has the particularity of having a classical black and white vision mode (daylight/lowlight), making it possible to see during daylight and with little exposure, and an ultra-sensitive thermal vision mode in the medium infrared (wavelength 2.5 – 25 µm). The video flow is embedded in the software's main screen, as can be seen in Figures 14 and 15.
The motorized camera can be piloted in four different ways. The first consists of using explicit commands: when the cursor is placed in the video flow window, a menu appears enabling the user to act directly on the camera's axes as well as the zoom.
The second possibility is implicit piloting, which does not require the camera's degrees of freedom to be directly handled. From the chart's contextual menu (or that of any object drawn on the screen) it is possible to ask the camera to point in the direction of the object or area in question, as illustrated in Figure 9, by simply clicking. We have called this functionality "Clicked Reality", after the "Clickable Reality" highlighted by Laurence Nigay in [13].
It is possible to follow a target (whether moving or not). We shall detail this functionality in Section 5. Once the user has chosen the target, the camera's orientation can be locked on this target from a contextual menu in the virtual environment (Figure 9) so that it does not allow the target to leave its field of vision. Technically, this locking works by updating the camera's orientation in real time according to the target's position as well as the boat's position and orientation.

Figure 9. Left: Radar + Boat + WP. Right: Contextual Menu
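For illustration only, the kind of computation implied by such a lock is sketched below: from the boat's GPS position and heading and the target's coordinates, a relative bearing (pan) and a tilt command are derived for the motorized camera. The flat-earth approximation and all identifiers are assumptions, not MaxSea's actual code.

```python
import math

def camera_lock_angles(boat_lat, boat_lon, boat_heading_deg,
                       target_lat, target_lon, camera_height_m=10.0):
    """Pan/tilt needed to keep a georeferenced target in the camera's view.

    Flat-earth (equirectangular) approximation, adequate for targets within a
    few kilometres; heading and bearing are in degrees from north.
    """
    # Local east/north offsets of the target relative to the boat (metres).
    east = math.radians(target_lon - boat_lon) * 6_371_000 * math.cos(math.radians(boat_lat))
    north = math.radians(target_lat - boat_lat) * 6_371_000

    bearing = math.degrees(math.atan2(east, north)) % 360.0       # absolute bearing
    pan = (bearing - boat_heading_deg + 180.0) % 360.0 - 180.0    # relative to the bow

    distance = math.hypot(east, north)
    tilt = -math.degrees(math.atan2(camera_height_m, distance))   # slightly below horizon

    return pan, tilt

# Example: target roughly 1 km north-east of the boat, boat heading due north.
print(camera_lock_angles(43.48, -1.56, 0.0, 43.4885, -1.552))
```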
The third camera-piloting mode is based on using WayPoints, which represent a position in the real environment. WayPoints are created by users, who can modify their position (amongst other things) as they please. If the user decides to lock a WayPoint-type target with the camera, it will then be possible to pilot the camera by moving the WayPoint in the virtual environment.
Finally, the fourth possibility for piloting the camera is the automatic supervision mode illustrated in Figure 10, where an object floating on the surface of the water is visible thanks to the camera's thermal mode. This information, once processed, may trigger an alarm in the navigation platform, as in the SAFRAN Unidentified Floating Object Detection Module [24].

Figure 10. Supervision enabling obstacles to be highlighted

4. EXAMPLE OF ENRICHMENT
We propose using the video flow to enrich information from the assisted maritime navigation application's virtual environment. This is an augmented reality system with a video monitor (indirect vision), where the virtual world is augmented by the video flow (the fifth category of the taxonomy introduced by Milgram in 1994 [15]). Technology Systems' LookSea project [20] underlines the fact that the current state of direct vision technology is not compatible with use at sea, especially because of the difficulty caused and the reduction in the field of vision. It is however worth observing that the camera is not fitted to the visualization device, but placed
somewhere on the vessel, since the camera's position with regard to the GPS antenna is known. As we are in a mobile context, augmenting the video flow faces certain difficulties [23] in calculating the camera's pose. It is impossible to equip all the real-world elements with artificial markers [21, 19, 3]. We must therefore implement a calculation of the camera's pose using a "markerless" system. Such techniques use natural characteristics existing in the real scene, such as corners, contours and line segments [22, 5, 9], invariant to certain transformations. We have opted for integrating a georeferenced video flow in a virtual environment where all the elements are themselves georeferenced, as in [17, 1].
Using Fuchs & Moreau's functional taxonomy [16], we shall show in four points how our assisted maritime navigation augmented reality platform relates to accepted concepts and how it proposes new exploratory fields.

4.1 Documented Reality
Like Fuchs & Moreau's Documented Reality functionality, our video flow can be enriched with information identifying what is visible from the camera, but without alignment between the real and virtual worlds (Figure 11). We can see that this functionality does not respect the third property of augmented reality systems proposed by Azuma [2]. In Figure 12 we illustrate tide information, where the height is represented by filling a small "capsule" and the tide's movement is represented by a directional arrow.

Figure 11. Simple non-registered synthetic information: documented scene

Figure 12. Capsule showing the height of the tide and its dynamics

4.2 Reality with Augmented Comprehension (or Visibility)
This involves overlaying semantic information, such as the function or reference of real objects, on the real scene's images [8]. We use the georeferenced data of objects to inform users of their position, as in Figure 13, for example, where we can see information showing where port entrances are located in an onboard view. In this functionality, it is possible to take into account the thermal part of our vision system, whose flow is also augmented, as presented above in Figure 10.

Figure 13. Registered synthetic information ("Entrance to the marina") enables the scene to be read more effectively

4.3 Combining Real & Virtual
This functionality adds virtual objects to the real scene, or replaces real objects by virtual objects, like 3D representations of the coast or seabed. In addition to the problem of aligning the two worlds, we are also faced with the problem of masking the real world with the virtual world or vice-versa. This problem of concealing certain elements can be partially solved thanks to image segmentation techniques (thresholding, cutting into regions, etc.), enabling the sea and the sky to be dissociated from other elements in the real scene. Given field-specific hypotheses, priority is given to adding virtual elements.
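As an illustration of the thresholding mentioned above, the following OpenCV sketch roughly separates the sky (assumed to be the brightest large region in daylight) from the rest of a frame. The threshold value and this assumption are illustrative only and do not describe the platform's actual processing chain.

```python
import cv2
import numpy as np

def sky_sea_masks(frame_bgr, threshold=170):
    """Very rough segmentation of a maritime frame into 'sky' and 'rest'.

    Assumes the sky is the brightest large region, which often holds in
    daylight at sea; a real system would use more robust region-based cues.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, sky_mask = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    other_mask = cv2.bitwise_not(sky_mask)
    return sky_mask, other_mask

# Virtual overlays would then be composited only where they should not be
# occluded by real elements, using these masks.
frame = np.full((480, 640, 3), 200, dtype=np.uint8)   # stand-in for a video frame
sky, rest = sky_sea_masks(frame)
```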
To align the two worlds, we use an inertial unit located on the vessel that takes into account the boat's roll, pitch and yaw. This unit has a three-axis accelerometer, a three-axis gyroscope and two distant GPS receivers whose phase difference enables the vessel's angular degrees of freedom to be calculated. The boat's movements are thus compensated by directly applying these angular displacements to the 3D environment's virtual camera.
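A minimal sketch of this compensation step, assuming a standard yaw-pitch-roll (ZYX) convention: the inverse of the boat's attitude is applied to the virtual camera so that the rendered overlay stays registered while the boat moves. The convention and all names are illustrative assumptions, not MaxSea's implementation.

```python
import numpy as np

def attitude_matrix(roll_deg, pitch_deg, yaw_deg):
    """Rotation matrix from boat attitude (ZYX / yaw-pitch-roll convention)."""
    r, p, y = np.radians([roll_deg, pitch_deg, yaw_deg])
    Rx = np.array([[1, 0, 0], [0, np.cos(r), -np.sin(r)], [0, np.sin(r), np.cos(r)]])
    Ry = np.array([[np.cos(p), 0, np.sin(p)], [0, 1, 0], [-np.sin(p), 0, np.cos(p)]])
    Rz = np.array([[np.cos(y), -np.sin(y), 0], [np.sin(y), np.cos(y), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def compensated_camera_orientation(base_orientation, roll_deg, pitch_deg, yaw_deg):
    """Apply the inverse of the boat's rotation to the virtual camera so that
    the overlay stays aligned with the world (illustrative sketch only)."""
    return attitude_matrix(roll_deg, pitch_deg, yaw_deg).T @ base_orientation

camera0 = np.eye(3)
print(compensated_camera_orientation(camera0, roll_deg=5.0, pitch_deg=-2.0, yaw_deg=1.5))
```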

4.4 Virtualized Reality
Here we go from one environment to another. On the one hand, our platform consists of a VE fully representing the real environment (charts, weather, seabed, coasts, nearby vessels, etc.), which can be used to substitute reality. On the other hand, we also have the possibility of augmenting the video flow with virtual vectorial objects which have no physical reality (WayPoints, Routes, Trace, Buoys, etc.), as in Figure 15.

4.5 Visualization Modes
Our platform's design offers the user two visualization modes. The first visualization mode integrates a thumbnail image of the augmented video flow in the virtual environment, as in Figure 14. The second visualization mode, as presented in Figure 15, displays the augmented video flow in the main part of the screen together with a thumbnail image representing the virtual environment. We also allow users to modify the video flow's transparency value. This enables the virtual environment's chart to show through the video flow. In Figure 16, the top image's transparency value has been changed so as to show the chart.
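The adjustable transparency described above amounts to an alpha blend of the augmented video flow over the chart rendering; a minimal OpenCV sketch follows (frame sizes and the alpha value are illustrative assumptions).

```python
import cv2
import numpy as np

def blend_video_over_chart(video_frame, chart_frame, alpha=0.6):
    """Blend the augmented video flow over the chart rendering.

    alpha = 1.0 shows only the video, alpha = 0.0 only the chart; lowering
    alpha lets the virtual environment's chart show through the video.
    """
    return cv2.addWeighted(video_frame, alpha, chart_frame, 1.0 - alpha, 0.0)

video = np.zeros((480, 640, 3), dtype=np.uint8)    # stand-ins for real frames
chart = np.full((480, 640, 3), 255, dtype=np.uint8)
mixed = blend_video_over_chart(video, chart, alpha=0.4)
```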
Figure 14. VE with augmented video flow thumbnail image

Figure 15. VE thumbnail image in the augmented video flow

Figure 16. Combining charts and video flows

5. CONCLUSION
In a context of major developments in Human Computer Interfaces (HCI) for mobile systems, we explore the possibilities offered by creating a new mixed environment by integrating, in a single application, a rich virtual environment and an augmented reality environment. We try to satisfy the user's requirements, which vary according to sailing conditions. The functionalities provided by augmented reality must therefore differ according to people and weather conditions, hence the need to provide contextual information.

6. FUTURE WORK
Given the exploratory nature of this platform, we consider several fields of work important. These can be divided into two categories: the first refers to the technology and the second relates to the human factor.
From the technological point of view, aligning the real world and the virtual world remains a challenge, which the boat's movements do not facilitate. The precision of GPS data and recognizing shapes in image analysis are complex issues which still need to be dealt with.
Concerning the human factor, we propose determining the extent to which this platform helps users to satisfy their orientation needs. In which conditions is it more natural to use a VE or AR to navigate, and to what extent is it possible to contextualize the information?
Secondly, we would like to extend our platform to enable it to be generalized as an AR platform with a VE which can be used both at sea (on boats) and on land (by car or on foot).

7. REFERENCES
[1] Behzadan, A.H., and Kamat, V.R. Georeferenced Registration of Construction Graphics in Mobile Outdoor Augmented Reality. Journal of Computing in Civil Engineering 21, 4 (July 2007), 247–258.
[2] Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B. Recent advances in augmented reality. IEEE Computer Graphics and Applications 21, 6 (2001), 34–47.
[3] Cho, Y., Lee, J., and Neumann, U. A Multi-ring Color Fiducial System and an Intensity-Invariant Detection Method for Scalable Fiducial-Tracking Augmented Reality. In IWAR (1998).
[4] Coastal Explorer. http://rosepointnav.com/default.htm, October 2009.
[5] Comport, A., Marchand, F., and Chaumette, F. A Real-time Tracker for Markerless Augmented Reality. In ACM/IEEE Int. Symp. on Mixed and Augmented Reality, ISMAR'03 (Tokyo, Japan, October 2003), 36–45.
[6] Damasio, A. Descartes' Error. BasicBooks, 1983.
[7] Dey, A., and Abowd, G. Towards a Better Understanding of Context and Context-awareness. In Proceedings of the 2000 Conference on Human Factors in Computing Systems (The Hague, The Netherlands, April 2000).
[8] Didier, J.-Y. Contribution à la dextérité d'un système de réalité augmentée mobile appliquée à la maintenance industrielle. PhD thesis, Université d'Evry, December 2005.
[9] Drummond, T., and Cipolla, R. Real-time Tracking of Complex Structures for Visual Servoing. In Workshop on Vision Algorithms (1999), 69–84.
[10] Furuno. http://www.furuno.fr/, October 2009.
[11] Garmin. http://www.garmin.com/garmin/cms/site/fr, October 2009.
[12] Goleman, D. Emotional Intelligence. New York: Bantam Books, 1995.
[13] Nigay, L., Renevier, P., Marchand, T., Salembier, P., and Pasqualetti, L. La réalité cliquable: Instrumentation d'une activité de coopération en situation de mobilité. Conférence IHM-HCI 2001, Lille (2001), 147–150.
[14] MaxSea. http://www.maxsea.fr, October 2009.
[15] Milgram, P., and Kishino, F. A Taxonomy of Mixed Reality Visual Displays. IEICE Transactions on Information Systems E77-D, 12 (December 1994), 1–15.
[16] Fuchs, P. Les interfaces de la réalité virtuelle. La Presse de l'Ecole des Mines de Paris, ISBN 2-9509954-0-3 (1996).
[17] Schall, G., Mendez, E., Kruijff, E., Veas, E., Junghanns, S., Reitinger, B., and Schmalstieg, D. Handheld augmented reality for underground infrastructure visualization. Personal and Ubiquitous Computing, Special Issue on Mobile Spatial Interaction 13, 4 (May 2008), 281–291.
[18] Schilit, B., Adams, N., and Want, R. Context-aware Computing Applications. 1st International Workshop on Mobile Computing Systems and Applications (1994), 85–90.
[19] State, A., Hirota, G., Chen, D., Garrett, W., and Livingston, M. Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking. Computer Graphics (Annual Conference Series) 30 (1996), 429–438.
[20] Technology Systems Inc. Augmented reality for marine navigation. Tech. rep., LookSea, 2001.
[21] Hoff, W.A., Nguyen, K., and Lyon, T. Computer Vision-Based Registration Techniques for Augmented Reality. Intelligent Robots and Computer Vision XV, in Intelligent Systems and Advanced Manufacturing, SPIE 2904 (November 1996), 538–548.
[22] Wuest, H., Vial, F., and Stricker, D. Adaptive Line Tracking with Multiple Hypotheses for Augmented Reality. In ISMAR'05: Proceedings of the Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality (Washington, DC, USA, IEEE Computer Society, 2005), 62–69.
[23] Zendjebil, I.M., Ababsa, F., Didier, J., Vairon, J., Frauciel, L., and Guitton, P. Outdoor Augmented Reality: State of the Art and Issues. 10th ACM/IEEE Virtual Reality International Conference (VRIC 2008), Laval, France.
[24] SAFRAN, UFO Detection. http://www.safran-group.com
Skier-ski system model and development of a computer
simulation aiming to improve skier’s performance and ski

François ROUX, INSEP, Laboratoire d'Informatique Appliquée au Sport, 11 avenue du Tremblay, 75012 Paris, FRANCE, +33 492 212 477, lias@insep.fr
Gilles DIETRICH, UFR STAPS, Laboratoire Action, Mouvement et Adaptation, 1 rue Lacretelle, 75015 Paris, FRANCE, +33 156 561 245, gilles.dietrich@parisdescartes.fr
Aude-Clémence DOIX, INSEP, Laboratoire d'Informatique Appliquée au Sport, 11 avenue du Tremblay, 75012 Paris, FRANCE, +33 674 711 601, lias@insep.fr

ABSTRACT
Background. Based on personal experience of ski teaching, ski training and ski competing, we have noticed that gaps exist between classical models describing body techniques and the actual motor acts made by performing athletes. The evolution of new parabolic-shaped skis with new mechanical and geometric characteristics increases these differences even more. Generally, scientific research focuses on situations where skiers are separated from their skis. Also, many specialized magazines, handbooks and papers print articles with a similar epistemology. In this paper, we describe the development of a three-dimensional analysis to model the skier-skis system. We subsequently used the model to propose an evaluation template to coaches that includes eight techniques and three observable consequences, in order to make objective evaluations of their athletes' body techniques. Once the system is modeled, we can develop a computer simulation in the form of a jumping jack, respecting the degrees of freedom of the model. We can manipulate the movement of each body segment or the skis' characteristics to detect performance variations. The purpose of this project is to elaborate assumptions to improve performance and propose experimental protocols to coaches to enable them to evaluate performance. This computer simulation also has applications in board and wheeled sports.

Methods. Eleven elite alpine skiers participated. Video cameras were used to observe motor acts of alpine skiers in two tasks: slalom and giant slalom turns. Kinematic data were input into the 3D Vision software. Two on-board balances were used to measure the six components of the ski-boots→skis torques. All data sources were then synchronized.

Findings. We found correlations between the force and torque measurements, the progression of the center of pressure and the eight body techniques. Based on these results, we created a technological model of the skier-ski system. Then, we made a reading template and a model to coach young alpine skiers in clubs as well as World Cup alpine skiers, and we obtained results demonstrating the usefulness of our research.

Interpretation. These results suggest that it is now possible to create a three-dimensional simulator of an alpine skier. This tool is able to compare competitors' body techniques to detect the most performing ones. Additionally, it is potentially helpful to consider and evaluate new techniques and ski characteristics.

Categories and Subject Descriptors
Measurement, Performance, Experimentation.

General Terms

Keywords
Skier-ski system; Computer simulation; Techniques reading template; Elite skiing.

1. INTRODUCTION
There are gaps in research between classical models describing body techniques judged to be efficient and the motor acts performed by athletes that we have observed during ski racing competitions. Therefore, we decided to launch a research study to analyse alpine skiing kinematics.
We believe that analyses of slalom or giant slalom turn movements in alpine skiing are necessary to improve the current understanding of elite alpine skiing performance. Several studies have already been done on alpine skiing biomechanics: analysis of the carving turn with a combination of kinematic, electromyographic and pressure measurement methods [1]; analysis of the effect of ski stiffness and snow conditions on the turn radius [2]. Research in ski racing has used motion capture with inertial measurement units and GPS to collect biomechanical data and improve performance [3].
A three-dimensional movement analysis of the alpine skier while turning is a first step. The modeling of the skier and his/her technique will allow the development of a computer simulation in which every body segment will be modeled and potentially manipulated to interpret some parameters.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Augmented Human Conference, April 2–3, 2010, Megève, France.
Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.
The last decade has seen the transformations of alpine skier
equipment notably in terms of the geometrics and mechanical The second experiment involved one female alpine skier,
characteristics of skis due to new materials as well as the arrival performing the Giant Slalom. We created a mechanical model of
of snowboarders. Research devices with micro computing and skier-ski system thanks to Rossignol R&D who provided us with
video cameras have also considerably improved. an on-board balance to measure ski bootsàski torques and
On account of this evolution, we have developed a motion snowàski torques. The balance was inserted between the ski boot
analysis system as a tool for alpine skiing research. The scientific and the ski (figure 1). Skier had to carry herself units’ data
literature is generally dedicated to the study of the alpine skier’s acquisition from the on-board balance (figure 1). Data was
body, independently from his equipment [2]. Our proposal is to sampled at 936 Hz.
consider skier and skis as a system divided into several sub- Six video cameras were located on the sides of the analyzed turn,
systems: skis, lower-limbs, trunk/head, upper-limbs which all three on the left side, and three on the right side.
interact with the coxal joint.
Modeling allows us to propose a reading template to coaches and
instructors which can be potentially useful for champions’
performance analysis and knowledge. It is made with eleven
observation benchmarks, including eight specific ones for body-
techniques and three with mechanical consequences. The goal is
to simultaneously use and control these body-techniques to
optimize efficiency according to the possible trajectory of the
race’s layout as well as the relief of the slope.
We assume that the simulator will be a tool for coaches to
improve alpine skier performance and to improve new skis
development.
The purpose of this article is three-fold. First a 3D analysis of
eleven elite alpine skiers when performing a giant slalom turn or
two slalom turns during a real skiing situation. We have done a
tridimensional modeling of the skier, highlighting each body-
segment. Second, a computer simulation in which we can
manipulate movement of each body segment or skis’ gears’
characteristics to detect performance variations. Third, from the
eight body-techniques (with their three consequences), as the
reading template for coaches to improve skiers’ efficiency, we
have provided details of one, for an example.

2. SKIER-SKIS’ SYSTEM MODELING


2.1 Methods
The first experiment took place in Les Saisies during an ISF
(International Ski Federation) race, with nine skiers. The second
experiment took place on the Grande Motte Glacier in Tignes, and
the last one took place on the Mont de Lans Glacier in Les Deux-
Alpes.
Eleven French elite alpine skiers (World Cup level) participated.

During the first experiment, we worked with the FFS (French Ski Federation) and videotaped nine athletes during a Giant Slalom ISF race. The French team's director obtained the race referee's agreement to place video cameras on the sides of the slope. The aim was to measure kinematic data on the skiers' techniques during a real race situation, to find out how alpine skiers start a turn and pilot an efficient trajectory.
The distance between two Giant Slalom gates is 20 meters. We used two video cameras and oriented the optical field of the second camera at a 30-degree angle to the optical field of the first camera. The common optical field of both cameras was oriented towards the place where the skiers started to turn. The distance between the first camera and the slope was 20 meters, at maximal zoom, while the distance between the second camera and the slope was 30 meters, also at maximal zoom. Data from the video were collected.
The second experiment involved one female alpine skier performing the Giant Slalom. We created a mechanical model of the skier-ski system thanks to Rossignol R&D, who provided us with an on-board balance to measure the ski-boot→ski and snow→ski torques. The balance was inserted between the ski boot and the ski (figure 1). The skier had to carry the data-acquisition unit of the on-board balance herself (figure 1). Data were sampled at 936 Hz.
Six video cameras were located on the sides of the analyzed turn, three on the left side and three on the right side.

Figure 1. Unit data acquisition, focal points and on-board balance

Nineteen focal points, made of fluorescent yellow and black squared tennis balls or adhesive tape, were placed on the subject (anatomical landmarks) and on the skis, on each side: one on the tip of the ski; one on the tail of the ski; one on the front binding; one on the ski boot at the ankle; one on the knee; one on the hip; one on the elbow; one on the hand; one on the shoulder; and one on the top of the helmet.
Data from the subject were input into 3D Vision software [4] and
treated by a DLT (Direct Linear Transformation) algorithm [5].
Every 1/25th of a second, the position of each focal point on the subject was calculated. In this way, the skier-ski system is located within the experimental space.
The six video cameras were connected to a synchronization device. To synchronize the kinematic data (from the video cameras) with the on-board balance, we used a remote control which sent a signal to the data-acquisition unit (on-board balance data) and at the same time switched on a diode placed in the cameras' field of vision. The time-code value corresponding to the lighting of the diode is the time origin of the kinematic data; the signal recorded by the data-acquisition unit is the time origin of the torque data. We matched the two origins (common event: the remote-control signal) and reduced the on-board balance frequency to 25 Hz to synchronize all the data.
We kept the data recorded while the subject skied through the experimental space: from the inner pole of the upper gate to the inner pole of the lower gate.
Data from the focal points on the subject (on-board balance) were input into the 3D Vision software and processed with the DLT algorithm.
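As an illustration of this synchronization step (our notation and function names, not the original acquisition software), the sketch below aligns the two time origins on the common remote-control event and resamples the 936 Hz balance channels onto the 25 Hz video timeline by linear interpolation, since 936 is not an integer multiple of 25.

    import numpy as np

    BALANCE_RATE = 936.0   # Hz, on-board balance sampling rate
    VIDEO_RATE = 25.0      # Hz, video frame rate

    def synchronize(balance_samples, trigger_index, diode_frame, n_frames):
        """Resample balance data onto the video timeline.

        balance_samples : (N, k) array of torque/force channels at 936 Hz
        trigger_index   : balance sample index of the remote-control signal
        diode_frame     : video frame index at which the diode lights up
        n_frames        : number of video frames to cover
        """
        # Time of each balance sample, with t = 0 at the common event
        t_balance = (np.arange(len(balance_samples)) - trigger_index) / BALANCE_RATE
        # Time of each video frame, with t = 0 at the diode frame
        t_video = (np.arange(n_frames) - diode_frame) / VIDEO_RATE
        # Linear interpolation of every channel onto the 25 Hz video instants
        resampled = np.column_stack([
            np.interp(t_video, t_balance, balance_samples[:, c])
            for c in range(balance_samples.shape[1])
        ])
        return t_video, resampled

Interpolation rather than simple decimation is used here only because the two rates are not commensurate; any equivalent resampling would serve the same purpose.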

Finally, for our last experiment, three male alpine skiers and one female alpine skier participated in the study. They skied the Giant Slalom and the Slalom.
Kinematic data were acquired in the same way, except for the video cameras: they were placed on scaffolding to avoid the snow spray produced by the ski turns, which otherwise hid the lower focal points. We used two Kistler on-board force plates to measure the ski-boot→ski and snow→ski torques. The force plates were inserted between the ski boot and the ski. The skiers had to carry the data-acquisition unit of the on-board balance. Data were sampled at 936 Hz.
To collect 3D kinematic data, the measurement area was first calibrated. The area was marked out with focal points made of fluorescent yellow and black squared tennis balls impaled on wooden sticks planted in the snow in the sight-line of the video cameras. Focal points were also put at the bottom of the inner pole of the upper gate and at the bottom of the inner pole of the lower gate.
Focal-point data from the slalom gates and from the subjects were input into the 3D Vision software and processed with the DLT algorithm.

We calculated both the position and the progression of the center of mass for each experiment. To determine the position of the center of mass, we had to calculate the mass m1, m2, m3, ..., mn of each segment of the skier and of each piece of ski equipment, then determine the position of the center of gravity M1, M2, M3, ..., Mn of each body segment and piece of ski equipment in a space defined by a reference frame. The position of the skier-ski system's center of gravity is calculated from the positions PM1, PM2, PM3, ..., PMn of the centers of gravity of the body segments and ski equipment.
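Written out, the computation described above is the usual mass-weighted mean of the segmental centers of gravity, in the notation of the text:

\[ P_G = \frac{\sum_{i=1}^{n} m_i \, P_{M_i}}{\sum_{i=1}^{n} m_i} \]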

3. COMPUTER SIMULATOR
3.1 Tridimensional modeling
We consider the skier-ski system as an articulated system divided into several sub-systems (figure 2). The first, on which the snow→ski torque is applied, is itself made up of the ski and lower-limb sub-systems. The second, articulated with the first at each coxal joint, is constituted by the trunk, the head and the upper limbs.

Figure 2. Skier-skis' system and sub-systems

With the new ID3D software (the latest version of the 3D Vision software), we can drive the progression of each segment of the jumping jack with the kinematics captured from the analysis of the skier's movements. We can give the jumping jack the anthropometric characteristics modeled by Hanavan [6] and apply the contextual torques that constrain it. We can then measure data.
With the computer simulator, we aim at two goals. The first is didactic: to show coaches, trainees or athletes (beginners or experts) some of the biomechanical causes which affect the skier-skis system and whose understanding is useful either for giving instructions or for acting. The second is technological: it consists in imposing technical instructions on the computer jumping jack, or in modifying some equipment characteristics, in order to measure their consequences. The goal is then to make assumptions about the evolution of techniques, about ski design, and about the relations between the evolution of the ski structure and the torques produced in a specific situation.
We assume that we can better improve ski development and training by separating the skier and the skis, in contrast with a study of the effects of ski and snow cover on the alpine skiing turn which did not study the skier and used a sledge in the skier's place [2].
For the kinematic study, the skier system is represented as a mechanical structure made of several rigid segments articulated with each other. Their movements are determined by the degrees of freedom allowed by skeletal anatomy, and the torques applied to the joints have a muscular origin.
The ski-boot→ski torque is made up of the three components of the support reaction (along the three axes) and of the three components of the resulting torque, which arises from the distance between the application point of the support reaction and the origin of the binding mounting mark (engraved on the topsheet by the ski manufacturer).

3.2 Computer representation
From the first model obtained with the 3D Vision software (figure 3), we determined the position of some segmental centers of mass using an anthropometric model [6] [7]. Then we calculated the global center of mass. These segments were chosen because they produce the eight fundamental techniques that the skier uses to control him/herself.
Figure 3. First modeling
The software works with a DLT ("Direct Linear Transformation") algorithm. It links the real coordinates (in a known reference frame) of the focal points lying in the common optical field of the video cameras with the screen coordinates of each focal point recorded by each camera. This calculation makes it possible to rebuild the position of the moving point in the real experimental space: the point moving on the screen then carries the 3D characteristics of the kinematics of the real moving point.
The 3D computer representation distinguishes each body segment, the coxal joint, the progression of the center of gravity, the trajectories of the feet, and each ski-boot→ski torque (represented by arrows, figure 4).
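To make this reconstruction concrete, the sketch below is an illustration only, not the 3D Vision implementation: assuming the 11 DLT parameters of each calibrated camera are known, the 3D position of a marker seen by at least two cameras is recovered by linear least squares [5]. Function and variable names are ours.

    import numpy as np

    def dlt_reconstruct(dlt_params, image_points):
        """Recover the 3D position of one marker from two or more calibrated cameras.

        dlt_params   : list of 11-parameter arrays (L1..L11), one per camera
        image_points : list of (u, v) screen coordinates of the marker, one per camera
        """
        A, b = [], []
        for L, (u, v) in zip(dlt_params, image_points):
            # Each camera contributes two linear equations in (X, Y, Z)
            A.append([L[0] - u * L[8], L[1] - u * L[9], L[2] - u * L[10]])
            A.append([L[4] - v * L[8], L[5] - v * L[9], L[6] - v * L[10]])
            b.append(u - L[3])
            b.append(v - L[7])
        # Least-squares solution of the stacked system
        xyz, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
        return xyz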

Figure 4. Jumping jack, computer simulator

The computer simulator makes it possible to manipulate the model in order to measure disturbances of the system: a change of trajectory, a joint command, or a transformation of the ski-boot→ski torque corresponding to a modification of a mechanical or geometric characteristic of a given ski. In this way we will be able to discover, in the short term, new ways to improve performance.

4. RESULTS
Figure 4 shows the segmental modeling with the resultant forces of the ski→ski-boot torques, as well as the direction of the global acceleration of the center of mass.
We calculated the variations of the joint angles to highlight the technique used by each subject to pilot him/herself, and made statistical comparisons to objectify our empirical model.
The next two graphs (figures 5 and 6) show, in 2D, the variations of lateral knee inclination with respect to the on-edge angle. The graphs show a dispersion in amplitude and timing, but also a similar shape, indicating that the action is performed by every racer. It is therefore a fundamental technique for varying the on-edge angle and for building our biomechanical and technological models.

Figure 5. Angle right shin tip up
5. EXAMPLE OF A BODY-TECHNIQUE
Eight techniques have been highlighted by a ski motion analysis [9]. Because of their relevance both to world-level coaches and to physical theory, they have been defined from these two points of view. For example, the body-technique described as "shin tip up" became, for coaches, "lateral knee inclination". The first definition follows the physicist's logic; the second translates it into a technique useful to coaches, because they can directly observe and describe it with the words of their professional field, while the underlying biomechanical conception is kept for the mathematical definition, using the same body frames.
According to the laws of mechanics, and because of the mechanical and geometric characteristics of skis, three conditions are required to produce a directional effect: sliding (the skier's engine is gravity), ski loading (a deforming effort on the ski), and an on-edge angle. To create and control this directional effect, skiers produce eight techniques (actually nine: one of them is not an action but a consequence, the lunge [9]). These eight techniques serve three aims, but during skiing they constantly interact and form a system. In the following, we focus on one technique: the lateral knee inclination.

5.1 Lateral knee inclination
This technique controls the variations of the on-edge ski angle by loading the external foot more and leaning the ski. The observation is done by looking at the skier from a front or rear view.
The technique is evaluated by the size of the angle formed between the axis running from the middle of the ski boot to the knee and the plane of the slope (figure 7). It is the only biomechanical way to increase the external on-edge ski angle without moving the center of gravity inwards. It creates a directional effect and determines the future upper ski. It is only possible when the external knee flexion is about 120° [10].
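As an illustration, the following sketch (our own; it assumes the unit normal of the local slope plane is available from the calibrated gate markers, which is not detailed in the original text) computes this angle from the reconstructed 3D positions of the boot middle and the knee.

    import numpy as np

    def knee_inclination_deg(boot_middle, knee, slope_normal):
        """Angle (degrees) between the boot-middle -> knee axis and the slope plane."""
        v = np.asarray(knee, float) - np.asarray(boot_middle, float)
        n = np.asarray(slope_normal, float)
        n = n / np.linalg.norm(n)
        # Angle between a line and a plane = 90 degrees minus the angle to the normal
        sin_angle = abs(np.dot(v, n)) / np.linalg.norm(v)
        return np.degrees(np.arcsin(np.clip(sin_angle, -1.0, 1.0)))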

Figure 6. Angle left shin tip up


In this way, contrary to the classical model, and comparing our results with the skeletal determinants of our species described by physiologists [8], we have shown that the best-performing skiers use a rotation of the lower-limb plane, formed by the axes middle-of-ski-boot to knee and knee to coxal joint, around the axis running from the middle of the ski boot to the coxal joint, through a femoral adduction on the pelvis that creates a lateral inclination of the outer knee. The aim is to vary the on-edge angle without moving the center of mass onto the inner foot, in order to create a radial component at the snow→ski contact that sets off the steering change the skier wants to make.

Figure 7. Lateral knee inclination

The variations of lateral knee inclination with respect to the on-edge angle are measured in the moving frame (o, x, y, z) (figure 8). The binding mounting mark is the origin (OS1) defined by the ski manufacturer, which becomes the middle of the ski boot for the coach. This is a negligible approximation (for the observer): the ski boot is held by the ski bindings, where the middle of the ski boot matches the binding mounting mark, separated only by the thickness of the ski – ski-bindings interface.

Figure 8. Variation knee inclination
5.2 Observation for coaches
We have transposed this biomechanical conception to the coaches' field by completing the technique's description with a cause that makes sense for giving an instruction: the lateral inclination of the outer knee is the only way to vary, within joint limits, the on-edge angle of the outer ski while always keeping a predominant load on the outer ski in order to create a directional effect. This technique becomes possible when the skier's knee flexion is around 120°, because this releases the second degree of freedom of the knee, which allows the leg to be fixed [8] [10].
We propose below two synchronized pictures (figures 9 and 10) illustrating the words invented to pass on our evaluations and our technological conceptions, comparing the technique used by the winner with the one used by our athlete.
The observation of this technique makes sense only if the coach is able to connect the techniques and think about their mechanical consequences (modeling). The amplitude, the rhythm and the timing determine the trajectory. The only goal is performance.

Figure 9. Picture of the winner

Figure 10. Picture of our athlete

We can notice that the weight, which can be measured by scales, changes if the skier performs flexions, extensions and/or segmental movements. These load variations are due to the skier's weight, to the speed of the altitude variations of the center of gravity, or to the accelerations of the moving body segments. They deform the skis and change the trajectory radius according to the demands of the race layout (interactions between loading, ski edge angle and ski characteristics). The on-board balance inserted between the ski boot and the ski to measure the ski→ski-boot torque highlights the efforts applied to the articulated system that is the skier's body. These data are necessary for the dynamical modeling of the skier system and of the ski system.
In addition, the forward-backward hip inclination, with the knee bent at about 120° and combined with the forward-backward trunk inclination, aims to keep the pressure on the supports (at the ankle) whatever the friction force between the snow and the skis.

6. DISCUSSION
The modeling of the system, based on this experimental method, now seems possible. This study went further than the results obtained with 3D motion analysis alone [3], because it has highlighted some factors for improving performance. Nevertheless, the investigation must be pursued into the interactions between the mechanical and geometric characteristics of the skis, the ski→ski-boot torque and the torque applied to the ski by the snow cover. Snow-cover properties change the on-edge angle during the steering phase and the loading of the ski [2].
Let us recall that each technique is defined according to articular or material landmarks taken from the body or the equipment. It corresponds to a technique considered relevant for performance with current skis, and to a goal of varying the ski-snow efforts or the aerodynamic characteristics.
This technical model of the skier is a tool for the coach and the skier to improve training. The body-technique described is referred to the middle of the ski boot of the same side, because the articulated skier-skis system is mostly guided by the efforts at the ski-snow contact. Mostly, because the aerodynamic force, which depends on the speed and the skier's shape, also affects the guiding of the system, although less strongly than the ski-snow efforts. It has also been shown that the sagittal balance (which we call forward-backward inclination) is an important factor for performance [1].
It is still hard to predict the loads applied to the skis by the skier from electromyographic studies, because estimations from EMG are barely reliable [11].
With the computer simulator, it is possible to apply to the ski the measured torques, both the ski→ski-boot torque and the torque applied to the ski by the snow cover. It is also possible to apply the measured on-edge angle. In this way, the static load distribution is known. The computer simulator also knows the dynamic torques of the skis and bindings (on-board balance measurements). We can then link the loads applied to the ski by the skier and by the snow to the ski materials and the ski structure. The simulator manipulates torques and skier-skis system characteristics; we can therefore find out which ski structure or material improves the skier's performance.
Without replacing the 3D motion analysis of skiers in their real contexts and the measurement of the external constraints applied to them, the computer simulator will make it possible to impose univocal constraints on the skier-ski system, reducing experimental uncertainties, and to test assumptions easily. In this way, together with coaches, ski manufacturers and researchers, studies will be conducted to design and evaluate experimental protocols to improve ski development.
Let us add that the computer simulator is not only intended for alpine skiing, but also for board and wheeled sports. We can manipulate the movement of each body segment (on the jumping jack) or the characteristics of the ski equipment to detect and even simulate performance variations.

7. ACKNOWLEDGMENTS
Our thanks go first to the memory of Alain Durey; to Rossignol Ltd. for providing us with devices; and to the French Skiing Federation for allowing us to work with its athletes.

8. REFERENCES
[1] Müller, E., Schwameder, H. 2003. Biomechanical Aspects of New Techniques in Alpine Skiing and Ski-jumping. Journal of Sports Sciences, 21, 679-692.
[2] Nachbauer, W., Kaps, P., Heinrich, D., Mössner, M., Schindelwig, K., Schretter, H. 2006. Effects of Ski and Snow Properties on the Turning of Alpine Skis – A Computer Simulation. Journal of Biomechanics, 39, Suppl. 1, 6900.
[3] Brodie, M., Walmsley, A., Page, W. 2008. Fusion Motion Capture: A Prototype System Using Inertial Measurement Units and GPS for the Biomechanical Analysis of Ski Racing. Journal of Sports Technology, 1, 17-28.
[4] Maesani, M., Dietrich, G., Hoffmann, G., Laffont, I., Hanneton, S., Roby-Brami, A. 2006. Inverse Dynamics for 3D Upper Limb Movements – A Critical Evaluation from Electromagnetic 6D Data Obtained in Quadriplegic Patients. Ninth Symposium on 3D Analysis of Human Movement, Valenciennes.
[5] Abdel-Aziz, Y.I., Karara, H.M. 1971. Direct Linear Transformation from Comparator Coordinates into Object Space Coordinates in Close-Range Photogrammetry.
[6] Hanavan, E.P. 1964. A Mathematical Model of the Human Body. AMRL-TR-64-102, AD-608-463. Aerospace Medical Research Laboratories, Wright-Patterson Air Force Base, Ohio.
[7] Miller, D.I., Morrison, W. 1975. Prediction of Segmental Parameters Using the Hanavan Human Body Model. Med. Sci. Sports 7, 207-212.
[8] Kapandji, I.A. 1982. Physiologie Articulaire. Maloine S.A. éditeur, Paris.
[9] Roux, F. 2000. Actualisation des Savoirs technologiques pour la Formation des Entraîneurs de Ski Alpin de Compétition. Doctoral Thesis, University of Paris Orsay XI.
[10] Cotelli, C. 2008. Sci Moderno. Mulatero Editore.
[11] Buchanan, T.S., Lloyd, D.G., Manal, K., Besier, T.F. 2005. Estimation of Muscle Forces and Joint Moments Using a Forward-Inverse Dynamics Model. Official Journal of the American College of Sports Medicine, 1911-1916.
T.A.C: Augmented Reality System for Collaborative Tele-
Assistance in the Field of Maintenance through Internet.
Sébastien Bottecchia, ESTIA RECHERCHE - IRIT, Technopôle Izarbel, 64210 Bidart (France), (+33)5 59 43 85 11, s.bottecchia@estia.fr
Jean-Marc Cieutat, ESTIA RECHERCHE, Technopôle Izarbel, 64210 Bidart (France), (+33)5 59 43 84 75, j.cieutat@estia.fr
Jean-Pierre Jessel, IRIT, 118, Route de Narbonne, 31000 Toulouse (France), (+33)5 61 55 63 11, jessel@irit.fr

ABSTRACT
In this paper we shall present the T.A.C. (Télé-Assistance-Collaborative) system, whose aim is to combine remote collaboration and industrial maintenance. T.A.C. enables the copresence of parties within the framework of a supervised maintenance task to be remotely "simulated" thanks to augmented reality (AR) and audio-video communication. To support such cooperation, we propose a simple way of interacting through our P.O.A. paradigm and AR goggles specially developed for the occasion. The handling of 3D items to reproduce gestures and an additional knowledge management tool (e-portfolio, feedback, etc.) also enable this solution to satisfy the new needs of industry.

Categories and Subject Descriptors
H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces – Synchronous interaction, Computer-supported cooperative work, Web-based interaction.
K.4.3 [Management of Computing and Information Systems]: System Management – Quality assurance.

General Terms
Performance, Reliability, Experimentation, Human Factors.

Keywords
Augmented Reality – TeleAssistance – Collaboration – Computer Vision – Cognitive Psychology.

1. INTRODUCTION
Over the last few years the world of industry has held great expectations with regard to integrating new technological assistance tools using augmented reality. This need reflects the difficulties encountered by maintenance technicians, currently faced with a wide variety of increasingly complex mechanical/electronic systems and the increasingly rapid renewal of product ranges.
The compression of training periods and the multiplication of maintenance procedures favor the appearance of new constraints linked to the activity of operators, eg. a lack of "visibility" in the system to be maintained and uncertainty about the operations to be carried out. These constraints often mean that mechanics have to be trained "on the job", which can in the long term involve a greater number of procedural errors and therefore increase maintenance costs as well as lead to a considerable loss of time.
In this highly competitive, globalised context, the demand of industrialists to increase the performance of technical support and maintenance tasks requires the integration of new communication technologies. When an operator working alone needs help, it is not necessarily easy to find the right person with the required level of skill and knowledge. Thanks to the explosion of bandwidth and the World Wide Web, real-time teleassistance is becoming accessible. This collaboration between an expert and an operator is beneficial in many ways, for example with regard to quality control and feedback, although a system enabling remote interactions to be supported is needed. With AR, we can now envisage a remote collaboration system enabling an expert to be virtually copresent with the operator. By allowing the experts to see what the operators see, they are able to interact with operators in real time using an adequate interaction paradigm.

2. A.R. FOR MAINTENANCE & TELE-ASSISTANCE
We shall first take a brief look at existing systems and see that there are two major, quite separate types. We shall then study the basic aspects which led us to build our solution.

2.1 Current systems
Amongst the AR systems aimed at assisting maintenance tasks, the KARMA prototype [8] is certainly the best known, because it was at the origin of the concept as far back as 1993. The aim of this tool was to guide operators when carrying out maintenance tasks on laser printers. Other systems followed, like those of the Fraunhofer Institute [20] and Boeing [18] in 1998. The purpose of the first was to teach workers specific gestures in order to correctly insert car door bolts; the second was aimed at assisting the assembly of electric wiring in planes. Following these systems, industry became increasingly interested in using such AR devices in its fields of activity. We then saw
the creation of more ambitious projects like ARVIKA [1] whose operator's display device. Here the expert is able to enrich real
purpose was to introduce AR in the life cycle of industrial images to ensure the operator fully understands the action to be
product, Starmate [22] to assist an operator during maintenance carried out.
tasks on complex mechanical systems, and more recently ARMA
[7] which aims to implement an AR mobile system in an 2.2 Motivation/Issue
industrial setting. Even more recently, Platonov [19] presented In the paragraph above we saw that existing systems are either
what can be described as a full functional AR system aimed at very maintenance-oriented with a single operator with a device or
repairs in the car industry. This system stands out from others collaboration-oriented which do not necessarily enable direct
because it proposes an efficient technique enabling visual markers assistance to be provided for the task in hand.
to be avoided. Our work is therefore based on the possibility of remote
The vocation of all of these systems is to support operators in the collaboration enabling both efficient and natural interaction as in
accomplishment of their tasks by providing contextualized (visual a situation of copresence, whilst taking advantage of the
or sound) information in real time. Both of these conditions possibilities offered by AR in the field of maintenance. Although
should reduce the risks of running errors according to Neumann's in [14] Kraut shows us that a task can be carried out more
work [18]. efficiently when the expert is physically present, his study also
shows that remote assistance provides better results than working
Another common point is the importance placed on transparency
alone, as confirmed by Siegel and Kraut in [23]. Other studies like
in interaction with the machine. This is effectively a key point of
[15] even show that a task can be accomplished more quickly and
AR in this field. Users must be able to pay their attention to the
with less error when assisted rather than alone with a manual.
task in hand and not have to concentrate on how to use the tool
However, communication mechanisms and the context play an
itself, hence the different strategies of each project in creating
important role when both the operator and expert share the aim:
prototypes. Also, the choice of the display device is important
because the objective may be to reduce the need for resorting to They share the same visual space. In remote
classical supports (paper), thus leaving operator's hands free [24]. collaboration, the expert does not necessarily have a
However, certain contradictory studies [25][10] are not spatial relation with objects [14] and must therefore be
conclusive with regard to the efficiency of AR compared to paper able to have a peripheral visual space so as to better
supports. apprehend the situation. This will directly affect
Finally, all these systems are particularly pertinent when tasks are coordination with the operator's actions and enable the
governed by rules which allocate specific actions to specific expert to permanently know the status of work [9]. The
situations, ie. within the framework of standard operational lack of peripheral vision in remote collaboration
procedures. In this case we talk about explicit knowledge, therefore reduces the efficiency of communication when
although accessing this knowledge is not necessarily sufficient to accomplishing a task [11].
know how to use it, which is known as tacit (or implicit) They have the possibility of using ostensive references,
knowledge. This belongs to the field of experience, aptitude and ie. deixis ("That one!", "There!") associated with
know-how. This type of knowledge is personal and difficult to gestures to name an object. Much research as in [14]
represent. and [4] suggests the importance of naming an object in
Thus, current AR systems for maintenance are of little use when collaborative work or not. This type of interaction is
an unforeseen situation occurs in which case it is sometimes directly related to the notion of shared visual space
necessary to resort to a remote person who has the required level referred to above.
of qualification. These characteristics provided by a collaborative relationship of
copresence are symmetrical [2], ie. those involved have the same
It is only very recently that systems which support remote
possibilities. On the contrary, remote collaboration systems
collaborative work for industrial maintenance have begun to
introduce asymmetries in communication. Billinghurst [3]
appear. However, greater importance is given to the collaborative
highlights three main asymmetries which can hinder
aspect than to maintenance. In [26] Zhong presents a prototype
collaboration:
which enables operators, equipped with an AR display device to
be able to "share" their view with a remote expert. The operator implementation asymmetry: the physical properties of
can handle virtual objects in order to be trained in a task which is the material are not identical (eg. different resolutions in
supervised by an expert. However, the expert can only provide display modes)
audio indications to guide the operator. Concerning [21], Sakata functional asymmetry: an imbalance in the functions
says that the expert should be able to remotely interact in the (eg. one using video, the other not)
operator's physical space. This operator has a camera fitted with a
laser pointer, and the entire system is motorized and remotely social asymmetry: the ability of people to communicate
teleguided by the expert who can therefore see the operator's work is different (eg. only one person sees the face of the
space and point to objects of interest using the laser. The other)
interaction here is therefore limited to being able to name objects Remote collaboration between an operator and an expert must be
(in addition to audio capabilities). There are other systems like [6] considered from the point of view of the role of each party,
which enable the expert to give visual indications to an operator therefore necessarily introducing asymmetries, eg. due to the fact
with an AR display device fitted with a camera. What the camera that the operator does not need to see what the expert sees.
sees is sent to the expert who can "capture" an image from the However, Legardeur [16] shows that the collaboration process is
video flow, add notes, then send back the enriched image to the unforeseeable and undetermined, which means that experts may
have at their disposal possibilities for interaction close to those of (Monocular Orthoscopic Video See-Through) satisfies the criteria
operators as well as those which are available in real life, ie. the of our application. The first of these criteria was that the operator
ability to name and mime actions. Finally, the underlying element must be able to easily apprehend the environment, without being
with regard to collaboration in the field of tele-assistance is the immersed and keep as natural a field of vision as possible, ie.
notion of synchronism: collaboration may be synchronous or having the impression of seeing what can be seen with the naked
asynchronous. This shows the need for a real time interaction eye (eg. orthoscopic).
method between parties.

3. THE T.A.C. SYSTEM


3.1 Principle
To propose a solution combining remote collaboration and
maintenance thanks to augmented reality, we have chosen two
basic aspects:
The mode of interaction between parties: This is the
way expert can "simulate" their presence with operators
The shared visual space: This is about being able to
show the expert the operator's environment AND the
way in which the operator is able to visualize the
expert's information
Through these aspects we also suggest that our system is able to
support synchronous collaboration between parties.
To implement this, we propose the following principle of use
Figure 2. Simulation of the operator's field of vision carrying
(figure 1): the operator is equipped with a specific AR display
our MOVST. At the top a classic display (inside the red
device. Its design enables it to capture a video flow of what the
rectangle). At the bottom an orthoscopic display.
carrier's eye exactly sees (flow A) and a wide angle video flow
(flow B). Amongst the two video flows which the expert will
receive, there is the possibility of incrementing flow A thanks to
our interaction paradigm (cf. paragraph 3.3). The incrementations
are then sent in real time to the operator's AR display.

Figure 3. Prototype of our AR goggles known as MOVST.

Figure 1. How the T.A.C. system works. The operator's view is


sent to the expert who enhanced it in real time by simply
clicking on it.
Hereafter we shall examine in greater detail our interaction
paradigm and the visualization system supported by it as well as
other functionalities.
3.2 Perceiving the environment
For each AR system developed, its type of display should be
specifically chosen. Within the framework of maintenance, we
must therefore take into account the constraints imposed by the
operator's work. The many different aspects of using an AR
system in working conditions linked to a manual activity poses
certain problems. Furthermore, we must take into account how the
situation is seen by the expert who must effectively apprehend the
operator's environment as if he or she were there in person. In [5]
Figure 4. Expert interface. The orthoscopic view (inside the
we presented our visualization system carried by the operator and
red rectangle) is placed in the panoramic view.
which is responsible for providing an exact vision of part of what
is seen to the expert. This specific HMD, known as MOVST
In order not to overload the operator's visual field with virtual
elements, the choice of a monocular system has the advantage of
only being able to be partly augmented. Finally, the "Video See-
Through (VST)" principle was chosen for two reasons. Firstly,
because it has an orthoscopic system, with a VST it is easier to
implement the carrier's point of view. Secondly, it is possible to
switch between orthoscopic display and classic display (figure 2).
The advantages of the classic display lie in the fact that it can be
used like any screen. It is therefore possible to present videos,
technical plans, etc.
This so-called classic information is essential because it
characterizes the "visibility" of the overall system subject to
maintenance. Mayes in [17] distinguishes, amongst other things, Figure 5. Operator's augmented view after a "Picking"
the importance for the user of conceptualizing the task thanks to operation. Here we clearly see the advantage of being able to
this type of information. However, the previous model of our discriminate an important element by showing it rather than
MOVST only enabled the expert to see the "augmentable" part of describing it.
the operator's field of vision, ie. approximately 30˚. In order to The second mode, known as "Outlining", uses the idea of
take into account the lack of peripheral vision as mentioned in sketching the elements of a scene using the hands to highlight
2.2, adding a second wide angle camera on the MOVST enables them. These gestures support the verbal description. The
this problem to be solved (figure 3). principles of AR mean that we have the possibility of
With regard to the expert's interface (figure 4), this gives a retranscribing this visually for the operator. Elements in the scene
panoramic video of the scene in which the orthoscopic video is which require the operator's can be highlighted by drawing the
incrusted (PiP or Picture in Picture principle). contours or the silhouette of these objects (figure 6).

3.3 The P.O.A. interaction paradigm


In [5] we presented a new interaction paradigm based on the
ability of a person to assist another in a task. Generally, when
physically present together, the expert shows how to carry out the
task before the operator in turn attempts to do so (learning
through experience). To do this, the expert does not only provide
this information orally as can be found in manuals, but uses more
naturally ostensive references (since the expert and the operator
are familiar with the context). Our P.O.A. (Picking Outlining
Adding) paradigm is inspired by this and is based on three points:
"Picking": the simplest way to name an object
"Outlining": the way to maintain attention on the object
of the discussion whilst being able to provide adequate
Figure 6. Operator's augmented view after "Outlining". The
information about it
expert has selected the elements of interest and has given the
"Adding": or how to illustrate actions usually expressed temperature of an object.
using gestures With regard to the expert, this is done by clicking on the
In order to implement these principles, we propose simply interesting parts whose 3D modeling is known by the system. We
clicking on the video flow received from the operator. also have the possibility of adding characteristic notes (eg.
The first mode, "Picking", therefore enables an element belonging temperature of a room, drill diameter).
to a work scene to be quickly named. This is equivalent to The final mode, known as "Adding", replaces the miming of an
physically pointing to an object. The visual representation can be action using adequate 3D animations. The expert has a catalogue
modelised in different ways like simple icons (circles, arrows, of animations directly related to the system subject to
etc). Thus, the expert, by simply clicking on the mouse on an maintenance. According to the state of progress of the task and the
element of interest in the video, enables the operator to see the need, the expert can select the desired animation and point to the
associated augmentation (figure 5). This provides experts with an element to which it refers. Eg. (figure 7) the virtual element is
efficient way of remotely simulating their physical presence in a placed directly where it should be.
more usual way and saying: "take this object and ...".
The expert is someone who has received training in how to carry out maintenance on a helicopter turboshaft engine. The first example is not a real problem, since it simply involves assembling an electrically controlled engine in an order pre-defined by the expert (A, B, C and D in figure 8). This simple example was initially chosen because the 3D modeling and the associated animations were easy to create. Our current implementation is based on the ARToolKit [13] and OpenCV [12] libraries for 3D recognition. To establish the connection between the two computers (voice and video session), we used the SIP signaling protocol as implemented in the SophiaSIP library. Data transfer is handled by the SDP and RTP protocols of the Live555 C++ library.
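As an illustration of the kind of real-time control channel that carries the P.O.A. augmentations alongside the audio-video session, the sketch below is ours and deliberately generic: it does not use the SophiaSIP or Live555 APIs, and the message fields, host and port are hypothetical. It only shows how an event triggered by the expert's click could be serialized and pushed to the operator's display.

    import json
    import socket

    def make_poa_event(mode, part_id, note=None, animation_id=None):
        """Build a Picking / Outlining / Adding event as a JSON-serializable dict."""
        assert mode in ("picking", "outlining", "adding")
        return {
            "mode": mode,                  # which P.O.A. operation the expert triggered
            "part_id": part_id,            # identifier of the 3D-modeled part that was clicked
            "note": note,                  # optional text note (eg. a temperature, a drill diameter)
            "animation_id": animation_id,  # only used by the "adding" mode
        }

    def send_event(sock, event):
        """Send one event, newline-delimited, over an already connected TCP socket."""
        sock.sendall((json.dumps(event) + "\n").encode("utf-8"))

    # Example: the expert clicks on a stator in the video; the operator's AR display
    # highlights the corresponding real part.
    # sock = socket.create_connection(("operator-host", 5600))  # hypothetical endpoint
    # send_event(sock, make_poa_event("picking", part_id="stator_A"))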
Figure 7. Operator's augmented view after "Adding". The expert shows the final assembly using a 3D virtual animation placed on the real element.

The second example concerns measuring the wear of the blades of a helicopter turboshaft engine (E, F and G in figure 9). This requires the use of a specific instrument which needs to be inserted in a precise location. The checking of the measurements is supervised by the expert (this operation can prove delicate for beginners).

3.4 Other functionalities
From the point of view of interaction by the system to support
collaboration, P.O.A. interaction may be completed by the
expert's ability to handle virtual elements. "Adding" enables
4.2 Discussion
actions expressed using gestures via animations to be illustrated, During experiments, it became clear that our system provided
but this is only meaningful within the framework of a formal and easier and more natural interaction than other systems which
therefore modelised process. This is not the case in unforeseen provide traditional audio and video communication. The
situations. For these, we are currently taking advantage of the possibility for synchronous interactions by the expert vis-à-vis the
formidable development of miniaturized inertial units. This works operator stimulate exchanges and offer a strong feeling of being
by handling this interactor associated with a 3D virtual element in physically present which in the end leads to greater efficiency.
the expert interface. The unit's position and orientation is This is due to the ability to act in unforeseen situations thanks to
retranscribed on the 3D element. The operator sees the virtual part "Picking" and "Outlining" and well determined processes thanks
handled just like if the expert had done so using the real part to "Adding". Technical feasibility is extremely important with the
whilst using a tangible interface. However there is the problem of increasing calculation capacities of laptops and the explosion of
the expert not being able to handle at the same time both 3D the bandwidth of communication networks. However, in
interactors and the keyboard to provide important information. To experimental conditions the expert preferred it when the video
support the transfer of implicit knowledge between the expert and offered a resolution of at least 640x480, which was not always
operator, it is more efficient to add a "speech to text" type man- possible because of our network's limited bandwidth. Most often,
machine interaction mode. we were according the time of day forced to use a resolution of
320x240, enabling us to highlight this problem. It is therefore
The T.A.C. system, with its simulation of copresence, enables us necessary to currently look at an exclusive communication
to support a tool in full development in the world of work: the e- solution between the expert and the operator. It also became clear
portfolio. This tool aims to manage a career path and validate that the expert would himself have liked to control the virtual
acquisitions. In sum, this is a database enabling a person's skills to objects supported by "Adding" instead of simple animations. We
be capitalized. Thus, the T.A.C. system can be seen as a are currently working on this taking inspiration from interaction
monitored system providing the possibility of recording images modes and virtual reality. Finally, the operator expressed the wish
from different operations carried out with a view to an e- to be able to control switching from classic to orthoscopic
qualification. Work and qualifications can therefore be more displays in the MOVST and more generally have greater
easily combined. possibilities for controlling the display system.
Regarding the expert, recording images from different operations
is first and foremost a quality control system. Since maintenance 5. CONCLUSION
tasks in industry are highly formalized (set of basic operations), In this paper we have presented a system enabling two remote
their supervision in the event of problems thanks to the synoptic parties to be able to collaborate in real time in order to
view of operations carried out enables the cause to be analyzed. successfully carry out a mechanical maintenance task. This system
Its feedback can also be capitalized on to be used when designing is based on our P.O.A. interaction paradigm enabling the expert's
future products and new maintenance procedures. presence to be simulated with an operator in a situation of
assistance. This prototype was tested on simple cases, but which
4. INITIAL RESULTS were representative of certain real maintenance tasks and it
4.1 Preliminary tests showed that it was able to support both defined and undefined
We tested the T.A.C. system using two examples to verify their interaction processes. However, we must provide the means for
use within the framework of remote assistance. Operators do not greater interaction between parties and carry out a more in-depth
have specific knowledge in the field of mechanical maintenance. study of the real benefits of such a system.
Figure 9. Other examples of assistance.
E: "Undo this cap so you can then turn the shaft"
F: "Place the instrument in hole no. 1, that one there"
G: "Look over here, the small needle says 2 tenths, that's ok"
Figure 8. Example of collaboration
A: "Take this stator and put it on the red support"
B: "That's how the rotor and the case are put together" 6. ACKNOWLEDGMENTS
C: "Turn the carter in this direction until you hear it click" We would like to thank LCI, which specializes in turboshaft
engine maintenance, Christophe Merlo for his help on the
D: "Put the screws here and here with this torque" knowledge of collaborative processes and Olivier Zéphir for his
various contributions with regard to cognitive psychology.
7. REFERENCES
[1] Arvika. Augmented reality for development, production, servicing. http://www.arvika.de, URL.
[2] Bauer, M., Heiber, T., Kortuem, G. and Segall, Z. 1998. A collaborative wearable system with remote sensing. ISWC '98: Proceedings of the 2nd IEEE International Symposium on Wearable Computers, page 10.
[3] Billinghurst, M., Kato, H., Bee, S. and Bowskill, J. 1999. Asymmetries in collaborative wearable interfaces. ISWC '99, pages 133–140.
[4] Bolt, R. 1980. 'Put-that-there': Voice and gesture at the graphics interface. SIGGRAPH '80: Proceedings of the 7th annual conference on Computer graphics and interactive techniques, pages 262–270.
[5] Bottecchia, S., Cieutat, J., and Merlo, C. 2008. A new AR interaction paradigm for collaborative TeleAssistance system: The P.O.A. International Journal on Interactive Design and Manufacturing, N°2.
[6] Couedelo, P. Camka system. http://www.camka.com, URL.
[7] Didier, J. and Roussel, D. 2005. Amra: Augmented reality assistance in train maintenance tasks. Workshop on Industrial Augmented Reality (ISMAR'05).
[8] Feiner, S., Macintyre, B. and Seligmann, D. 1993. Knowledge-based augmented reality. Commun. ACM, 36(7):53–62.
[9] Fussell, S., Setlock, L.D. and Kraut, R. 2003. Effects of head-mounted and scene-oriented video systems on remote collaboration on physical tasks. CHI '03: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 513–520.
[10] Haniff, D. and Baber, C. 2003. User evaluation of augmented reality systems. IV '03: Proceedings of the Seventh International Conference on Information Visualization, page 505.
[11] Heath, C. and Luff, P. 1991. Disembodied conduct: Communication through video in a multi-media office environment. CHI '91: Human Factors in Computing Systems Conference, pages 99–103.
[12] INTEL. OpenCV. http://sourceforge.net/projects/opencv/
[13] Kato, H. and Billinghurst, M. ARToolKit. http://www.hitl.washington.edu/artoolkit/, URL.
[14] Kraut, P., Fussell, S. and Siegel, J. 2003. Visual information as a conversational resource in collaborative physical tasks. Human-Computer Interaction, 18:13–49.
[15] Kraut, R., Millerand, M. and Siegel, J. 1996. Collaboration in performance of physical tasks: effects on outcomes and communication. CSCW '96: Proceedings of the 1996 ACM conference on Computer supported cooperative work, pages 57–66.
[16] Legardeur, J. and Merlo, C. 2004. Empirical Studies in Engineering Design and Health Institutions, chapter Methods and Tools for Co-operative and Integrated Design, pages 385–396. Kluwer Academic Publishers.
[17] Mayes, J. and Fowler, C. 1999. Learning technology and usability: a framework for understanding courseware. Interacting with Computers, 11:485–497.
[18] Neumann, U. and Majoros, A. 1998. Cognitive, performance, and systems issues for augmented reality applications in manufacturing and maintenance. VRAIS '98: Proceedings of the Virtual Reality Annual International Symposium, page 4.
[19] Platonov, J., Heibel, H., Meyer, P. and Grollmann, B. 2006. A mobile markerless AR system for maintenance and repair. Mixed and Augmented Reality (ISMAR'06), pages 105–108.
[20] Reiners, D., Stricker, D., Klinker, G. and Muller, S. 1999. Augmented reality for construction tasks: doorlock assembly. IWAR '98: Proceedings of the international workshop on Augmented reality: placing artificial objects in real scenes, pages 31–46.
[21] Sakata, N., Kurata, T., Kato, T., Kourogi, M. and Kuzuoka, H. 2006. Visual assist with a laser pointer and wearable display for remote collaboration. CollabTech06, pages 66–71.
[22] Schwald, B. 2001. Starmate: Using augmented reality technology for computer guided maintenance of complex mechanical elements. eBusiness and eWork Conference (e2001), Venice.
[23] Siegel, J., Kraut, R., John, B.E. and Carley, K.M. 1995. An empirical study of collaborative wearable computer systems. CHI '95: Conference companion on Human factors in computing systems, pages 312–313.
[24] Ward, K. and Novick, D. 2003. Hands-free documentation. SIGDOC '03: Proceedings of the 21st annual international conference on Documentation, pages 147–154.
[25] Wiedenmaier, S. and Oehme, O. 2003. Augmented reality for assembly processes: design and experimental evaluation. International Journal of Human-Computer Interaction, 16:497–514.
[26] Zhong, X. and Boulanger, P. 2002. Collaborative augmented reality: A prototype for industrial training. 21st Biennial Symposium on Communication, Canada.
Designing and Evaluating Advanced Interactive
Experiences to increase Visitor’s Stimulation in a Museum
Bénédicte Schmitt (1), (2), Cedric Bach (1), (3), Emmanuel Dubois (1), Francis Duranthon (4)
benedicte.schmitt@irit.fr, cedric.bach@irit.fr, emmanuel.dubois@irit.fr, francis.duranthon@cict.fr

(1) University of Toulouse, IRIT, 118 Route de Narbonne, 31062 Toulouse Cedex 4, France
(2) Global Vision Systems, 10 Avenue de l'Europe, 31520 Ramonville Saint Agne, France
(3) Metapages, 12 rue de Nazareth, 31000 Toulouse, France
(4) LECP/Muséum d'Histoire Naturelle de Toulouse, 35 Allées Jules Guesde, 31000 Toulouse, France

ABSTRACT play and adding new interests on the exhibition. More advanced
forms of Interactive Systems, called Mixed Interactive Systems
In this paper, we describe the design and a pilot study of two (MIS) [6] have also been developed to serve this goal. Mixed
Mixed Interactive Systems (MIS), interactive systems combining Interactive Systems combine digital and physical artifacts.
digital and physical artifacts. These MIS aim at stimulating Examples of MIS include Augmented Reality (AR), Mixed
visitors of a Museum of Natural History about a complex Reality (MR) and tangible user interfaces (TUI). The interest of
phenomenon. This phenomenon is the pond eutrophication that is such advanced interactive experiences is that rather than
a breakdown of a dynamical equilibrium caused by human manipulating technological devices, visitors handle physical
activities: this breakdown results in a pond unfit for life. This objects related to the exhibits, such as wooden blocks to create
paper discusses the differences between these two MIS programs to control a robot on display [9], or physical objects to
prototypes, the design process that lead to their implementation trigger different phenomena on the environment [19], and tightly
and the dimensions used to evaluate these prototypes: user coupled to the display and animation of digital artifacts carrying a
experience (UX), usability of the MIS and the users’ predefined knowledge. Users can explore the advanced interactive
understanding of the eutrophication phenomenon. experience created by the system and discover its content. Limits
of such approaches mainly lie in the fact that they do not propose
Categories and Subject Descriptors: H.5.2. [User clear challenges: proposed tasks are open-ended and users can
Interface]: Prototyping| Evaluation/Methodology| User-centered terminate them whenever they want. However involving physical
design| Theory and methods|. artifacts in an interactive experience is strongly in line with the
most recent trends of museology: indeed Wagensberg [22]
General Terms develops an approach for modern museum in which it is
Design, Experimentation, Human Factors. recommended to maintain real objects or phenomena at the center
of exhibits. Among the above systems, only advanced interactive
Keywords experiences prompt visitors to manipulate real objects or real
Mixed Interactive Systems, Advanced Interactive Experience, co- phenomena to stimulate visitors.
design, museology, eutrophication
But who are museum visitors? Most of the research about learning
1. INTRODUCTION in museums is dedicated to children. However, Wagensberg [22]
During the past years, increasing cultural interactive experiences points out the universality of museum audience. In addition, a
were produced in particular in museology. A major goal of this recent study of Hornecker [10] shows a high interest in using
trend is to increase the involvement of visitors during a visit, in Tangible User Interfaces (TUI) in museums: they are universal
order to make them actors of their own museum experience. and can engage a range of visitor profiles. Investigating the use of
Different attempts have been introduced in Museum. Guides [23] interactive systems for adults therefore appears as a required
are used to provide additional information about the exhibit complement to existing studies related to children (Figure 1). TUI
objects by use of numeric comments. Games [23], [25] can are thus good candidates for providing a fun experience while
propose challenges to visitors, encouraging them to learn through enhancing the process of teaching complex natural phenomenon
to adult visitors.

Nevertheless, introducing such technology also raises the problem of evaluation. Indeed, in such a context, usability evaluation methods (UEM) are still required to study the usability of the application, i.e. how efficient, effective and easy to learn it is [11]. In addition, evaluating the user experience (UX) is equally important because visiting a museum is a leisure activity rather than a working task. Evaluating such advanced interactive experiences
therefore required studying how visitors feel about their objects have no means to express how to act on them. Therefore it
experience [12]. is required to base their use on affordable actions, behavior and
representations. Involving multiple physical objects is another
In this paper, we focus on the introduction and evaluation of a limitation because the user will have to grab and release several
Mixed Interactive Systems in a Museum of Natural History. These objects and find a place where to put them. In addition,
MIS are used to illustrate and teach a complex phenomenon: a technological limitations exist such as the detection and
pond eutrophication. localization of the required physical artifacts.
However MIS also have the advantage of prompting people, as
the technology is embedded into the physical environment [7].
Users can experience a new concept with less reluctance or
without feeling any constraints. They can manipulate objects
without barriers that separate digital and physical worlds.
Another advantage of MIS is their affordance, as objects provide
some common representations. Furthermore they provide some
opportunities not present in desktop interfaces [15]. Users can
explore the physical objects and evaluate which actions to
perform. This can also enable several groups of users, e.g. novices
and children, to use MIS more intuitively. In other words MIS are
potentially more universal than desktop interfaces [16].
Universality and accessibility of MIS are two major advantages
which can be used to help users familiarize with complex or
abstract concepts. We hypothesize that museums could benefit
from these advantages to engage visitors about their exhibits.
The impacts of MIS on users can be an interest topic to measure,
as they engage users. The next section presents the reasons that
Figure 1. “Mixed Interactive Systems for Eutrophication with
encourage us to use both UEM and UX evaluation methods.
Palette” (MISEP): an example of a TUI for Museum
We first motivate the use of MIS in this context and briefly review 2.2 Usability and UX evaluations
the goals of usability and UX evaluations. We then present the These past years, a new concept has emerged: user experience. As
basis of the pond eutrophication and the principles of the co- it is recent, user experience (UX) has no consensual definition yet.
design process we applied to design and implement two different In particular, Bevan [3] studies the difference between usability
MIS. These MIS, namely MISE and MISEP, are introduced, and UX evaluation methods and first highlights different possible
illustrated and finally compared in order to assess their adequacy interpretations of UX in the literature: the goal of UX can be to (a)
with the eutrophication context. Results of a pilot study focusing improve human performances or to (b) improve user’s satisfaction
on their usability and UX with these prototypes are also presented. in terms of use and appropriation of the interactive system.
Hassenzahl points out the role of hedonic and pragmatic goals; on
2. RELATED WORK this basis UX can be considered as the subjective aspects of a
As previously shown, advanced interactive systems have been system [8]. Moreover, according to results of a survey of
developed in domains such as cultural activity and informal Wechsung et al. [24], the interest in usability focused mainly on
learning. Here we discuss limitations of these systems but also designing better products whereas in UX, it is generally more
such advantages that explain this introduction. linked to concepts with emotional content (e.g. fun and joy).
2.1 Use, limitations and advantages of MIS Finally, in the context of informal learning systems, speed and
Complex or abstract concepts like in informal learning are often accuracy are no longer the unique goals; they also should bring
difficult to explain or to understand. MIS provide tools to simulate new knowledge and stimulate user’s emotion, which are indeed
natural phenomena and make these concepts reachable by users. evaluated through UX methods: it is expected from UX
Manipulating physical and digital artifacts is one of the most evaluations to extract users’ feelings and opinions about the
interesting characteristics of MIS, making these concepts easier to system. In short one can say that usability measurements reveal
understand and to perceive as users can experience them [18] problems related to the system behavior and measurements of UX
[21]. A study of Kim et al. [16] shows that MIS support the highlight some additional perspectives to understand their impact.
designer’s cognitive activities and spatial cognition by giving a
―sense of presence‖. MIS make use of three-dimensional 3. MUSEOGRAPHIC CONSIDERATIONS
interfaces to provide a sense of reality that other systems cannot The aim of our collaboration with the Natural History Museum of
offer. Toulouse is to make visitors aware about the eutrophication
process by explaining this phenomenon. It seems primordial to
MIS have some limitations. The first one is that physical objects show that a pond is not just a waterbody, but a complex and
do not provide a trace of actions [15]: once an action is performed dynamical system (Figure 2). A pond can live 100 years or more
on an object, the object itself is not able to provide any and fill up naturally over years. However, this filling can be
information about its previous state to the user or event to the slowed or accelerated by human activities. These activities impact
computer system. A second limitation with the use of physical parameters involved in the eutrophication process: if human adds
objects involved in MIS is to ensure that their planed use is weed killer or pumps water, the pond disappears faster; if human
perceivable and understandable by the users: by essence physical removes mud, the pond disappears slower. These parameters are
for example: water temperature, oxygen rate, water level, mud activities that can be applied to other thematic to respect the
level. consistency of an interactive exhibit. For our prototypes, the
generic activity is ―Making an action on an environment has a
Most visitors ignore all the effects of their activities on a pond and perceptible consequence on many different objects of this
making them actors of an advanced interactive experience can environment‖.
aware them about the eutrophication. However visitors cannot
experience the real phenomenon with real objects since the The aim of the analysis of interactive principles phase is to define
lifetime is long and it is complicated to insert a real pond into a how to make interactive one of the needs of the domain. All the
Museum. As demonstrated previously, MIS represent a fit elements listed in the previous phase are taken in account. The
solution for museum. We decide to simulate the real phenomenon minimal functions of the system which are necessary to make
on a digital pond and to put forward physical objects to represent interactive the generic activity are identified in this phase. For our
human activities. We face two different solutions: either all prototypes, the minimal functions can be expressed by: ―To show
human activities can be made physically, or a man manipulated that an action on an environment has a perceptible consequence
physically can select digital human activities and impact the on many different objects of this environment, the system should
digital environment. The interest of MIS is also that visitors can allow to perform an action, to present a flexible environment, to
interact with a realistic pond to better observe the evolution of the distinguish impacted objects and the environment, and to present
ecosystem. attributes of these objects‖. A set of recommendations to stage the
minimal functions are also listed, for example for our prototypes:
―To putting across that impacted parameters are constituent of
the environment, the system should mark the link between these
parameters and the pond‖.

Preliminary Analysis of interactive principles


analysis
Figure 2. An oligotrophic pond which becomes an eutrophic
pond [26].
Through this advanced interactive experience, the Museum wants
to deliver three main messages to visitors: Optimization
 A dead pond is a filled pond. Analysis
 A pond is a system that produces filling
 Man can accelerate or slow the eutrophication process.

The design process, described in the next section, takes into Evaluation Design
consideration these museographic requirements.

4. DESIGN PROCESS
We use a specific co-design process [1], as our collaboration with the museum involves a multidisciplinary team, composed of museographic experts, ergonomists and designers, to design our prototypes. This co-design process facilitates communication between the participants and is adapted to Mixed Interactive Systems, the kind of systems we decided to design. Furthermore, with regard to software design processes, the present process focuses more on pedagogic, museographic and visitors' requirements than on engineering the software, as the question of technology is postponed to the end of the development cycle. Finally, in contrast with traditional HCI processes, this design process primarily supports the exploration of initial expectations rather than just user requirements. It then turns these expectations into interaction considerations and finally iterates to finalize the application.

This co-design process consists of four phases, which we define and illustrate below: preliminary analysis, analysis of interactive principles, optimization and production (Figure 3). This process has the advantage of guiding the design team throughout these phases, and particularly of facilitating the transformation of the requirements into an interactive experience.

Figure 3. Overview of the co-design process for Museums: preliminary analysis, analysis of interactive principles, optimization (analysis, design, evaluation, implementation) and production.

In the preliminary analysis phase, we analyzed the museologic domain to list all its activities, constraints and targeted users. The objective of this first phase is to extract some generic activities that can be applied to other themes in order to respect the consistency of an interactive exhibit. For our prototypes, the generic activity is: "Making an action on an environment has a perceptible consequence on many different objects of this environment".

The aim of the analysis of interactive principles phase is to define how to make one of the needs of the domain interactive. All the elements listed in the previous phase are taken into account. The minimal functions of the system that are necessary to make the generic activity interactive are identified in this phase. For our prototypes, the minimal functions can be expressed as: "To show that an action on an environment has a perceptible consequence on many different objects of this environment, the system should allow the user to perform an action, present a flexible environment, distinguish the impacted objects from the environment, and present attributes of these objects". A set of recommendations to stage the minimal functions is also listed, for example for our prototypes: "To put across that the impacted parameters are constituents of the environment, the system should mark the link between these parameters and the pond".

The elements of these phases are the same for the two prototypes, as museographic and visitors' requirements have to be well understood before addressing interaction technique and technical questions.

The optimization phase aims at designing the interaction with the system. Our prototypes result from two different optimization phases, as the analyses of this phase do not deal with the same problems: the second prototype should interact directly with the digital 3D pond and put forward fewer devices. The main concept of this phase is to use participatory design involving end users. This iterative phase improves different dimensions of the ongoing prototype, like its social or educational dimensions. Users can participate in designing the prototype through creative methods like brainstorming and focus groups, and assess it by participating in user tests.

The last phase leads to the production of one designed prototype.

5. PROTOTYPES
The eutrophication process is complex to explain, so we designed two alternative prototypes that differ in their interaction space and in the forms of coupling between the two worlds. These aspects, detailed in the next sections, can impact the understanding of this process, and our objective is to determine the role of these different aspects.

5.1 MISE (Mixed Interactive Systems for Eutrophication)
For this first prototype, visitors manipulate the objects of a physical scale model which represents the natural environment around a pond: a set of houses with gardens, a field and a forest. This physical scale model is used as an input and gives no feedback to visitors. Manipulations of physical objects in this scale model have a direct impact on a digital environment that includes: a 3D representation of a pond, a timeline representing the life expectancy of the pond, and a representation of environmental parameters relevant to pond eutrophication such as water temperature, oxygen rate, water level and mud level (Figure 4).

Figure 4. MISE: the data visualization and the scale model.

Through these manipulations, visitors can simulate some human activities and observe their effect on the digital representation of the pond and the associated parameters. Human activities that can be simulated include: pumping water by turning a tap in a private garden or using a field hose over a field, adding weed killer with a crop duster over a field, or removing mud from the pond with a shovel. Each human activity is therefore activated through the use of two different forms of objects: those adopting a global representation and those adopting an individual representation of the activities. For example, visitors can add weed killer either on a field, by driving a tractor through it, or on a private garden, by shaking a weed killer bag.
5.2 MISEP (Mixed Interactive Systems for Eutrophication with Palette)
For this prototype, visitors manipulate two physical objects: a palette and a human figurine (Figure 5). The palette presents all the simulated human activities to visitors: adding weed killer, pumping water and removing mud. The human figurine is a metaphor of the human acting on the pond. These two physical objects are used as an input. Visitors observe the effect of their actions on a digital environment, which includes the elements found in MISE and, additionally, a garden and a field.

Figure 5. MISEP: the physical objects used to interact with digital objects.

Visitors manipulate the palette to select an activity, which activates the corresponding digital object in the 3D environment. Visitors manipulate the human figurine to move this object in order to place it where the human activity should be performed. Visitors can thus directly interact with the visualization, as the digital objects follow the human figurine's position.

5.3 Characteristics comparison
A comparison of the two prototypes allows us to formulate hypotheses about which one better enhances learning and understanding of the eutrophication phenomenon. Some studies have tried to define the characteristics of MIS [5] [18], even if the domain is recent. We focus on characteristics that are similar or complementary in both Dubois' studies and Price et al.'s studies and that best distinguish our prototypes. These prototypes can be qualified by two parameters: interaction space and forms of coupling between the two worlds.

Interaction space is composed of an input interaction space (e.g. physical artifacts) and an output interaction space (e.g. digital artifacts). Here we can define the devices, the input focus and the location of the interaction spaces (separate, contiguous, embedded).

• The devices are different, as MISE counts more physical objects than MISEP, respectively eight and two. We expect to investigate whether these differences influence user behavior and impact learning of the eutrophication process. Moreover, for MISEP, visitors can control their actions as they can press a button to release them. We believe this can impact the perception of the ecosystem evolution.

• The input focus is greater for the first prototype. The scale model catches the visitor's attention more than the two physical objects held by users for MISEP. For MISE, users have to make specific manipulations for each physical object. Moreover, users are passive behind the visualization, while they can directly interact with the visualization in MISEP. We want to discover whether visitors focus more on the objects they manipulate or on the activities these objects represent.

• The location is also different. MISE has separate input and output interaction spaces, as its input has no direct link with the output and visitors cannot directly interact with the visualization. On the contrary, MISEP has contiguous spaces, since the digital objects follow the position of the physical objects manipulated by users. We want to investigate whether this difference impacts understanding of the phenomenon. We also investigate whether visitors make a link between their manipulations and the visualization feedback.
Forms of coupling between the two worlds define the relation between the physical and digital worlds in terms of content and behavior. Content is all the representations of the two worlds and their consistency. Behavior is either the manipulations or the movements made in the physical environment. Content and behavior can refer respectively to the semantic and articulatory distances of Norman [17].

• For MISE, physical artifacts have no overlap with digital artifacts, as the scale model and the visualization have no similar representations. So visitors cannot make a link between the representations of physical and digital artifacts as easily as for MISEP. Actually, for this second prototype, visitors can manipulate digital objects which have a match with physical representations. They can select a tap on the palette and move a tap model on the 3D scene. We think that this could have an effect on the understanding of the phenomenon.

• Moreover, the devices of MISE require varied manipulations: users can turn a tap or shake a fertilizer bag, etc. Even if we try to propose manipulations in common use, this could have an impact on the learning curve and the efficiency of the different devices.

These characteristics help us to discover whether the prototypes have an impact on users during the experience and effects on stimulation, as we expect to arouse more questions about this phenomenon after the experience.

6. EXPERIMENTAL DESIGN
We have conducted a pilot study to test our experimental settings and to highlight some technical problems of the prototypes and the experiment instrumentation.

6.1 Method
Our objective is to study three dimensions: usability, user experience and museographic considerations. We designed a user test protocol to assess these dimensions.

Before the evaluation, a profile questionnaire is proposed to the participants. Some questions deal with their experience of museums and their basic knowledge about eutrophication.

A post-test questionnaire is also proposed to assess the usability of the system and to measure UX. We build this post-test usability questionnaire using all the SUS items (System Usability Scale) [4], together with some questions from the SUMI (Software Usability Measurement Inventory) [27] and the IBM Computer Usability Questionnaire. This questionnaire includes Likert scales of 5 or 7 items. We also use the SAM (Self-Assessment Manikin) method [13] (Figure 6) to measure users' experience and emotions on a scale of 5.

This method is interesting as no personal assistance for users is required, and we can measure the dynamics of four emotions: pleasure-displeasure, degree of arousal, dominance-submissiveness and presence. We gather data with the SAM method before and after the user test to measure whether the system affects the emotions of the participants.

Figure 6. Example of two dimensions of the SAM method.

At the end of the post-test questionnaire, participants can answer some questions on the museographic considerations to assess their understanding of the eutrophication phenomenon, e.g. "What are the impacts of our activities on the environment?", "What can reduce the pond lifetime?" or "What can extend the pond lifetime?".

During the test, we record some log files to collect metrics: the length of the experience, the actions done, the error rate, the cancel rate, the success rate, the time to select the first activity, the time between activities, the selected activities rate and the number of fillings.

6.2 Participants
The participants of the pilot study match a user profile we defined in order to have a homogeneous panel of participants. Six participants took part in this pilot study (5 males and 1 female), ranging from 26 to 40 years old. They are regular computer users and they are not familiar with the eutrophication phenomenon.

6.3 Experimental conditions
We conducted the pilot study in a user lab and participants had no time constraints. They could begin the experience and stop it anytime. No training session is required, as we test whether the systems are easy to learn and we want to simulate the museum context. The participants test only one of the two systems, so 3 of the 6 users assess one prototype and the other 3 users assess the other prototype. We believe the participants could rationalize their understanding of eutrophication if they tested both prototypes, and we want to avoid this, as we measure this parameter.

6.4 Quantitative measures
Quantitative measures concern especially usability and UX. Usability is measured in terms of effectiveness, efficiency, satisfaction and ease of learning, using logs and questionnaire results. UX is measured in terms of the emotions of the participants with the SAM method. It allows us to have non-verbal measures of the feelings of the participants before and after the test.

6.5 Qualitative measures
Qualitative measures concern the understanding of the museographic considerations and some details on the feelings of the participants about their experience. We investigate the stimulation of participants by the advanced interactive experience and whether they enjoy it. An objective of the systems is that participants ask themselves more questions about eutrophication and that they wish to learn more about it.
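To make the usability scores reported in the next section concrete: the post-test questionnaire of Section 6.1 contains the ten SUS items rated on a 5-point Likert scale, and the standard SUS formula converts them into a 0-100 score. The snippet below is a minimal illustrative sketch of that computation, not part of the authors' tooling; the variable names are ours.

```python
def sus_score(responses):
    """Compute the System Usability Scale score from the ten 1-5 Likert responses.

    Odd-numbered items (1st, 3rd, ...) are positively worded: contribution = response - 1.
    Even-numbered items are negatively worded: contribution = 5 - response.
    The sum of contributions (0-40) is scaled by 2.5 to give a 0-100 score.
    """
    assert len(responses) == 10
    total = 0
    for index, r in enumerate(responses, start=1):
        total += (r - 1) if index % 2 == 1 else (5 - r)
    return total * 2.5

# Example: a middling set of answers gives a score in the low 50s,
# comparable to the averages reported below for MISE (52.5) and MISEP (51.2).
print(sus_score([3, 3, 3, 3, 3, 2, 3, 3, 3, 3]))  # 52.5
```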
7. EARLY RESULTS
The pilot study aims at assessing our test protocol and our prototypes, for example to detect technical issues. Some results have been gathered for the three dimensions, but they are not statistically significant due to the small number of participants per prototype condition. Here we do not compare the prototypes with these results, as we will make this comparison with the results of the user tests, once the protocol and the systems have been improved. However, we can extract some benefits and shortcomings of the current prototypes.

Stimulation and eutrophication understanding
First impressions of users of the MIS were positive and they seemed engaged and stimulated by the prototypes, as shown by comments collected during the post-interview: "Manipulations involve the users during the experience", "The system is playful and stimulating". Moreover, users had more questions after the experience than before, between 6 and 10. Another main point was that users had a different perception of a pond after the experience. Before the experience, they believed that a pond is "stagnant water" or "a big puddle", and the experience made them understand that a pond is "a stretch of water exposed to direct or indirect environmental modifications" or "a living organism impacted by human activity". So users perceived the impacts of their actions on the pond, which died faster because of these actions. But users had not perceived the action of time on the pond, so they were not able to understand that a pond is a system that produces filling.

Usability
The SUS results for the two prototypes show that they can be characterized as "OK" according to [2], since the average scores are 52.5 for MISE and 51.2 for MISEP. The analysis reveals that the systems could be considered as "marginal", but that they could become "acceptable" with some improvements. The results of the SUMI and IBM questionnaires support this analysis.

One interesting result about efficiency is that users spent less than 1 minute to select the first activity for both prototypes. Nonetheless, the effectiveness of the prototypes was constrained by several technical issues and the lack of feedback about their appropriate use. This point is confirmed by the error rate observed during use of the systems (57% for MISEP and 42% for MISE). We plan to improve the prototypes and to further evaluate them with a larger number of users.

User experience
The results of the SAM method (Table 1) indicate that users enjoyed their experience. Results for pleasure with MISEP are relatively high, even if the score after the experiment was lower than the one before. This result can be explained by the results for dominance: it seems that users felt dominated by the system. These results are consistent with the usability results, in particular effectiveness, and with users' comments about technical problems. Results concerning users' feeling of presence are encouraging, as the corresponding score after the experiment is relatively close to the one before.

Table 1. Average results of the SAM method
                      MISE                MISEP
                   Before   After      Before   After
pleasure            2.67     3.67       4.00     3.33
degree of arousal   3.33     3.00       2.67     2.00
dominance           4.00     2.33       3.67     2.00
presence            3.00     2.67       4.00     3.33

Improvements of the protocol and the prototypes
This pilot study assessed the evaluation protocol and helped us detect several interesting areas of improvement. Some questions were not well understood by users, such as the question "Can you tell me if, in our everyday life, some actions can be close to this experience?". Therefore we need to reformulate them. We also noticed that a major issue has not been assessed by the questionnaire: the impact of time on the pond. None of the questions assessed whether users could understand that a pond is a machine that produces filling. Moreover, we do not know whether users understood well the match between filling and death, although the results show that they realized that the pond dies or fills up.

Finally, this pilot study revealed some technical and graphical problems in the prototypes. Users suggested interesting modifications to the user interface and they pointed out the lack of feedback. In future work, we need to improve user feedback with respect to the needs of this particular audience.

8. DISCUSSION
This pilot study reveals the importance of the co-design process and the involvement of a multidisciplinary team. During the preliminary analysis phase and the analysis of interactive principles, this co-design process enables designers or non-experts to grasp the different aspects of the museum domain and the minimal functions required to make these aspects interactive. Designers or non-experts then have the main elements to design a system and to evaluate it, in order to propose systems that answer museum needs and user satisfaction.

Besides, this pilot study highlights that Mixed Interactive Systems are complex to assess. MIS evaluations involve multiple dimensions and multiple metrics: the systems must stimulate and engage users while enhancing the learning of a real, complex phenomenon. First, we use both usability and UX evaluation methods to gather complete data about usability issues and users' feelings. For this last dimension, we have noticed that UX evaluation methods generally require experts and time. We decided to apply the SAM method, as it is rather easy to use and to analyze. We observe that this method brings us additional data about the prototypes that we cannot collect with common usability evaluation methods, e.g. feelings about presence. Subsequently, we use a questionnaire to collect users' answers about the understanding of the phenomenon, and we note the complexity of assessing this dimension. Actually, this kind of measure requires a lot of questions, but these questions should not let users rationalize their experience or include indications about the answers.

Finally, the formalization of MIS can certainly go through the characterization of the systems. This can allow designers to have some recommendations about systems and about the impact of some dimensions on the design or on users. Moreover, some evaluation tools can be provided to measure these dimensions if the results are significant. Some studies have already been conducted to compare some characteristics of MIS, like the impact of direct and indirect multi-touch input [20] or the effect of representation location on interaction [18]. These studies deliver first results and contribute to the maturation of the domain.
9. CONCLUSION
In this paper we have presented the design and the pilot study of two Mixed Interactive Systems, systems that combine digital and physical artifacts. These prototypes aim to make the eutrophication phenomenon accessible to visitors of the Natural History Museum of Toulouse.

Our objective is to propose the most usable and enjoyable system to museum visitors, and the one that best enhances the learning of this complex phenomenon. So we use a co-design process to better understand the messages to deliver to museum visitors about eutrophication. We also associate two evaluation methods to detect usability problems and users' feelings about our prototypes: usability evaluation methods and user experience evaluation methods. After the pilot study, our prototypes will be improved based on both these types of data and the answers to our questionnaire. The test protocol will also be improved based on our observations during the pilot study and on its early results. We then plan user tests in the laboratory to determine which system to install in the Museum.

We also try to extract some recommendations about the characteristics of the systems, i.e. interaction space and the match between the two worlds, and their effects on visitors' perception and understanding, from the results of the user tests. These recommendations can guide future designs of Mixed Interactive Systems.

10. REFERENCES
[1] Bach C., Salembier P. and Dubois E. 2006. Co-conception d'expériences interactives augmentées dédiées aux situations muséales. In Proceedings of IHM'06 (Canada). IHM'06. ACM Press, New York, NY, 11-18.
[2] Bangor, A., Kortum, P. and Miller, J. 2009. Determining what individual SUS scores mean: Adding an adjective rating scale. Journal of Usability Studies, 4(3), 114-123.
[3] Bevan N. 2009. What is the difference between the purpose of usability and user experience evaluation methods? UXEM'09 Workshop, INTERACT 2009 (Uppsala, Sweden).
[4] Brooke, J. 1996. SUS: a "quick and dirty" usability scale. In P. W. Jordan, B. Thomas, B. A. Weerdmeester & A. L. McClelland (eds.) Usability Evaluation in Industry. London: Taylor and Francis.
[5] Dubois E. 2009. Conception, Implémentation et Evaluation de Systèmes Interactifs Mixtes : une Approche basée Modèles et centrée sur l'Interaction. Habilitation à diriger des recherches, Université de Toulouse.
[6] Dubois, E., Nigay, L., Troccaz, J. 2001. Consistency in Augmented Reality Systems. Proceedings of EHCI'2001, Springer Verlag, 111-122.
[7] Fails, J., Druin, A., Guha, M. L., Chipman, G., and Simms, S. 2005. Child's play: A comparison of desktop and physical interactive environments. In Proceedings of Interaction Design and Children (IDC'2005). Boulder, CO.
[8] Hassenzahl, M. 2008. User Experience (UX): Towards an Experiential Perspective on Product Quality. In Proceedings of IHM'08 (Metz). IHM'08. Keynote presentation.
[9] Horn, M., Solovey, E. T. and Jacob, R.J.K. 2008. Tangible Programming and Informal Science Learning: Making TUIs Work for Museums. In Proceedings of Conference on Interaction Design for Children (IDC'2008).
[10] Hornecker, E. and Stifter, M. 2006. Learning from Interactive Museum Installations About Interaction Design for Public Settings. In Proceedings of the Australian Computer-Human Interaction Conference OZCHI'06.
[11] ISO 9241-11. 1998. Ergonomic requirements for office work with visual display terminals (VDTs) -- Part 11: Guidance on usability.
[12] ISO 9241-210. 2007. Ergonomics of human-system interaction -- Part 210: Human-centred design for interactive systems.
[13] Isomursu, M., Tähti, M., Väinämö, S., and Kuutti, K. 2007. Experimental evaluation of five methods for collecting emotions in field settings with mobile applications. IJHCS, 65(4), 404-418.
[14] Kim, M.J. and Maher, M.L. 2008. The Impact of Tangible User Interfaces on Designers' Spatial Cognition. Human-Computer Interaction, 23(2), 101-137.
[15] Manches, A., O'Malley, C., and Benford, S. 2009. Physical Manipulation: Evaluating the Potential for Tangible Designs. In Proceedings of the 3rd International Conference on TEI'09 (Cambridge, UK, February 16-18, 2009).
[16] Marshall, P. 2007. Do tangible interfaces enhance learning? In TEI'07: Proceedings of the 1st international conference on Tangible and embedded interaction. ACM, New York, NY, USA, 163-170.
[17] Norman, D. A. and Draper, S. W. 1986. User Centered System Design: New Perspectives on Human Computer Interaction. 31-61.
[18] Price, S., Falcao, T.P., Sheridan, J. and Roussos, G. 2009. The effect of representation location on interaction in a tangible learning environment. In Proceedings of the 3rd International Conference on TEI'09 (Cambridge, UK, February 16-18, 2009). ACM Press, New York, NY, 82-92.
[19] Rizzo, F. and Garzotto, F. 2007. "The Fire and The Mountain": Tangible and Social Interaction in a Museum Exhibition for Children. In Proceedings of IDC'07. ACM Press.
[20] Schmidt, D., Block, F. and Gellersen, H. 2009. A Comparison of Direct and Indirect Multi-Touch Input for Large Surfaces. In INTERACT 2009, 12th IFIP TC13 Conference in Human-Computer Interaction (Uppsala, Sweden, August 26-28, 2009).
[21] Schkolne, S., Ishii, H., Schröder, P. 2004. Immersive Design of DNA Molecules with a Tangible Interface. IEEE Visualization 2004, 227-234.
[22] Wagensberg, J. 2005. The "total" museum, a tool for social change. História, Ciências, Saúde – Manguinhos, 12 (supplement), 309-321.
[23] Wakkary, R., Hatala, M., Muise, K., Tanenbaum, K., Corness, G., Mohabbati, B. and Budd, J. 2009. Kurio: a museum guide for families. In Proceedings of the 3rd International Conference on Tangible and Embedded Interaction. ACM Press, New York, NY, 215-222.
[24] Wechsung, I., Naumann, A.B., and Schleicher, R. 2008. Views on Usability and User Experience: from Theory and Practice.
[25] Yiannoutsou, N., Papadimitriou, I., Komis, V., and Avouris, N. 2009. "Playing with" museum exhibits: designing educational games mediated by mobile technology. In Proceedings of the 8th International Conference on Interaction Design and Children (Como, Italy, June 03-05, 2009). IDC'09. ACM, New York, NY, 230-233.
[26] http://www.rappel.qc.ca/lac/eutrophisation.html
[27] http://sumi.ucc.ie/whatis.html
Partial Matching of Garment Panel Shapes with Dynamic
Sketching Design

Shuang Liang, Rong-Hua Li, George Baciu, Eddie C.L. Chan, Dejun Zheng
Department of Computing
The Hong Kong Polytechnic University
Hung Hom, Kowloon, Hong Kong
{cssliang, csrhli, csgeorge, csclchan, csdzheng}@comp.polyu.edu.hk

ABSTRACT

The fashion industry and textile manufacturing have, in the past decade, started to reapply enhanced intelligent CAD process technologies. In this paper, we propose a partial panel matching system to facilitate the typical garment design process. This process provides recommendations to the designer during the panel design process and performs partial matching of the garment panel shapes. There are three main parts in our partial matching system. First, we make use of a Bézier-based sketch regularization to pre-process the panel sketch data. Second, we propose a set of bi-segment panel shape descriptors to describe and enrich the local features of the shape for partial matching. Finally, based on our previous work, we add an interactive sketching input environment to design garments. Experiment results show the effectiveness and efficiency of the proposed system.

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]: Query formulation, search process; H.5.2 [Information Interfaces and Presentation]: User Interfaces; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling.

General Terms

Algorithms, Design, Experimentation

1. INTRODUCTION

In recent years, garment computer-aided design systems have been rapidly developed and have become the basis for the clothing design process. Typical commercial design platforms are built towards rigid shape generation and editing. There are also some systems [4][11][13] that incorporate a sketching interface to provide design flexibility and freedom. However, they require a complete sketch input in order to operate, which may result in time and efficiency lost throughout the design process. This may be solved by an additional stage of technology on top of the existing systems, achieving a non-rigid garment design with dynamic prediction or suggestion that professional designers, researchers, as well as manufacturers, will certainly be interested in [14]. This requires partial recognition and matching of incomplete garment panels. Figure 1 illustrates the designer's working scenario of the garment panel design process.

Figure 1: Illustration of the working scenario for the garment panel design process. (a) Garment design workspace; (b) Sketching interface; (c) Designed panel shape.

In this paper, we propose a partial matching system with dynamic prediction and non-rigid sketching features for garment panel design. It is a Google-like design system which dynamically returns the possible solutions along with the shape drawing process. Our experiment results show that the proposed system provides an effective and efficient garment design platform with partial panel matching.

The rest of the paper is organized as follows. Section 2 presents related work on shape matching. Section 3 describes the framework of the proposed panel design system. Sections 4, 5 and 6 present the methodology of sketch data regularization, panel shape representation and the partial matching algorithm respectively. Sections 7 and 8 give the experimental setup, performance evaluations, and discussions. Finally, Section 9 concludes our work.

2. RELATED WORK

In this section, we look at related work on shape matching algorithms. Shape matching has been well investigated in computer vision in the last few decades [1][2][5][8][9][10][12][15]. The partial matching problem we meet in the garment design process falls into partial-complete matching (PCM). PCM refers to finding the shapes which contain a part that is similar to the query shape, i.e., matching some part of shape B as well as possible to the complete shape A.

Partial shape matching is the key technique for developing a partial matching system for the garment design process, which is the task of matching sub-parts or regions. These parts are not predefined to be matched and can be any sub-shape of a larger part, in any orientation or scale. Many local shape descriptors have been presented to deal with PCM problems.

Ozcan et al. [8] used a genetic algorithm to perform partial matching based on attributed strings. Their approach is claimed to be fast, but it cannot guarantee the optimal result. Berretti et al. [1] proposed a local shape descriptor which partitions a shape into tokens and represents each token by a set of perceptually salient attributes with orientation and curvature information. But the above method only considered geometric features while neglecting topological features. Tanase et al. [10] and Chen et al. [2]
used turning function of two polylines and distance across


the shape (DAS) to represent local shape information. But
it may not be suitable for garment panel shapes. Chi et
al. [3] proposed a primitive based descriptor according to
the law of Gestalt theory, which is effective and efficient in
partial object retrieval in cluttered environment. However,
their method only considers two types of primitives of arc
and line and could not represent complex shapes.
These PCM algorithms are mostly based on static geometric information, and fail to make use of the dynamic features of the shape generation process.
Figure 2: Framework of online garment panel design
system.
3. OVERVIEW OF GARMENT PANEL DESIGN SYSTEM

The dynamic garment panel design process is a real-time interaction between the user and the computer. An efficient, flexible, natural and convenient shape generation process is the ultimate goal of the system. To this end, we propose a partial matching system with a sketching interface which returns possible panel shape solutions dynamically during the process of garment design.

The proposed system is in three phases. First, we provide a freehand sketching interface to support the non-rigid, flexible and natural design behavior of generating garment panel shapes. Second, a prior-knowledge-based panel database is pre-collected to support the partial matching process. Third, the proposed system makes use of our partial matching algorithm to return the matched panel shape solutions dynamically from the partial sketch input, based on domain knowledge.

The framework of our online garment panel design system is shown in Figure 2; it mainly contains three key modules: sketch regularization, panel feature extraction and partial matching.

The sketch regularization module aims to build a natural and flexible panel design system which is compatible with different numbers of input strokes and drawing orders. The diversity of the original sketch data makes it difficult to enumerate all possible stroke combinations in both the spatial and temporal dimensions. After the raw sketch data is captured by the input device, we regularize the sketch data into approximated primitive shapes, namely lines and curves.

The feature extraction module then extracts both topological and geometric features from the regularized sketching panels. These regularized panels are then compared against a pre-collected panel shape database using bi-segment shape descriptors.

In the partial matching module, a partial matching algorithm is developed to calculate the similarity for dynamic panel shape design. This similarity matching is asymmetric and partially complete among the panel candidates in the database. There are generally three cases. First, if the incomplete shape is part of a candidate panel, they are considered highly similar, because the partial input shape can be completed later. Second, if the incomplete shape contains certain components that do not exist in the candidate panel, or if the incomplete shape contains more components than the candidate panel, the candidate panel is considered not to conform to the user's intention and the similarity value should be very low, no matter how similar the corresponding parts are. Finally, if the incomplete shape is part of two or more candidate panels, the candidate panel with the fewest components will have the highest similarity.
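The three cases above can be read as a simple ranking policy over candidate panels. The sketch below is our own illustrative paraphrase of that policy (the paper's actual similarity computation is the bi-segment matching of Section 6); `contains` is a hypothetical helper assumed to test whether every component of the partial input has a counterpart in the candidate.

```python
def rank_candidates(partial_components, candidates, contains):
    """Order candidate panels for a partial sketch, following the three cases above.

    - A candidate containing every component of the partial input is a match.
    - A candidate missing a drawn component, or with fewer components than the
      input, is effectively rejected (similarity close to zero).
    - Among matching candidates, the one with the fewest components ranks first.
    contains(partial, candidate) is assumed to report whether the partial shape
    is a sub-shape of the candidate.
    """
    matches = []
    for cand in candidates:
        if len(cand) >= len(partial_components) and contains(partial_components, cand):
            matches.append(cand)
    # Fewest components first, i.e. highest similarity first.
    return sorted(matches, key=len)
```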
4. BÉZIER-BASED SKETCH REGULARIZATION

In our system, we perform a primitive-based sketch regularization to pre-process the sketch data, which involves two steps: stroke segmentation and primitive fitting, as shown in Figure 3. The raw sketch data varies with user behavior, drawing informality and ambiguity, and must be regularized consistently to obtain a standard representation.

Figure 3: Sketch regularization procedure. (a) Raw sketch input data; (b) Stroke segmentation results; (c) Primitive fitting results.

First, we decompose the original strokes into geometric primitive shapes (lines and curves) based on pen speed and curvature information [6]. We monitor the user's drawing activity and split strokes where the pen speed reaches a minimal value or where the curvature evidently varies. Since drawings can be composed of a group of basic primitive shapes (lines and curves) in a fixed way, stroke segmentation can therefore reduce sketch diversity and computational complexity.

Second, we make use of lines and the Bézier form to smooth the segments. Quadratic and cubic Bézier curves are the most common; higher-degree curves are more expensive to evaluate. We choose the fourth-order Bézier form to smooth a curve. Thereby, we can reduce computational complexity while approximating most curves in panel shapes. Figure 4 illustrates some complex curves contained in panel shapes. As we can see, a second-order Bézier is able to fit the curve from the front panel shape in Figure 4(a), while it is not capable of approximating the curve from the sleeve panel in Figure 4(b).

Figure 4: Illustration of Bézier approximation of garment panels. (a) Front panel shape; (b) Sleeve panel shape.

There are basically three major steps to approximate a curve in Bézier form. Assume a segment S is represented by a concrete point set S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}. The two end points are regarded as two control points, and the objective of the Bézier fitting is to estimate the other three control points. The detailed procedure is described in Algorithm 1.

Algorithm 1. Bézier-based sketch regularization
Input: A sketch segment S = {(x_1, y_1), ..., (x_n, y_n)}
Output: A Bézier curve {(x'_1, y'_1), ..., (x'_n, y'_n)}

Step 1: Estimate t.
We calculate the curve length at each point by
    c_i = \begin{cases} 0, & i = 1 \\ \sum_{k=2}^{i} \sqrt{(x_k - x_{k-1})^2 + (y_k - y_{k-1})^2}, & 2 \le i \le n \end{cases}
Then, we estimate t by
    t_i = \frac{c_i}{\sum_{k=2}^{n} \sqrt{(x_k - x_{k-1})^2 + (y_k - y_{k-1})^2}}, \quad 1 \le i \le n

Step 2: Estimate control points.
We estimate the X-coordinates of the control points by minimizing the following non-linear objective function:
    \min_{P_i,\, i = 0, 1, \dots, 4} \; \sum_{k=1}^{n} \Big[ x_k - \sum_{i=0}^{4} \binom{4}{i} P_i (1 - t_k)^{4-i} t_k^{i} \Big]^2
where x_k denotes the X-coordinate of a point, P_i denotes the X-coordinate of a control point, and t_k, 1 \le k \le n, is calculated from Step 1. The objective function can be solved by the Levenberg-Marquardt algorithm [7]. Obviously, the Y-coordinates of the control points can be calculated in the same way.

Step 3: Compute the output points.
We calculate the X-coordinates of the output points by
    x'_k = \sum_{i=0}^{4} \binom{4}{i} P_i (1 - t_k)^{4-i} t_k^{i}, \quad 1 \le k \le n
Then, a rounding operator is applied to obtain integer coordinates. Likewise, the Y-coordinates can be obtained in the same way.
5. BI-SEGMENT PANEL SHAPE DESCRIPTORS

In this section, we propose a bi-segment panel descriptor to represent the panel shape content. The bi-segment feature descriptor consists of the geometric features of a pair of connected segments, together with the topological relation between them.

A good local shape descriptor for partial panel shape matching should satisfy the following requirements: 1) the descriptor should encode rich local features of the shape, which means the local descriptor should describe the shape correctly and possess strong discriminative ability to differentiate various local shapes; 2) the local shape descriptor representation should be conveniently processed by the subsequent partial matching algorithm; 3) the features contained in the shape descriptor should accommodate scaling, translation and rotation invariance to find similar panel shapes; 4) the features contained in the shape descriptor should not be influenced greatly by the user's drawing style. Therefore, we first divide a panel shape into several segments according to the vertexes of the panel. We then build a bi-segment model to represent the shape by a sequence of bi-segments and their encoding attributes.

5.1 Topological Descriptors

For topological shape descriptors, we consider the topological relations that encode the type of the segments/primitives. In this paper we call this the binary topological relation. We define the topological relations between line and curve primitives as:

Definition 1 (Binary topological relation): Assume the primitive set is of type \Sigma_T = \{T_{line}, T_{curve}\}; then the binary topological relation R between two adjacent primitives P_1 and P_2 is specified as:

    R(P_1, P_2) = \begin{cases} R_{l,l}, & \text{if } P_1^T = P_2^T = T_{line} \\ R_{l,c}, & \text{if } P_1^T \ne P_2^T \\ R_{c,c}, & \text{if } P_1^T = P_2^T = T_{curve} \end{cases}
Figure 5 shows the three types of binary topological relations between panel primitives. As we can see, these topological shape descriptors consider the relations of the two primitives and reflect the structural characteristics of the shape.

Figure 5: Types of binary topological relations.
5.2 Geometric Descriptors

For geometric features, we consider four types of shape descriptors derived from the vertex and its adjacent primitives of the bi-segment data, including the inner angle at the vertex, the ratio of the primitive lengths, and the turning angles. The detailed description of these four descriptors is given as follows:

(1) Inner Angle (A): the inner angle formed by the two primitives at the vertex.

(2) Primitive Ratio (δ): the ratio between the lengths of the two connecting primitives/edges. Note that the ratio is calculated as the length of the shorter edge divided by the longer edge to achieve rotation invariance.

(3) Turning Angle Vector 1 (θ^{(1)}): the first turning angle vector consists of the angles between the tangent at the vertex and each corresponding sample point on edge 1.

(4) Turning Angle Vector 2 (θ^{(2)}): the second turning angle vector consists of the angles between the tangent at the vertex and each corresponding sample point on edge 2.

In the following, we explain in particular the calculation of the turning angle vectors. Figure 6 illustrates the turning angle vectors in the case where the two connecting primitives are curves.

Figure 6: Illustration of turning angle vectors in a two-curves case.

Turning Angle Vector 1 and Turning Angle Vector 2 can be represented as θ^{(1)} = (θ^{(1)}_1, θ^{(1)}_2, θ^{(1)}_3, θ^{(1)}_4) and θ^{(2)} = (θ^{(2)}_1, θ^{(2)}_2, θ^{(2)}_3, θ^{(2)}_4) respectively, where θ_i denotes each inner angle between the tangent at the vertex (blue dot) and the tangent at the i-th sample point (red dot) shown in Figure 6. Here, the sample points are generated by equidistant sampling along the curves. We sample four points along each curve in Figure 6. Note that the number of sample points can be different for various shapes. Apparently, if the primitive is a line, all the turning angles are zero.

5.3 Bi-segment Panel Model

A panel shape is typically a closed shape that can be decomposed into lines and curves. Therefore, we model the panel shape by a bi-segment sequence with its corresponding intrinsic characteristics. By combining both topological and geometric features, we obtain the definition of the bi-segment model as:

Definition 2 (Bi-segment model): A bi-segment model B is a (2i + 3)-tuple with i sample points on each segment:

    B = (R, A, δ, θ^{(1)}, θ^{(2)})

where R is the binary topological relation of the two edges connecting at the vertex, A is the inner angle between the two edges, δ is the primitive length ratio, and θ^{(1)} and θ^{(2)} are the turning angle vectors of the two edges respectively; each turning angle vector consists of i elements.

With the definition of the bi-segment model, a panel shape P can be represented by an ordered bi-segment sequence. We call this the bi-segment panel descriptor in this paper. More specifically, a shape with n bi-segments is described by P = (B_1, B_2, ..., B_n), where B_i is the bi-segment model as defined above. Figure 7 shows a panel shape that is represented by a sequence of bi-segments.

Figure 7: Bi-segment representation of panel shape P = (B_1, B_2, ..., B_9).

The advantages of this bi-segment panel descriptor are twofold: 1) the descriptor is capable of capturing both topological features, which reflect structural properties, and geometric features, which describe shape information close to the human visual perception of local panel shapes; 2) the descriptor is easily processed by the partial shape matching task, as it consists only of numerical values. Therefore, by introducing the bi-segment panel shape descriptor, we can reach a more comprehensive and efficient representation of panel shapes and features. In the next section, we apply the bi-segment panel descriptor in the subsequent partial shape matching algorithm.
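To make the descriptor concrete, the sketch below packs one bi-segment into the flat numerical vector implied by Definition 2: with i sample points per edge, the tuple (R, A, δ, θ^{(1)}, θ^{(2)}) becomes a (2i + 3)-dimensional vector, e.g. 13 dimensions for the 5 sample points used in the experiments of Section 7. The numeric encoding of R is our own assumption for illustration.

```python
import numpy as np

# Hypothetical numeric codes for the three binary topological relations.
R_LL, R_LC, R_CC = 0, 1, 2

def bisegment_vector(relation, inner_angle, length_ratio, theta1, theta2):
    """Flatten a bi-segment B = (R, A, delta, theta(1), theta(2)) into a numeric vector.

    relation     : one of R_LL, R_LC, R_CC (binary topological relation)
    inner_angle  : inner angle A at the shared vertex
    length_ratio : shorter-edge / longer-edge ratio delta (rotation invariant)
    theta1/theta2: turning-angle vectors, one value per sample point on each edge
    Returns a vector of length 2*i + 3 for i sample points per edge.
    """
    theta1 = np.asarray(theta1, dtype=float)
    theta2 = np.asarray(theta2, dtype=float)
    assert theta1.shape == theta2.shape
    return np.concatenate(([float(relation), inner_angle, length_ratio], theta1, theta2))

# With 5 sample points per edge this yields the 13-dimensional vector used later.
b = bisegment_vector(R_LC, 1.2, 0.7, [0.0, 0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0, 0.0])
assert b.shape == (13,)
```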
6. BI-SEGMENTS PARTIAL MATCHING

In this section, we propose a partial shape matching algorithm for garment panel design. We present the similarity measurement between two bi-segments.

Assume bi-segments B_1 = (R_1, A_1, \delta_1, \alpha^{(1)}, \alpha^{(2)}) and B_2 = (R_2, A_2, \delta_2, \beta^{(1)}, \beta^{(2)}); the dissimilarity between B_1 and B_2 is defined in Equation (1):

    dis(B_1, B_2) = w_1 \lambda(R_1, R_2) + w_2 f(A_1, A_2) + w_3 g(\delta_1, \delta_2) + w_4 h(\alpha, \beta)    (1)

where w_i (1 \le i \le 4) denotes the weight coefficients, and

    \lambda(R_1, R_2) = \begin{cases} 0, & \text{if } R_1 = R_2 \\ 1, & \text{otherwise} \end{cases}    (2)

    f(A_1, A_2) = |A_1 - A_2|    (3)

    g(\delta_1, \delta_2) = |\delta_1 - \delta_2|    (4)

    h(\alpha, \beta) = \min\Big\{ \sum_{i=1}^{s} \big[ (\alpha_i^{(1)} - \beta_i^{(1)})^2 + (\alpha_i^{(2)} - \beta_i^{(2)})^2 \big],\ \sum_{i=1}^{s} \big[ (\alpha_i^{(1)} - \beta_i^{(2)})^2 + (\alpha_i^{(2)} - \beta_i^{(1)})^2 \big] \Big\}    (5)

where s is the number of sample points. Note that h(\alpha, \beta) denotes the minimum calibration distance between the two turning-angle sets of the primitives.

The similarity between two bi-segments can be derived from the above dissimilarity distance as follows:

    sim(B_1, B_2) = \begin{cases} 0, & \text{if } dis(B_1, B_2) > \sigma \\ 1 - \frac{dis(B_1, B_2)}{\sigma}, & \text{otherwise} \end{cases}    (6)

where \sigma is a threshold introduced from our experiments to normalize the similarity and make it fall into the range [0, 1].
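A direct transcription of Equations (1)-(6) is sketched below. The weights w_i and the threshold σ are free parameters that the paper determines experimentally, so the values shown here are placeholders, and the function names are ours.

```python
import numpy as np

def bisegment_dissimilarity(b1, b2, w=(1.0, 1.0, 1.0, 1.0)):
    """Equations (1)-(5): weighted dissimilarity between two bi-segments.

    Each bi-segment is a tuple (R, A, delta, theta1, theta2) with theta1/theta2
    given as arrays of turning angles over the s sample points.
    """
    r1, a1, d1, alpha1, alpha2 = b1
    r2, a2, d2, beta1, beta2 = b2
    alpha1, alpha2 = np.asarray(alpha1, float), np.asarray(alpha2, float)
    beta1, beta2 = np.asarray(beta1, float), np.asarray(beta2, float)

    lam = 0.0 if r1 == r2 else 1.0                                        # Eq. (2)
    f = abs(a1 - a2)                                                      # Eq. (3)
    g = abs(d1 - d2)                                                      # Eq. (4)
    h = min(np.sum((alpha1 - beta1) ** 2 + (alpha2 - beta2) ** 2),
            np.sum((alpha1 - beta2) ** 2 + (alpha2 - beta1) ** 2))        # Eq. (5)
    return w[0] * lam + w[1] * f + w[2] * g + w[3] * h                    # Eq. (1)

def bisegment_similarity(b1, b2, sigma=10.0, w=(1.0, 1.0, 1.0, 1.0)):
    """Equation (6): normalize the dissimilarity into a [0, 1] similarity."""
    dis = bisegment_dissimilarity(b1, b2, w)
    return 0.0 if dis > sigma else 1.0 - dis / sigma
```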
7. EXPERIMENTAL SETUP

We carry out a partial matching experiment to verify the effectiveness and efficiency of our proposed method. We conducted our experiment with 200 panel shapes, which were collected by ten experienced panel designers. The machine used in our experiment was an HP TouchSmart with an AMD Turion X2 RM-74 (2.2 GHz) CPU and 2 GB of memory. We implemented our system using Visual C++ in the Microsoft Windows Vista operating system environment.

The experiment was first carried out by collecting sketch samples with traditional CAD software. As mentioned, we invited garment designers to freehand sketch the usual, common and standard panel shapes. We collected 20 sample sketches from each designer. A panel database with 200 panel shapes was successfully established. Figure 8 shows some panel shape examples in our database.

Figure 8: Samples collected in the panel shape database.

Second, we need to build up a feature database based on the panel database. As can be seen in Figure 8, different panel shapes have different features. We apply our proposed bi-segment shape descriptors to extract features from the panel shapes. We sample 5 points on each segment in our experiment, and get a 13-dimensional numerical vector for each bi-segment.

Finally, we evaluate the effectiveness of our proposed partial matching. Similar to the main performance metrics of interest in general information retrieval, we test the effectiveness of partial matching by recall and precision rate. Recall is defined as the ratio between the number of relevant returned shapes and the total number of relevant shapes, while precision is defined as the ratio between the number of relevant returned shapes and the total number of returned shapes. Obviously, when more items are returned, recall will increase but precision will decrease. In a recall/precision graph, a higher curve signifies a higher recall/precision value. We will further describe and analyze the results in the next section.

8. RESULTS AND ANALYSIS

In this section we evaluate the effectiveness of our proposed partial matching in terms of recall and precision rate. In Section 8.1, we compare the recall rate with partial shapes varying from one to four bi-segments. In Section 8.2, we compare the precision rate with four different descriptors: the attributed strings descriptor [8], a geometric descriptor, a topological descriptor and our proposed bi-segment descriptor. In Section 8.3, we test the response time of our partial matching garment design system.

8.1 Recall Rate

Figure 9 shows the relationship between the number of retrieved shapes and the recall rate, using partial shapes with different numbers of bi-segments. As we can see, along with the completion of the drawing process, the partial matching recall rate increases gradually. The more complete the input panel shape is, the more clearly the user's intention is expressed. Perhaps the most important point to be stated is that a high recall rate means the partial matching system returns the wanted shapes. We can see that four bi-segments nonetheless always outperform one bi-segment at every setting. When returning 20 shapes, an input partial shape with four bi-segments can achieve a 90% recall rate.

8.2 Precision Rate

Figure 10 depicts the relationship between precision and recall rate using four different descriptors: attributed strings, a geometric descriptor, a topological descriptor and our proposed bi-segment descriptor. We average the results and plot these four curves with the different features used in partial matching. As can be seen in Figure 10, the attributed strings descriptor has only 73% precision. Due to its neglect of the topological characteristics, it cannot fully express the panel content. Clearly, our proposed bi-segment descriptor has the best performance and achieves on average 20% more precision than the other three descriptors.
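The recall and precision figures discussed above follow the standard retrieval definitions given in Section 7; a minimal sketch of how they can be computed for one query is shown below (our own helper, not the authors' evaluation code).

```python
def recall_and_precision(returned_ids, relevant_ids):
    """Recall = relevant returned / all relevant; precision = relevant returned / all returned."""
    returned, relevant = set(returned_ids), set(relevant_ids)
    hits = len(returned & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(returned) if returned else 0.0
    return recall, precision

# Example: 20 shapes returned, 9 of the 10 relevant panels among them -> 90% recall.
r, p = recall_and_precision(range(20), [0, 2, 4, 6, 8, 10, 12, 14, 16, 40])
assert abs(r - 0.9) < 1e-9
```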
Figure 9: Performance of the partial matching system with increasing integrity of input shapes (recall vs. number of retrieved shapes, for one to four bi-segments).

Figure 10: Precision-recall graph of the partial matching systems (bi-segment descriptor, attributed strings, geometric descriptor and topological descriptor).

8.3 Response Time

Finally, the time cost of the partial matching process is also a major evaluation factor for real-time interactive performance. The response time should be as small as possible, and generally it should be less than a second. In our experiments, the average response time of our proposed descriptor is about 31.2 ms for partial matching. The performance of our proposed descriptor sufficiently fulfills the time requirement for real-time interaction.

9. CONCLUSIONS

In this paper we present a dynamic partial matching system for garment panel shape design with interactive sketching input. First, we provide intelligent panel sketch processing based on automatic referential search and feature matching. Second, we propose to represent the garment panel shape with a bi-segment local shape descriptor that incorporates both topological and geometric features. This bi-segment shape descriptor satisfies rotation, scaling, and translation invariance. Third, a partial matching algorithm is presented to solve the panel matching problem with low computational complexity. Finally, we conduct experiments based on our pre-collected panel database to evaluate the proposed approach. Experiments show the encouraging matching accuracy of the proposed method.

For future work, it is promising to extend the application domains of our shape descriptor, e.g., to mechanical drawing.

10. ACKNOWLEDGEMENT

This work is supported by the Research Grants Council of the Hong Kong Special Administrative Region, under RGC Earmarked Grants (Project No. G-U432).

11. REFERENCES
[1] S. Berretti, A. D. Bimbo, and P. Pala. Retrieval by shape similarity with perceptual distance and effective indexing. IEEE Transactions on Multimedia, 2(4), 2000.
[2] L. Chen, R. Feris, and M. Turk. Efficient partial shape matching using Smith-Waterman algorithm. 2008.
[3] Y. Chi and M. Leung. Part-based object retrieval in cluttered environment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5):890-895, 2007.
[4] P. Decaudin, D. Julius, J. Wither, L. Boissieux, A. Sheffer, and M.-P. Cani. Virtual garments: A fully geometric approach for clothing design. 25, 2006.
[5] L. J. Latecki, V. Megalooikonomou, Q. Wang, and D. Yu. An elastic partial shape matching technique. Pattern Recognition, 40:3069-3080, 2007.
[6] S. Liang and Z. Sun. Sketch retrieval and relevance feedback with biased SVM classification. Pattern Recognition Letters, 29(12):1733-1741, 2008.
[7] D. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math., 11(2):431-441, 1963.
[8] E. Ozcan and C. K. Mohan. Partial shape matching using genetic algorithms. Pattern Recognition Letters, 18(10), 1997.
[9] E. Saber, Y. Xu, and A. M. Tekalp. Partial shape recognition by sub-matrix matching for partial matching guided image labeling. Pattern Recognition, 38:1560-1573, 2005.
[10] M. Tanase and R. C. Veltkamp. Part-based shape retrieval. 2005.
[11] E. Turquin, J. Wither, L. Boissieux, M.-P. Cani, and J. F. Hughes. A sketch-based interface for clothing virtual characters. IEEE Computer Graphics and Applications, 27(1):72-81, 2007.
[12] R. C. Veltkamp and M. Hagedoorn. State of the art in shape matching. Principles of Visual Information Retrieval, 2000.
[13] C. C. L. Wang, Y. Wang, and M. M. F. Yuen. Feature based 3D garment design through 2D sketches. Computer-Aided Design, 35(7):659-672, 2002.
[14] J. Wang, G. Lu, W. Li, L. Chen, and Y. Sakaguti. Interactive 3D garment design with constrained contour curves and style curves. Computer-Aided Design, 41:614-625, 2009.
[15] D. Zhang and G. Lu. Review of shape representation and description techniques. Pattern Recognition, 37:1-19, 2004.
Fur Interface with Bristling Effect Induced by Vibration
Masahiro Furukawa, The University of Electro-Communications, 1-18-14 Chofugaoka, Chofu, Tokyo, Japan, furukawa@hi.mce.uec.ac.jp
Yuji Uema, Maki Sugimoto, Masahiko Inami, Graduate School of Media Design, Keio University, 4-1-1 Hiyoshi, Kohoku-ku, Yokohama-city, Kanagawa, Japan, {uema, sugimoto, inami}@kmd.keio.ac.jp
ABSTRACT
Wearable computing technology is one of the methods that can
augment the information processing ability of humans. However,
in this area, a soft surface is often necessary to maximize the
comfort and practicality of such wearable devices. Thus in this
paper, we propose a soft surface material, with an organic
bristling effect achieved through mechanical vibration, as a new
user interface. We have used fur in order to exhibit the visually
rich transformation induced by the bristling effect while also
achieving the full tactile experience and benefits of soft materials.
Our method needs only a layer of fur and simple vibration motors. The hairs of the fur instantly bristle with only horizontal mechanical vibration. The vibration is provided by a simple vibration motor embedded below the fur material. This technology has significant potential for garment textiles or as a general soft user interface.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces – Haptic I/O, Input devices and strategies, Interaction styles.

General Terms
Soft User Interface, Pet Robot, Visual and Haptic Design, Computational Fashion

Keywords
Physical Computer Interfaces, Computational Clothing.

Figure 1. Soft and Flexible User Interface suitable for Wearable Computing Inspired by Hair Erection.

Figure 2. Bristling Effect with Horizontal Vibration Provided from Vibration Motor.
1. Introduction
The important part of wearable technology is its role as an interface between information devices and humans [1]. There have been many improvements in terms of wearable input devices. Much of the research on them has focused on the context and physical characteristics of the user interface, such as the material and texture. However, improvements to the physical aspects of user interfaces have usually involved only the reduction of thickness and greater flexibility [4][5]. There have also been many improvements with output devices. Head-mounted displays are often used to provide textural information [1], while vibration motors have mainly been used to provide non-textural information [6]. The first advantage of a vibration motor is its small size, making it easy to embed in garments [7][21]. This feature makes it possible to keep the unique textures of the garments, fully maintaining their tactile and visual characteristics, while simultaneously making this output device wearable. By combining the physical features of both fur and these vibration motors, we develop an interface that can be used as both an input and an output device.

For example, a cat's body is covered in a coat of fur, which it uses not only to maintain its body temperature but also to express its affection by bristling its fur [23], as shown in Figure 1 (a). The hair erection of chimpanzees is also known to have a social role as a means of visual communication [24]. Thus the soft body hair of these animals performs an equally important role as an output interface. Accordingly, it is important
to create a user interface that has a soft and flexible surface, while also exhibiting both visual and tactile messages.

1.1 Purpose of Research
In this paper, we propose a novel texture control method that uses natural fur as a wearable interface. This method, based on the bristling effect, allows the texture of natural fur to change suddenly, as shown in Figure 2. The technology, which needs only a simple vibration motor and furry material, also has the additional benefit of being a soft and flexible user interface.

1.2 Contributions
The method with a bristling effect induced by horizontal vibration has the following advantages:
• This method only requires commonly available materials and actuators.
• This method can be easily applied to clothing materials, making it compatible with wearable computing technology.
• This method can be used as the surface material of pet robots, providing a natural and believable platform to express the internal state of these robots.
• This method can be used on the soft surface portions of information devices, including those found on common household furniture such as couches and cushions that are connected to home electronics.

2. Related Work
Previous related work on soft and flexible wearable interfaces is described in the following.

2.1 Material for Garment
One of these wearable interfaces makes use of visual changes in the surface itself in order to provide information to the user. For example, Wakita proposed Fabcell, a novel type of garment [9]. Fabcell is a non-luminous garment material made of conductive yarn inter-woven with fibers dyed with liquid crystal ink. When a voltage is applied to the fabric, there is a change in temperature, which in turn changes the color of the fabric. In another similar project, the "Huggy Pajama", a wearable display with color-changing properties was also proposed [10].

These works are in some ways similar to our system, as these interfaces are also non-luminous and change visually to convey information. In addition, they are also soft, flexible, and wearable. Nevertheless, the difference is that their visual changes are much slower. This is because the speed of molecular changes caused by temperature changes is much slower than that of mechanical changes. Examples of interfaces that work based on mechanical changes are described below.

2.2 Artificial Fur
In the work "Tabby" by Ueki, artificial fur is used as a tactile interface that gives users a soft feeling. Tabby is a lamp that has an animal-like soft body covered by this artificial fur, and it aims to stimulate communication between people within a public space [16]. Tabby's artificial fur has a form that changes its shape as if it is breathing, controlled by a pressure fan within it. There is also an incandescent lamp underneath the fur, whose luminescence captures people's attention in public. Together with its soft tactile surface, this encourages people to touch Tabby. Thus, we can suppose that an artificial fur surface plays an important role in communication.

Soft and furry materials are also known to provide psychological comfort [8]. For example, Wada proposed the seal-shaped Mental Commitment Robot, which is covered by artificial fur, in order to stabilize the mental state of the mentally ill and to reduce the burden of workers who serve in care facilities [19]. The artificial fur of this robot gives people a psychological sense of cuteness and comfort. Moreover, Yohanan proposed the Haptic Creature Project, which involves a rabbit-shaped robot covered by artificial fur, similar to the Mental Commitment Robot [17].

As mentioned above, artificial fur can act as a soft tactile interface and also provide haptic comfort. Moreover, it is possible to give the user the impression of holding a living mammal using haptics. For example, Hashimoto worked on the Emotional Touch project, which uses voice coils to control the air pressure between one's hands and the voice coils in real time, thus providing a novel haptic impression of holding a living animal [20]. However, this project does not provide the changing visual effects that artificial fur can.

Beyond the merits of haptic and visual feedback, furry material also has another visual effect that makes use of the changing attitude of each hair. Because furry material has many hairs on its base, its appearance changes when the attitude of the hairs changes. This feature therefore has the ability to provide information such as event reports and status changes while keeping the surface soft. Technologies that aim to control this attitude of hair are described in the following.

2.3 Static Electricity with Dense Fur
The electrostatic effect is known to affect the "standing" or inclination angle of furs, and thus to change the shape of the fur. The Van de Graaff generator is used to generate static electricity, which makes one's hair stand up when the person touches the electrode of the generator. Thus, it is possible to apply this same principle in order to control the inclination of the furs and produce a bristling effect. However, the Van de Graaff generator is relatively large and so is not very portable for users.

Circuits of high voltage also produce static electricity and are much smaller in size than the Van de Graaff generator. Philips Electronics has patented this method, known as the Fabric Display [11]. Nevertheless, there is the possibility of an electric shock when the user comes in direct contact with the electrodes, which makes it dangerous to use.

2.4 Electromagnetic Forces, Shape-Memory Alloy with Sparse Fur
On the other hand, there are works that are based on controlling the inclination angle of individual hairs, for example, Raffle's Super Cilia Skin [13]. This method uses electromagnetic forces to control the inclination of stick-like protrusions, which have
permanent magnets fixed under them. These protrusions are
distributed on an elastic membrane and are controlled by the
electric magnet array arranged below the elastic membrane. The
distribution density of these protrusions is much lesser than the
density of hair on animal fur. Thus the appearance of the surface
using this method still differs from that of animal fur.
In another work, Kushiyama proposed Fur-fly, in which a servomotor controls the inclination angle of each batch of artificial feathers [12]. These artificial feathers can provide a soft haptic interface, but as the servomotor used is relatively complex and large in structure, it is not possible to control this interface at a finer and more precise resolution.

Lastly, shape-memory alloy (SMA) is known to change shape through electrical control rather than mechanical control, and previous works that use SMA actuation include Sprout I/O by Coelho [14] and Shutters [15]. Both works are types of kinetic textures that use wool yarn and felt with SMAs attached onto them. The latter not only has physical shutters made of felt with attached SMA; the shutters also form a matrix that can display characters through its cast shadow. Thus the matrix can function like a dot-matrix display to provide text visuals. Nonetheless, SMA has the issue of a relatively low response speed, and it also moves slowly.

Figure 3. Bristling Effect with Arrector Pili Muscle.

Figure 4. Disk-shaped Vibration Motor To Provide Horizontal Vibration.
3. Prototyping and Implementation
The arrector pili muscle is known to make body hair bristle up [22]. Figure 3 shows the process of the bristling effect and the positional relations of body hair, epitheca, arrector pili muscle, and hair root. The arrector pili muscle is located around the hair root; its relaxed, initial state is shown in Figure 3 (a). As it contracts under parasympathetic activation, a traction force is generated that makes the body hair bristle up, as shown in Figure 3 (b).

However, industrially available natural fur does not have functioning arrector pili muscles, so an alternative method is necessary to make the body hair bristle. We found that vibration is useful as such an alternative. The vibration motor, which is typically embedded in a mobile phone, is inexpensive and also electrically safe. Thus disk-shaped vibration motors were attached to the reverse side of the epitheca of natural fur, as shown in Figure 4, and were then supplied with current to generate vibration. The size of this natural fur is approximately 2 cm in width and 25 cm in length. From our test results, the bristling effect was observed with several kinds of natural fur. The bristling effect is shown in Figure 2: Figure 2 (a) shows the initial state of the fur, while Figure 2 (b) shows the appearance of the fur after the bristling effect has occurred. As Figure 2 shows, this bristling effect causes an apparent visual change. Details of this effect are described in the following.

A prototype of the mechanism that produces the bristling effect consists of natural fur and a disk-shaped vibration motor, as shown in Figure 5 (a). The bristling effect described in this paper is such that the hair of the natural fur stands up when the fur is vibrated using the vibration motor, as shown in Figure 5 (b). Opossum's natural fur and the FM34F¹ disk-shaped vibration motor from T.P.C. are used in this system.

The bristling effect is realized by the following procedure. First, the body hair is stroked with one's hand in order to be compressed, as shown in Figure 5 (a). The appearance of this state is as shown in Figure 2 (a). This state remains as such under the condition of no external force being exerted. Secondly, the body hair bristles up when mechanical vibration, generated by the vibration motor, is applied to the fur, as shown in Figure 5 (b). The appearance of this state is as shown in Figure 2 (b). A standard voltage of 3.0 V is used to drive the motor. The direction of vibration depends on the direction of the internal weight rotation, as shown in Figure 4, and is parallel to the plane surface of the epitheca. Thus this mechanical vibration acts like the action of an arrector pili muscle.

3.1 Selection of Material
This bristling effect is not observed in all types of animal fur. In order to ensure that the conditions of the test are consistent throughout, a fur that always bristles consistently has to be selected. Thus, a test was conducted in order to select a material which can produce this bristling effect consistently.

Needless to say, artificial fur would be a good choice because it can be mass produced, is readily available, and has uniform characteristics. Additionally, using artificial fur in place of natural fur is more desirable from the point of view of animal protection. Unfortunately, results from our preliminary study showed that artificial fur was not able to produce consistent bristling effects. Thus we focus on using natural fur. The results of the tests on natural furs and the response time of the natural fur's bristling effect are described below.

¹ Specification of Vibration Motor (T.P.C FM34F): Standard Voltage 3.0 V / Standard Speed 13,000 rpm / Standard Current 100 mA or less / Vibration Quantity 1.8 G
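The prototype simply energizes the motor at its rated 3.0 V to trigger the effect. The sketch below shows how such an on/off drive could be scripted from a host computer, assuming the motor is switched through a transistor on a Raspberry Pi GPIO pin; this is a hypothetical setup for illustration, not the authors' actual hardware.

```python
# Hypothetical driver: switch the 3.0 V vibration motor through a transistor
# on a Raspberry Pi GPIO pin to trigger the bristling effect for a fixed time.
import time
import RPi.GPIO as GPIO

MOTOR_PIN = 18          # assumed wiring: GPIO18 -> transistor base -> motor

GPIO.setmode(GPIO.BCM)
GPIO.setup(MOTOR_PIN, GPIO.OUT, initial=GPIO.LOW)

def bristle(duration_s=1.0):
    """Energize the vibration motor long enough for the fur to bristle."""
    GPIO.output(MOTOR_PIN, GPIO.HIGH)   # motor on (rated 3.0 V supply)
    time.sleep(duration_s)              # bristling completes within ~0.5 s
    GPIO.output(MOTOR_PIN, GPIO.LOW)    # motor off; the fur stays bristled

try:
    bristle()
finally:
    GPIO.cleanup()
```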
Figure 5. State Transition of Bristling Effect.

The natural furs used for estimation are shown in Figure 6. This estimation includes 4 materials which have relatively uniform hair types. The appearances are shown in Figure 6; these furs are from (a) an opossum, (b) a rabbit, (c) a mink, and (d) a Tibetan lamb.

Figure 6. Natural Fur Used for Experiment.

Table 1. Mechanical Characteristics and Responses (unit: mm; R = reproducibility: O = high repeatability, X = no repeatability)
Material           Thickness   Diameter   Length   R
(a) Opossum        0.33        0.028      48       O
(b) Rabbit         0.53        0.026      36       X
(c) Mink           0.78        -          7        X
(d) Tibetan lamb   0.65        0.047      55       X

A test of the reproducibility of the bristling effect was conducted with the following procedure, which is similar to the previous one. First, one disk-shaped vibration motor was attached to the reverse side of the epitheca of the fur with double-stick tape, as shown in Figure 5 (a). Next, the fur was put on velour and the body hair was stroked with one's hand in order to be compressed, as shown in Figure 5 (a). Then a standard 3.0 V voltage was applied to the motor. Lastly, the reproducibility of the bristling effect was estimated by visual judgment.

The result of the reproducibility test is as shown in Table 1. Reproducibility is described as 'O', meaning high repeatability, or 'X', meaning no repeatability. Opossum's fur is the only natural fur that has a high repeatability of bristling. When the vibration is provided by one motor, approximately 10 cm of the strip of natural fur bristles.

It can be supposed that the bristling effect we have found is due to the mechanical structure of the natural fur. Thus, the thickness of the epitheca and the diameter and length of the body hair were measured in order to reveal the mechanical characteristics. The diameter is measured in micrometers. The result is as shown in Table 1. The values in the table are the average of 10 readings. There is no value for the diameter of the mink's body hair, as it is too small to be measured.

The results show that the epitheca of the opossum is less thick than that of the other 3 animals. In addition, although the Tibetan lamb has the greatest hair length, its hair is all bundled together, as shown in Figure 6 (d), which makes it more difficult for the bundled hair to "stand" up. On the other hand, the hairs of the opossum and the rabbit are not bundled, which makes it easier for their hair to "stand" up. Therefore, we can suppose that the thin epitheca and non-bundled fur of the opossum produce the highest repeatability of the bristling effect.

3.2 Response Time of Bristling Effect
One of the characteristics of the bristling effect that we have found is the bristling speed. In terms of mechanical engineering, mechanical findings are necessary for technological application. Thus, as a basic finding, the response time of this bristling effect is measured. This response time is the duration of the transition from the initial state shown in Figure 5 (a) to the bristling state shown in Figure 5 (b). Thus this response time can also be defined as the duration from the start time of supplying voltage to the vibration motor to the time of the hair reaching a stable state just after the bristling effect.
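Given a tracked displacement trace sampled at the camera frame rate, a response time of this kind can be estimated programmatically. The sketch below is our illustration (variable names and the tolerance are assumptions, not from the paper): it takes the frame at which motor voltage is detected as time zero and reports the time after which the displacement stays within a small band around its final, stable value.

```python
import numpy as np

def response_time(displacement_mm, fps=600.0, tolerance_mm=0.2):
    """Estimate bristling response time from a feature-point displacement trace.

    displacement_mm: 1-D array of displacements per frame, with index 0 taken
                     at the moment the motor voltage is detected.
    Returns the time (seconds) after which the trace stays within
    `tolerance_mm` of its final (stable) value.
    """
    x = np.asarray(displacement_mm, dtype=float)
    final = x[-int(0.1 * fps):].mean()          # average of the last 0.1 s
    outside = np.abs(x - final) > tolerance_mm  # frames still far from settled
    if not outside.any():
        return 0.0
    last_outside = np.max(np.nonzero(outside))  # last frame outside the band
    return (last_outside + 1) / fps

# Example with a synthetic trace that settles after roughly 0.25 s
t = np.arange(0, 1.0, 1 / 600.0)
trace = 5.0 * (1 - np.exp(-t / 0.08))
print(round(response_time(trace), 3))
```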
3.2.1 Experimental Setup
The response time was measured with a digital high-speed camera (Casio EXILIM EX-F1). The experimental setup is as shown in Figure 7 and Figure 8. A matrix and an array of LEDs are set up behind the fur material. The matrix is used to calibrate displacements of the featured points, whereby each cell is 5 mm x 5 mm in size. As for the LED array, one LED blinks after another in sequence at intervals of 10 ms in order to confirm the frame rate of the captured video. The LED array was controlled with an Arduino Duemilanove (ATmega168), and the LEDs start to blink when a voltage across the vibration motor is detected. An incandescent lamp was used as the lighting. As indicated above, a standard 3.0 V voltage was used, and one disk-shaped vibration motor was attached to the reverse side of the fur material, which was put on velour (Figure 7).

Figure 7. High-Speed Photography Setup.

Figure 8. Recording Setup and Displacement of Feature Points Used for High-Speed Tracking.

3.2.2 Measurement Procedure
5 recordings were conducted under the same conditions; each recording started before the vibration motor was activated and ended after the fur had reached its stable state. The recording specifications are as follows: the frame rate is 600 fps and the video size is 432 x 192 pixels, in landscape mode. The bristling effect was observed in every recording, and analysis was conducted after the recordings. Three featured points of observation were selected on the fur surface, as shown in Figure 8. A pyramidal implementation of the Lucas-Kanade method is used to track the feature points [25]. During the test, these featured points moved to the left side of Figure 8, with their movement directions and trajectories shown as arrows in Figure 8. The length of an arrow indicates the displacement of the featured point, while the positions of the points are described with the x and y axes shown in Figure 8. The actual displacement was obtained after calibration, and the transformation coefficients were (x, y) = (0.435, 0.385) millimeters per pixel.
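The tracking and calibration steps can be reproduced with OpenCV's pyramidal Lucas-Kanade tracker [25]. The following sketch is a simplified illustration: the per-axis millimeter-per-pixel coefficients come from the text, while the video file name, the point coordinates, and the window parameters are our assumptions.

```python
import cv2
import numpy as np

MM_PER_PX = np.array([0.435, 0.385])   # calibration coefficients (x, y), mm per pixel

cap = cv2.VideoCapture("bristling_600fps.avi")        # assumed file name
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Three feature points chosen manually on the fur surface (placeholder coordinates)
pts = np.array([[[120.0, 90.0]], [[160.0, 95.0]], [[200.0, 100.0]]], dtype=np.float32)
trajectory_mm = [np.zeros(2)]

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade optical flow between consecutive frames
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None,
                                                  winSize=(21, 21), maxLevel=3)
    shift_px = (new_pts - pts).reshape(-1, 2).mean(axis=0)   # mean shift of the points
    trajectory_mm.append(trajectory_mm[-1] + shift_px * MM_PER_PX)
    prev_gray, pts = gray, new_pts

cap.release()
trajectory_mm = np.array(trajectory_mm)   # cumulative (x, y) displacement in mm per frame
```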
Figure 9. Displacement of Measured Points from Initial Position with High-Speed Camera at 600 fps.

Figure 10. Initial Shifting of Measured Points After Providing Horizontal Vibration to Epitheca.

Figure 11. Result of FFT of Vertical Shifting Measured from Feature Point Tracking.

3.2.3 Result and Discussion
The temporal changes of the trajectories of the 3 featured points are as shown in Figure 9. The results of the 5 trials show a similar trend in the bristling effect. Details of one of these 5 trials are as follows. Time 0 is the time when a voltage across the vibration motor was detected, and the trajectories are plotted from this time. The left graph of Figure 9 describes the trajectory on the x axis, while the right graph of Figure 9 describes the trajectory on the y axis. Figure 10 is a larger-scale graph that shows the displacements between 0 and 500 ms. The left graph of Figure 9 shows that the duration for the bristling effect to happen is approximately less than 500 ms. The left graph of Figure 10 shows that the transition may be finished in approximately 300 ms.

As there is no displacement of the featured points in the period before the vibration motor starts, it is supposed that the bristling effect is caused by the mechanical vibration actuated by the disk-shaped vibration motor.

The oscillation of the featured points was observed as shown in Figure 10. After comparison with the captured video, we deduced that this is not a tracking error but is instead the oscillation of the fur surface, which continued after 500 ms. Frequency analysis was then conducted with the Fast Fourier Transform: 2,048 calibrated sampling points from 500 ms onwards were used, and a Hanning window was applied for the FFT. The result is as shown in Figure 11. The left graph of Figure 11 shows the normalized power spectrum up to the Nyquist frequency. A close-up is shown in the right graph of Figure 11, with the peak frequency at approximately 57 Hz. It is supposed that this peak is the oscillation frequency of the fur surface.
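The spectral analysis described above can be reproduced with a windowed FFT. The following sketch is our illustration, assuming the calibrated vertical-displacement trace produced by the tracking step; it reports the peak frequency of the post-500 ms oscillation.

```python
import numpy as np

def peak_frequency(y_mm, fps=600.0, n=2048, start_s=0.5):
    """Peak oscillation frequency of a vertical displacement trace.

    y_mm: per-frame vertical displacement in millimeters.
    Uses n samples starting at `start_s`, applies a Hanning window, and
    returns the frequency (Hz) of the largest non-DC spectral peak.
    """
    start = int(start_s * fps)
    segment = np.asarray(y_mm[start:start + n], dtype=float)
    segment = segment - segment.mean()             # remove the DC offset
    windowed = segment * np.hanning(len(segment))  # Hanning window
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fps)
    return freqs[np.argmax(spectrum[1:]) + 1]      # skip the DC bin

# Synthetic check: a 57 Hz oscillation sampled at 600 fps
t = np.arange(0, 5.0, 1 / 600.0)
y = 0.2 * np.sin(2 * np.pi * 57 * t)
print(peak_frequency(y))   # prints a value close to 57 (limited by bin width)
```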
4. Discussion
Our experiments showed that it is impossible to flatten the body hair using vibration: the bristling effect is not reversible with mechanical vibration alone. However, manual stroking smoothes out the fur easily. If this interface serves not only as an output device but also as an input device, the user strokes the fur naturally. Thus, in such an interactive system, the human reflex of instinctively reacting to the fur interface resolves this reversal issue in the bristling effect. Furthermore, it is relatively easy for capacitive sensors to detect the touch of a hand on the fur.

As mentioned above, the visible physical changes caused by the bristling effect serve as visual information presentation. On the other hand, the mechanical vibration serves as a tactile presentation method [21]. One of the goals of this technology is to increase the desirability and comfort of wearable technology for the user. It is not difficult to present tactile stimulation to a human, but through this method, using a single actuator, one can simultaneously provide two modes of sensation—both visual and tactile. This point is another advantage of our method.

5. Conclusion
In this paper, we proposed a texture control method using natural fur and a simple vibration motor as a wearable interface. This method, based on the bristling effect, allows the texture of natural fur to change instantly, and additionally serves as a soft user interface. The results of the estimation experiments demonstrated that opossum fur has the best response, with the body hair bristling within less than 500 ms and an oscillation frequency of 57 Hz. Since our current prototype requires the use of natural fur, the achieved bristling effect is highly dependent on the mechanical properties of the natural fur. Thus, we will further explore the mechanism of the bristling effect in the future using other materials.
6. References
[1] S. Mann, Wearable Computing: A First Step toward Personal Imaging, Computer, vol. 30, no. 2, pp. 25-32, 1997
[2] J. Rekimoto, K. Nagao, The world through the computer: computer augmented interaction with real world environments, Proceedings of the 8th annual ACM symposium on User interface and software technology, pp. 29-36, 1995
[3] H. Ishii, B. Ullmer, Tangible Bits: Towards Seamless Interfaces between People, Bits, and Atoms, Proceedings of CHI '97, pp. 234-241, 1997
[4] M. Orth, R. Post, E. Cooper, Fabric computing interfaces, CHI 98 conference summary on Human factors in computing systems, pp. 331-332, 1998
[5] D. De Rossi, F. Carpi, F. Lorussi, A. Mazzoldi, R. Paradiso, E.P. Scilingo and A. Tognetti, Electroactive fabrics and wearable biomonitoring devices, AUTEX Research Journal, vol. 3, no. 4, 2003
[6] T. Amemiya, J. Yamashita, K. Hirota, M. Hirose, Virtual Leading Blocks for the Deaf-Blind: A Real-Time Way-Finder by Verbal-Nonverbal Hybrid Interface and High-Density RFID Tag Space, IEEE Virtual Reality Conference 2004 (VR 2004), p. 165, 2004
[7] R.W. Lindeman, Y. Yanagida, H. Noma, K. Hosaka, Wearable vibrotactile systems for virtual contact and information display, Virtual Reality, 9, pp. 203-213, 2006
[8] H.F. Harlow, R.R. Zimmerman, Affectional responses in the infant monkey, Science, p. 130, 1959
[9] A. Wakita, M. Shibutani, Mosaic textile: wearable ambient display with non-emissive color-changing modules, Proceedings of the international conference on advances in computer entertainment technology (ACE), pp. 48-54, 2006
[10] J. K. Soon Teh, A. D. Cheok, R. L. Peiris, Y. Choi, V. Thuong, S. Lai, Huggy Pajama: a mobile parent and child hugging communication system, Proceedings of the 7th international conference on Interaction design and children, pp. 250-257, 2008
[11] Philips Electronics N.V., FABRIC DISPLAY, United States Patent No. US 7,531,230 B2, 2009
[12] K. Kushiyama, Fur-fly, Leonardo, vol. 42, no. 4, pp. 376-377, 2009
[13] H. Raffle, M.W. Joachim, J. Tichenor, Super cilia skin: An interactive membrane, CHI Extended Abstracts on Human Factors in Computing Systems, 2003
[14] M. Coelho, P. Maes, Sprout I/O: A Texturally Rich Interface, Tangible and Embedded Interaction, pp. 221-222, 2008
[15] M. Coelho, P. Maes, Shutters: a permeable surface for environmental control and communication, Tangible and Embedded Interaction (TEI '09), pp. 13-18, 2009
[16] A. Ueki, M. Kamata, M. Inakage, Tabby: designing of coexisting entertainment content in everyday life by expanding the design of furniture, Proc. of the Int. Conf. on Advances in Computer Entertainment Technology, vol. 203, pp. 72-78, 2007
[17] S. Yohanan, K. E. MacLean, The Haptic Creature Project: Social Human-Robot Interaction through Affective Touch, ACM SIGGRAPH 2007 Emerging Technologies, p. 3, 2007
[18] S. Yohanan, K. E. MacLean, The Haptic Creature Project: Social Human-Robot Interaction through Affective Touch, Proceedings of the AISB 2008 Symposium on the Reign of Catz & Dogs: The Second AISB Symposium on the Role of Virtual Creatures in a Computerized Society, vol. 1, pp. 7-11, 2008
[19] K. Wada, T. Shibata, T. Saito, K. Sakamoto, K. Tanie, Psychological and Social Effects of One Year Robot Assisted Activity on Elderly People at a Health Service Facility for the Aged, Proceedings of the 2005 IEEE International Conference on Robotics and Automation, 2005
[20] Y. Hashimoto, H. Kajimoto, Emotional touch: a novel interface to display "emotional" tactile information to a palm, ACM SIGGRAPH 2008 New Tech Demos, 2008
[21] A. Toney, L. Dunne, B. H. Thomas, S. P. Ashdown, A Shoulder Pad Insert Vibrotactile Display, Proceedings of the Seventh IEEE International Symposium on Wearable Computers, pp. 35-44, 2003
[22] C. Porth, K.J. Gaspard, G. Matfin, Essentials of Pathophysiology: Concepts of Altered Health States, Lippincott Williams & Wilkins, Chapter 60, 2006
[23] J.A. Helgren, Rex Cats: Everything about Purchase, Care, Nutrition, Behavior, and Housing, Barron's Educational Series Inc, 2001
[24] F. B. M. de Waal, Reconciliation and consolation among chimpanzees, Behavioral Ecology and Sociobiology, vol. 5, issue 1, pp. 55-66, 1979
[25] J.Y. Bouguet, Pyramidal implementation of the Lucas Kanade feature tracker: description of the algorithm, Intel Corporation, Microprocessor Research Labs, OpenCV Documents, 1999
Evaluating Cross-Sensory Perception of Superimposing
Virtual Color onto Real Drink:
Toward Realization of Pseudo-Gustatory Displays
Takuji Narumi, Graduate School of Engineering, The University of Tokyo / JSPS, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan, 81-3-5841-6369, narumi@cyber.t.u-tokyo.ac.jp
Munehiko Sato, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan, 81-3-5841-6369, sato@cyber.t.u-tokyo.ac.jp
Tomohiro Tanikawa, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan, 81-3-5841-6369, tani@cyber.t.u-tokyo.ac.jp
Michitaka Hirose, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan, 81-3-5841-6369, hirose@cyber.t.u-tokyo.ac.jp

ABSTRACT
In this research, we aim to realize a gustatory display that enhances our experience of enjoying food. However, generating a sense of taste is very difficult because the human gustatory system is quite complicated and is not yet fully understood. This is so because gustatory sensation is based on chemical signals whereas visual and auditory sensations are based on physical signals. In addition, the brain perceives flavor by combining the senses of gustation, smell, sight, warmth, memory, etc. The aim of our research is to apply the complexity of the gustatory system in order to realize a pseudo-gustatory display that presents flavors by means of visual feedback. This paper reports on the prototype system of such a display, which enables us to experience various tastes without changing their chemical composition through the superimposition of virtual color. The fundamental thrust of our experiment is to evaluate the influence of cross-sensory effects by superimposing virtual color onto actual drinks and recording the responses of subjects who drink them. On the basis of the experimental results, we concluded that visual feedback sufficiently affects our perception of flavor to justify the construction of pseudo-gustatory displays.

Categories and Subject Descriptors
H.5.2 [INFORMATION INTERFACES AND PRESENTATION]: User Interfaces – Theory and methods.

General Terms
Experimentation, Human Factors.

Keywords
Gustatory Display, Pseudo-Gustation, Cross-sensory Perception.

1. INTRODUCTION
Because it has recently become easy to manipulate visual and auditory information on a computer, many research projects have used computer-generated virtual reality to study the input and output of haptic and olfactory information in order to realize more realistic applications [1]. Few of these studies, however, have dealt with gustatory information, and there have been rather few display systems that present gustatory information. One reason for this is that gustatory sensation is based on chemical signals while visual and auditory sensation are based on physical signals, which introduces difficulties to the presentation of a wide variety of gustatory information.

Moreover, in the human brain's perception of flavor, the sense of gustation is combined with the senses of smell, sight, warmth, memory, and so on. Because the gustatory system is so complicated, the realization of a stable and reliable gustatory display is also difficult.

Our hypothesis is that the complexity of the gustatory system can be applied to the realization of a pseudo-gustatory display that presents the desired flavors by means of a cross-modal effect. In a cross-modal effect, our perception of a sensation through one sense is changed due to other stimuli that are simultaneously received through other senses. The McGurk effect [2] is a well-known example of a cross-modal effect: the visual input from the articulatory movements of lips saying "gaga" was dubbed over by auditory input saying "baba". Subjects who were asked to report what they heard reported that they heard "dada", which shows that seeing the movement of the lips can interfere with the process of phoneme identification.

By using this effect, we may induce people to experience different flavors when they taste the same chemical substance. For example, although every can of "Fanta" (produced by the Coca-Cola Company) contains almost the same chemical substances in almost the same combination, we appreciate the different flavors of orange, grape, and so on. It is thus conceivable that the color and scent of a drink have a crucial impact on our interpretation of flavor, which is
not based entirely on the ingredients of the drink.
Therefore, for the realization of a novel gustatory display system,
we need to establish a method that permits people to experience a
variety of flavors not by changing the chemical substances they
ingest, but by changing only the other sensory information that
accompanies these substances.
In this paper, we first introduce the knowledge from conventional
studies about the influence that other senses have on gustation.
Next, based on this knowledge, we propose a method that changes
the flavor that people experience from a drink by controlling the Figure 1: Aji Bag and Coloring Device method (ABCD
color of the drink with a LED. We then report the results of an method)
experiment that investigates how people experience the flavor of a we eat food that we find displeasing. We also know that without
drink with color superimposed upon it, by using the proposed olfaction, we hardly experience any taste at all. Moreover, as
method and the same drink colored with dye. Finally, we evaluate Prescott explained, the smells we experience through our nose
the proposed method by comparing these results and discuss its stimulates gustatory sensation as much as the tasting by our tongue
validity and future avenues of research. [5]. Rozin reported that when people were provided with olfactory
stimulation, they said that the sensation evoked in their mouth was
2. GUSTATORY SENSATION AND CROSS- a gustatory sensation, even if the stimulation itself did not evoke
such a sensation [6]. Furthermore, there is a report that 80% of
SENSORY EFFECT what we call "tastes" have their roots in olfactory sensation [7].
The fundamental tastes are considered the basis of a visual system
for the presentation of various tastes such as RGB. There are On the other hand, it is well known that humans have a robust
several theories regarding the number of fundamental tastes. The tendency to rely upon visual information more than other forms of
four fundamental tastes theory includes sweet, salty, sour and sensory information under many conditions. As in the
bitter, while the five fundamental tastes theory adds a fifth taste abovementioned study of gustation, many studies have explored the
sensation, umami, to these four tastes. Moreover, a number of effect of visual stimuli on our perception of "palatability". Kazuno
research reports have indicated that "fundamental tastes do not examined whether the color of a jelly functioned as a perceptual cue
exist because gustation is a continuum", or "the acceptor sites of for our interpretation of its taste [8]. His survey suggests that the
sweetness or bitterness are not located in one place [3]." It can thus color of food functions as a perceptual cue more strongly than its
be said of this crucial idea that there is no definition of fundamental taste and smell.
tastes that is accepted by all researchers [4].
These studies, then, indicate the possibility of changing the flavor
In any case, what is commonly called taste signifies a perceptual that people experience with foods by changing the color of the
experience that involves the integration of various sensations. We foods. It is not difficult to quickly change the color of a food, and
perceive "taste" not just as the simple sensation of gustatory cells the three primary colors, which can be blended to create all colors,
located on our tongue, but rather, as a complex, integrated are well-known. Thus, if we can change the experience of taste by
perception of our gustatory, olfactory, visual, and thermal changing the color of a food, this is the key to the creation of a
sensations, as well as our sense of texture and hardness, our pseudo-gustation display, because it is easy to present visual
memory of other foods, and so on. When we use the common word information. Our research, therefore, focuses on a technological
flavor, then, we are in fact referring to what is a quite multi-faceted application of the influence of colors on gustatory sensation.
sensation.
It is therefore difficult to perceive gustatory sensation in isolation 3. PSEUDO-GUSTATORY DISPLAY
from any other sensation unless we take special training or have a BASED ON CROSS-SENSORY EFFECT
remarkably developed capacity. This suggests, however, that it is
EVOKED BY SUPERIMPOSITION OF
possible to change the flavor that people experience from foods by
changing the feedback they receive through another modality. VIRTUAL COLOR ONTO ACTUAL
While it is difficult to present various tastes through a change in DRINKS
chemical substances, it is possible to induce people to experience In this paper, we propose a method that can induce people to
various flavors without changing the chemical ingredients, but by experience various tastes only through the controlled
changing only the other sensory information that they experience. superimposition of color upon the same drink by means of a LED.
To do this, we invented the Aji (Aji means taste in Japanese) Bag
The reason for this is that the olfactory sense, above all other
and Coloring Device (ABCD) (Fig. 1) as a means of changing the
senses, is most closely related to our perception of taste. This
color of a drink without changing its chemical composition. In the
relationship between gustatory and olfactory sensation is
ABCD method, a small plastic bag filled with a liquid to be drunk
commonly known, as illustrated by our pinching our nostrils when
drink in our implementation because it would be safe if our
subjects happened to ingest it. Our prototype system of a pseudo-
gustatory display using the ABCD method is shown in Figure 3.

4. EVALUATING CROSS-SENSORY
PERCEPTION OF SUPERIMPOSED
VIRTUAL COLOR

4.1 Purpose of Experiment


To evaluate the proposed method’s effectiveness at inducing
Figure 2: Wireless LED node. people to experience various flavors, we performed an experiment
to investigate how people experience flavor in a drink with
superimposed color, by comparing the results of the proposed
method with those in which the drink was colored with dye.
We formulated a middle taste beverage whose taste was midway
between the taste of two commercially sold drinks. We asked the
subjects to drink the middle taste beverage and a middle taste
beverage that had been colored. The purpose of this experiment
was to examine how the subjects would interpret different flavors
when the color was changed, and to examine how they would
interpret a scented drink and a colored drink.
Figure 3: Prototype System of Pseudo-Gustatory
Display. 4.2 Experiment Procedure
is attached to a straw, and then the plastic bag is put into white- For the beverage used, we obtained the data for its sugar content
colored water into which color is superimposed by means of a and its ratio of acid to sugar from that given in the Dictionary of
wireless LED node. . Fresh Fruit Juices and Fruit Beverages for orange juice, apple juice
and grape juice [10]. According to a questionnaire regarding these
The Particle Display System [9] proposed by Sato et al. is used as a
three kinds of juices which were given to 23 people, there was no
coloring device. This system can be installed by distributing
specific difference between the relative gustatory images of orange
physically separated pixels into a large and complicated
juice and apple juice. For this reason, we chose orange juice and
environment. This system consists of hundreds of full-color and
apple juice as the objects of imitation.
wireless LED nodes. The wireless capability allows each node to be
freely moved without the distance limitation involved when wire For this experiment, we created a drink whose level of sweetness
cables are utilized. Users are therefore able to design a uniquely and sourness was midway between that of orange juice and apple
arranged pattern in full-color in the real world by distributing and juice, which we called "the intermediate drink." We made this
controlling the smart nodes. A wireless LED node (Fig. 2), which intermediate drink from sucrose and citric acid, using a sugar
as a pixel of Particle Display System was used as a coloring device content of 12% and a citric acid concentration of 0.43%. Orange
by putting it into the water in a waterproof pack or putting it in a juice, apple juice, and the intermediate drink all had approximately
glass with a lid on it. the same degree of sweetness, though the orange juice was the
This LED node consisted of a wireless communication module sourest. We then prepared three kinds of scented drinks and three
(SNODE2, by Ymatic Ltd.), a full-color LED, and a kinds of colored drinks based upon the intermediate drink.
microcontroller (PIC). The LED node can be connected to input After subjects had drunk the intermediate drink and the three kinds
devices such as acceleration sensors. It works as an autonomous of colored drinks, they were asked to compare each colored drink
processing system that changes color by means of interaction with with the intermediate drink and to plot their experience of the taste
users when data from the connected input devices are processed of the colored juice on plotting paper. The plotting paper had two
within it. axes: one for sweetness and one for sourness. We defined the origin
Because certain liquids are too clear to diffuse the light, we could of the plotting paper as the taste of the intermediate drink. We
not change the color of the liquid to be drunk by direct exposure to prepared three kinds of colored drinks as objects of comparison: an
the LED. We therefore used white-colored water and took a small imitation apple juice drink, an imitation orange juice drink, and a
plastic bag filled with the liquid to be drunk, attached the bag to a drink which had an unfamiliar color. To eliminate any effect of the
straw, and then put the bag into the white-colored water. This order in which the juices were drunk, the order was randomly
white-colored water served as a medium for the diffusion of light assigned by the experimenters. In addition, subjects drank water in
and allowed the appearance of the drink to be changed to arbitrary the intervals between their drinking of the experimental drinks.
colors. Water with coffee cream was used as the white-colored
Figure 4: Colored Drinks by ABC Method (Orange, Yellow, Green).

Figure 5: Aji Bag and Colored Water method (Left: Outline of method, Center: Aji Bag, Right: Outside of Drink with Bag in).

Figure 6: Colored Drinks by ABCD Method (Orange, Yellow, Green).

Figure 7: Average Scores of Sweetness / Sourness When Subjects Took Colored Drinks with Dyes.

Figure 8: Average Scores of Sweetness / Sourness When Subjects Took Colored Drinks with Proposed Method.

Table 1: Flavors of the Drink Colored with Dyes Felt by Subjects (the number of answers is written in parentheses)
Yellow          Orange        Green
Lemon (8)       Orange (15)   Unknown (9)
Pineapple (3)   -             Sour (3)
Other (8)       Other (4)     Other (8)

Table 2: Flavors of the Colored Drink with the Proposed Method Felt by Subjects (the number of answers is written in parentheses)
Yellow          Orange       Green
Lemon (9)       Orange (8)   Melon (12)
Pineapple (3)   Peach (3)    Apple (2)
Other (7)       Other (8)    Other (5)

In addition, subjects were asked to interpret the flavor they experienced for each drink as they were drinking them. To prevent subjects from knowing too much about the purpose of the experiment, they were given their survey form after they had drunk all the drinks. An experiment using our method with colored drinks was performed with 19 subjects, and an experiment using drinks colored with dye was performed with another 19 subjects.

4.3 Pseudo-Gustation with Dyes
To evaluate the influence of color on the interpretation of flavor, three kinds of colored drinks were prepared that used the intermediate drink with colored water as objects for comparison. We selected three colors: orange, yellow, and green (Fig. 4). The familiar colors orange and yellow were selected because orange juice and apple juice were the objects of imitation for the intermediate drink, and green was selected for one of the drinks because of its unfamiliar color.

We used dyes for the addition of color to the drinks. Because the taste, sweetness, and sourness of the intermediate drink could change when it was mixed with dye, a technique was needed that would change the appearance of the drinks without the addition of dyes. To this end, we invented the Aji Bag and Colored Water method (ABC method) (Fig. 5). In the ABC method, a small plastic bag filled with the liquid to be drunk is attached to a straw and put into a plastic bag inside the water that is colored with dye.

4.4 Pseudo-Gustation with Proposed Method
Three kinds of colored drinks that used the intermediate drink with the ABCD method were prepared as objects of comparison. An experimenter adjusted the color of the LED so that it would resemble the water colored with dyes in 4.3 and then superimposed that color onto the white-colored water (Fig. 6).
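How the experimenter's color adjustment in the ABCD method might be scripted is sketched below. The paper does not describe the wireless LED node's command protocol, so the send_color function, the node address, the target RGB values, and the brightness scale are all hypothetical placeholders rather than the authors' implementation.

```python
# Hypothetical sketch: reproduce a dye color on the wireless LED node used in
# the ABCD method. send_color() stands in for whatever command the node accepts.

TARGET_DYE_COLORS = {            # rough RGB guesses for the dyed reference drinks
    "orange": (255, 140, 30),
    "yellow": (250, 220, 40),
    "green":  (60, 200, 80),
}

def send_color(node_id, rgb):
    """Placeholder for the wireless command that sets the LED node's color."""
    print(f"node {node_id} <- RGB{rgb}")

def imitate_dye(node_id, drink, scale=0.8):
    # The white-colored water diffuses the light, so a brightness scale factor
    # (tuned by eye, as the experimenter did) may be needed.
    r, g, b = TARGET_DYE_COLORS[drink]
    send_color(node_id, (int(r * scale), int(g * scale), int(b * scale)))

imitate_dye(node_id=3, drink="green")
```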
4.5 Results 4.6 Discussion
Because the subjects plotted the sweetness and the sourness of the There were a few subjects who replied that the colored drinks tasted
drinks according to their subjective standards, there was no point in the same as the intermediate juice. This clearly showed that the
using the distances between plane coordinates in our evaluation. We taste that people experience can be changed by altering the color of
therefore assigned scores based on the direction from the origin to what they drink. Also, the method of coloration does not have much
the points that were plotted by subjects. In particular, we assigned a of an impact on the quality of the cross-sensory effect in the
value of +1 to a score for sweetness/sourness when the point interpretation of flavor. These results confirmed that the proposed
plotted by a subject was in a positive direction from the origin on method is able to evoke the cross-sensory effect in the
the sweetness/sourness axis, and we assigned a value of -1 when a interpretation of flavor, and that it can be used in the realization of
point was plotted in a negative direction on that same axis. We a pseudo-gustatory display system.
assigned a value of 0 to a sweetness/sourness score when a point
While 79% of the subjects experienced an orange flavor from a
was plotted directly on the axis of sweetness/sourness.
drink that was colored orange with dyes, only 42% of the subjects
4.5.1 Colored Drink with Dyes experienced an orange flavor from a drink colored orange by means
When subjects tasted the drinks colored with dye, they responded of a LED. In addition, almost all the subjects could not identify the
that the colored drink had the same flavor as the intermediate juice flavor of the drink that was colored green with dyes, and 63% of
in 5 out of 57 trials (8.8%). The variances of the evaluated scores them experienced the flavor of melon from a drink that was colored
for the change in sweetness/sourness were around 0.8 and there green by means of a LED. This result is attributed to the turbidity
were no significant differences between the groups. (Fig. 7). These of the liquid. Orange juice is normally turbid, and the drink that
are very large variances. In addition, this particular tendency was was colored orange with dye was also cloudy. The drink that was
not found in the relative relationships among the three plotted colored orange with a LED, however, had a higher degree of
points. transparency than the orange juice. Similarly, there is no turbid
green drink in the marketplace, and melon soda, (which is popular
In response to the question "What was the drink that you just had?," in Japan) is a well-known green drink that is clear. Multiple
15 subjects replied "Orange" after they drank the orange colored subjects commented that they thought the drink colored with a LED
drink and 8 subjects replied "Lemon". After drinking the yellow was a carbonated drink. The drink colored with the LED tended to
colored drink, 3 subjects replied be associated with a carbonated drink that is normally transparent.
"Pineapple" and 9 subjects replied "Unknown." After drinking the It is difficult to mimic a liquid with a low degree of transparency
green colored drink, 3 subjects replied "Sour". (Table 1) Many using white-colored water and a LED. We consider that these
subjects could not offer a definite answer after tasting the green differences in visual appearance led to the differences in the
drink, whose color is unfamiliar in drinks. On the other hand, many experimental result.
subjects said that they tasted a specific flavor after tasting the Because the LED node consumed the battery power too quickly, on
drinks whose color only imitated that of a well known juice. These occasion the color of drink changed in front of a subject in the
results show that visual feedback is able to evoke a pseudo- midst of the experiment. The LED node can sustain the original
gustatory sensation of flavors, even if it is not able to change one’s color for 40 minutes, but color of the node changes to red when the
sensitivity to fundamental tastes. battery is low. For this reason, the logistics of battery duration and
4.5.2 Colored Drink with Proposed Method drainage need to be improved upon.
On the other hand, when subjects tasted the colored drink with the
proposed method, they responded that it had the same flavor as the 5. CONCLUSION
intermediate juice 4 times out of 57 trials (7.2%). The variances of In this research, we propose a novel pseudo-gustatory display that
the evaluated scores for the change in sweetness/sourness were can induce people to experience the same drink as having a variety
around 0.7 and there were no significant differences between the of tastes. This is done without changing the drink’s chemical
groups (Fig. 8). These, too, are very large variances. In addition, composition, bur rather, through the superimposition of virtual
this specific tendency was not found in the relative relationships color onto the drink; this method enhances our experience of
among the three plotted points. enjoying food. We evaluated the cross-sensory effect on flavor
interpretation that was evoked by our prototype system that
Concerning the question "What was the drink that you just had?,"
employs visual feedback with a full-color LED. The results of our
after tasting the orange colored drink, 8 subjects replied "Orange"
experiments show that we cannot change how people experience
and 3 subjects replied "Peach". After tasting the yellow colored
fundamental tastes by means of visual feedback. However, the
drink, 9 subjects replied "Lemon" and 3 subjects replied
results also show that visual feedback can influence the manner in
"Pineapple". After tasting the green colored drink, 12 subjects
which people interpret the flavors they experience and that the
replied "Melon” and 2 subjects replied "Apple" (Table 2). Many
proposed system works well as a pseudo-gustatory display.
subjects experienced a specific flavor after tasting the drinks
colored with the LED. These results show that the method of Because the coloring of drinks with dyes is different from coloring
coloration does not have much impact on the quality of the cross- them with a LED in terms of turbidity, the results confirmed that
sensory effect in the interpretation of flavor. the proposed method is not good for imitating certain types of
drinks. However, the results also show that the coloring method
does not have a great impact on the quality of the cross-sensory
effect during flavor interpretation. From these results, we concluded
that the proposed method is able to evoke a cross-sensory effect
during flavor interpretation and can be used to realize a pseudo- [5] Prescott J., Johnstone V., Francis J.: Odor–Taste Interactions:
gustatory display system. Effects of Attentional Strategies during Exposure, Chemical
Future work in this area will include improvement of the battery Senses 29 (2004), pp.331–340.
supply of the LED coloring device and development of a technique [6] Rozin P,: "Taste-smell confusion”and the duality of the
that will enable a pseudo-gustatory display to change tastes olfactory sense. Perception and Psychophysics, Vol. 31
interactively. (1982), pp. 397-401.
[7] Ichikawa K., Kankaku Kakuron [the Particulars about
6. ACKNOWLEDGMENT Sensation]: Aji Mikaku, Kan’nou Kensa Seminar Textbook
This work was supported by a Grant-in-Aid for Young Scientists [Textbook for Sensory Testing of Gustation Seminar], 1960.
(A) (21680011). [8] KAZUNO C., WATABE E., FUJITA A., MASUO Y.,
Effects of Color on the Taste Sense of Fruit Flavored Jelly,
7. REFERENCES Bulletin of Jissen Women's University, Faculty of Human Life
[1] T. Nakamoto and H. P. D. Minh: “Improvement of olfactory Sciences 13413244 Jissen Women's University 2006.
display using solenoid valves,” Proc. IEEE Virtual Reality [9] Munehiko SATO, Yasuhiro SUZUKI, Atsushi HIYAMA,
2007, pp. 179.186 (2007) Tomohiro TANIKAWA, Michitaka HIROSE: “Particle
[2] H. McGurk and J. MacDonald, Hearing lips and seeing voices. Display System – A Large Scale Display for Public Space – ”
Nature, 264, 746.748, 1976. ICAT 2009, Lyon, France, December, 2009
[3] Damak S, Rong M, Yasumatsu K, Kokrashvili Z, Varadajan [10] Yoshimura I., et al., The Dictionary of Fresh Fruit Juices and
V, Zou S, Jiang P, Ninomiya Y, and Margolskee R. Detection Fruit Beverages 1, 2. Asakura Publishing (1997), Nihon Kaju
of sweet and umami taste in the absence of taste receptor t1r3. Kyoukai.
Science, No. 301, pp. 850-853, 2003.
[4] Sakai N., Saito S., Mikaku-Kyukaku [Gustation & Olfaction],
Section 2, Koza Kankaku Chikaku no Kagaku [Lecture on
Science of Perception], Asakura Publishing, pp. 72-114.
The Reading Glove: Designing Interactions for
Object-Based Tangible Storytelling
Joshua Tanenbaum, Karen Tanenbaum, Alissa Antle
School of Interactive Arts + Technology
Simon Fraser University
350 - 13450 102 Avenue
Surrey, BC V3T 0A3 Canada
{joshuat, ktanenba, aantle}@sfu.ca

ABSTRACT
In this paper we describe a prototype Tangible User Interface (TUI) for interactive storytelling that explores the semantic properties of tangible interactions using the fictional notion of psychometry as inspiration. We propose an extension of Heidegger's notions of "ready-to-hand" and "present-at-hand", which allows them to be applied to the narrative and semantic aspects of an interaction. The Reading Glove allows interactors to extract narrative "memories" from a collection of ten objects using natural grasping and holding behaviors via a wearable interface. These memories are presented in the form of recorded audio narration. We discuss the design process and present some early results from an informal pilot study intended to refine these design techniques for future tangible interactive narratives.

Categories and Subject Descriptors
H.5.2 [Information Systems]: User Interfaces – input devices and strategies.

General Terms
Design

Keywords
Interactive Narrative, Tangible User Interfaces, Wearable Computing, Object Stories

1. INTRODUCTION
Abe Sapien picks up a discarded weapon from the wreckage. From across the room, Agent Manning snaps at him "Hey, Fish-Stick! Don't touch anything!" Abe regards him with bemused tolerance.
"But I need to touch it," he says, "to see."
"To see what?"
Abe runs his hand along the blade. "The past, the future…whatever this object holds."
-Transcribed and paraphrased from Hellboy [7]

In the 2004 film Hellboy, the character of Abe Sapien possesses the ability to read the "memories" of objects by touching them with his hands. This paranormal ability, known as psychometry or object reading, has numerous occurrences in films, novels, comics, and games. The idea of being able to extract the history and future of everyday objects is a compelling one, with potent narrative implications. Imagine being able to experience the history of a fragment of the Berlin Wall or the spacesuit worn by Neil Armstrong during his first moonwalk. While this notion remains largely relegated to the realm of fiction, tangible user interfaces (TUIs) make it possible to author interactive stories that draw on the idea of psychometry as a metaphorical context for interaction.

In this paper we describe the Reading Glove: a prototype wearable user interface for interacting with Radio Frequency Identification (RFID) tagged objects in a tangible interactive narrative system. The Reading Glove extends the sensory apparatus of the interactor into a realm of meaning and association, simulating the experience of revealing the hidden "memories" of tagged objects by triggering digital events that have been associated with them. An interactor augmented with the Reading Glove need only touch a tagged object in order to experience a narrative tapestry of its past uses.

Previous work combining tangible computing with interactive narrative has emphasized the technical and design challenges of the hardware, while providing relatively little insight into the experience of narrative when mediated by a collection of objects. In this study, we explore the potential of tangible interactions to increase a reader's awareness of story objects as narratively meaningful. We first consider the relationship between objects and narrative, before discussing the ways in which existing prototype tangible storytelling systems have used objects. The central theoretical construct of our work is the notion of semantically present objects. To explicate this idea we propose a new interpretation of Heidegger's notions of "present-at-hand" and "ready-to-hand". We then discuss the design challenges of constructing the Reading Glove system. We close with a discussion of a pilot user study and consider the implications of this work for future tangible storytelling systems.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Augmented Human Conference, April 2–3, 2010, Megève, France.
Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.
2. OBJECTS AS STORIES
Every object in our lives has a story to tell. The relationship between objects and stories is one with a rich history. People use collections of books, movies, artwork, and other objects to communicate and define their identities and personalities. Kleine et al. write:

Possessions to which there is attachment help narrate a person's life story; they reflect "my life." One kind of strong attachment reflects a person's desirable connections with others. For example, one person's photographs signify "people who were important to me at one time in my life," a daughter's ring portrays her mother's love, and another person's piece of furniture reflects his family heritage. Another kind of attachment portrays key aspects of a person's individuality…In this way, attachments help narrate the development of a person's life story [11].

People use possessions and personal artifacts to construct personal narratives [10]. Objects also allow people to communicate across social, cultural, and linguistic divides. In sociology there is a notion of boundary objects: artifacts that exist between two different worldviews. Boundary objects are sites of negotiation between opposing perspectives, and allow members of different groups to translate between a familiar view and an alien one [16].

In cultural heritage and museum studies, collections of artifacts are assembled as touchstones for preserving historical knowledge. Personal objects are often used for memory elicitation in the preservation of cultural knowledge. The Australian Migration Heritage Center encourages the aging members of post-war immigrant families to construct personal stories out of their meaningful objects and documents [18]. These "object stories" are part of a broader exploration of movable heritage, which they define as "any natural or manufactured object of heritage significance" [18]. By using objects from their lives, participants are able to communicate and preserve personal stories that might otherwise be lost.

Object stories have artistic and entertainment significance as well. Myst [6], one of the most significant early narrative games, revealed its story through meaningful collections of objects and narratively rich environments. Artist and writer Nick Bantock has written several books investigating the narrative implications of collections of esoteric items. In The Museum at Purgatory, Bantock uses unusual objects to conjure an image of a possible afterlife, while in The Egyptian Jukebox he composes assemblages of tantalizing objects as clues to an extended riddle [1-2]. In 2008, Rob Walker and Joshua Glenn started the Significant Objects Project. They hypothesized that investing an object with fictional meaning would increase its material value. To test this theory they purchased inexpensive objects from thrift stores, and invited a group of volunteer writers to compose a piece of fiction for each object. Each object and story was then auctioned off online [20]. In this project the objects and stories existed in a dialogue with each other, with fiction arising from objects and imbuing them with shades of meaning.

In each of these cases, objects are more than simply utilitarian items with a functional purpose. Instead, they are gateways into a web of human associations and meanings. The above examples indicate the potential of object-based stories to evoke deeply personal narrative associations, in effect triggering unconsciously embedded narrative scripts. Newman argues that humans are predisposed to understand things in terms of narrative [15]. He describes this predilection for narrative in terms of a set of species-wide archetypal narrative scripts embedded in the human psyche [15].

It is the objects themselves that are central to the creation of rich narrative meanings in these stories. We contend that any narrative system seeking to use object associations to evoke a story needs to foreground the objects as semantically meaningful. Stories told through objects have the potential to engage senses not ordinarily invoked in traditional storytelling experiences. Touch, taste, and smell are currently underutilized for the telling of stories and their potential as additional channels for narrative information remains unexplored.

3. PREVIOUS WORK
3.1 Other Systems
There have been several attempts to merge research in interactive narrative with research in tangible interaction. One popular approach has been to distribute narrative "lexia" – modular fragments of a larger story or stories – across a series of tangible objects. Holmquist et al. describe an object-based tangible storytelling system in which readers used a barcode scanner to retrieve video clips in a narrative puzzle [9]. This system only had five short video clips: two associated with specific objects from the story, and three associated with generic tokens. The authors claim that the goal of the interaction was to heighten the user's sense of involvement in the story, but indicate that the small number of story fragments was a severely limiting factor.

Mazalek et al. created a tangible narrative system called genieBottles in which readers open glass bottles to "release" trapped storytellers (genies) which reveal fragments of narrative information [14]. As with the work of Holmquist et al., the authors stated that the goal of the research was to allow computer stories to bridge the gap from the digital into the physical environment. However, physical interaction was limited to opening and closing the tops of three glass bottles, and it is unclear what role, if any, these served in the story beyond being containers for the narrators.

Both of these systems reduce their objects to the role of generic event triggers. In some contexts, the use of more generic tokens allows the reader to imagine her own story within the system. Budd and Madej designed PageCraft, a tangible narrative system in which children created animated digital stories using RFID tagged blocks on a physical game board [4, 12]. In their prototype, the tangible objects took a generic form in order to prevent their design from interfering with the creative process of the children using them. The system allowed children and parents to tell their own stories using the physical tokens to "record" the narrative into a digital animated sequence.

Mazalek et al. made a similar design decision when creating the graspable "pawns" for their Tangible Viewpoints project. They write "the abstract manner in which these figures evoke the human form allows them to take on different character roles given different sets of story content" [13]. In the Tangible Viewpoints project, these abstracted pawns were used to access
different character perspectives in a multi-viewpoint story. Each pawn represented a specific character, which would be surrounded by projected segments of associated narrative information. Interactors could access this information through the use of a small "lens-like" tangible object. In both PageCraft and Tangible Viewpoints, the objects themselves were designed to be abstract representations of the system's digital information.

In other tangible narrative systems, the relationship between the physical interactive items and their associated digital representations is less clear. The RENATI project places the bulk of the physical representation into a large "statue". The interactor stands in front of the statue while experiencing video clips associated with three different colored RFID tags [5]. Interaction with RENATI involves placing specific tags on an RFID reader (embedded in a clear acrylic hand) when prompted by the system. If the interactor selects the wrong tag, then the system presents a montage of conflicting perspectives on the story. In this case, the interaction is limited to deciding to obey the system or not, and is accomplished by essentially pushing a button.

These prototypes all focus on the mapping of tangible object to system outcome, which tends to emphasize the system function of the object rather than the narrative meaning of the object. In each of these examples, the link between the narrative information and the tangible objects is primarily utilitarian. Whether by design, or by designer oversight, the objects in these prototypes are functional first, aesthetic second, and semantic a distant third (or not at all). It appears that the objects in these prototypes really just function as physical buttons, activating narrative information that is often only loosely connected to the objects themselves.

We contend that one of the unique affordances of an object-based tangible narrative is the ability to emphasize each object as a site for embodied narrative meaning. In each of the examples above, the objects are gateways to meaning, rather than loci of meaning. This is in part due to the limitations of the technology employed in their creation and in part due to a failure to frame the interactions with the objects in a way that emphasized their physicality or their specific role within the narrative.

3.2 Theoretical Background
In this paper we propose a new approach to tangible object-based narratives that more closely couples the meaning of the object with the meaning of the story. This involves rendering the tangible objects semantically present. To understand what we mean by this, it is necessary to look at some of the theoretical and philosophical underpinnings of tool use and tangible interaction.

In Where the Action Is, Paul Dourish discusses Heidegger's notions of ready-to-hand and present-at-hand [8]. Dourish interprets the notion of present-at-hand to refer to situations in which tools "breakdown", suddenly becoming the focus of our attention. He contrasts this against the notion of ready-to-hand, wherein tools disappear from our perceptions and serve as invisible extensions of ourselves. The canonical example of these ideas is of a carpenter using a hammer. As long as the interaction is proceeding smoothly the hammer is considered ready-to-hand, seamlessly augmenting the ability of the carpenter to perform the task. However, should the carpenter slip and miss the nail or hit his thumb, the hammer "surfaces" and becomes present-at-hand: an awkward tool which is not performing properly and thus becomes the object of its user's attention.

To put this in a different context, it is possible to productively map Heidegger's notions onto Bolter and Grusin's concepts of transparent immediacy and hypermediation [3]. In their writing, interactions with mediated experiences exist in a state of immediacy, unless something happens to jolt the viewer into an awareness of the mediated nature of the experience, which they term hypermediation. Therefore, immediacy is a form of being ready-to-hand while hypermediation is akin to present-at-hand. This oscillation between two binary levels of awareness is sufficient for understanding functional tools, and for understanding passively mediated interactions, but tangible interactions – particularly those in which the tangible interface is a site of meaning – do not fit cleanly into this model.

We contend that it is necessary to reexamine these notions when attempting to understand the workings of tangible and embodied interfaces. In particular, we think that these notions do not account for the ways in which objects exist at an intersection of potential meanings. The two states described represent functional extremes: either invisibly functioning or presently malfunctioning. We think that there is a third, related mode of interacting with objects that is differentiated along semantic lines instead of functional lines. For the sake of discussion, let us call this notion "present-at-mind".

This idea of present-at-mind encompasses the ways in which we slip between different associative awarenesses while interacting with an object or tool. We argue that this notion of present-at-mind may be used to describe any situation in which an awareness of the tool as a locus of meaning occurs.

Thus, from a first-person perspective, I can use a hammer to drive nails and as long as I do not slip or hit myself it will remain invisibly ready-to-hand. But what if I become aware of the wear of the hammer's grip, which in turn puts me in mind of my father, to whom the hammer once belonged? What if this calls my attention to a place where he carved his initials in the handle? The hammer has not broken down as a functional tool, but is no longer an invisible extension of my hand. It has shifted into a state of being present-at-mind, due to a web of associative entanglements in which it exists, rather than to a breakdown of functionality. These entanglements are unique to this particular tool: a different hammer would not evoke the same reaction. In this case the hammer is not just a stand-in for any hammer or an extension of the body, but instead a specific hammer with a specific story to tell.

This awareness does not exist in isolation from the other two Heideggerian conditions. Certain types of breakdown can trigger this awareness: the roughness of the hammer grip wearing against the palm is sufficient to interrupt the flow of the work, but once that interruption occurs, the mind is free to explore a range of awareness and associations surrounding the tool. In this case, we would suggest that one of the roles of breakdown is as a possible gateway into a present-at-mind awareness that extends beyond the moment of breakdown.
In TUI research, one of the canonical properties of tangibles is a meaningful coupling of physical and digital representations [19]. In this case, the binary notions of ready-to-hand and present-at-hand become problematic as the operation of the tangible object as an interface device often involves paying attention specifically to the object. The incorporation of a third semantic vector allows this model to account for the relationship between physical and digital representations in a tangible interface. When the tangible is present-at-mind, it exists in the mind of the reader as a meaningful physical representation; however, as an interface device it remains ready-to-hand as a functional physical stand-in for its associated digital representations.

4. DESIGN PROCESS
In order to explore these theoretical ideas within a design space, we developed the Reading Glove. The intent of this system was to create an interactive object-based narrative and an interface that leveraged natural exploratory behaviors. These behaviors support the present-at-mind awareness of the relationship between the objects and their associated narrative information.

4.1 Selecting the Objects
We had several high-level design goals for the narrative. One of our central critiques of previous object-based narrative systems is a broad tendency toward using generic objects with few intrinsic narrative associations of their own. To address this, we resolved to write a narrative that existed in both a textual form and within a specific collection of meaningful objects. We set out to write a story that required the objects themselves in order for it to be complete; a story that could not be communicated purely through language. We thus chose to begin with the objects themselves, in order to help ground the writing within what would ultimately be the medium of its communication.

We had some rough criteria for object selection:
• Objects should invite touch. This might mean pleasing material textures or complex objects that could not be apprehended without physical handling.
• Objects should be mechanically interactive. We favored objects with moving parts wherever possible, or objects that opened and closed.
• Objects should fit together as a collection. We looked for objects with similar color schemes, and for objects that could conceivably come from the same place and time.
• Objects should support a wide range of uses, associations, and imaginings. This was a largely subjective criterion, but we wanted objects that could conceivably tell an abundance of stories.
• Objects should appear to have a history to them. For this reason, we looked for older items, with evidence of a lifetime of use.

After several weeks of collecting and assembling, we settled on a set of 12 objects (see Figure 1). These included (top to bottom and left to right) an antique camera, an antique telegraph key, a pair of silver goblets, a top hat, a leather mask, a coffee grinder, antique goggles, a wrought metal rose, a glass vase on a metal stand, a ceramic bottle, an antique scale, and a bookend with a globe on it.

Figure 1. The 12 Narrative Objects

4.2 Authoring the Narrative
The full narrative creation process using these objects is a subject for another paper, currently submitted, but here we provide a brief overview. With the objects selected, we explored the different possible narrative uses for each of them and categorized these narrative possibilities into loose themes. Next, we constructed a sequence of events that could be told entirely through object associations within one of these themes. Knowing the events and objects that would comprise the narrative, we sat down and wrote out the background and setting for a central character and narrative situation around which this story would revolve.

For each object's occurrence in the plot, we wrote a short piece of narration centered on that object. These narrative "lexia", when strung together, form a single short story, told through objects. Four of these objects had only a single occurrence in the storyline, while six of them occurred twice, for a total of sixteen different narrative lexia. These were all written in a first person past tense narration, and were recorded as sixteen separate audio files. These varied in duration with the shortest running 17 seconds and the longest lasting 38 seconds. The entire narration was 7 minutes long. In order to help the reader isolate each narrative lexia from the others, a distinctive chime was placed at the beginning of each sound file.

We wanted the story to make sense regardless of the order in which participants engaged the objects. We resolved to write a story about a spy who is betrayed by his own agency for political reasons and has to flee for his life. By structuring the plot as a puzzle which is being pieced together by the central character/narrator, we were able to reflect the fragmentary nature of the interaction within the form of the story.
Like a puzzle, we designed each narrative lexia with "conceptual hyperlinks" that served as subtle guides to unraveling the mystery. Thus, when a reader selects the camera, she learns about a roll of film which was hidden inside a coffee grinder. Each lexia also includes a direct reference to its associated object.

4.3 Designing the Technology
Psychometry, when it occurs in fiction, often requires that an object be held or touched in order to reveal its "memories". We wanted to simulate this "hands-on" interaction with our system. As with the narrative design, we established several high level goals for the creation of this system:
• Interactors needed to be free to move around unencumbered by cables or other technology.
• Interactors needed to be able to use both of their hands freely, without the need for additional overt interactive "tools" or other interface devices.
• The interaction needed to encourage participants to physically handle the objects in the narrative, without interfering with the experience of the objects.

A glove-based wearable interface had the potential to address most of these goals, provided it could be made unobtrusive enough to prevent it from interfering with the tactile experience of the objects. After investigating several different sensing technologies, we settled on Radio Frequency Identification (RFID), which would allow us to tag each object individually and discreetly. In order to read the information on these tags we designed and built a portable RFID reader which could be embedded in a soft fabric glove. The Reading Glove hardware is comprised of an Arduino Lilypad microcontroller, an Innovations ID-12 RFID reader, and an Xbee Series 2 wireless radio. These components, along with a power supply, are built into a fingerless glove (see Figure 2).

Figure 2. The Reading Glove (large image), and components (top row, left to right): Arduino Lilypad Microcontroller, Xbee Wireless Radio, Innovations ID12 RFID Reader

The RFID reader is located in a small pocket on the palm of the glove, while the remaining components are secured within a pouch on the back of the hand. The glove has an adjustable wrist strap and no fingers, which allows it to fit most hands comfortably. Figure 3 shows how these components are connected to each other.

Figure 3. Circuit Diagram for the Reading Glove hardware

The glove wirelessly transmits RFID tag information to a laptop computer running Max MSP, a programming environment which allows for easy prototyping of audio and video interactivity (see Figure 4). The signal is routed to a state switch in Max which triggers the playback of any associated media assigned to each tag.

Figure 4. Reading Glove Program in Max MSP
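The paper describes the laptop-side reception only at this block level, so the following Python sketch is an illustration rather than the authors' code. It assumes the XBee link appears on the laptop as an ordinary serial port and that each detection arrives as a short ASCII frame containing a hexadecimal tag ID; the port name, baud rate, and frame layout below are assumptions and would need to match the actual reader.

import re
import serial  # pyserial

PORT = "/dev/ttyUSB0"   # hypothetical device name for the XBee/USB adapter
BAUD = 9600             # assumed line rate; check the real reader settings

TAG_PATTERN = re.compile(rb"[0-9A-F]{10,16}")  # assumed ASCII hex tag frame

def read_tags(port=PORT, baud=BAUD):
    """Yield tag ID strings as detections arrive from the wireless RFID reader."""
    with serial.Serial(port, baud, timeout=1) as link:
        while True:
            line = link.readline()           # one reported detection per line
            match = TAG_PATTERN.search(line)
            if match:
                yield match.group().decode("ascii")

if __name__ == "__main__":
    for tag_id in read_tags():
        print("detected tag:", tag_id)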
As the hardware reached completion we needed to make some decisions about the interaction logic of the system. The RFID reader transmitted a tag's ID every time it detected it, which meant that an interactor holding an object or turning it over in her hands could generate multiple activations from the same tag. We felt that re-triggering the audio every time the tag was detected would frustrate the interactor, and ultimately discourage physical play with the objects. However, the audio clips required between 17 and 38 seconds to listen to, which meant that a simple delay between activations was not a satisfactory solution. A delay had the potential to make the system feel unresponsive, or non-functional. To solve this problem we chose to "lock out" any given tag after the initial detection event, rendering it inert until a new tag was triggered. This meant that if an interactor wanted to interact with an object multiple times, he would need to switch to a second object, and then return to the first.

For objects with multiple lexia, we were faced with the dilemma of how much authorial control we wanted to exert over the reader's experience of the different fragments. If we configured the system to play these in chronological order we would be structuring the way in which the story was presented, at least at an intra-object level. We were concerned that doing this would discourage interactors from exploratory interactions with the objects by quickly revealing the limitations of the available options. We made the decision to instead have the associated lexia presented at random, knowing that this was not a perfect solution. The random triggering of the lexia on an object meant that it was much more likely that an interactor would miss a fragment of the story; however it rewarded sustained interaction and exploration.
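In the prototype these rules live in a Max MSP state switch; the short Python sketch below simply restates the two decisions described above (lock out the most recently triggered tag until another tag fires, and play one of an object's lexia chosen at random) so they can be read in one place. The tag IDs and file names are invented for illustration.

import random

class LexiaDispatcher:
    """Re-creates the triggering rules described in the text: a tag is
    'locked out' after it fires until a different tag is detected, and
    objects with several lexia play one of them chosen at random."""

    def __init__(self, lexia_by_tag):
        self.lexia_by_tag = lexia_by_tag   # tag id -> list of audio files
        self.locked_tag = None             # most recently triggered tag

    def on_tag_detected(self, tag_id):
        if tag_id == self.locked_tag:      # repeated reads of the held object
            return None                    # are ignored rather than re-triggered
        if tag_id not in self.lexia_by_tag:
            return None
        self.locked_tag = tag_id           # lock this tag until another one fires
        return random.choice(self.lexia_by_tag[tag_id])

# Hypothetical tag-to-lexia table for two of the objects.
dispatcher = LexiaDispatcher({
    "0415AC21": ["camera_1.wav", "camera_2.wav"],
    "0415AC22": ["coffee_grinder_1.wav"],
})
for tag in ["0415AC21", "0415AC21", "0415AC22", "0415AC21"]:
    print(tag, "->", dispatcher.on_tag_detected(tag))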
One final design challenge was discovered during our initial testing of the finished Reading Glove. We had initially set out to make the tags on the objects as unobtrusive as possible, in order to avoid interactions with the tags as "buttons" instead of with the objects themselves. This meant finding creative ways to disguise the tags on the objects without interfering with their ability to be read by the glove. Unfortunately, it quickly became evident that this was going to be impossible. The passive RFID tags work through principles of induction: when the electromagnetic field generated by the reader is intercepted by the antenna on the tag it induces a small current in the tag, which is enough to power a tiny transceiver attached to a tiny piece of memory containing the tag's identification code. The effective range of this system is ordinarily a few inches; however, when the tag is placed in proximity to a metal object this range drops substantially or disappears entirely, depending on the metal. During the object selection phase we were unaware of this constraint and so 4 of the 10 objects used in the story were comprised of enough metal to render any tags in direct contact with them inoperative. This forced us to abandon our initial goal of disguising the tags entirely.

Instead, for the four problem objects – the metal rose, the antique camera, the silver goblets, and the telegraph key – we located the RFID tags on paper tags, wrapped in brown duct tape to blend in with the color-scheme of the objects. The remaining objects were tagged directly, using the same brown duct tape as a visual indicator of the tag's presence. One participant remarked that the paper tags made the objects feel like "artifacts from a museum collection". However, this meant that each tag had a clear visual indicator of its presence on an object.

4.4 Testing the Technology
We have not yet performed a formal study of this work, but we have run a set of informal user trials, intended to interrogate some of the above design decisions in preparation for a more extensive study.

Participants, selected from the graduate student population, were asked to interact with the Reading Glove story for as long as they liked. Each participant was given the same set of instructions, including information about the functioning of the glove. Each session was videotaped for future review and analysis. A short video of the studies may be viewed online [17]. Two of the seven participants did not speak English as their first language, which we were concerned would problematize their experience of the audio narration; however, only one of these participants experienced any difficulty with the story, which we discuss below.

We structured this study to focus on several questions intended to explore the functioning of the objects as semantically meaningful artifacts and the operation of the glove as a natural interface:
1. Could participants successfully piece together and recount the basic story?
2. Could participants map specific objects to specific narrative information and themes?
3. Was there a correlation between time spent engaged with the objects and the comprehension of the narrative?
4. Did the glove-based interaction qualitatively change how interactors approached the objects compared to a non-wearable version?

To test the first three questions, we asked participants to re-tell the story to us, and asked targeted questions about specific objects. To test the fourth question we split the participant group in half randomly. One group interacted with the objects while wearing the glove and the other group was instructed to leave the glove palm-up on the table and scan the objects over it. Due to time limitations, only seven participants were able to complete the study, with four wearing the glove and three scanning objects over a stationary glove. With such a small study population we cannot draw generalizable conclusions; however, the anecdotal evidence and critiques from the participants provided valuable insight into certain aspects of the Reading Glove's design.

One concern with this study was that the population from which participants were drawn was not wholly representative of the general public. Participants in this study were "tech-savvy" graduate level researchers, many of whom had a direct interest in games, narrative, tangibility, and interaction. In our experience, graduate students interacting with research prototypes tend to get caught up in trying to second-guess the technology. Given that our goals for this study were to critique the design of the prototype this was not necessarily a drawback in this case. The pilot study ended up bearing a close resemblance to a process of expert review. At this stage in the design, we believe that this is a suitable and valid mechanism for critiquing the work.
Our biggest concern with this first prototype was that participants would allow the novelty of the interaction to distract them from the narrative content. The system as designed is meant to be read rather than played with, and we worried that participants would grow impatient with the length of the audio files, or that the oral nature of the story would prove inaccessible to participants accustomed to visual and textual narratives. We were pleasantly surprised when six of the seven participants took the time to thoroughly "read" the story. Unsurprisingly, there seemed to be a direct correlation between time spent engaged with the prototype and overall narrative comprehension, across both conditions. Table 1 shows the time each participant spent interacting with the system before deciding to stop reading.

Of the seven participants, six were able to successfully recount the central details of the story. Only Participant 4 was unable to reconstruct the sequence of events when asked to. To a certain extent this was likely due to language comprehension issues, as Participant 4 was not entirely confident in her English language abilities. This might also account for her taking less time to interact with the system than the other participants, who all had a greater mastery of the language.

Table 1. Participants' Reading Time
Condition           Participant #   Time Spent Reading
Wearing Glove       Participant 1   12 min 26 sec
                    Participant 2   10 min 46 sec
                    Participant 3   12 min 12 sec
                    Participant 4   7 min 3 sec
Not Wearing Glove   Participant 5   12 min 58 sec
                    Participant 6   11 min 59 sec
                    Participant 7   12 min 53 sec

When asked to describe the role of specific objects in the story or specific object themes, all participants were able to make meaningful connections, regardless of which group they belonged to.

4.4.1.1 Touching & Triggering
Interestingly, for at least one of the participants the glove-based interaction interfered with her ability to engage with the objects to the extent that she desired. When asked what she liked about the interaction, Participant 1 said "I like that I could touch things…I love touching things! When I go to a museum I suffer because I can't touch things." This excitement over touching the objects interfered at first with her ability to access the narrative information, because she would pick up an object and trigger an event, and then would set the object down and want to play with other objects while listening to the first event. Unfortunately, picking up new objects triggered new events, interrupting the previous lexia before she had finished listening to it. She expressed frustration over the pacing of the system saying "Even though I was able to touch I couldn't really touch them as I wanted…I can touch, but I have to wait so it was really slow when I had to wait, and I wanted to keep touching things and inspect them, but I wasn't able to fully finish inspecting them until I was finished hearing [the audio triggered by the initial touch]."

Although she was not happy with the ways in which the glove limited her exploration of the objects, after the first time that she inadvertently triggered an event, she learned to only handle objects when she wanted to learn more about their role in the narrative. This raises a design question: had she been able to interact freely with any object without triggering responses, would she have been able to maintain a coherent mapping of which lexia were related to which object? We discussed how the interaction could be changed to better satisfy her expectations, ultimately concluding that had she been in the second group that was not wearing the glove she would have had a more enjoyable interaction.

We can apply the terminology introduced above to this situation to gain a better understanding of what was going on. When Participant 1 first picked up an object and received the audio feedback the object was ready-to-hand, or transparently immediate. In this situation, the object operated as the instantiating point for the narrative event. When she set this object down, however, and picked up a new object, the associated narrative event interrupted this immediacy, creating a moment of breakdown where she was forced to grapple with the objects as interactive instruments, rendering them present-at-hand or hypermediated. In order to correct for this unwanted behavior, she was forced to re-engage with the first object, and to stay engaged with it while experiencing the associated lexia. This creates conditions that foster a present-at-mind experience of the object, by encouraging the interactor to linger on details of the object that might otherwise be passed over.

4.4.1.2 Memories & Objects
Most of the participants commented that they enjoyed the way in which the story fit together like a puzzle, and many of them commented on the ways in which the objects served as external referents for the story content. Participant 2 remarked that "it was interesting how I could tie specific memories to specific objects."

Participant 3 said "I really like the fact that in addition to the audio you have these, sort-of touchstones, so like you can go back and listen to that part of the story, you have like…a visual. Just like in real life if you're remembering something, like if you're looking around your room and you see…'I remember getting that statue at GenCon' or something. So having that visual touchstone as a memory holder I think is a cool thing." Participant 7 also enjoyed the objects, and also remarked on his general enjoyment of non-linear narrative. In these cases we see evidence of the participants engaging the objects at a semantic level, which we frame as present-at-mind.

This non-linearity presented far fewer problems than we had initially anticipated. Participant 2, for example, never listened to several important pieces of the story. However, when asked to recount the chain of events he was able to fill in the gaps in the story based on his understanding of the lexia on either side of the missed pieces. Aside from Participant 4, Participant 6 had the most difficulty constructing a picture of the narrative. When asked about his experience he said that he was considering each narrative lexia as an isolated "allegory", and that he felt the overall message was "too subtle" for him to grasp. This may have been in part due to the path that he took through the objects, although further analysis of each interactor's "navigation" of the story is needed before this can be fully understood.
In observations of the relationships between the participants and the objects across the two groups, it was clear that the group wearing the glove spent much more time handling the objects, playing with them, and generally engaging with their physicality. The three participants in the second group all exhibited the same interaction pattern. They would pick up an object, scan it over the glove, and then set it back down on the table while they listened to the associated audio clip. We do not have enough data to conclude whether or not this had a measurable impact on the participants' narrative comprehension, however. This initial study suggests that the glove-based interaction may well afford a richer experience of the tangible objects.

5. CONCLUSIONS & FUTURE WORK
The initial testing of the Reading Glove indicates that it has the ability to communicate a rich and detailed non-linear narrative experience that is largely grounded in physical artifacts. More time needs to be spent with the video data of the pilot study before any further work can be done on this project; however, an obvious next step is a more formal controlled experiment. In particular, it would be interesting to compare a version of the story with the objects against a version using generic tokens. Our observations of the initial round of interactions have suggested possible quantitative measures which may be used to triangulate both the observations of the interactors and the analysis of the interview data. In particular, we think it will be very interesting to combine coded video data with system logs in order to get a clear picture of how long each participant is interacting with each object, and in what order the participants are encountering the narrative lexia.

We would also like to put this system in the hands of a less tech-savvy population. These initial studies helped us to learn where the system broke down, what things interactors found confusing, and what information should be provided to the participants before beginning. We intend to use the knowledge gleaned from this study to construct a more formal protocol to further investigate this system.

In this paper we have presented a new wearable interface for tangible interactive storytelling, inspired by the paranormal notion of psychometry. Psychometry represents an extension of the human sensory system into an external realm of meaning and association. Our system augments the semantic perceptions of the interactor, revealing a stratum of memory encoded in a collection of compelling objects.

One goal of this system was to author an object-based story where the objects were loci of narrative meaning. In order to understand this, we proposed an extension of the Heideggerian notions of present-at-hand and ready-to-hand, which have been used in HCI to understand the ways in which tools are more or less "visible" at a functional level. We argue that in order to understand tangible interfaces at a narrative level it is necessary to consider a third vector: present-at-mind. In order to explore a semantically present tangible interface in greater detail, we designed the Reading Glove system, which uses a new authoring methodology to couple story events and associations with physical artifacts. The iterative design process of this system demonstrates an integrative approach to tangible storytelling, and the initial success of the prototype indicates the value of this method. We believe that for tangible storytelling there needs to be a close relationship between the content of the system and the design of the interaction and tangibility. In order to accomplish this, the design process needs to be able to address both of these concerns in dialogue with each other.

Our initial testing of the Reading Glove, via an informal expert review process, indicates that it is possible to communicate a rich narrative experience along audio, visual, and tactile modalities. The pleasure which our interactors displayed in their interactions with the Reading Glove is encouraging, as was the ease with which they adapted to the wearable interface. We believe that by designing systems to be present-at-mind it is possible to author richly meaningful interactive experiences.

6. ACKNOWLEDGMENTS
We would like to thank Aaron Levisohn for his support of this project via the IAT 884 Tangible Computing course. We would also like to thank Greg Corness and Andrew Hawryshkewich for their invaluable assistance in the development of the hardware and software for this project. Finally, we want to acknowledge the excellent work of photographer Beth Tanenbaum, seen in Figure 1.

7. REFERENCES
[1] Bantock, N. The Egyptian Jukebox: A Conundrum. Viking Adult, 1993.
[2] Bantock, N. The Museum at Purgatory. Harper Perennial, 2001.
[3] Bolter, J. D. and Grusin, R. Immediacy, Hypermediacy, and Remediation. The MIT Press, Cambridge, Mass, USA, 1999.
[4] Budd, J., Madej, K., Stephens-Wells, J., de Jong, J. and Mulligan, L. PageCraft: Learning in context: A tangible interactive storytelling platform to support early narrative development for young children. In Proceedings of IDC'07 (Aalborg, Denmark, June 6-8, 2007). ACM Press, 2007.
[5] Chenzira, A., Chen, Y. and Mazalek, A. RENATI: Recontextualizing narratives for tangible interfaces. In Proceedings of Tangible and Embedded Interaction (TEI'08) (Bonn, Germany, 2008). ACM Press, 2008.
[6] Cyan Worlds. Myst. Broderbund, 1993.
[7] del Toro, G. Hellboy. Sony Pictures Entertainment (SPE), USA, 2004.
[8] Dourish, P. Where the Action Is: The Foundations of Embodied Interaction. MIT Press, Cambridge, 2001.
[9] Holmquist, L. E., Helander, M. and Dixon, S. Every object tells a story: Physical interfaces for digital storytelling. In Proceedings of NordiCHI 2000, 2000.
[10] Hoskins, J. Biographical Objects: How Things Tell the Stories of People's Lives. Routledge, New York, 1998.
[11] Kleine, S. S., Kleine, R. E. and Allen, C. T. How is a possession "me" or "not me"? Characterizing types and an antecedent of material possession attachment. Journal of Consumer Research, 22, 3 (1995), 327.
[12] Madej, K. Characteristics of Early Narrative Experience: Connecting Print and Digital Game. PhD Thesis, Simon Fraser University, Surrey, BC, 2007.
[13] Mazalek, A., Davenport, G. and Ishii, H. Tangible viewpoints: A physical approach to multimedia stories. In Proceedings of Multimedia (Juan-les-Pins, France, 2002). ACM Press, 2002.
[14] Mazalek, A., Wood, A. and Ishii, H. genieBottles: An interactive narrative in bottles. In Proceedings of the ACM SIGGRAPH Conference (August 12-17, 2001). ACM Press, 2001.
[15] Newman, K. The case for the narrative brain. In Proceedings of the Second Australasian Conference on Interactive Entertainment (Sydney, Australia, 2005). Creativity & Cognition Studios Press, 2005.
[16] Star, S. L. and Griesemer, J. R. Institutional ecology, 'translations' and boundary objects: Amateurs and professionals in Berkeley's museum of vertebrate zoology, 1907-39. Social Studies of Science, 19, 3 (1989), 387-420.
[17] Tanenbaum, J. Handy Transparency: Unobtrusive Interfaces for Distributed Object-Based Tangible Interactions. http://www.youtube.com/watch?v=xUiBgPgvTNU, accessed on December 6, 2010.
[18] Thompson, S. Writing Object Stories. http://www.migrationheritage.nsw.gov.au/objects-through-time/documenting/writing-object-stories/, accessed on December 06, 2009.
[19] Ullmer, B. and Ishii, H. Emerging Frameworks for Tangible User Interfaces. In J. M. Carroll (ed.), Human-Computer Interaction in the New Millennium, 2001, 579-601.
[20] Walker, R. and Glenn, J. About the Significant Objects Project. http://significantobjects.com/about/, accessed on December 06, 2009.
Control of Augmented Reality Information Volume by
Glabellar Fader
Hiromi Nakamura and Homei Miyashita
Meiji University
1-1-1 Higashimita, Tama-ku, Kawasaki City, Kanagawa 214-8571
+81-44-934-7171
{ce97409,homei}@isc.meiji.ac.jp

ABSTRACT
In this paper, we propose a device for controlling the volume of augmented reality information by glabellar movement. Our purpose is to avoid increasing the sum of the amount of information during the perception of "Real Space + Augmented Reality" by an intuitive and seamless control. For this, we focused on the movement of the glabella (between the eyebrows) when the user stares at objects as a trigger of information presentation. The system detects the movement of the eyebrows by the amount of the light reflected by a photo-reflector, and controls the information volume or the transparency of objects in augmented reality space.

Categories and Subject Descriptors
B.4.2 [Input/Output Devices]: Channels and controllers; H.3.3 [Information Search and Retrieval]: Information filtering; H.5.1 [Multimedia Information Systems]: Artificial, augmented, and virtual realities;

General Terms
Human Factors

Keywords
glabellar, photo reflector, information volume

1. INTRODUCTION
The term AR refers both to the technique of superimposing virtual information on real-world visual information and to the resulting visual state. In the realm of mixed reality (MR) formed by this combination of real and virtual information, the volume of information presented is inherently greater than in the ordinary visual realm, and this has raised concerns that the dramatic increase in information volume may impede awareness of the real-world environment.

This has led us to the concept of an information volume fader, analogous to sound volume faders in music production mixers, for fading in and out the information volume in the AR presentation. In this concept, a "reality mixer" enables seamless fade-in and fade-out of AR information superposed on a real-world presentation. It may also enable smooth crossfade and linking of different virtual worlds. The operating interface is hands-free, enabling control while both hands are being used for other purposes. Both arbitrary and intentional control of presented information is desirable, for natural operation.

We have applied this concept to development of the "FAR Vision" ("Fader of Augmented Reality Vision") interface for controlling the volume of the AR information added to presentations. The fader operation is controlled by glabellar movement [1]. Experimental trials have been performed to verify the accuracy of glabellar movement detection and optimize detection points, as two prerequisites for intuitive, seamless control of the presentation.

2. RELATED WORK
One information-adding system which is currently distributed as an iPhone application program is the "Sekai Camera" [2]. It enables on-screen perusal of added information, referred to as an "Air Tag", relating to the geographical location of the camera the user is holding. Users can also upload information to the system. Since its introduction, however, the Air Tags in some locations have quickly become so voluminous that they impede real-space observation. Restriction of consumer-generated media (CGM) is undesirable, and yet its voluminous on-screen addition can make it difficult to observe not only the real-space view but also the added information itself, and may ultimately cause an aversion to using the application.

A method of added-information display discrimination or information volume control is necessary, to avoid this problem. In one type of discrimination, only information of interest to the user is added. The Sekai Camera system performs this type of discrimination based on the direction in which the iPhone is pointed. This method is effective in concept, but has been found lacking in intuitive feel and on-demand response. As it requires holding and pointing, moreover, it is rather unsuitable for use during work and other activities involving use of the hands. Head-mounted displays (HMDs) can provide a continuous "hands-free" view of added information while worn, but in their present configuration require temporary removal to turn off the added-information display.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Augmented Human Conference, April 2–3, 2010, Megève, France.
Copyright 2010 ACM 978-1-60558-825-4/10/04…$10.00.
The Sekai Camera system provides one type of added-information volume control named "Air Filter", in the form of a slider that filters out certain information based on dates, distances, and other parameters, but includes no capability for dynamic information-depth control.

Various studies have been reported on the control of added-information "volume" in AR. In one such study, Tenmoku et al. discussed information volume control and performed on-screen highlighting in accordance with distance [3]. The methods described in this study, however, are not applicable to systems, such as Sekai Camera, in which the added information is concentrated within a specific domain, and means of turning the information addition off and on are not discussed.

3. SYSTEM
In the system proposed herein, the depth and permeability of the AR information is controlled in a stepwise manner by the fader, which is operated by changing the glabellar inclination. In the absence of any applied inclination, or "brow knitting", mapping is performed for added-information exclusion. When the user "peers", the accompanying change in glabellar inclination results in a "paranormal" effect, in which the AR comes into view. The glabellar movement, unlike that of the iris, can be either intentional or unconscious. The name of the system, "FAR Vision", was accordingly chosen to connote both "far vision" and "Fader of AR Vision", and thus convey its mixed-reality (MR) effect, a mixing of real-world and AR-world imagery.

A photo-reflector (Rohm RPR-220) is used to detect the glabellar movement and thus exert its control of the fader, and may be attached to spectacles, an HMD, or other such devices. The skin surface of the glabella is illuminated by infrared light (IR) and changes in the reflected IR intensity are monitored with an IR sensor, for non-contact detection of glabellar movement.

An analogous device has been reported for detection of temporal movement [4], but is inherently limited to conscious operation by the nature of temple movement. The glabella-based detection, in contrast, enables operation based on unconscious emotional response as well as conscious intention.
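The paper does not specify the mapping function itself, so the sketch below is only one plausible reading of the stepwise fader: it linearly rescales the photo-reflector reading between two per-user calibration values (relaxed and fully knit brow; the numbers are invented) and optionally quantizes the result into a fixed number of fader steps.

def calibrate(relaxed_value, knit_value):
    """Return a function mapping a raw photo-reflector reading to a
    fade level in [0, 1]: 0 = no AR information, 1 = full AR overlay."""
    span = knit_value - relaxed_value
    def fade_level(raw, steps=None):
        level = (raw - relaxed_value) / span
        level = max(0.0, min(1.0, level))   # clamp to the fader range
        if steps:                            # optional stepwise control
            level = round(level * steps) / steps
        return level
    return fade_level

# Hypothetical sensor readings for one wearer: 120 with the brow relaxed,
# 200 with the brow fully knit, quantized into 40 distinguishable steps.
fade = calibrate(relaxed_value=120, knit_value=200)
for raw in (120, 150, 180, 200):
    print(raw, "->", fade(raw, steps=40))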

Fig. 1. Glabellar fader on spectacles and on HMD.

4. EXPERIMENT
For the proposed system it is essential to establish the detection accuracy and detection target position, to heighten the level of intuitive, seamless operation of the glabellar fader. We therefore investigated the "width", as defined below, between fader output values at multiple detection points on the glabella of seven participants.

For each subject, the measurements were performed at ten points in 2 mm intervals in one direction along a horizontal baseline extending between the inner tips of the eyebrows, beginning with the center of the baseline as 0 mm. With the device held stationary by hand, three measurement sets were performed at each measurement point for each subject, with each set consisting of measurement with the eyebrows relaxed, then fully knit, and then again relaxed. The maximum and minimum output values in each set were selected, and from among these, the largest and smallest values were taken as the "largest maximum output" and "smallest minimum output", respectively. The difference between these two output values was defined as the "output width" for that subject.
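Restated as code under the same definition (the sample readings are invented), the output width at a detection point is the largest per-set maximum minus the smallest per-set minimum:

def output_width(measurement_sets):
    """Output width as defined in the text: the largest of the per-set
    maxima minus the smallest of the per-set minima."""
    set_maxima = [max(s) for s in measurement_sets]
    set_minima = [min(s) for s in measurement_sets]
    return max(set_maxima) - min(set_minima)

# Three hypothetical relaxed-knit-relaxed sets of fader output values.
sets = [[18, 55, 20], [17, 60, 19], [16, 58, 18]]
print(output_width(sets))   # 60 - 16 = 44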
accuracy and detection target position, to heighten the level of
intuitive, seamless operation of the glabellar fader. We therefore Variation was also found in the number of distinguishable output
investigated the “width”, as defined below, between fader output steps that could be detected in the output width, but as shown in
values at multiple detection points on the glabella of seven Fig. 1, the number was at least 26 and ranged up to 89, and was
participants. generally at least twice as large as the number of detectable steps
in the central region. In terms of information volume percentage,
For each subject, the measurements were performed at ten points one step corresponds with 1.15 to 4% of total information volume.
in 2 mm intervals in one direction along a horizontal baseline This is considered quite sufficient for control of information
extending between the inner tips of the eyebrows, beginning with volume as envisioned for the proposed system, even though the
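As a minimal illustration of the fader mapping discussed above (this sketch is ours, not part of the paper; the class and parameter names are hypothetical), a raw photo-reflector reading between the calibrated minimum and maximum outputs of a detection point could be quantized into discrete information-volume steps as follows:

    // Illustrative sketch: quantize a photo-reflector reading into discrete
    // information-volume steps between the calibrated minimum and maximum outputs.
    public class GlabellarFader {
        private final int minOutput;  // calibrated "minimum output"
        private final int maxOutput;  // calibrated "maximum output"
        private final int steps;      // distinguishable steps (26 to 89 per Table 1)

        public GlabellarFader(int minOutput, int maxOutput, int steps) {
            this.minOutput = minOutput;
            this.maxOutput = maxOutput;
            this.steps = steps;
        }

        // Maps a raw sensor reading to an AR information-volume percentage (0-100).
        public double informationVolumePercent(int reading) {
            int clamped = Math.max(minOutput, Math.min(maxOutput, reading));
            double normalized = (double) (clamped - minOutput) / (maxOutput - minOutput);
            int step = (int) Math.round(normalized * (steps - 1));  // discrete fader step
            return 100.0 * step / (steps - 1);                      // one step is roughly 1-4% of volume
        }
    }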
6. CURRENT AND FUTURE OUTLOOK
Current plans call for experimental evaluations directed toward improving operating ease, mapping the subjective degree of eyebrow knitting of individual users to the displayed content, and enabling more intuitive operation. Development efforts are also envisioned for other interfaces and systems centering on information volume control.

7. REFERENCES
[1] Hiromi Nakamura, Homei Miyashita. A Glabellar Interface for Presentation of Augmented Reality. Proceedings of Entertainment Computing 2009 (in Japanese), pp. 187-188, 2009.
[2] Sekai Camera: http://support.sekaicamera.com/en
[3] Ryuhei Tenmoku, Masayuki Kanbara, N. Yokoya. Intuitive annotation of user-viewed objects for wearable AR systems. Proceedings of the IEEE International Symposium on Wearable Computers '05, pp. 200-201, 2005.
[4] Kazuhiro Taniguchi, Atsushi Nishikawa, Seiichiro Kawanishi, Fumio Miyazaki. "KOMEKAMI switch: A novel wearable input device using movement of temple." Journal of Robotics and Mechatronics, Vol. 20, No. 2, pp. 260-272, 2008.
Towards Mobile/Wearable Device Electrosmog Reduction
through Careful Network Selection
Jean-Marc Seigneur, University of Geneva, 7, route de Drize, Carouge, Switzerland, Jean-Marc.Seigneur@unige.ch
Xavier Titi, University of Geneva, 7, route de Drize, Carouge, Switzerland, Xavier.Titi@unige.ch
Tewfiq El Maliki, HES-SO Geneva, 4, rue de la Prairie, Genève, Switzerland, Tewfiq.Elmaliki@hesge.ch

ABSTRACT
There is some concern regarding the effect of smartphones and other wearable devices that use wireless communication and are worn by users very close to their body. In this paper, we propose a new network switching selection model and its algorithms that minimize the non-ionizing radiation of these devices during use. We validate the model and its algorithms with a proof-of-concept implementation on the Android platform.

Categories and Subject Descriptors
C.1.2 [Network Architecture and Design]: Wireless Communication. H.1.2 [User/Machine Systems]: Human Factors. K.4.1 [Public Policy Issues]: Human Safety. K.6.2 [Installation Management]: Performance and usage measurement.

General Terms
Algorithms, Management, Measurement, Performance, Human Factors

Keywords
electrosmog, wireless hand-over

1. INTRODUCTION
More and more wireless products are carried by users, from broadly used mobile phones to more specific devices such as cardio belts and watches that monitor heart rate whilst practicing sport. These devices use different wireless technologies to communicate with each other and with their remote Internet servers, for example, to store sport session data. Such devices bring interesting benefits to their users. However, there is rising concern about the effect of the non-ionizing electromagnetic radiation of these wireless devices on the user's health. These electromagnetic radiation exposures are generally coined "electrosmog". Non-ionizing radiation means that it does not carry enough energy per quantum to remove an electron from an atom or molecule.
Section 2 presents the related work. Section 3 surveys the main wireless technologies used by mobile devices from the point of view of their radiated electromagnetic emission. In Section 4, we propose our network switching selection model and algorithms to minimize exposure to electrosmog, and we validate them by discussing a proof-of-concept implementation. Section 5 concludes the paper.

2. RELATED WORK
The potential harmful effects of electrosmog have been researched on many occasions, and there are still doubts regarding these effects beyond the transformation of electromagnetic energy into thermal energy in tissues. However, even a sceptical recent survey [1] underlines that the precautionary principle, meaning that efforts should be made to minimize exposure, should be followed, especially for teenagers.
One of the first means to reduce exposure, besides not using the phone or using it only when needed and in good conditions (close to the base station...), is to use a mobile phone with a low Specific Absorption Rate (SAR). However, as the SAR indicated on mobile phones is measured at their full power, some phones with a higher SAR may manage their power better and end up emitting less than phones with a lower SAR that emit at full power more often than needed. In the USA, the FCC has set a SAR limit of 1.6 W/kg, averaged over a volume of 1 gram of tissue in the head and over any 6-minute period. In Europe, the ICNIRP limit is 2 W/kg, averaged over a volume of 10 grams of tissue in the head and over any 6-minute period. Interestingly, the iPhone user manual underlines that it may give a higher SAR than the regulation if used in direct contact with the body: "for body-worn operation, iPhone's SAR measurement may exceed the FCC exposure guidelines if positioned less than 15 mm (5/8 inch) from the body" [2].
As it is less common to stay very close to a mobile phone mast for a long time, working on reducing the emission of the phone that is carried all day long very close to the human body should have more effect for most users. However, Crainic et al. [3] have investigated parallel cooperative meta-heuristics to reduce exposure to the electromagnetic fields generated by mobile phone antennas at planning time, whilst still meeting coverage and service quality constraints. This differs from our approach, which
focuses on reducing the electromagnetic fields generated by the users' devices at use time.
Algorithms and systems that seamlessly switch between the available networks to remain "always best connected" are being researched [4-7]. They mainly focus on quality of service (QoS) for decision-making. In this paper, we add another dimension to the network selection issue and underline that electrosmog exposure should also be taken into account.

3. WIRELESS TECHNOLOGIES SURVEY
Different wireless technologies can be used for communication by the devices carried by users. In this section, we survey the main ones, including those that are less well known to the general public but are still important given the increasing number of wearable communicating sport/health devices.

3.1 Wi-Fi
The most widespread wireless network technology on portable computer devices is Wi-Fi. Wi-Fi is unlicensed. There are different types of Wi-Fi networks; for example, Wi-Fi IEEE 802.11b and 802.11g use the 2.4 GHz band and 802.11a uses the 5 GHz band. 5 GHz signals are absorbed more readily by solid objects in their path due to their smaller wavelength, and for the same power they propagate less far than 2.4 GHz signals. The average Wi-Fi range is between 30 m and 100 m. Mobile computer devices that integrate Wi-Fi are not made to switch seamlessly between nearby available Wi-Fi networks. There are a number of security issues with Wi-Fi: WEP encryption has been broken for a while, and although WPA encryption creates separate channels for each user, most public Wi-Fi access points only ask for authentication and do not encrypt afterwards. Although privacy is beyond the scope of this paper, it is important to note that for privacy's sake, and because of the sensitive nature of some communicated information such as heart rate profiles, a secure network should be considered. WiMAX is different from Wi-Fi: it is more dedicated to long-range systems covering kilometers and is rarely integrated in mobile devices for now. The peak power of Wi-Fi 802.11b/g is 100 mW and that of 802.11a is 1 W. Kang and Gandhi [8] found that near-field exposure to a Wi-Fi 100 mW patch antenna radiating from a laptop computer placed 10 mm below a planar phantom gives 2.82 W/kg 1g SAR and 1.61 W/kg 10g SAR at 2.45 GHz, and 1.64 W/kg 1g SAR and 0.53 W/kg 10g SAR at 5.25 GHz. A French organization study found that all the Wi-Fi 2.4 GHz cards studied are under the 2 W/kg 10g SAR limit, from 0.017 to 0.192 W/kg [9], at less than 12.5 cm.

3.2 GSM, UMTS/GPRS, 3G…
Regarding mobile phones, although more and more smartphones integrate Wi-Fi, their most widespread wireless network technology remains the one provided by their telecom operator: GSM (around 900 MHz or 1800 MHz; maximum distance to cell tower from 1 km to 20 km [10]), GPRS, EDGE, UMTS 3G (around 2 GHz; from 144 kB/s in moving vehicles to more than 2 MB/s for stationary users [11]; maximum distance to cell tower from 0.5 km to 5 km [10])… The telecom operators have paid for licenses to be able to use these bands. There is different encryption for each user using a cell. Mobile phones switch seamlessly between GSM/3G cells, and more and more mobile phones now integrate Wi-Fi. However, only a few phones and telecom providers allow the users to start a phone call over Wi-Fi and switch seamlessly to GSM/3G when the user leaves the Wi-Fi zone. It is also difficult for users to switch to networks other than the ones provided by their telecom provider. From an energy consumption point of view, according to Balasubramanian et al. [12], for a 10 kB data size, GSM consumes around 3.25 times less than 3G and 1.5 times less than Wi-Fi (if the cost of scan and transfer is taken into account). However, for data sizes of 500 kB and above, GSM consumes as much as 3G and twice as much as Wi-Fi (even if the cost of scan and transfer is taken into account), and 3G consumes around 1.9 times more than Wi-Fi (if the cost of scan and transfer is taken into account). It is worth noting that the energy spent for GSM/3G networks can vary a lot depending on the distance between the user and the network antenna, as this distance can be quite long compared to Wi-Fi: for example, for GSM 900, the phone power output may be reduced by a factor of 1000 if it is close to the base station with a good signal. The peak handset power limit is 2 W for GSM 900 and 1 W for GSM 1800. The peak power of 3G UMTS is 125 mW at 1.9 GHz. It has been found that in rural areas the highest power level for GSM was used about 50% of the time, while the lowest power was used only 3% of the time; the corresponding numbers for the city area were approximately 25% and 22%. The results showed that high mobile phone output power is more frequent in rural areas, whereas the other factors (length of call, moving or stationary, indoor or outdoor) were of less importance [13]. Factors that may influence the power control are the distance between handset and base station and the attenuation of the signal, the length of calls (the phone transmits at the maximum allowed power level at the onset of each call), and the change of connecting base station, or "hand-over" (the phone will temporarily increase output power when connecting to a new base station). Hand-overs are made when the mobile phone is moved from one cell covered by one base station to another cell, but may also occur on demand from the base station owing to a high load on the network at busy hours. The iPhone 3G user guide indicates that its 10g SAR is 0.235 W/kg for GSM 900, 0.780 W/kg for GSM 1800, 0.878 W/kg for UMTS 2100, and 0.371 W/kg for Wi-Fi. It was worse for the original iPhone, with a 1g SAR of 1.388 W/kg for UMTS 1900 and 0.779 W/kg for Wi-Fi. Combined with the fact that its user guide mentions that the SAR might be higher at less than 1.5 cm from the body and that both 3G and Wi-Fi may be enabled at the same time, this means that the iPhone can have a 1g SAR much higher than the 1.6 W/kg limit: above 1.388 + 0.779 = 2.167 W/kg.

3.3 Bluetooth, Zigbee, ANT…
Bluetooth, based on IEEE 802.15.1, also uses the 2.4 GHz band, with a data rate of around 1 MB/s. A large number of mobile phones integrate Bluetooth. Discovery and association of Bluetooth devices are not designed to be seamless. Bluetooth 2.1+ pairing uses a form of public key cryptography and is less prone to the Wi-Fi types of attacks [14]. The peak power of Bluetooth ranges from 1 mW to 2.5 mW. The normal range of Bluetooth is around 10 m, which is lower than Wi-Fi. Over these lower distances, Bluetooth has lower consumption than Wi-Fi: around 3 to 5 times lower according to Su Min et al. [15]. However, for resource-constrained wearable devices such as heart belts and cardio/GPS watches, Bluetooth still consumes too much energy. This is the reason why a new Bluetooth specification called "Bluetooth low energy" has been released recently; it would consume between 1% and 50% of normal Bluetooth depending on the application [16]. "Bluetooth low energy" is also more seamless: it can support
connection setup and data transfer in as little as 6 ms. "Bluetooth low energy" can use AES-128 encryption. As Bluetooth consumed too much energy for resource-constrained devices, other networking technologies have been used. Zigbee, based on IEEE 802.15.4-2003, runs at 868 MHz in Europe, 915 MHz in the USA and Australia, and 2.4 GHz in most other places. Zigbee consumes around 10 to 14 times less than Wi-Fi according to Su Min et al. [15]. The downside of Zigbee is that it has a much lower data rate, from 20 kB/s to 250 kB/s. Another main network technology that has been used in many sport/health monitoring devices is ANT, which is proprietary. ANT and Zigbee can send data in less than 10 ms. However, ANT can send bigger files faster as its transmission rate is 1 MB/s, which means lower energy to submit large files than Zigbee [17]. Fyfe reports even lower energy consumption for ANT compared to Zigbee for small data sizes (8 bytes) [18]. In any case, Zigbee and ANT are not available on mobile phones. "Bluetooth low energy" seems a good candidate to replace ANT and Zigbee due to its openness and the number of products already using Bluetooth. Martínez-Búrdalo et al. [19] have found that Bluetooth generates a very low 10g SAR of around 0.037 W/kg. Unfortunately, none of these networking technologies are considered as main connecting technologies, maybe due to their limited range.

3.4 Comparison Summary

Table 1. Networking Technologies Comparison
Network Technology    Frequency (GHz)  Energy* (Wi-Fi ref.)  10g SAR 3G iPhone (W/kg)  Seamless Mobility#  Data Rate (MB/s)  Maximum Distance (m)  Openness#  Security#
Wi-Fi                 [2.4; 5]         1                     0.371                     L                   [11; 54]          [30; 150]             H          L
GSM                   [0.9; 1.8]       [0.67; 2]             [0.235; 0.78]             H                   0.0096            [1000; 20000]         L          M
3G                    [1.8; 2]         2                     0.878                     H                   [0.144; 2]        [500; 5000]           L          M
Bluetooth             2.4              0.25                  ~0.037                    L                   [1; 3]            10                    H          M
Zigbee                [0.8; 2.4]       0.085                 n/a                       M                   [0.02; 0.25]      100                   M          M
ANT                   2.4              0.017                 n/a                       M                   1                 30                    L          M
Bluetooth Low Energy  2.4              [0.01; 0.125]         n/a                       M                   1                 10                    H          M
*: The energy consumption comparison is roughly derived from the results and information given in the references cited in this paper.
#: L: Low; M: Medium; H: High. ~: estimated based on [19]. n/a: not available.

4. CAREFUL NETWORK SELECTION
Our goal is to minimise the exposure of the mobile user to electromagnetic radiation while still allowing the users to benefit from the communication of their devices with enough quality of service.
Based on the networking technologies that we have surveyed in the previous sections, the exposure can be significantly reduced by choosing among the different networking technologies available. On recent mobile phones, there are 4 main choices: GSM, 2G, 3G and Wi-Fi. However, it may be cumbersome for the user to learn which networking technology to choose depending on what they are doing with their phone and to constantly switch manually from one network to another. Fortunately, recent mobile phone operating systems such as Android provide an Application Programming Interface (API) that allows third-party applications to switch from one networking technology to another. In this section, we first describe our network selection model and its algorithms, and then we explain how we have validated our approach with a proof-of-concept application implemented on an Android phone.

4.1 Network Switching Selection Model and Algorithms
We define Ni as a network i among a set of n available networks, with i in [1; n]. Each network Ni is associated with a 10g SAR in W/kg, defined as SARi, for the specific device carried by the user. The related work surveyed above has underlined that different mobile devices have different SARs.
In this case, the optimal policy to minimise the electromagnetic radiation from the mobile device is to select the network with the lowest SAR. The algorithm in pseudo-code is:

Nchosen = N1
for (int i=1; i<=n; i++)
    if (SARi < SARchosen) Nchosen = Ni

That type of policy works well for voice call activities, since the exposure duration depends only on the length of the conversation. However, for other activities that can be carried out faster with faster networking technologies, such as data exchange (file download, health data monitoring transmission...), the data rate of the network should be taken into account. We define the data rate of Ni as DRi in MB/s.
In this case of data exchange activities, the optimal policy to minimize electromagnetic radiation is different. The file size of the data to be exchanged is the same for all networks. If we define the time of exposure with Ni as Ti and FS as the file size of the data, we have:

Ti = FS / DRi

If we define the exposure during that data exchange with Ni as Ei, then for the optimal policy that would choose Ni, we want Ei <= Ej for all j different from i in [1; n]:

Ei = SARi * Ti
Ei <= Ej
SARi * Ti <= SARj * Tj
SARi * (FS / DRi) <= SARj * (FS / DRj)
SARi <= (DRi / DRj) * SARj

The corresponding pseudo-code is then:

Nchosen = N1
for (int i=1; i<=n; i++)
    if (SARi < ((DRi / DRchosen) * SARchosen)) Nchosen = Ni
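As a minimal Java sketch of the two basic policies above (the Network type, the helper names and the sample values are ours and purely illustrative, not part of the paper's prototype):

    // Sketch of the two selection policies: lowest SAR for voice calls,
    // lowest SAR/dataRate (i.e. lowest estimated exposure) for data exchange.
    import java.util.List;

    public class NetworkSelector {

        // A candidate network with its 10g SAR (W/kg) and data rate (MB/s).
        public record Network(String name, double sar, double dataRate) {}

        // "Voice call" policy: exposure time is fixed, so pick the lowest SAR.
        public static Network selectForVoiceCall(List<Network> networks) {
            Network chosen = networks.get(0);
            for (Network n : networks) {
                if (n.sar() < chosen.sar()) chosen = n;
            }
            return chosen;
        }

        // "Data exchange" policy: exposure ~ SAR * (FS / dataRate), so keep Ni
        // whenever SARi < (DRi / DRchosen) * SARchosen.
        public static Network selectForDataExchange(List<Network> networks) {
            Network chosen = networks.get(0);
            for (Network n : networks) {
                if (n.sar() < (n.dataRate() / chosen.dataRate()) * chosen.sar()) chosen = n;
            }
            return chosen;
        }

        public static void main(String[] args) {
            // Example values loosely based on Table 1 (illustrative only).
            List<Network> candidates = List.of(
                    new Network("Wi-Fi", 0.371, 11.0),
                    new Network("GSM", 0.235, 0.0096),
                    new Network("3G", 0.878, 0.5));
            System.out.println("Voice call: " + selectForVoiceCall(candidates).name());
            System.out.println("Data exchange: " + selectForDataExchange(candidates).name());
        }
    }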
The related work surveyed in the previous sections has underlined that some networking technologies may have a much lower SAR than the maximum SAR measured and reported in their specification, depending on the context of use. For example, for GSM 900, the phone power output may be reduced by a factor of 1000 if the phone is close to the base station with a good signal. Let us define this attenuation, which depends on the context and on each networking technology, as Ai for Ni. We assume that the activity happens in the same context from its start to its end; for example, the user is not moving during the activity, hence the distance between the mobile device and the base station does not change.
In this case, the pseudo-code for the "voice call" activity optimal policy becomes:

Nchosen = N1
for (int i=1; i<=n; i++)
    if (SARi < ((Achosen/Ai) * SARchosen)) Nchosen = Ni

The pseudo-code for the "data exchange" activity optimal policy becomes:

Nchosen = N1
for (int i=1; i<=n; i++)
    if (SARi < ((DRi / DRchosen) * (Achosen/Ai) * SARchosen)) Nchosen = Ni

If seamless hand-over between networking technologies were possible, i.e., if the activity were not stopped when the current network becomes unavailable and the next network must be used, and if the user moved during the activity, our selection algorithm would be carried out again at the time of each new hand-over.

4.2 Proof-of-Concept Validation of the Model and its Algorithms
In order to validate the previous algorithms, we have investigated how to implement them in an Android phone application. As SARs differ for each phone, our application must first be configured with the phone's SAR values, which phone vendors must provide by law (at least in the US and in Europe) with the specification of their phones. This is done manually for the proof-of-concept, but it could be automated by fetching that information from remote servers publishing phone SAR values, because the phone's model can be obtained programmatically via the Android API and used to fetch the right values.
Then, our application asks the user which activity is going to be carried out: "voice call" or "data exchange". Depending on that next activity, the right policy is chosen, i.e., the "voice call" policy or the "data exchange" policy. The next user activity might be inferred automatically, and the network selection could be triggered automatically whenever a new activity is detected, but this is beyond the scope of the proof-of-concept prototype.
There are four main networking technologies that can be chosen on current smartphones: 3G, 2G, GSM and Wi-Fi. The Android API facilitates getting information about the different networks available in proximity:
http://developer.android.com/intl/fr/reference/android/net/ConnectivityManager.html#getAllNetworkInfo()
http://developer.android.com/intl/fr/reference/android/net/wifi/WifiManager.html#getScanResults()
Few phones and telecom providers allow the users to make phone calls directly through Wi-Fi, so we assume that Wi-Fi is not possible in our proof-of-concept prototype for the "voice call" activity. In addition, the Android API does not offer a way to force switching to one or another of the available GSM, 2G or 3G networks. So in the "voice call" activity case, we can only display a message to the users explaining that they should manually select a GSM network or, if that is not possible, set the "Use Only 2G Networks" setting.
Concerning the "data exchange" activity, we use the data rate in MB/s returned by the following method:
http://developer.android.com/intl/fr/reference/android/net/wifi/WifiInfo.html#getLinkSpeed()
As no Android API returns the data rate of GSM, 2G or 3G networks, we use a fixed data rate of 0.5 MB/s for 3G and of 0.0096 MB/s for GSM. Future versions of the API may provide more information about the type of network found, and we could derive finer-grained data rates from this information; for example, it may return a dynamically inferred 2G data rate.
If the outcome of the algorithm suggests using one of the Wi-Fi networks, the Android API facilitates programmatically switching to this Wi-Fi network thanks to the following method:
http://developer.android.com/intl/fr/reference/android/net/wifi/WifiManager.html#enableNetwork(int, boolean)
If the suggestion is to use GSM, 2G or 3G, as there is no API to switch to these networks, a message can be displayed to the user, who can manually select a GSM network, set the "Use Only 2G Networks" setting or select the potential 3G network.
Concerning the attenuation factor depending on the context, we mainly rely on the distance to the network antenna. Wi-Fi does not change its transmitting power, and we assume it can transmit as long as it is listed in the Wi-Fi networks discovered by the Android WifiManager. We could refine that assumption by only selecting Wi-Fi networks with a Received Signal Strength Indication (RSSI) above a threshold, using the following method:
http://developer.android.com/intl/fr/reference/android/net/wifi/WifiInfo.html#getRssi()
As the distances to the GSM, 2G and 3G antennas have a significant impact on the "real" SAR, we have used the signal strength returned by the Android API:
http://developer.android.com/reference/android/telephony/NeighboringCellInfo.html#getRssi()
If the signal strength returned is good, which corresponds to the returned value 31, i.e., -51 dBm or greater, we use a 0.01 attenuation factor for the network SAR. Another approach may be to use the GPS location of the user, as given by the mobile phone GPS, and that of the antenna, as given by third-party information. Further work would also be required to fine-tune the SAR reduction depending on the distance and the networking technology.
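As a rough, hedged illustration of how such a prototype can gather its inputs on Android (this sketch is ours, not the authors' application; the SAR constants, RSSI threshold and fallback data rate are illustrative assumptions, and permission checks and error handling are omitted), the WifiManager and WifiInfo methods cited above can be combined as follows:

    // Sketch: read Wi-Fi link speed and RSSI, apply the data-exchange policy
    // between Wi-Fi and 3G, and enable a configured Wi-Fi network if it wins.
    import android.content.Context;
    import android.net.wifi.WifiConfiguration;
    import android.net.wifi.WifiInfo;
    import android.net.wifi.WifiManager;
    import java.util.List;

    public class ElectrosmogAwareSwitcher {
        // Illustrative 10g SAR values (W/kg); a real app would read them from configuration.
        private static final double SAR_WIFI = 0.371;
        private static final double SAR_3G = 0.878;
        // Fallback 3G data rate (MB/s) as in the prototype; Wi-Fi rate comes from getLinkSpeed().
        private static final double DATA_RATE_3G = 0.5;

        private final WifiManager wifiManager;

        public ElectrosmogAwareSwitcher(Context context) {
            this.wifiManager = (WifiManager) context.getSystemService(Context.WIFI_SERVICE);
        }

        // Returns true if Wi-Fi was selected and enabled for the data-exchange activity.
        public boolean preferWifiForDataExchange() {
            WifiInfo info = wifiManager.getConnectionInfo();
            if (info == null) return false;

            // getLinkSpeed() is in Mbps; use it as the Wi-Fi data-rate input of the model.
            double wifiRate = Math.max(1, info.getLinkSpeed());
            // Skip weak access points; the -80 dBm threshold is an illustrative choice.
            if (info.getRssi() < -80) return false;

            // Exposure ~ SAR / dataRate: keep Wi-Fi only if it lowers the estimated exposure.
            if ((SAR_WIFI / wifiRate) >= (SAR_3G / DATA_RATE_3G)) return false;

            // Enable the first already-configured Wi-Fi network (proof-of-concept behaviour).
            List<WifiConfiguration> configured = wifiManager.getConfiguredNetworks();
            if (configured == null || configured.isEmpty()) return false;
            return wifiManager.enableNetwork(configured.get(0).networkId, true);
        }
    }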
5. CONCLUSION
It is still not certain that the average level of electrosmog experienced by users is harmful. However, as the precautionary principle is to minimize exposure, and as the major source remains the electromagnetic radiation emitted by the user's own mobile devices, it is worthwhile trying to minimize their radiation.
With new smartphones, it is possible to switch between the available networking technologies. We have used this possibility to select the networking technologies depending on their characteristics and the context (user's next activity, distance between the user and the base stations...) in order to minimize these radiations. There are still a few functionalities to fine-tune and automate, but such a network selection switching approach should be considered by the manufacturers of mobile communicating devices worn close to the human body if they want to claim that they care about the precautionary principle.

6. ACKNOWLEDGMENTS
This work is sponsored by the European Union, which funds the FP7-ICT-2007-2-224024 PERIMETER project [5].

7. REFERENCES
[1] J. Vanderstraeten, "Champs et ondes GSM et santé: revue actualisée de la littérature" [GSM fields and waves and health: an updated literature review]. Bruxelles, Belgique: Association des médecins anciens étudiants de l'Université libre de Bruxelles, 2009.
[2] Apple. (2009, accessed on 04/12/2009). Guide d'informations iPhone 3GS [iPhone 3GS information guide]. Available: http://manuals.info.apple.com/fr_FR/iPhone_3GS_informations_importantes_F.pdf
[3] T. G. Crainic, B. Di Chiara, M. Nonato and L. Tarricone, "Tackling electrosmog in completely configured 3G networks by parallel cooperative meta-heuristics," Wireless Communications, IEEE, vol. 13, pp. 34-41, 2006.
[4] Q. Song and A. Jamalipour, "Network selection in an integrated wireless LAN and UMTS environment using mathematical modeling and computing techniques," Wireless Communications, IEEE, vol. 12, pp. 42-48, 2005.
[5] PERIMETER. (accessed on 04/12/2009). Available: http://www.ict-perimeter.eu
[6] H. Liu, C. Maciocco, V. Kesavan and A. L. Y. Low, "IEEE 802.21 Assisted Seamless and Energy Efficient Hand-overs in Mixed Networks," 2009, pp. 27-42.
[7] M. Kassar, B. Kervella and G. Pujolle, "An overview of vertical hand-over decision strategies in heterogeneous wireless networks," Computer Communications, vol. 31, pp. 2607-2620, 2008.
[8] G. Kang and O. P. Gandhi, "Effect of dielectric properties on the peak 1- and 10-g SAR for 802.11 a/b/g frequencies 2.45 and 5.15 to 5.85 GHz," IEEE Transactions on Electromagnetic Compatibility, vol. 46, pp. 268-274, 2004.
[9] Supélec, "Etude « RLAN et Champs électromagnétiques » : synthèse des études conduites par Supélec" [Study "RLAN and electromagnetic fields": synthesis of the studies conducted by Supélec], 2006.
[10] G. Maile, "Impact of UMTS," in Conference on Mobile Networks & the Environment, 2000.
[11] Wikipedia. (accessed on 04/12/2009). 3G. Available: http://en.wikipedia.org/wiki/3G
[12] N. Balasubramanian, A. Balasubramanian and A. Venkataramani, "Energy consumption in mobile phones: a measurement study and implications for network applications," in Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, Chicago, Illinois, USA, 2009.
[13] L. Hillert, A. Ahlbom, D. Neasham, M. Feychting, L. Jarup, R. Navin and P. Elliott, "Call-related factors influencing output power from mobile phones," J Expos Sci Environ Epidemiol, vol. 16, pp. 507-514, 2006.
[14] T. G. Xydis and S. Blake-Wilson, "Security Comparison: Bluetooth Communications vs. 802.11," Bluetooth Security Experts Group, 2002.
[15] S. M. Kim, J. W. Chong, B. H. Jung, M. S. Kang and D. K. Sung, "Energy-Aware Communication Module Selection through ZigBee Paging for Ubiquitous Wearable Computers with Multiple Radio Interfaces," in 2nd International Symposium on Wireless Pervasive Computing, 2007.
[16] Wikipedia. (accessed on 04/12/2009). Bluetooth low energy. Available: http://en.wikipedia.org/wiki/Bluetooth_low_energy
[17] R. Morris. (2008, accessed on 04/12/2009). Comparing ANT and ZigBee. Available: http://www.embedded.com/products/softwaretools/206900253
[18] K. Fyfe. (2009, accessed on 04/12/2009). Low-power wireless technologies for medical applications. Available: http://acamp.ca/alberta-micro-nano/.../Ken-Fyfe-HealthMedical-Dec09.pdf
[19] M. Martínez-Búrdalo, A. Martín, A. Sanchis and R. Villar, "FDTD assessment of human exposure to electromagnetic fields from WiFi and bluetooth devices in some operating situations," Bioelectromagnetics, vol. 30, pp. 142-151, 2009.
Bouncing Star Project: Design and Development of
Augmented Sports Application
Using a Ball Including Electronic and Wireless Modules
Osamu Izuta, Graduate School of Electro-Communications, University of Electro-Communications, JP, izuta@edu.hc.uec.ac.jp
Toshiki Sato, Graduate School of Information Systems, University of Electro-Communications, JP, dendenkamushi@gmail.com
Sachiko Kodama, University of Electro-Communications, JP, kodama@hc.uec.ac.jp
Hideki Koike, University of Electro-Communications, JP, koike@is.uec.ac.jp

ABSTRACT
In our project, we created a new ball, "Bouncing Star" (Hane-Boshi in Japanese), comprising electronic devices. We also created an augmented sports system using Bouncing Star and a computer program to support an interface between the digital and the physical world. This program is able to recognize the ball's state of motion (static, rolled, thrown, bounced, etc.) by analyzing data received through a wireless module. The program also tracks the ball's position through image recognition techniques. On this system, we developed augmented sports applications which integrate real-time dynamic computer graphics and responsive sounds that are synchronized with the ball's characteristics of motion. Our project's goal is to establish a new dynamic form of entertainment which can be realized through the combination of the ball and digital technologies.

Categories and Subject Descriptors
H.5.3 [Group and Organization Interfaces]: Computer-supported cooperative work.

General Terms
Algorithms, Design, Experimentation, Human Factors

Keywords
ball interface, wireless module, sensing technology, image recognition, augmented sports, interactive surface, computer-supported cooperative play

1. INTRODUCTION
In Ishii's "PingPongPlus" [1], the pioneering research in Augmented Sports, the use of 8 microphones beneath a table to sense the location of a ping pong ball created a novel athletics-driven, tangible, computer-augmented interface that incorporated sensing, sound, and projection technology. After Ishii's research, several athletic-tangible interfaces which use a ball have been created. Some years later, Mueller and Agamanolis devised an experiment to play 'sports over a distance' through a life-size video conference screen using an unmodified soccer ball [2]. Rusdorf and Brunnett developed a table tennis application which could achieve real-time tracking of the very fast movement of a ball [3]. In 2006, Iyoda developed a VR application for pitching, in which a player could pitch a ball containing a wireless acceleration sensor into a "screen-shaped" split curtain equipped with IR sensors [4]. That same year, Sugano presented an augmented sports game named "Shootball", which used a ball equipped with a shock sensor and a wireless module to experiment with a novel, goal-based sports game. Their system used multiple cameras and multiple projectors in the field [5]. In a more artistic field, Kuwakubo created a ball device, "HeavenSeed", which, by means of an acceleration sensor and a wireless module, generated sounds when it was thrown [6]. Torroja also created a ball-shaped sound instrument [7] that generates music when it is thrown or rolled, using techniques similar to Kuwakubo's.
As stated above, there have been many projects which have developed new ball sensing interfaces, yet there has not been a ball interface which activates the full potential of the ball itself. During the playing of a ball-based sport, the ball itself has a variety of states such as rolling, bouncing, and being thrown, as well as a rapid change of position. If we identify such states and connect these inputs directly to graphical and acoustic output, we can create new dynamic ball-based sports in which players can dynamically move their bodies, and the audience can enjoy a complete synthesis of the scene (players, the ball, and a dynamic interactive play field).

Goal of our "Bouncing Star" Project
Our goal is to develop a new system for the creation of new ball-based sports and to realize augmented sports applications using the system. To this end, we specify three necessary functions for our ball:
- Precise detection of the ball's bounce
- Precise tracking of the ball's position
- Durability of the ball against shock
We named our ball "Bouncing Star", aiming for these three functions, and started the Bouncing Star Project to create new ball-based sports systems and applications using the developed ball.

2. SYSTEM OVERVIEW
In our system, the ball's characteristics of motion are synchronized with real-time dynamic computer graphics and sounds. A high-speed camera and a projector are used. The camera is fixed at a place where it can capture the whole playing field and acquires images of the infrared lights within the ball on the play field at a frame rate of 200 fps. The camera is equipped with an IR filter so that it detects only infrared light. Corresponding computer graphics are generated on a PC and are projected onto the floor. Sounds generated in a separate application are played through two stereo speakers. Figure 1 shows a layout overview of the system, and Figure 2 is a photograph of an application running on the system.

Figure 1. Overview Diagram of the System

Figure 2. Playing field of the SPACE BALL Application Developed Using the Bouncing Star System

3. IMPLEMENTATION TECHNOLOGY
3.1 Design of the Ball: "Bouncing Star"
"Bouncing Star" is a simple ball which houses electronic devices inside, yet is strong enough to be thrown and bounced off walls and floors. It can recognize its various states, "thrown", "bounced", and "rolled", using built-in sensors. Furthermore, the ball is equipped with infrared LEDs, so the system can detect the dynamic position of the ball using a high-speed camera. In addition, the ball itself emits LED light and can change its color and flicker speed based on the ball's status. We produced two different ball-based game applications using these characteristics of the Bouncing Star. In these applications, real-time CG and sounds are generated in the playing field, and changes in the color and flicker speed of the light are directly linked to the movement of the ball.
The Bouncing Star ball is composed of a core component and a cover around the core. The core comprises electric circuits housed in a spherically shaped plastic shell. The cover serves primarily as protection for the core.

3.2 The Core
The core is composed of a PIC microcontroller, a three-axis acceleration sensor, a sound sensor, a ZigBee wireless communications module, a lithium-ion battery, 6 full-color LEDs, and 6 infrared LEDs. The weight is 170 g. The acceleration sensor is able to acquire accelerations between +6G and -6G on each of the three axes (X, Y, and Z). The sound sensor is able to pick up sounds that occur within the ball. The data from both sensors are sampled by the microcontroller (PIC16F88) in 256 steps approximately 160 times per second. The wireless module uses the ZigBee platform, which allows reliable connectivity over approximately 30 meters. Wireless communications with the environment, PCs, or another Bouncing Star use the RS-232C serial protocol.

Figure 3. Inside of a Core

3.3 Shock Absorbing Mechanism
The cover component of the ball demands a material which is strong enough for use in real ball games, yet it must have enough transparency to transmit the LED light to the outside of the ball.
Our first attempt was to cover the core with a spherical shell made of silicone rubber. This cover had good transparency and good elasticity, allowing the ball to bounce well. However, the weight was 380 g for the cover alone.
For the second cover, we made a special beach-ball type cover which could open and close to allow for removal of the core. The weight of the cover dropped to 120 g, and the shock against the core when it bounced became very small because of the beach ball's air cushion. But a problem arose in that the surface of the beach ball was not strong and could be damaged, allowing the air to leak. In addition, keeping the core at the center of the ball proved difficult.
Figure 4. Three Different Types of Covers. Above Left: Beach Ball; Above Right: Rubber Ball; Below: Sepak Takraw Ball

For the third option we used a Sepak Takraw ball knit from six strips of polypropylene. This cover was both lightweight and strong enough to serve as a protective cover. The weight was 80 g. This material could easily both fix the core at the ball's center and allow for easy removal of the core. We measured the reflection coefficient for each ball (Table 1).

Table 1. Specs of Three Different Types of the Ball
Ball Type          Material          Diameter   Weight   Reflection Coefficient
Rubber Ball        Silicone Rubber   98 mm      550 g    0.70
Beach Ball         Vinyl             220 mm     290 g    0.34
Sepak Takraw Ball  Polypropylene     135 mm     250 g    0.43

3.4 Bounce Detection Algorithm
We developed a new algorithm to recognize the ball's states (static, rolled, thrown, caught, etc.) using information from the acceleration sensor and the sound sensor. Since the ball spins irregularly during play, we combine the acceleration values of the X, Y, and Z axes and calculate the total acceleration value A (1):

A = √(X² + Y² + Z²)    (1)

When the ball collides with something, the sound sensor detects the value S of the sound occurring within the ball. We compare the acceleration value A and the sound value S with constant thresholds As and Ss, respectively. By combining these, it became possible to recognize the states of the ball (Table 2).

Table 2. Algorithm to Recognize Ball's States
          A > As     A < As
S > Ss    Bound      (Nothing)
S < Ss    Thrown     Rolled / Static / Floating

Moreover, the detection of the three states in the A < As, S < Ss case (static, rolled, floating) uses the gravitational acceleration value. When the ball is on the ground, we can detect A as a constant gravitational acceleration value; the static state is recognized if this value does not change. The rolled state is recognized by analyzing the gravitational acceleration value on each of the X, Y, and Z axes, while the floating state is recognized by A = 0, because when the ball is thrown and floats in the air the acceleration sensor cannot detect any acceleration. The PIC inside the ball performs these operations, so the state of the ball is continuously transmitted to the ball-state recognition program installed on the PC.
The thresholds As and Ss are easily changed from the PC interface to adapt to the environment of the play field or to a specific application's demands on the ball. The changed threshold values are retained even if the ball is switched off, because the threshold values are saved in the memory of the PIC whenever they change.
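To make the classification of Table 2 concrete, the following Java sketch (ours, not the authors' PIC firmware; threshold values are illustrative and the gravityChanging flag stands in for the per-axis gravity analysis described above) combines equation (1) with the threshold comparison:

    // Sketch: classify the ball state from one accelerometer sample (x, y, z)
    // and a sound level s, following Table 2 and the gravity-based sub-cases.
    public class BallStateClassifier {

        public enum State { BOUND, THROWN, ROLLED, STATIC, FLOATING, NOTHING }

        private final double accelThreshold;  // "As" in the paper
        private final double soundThreshold;  // "Ss" in the paper

        public BallStateClassifier(double accelThreshold, double soundThreshold) {
            this.accelThreshold = accelThreshold;
            this.soundThreshold = soundThreshold;
        }

        // Equation (1): total acceleration A = sqrt(x^2 + y^2 + z^2).
        public static double totalAcceleration(double x, double y, double z) {
            return Math.sqrt(x * x + y * y + z * z);
        }

        public State classify(double x, double y, double z, double s, boolean gravityChanging) {
            double a = totalAcceleration(x, y, z);
            if (s > soundThreshold) {
                return (a > accelThreshold) ? State.BOUND : State.NOTHING;
            }
            if (a > accelThreshold) {
                return State.THROWN;
            }
            // Low sound, low acceleration: distinguish floating / rolled / static.
            if (a < 0.1) return State.FLOATING;            // near free fall, A ~ 0
            return gravityChanging ? State.ROLLED : State.STATIC;
        }
    }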
In the event that we cannot use a projector and a high-speed camera in the system, the emission of the LED light inside the ball continues to be linked to the movement of the ball, and we can play new sports with the ball alone. Five different full-color LED emission patterns were programmed into the ball's PIC (Table 3).

Table 3. Modes of Emission Pattern of Light inside Ball
Mode Name   Description
Gradation   Change color gradually when the ball is rolled, and change color slowly when the ball bounces
Pulse       The light pulses; the cycle of the pulse depends on the acceleration value A
Skip        Change color quickly when the ball is bounced or rolled
Burning     As the acceleration value A increases, change the ball's color from blue to green, to yellow, and finally to red
Vanishing   Turn off the light for a few seconds when the ball is thrown

3.5 Position Tracking Algorithm
We developed an image recognition program that recognizes the position of the ball with a camera, using the infrared lights housed within the ball. In a real demonstration environment, we first calibrate the mapping from camera coordinates to field coordinates using a projective transformation. Second, the image captured by the camera is binarized with a constant threshold and labeled, and the positions of the infrared lights are recognized. When several light sources are detected closer together than a set distance threshold, the ball is considered to be located at the center position of those light sources; because 6 infrared LEDs are mounted inside the ball, several light sources may be detected for one ball depending on its orientation.
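A minimal sketch of the grouping step just described (ours, not the authors' tracker; blob extraction and the camera-to-field projective transformation are assumed to be handled elsewhere, and the single-seed clustering is a simplification):

    // Sketch: estimate the ball position as the centroid of IR blob centers
    // that lie within a distance threshold of each other (Section 3.5).
    import java.util.ArrayList;
    import java.util.List;

    public class BallPositionTracker {
        public record Point(double x, double y) {}

        private final double clusterDistance;  // the "set distance threshold" of Section 3.5

        public BallPositionTracker(double clusterDistance) {
            this.clusterDistance = clusterDistance;
        }

        // Returns the ball position in image coordinates, or null if no blobs were found.
        public Point locateBall(List<Point> blobCenters) {
            if (blobCenters.isEmpty()) return null;
            // Group every blob close to the first detected blob (up to 6 LEDs per ball).
            List<Point> cluster = new ArrayList<>();
            Point seed = blobCenters.get(0);
            for (Point p : blobCenters) {
                if (Math.hypot(p.x() - seed.x(), p.y() - seed.y()) <= clusterDistance) {
                    cluster.add(p);
                }
            }
            double sumX = 0, sumY = 0;
            for (Point p : cluster) { sumX += p.x(); sumY += p.y(); }
            // The result would then be mapped to field coordinates via the calibrated
            // projective transformation.
            return new Point(sumX / cluster.size(), sumY / cluster.size());
        }
    }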
3.6 Creation and Projection of Graphics and Sound
The graphics are created in accordance with the ball tracking information (location and timing of bounce). They are written in Visual C++ with OpenGL or DirectX. We used an NVIDIA GeForce 6800GT graphics board. A projector is suspended 10 m above the play field surface where the graphics are to be projected. We used a BenQ SP 870 projector for experiments at the University of Electro-Communications and for public demonstrations held at three different places (National Art Center in Tokyo [8], Laval Virtual 2008 [9], SIGGRAPH 2008 [10]).
Environmental sound effects are created in accordance with the timing of a bounce. Sound effects are designed to match the context of both the graphics and the player's emotion in that scene.

4. APPLICATIONS
4.1 Simple Graphics Effect for Ball Play
In Ishii's "PingPongPlus", the existing sport of ping pong was adapted to their new applications. In our study, however, we had to go upstream to the origin of ball play; that is, to question how we play with a ball when it has effective interactive graphics. Taking a very bottom-up approach, we first created simple graphics which could be used as graphic effects for the movement of the ball itself.
We created three different two-dimensional modes (Fluid mode, Star mode, Spotlight mode), and then we made a three-dimensional graphics mode in which people can hit virtual 3D objects (spheres and boxes) seen under the floor using the Bouncing Star ball, and see the physical interaction between the real ball and the virtual 3D objects.
Fluid Mode: Two-dimensional fluid motion graphics are generated slowly at the point where the ball bounces.
Star Mode: Many star-shaped particles spread quickly from the point where the ball is located. The stars rotate as they spread.

Figure 5. Star Mode

Figure 6. Spotlight Mode

Spotlight Mode: A bright white spotlight moves according to the ball's position. The spotlight is always above the ball; as a result, the player is highlighted too.
3D Shape Mode: People can hit CG images of cubes and spheres projected on the floor using the real ball. These CG shapes move according to a real-time physical simulation; as a result, people feel they are naturally interacting with the virtual physical world through their ball.

Figure 7. 3D Shape Mode

4.2 Augmented Sports Applications
After making these simple graphics modes, in which people fundamentally enjoy playing with the ball and the interactive graphics on the floor, we proceeded to create more complex sports game applications, in which people move their bodies more dynamically and can compete or collaborate in the context of a sporting game scenario.

Space Ball 1
We developed an application named "SPACE BALL 1" at a stage when our system had no wireless communication module and no microphone in the ball. Therefore, the information about the ball was derived from two high-speed cameras; this was the only information that our application on the PC could acquire. The information detected by the acceleration sensor was used only to change the emission of light from the ball itself.
A projected CG grid of 10 × 10 squares was spread across the field (Figure 8). A player could get points by hitting the ball on these
panels, and two players could compete for points by flipping these panels. Our challenge was to recognize the bounce of the ball by image recognition alone, with the second high-speed camera, and to display CG effects, such as many scattered stars, when the ball hit the boundary. However, the boundary identification with the second camera resulted in many false judgments.

Figure 8. Space Ball 1 (Laval Virtual 2008)

Space Ball 2
In "SPACE BALL 2", we use the ball with the sound sensor and the wireless communication module inside it. This application generates dynamic CG effects on the play field that change in sync with the ball's characteristic motion as detected by the ball-state recognition program. We prepared three different ball states, "bounce", "rolled", and "flying", which are detected by the program. The program then uses the position information of the ball (obtained through the high-speed camera) as parameters to decide the direction of the game. Table 4 and Figure 9 show how the direction of the game is determined using the ball's information. This application is designed as a multi-player cooperative game. There is a time limit of 60 seconds per game. A player can score by hitting the ball on a target projected as a CG spot. The targets, all of the same color, are displayed on the play field. Their color and placement can be changed when a player bounces the ball outside the field (at this moment the color of the ball also changes). The players can choose their favorite placement of the target spots, making it easier to get high scores, by changing the ball's color through dribbling it on the floor with their hands. Hitting a target in one bounce, or rolling the ball along a line of targets, generates higher scores. Figure 9 shows the rules of how to get points and how to make new targets, as well as how to make the time limit longer.

Table 4. Direction of "SPACE BALL 2" Using Ball's Information
Ball's information                       Direction in "SPACE BALL 2"
Ball bounced outside of the play field   Change the color of the ball; change the targets' placement
Ball bounced inside of the play field    Display the shock wave effect; hit surrounding targets; randomly generate several new targets of different colors
Rolled                                   Extend the remaining time according to the number and interval of hit targets, if several targets are hit in one throw
Flying in the air                        No points can be scored while the ball moves above the targets

Figure 9. Description of Rules to Get Points and Make New Targets in Space Ball 2

Figure 10. Floor Coordination

Sound effects have an important role in Space Ball 2. We applied up-beat music as the basic BGM during play; this up-beat music is aimed at exciting people during the game. On top of the continuous BGM, we added four different sound effects triggered by the ball's bounce. The sounds differ based on the context of the scene, letting players know what happened in their game (e.g., they changed the target coordinates, they got points by hitting a target, or the ball simply bounced inside the play field but failed to hit a target). Each sound was designed and recorded beforehand and plays with no delay when a bounce occurs.

Figure 11. Shock Wave Effect at Bound in "SPACE BALL 2"


5. DISCUSSION
5.1 Discussion about Play in Applications
From 2008 to 2009, we ran many experiments in both outdoor and indoor fields on the campus of the University of Electro-Communications. We also gave multiple public demonstrations in three different places. Hundreds of people, including small children, experienced our applications and enjoyed the practical execution of the interaction.
In some cases, we gave people only the Bouncing Star ball, without projected images. In these cases, we found that the "vanishing ball" mode got a favorable reception from many people. In this mode the ball seems to have really disappeared. The mode is used in a dark place, so players feel a thrill in catching the disappearing ball, which made it popular.
There were small gaps between the ball's movement and the game CG scene in "SPACE BALL 1", and detection in this game was relatively poor using only the ball position information, because at that time the application could not yet use the acceleration and sound information from the ball.
Therefore, we added the wireless communication module and the combination of the sound and acceleration sensors in the ball for "SPACE BALL 2". In doing this, it became possible to sync various ball movements, obtaining both the ball's states and the ball's position information. This enabled us to realize real-time bounce detection for the game. In addition, through analysis of the acceleration information, we realized identification of the "rolled" and "flying" states. Through the development of this software, we could realize several unique ways to recognize states of the ball, which were unavailable in past projects because they used only electronic devices.
However, we cannot yet adapt the Bouncing Star system to all the movements of a ball. For example, if the ball were used in baseball, it would be impossible in the current system to distinguish the different curves of the balls a pitcher throws. Therefore, we are now developing a prototype which includes a gyro sensor, and are doing experiments on roll direction identification. However, this presents some difficulties because the roll speed of the ball is too fast, and the gyro sensor cannot yet detect the roll direction of the ball precisely. We think that this problem can be settled by putting multiple pieces of hardware together in a device that works in concert with the software used for ball state identification.

5.2 Mix of Ball and Digital Technologies
We exhibited Space Ball 2 at the SIGGRAPH 2008 New Tech Demos. It was impressive that, in addition to the players, many spectators watching cheered and said the game was beautiful. We think the combination of the application showing CG, the sounds, and the light of the LED ball, working with a rule-based sports game, increases both the players' and the spectators' excitement. We would like to investigate a design methodology for how to increase the pleasure and excitement of spectators using our system.

6. CONCLUSION
In this research project, we developed a ball, "Bouncing Star", which has a plastic core including electronic devices (sound sensor, 3-axis acceleration sensor, IR and full-color LEDs) covered in material to protect the core against shocks during bouncing. We tested three different cover materials for the ball, and at this point have a cover consisting of silicone rubber, which proved to be the easiest to bounce, though its weight is still heavy for small children. The Sepak Takraw type cover was the toughest, but it did not bounce as accurately as a rubber ball. Every type can be used in our applications, and almost all people enjoyed playing with the silicone rubber ball covers. People often showed concentration and enthusiasm for their games. Players made full-body movements during the games, and sometimes people played continuously for 15 minutes, perspiring during their workout. Beyond the players themselves, we also observed many audience members enjoying the whole game scene around the play field.
The fusion of the ball and digital technology only began in the past decade, but we believe the ball interface for augmented sports has great potential to create a new phase of human communication and entertainment activity. Our next phase would be to make our tools an interface that connects the large real physical world to virtual information resources very smoothly and very intuitively, approaching a seamless, natural state for human beings.

7. ACKNOWLEDGMENTS
We would like to thank all the members who worked on this project in our laboratories, and the Division of Technical Staff at the University of Electro-Communications for their mechanical engineering support. This development and research was supported by the CREST project of the Japan Science and Technology Corporation (JST).

8. REFERENCES
[1] Hiroshi Ishii, Craig Wisneski, Julian Orbanes, Ben Chun, and Joe Paradiso. PingPongPlus: Design of an Athletic-Tangible Interface for Computer-Supported Cooperative Play. Proceedings of CHI '99, pp. 394-401, 1999.
[2] Florian 'Floyd' Mueller, Stefan Agamanolis. Sports Over a Distance. ACM Computers in Entertainment, Vol. 3, No. 3, Article 4E, July 2005.
[3] Stephan Rusdorf, Guido Brunnett. Real Time Tracking of High Speed Movements in the Context of a Table Tennis Application. Proc. of ACM VRST, pp. 192-200, Monterey, 2005.
[4] Akihiko Iyoda, Hidetaka Kimura, Satoru Takei, Yoshifumi Kakiuchi, Xiaodong Du, Sotaro Fujii, Yoshihiro Masuda, Daisuke Masuno, Kazunori Miyata. A VR Application for Pitching Using an Acceleration Sensor and a Strip Screen. Journal of the Society for Art and Science, Vol. 5, No. 2, pp. 33-44, 2006.
[5] Yoshiro Sugano, Jumpeo Ohtsuji, Toshiya Usui, Yuya Mochizuki, Naohito Okude. SHOOTBALL: The Tangible Ball Sport in Ubiquitous Computing. ACM ACE 2006 Demonstration Session, 2006.
[6] Ryota Kuwakubo. "HeavenSeed" is described on his web site: http://www.vector-scan.com/
[7] Yago Torroja at UPM developed a ball device which is an interface for musical expression that sends MIDI messages through a wireless connection to a computer.
[8] "Bouncing Star" was presented at the "Leading Edge Technology Showcase" exhibit held at the National Art Center in Tokyo, Feb. 6-17, 2008.
[9] Osamu Izuta, Jun Nakamura, Toshiki Sato, Sachiko Kodama, Hideki Koike, Kentaro Fukuchi, Kaoru Shibasaki, Haruko Mamiya. "Bouncing Star". Laval Virtual 2008, April 2008 (exhibition).
[10] Osamu Izuta, Jun Nakamura, Toshiki Sato, Sachiko Kodama, Hideki Koike, Kentaro Fukuchi, Kaoru Shibasaki, Haruko Mamiya. Digital Sports Using the "Bouncing Star" Rubber Ball Comprising IR and Full-color LEDs and an Acceleration Sensor. ACM SIGGRAPH 2008 New Tech Demos (Los Angeles), Article No. 13, 2008.
[11] Osamu Izuta, Toshiki Sato, Haruko Mamiya, Kaoru Shibasaki, Jun Nakamura, Sachiko Kodama, Hideki Koike. BouncingStar: Development of a Rubber Ball Containing Electronic Devices for Digital Sports. Proceedings of WISS 2008, pp. 41-44, 2008.
[12] Koji Tsukada, Maho Oki. Chameleon Ball. Proceedings of Interaction 2009, pp. 119-120, 2009.
[13] Hiroshi Ishii and Brygg Ullmer. Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms. In Proceedings of the Conference on Human Factors in Computing Systems (CHI '97), Atlanta, March 1997, ACM Press, pp. 234-241.
[14] Volker Wulf, Eckehard F. Moritz, Christian Henneke, Kanan Al-Zubaidi, and Gunnar Stevens. Computer Supported Collaborative Sports: Creating Social Spaces Filled with Sports Activities. Proceedings of the Third International Conference on Entertainment Computing (ICEC 2004), pp. 80-89, 2004.
On-line Document Registering and Retrieving System
for AR Annotation Overlay

Hideaki Uchiyama, Julien Pilet and Hideo Saito


Keio University
3-14-1 Hiyoshi, Kohoku-ku
Yokohama, Japan
{uchiyama,julien,saito}@hvrl.ics.keio.ac.jp

ABSTRACT
We propose a system that registers and retrieves text documents to annotate them on-line. The user registers a text document captured from a nearly top view and adds virtual annotations. When the user thereafter captures the document again, the system retrieves and displays the appropriate annotations, in real-time and at the correct location. Registering and deleting documents is done by user interaction. Our approach relies on LLAH, a hashing based method for document image retrieval. At the on-line registering stage, our system extracts keypoints from the input image and stores their descriptors computed from their neighbors. After registration, our system can quickly find the stored document corresponding to an input view by matching keypoints. From the matches, our system estimates the geometrical relationship between the camera and the document for accurately overlaying the annotations. In the experimental results, we show that our system achieves on-line and real-time performance.

Categories and Subject Descriptors
H.5.1 [INFORMATION INTERFACES AND PRESENTATION]: Multimedia Information Systems - Artificial, augmented, and virtual realities; I.4.8 [IMAGE PROCESSING AND COMPUTER VISION]: Scene Analysis - Tracking

General Terms
Algorithms

Keywords
Augmented reality, Document retrieval, LLAH, Feature matching, Pose estimation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Augmented Human Conference April 2-3, 2010, Megève, France.
Copyright 2010 ACM 978-1-60558-825-4/10/04 ...$10.00.

1. INTRODUCTION
Augmented reality is one of the popular research categories in computer vision and human computer interaction. AR applications are widely developed for games, education, industry, communication and so on. They usually need to estimate the geometrical relationship between the camera and the real world to overlay virtual objects with geometrical consistency. One of the traditional approaches to estimating the geometry is to use fiducial markers [6]. In recent years, the research direction of AR has been moving towards using natural features in order to reduce the limitations on practical use [16].
Nowadays, augmenting documents is gaining in popularity and is called Paper-Based Augmented Reality [5]. The purpose of this research is to enlarge the usage of a physical document. For example, a user can click words on a physical document through a mobile phone. This enables a paper document to become a new tangible interface for connecting the physical and digital worlds. Hull et al. have proposed a clickable document, which has some colored words as printed hyperlinks [5]. When the user reads the document, the user can click a colored word to connect to the Internet. Also, the user can watch a movie instead of the printed picture on the document. Their application was designed for extending the usage of existing newspapers or magazines.
As a novel application for document based AR, we propose a system that registers a document image with some user annotations for later retrieval and augmentation on new views. Our system is composed of a single camera mounted on a handheld display, such as a mobile phone, and of the physical documents the user selects. No other special equipment such as a fiducial marker is necessary. The user captures the document, writes some annotations on the document through the display, and registers them in our system. When the user captures the registered document again, the annotations of the document are retrieved from the database and overlaid at the position selected by the user beforehand. Our system can be useful in cases where the user does not want to write annotations directly on valuable documents such as ancient books.
The rest of the paper is organised as follows: we review keypoint matching based registration for augmented reality in the next section. In Section 3, we introduce the usage of our system. Then, we explain the detailed algorithm of our system in Section 4. We evaluate the way of capturing documents and the processing time in Section 5, and conclude in Section 6.

2. RELATED WORKS
The process of registration by keypoint matching between two images can be divided into three parts: extraction, description and matching.
As a first step, we extract keypoints which have a distinctive appearance, different from other pixels in each image. By using these distinctive keypoints, it is easier to establish correspondences between two images. The Harris corner [4] and the FAST corner [13] are widely used and keep the repeatability of the extraction under different viewpoints.
Next, these keypoints are described as high dimensional vectors for robust and stable matching. The vector is usually computed from the local neighbor region of the keypoint. Well-known descriptors such as SIFT [8] and SURF [2], with 128 dimensional vectors, are well designed to be invariant to illumination, rotation and translation changes. Since the computational cost of SIFT is too high for real-time processing, several attempts to reduce the cost have been made [14, 16].
Matching of descriptors can be addressed as a nearest neighbor searching problem. KD-tree based approaches [1] and hashing schemes [3] are typical approximated nearest neighbor searches. Though the true nearest neighbor cannot always be found, the computational cost is drastically reduced compared to full search. Nister and Stewenius have proposed a recursive k-means tree as a tree structure for quick retrieval [12]. Lepetit et al. have proposed another approach by treating keypoint matching as a classification problem [7].
Descriptors such as SIFT and SURF are well suited to matching keypoints with rich texture patterns. However, documents generally have repetitive patterns composed of text. Since the local regions of documents may be similar and not distinctive, these descriptors do not work well. Instead of them, descriptors based on the geometrical relationship of keypoints have been proposed [5, 11].
As a descriptor for a document, Hull et al. have proposed the horizontal connectivity of word lengths [5]. Nakai et al. have proposed LLAH (Locally Likely Arrangement Hashing), which uses the local arrangement of word centers [11]. Uchiyama and Saito extended the framework of LLAH to wider range tracking [15]. LLAH has been applied to the extraction of annotations written on physical documents [9] and extended to a framework for augmented reality [15].
Since LLAH can achieve real-time processing thanks to its hashing scheme [11, 15], we develop our system based on LLAH, as described in Section 4.

3. PROPOSED SYSTEM
The configuration of the system is only a camera mounted on a handheld display such as a tablet PC or a mobile phone. The user prepares text documents in which the user wants to write some annotations electronically. No other special equipment is used.
At the registration of a document into our system, the user captures the document from a nearly top view as shown in Figure 1. While our system shows the captured document on the display, the user can write annotations on the document through the display. We prepare two modes: text and highlighter. In the text mode, the users can write down several sentences at specified positions on the document as shown in Figure 1(a). This mode works as memos and can be replaced with handwriting. In the highlighter mode, the users can highlight the text on the document as shown in Figure 1(b). Since the highlighted areas are semi-transparent, this mode can be considered as a virtual color highlighter pen.
After the registration, the retrieval stage starts. When the same document is captured again, the annotations are overlaid at the specified positions. While rotating and translating the camera, the users can watch the overlaid annotations as if written on the document. Since many documents can be registered in our system, our system can identify which document is currently captured and overlay the corresponding annotations.
The operations for registering and deleting documents are performed by the user's clicks. First, our system starts the capturing process. If the users register a document, they click a button. Then, our system switches to the registration stage and waits for the user's annotation input. After the input, the users click the button again to switch to the retrieval stage. During the retrieval stage, the users can watch the annotations on the captured document. When the users delete a document from the database, they click the button while watching the annotations of the document. This operation is designed in order to avoid registering the same document twice. By using these user interactions, the users can register and delete documents.
Our system is designed for people who do not want to write annotations on documents directly, and can also be considered as electronic bookmarking. In previous related works, the document database was prepared from digital documents [5, 11, 15]. Since it is difficult to prepare the digital version of books and newspapers, our system can be easier and more practical in terms of usage because it uses the physical documents the user has on hand.
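The single-button interaction described above can be read as a small state machine. The following Python sketch is our own illustration of that flow under stated assumptions; the state names and the database methods (register_current_document, delete_current_document) are hypothetical and not taken from the paper.

from enum import Enum, auto

class Mode(Enum):
    RETRIEVAL = auto()      # default: annotations of recognized documents are overlaid
    REGISTRATION = auto()   # waiting for the user's annotation input

def on_button_click(mode, viewing_registered_document, database):
    # Illustrative transition logic for the single-button interaction in Section 3.
    if mode is Mode.RETRIEVAL:
        if viewing_registered_document:
            # Clicking while annotations are shown deletes the current document (Section 4.4).
            database.delete_current_document()
            return Mode.RETRIEVAL
        # Otherwise the click starts registering the currently captured document.
        return Mode.REGISTRATION
    # A second click ends annotation input and returns to the retrieval stage.
    database.register_current_document()
    return Mode.RETRIEVAL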
4. DETAILS

4.1 LLAH
LLAH is a document image retrieval method [11]. Since our system relies on it, we briefly describe the method here for completeness.
First, the center of each word is extracted from the captured document image as a keypoint. The image is blurred by using a Gaussian filter and binarized by using adaptive thresholding as shown in Figure 2. Since the filter sizes of both processes affect our result, we will discuss their effects in Section 5.2.
Next, descriptors are computed for each keypoint. In Figure 3, x is a target keypoint. First, the n nearest points of the target are selected as abcdefg (n = 7). Next, m points out of the n points are selected as abcde (m = 5). From the m points, a descriptor is computed as explained in the next paragraph. Since the number of such selections is nCm = n! / (m!(n-m)!), one keypoint has nCm descriptors.
From the m points, 4 points are selected as abcd. From the 4 points, we compute the ratio of the areas of two triangles. Since the number of such selections is mC4, the dimension of the descriptor is mC4.
For quick retrieval in keypoint matching, the descriptor is transformed into an index by using the following hash function:

    Index = ( sum_{i=0}^{mC4 - 1} r(i) k^i ) mod Hsize    (1)

where r(i) (i = 0, 1, ..., mC4 - 1) is a quantized ratio of two triangle areas, k is the quantization level and Hsize is the hash size.
These descriptors allow matching keypoints of an input image with those of a reference image in the database.
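To make the descriptor computation and Equation (1) concrete, the sketch below computes LLAH-style indices for one keypoint in Python. It is a simplified reading of Section 4.1: the selections of n, m and 4 points follow the text, but the quantization of the area ratio is a stand-in for the actual quantization method of [11].

import numpy as np
from itertools import combinations

def triangle_area(p, q, r):
    # Absolute area of the triangle p-q-r.
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))

def llah_indices(target, keypoints, n=7, m=5, k=32, h_size=2**23 - 1):
    # Compute the nCm hash indices of one keypoint (a simplified LLAH sketch).
    pts = np.asarray(keypoints, dtype=float)
    dists = np.linalg.norm(pts - np.asarray(target, dtype=float), axis=1)
    order = np.argsort(dists)
    neighbors = pts[order[1:n + 1]]          # skip index 0, assumed to be the target itself
    indices = []
    for m_pts in combinations(range(n), m):  # choose m points out of n
        ratios = []
        for four in combinations(m_pts, 4):  # choose 4 points out of m
            a, b, c, d = (neighbors[i] for i in four)
            r = triangle_area(a, b, c) / max(triangle_area(a, c, d), 1e-9)
            ratios.append(min(int(r * 4), k - 1))   # simplified quantization into k levels
        # Equation (1): Index = (sum_i r(i) * k^i) mod Hsize
        index = sum(r_i * (k ** i) for i, r_i in enumerate(ratios)) % h_size
        indices.append(index)
    return indices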
Figure 2: Keypoint extraction. (a) The document is captured from a nearly top viewpoint. (b) The white regions represent the extracted word regions. The keypoint is the center of each region.

Figure 3: Descriptor. (1) Selection of n points. (2) Selection of m points out of n. (3) Selection of 4 points out of m. (4) Computation of the area ratio of the two triangles.

Figure 1: Annotation overlay. (a) Red text is written as a memo. (b) A semi-transparent rectangle is highlighted as if written by a color marker pen.

4.2 Document registration
When the user captures a document, our system extracts its keypoints and computes their descriptors. For each document, our system stores the keypoints in a table as follows:

    Document ID | Keypoint ID | (x, y) | Descriptors

The document ID is numbered by capture order. The keypoint ID is numbered by extraction order from the image. (x, y) is the coordinate in the image. This allows our system to estimate the geometrical relationship between the coordinate system of the stored image and that of the input image, making accurate annotation overlay possible. Previous methods do not store descriptors [11, 15]. In contrast, we need to keep them for the deletion process described in Section 4.4.
For document retrieval, our system has a descriptor database as follows:

    Descriptor | (Document ID + Keypoint ID), ...

As described in Section 4.1, the descriptor is an index. At each index, the set of document ID and keypoint ID is stored. In our system, we use 16 bits for the document ID and 16 bits for the keypoint ID, and store them as a 32 bit integer. Since the same descriptor can be computed for different keypoints, we use a list structure for storing several sets of document ID and keypoint ID at each index.
The descriptor database was generated as a hash table in previous works [11, 15]. If the database can be generated beforehand as in [11, 15], the hash size can be optimized and designed by using all document images. Since our system starts from an empty database, it is difficult to determine the appropriate hash size. To avoid large empty spaces in the hash table, we use a sparse tree structure for the descriptor database. Even though the computational cost of searching a binary tree is O(log2 N) compared with O(1) for a hash table, it is enough for our purpose, as discussed in Section 5.3.
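A minimal sketch of the two stores described above, using Python dictionaries for brevity. The paper stores the descriptor database in a sparse binary tree; an ordered map could be substituted for the plain dict here without changing the logic. The packing of document ID and keypoint ID into one 32 bit integer follows the text; the function names are ours. The deletion routine anticipates Section 4.4.

from collections import defaultdict

documents = {}                       # document ID -> list of (keypoint ID, (x, y), [indices])
descriptor_db = defaultdict(list)    # descriptor index -> list of packed 32 bit IDs

def pack(doc_id, kp_id):
    # 16 bits for the document ID and 16 bits for the keypoint ID, stored as one 32 bit integer.
    return (doc_id << 16) | (kp_id & 0xFFFF)

def register_document(doc_id, keypoints):
    # keypoints: list of ((x, y), [descriptor indices]) for one captured document.
    table = []
    for kp_id, (xy, indices) in enumerate(keypoints):
        table.append((kp_id, xy, indices))
        for index in indices:
            descriptor_db[index].append(pack(doc_id, kp_id))
    documents[doc_id] = table

def delete_document(doc_id):
    # Remove a document (Section 4.4): the stored indices tell us which entries to drop.
    for kp_id, _, indices in documents.pop(doc_id):
        packed = pack(doc_id, kp_id)
        for index in indices:
            descriptor_db[index] = [e for e in descriptor_db[index] if e != packed]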

4.3 Document retrieval
In this process, keypoints are extracted, and their descriptors and indices are computed as described in Section 4.1. For each keypoint, several sets of document ID and keypoint ID are retrieved from the descriptor database. If the retrieval is relatively successful, the same set of document ID and keypoint ID appears often for a keypoint. By selecting the set with the maximum count, one set (document ID and keypoint ID) is assigned to each keypoint.
After assigning one set to each keypoint, we count the document IDs assigned to the keypoints in order to determine which document image is currently captured. The captured document is identified by selecting the document ID with the maximum count.
To verify that the selected document is correct, we compute geometrical constraints such as the fundamental matrix and the homography matrix. Since the paper is put on a table, we can use RANSAC based homography computation for the verification [15].
From the computed homography, we can overlay the AR annotations at the specified positions on the document. The document retrieval and the annotation overlay can be done simultaneously in the same process.
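The retrieval, voting and verification steps can be sketched as follows. The voting here is a plain majority count over the entries returned from the descriptor database, and the geometric verification uses OpenCV's RANSAC homography estimation; the exact voting and verification details of the paper and of [15] may differ.

from collections import Counter
import numpy as np
import cv2

def retrieve_document(query_keypoints, descriptor_db):
    # query_keypoints: list of ((x, y), [descriptor indices]) from the current frame.
    matches = []                       # (query xy, document ID, keypoint ID) candidates
    for xy, indices in query_keypoints:
        votes = Counter(e for i in indices for e in descriptor_db.get(i, []))
        if votes:
            packed, _ = votes.most_common(1)[0]
            matches.append((xy, packed >> 16, packed & 0xFFFF))
    if not matches:
        return None, []
    doc_id, _ = Counter(d for _, d, _ in matches).most_common(1)[0]
    return doc_id, [(xy, kp) for xy, d, kp in matches if d == doc_id]

def verify_and_map_annotations(matched, stored_xy, annotation_points):
    # Estimate a homography (RANSAC) from stored to current coordinates and map annotations.
    # stored_xy: mapping keypoint ID -> (x, y) in the registered image.
    if len(matched) < 4:
        return None
    src = np.float32([stored_xy[kp] for _, kp in matched]).reshape(-1, 1, 2)
    dst = np.float32([xy for xy, _ in matched]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    pts = np.float32(annotation_points).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)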

4.4 Document deletion
As described in Section 3, the users can delete a document while watching its annotations. This means that the users delete the currently retrieved document. When a document is deleted, its document data, such as the sets of document ID and keypoint ID and their descriptors, should be deleted. First, we delete the sets of document ID and keypoint ID from the descriptor database. Since we keep the descriptors (indices) of each keypoint from the registration, we can delete the sets by accessing each index. After deleting the sets, we delete the other document data.

5. EXPERIMENTS

5.1 Setting
The parameters in LLAH affect the performance and accuracy of document image retrieval. Since the influence of the parameters has already been discussed in [11], we do not discuss it here and fix the parameters throughout our experiments. Instead, we discuss the way of capturing a document and the processing time for our purpose.
The LLAH parameters described in Section 4.1 are as follows: n, m, k and Hsize. Since we set n = 6 and m = 5, the number of descriptors for one keypoint is 6C5 = 6. The quantization level is k = 32 and the hash size is Hsize = 2^23 - 1. As described in Section 4.2, the hash size is used only for computing descriptors. Each descriptor is stored in a binary tree structure. The quantization method for descriptors is the same as in [11].
In our current implementation, we use a laptop with a FireWire camera as the device. The laptop has an Intel Core 2 Duo 2.2 GHz and 3 GB RAM. The size of the input image is 640 × 480 pixels, and the size for the keypoint extraction is 320 × 240 pixels for fast computation. The focal length of the lens is fixed at 6 mm.
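For reference, these parameter settings imply the following descriptor counts and dimensions; the snippet below is our own quick check with the Python standard library, not code from the paper.

from math import comb

n, m, k, h_size = 6, 5, 32, 2**23 - 1
print(comb(n, m))   # 6 descriptors per keypoint (6C5)
print(comb(m, 4))   # 5 quantized ratios per descriptor (5C4), the descriptor dimension
print(h_size)       # 8388607, used only when computing the index in Equation (1)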
5.2 Image capture
In LLAH, the keypoint extraction is composed of smoothing by a Gaussian filter and binarization by adaptive thresholding. The filter size of both methods needs to be determined beforehand.
Since the filter size affects the result of the keypoint extraction, we have tested the keypoint extraction on images captured from different positions, as shown in Figure 4. The Gaussian filter is 3 × 3 and the filter for adaptive thresholding is 11 × 11. The character size is 10 pt, written on A4 paper in a two column format.
If the camera is as close to the document as 3 cm, each character is individually extracted, as shown in Figure 4(a). On the other hand, the word regions cannot be extracted from an image captured far from the document (20 cm), as shown in Figure 4(b). The word regions are extracted as desired in the case of Figure 4(c).
The result of the keypoint extraction can be influenced by the image size, the character size in the physical document, the distance between the camera and the document, the two filters' sizes and the lens. These parameters should be optimized by considering the use of the application. In our application, examples of a captured image are shown in Figure 1. The user captures an A4 paper with 10 pt characters from a height of around 10 cm.
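The keypoint extraction described in this section maps naturally onto standard OpenCV operations. The sketch below uses the filter sizes reported above (3 × 3 Gaussian smoothing, 11 × 11 adaptive thresholding) and takes word-region centroids from connected components; it is our approximate reading of the processing chain, not the authors' implementation.

import cv2

def extract_word_centers(gray, gauss_ksize=3, thresh_block=11):
    # Return (x, y) centers of word-like regions in a grayscale document image.
    blurred = cv2.GaussianBlur(gray, (gauss_ksize, gauss_ksize), 0)
    # Adaptive thresholding; words become white blobs on a black background.
    binary = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, thresh_block, 10)
    num, _, stats, centroids = cv2.connectedComponentsWithStats(binary)
    centers = []
    for label in range(1, num):                      # label 0 is the background
        if stats[label, cv2.CC_STAT_AREA] >= 5:      # discard tiny noise blobs
            centers.append(tuple(centroids[label]))
    return centers

# Example usage: downscale the 640x480 input to 320x240 before extraction, as in Section 5.1.
# gray = cv2.cvtColor(cv2.resize(frame, (320, 240)), cv2.COLOR_BGR2GRAY)
# keypoints = extract_word_centers(gray)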
5.3 Processing time
We have measured the processing time with 200 small parts of documents. The size of each small part is as shown in Figure 1. In this region, the average number of keypoints was around 180.
The average processing time of each process is shown in Table 1. The document registration without the user's annotation took 1 msec. The document deletion also took 1 msec. From these results, the user interactions can be done without stress.
Regarding the document retrieval including the annotation overlay, the average time was 30 msec. Compared with the previous related work [15], the computational cost was reduced because the number of keypoints in a smaller image was lower. Even though we use a tree structure for searching, we can still achieve about 30 fps, which is enough processing time for AR.

Table 1: Processing time
Process        msec
Registration   1
Retrieval      30
Deletion       1

6. CONCLUSIONS AND FUTURE WORKS
In this paper, we presented an on-line AR annotation system for text documents. The user can register text documents with annotations virtually written on the document. Then, the user can watch the annotations by AR while capturing the same document again. Our system provides user interactions for registering and deleting documents. The algorithm of our system is based on LLAH. Our system stores the keypoints in the captured image together with their descriptors. By using LLAH, our system can quickly identify which document is captured and overlay its annotations. In the experiments, we showed that our system works in real time.
In our current system, the target documents are European documents such as English and French. As future work, we will apply it to any language by changing the keypoint extraction method depending on the language [10]. Also, multiple documents may be detected for showing many annotations simultaneously. For handling a large scale change, keypoint extraction on a pyramid image may be another direction.

7. ACKNOWLEDGMENT
This work is supported in part by a Grant-in-Aid for the GCOE for High-Level Global Cooperation for Leading-Edge Platform on Access Spaces from the Ministry of Education, Culture, Sports, Science and Technology in Japan and a Grant-in-Aid for JSPS Fellows.

Figure 4: Keypoint extraction at a distance. (a) The camera is set near the document (3 cm). (b) The camera is set far from the document (20 cm). (c) The distance is between (a) and (b) (10 cm).

8. REFERENCES
[1] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. of the ACM, 45:891-923, 1998.
[2] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool. SURF: Speeded up robust features. CVIU, 110:346-359, 2008.
[3] M. Datar, P. Indyk, N. Immorlica, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proc. SCG, pages 253-262, 2004.
[4] C. Harris and M. Stephens. A combined corner and edge detector. In Proc. AVC, pages 147-151, 1988.
[5] J. Hull, B. Erol, J. Graham, Q. Ke, H. Kishi, J. Moraleda, and D. Van Olst. Paper-based augmented reality. In Proc. ICAT, pages 205-209, 2007.
[6] H. Kato and M. Billinghurst. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In Proc. IWAR, 1999.
[7] V. Lepetit, J. Pilet, and P. Fua. Point matching as a classification problem for fast and robust object pose estimation. In Proc. CVPR, pages 244-250, 2004.
[8] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60:91-110, 2004.
[9] T. Nakai, K. Iwata, and K. Kise. Accuracy improvement and objective evaluation of annotation extraction from printed documents. In Proc. DAS, pages 329-336, 2008.
[10] T. Nakai, K. Iwata, and K. Kise. Real-time retrieval for images of documents in various languages using a web camera. In Proc. ICDAR, pages 146-150, 2009.
[11] T. Nakai, K. Kise, and K. Iwata. Camera based document image retrieval with more time and memory efficient LLAH. In Proc. CBDAR, pages 21-28, 2007.
[12] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, pages 2161-2168, 2006.
[13] E. Rosten and T. Drummond. Machine learning for high speed corner detection. In Proc. ECCV, pages 430-443, 2006.
[14] S. Sinha, J. Frahm, M. Pollefeys, and Y. Genc. GPU-based video feature tracking and matching. In Proc. EDGE, 2006.
[15] H. Uchiyama and H. Saito. Augmenting text document by on-line learning of local arrangement of keypoints. In Proc. ISMAR, pages 95-98, 2009.
[16] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg. Pose tracking from natural features on mobile phones. In Proc. ISMAR, pages 125-134, 2008.
Augmenting Human Memory using Personal Lifelogs

Yi Chen
Centre for Digital Video Processing
Dublin City University
Dublin 9, Ireland
ychen@computing.dcu.ie

Gareth J. F. Jones
Centre for Digital Video Processing
Dublin City University
Dublin 9, Ireland
gjones@computing.dcu.ie

ABSTRACT
Memory is a key human facility to support life activities, including social interactions, life management and problem solving. Unfortunately, our memory is not perfect. Normal individuals will have occasional memory problems which can be frustrating, while those with memory impairments can often experience a greatly reduced quality of life. Augmenting memory has the potential to make normal individuals more effective, and those with significant memory problems to have a higher general quality of life. Current technologies are now making it possible to automatically capture and store daily life experiences over an extended period, potentially even over a lifetime. This type of data collection, often referred to as a personal life log (PLL), can include data such as continuously captured pictures or videos from a first person perspective, scanned copies of archival material such as books, electronic documents read or created, and emails and SMS messages sent and received, along with context data such as time of capture and access and location via GPS sensors. PLLs offer the potential for memory augmentation. Existing work on PLLs has focused on the technologies of data capture and retrieval, but little work has been done to explore how these captured data and retrieval techniques can be applied to actual use by normal people in supporting their memory. In this paper, we explore the needs for augmenting human memory of normal people based on the psychology literature on the mechanisms behind memory problems, and discuss the possible functions that PLLs can provide to support these memory augmentation needs. Based on this, we also suggest guidelines for data capture, retrieval needs and computer-based interface design. Finally we introduce our work-in-progress prototype PLL search system in the iCLIPS project to give an example of augmenting human memory with PLLs and computer based interfaces.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Search and Retrieval - Search process, Query formulation; H.5.2 [User Interfaces (D.2.2, H.1.2, I.3.6)]: Graphical user interfaces (GUI), Prototyping, User-centered design

General Terms
Algorithms, Design, Human Factors.

Keywords
Augmented Human Memory, Context-Aware Retrieval, Lifelogs, Personal Information Archives

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Augmented Human Conference, April 2-3, 2010, Megève, France.
Copyright 2010 ACM 978-1-60558-825-4/10/04...$10.00.

1. INTRODUCTION
Memory is a key human facility inextricably integrated in our ability to function as humans. Our functioning as humans is dependent to a very significant extent on our ability to recall information relevant to our current context, be it a casual chat with a friend, remembering where you put something, the time of the next train or some complex theory you need to solve a problem in the laboratory. Our effectiveness at performing many tasks relies on our efficiency and accuracy in reliably recalling the relevant information. Unfortunately humans are frequently unable to reliably recall the correct information when needed. People with significant memory problems (e.g. amnesic patients) usually face considerable difficulty in functioning as happy integrated members of society. Other people, although having much less noticeable memory problems compared with amnesic patients, may also experience some degree of difficulty in learning and retrieving information from their memory for various reasons. In this paper, we use the phrase normal people to refer to individuals with normal memory and normal lifestyles, as opposed to amnesic or mentally impaired patients. The desirability of a reliable and effective memory means that augmenting memory is a potentially valuable technology for many classes of people.
Normal individuals might use a memory augmentation tool to look up partially remembered details of events from their life in many private, social or work situations. The augmented memory application itself might proactively monitor their context and bring to their attention information from their previous life experiences which may be of assistance or interest to their current situation. Details from these experiences could be integrated into personal narratives for use either in self reflection or to enable experiences to be shared with friends [1]. Sufficiently powerful augmented memories could not just support their users, but actually extend the users' capabilities to enable them to perform new tasks, or existing tasks more efficiently or faster.
In order to provide augmented memory applications, however, we need some means to capture, store and then access personal information from a person's life experiences to form an augmented memory. The emerging area of digital life logging is beginning to provide the technologies needed to support these applications. Personal lifelogs (PLLs) aim to digitally record and store many features of an individual's life experiences. These can include details of visual and audio experiences, documents created or read, the user's location, etc. While lifelog technologies fall short of genuinely mimicking the complexities and processes of human memory, they are already offering the promise of life enhancing human augmentation, especially for episodic memory impaired patients, that is, people who have problems remembering their daily experiences [2].
Existing studies of lifelogs have concentrated primarily on the physical capture and storage of data. One of the major activities in this area which has explored this topic in detail relates to Gordon Bell's experiences of digitizing his life, described in [3]. Bell explores the topic of "e-memories" and "Total Recall" technologies related through his own experiences of digitizing his life. While his work provides significant insights into the potential and issues of digital memories, it focuses very much on the technologies of capture and storage, and potential applications. Our work in the iCLIPS project in the Centre for Digital Video Processing (CDVP) at Dublin City University (http://www.cdvp.dcu.ie/iCLIPS) is exploring the capture and search of personal information archives, looking not just at data capture, but also at the development of effective content indexing and search, and importantly in relation to this paper, at the processes of human memory, the form and impact of memory failures and how these might be overcome using search of PLLs. In our work, we are concentrating on memory failures typically encountered by normal people, and using these to guide the development of a prototype interface to access a PLL as an augmented memory.
The remainder of this paper is organised as follows. In section 2 we examine PLLs and related works on digital memory aid tools in a little more detail. Section 3 then looks at theoretical models of memory from the psychology literature, reviews some existing empirical studies regarding normal people's memory problems and memory support needs in their daily life, and discusses the possible functions that PLLs may be able to provide for augmenting human memory. In section 4 we postulate guidelines for developing PLL systems to augment human memory, giving suggestions on computer based interface design, the types of data to be captured, and the retrieval techniques required. And finally in section 5 we introduce the iCLIPS project and our prototype application.

2. BACKGROUND

2.1 Personal Lifelogs (PLLs)
PLLs are typically captured using a range of software and hardware devices. A separate lifelog is captured for each individual. In our work we use software applications to log all activity on the individual's desktop and laptop computers. This involves recording files created, edited and opened, logging web pages accessed, and archiving emails sent and received. Peripheral devices are used to continuously record other potentially significant data streams. These include visual information recorded using a wearable camera, in our case the Microsoft SenseCam (http://research.microsoft.com/en-us/um/cambridge/projects/sensecam), or camcorders. Some projects also use audio recording to record conversations, the most important source of our conventional daily communication. Due to privacy concerns related to the continuous capture of audio data, we do not capture audio data in our work. However, other communication sources such as SMS messages and Twitter feeds can be monitored and included in a PLL. In addition there is a wide range of context information that can also be recorded. For example, location can be monitored using GPS sensors which then look up named locations in gazetteers, individuals present can often be inferred by monitoring nearby Bluetooth enabled devices, and date and time are easily captured and are very powerful context data for the search of information. Another interesting source of data is biometrics. Research has shown a correlation between measurable biometric responses such as heart rate, skin conductance and temperature, personal arousal and the memorability of events [4]. Thus capturing these biometric features can potentially be used to help locate events in a PLL of potential personal significance to its owner.
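To make the mix of content and context channels concrete, the sketch below shows one possible record structure for a single lifelog item. The field names are illustrative assumptions on our part and do not describe the actual iCLIPS data schema.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class LifelogItem:
    # One captured item (SenseCam image, document, email, SMS, web page, ...).
    item_id: str
    item_type: str                      # e.g. "sensecam_image", "email", "web_page"
    timestamp: datetime                 # date and time of capture or access
    content_text: str = ""              # extracted or indexed textual content
    location: Optional[str] = None      # named location resolved from GPS via a gazetteer
    people_present: List[str] = field(default_factory=list)   # inferred from nearby Bluetooth devices
    heart_rate: Optional[float] = None  # biometric context (arousal / memorability cues)
    skin_temperature: Optional[float] = None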
might be overcome using search of PLLs. In our work, we are
appear to have promising results in clinical psychology studies
concentrating on memory failures typically encountered by
with severe amnesic patients, that is, people who suffer from
normal people, and using these to guide the development of a
serious memory disorders (e.g. [2]), such applications may not be
prototype interface to access a PLL as an augmented memory.
equally useful for people who have normal memory abilities. For
The remainder of this paper is organised as follows. In section 2 example, in the study of [2], the subject (patient) can hardly recall
we examine PLLs and related works on digital memory aid tools anything that happened to her even after one day’s delay.
in a little more detail, Section 3 then looks at theoretical models Therefore, a simple review and consolidation of past events can
of memory from the psychology literature, reviews some existing be very helpful for them to maintain necessary episodic memory.
empirical studies regarding normal people’s memory problems A patient’s lifestyle can be very different from that of normal
and memory supports needs in their daily life, and discuss the working people, in that they have enough time to review their
possible function that PLLs may be able to provide for experiences day by day. Therefore the “rehearsal” type memory
augmenting human memory In section 4 we postulate the aid (e.g. [5, 6]) is less likely to be favoured by normal people,
guidelines for developing PLLs systems to augment human unless it contains some important information which is difficult to
memory, giving suggestions on computer based interface remember or if there is some specific information they can’t
designing, types of data to be capture, and retrieval techniques recall. For example, it is not unusual that we need to find an
required. And finally section 5 we introduce the iCLIPS project object or a document but don’t remember where it is, or we meet
and our prototype application. someone we saw before but can’t recall the person’s name.
Ubiquitous Memories [7] is a tool designed to automatically
2. BACKGROUND retrieve video clips which were captured when an object was
previously presented. The developers also argue it to be a tool to
2.1 Personal Lifelogs (PLLs) help people find physical objects. VAM [8] was designed to
PLLs are typically captured using a range of software and automatically retrieve personal information such as the name of a
hardware devices. A separate lifelog is captured for each currently encountered person by automatically detecting the face
individual. In our work we use software applications to log all of the person. Audio life logs such as iRemember [9] are usually
activity on the individual’s desktop and laptop computers. This used to recover information one learned from audio
involves recording files created, edited and opened, logging web conversations. Forget-Me-Not [10] helps people find documents
pages accessed, and archiving emails sent and received. by searching for actions in which the document is involved. The
Peripheral devices are used to continuously record other cues it presents to trigger memories of the target document related
potentially significant data streams. These include visual action also include other actions in the day which are presented
information recorded using a wearable camera, in our case the

1 2
http://www.cdvp.dcu.ie/iCLIPS http://research.microsoft.com/en-us/um/cambridge/projects/sensecam
by iconized attributes, including people, location, actions on documents and time stamps, and then allow filtering/searching for an action on the document.
Most work in PLL capture to date has focused on short term studies of a few days or a week or two of data. To support research exploring technologies to truly augment human memory, it is our belief that much longer term PLL archives are needed for research. As should be obvious, capturing PLLs using current technologies requires a considerable investment by the subject capturing their own PLL. Software must be monitored and data carefully archived; more demandingly though, the batteries on peripheral devices must be charged regularly and important data uploaded to secure reliable storage locations. The iCLIPS project has so far gathered PLLs of 20 months duration from 3 subjects. Our experiences in capturing and archiving this data are described in detail in [11].

3. MEMORY SUPPORT NEEDS
Since people will only turn to memory aid tools when they feel unconfident or incapable of retrieving a piece of information from their memory, we believe that a sound understanding of the memory problems people usually encounter in their daily life will provide a guide to the functionality of memory aid tools.
In this section we first explain memory problems and the mechanisms which cause them based on psychology research, we then review existing studies exploring normal people's memory failures and needs for memory aid tools in daily life, and finally we discuss the possible functions that PLLs may be able to provide for augmenting human memory.

3.1 Theoretical Review
Memory is a cognitive ability to encode, store and retrieve information. Encoding is the process of converting external stimuli received by the senses into signals which the neuron system in the brain can interpret, and then absorbing the newly received information into long term storage, termed long term memory (LTM). Retrieval is the process of bringing back information from the LTM storage. Different types of retrieval approaches are used for different types of memory. The two basic categories of memory systems are procedural memory and declarative memory. Procedural memory is also called implicit memory, meaning that it is usually retrieved without explicit awareness or mental effort. Examples include memory of motor skills, oral language, and memory of some types of routines. Procedural memory usually requires minimal cognitive resources and is very durable. It has been found that even people with serious global memory impairments have preserved procedural memory. For this reason, memory aids for procedural memory are not explored in this paper. Declarative memory, as opposed to procedural memory, usually involves explicit awareness during encoding and retrieval. There are two major types of declarative memory: semantic memory, meaning memory of facts, and episodic memory, referring to the memory of experiences, which is usually related to temporal context. Most of our memory problems are declarative memory problems.
Although most memory problems can only be observed during retrieval, since current techniques are not advanced enough to know what's happening in the human mind, failures at any stage can cause problems in memory. For example, failure to encode encountered information makes the information unavailable in one's memory. In the Seven Sins of Memory [12], Schacter characterizes seven daily memory problems including: transience, absent-mindedness, blocking, misattribution, suggestibility, bias, and persistence. These sins generally fall into three categories of memory problems, namely: forgetting (transience, absent-mindedness, blocking), false memory (misattribution, suggestibility, bias), and the inability to forget (persistence). In the remainder of this section, we explain the mechanisms behind these memory sins (problems), and discuss the possible solutions that PLLs can offer.

Table 1. Seven Sins of Memory
transience: the gradual loss of memory over time
absent-mindedness: the inability to retrieve a memory due to the lack of attention while encoding the information
blocking: the failure to retrieve encoded information from memory due to the interference of similar information retrieved or encoded before (proactive) or after it (retroactive)
misattribution: remembering information without correctly recollecting where the information came from
suggestibility: reconstructing a set of information with false elements, which come from the suggested cues at the time of retrieval
bias: the currently retrieved or reconstructed memory is influenced by current emotions or knowledge
persistence: the inability to forget things which one wants to forget

Encoding newly encountered information or thoughts requires processing them in a short term memory (STM) system, which is called working memory (WM). The WM system is comprised of subsystems including separate short term storage channels for visual spatial and acoustic (sound) information, and an episodic buffer which links newly incoming information with what is already in long term storage. WM also has a central executive module which assigns cognitive resources (especially attention) to the channels [13, 14]. Thus the absence of attention can reduce the encoding efficiency or even cause encoding failure of some information input at that time (this is the so-called "absent-mindedness" in the seven sins of memory). And information which was paid more attention to is more likely to be better encoded and therefore more likely to be better remembered. It has been suggested that emotion can often influence attention at encoding, and therefore influence the memory of items.
Regarding LTM, it has been argued that information in human memory exists in an associative network, and the activation of one piece of information (by external input, e.g. presenting that information again) can trigger the memory of its linked nodes [15]. The stronger the link, the more likely the trigger is going to happen. This is why recall is easier when a cue is presented (cued recall) than when there is not (free recall). It has been suggested by many psychology studies that it is the lack of proper links to information, rather than the loss of the memory of the information itself, that causes "forgetting". Since one node of memory may be linked to several other nodes, it is important that only the required
information be triggered. Thus, inhibition is an important function of human memory. However, it may also induce 'blocking'. A classic example is the 'tip of the tongue' (TOT) phenomenon, where one is unable to recall the name of some well remembered information, feeling that the memory is being temporarily blocked.
False memory, meaning memory errors or inaccurate recollection, also arises due to the structure of the associative memory network. According to Loftus [16], every time a piece of memory is retrieved, it is actually reconstructed from associated small nodes of information. False memories can bring various problems in daily life. For example, "misattribution" by witnesses can cause serious legal problems if a witness does not know whether the source is from reality or was in a dream or on TV or even imagined.
As for the sin of persistence, this is actually a problem of mental well-being and a cognitive problem with memory. The reason for persistence is that unwanted and sometimes even traumatic memories are so well encoded, rehearsed and consolidated that they may not be buried or erased. According to theories of forgetting, these memories can be "blocked" if the external cues can form strong links with memories of other experiences, ideally happy experiences. Therefore, people rehearsing more happy memories may find these helpful to replace their memories of traumatic experiences. The question of which pieces of happy memory to present is beyond the scope of our work, and is left to clinical psychologists.
In summary, there are two main reasons for difficulty in retrieving a memory, namely: absence of the memory due to failure at encoding, or the lack of proper and strong cues to link to and access the correct pieces of memory. For memory problems arising from both causes, PLLs may have the potential to provide supplements. Data in PLLs can provide some details which one failed to encode due to "absent-mindedness", or which have faded in one's memory over time. It can also provide cues for memories which have been "blocked".

3.2 Empirical Studies
In this section, we further explore the needs for memory aids through some documented empirical studies, and use the results of this work to focus our investigation.
In [17], Elsweiler et al explored people's daily memory problems with a diary study in working settings with 25 participants from various backgrounds. They concluded that the participants' diary inputs could be split into 3 categories of memory problem: Retrospective Memory problems (47% of their data entries), Prospective Memory (29%), and Action Slips (24%), which are also a type of prospective memory failure caused by firmly routine actions rooted in procedural memory. Since prospective memory failures and action slips usually happen before the person is made aware of them by experiencing the consequent error caused by the problem, it is unlikely that people will actively seek help from memory aids in these cases, unless the memory aid is proactive and intelligent enough to understand what is going on.
Lamming et al [18] also did a diary study to explore possible memory problems during work, and found that the most frequently occurring memory problems include: forgetting someone's name, forgetting a paper document's location, and forgetting a word or phrase. Prospective memory problems were also found to be frequent and usually severe.
The diary study by Hayes et al [19] took a more direct approach and explored the situations in which people wanted to use their memory aid tool, a mobile audio recording device called Audio Loop, to recollect the recorded past. The questions in their diary study not only included memory failures, but also how much time the participants would be willing to spend on recalling such content. Their results showed that for neutral events, people would spend an average of 336 seconds (σ = 172) to find the required information from voice records. 62% of the reported reasons for returning to an audio recording were because of "cannot remember"; 33% out of the 62% were transience type retrieval failures, while 29% out of the 62% were due to failure of encoding (e.g. absent-mindedness). Another 26% of their reasons for searching recorded audio were to present the records themselves to other people. And finally 12% of recordings were marked as important before recording. While the reasons for rehearsing these predicted important records were not described, these results indicate that important events are likely to be reviewed, and that people may want to "rehearse" recordings of important parts to consolidate their memory of information encountered during the period. Due to limitations of the information they record (selective audio recording), and the specific tool they used, the scenarios in which people may need memory aids might be limited. For example, when the experience is largely made of visual memory, audio records may not be helpful or desired.

3.3 Summary
While all of the above studies successfully discovered some daily memory problems, the non-monitored self-reporting approach is limited in that people can only report their needs for memory support when they are aware of a difficulty in retrieving a memory. While it is true that people may only seek help for specific parts of their memory when they realize that they have a problem in recollecting these pieces of information from their memory, they are not always very clear as to what they actually want to retrieve until they bring back the piece of memory. For example, sometimes people just want to review (mentally) some past episodes for fun or because of nostalgia. They usually look at some photos or objects which are related to past events, and which bring them more vivid memories of past experiences. Due to the richness of the data, lifelogs can provide more details about the past than any physical mementos can do.
In short, lifelogs are a good resource for supporting retrospective memory problems, including those parts of memory we have gradually forgotten, distorted, or missed while encoding, and for consolidating memory of useful information. They can also be used to provide digital copies of episodes (e.g. when we need to give a video record of a meeting to someone who failed to attend), or provide memory cues to trigger a person's organic memory of the original information, experiences, emotions, or even thoughts. Lifelogs might also be able to improve a subject's memory capability by training them to elaborate or associate pieces of information.
Indeed, supporting people's memory is not only a matter of finding the missing or mistaken parts of memory for them but also of improving their long term memory capabilities. It has been argued that better memory is often related to the ability to associate things, and to make decisions about which information to retrieve. For example, older people usually have less elaborated memory [20].
In the study by [21], psychologists found a tendency for people with highly elaborated daily schemas to recall more activities from the last week than people with poorly elaborated schemas. Therefore, memory-supporting tools may be able to assist people to associate things in order to elaborate and consolidate their memories, which can facilitate retrieval by strengthening the links between memories and the cues that lifelog systems can provide, and potentially enhance their efficiency at performing various tasks.

4. GUIDELINES FOR DEVELOPING LIFELOG APPLICATIONS
Based on the previous sections, lifelogs should be able to provide the following:
• Memory cues, rather than external copies of episodic memory.
• Information or items themselves: semantic memory support, when one needs the exact details of previously encountered information, or when one needs the original digital item, e.g. a document.
Whether it is the information itself which is needed, or the memory triggered by the target, it is important that these items or this information can be retrieved when needed, and that relevant retrieved results can be recognized by the user. Indeed, what to retrieve, and even what to capture and store in lifelogs, depends on what needs to be presented to the user to serve the desired memory aid functions.

4.1 Presenting
There are basically two rules for presenting information:
1. Provide useful information as memory cues
When items are presented to the user, it is desirable that the information shown can be recognized by the user as what they want. If the retrieval targets are cues that are expected to be useful triggers for the user's own memory of something, e.g. experiences which cannot be copied digitally, it is also essential that the retrieved targets are good memory cues for the memory that the user wants to recall, e.g. the memory of an experience.
Lamming et al. [18] suggested that memory supporting tools should not only provide the episodes or information one forgets, but also episodic cues including other episodes with the temporal relationships among them, together with information about the characteristics of these episodes. It is suggested in [8] that the features usually visible in episodic memory cues are: who (a face, any people in the background), where (a room, objects and landmarks in the environment), when (time stamps, light conditions, season, clothing and hair styles), and what (any visible actions, the weather, etc.).
2. Avoid information overload
It is also necessary to avoid information overload when presenting material as a memory aid. In [22], it was found that when unnecessary information is reduced and important parts of the information are played more slowly, their memory aid application achieved its best results. We suggest that text or static images, which can be used as a summary of events, can also be good at reducing information overload compared to viewing videos (e.g. [10]). This requires that the system either detect important parts, or digitize and textualize describable features of physical world entities or events to facilitate retrieval. The term digitize in this paper means to represent the existence of a physical world entity as digital items, e.g. an image or a line of data in the database. These can be searched directly using certain features (cues), rather than with the features of the episodes in which such information was encountered, e.g. the features of a person and a corresponding profile. Overall, appropriate cues really depend on what people tend to remember. Therefore it is important to explore the question of what people usually remember about the target.

4.2 Data Capture
In principle, the more information that is captured and stored in lifelogs, the greater will be the chance that the required information can be found in the data collection. However, the more data that is collected, the more the noise level may also increase and impose a greater burden on the life logger. In order for a PLL to support the above memory augmenting functions, the following data channels are needed:
1. Visual
For the majority of individuals, most of our information is inputted via our eyes, therefore it is important that encountered visual information be captured. While video can capture almost every moment when it is recording, watching video streams is a heavy information load. However, browsing static images or photos can be a much easier job. Some automatic capture cameras have been proven to provide rich memory cues [23]. The Microsoft SenseCam is one such wearable camera which automatically captures images throughout the wearer's day. It takes VGA quality images at up to 10 images per minute. The images taken can either be triggered by a sensed change in the environment or by a fixed timeout. Other examples include the EyeTap [24] and the other Brother [25].
2. Speech
Another important source of information in daily life comes from audio. For example, much useful information comes from conversations. However, as mentioned previously, continuous audio recording has been argued to be intrusive and unacceptable to surrounding people. For this reason, it is difficult to carry out continuous audio recording. Some existing studies, such as [9] discussed earlier, record audio for limited significant periods; however we chose not to do this since this requires active decisions of when to begin and end capture and careful choices of when to do this to avoid privacy problems. We prefer continuous and passive capture modes which are non-intrusive. An alternative source of much of the information traditionally conveyed in spoken conversation is now appearing in digital text communications, as described in the next section.
3. Textual (especially from digital born items)
Nowadays, we communicate more and more with digital messages (email, instant message, and text message). These content sources contain an increasing portion of the information used in daily life which used to come from spoken conversations.
These digital resources, usually in the form of text, have less noise from the surrounding environment and irrelevant people, and therefore have less likelihood of intruding on a third person's privacy. Text extracted from communication records (e.g. emails, text messages) can even be used to assist in narrating events and to represent computer activities to trigger related episodic memory (e.g. [10]).
4. Context
As mentioned earlier, context information such as location and the people present can provide important memory cues for events [26]. Therefore they are both important for presenting events and useful for retrieving items related to events.

4.3 Retrieval
The final and possibly most challenging component of an augmented memory application built on a PLL is retrieval. It is essential that useful information be retrieved efficiently and accurately from a PLL archive in response to the user's current information needs. In order to be used most efficiently by the user, retrieval must have a high level of precision so as not to overload the user's working memory. It is recognized that a key feature of good problem solving is the ability of an individual to retrieve highly relevant information so that they do not have to expend effort on selecting pertinent information from among related information which is not of direct use in the current situation. Being able to filter non-relevant information is an important feature of good problem solving.
Finding relevant information in such enormous data collections to serve a user's needs is very challenging. The characteristics of PLLs mean that they provide a number of challenges for retrieval which are different to those in more familiar search scenarios such as search of the World Wide Web. Among these features are that: items will often not have formal textual descriptions; many items will be very similar, repeatedly covering common features of the user's life; related items will often not be joined by links; and the archive will contain much non-useful data that the user will never wish to retrieve. The complex and heterogeneous nature of these archives means that we can consider them to be a labyrinth of partially connected related information [27]. The challenge for PLL search is to guide the owner to elements which are pertinent to their current context, in the same way as their own biological memory does, in a more complex and integrated fashion.
Traditional retrieval methods require users to generate a search query to seek the desired information. Thus they rely on the user's memory to recall information related to the target in order to form a suitable search query. Often, however, the user may have a very poor recollection of the item from their past that they wish to locate. In this case, the system should provide search options over features that people tend to remember. For example, the location and the people attending an event may be well remembered, thus the search engine should enable search using this information. In fact, the user may not even be aware of or remember that an item was captured and is available for retrieval, or even that a particular event occurred at all, so they won't even look for this item without assistance.
We can illustrate some of the challenges posed by PLL retrieval using an example. Consider a scenario where someone is looking for a particular photo from her PLL archive. All she remembers about the picture is that last time she viewed it, the sun was glaring in the window and she was talking on the phone to her friend Jack. Conventional search techniques would not be capable of retrieving the correct photo based on these context criteria that are unrelated to its contents. Use of the remembered context would enable her to search for pictures viewed while speaking with Jack while the weather was sunny. The notion of using context to aid retrieval in this and other domains is not new. Context is a crucial component of memory for recollection of items we wish to retrieve from a PLL. In previous work we examined the use of forms of context data, or combinations of them, for retrieval from a PLL [28]. This work illustrated that in some situations a user can remember context features such as time and location much better than the exact content of a search item, and that incorporating this information in the search process can improve retrieval accuracy when looking for partially remembered items.
Ideally, as argued by Rhodes [29], a memory augmentation system should provide information proactively according to the user's needs in their current situation. Many studies on ubiquitous computing have been devoted to research into detecting events. For example, retrieving an object related to a recording when someone touches an object, for which the sensor information is passed to the retrieval system as a query. Another system called Ubiquitous Memories [7] automatically retrieves target objects related to a video recording which is automatically tagged when touching the object. Face detection techniques are used in [8] to tag a person related to a memory, and enable automatic retrieval of personal information triggered by detection of the face.
Satisfying the need for high precision retrieval from PLLs discussed earlier requires search queries to be as rich as possible, by including as much information as possible about the user's information need, and then exploiting this information to achieve the highest possible effectiveness in the search process. Our underlying search system is based on the BM25F extension to the standard Okapi probabilistic information retrieval model [30]. BM25F is designed to most effectively combine multiple fields from documents (content and context) in a theoretically well motivated way for improved retrieval accuracy; BM25F was originally developed for search of web type documents which, as outlined above, are very different to the characteristics of a lifelog. Thus we are also interested in work such as [31] which explores ways of combining multiple fields for retrieval in the domain of desktop search. Our current research is extending our earlier work, e.g. [28], to investigate retrieval behaviour using our experimental PLL collections to explore new retrieval models specifically developed for this data. In addition, PLL search can also include features such as biometric measures to help in the location of highly relevant information [4].
locate. In this case, the system should provide search options of
features that people tend to remember. For example, the location
and people attending an event may be well remembered, thus the 5. iCLIPS - A PROTOTYPE PLL SEARCH
search engine should enable search using this information. In fact, SYSTEM
the user may not even be aware of or remember that an item was The iCLIPS project at DCU is developing technologies for
captured and is available for retrieval, or even that a particular effective search of PLLs. To support this research, three
event occurred at all, so they won’t even look for this item researchers are carrying out long term lifelog data collection. As
without assistance. outlined in Section 2, these collections already include 20 months
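The scenario above amounts to filtering candidate items on remembered context fields rather than on content. The following minimal sketch, with invented field names and data, is one way such a context filter could look; it is an illustration only and not the retrieval system described in this paper.

from datetime import datetime

lifelog = [
    {"item": "photo_0412.jpg", "viewed_at": datetime(2009, 6, 3, 15, 20),
     "weather": "sunny", "phone_call_with": "Jack"},
    {"item": "photo_0563.jpg", "viewed_at": datetime(2009, 7, 9, 11, 5),
     "weather": "rain", "phone_call_with": None},
]

def search_by_context(items, **remembered):
    # Return items whose recorded context matches every remembered cue.
    return [it for it in items
            if all(it.get(field) == value for field, value in remembered.items())]

# "Viewed while it was sunny and I was on the phone to Jack"
print(search_by_context(lifelog, weather="sunny", phone_call_with="Jack"))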
Ideally, as argued by Rhodes [29], a memory augmentation system should provide information proactively according to the user's needs in their current situation. Many studies in ubiquitous computing have been devoted to research into detecting events. For example, an object related to a recording can be retrieved when someone touches the object, with the sensor information passed to the retrieval system as a query. One such system, Ubiquitous Memories [7], automatically retrieves target objects related to a video recording which is automatically tagged when the object is touched. Face detection techniques are used in [8] to tag a person related to a memory, and to enable automatic retrieval of personal information triggered by detection of the face.

Satisfying the need for high precision retrieval from PLLs discussed earlier requires search queries to be as rich as possible, including as much information as possible about the user's information need, and then exploiting this information to achieve the highest possible effectiveness in the search process. Our underlying search system is based on the BM25F extension to the standard Okapi probabilistic information retrieval model [30]. BM25F is designed to combine multiple fields from documents (content and context) in a theoretically well motivated way for improved retrieval accuracy; BM25F was originally developed for search of web-type documents which, as outlined above, are very different in character to a lifelog. Thus we are also interested in work such as [31] which explores ways of combining multiple fields for retrieval in the domain of desktop search. Our current research is extending our earlier work, e.g. [28], to investigate retrieval behaviour using our experimental PLL collections and to explore new retrieval models specifically developed for this data. In addition, PLL search can also include features such as biometric measures to help in locating highly relevant information [4].
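To make the field combination idea concrete, the sketch below scores documents with a simplified BM25F-style function over weighted "content" and "context" fields. It follows the general form of the model in [30] but is only an illustration; the weights, data and parameter values are invented and it is not the scoring function used in our system.

import math

def bm25f_score(query_terms, doc, collection, weights, k1=1.2, b=0.75):
    N = len(collection)
    avg_len = {f: sum(len(d[f].split()) for d in collection) / N for f in weights}
    score = 0.0
    for term in query_terms:
        # Weighted, length-normalised pseudo term frequency across fields.
        tf = 0.0
        for field, w in weights.items():
            tokens = doc[field].split()
            norm = (1.0 - b + b * len(tokens) / avg_len[field]) if avg_len[field] else 1.0
            tf += w * tokens.count(term) / norm
        df = sum(1 for d in collection if any(term in d[f].split() for f in weights))
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        score += idf * tf / (k1 + tf)
    return score

docs = [
    {"content": "notes from the project meeting about sensecam images",
     "context": "location dublin weather sunny people jack"},
    {"content": "holiday photos from the beach",
     "context": "location kerry weather rain people family"},
]
weights = {"content": 1.0, "context": 2.0}   # remembered context weighted more heavily
for d in docs:
    print(bm25f_score(["sensecam", "sunny"], d, docs, weights))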
5. iCLIPS - A PROTOTYPE PLL SEARCH SYSTEM
The iCLIPS project at DCU is developing technologies for effective search of PLLs. To support this research, three researchers are carrying out long term lifelog data collection. As outlined in Section 2, these collections already include 20 months of data, including visual capture of physical world events with Microsoft SenseCams [32], full indexing of accessed information on computers and mobile phones, and context data including location via GPS and people present via Bluetooth. The Microsoft SenseCam also captures sensor information such as light status and movement (accelerometer). Our system indexes every computer activity and SenseCam image with time stamps and context data including location, people, and weather. It enables search of these files by textual content and by the context data described above. Part of our work continues to focus on the development of novel effective search algorithms to best exploit content and context for PLL search. The other focus of the project is the development of a prototype system to explore user interaction with a PLL to satisfy their desire for information derived from their previous life experiences.

One of the reasons for the success of popular established search engines such as Google is that their interface is simple to use. Once a few concepts have been understood, users are able to use these search engines to support their information search activities. However, simple interfaces to existing collections work well to a large extent due to the features of the data being searched and the background of the users. In the case of web search engines the domain knowledge, search experience and technical background of searchers is very varied. However, the size of the collection being searched, with its inherent redundancy of data and information often repeated in different forms in multiple documents, means that pieces of information are accessible from different sources using a wide range of queries from users with differing linguistic sophistication and knowledge of the domain. Additionally, in the case of the web, link structures generated by the community of web authors can be exploited to direct searchers to authoritative or popular pages. In the case of specialised collections, such as medical or legal collections, users are typically domain experts who will use a vocabulary well matched to that in documents in the collection. As outlined in Section 5.3, the characteristics of PLL collections are quite different to conventional search collections. An interface to search a PLL collection requires that the user can enter queries using a range of content and context features. The memory association between partially remembered life events means that more sophisticated interfaces supporting browsing of the PLLs using different facets are likely to be needed to support the satisfaction of user information needs. Essentially users need an interface that enables them to explore the labyrinth of their memory using different recalled facets of their experiences.

Figure 1. Sample iCLIPS interface
Figure 1 shows our prototype interface for use of a PLL as a daily memory aid for normal people. In particular, it aims to serve the functions of: providing specific information or digital items to supplement the parts of memory which are not available to be retrieved; and providing cues to specific episodes to assist the user in rehearsing experiences from that period. It also seeks to assist users in improving memory capability through repeatedly associating events or information. This interface requires user effort to look for or choose the information to be presented, thus both searching and browsing panels are included.

Search
The interface provides a range of search options to cater for the different types of information people may be able to recall about episodes or items, such as location, people present, weather conditions and date/time. We understand the burden of trying to recall and enter all of these details for a single search, so we adopt the virtues of navigation and put more weight on the presentation and browsing of results. This is particularly important in cases where over-general search queries may bring too many results for easy presentation. For example, sometimes people just want to have a look at what happened during a certain period, e.g. when they were in middle school, and enter a time-based query such as "year 1998"; this may result in a huge amount of result data being retrieved which must then be explored by the user.

Navigation
To avoid information overload when there are a large number of items in the results, and to provide instant memory cues at each small step, we adopt the advantages of location-based hierarchical folder structures to let users navigate and browse search results which are grouped either temporally or by attributes such as location or the core people in attendance. Based on the psychology literature, we believe that when, where and who are well remembered features of episodes, therefore grouping items based on these features makes it easier for users to remember and know where their target is. It also enables them to jump to other results which have similar attributes (e.g. in the same location, or with the same group of people). By doing so we also expect the system to help people remember more context data for each event or item, generating more useful associations in their memory and elaborating them.
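The grouping step itself is straightforward; the sketch below buckets a result list by a single remembered facet (where, who or when) so that it can be presented as browsable folders. The field names and items are invented for illustration and do not come from the iCLIPS prototype.

from collections import defaultdict

results = [
    {"title": "trip_report.doc", "location": "Megeve", "people": ["Anna"], "year": 2010},
    {"title": "IMG_0042.jpg", "location": "Megeve", "people": ["Anna", "Tom"], "year": 2010},
    {"title": "budget.xls", "location": "Dublin", "people": [], "year": 2009},
]

def group_by_facet(items, facet):
    folders = defaultdict(list)
    for item in items:
        keys = item[facet] if isinstance(item[facet], list) else [item[facet]]
        for key in keys or ["(none)"]:
            folders[key].append(item)
    return folders

for place, items in group_by_facet(results, "location").items():
    print(place, [i["title"] for i in items])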
Presenting results
When presenting the results, we provide context cues to help people recognize their target and related folders more easily. Since temporally adjacent activities are argued to be good episodic memory cues, the system enables preview of folders by presenting landmark events or computer activities (if there are any) on a timeline. A "term cloud" (a group of selected keywords, similar to a conventional "tag cloud") of the computer activities is also presented in the form of text below the timeline; by clicking a word, its frequency of appearance is displayed. Again this is designed to provide more memory cues for recalling what the user was doing with documents which contain such keywords. For example, one may remember that the target needed was previously encountered during the period when he/she read a lot about "SenseCam". The names of the location and the people are also included in the term clouds for the same reason.
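A term cloud of this kind can be built by counting content words over the computer activities in the displayed period. The following sketch shows one plausible way of doing this; the stopword list and example texts are illustrative assumptions rather than part of the prototype.

import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "with", "about"}

def term_cloud(activity_texts, top_n=10):
    counts = Counter()
    for text in activity_texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)

activities = [
    "Reading a paper about SenseCam image retrieval",
    "Email to Jack about the SenseCam experiment schedule",
]
print(term_cloud(activities))   # e.g. [('sensecam', 2), ('image', 1), ...]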
Due to the complex functions provided in the interface, it is not suitable for portable or wearable devices. Thus it is not aimed at solving memory problems which need an urgent solution while the person is away from their computer. Alternative interfaces, potentially automatically taking account of the current user context (location, associates nearby, and time), would be needed for mobile interaction; this is planned as a part of our further study.

We are currently undertaking user studies to evaluate the prototype system. These evaluations include the reliability with which episodes in the results can be recognized from the features presented to the searcher, whether users feel that it is easy to recall at least one piece of information required by the search fields, and the effectiveness of the retrieval algorithms. Once these functions are fully working, we can explore how life loggers prefer to use these data in supporting their memory, and what functions they may want to use in different situations, with our system and our data collection.

6. CONCLUSIONS AND FURTHER WORK
In conclusion, developments in digital collection and storage technologies are enabling the collection of very large long term personal information archives in the form of PLLs storing details of an individual's life experiences. Combining these with effective tools for retrieval and presentation provides the potential for memory aid tools as part of the augmented human. Effective solutions will enable users to confirm partially remembered facts from their past, and to be reminded of things they have forgotten about. Applications include recreational and social situations (e.g. sharing details of a life event), being reminded of information in a work situation (e.g. previous meetings with an individual, or being provided with materials encountered in the past), and potentially more effective problem solving. Integrating these technologies to really support and augment humans requires that we understand how memory is used (and how it fails), and that we identify opportunities for supporting individuals in their life activities via memory aids. The iCLIPS project is seeking to address these issues by developing technologies and protocols for collection and management, and for effective search and interaction with PLLs.

Our current work is concentrated on completing our prototype system to explore memory augmentation using long-term PLL archives. Going forward we are seeking methods for closer integration between PLLs, the search process and human use of memory, possibly involving mobile applications and presentation using emerging display technologies such as head-up displays and augmented reality.

7. ACKNOWLEDGMENTS
This work is supported by a grant from the Science Foundation Ireland Research Frontiers Programme 2006. Grant No: 06/RFP/CMS023.

8. REFERENCES
[1] Byrne, D. and Jones, G., "Creating Stories for Reflection from Multimodal Lifelog Content: An Initial Investigation," in Designing for Reflection on Experience, Workshop at CHI 2009, Boston, MA, USA, 2009.
[2] Berry, E., et al., "The use of a wearable camera, SenseCam, as a pictorial diary to improve autobiographical memory in a patient with limbic encephalitis: A preliminary report," Neuropsychological Rehabilitation: An International Journal, vol. 17, pp. 582-601, 2007.
[3] Bell, G. and Gemmell, J., Total Recall. Dutton, 2009.
[4] Kelly, L. and Jones, G., "Examining the Utility of Affective Response in Search of Personal Lifelogs," in 5th Workshop on Emotion in HCI, British HCI Conference, Cambridge, UK, 2009.
[5] Devaul, R. W., "The memory glasses: wearable computing for just-in-time memory support," Massachusetts Institute of Technology, 2004.
[6] Lee, H., et al., "Constructing a SenseCam visual diary as a media process," Multimedia Systems, vol. 14, pp. 341-349, 2008.
[7] Kawamura, T., et al., "Ubiquitous Memories: a memory externalization system using physical objects," Personal Ubiquitous Comput., vol. 11, pp. 287-298, 2007.
[8] Farringdon, J. and Oni, V., "Visual Augmented Memory (VAM)," presented at the Proceedings of the 4th IEEE International Symposium on Wearable Computers, 2000.
[9] Vemuri, S., et al., "iRemember: a personal, long-term memory prosthesis," presented at the Proceedings of the 3rd ACM workshop on Continuous archival and retrieval of personal experiences, Santa Barbara, California, USA, 2006.
[10] Lamming, M. and Flynn, M., "Forget-me-not: intimate computing in support of human memory," in Proceedings FRIEND21 Symposium on Next Generation Human Interfaces, Tokyo, Japan, 1994.
[11] Byrne, D., et al., "Multiple Multimodal Mobile Devices: Lessons Learned from Engineering Lifelog Solutions," in Handbook of Research on Mobile Software Engineering: Design, Implementation and Emergent Applications, IGI Publishing, 2010.
[12] Schacter, D. L., The seven sins of memory. Boston: Houghton Mifflin, 2001.
[13] Baddeley, A., "The episodic buffer: a new component of working memory?," Trends in Cognitive Sciences, vol. 4, pp. 417-423, 2000.
[14] Baddeley, A. D., et al., "Working Memory," in Psychology of Learning and Motivation, vol. 8, Academic Press, 1974, pp. 47-89.
[15] Anderson, J. and Bower, G., Human associative memory: A brief edition. Lawrence Erlbaum, 1980.
[16] Loftus, E., "Memory Distortion and False Memory Creation," vol. 24, 1996, pp. 281-295.
[17] Elsweiler, D., et al., "Towards memory supporting personal information management tools," J. Am. Soc. Inf. Sci. Technol., vol. 58, pp. 924-946, 2007.
[18] Lamming, M., et al., "The Design of a Human Memory Prosthesis," The Computer Journal, vol. 37, pp. 153-163, 1994.
[19] Hayes, G. R., et al., "The Personal Audio Loop: Designing a Ubiquitous Audio-Based Memory Aid," 2004, pp. 168-179.
[20] Rankin, J. L. and Collins, M., "Adult Age Differences in Memory Elaboration," J Gerontol, vol. 40, pp. 451-458, 1985.
[21] Eldridge, M. A., et al., "Autobiographical memory and daily schemas at work," Memory, vol. 2, pp. 51-74, 1994.
[22] Hirose, Y., "iFlashBack: A Wearable Electronic Mnemonics to Retain Episodic Memory Visually Real by Video Aided Rehearsal," presented at the Proceedings of the 2005 IEEE Conference on Virtual Reality, 2005.
[23] Sellen, A. J., et al., "Do life-logging technologies support memory for the past?: an experimental study using SenseCam," presented at the Proceedings of the SIGCHI conference on Human factors in computing systems, San Jose, California, USA, 2007.
[24] Mann, S., "Continuous lifelong capture of personal experience with EyeTap," presented at the Proceedings of the 1st ACM workshop on Continuous archival and retrieval of personal experiences, New York, New York, USA, 2004.
[25] Helmes, J., et al., "The other brother: re-experiencing spontaneous moments from domestic life," presented at the Proceedings of the 3rd International Conference on Tangible and Embedded Interaction, Cambridge, United Kingdom, 2009.
[26] Tulving, E., Elements of episodic memory. Oxford University Press, New York, 1983.
[27] Kelly, L. and Jones, G. J. F., "Venturing into the labyrinth: the information retrieval challenge of human digital memories," presented at the Workshop on Supporting Human Memory with Interactive Systems, Lancaster, UK, 2007.
[28] Kelly, L., et al., "A study of remembered context for information access from personal digital archives," presented at the Proceedings of the second international symposium on Information interaction in context, London, United Kingdom, 2008.
[29] Rhodes, B. J., "The wearable remembrance agent: a system for augmented memory," presented at the Proceedings of the 1st IEEE International Symposium on Wearable Computers, 1997.
[30] Robertson, S., et al., "Simple BM25 extension to multiple weighted fields," presented at the Proceedings of the thirteenth ACM international conference on Information and knowledge management, Washington, D.C., USA, 2004.
[31] Kim, J., et al., "A Probabilistic Retrieval Model for Semistructured Data," presented at the Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, Toulouse, France, 2009.
[32] Gemmell, J., et al., "Passive capture and ensuing issues for a personal lifetime store," presented at the Proceedings of the 1st ACM workshop on Continuous archival and retrieval of personal experiences, New York, New York, USA, 2004.
Aided Eyes: Eye Activity Sensing for Daily Life
Yoshio Ishiguro†, Adiyan Mujibiya†, Takashi Miyaki‡, and Jun Rekimoto‡,§

†Graduate School of Interdisciplinary Information Studies, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, Japan
‡Interfaculty Initiative in Information Studies, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, Japan
§Sony Computer Science Laboratories, 3-14-13 Higashigotanda, Shinagawa, Tokyo, Japan
{ishiy, adiyan, miyaki, rekimoto}@acm.org
ABSTRACT
Our eyes collect a considerable amount of information when we use them to look at objects. In particular, eye movement allows us to gaze at an object and shows our level of interest in the object. In this research, we propose a method that involves real-time measurement of eye movement for human memory enhancement; the method employs gaze-indexed images captured using a video camera that is attached to the user's glasses. We present a prototype system with an infrared-based corneal limbus tracking method. Although existing eye tracker systems track eye movement with high accuracy, they are not suitable for daily use because the mobility of these systems is incompatible with a high sampling rate. Our prototype has small phototransistors, infrared LEDs, and a video camera, which make it possible to attach the entire system to the glasses. Additionally, the accuracy of this method is compensated by combining image processing methods and contextual information, such as eye direction, for information extraction. We develop an information extraction system with real-time object recognition in the user's visual attention area by using the prototype of an eye tracker and a head-mounted camera. We apply this system to (1) fast object recognition by using a SURF descriptor that is limited to the gaze area and (2) descriptor matching against a past-images database. Face recognition by using Haar-like object features and text logging by using OCR technology are also implemented. The combination of a low-resolution camera and a high-resolution, wide-angle camera is studied for high daily usability. The possibility of gaze-guided computer vision is discussed in this paper, as is the topic of communication by the phototransistor in the eye tracker and the development of a sensor system that has a high transparency.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces—Theory and methods

General Terms
Information extraction for lifelog

Keywords
Eye tracking, Lifelog computing, Gaze information

1. INTRODUCTION
Lifelog systems have been a topic of considerable research [3]. The development of lifelog computing systems has led to the hope that human memory can be augmented. For extracting beneficial information from augmented human memory, we consider the "five W's and one H" (Who, What, When, Where, Why, and How). These provide very important contextual information. Location estimation methods can answer "Where" [12], and a wearable camera can provide the other information. However, we cannot accurately detect a person's actions by using only image information. According to visual lifelog research, it is definitely necessary to extract important parts of life events from enormous amounts of data, such as the people, objects, and texts that we pay attention to. Therefore, we consider using eye activity for obtaining contextual information.

Eye tracking has been extensively studied in medical, psychological, and user interface (UI) research [5, 6] for more than a century. The study of eye tracking has provided us with a considerable amount of information such as the gazed object, stress, concentration ratio, and degree of interest in the objects [4]. Interaction research using eye tracking has also been studied; in particular, wearable computing research has actively used eye movements (gaze information) because wearable devices allow intuitive and free-hand control [17].

Even though the current eye tracking methods were developed several decades ago, they still involve the use of headgear with embedded cameras, require electrodes to be pasted on the user's face, and/or require other large-scale systems intended for psychological experiments. In other words, such systems currently cannot be used for daily activities. In this context, a "daily usable system" means a commonly acceptable system that can be used in public in daily life. Moreover, making the system accurate as well as portable is a complicated task. A daily usable system for eye activity sensing could be utilized in many research areas such as wearable computing.
Figure 1: Concept of the eye enhanced lifelog computing
In this research, for human memory enhancement, we examine a method to extract significant information from large-scale lifelog data by using eye activity information. We develop a new method that involves real-time measurement of eye movements for automated information extraction. The method makes use of gaze-indexed images captured by a video camera and an eye tracker with a low accuracy but a high wearability; both the camera and the eye tracker are attached to the user's glasses.

2. EYE-ENHANCED LIFELOG COMPUTING
A lifelog with surrounding video images can give us a considerable amount of information. On the other hand, humans obtain surrounding information from their eyes and gaze at interesting objects. However, it is impossible to record this type of information by using only a camera. Consequently, first, the gazed information is detected from a video. Then, the gazed objects and the user's state are extracted from the video and the eye movement. After this, related information is retrieved using the extracted information. Finally, the results are added to the lifelog, as shown in Figure 1. For these reasons, we need to record three types of eye activity (gaze direction, eye movement, and eye blink frequency) for using lifelog and UI methodology. Details of each type of eye activity are explained in this section.

2.1 Gaze Direction
It is difficult to extract significant information from the video images of large-scale personal lifelog data. For example, omnidirectional-camera video images contain a considerable amount of information that is not related to human memories; the camera image may not relate to the gazed object. Therefore, it is difficult to know which objects are being focused on only from the images. In this research, obtaining a video lifelog with gaze information is our objective. Gazed objects such as faces and texts are extracted from a video lifelog, and this information is used for understanding whom you met, what you saw, and what you were interested in.

Gaze direction is used for pointing in the UI research area; however, it is well known to suffer from the "Midas touch" problem, and it is difficult to use gaze direction without a trigger such as a key input [8].

2.2 Eye Movement
Not only the gaze direction but also the eye movement has meaning. In particular, microsaccades indicate a target of one's potential interest [4]. A microsaccade is a very quick eye movement, almost 600◦/s, and it is a spontaneous movement caused when the eye gazes at stable targets. The frequency and direction of this movement change depending on the person's interest in the target. The measurement of this movement makes it possible to know human susceptibilities. The holding time on a gazed object is a conscious movement, while saccadic movement is unconscious. Therefore, it is possible to extract more information about a susceptible mind by the measurement of saccadic movements.

2.3 Eye Blink Frequency
Previous research shows that eye movements can provide information about a person's condition. For example, the eye blink frequency shows the person's degree of concentration on his/her work [15].

The eye blink frequency decreases when a person concentrates on his/her work. In contrast, the frequency increases when he/she is not very focused on his/her work. Therefore, the measurement and detection of the eye blink frequency can estimate the person's level of concentration. The eye blink has several characteristics. It is a fast motion of approximately 150 ms. An involuntary eye blink is an automatic eye blink that has a shorter motion time than the voluntary eye blink, which is a conscious eye blink.

3. DESIGN OF EYEGLASS-EMBEDDED EYE ACTIVITY SENSOR

3.1 Requested Specification for Eye Sensing
The capability requirement is discussed in Section 2.

Figure 2: Prototype eye gaze recognizer with camera for lifelog
Eye movements are typically classified as ductions, versions, or vergences, and they occur at several speeds. There are several types of high-speed eye movements: for example, the microsaccade frequency is more than 1000 Hz, and an eye blink takes around 150 ms. The method must distinguish precisely between eye movements and blinks for accurate detection of eye movements. Further, the human view angle is almost 160◦ for each eye. Therefore, a 5◦ resolution is sufficient for information extraction, because this system aims not only to achieve a high accuracy but also to extract information with a daily usable system using a combination of eye activity information and image processing methods.

3.2 Eye-tracking Technology Candidates
There are several types of eye trackers. In this study, we consider four different trackers:

Camera based system: The video-based systems [9, 11] can capture a gaze texture image. This is the most commonly used tracker; however, it requires an extremely sophisticated optics system having a light source, lenses, and half mirrors. Additionally, it requires a large (table top size) measurement system for quick eye movements (over 1000 Hz). Scale-wise, it is possible to develop a smaller system; however, currently, such a system cannot measure high-speed eye movements.

Search coil and Optical lever: These methods [13, 18] are used for laboratory experiments in a certain region of space. However, these methods are not user friendly, as the users are expected to wear special contact lenses that use a negative pressure on their eyes.

Electrooculogram (EOG): Eyes have a steady electric potential field, and this electric signal can be derived by using two pairs of contact electrodes that are placed on the skin around one eye. This is a very lightweight approach [2] and can work even if the eyes are closed. However, it requires an eye blink detection method and has other issues; for example, electrodes are required and the signal is affected by electrical noise.

Infrared corneal limbus tracker: An infrared corneal limbus tracker [14] is also a very lightweight tracker. It can be built by using a light source (infrared LED) and light sensors (phototransistors) and only requires very low computational power. This approach is also affected by noise from environmental light. However, this is a very simple approach; no electrodes are required. This approach can sufficiently detect eye blinks. Therefore, it has a high constructability for daily use.

Therefore, we use an "infrared corneal limbus tracker" in our study. This method has a lower accuracy than the search coil and optical lever methods. However, our purpose is to extract significant information; hence, the accuracy of this method can be enhanced by combining image processing methods and contextual information such as eye direction.

3.3 Prototype of Eye Activity Sensor
Four phototransistors and two infrared LEDs are mounted on the eye glasses as shown in Figure 2. A small camera is mounted on the glasses for recording surrounding information, and not for eye tracking. An infrared LED and four phototransistors are mounted on the inside of the glasses.

The infrared light is reflected by the eye surface and is received by the phototransistors. The sensor values are passed to an instrumentation amplifier and analog/digital (AD) conversion, and then input to the microprocessing unit (MPU). In this study, an ATmega128 from Atmel is used for the MPU and AD conversion. The MPU clock frequency is 16 MHz, and the AD conversion time is 16 μs per channel.

Before the measurement, the head position and the display are fixed for calibration, and then the display shows the targets to be gazed at (Figure 3). The sensor wearer gazes at the target object on the display, and the MPU records the sensor values. The targets cover 240 points (20 points wide by 12 points high) and each point is gazed at for 1 second. After the calibration, the system estimates the gaze direction by using the recorded data. The recorded data and the current sensor values are compared first. Then, the center of gravity is calculated from the result in order to estimate the gaze direction. This simple method is sufficient for this research, because only the gaze area in the picture needs to be known by the information extraction system.

Figure 3: A calibration method for the gaze recognizer system. The head position and the display are fixed for calibration and then the display shows targets. A user gazes at the target object on the display and the MPU records the sensor values.
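The calibration-based estimate can be sketched in a few lines: the live four-channel phototransistor reading is compared with the readings recorded for each calibration target, and the gaze is taken as the similarity-weighted centre of gravity of the best-matching targets. The array shapes and the distance measure below are assumptions for illustration, not the authors' code.

import numpy as np

def estimate_gaze(sample, calib_sensors, calib_points, k=5):
    # sample: (4,) live reading; calib_sensors: (240, 4); calib_points: (240, 2).
    dists = np.linalg.norm(calib_sensors - sample, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-6)
    return (calib_points[nearest] * weights[:, None]).sum(axis=0) / weights.sum()

# A 20 x 12 grid of calibration targets, one mean sensor vector per target.
grid = np.array([(x, y) for y in range(12) for x in range(20)], dtype=float)
recorded = np.random.rand(240, 4)            # stands in for the recorded values
print(estimate_gaze(recorded[37], recorded, grid))   # close to grid point 37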
3.4 Life Events Extracting System
When an infrared limbus tracking method is used, the sensor value changes rapidly with an eye blink. The transient lasts approximately 150 ms, as shown in Figure 4. Therefore, the system can simply distinguish between blinks and other eye movements. Further, the system extracts information such as faces, texts, and pre-registered objects. Pre-registered objects are recognized in real time within the user's visual attention area. We use fast object recognition based on the SURF [1] descriptor, with matching limited to the gazed area against a past-images database (Figure 5).

Figure 4: An example of a fluctuation in the sensor data caused by an eye blink

Figure 5: Image feature extraction by SURF [1], for real time object recognition
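The blink/movement separation can be illustrated with a simple threshold on the sampled signal: at a 160 Hz sampling rate a blink appears as a large swing lasting roughly 150 ms (about 24 samples). The threshold and the synthetic signal below are illustrative assumptions, not values from the prototype.

import numpy as np

RATE_HZ = 160
BLINK_SAMPLES = int(0.15 * RATE_HZ)          # roughly 150 ms

def detect_blinks(signal, jump_threshold=0.5):
    # Return sample indices where a blink-like transient starts.
    diff = np.abs(np.diff(signal))
    candidates = np.where(diff > jump_threshold)[0]
    blinks, last = [], -10 * BLINK_SAMPLES
    for idx in candidates:
        if idx - last > BLINK_SAMPLES:       # merge edges of the same blink
            blinks.append(idx)
            last = idx
    return blinks

sig = np.ones(800)
sig[300:300 + BLINK_SAMPLES] = 2.0           # a synthetic 150 ms transient
print(detect_blinks(sig))                    # -> [299], the start of the transient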
Face recognition using Haar-like features from the OpenCV library (http://opencv.willowgarage.com/wiki/) is implemented for logging "when I meet someone." This method first extracts the human face, and then the system records the time, location, and face image.
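A gaze-guided version of this step only needs the face detector to run on a crop around the gaze point. The sketch below uses OpenCV's bundled Haar cascade; the crop size and the image path are placeholders rather than details of the authors' implementation.

import time
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def log_faces(frame, gaze_xy, crop=200):
    x, y = gaze_xy
    roi = frame[max(0, y - crop):y + crop, max(0, x - crop):x + crop]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [{"time": time.time(), "bbox": tuple(map(int, f))} for f in faces]

frame = cv2.imread("headcam_frame.jpg")      # placeholder image path
if frame is not None:
    print(log_faces(frame, gaze_xy=(320, 240)))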
Additionally, text logging with the OCR engine tesseract-ocr (http://code.google.com/p/tesseract-ocr/) is implemented. The system extracts a clipped image corresponding to the gazed area of the head-mounted camera image and attempts to extract text from these clipped images. Finally, the extracted text is recorded along with time and location data for life logging.
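The OCR step can be sketched as clipping the gaze region and handing it to Tesseract; the pytesseract wrapper used below is an assumption for illustration and is not mentioned in the paper.

import cv2
import pytesseract

def ocr_gazed_area(frame, gaze_xy, crop=150):
    x, y = gaze_xy
    clip = frame[max(0, y - crop):y + crop, max(0, x - crop):x + crop]
    gray = cv2.cvtColor(clip, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray).strip()

frame = cv2.imread("headcam_frame.jpg")      # placeholder image path
if frame is not None:
    text = ocr_gazed_area(frame, gaze_xy=(320, 240))
    print({"time": "2010-04-02T10:15:00", "text": text})   # logged with time and location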
4. CASE STUDY USING PROTOTYPE SYSTEM

4.1 Preliminary Experiment
An infrared limbus tracker is a commonly used type of tracker; therefore, the details of the hardware evaluation experiment are omitted. We checked the specifications of the proposed prototype system. More than 99% of the eye blinks were detected in 3 min; very slow eye blinks caused the 1% detection failure. The gaze direction resolution was 5◦, and the processing rate was set to 160 Hz in the preliminary experiments.

4.2 Concentration State Measurements
Our system can detect eye blinks with a high accuracy. We recorded the eye-blink detections and the user's tasks for approximately 1 hour, as shown in Figure 6. The results showed that the eye blink frequency changed with a change in the tasks. The frequency was lower when the user concentrated on the objects. Therefore, the system can tell the user's concentration state, and we consider that this can be used for human interface techniques such as displaying and annotation.

Figure 6: An example graph of eye blink frequency
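Reading a concentration state out of the blink log can then be as simple as thresholding the blink rate per time window, with a lower rate taken as higher concentration. The window length, threshold and timestamps below are illustrative assumptions.

def blink_rate_per_minute(blink_times, window_s=60.0):
    # blink_times: sorted blink timestamps in seconds.
    if not blink_times:
        return []
    rates, t, end = [], blink_times[0], blink_times[-1]
    while t < end:
        count = sum(1 for b in blink_times if t <= b < t + window_s)
        rates.append(count * 60.0 / window_s)
        t += window_s
    return rates

def concentration_label(rate, threshold=10.0):
    return "focused" if rate < threshold else "not focused"

blinks = [2, 9, 31, 55, 62, 65, 68, 71, 77, 80, 84, 90, 95, 99, 104, 110, 115]
for rate in blink_rate_per_minute(blinks):
    print(rate, concentration_label(rate))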
4.3 Life Event Extraction
The proposed method extracts pre-registered objects, human faces, and characters by using images and eye gaze information. Figures 7 and 8 show the extraction of objects such as posters. In this situation, the user observes each of the 100 pre-registered posters in the room.

Figure 7: Photographs of the experimental environment

Figure 8: Object recognition scene by the proposed system. This figure shows that the object recognition system can identify two different objects next to each other.

The IDs of these extracted objects are logged with the time, the actual images, and the eye direction when the system detects the pre-registered objects, as shown in Figure 9. Figure 10 shows the optical character reading of the gazed information: an image of the gazed area is clipped, and characters are extracted from the clipped image. Additionally, the face image is extracted along with the actual time, as shown in Figure 11. Usually, when multiple people stand in front of the camera, such as in a city or a meeting room, the normal recorded video image does not tell you who you are looking at. However, this method can pick out who you are looking at by using gaze information. Our system can handle multiple objects that show up in the head-mounted camera. Finally, these three pieces of data are logged automatically.

Figure 9: Gaze direction and extraction results (ID 0 means no object was extracted)

Figure 10: An example image of OCR extraction for a clipped image by gaze information using tesseract-ocr

Figure 11: An example image of face extraction. Faces are extracted from the clipped image of the head-mounted camera by gaze information.
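A minimal version of the pre-registered object matching can be written with a descriptor matcher over the gazed crop. The paper uses SURF [1]; ORB is substituted below only because it ships with stock OpenCV, and the database, crop size and match threshold are illustrative assumptions.

import cv2

orb = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def register(image_paths):
    db = []
    for object_id, path in enumerate(image_paths, start=1):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is not None:
            db.append((object_id, orb.detectAndCompute(img, None)[1]))
    return db

def recognize_gazed_object(frame_gray, gaze_xy, db, crop=150, min_matches=20):
    x, y = gaze_xy
    roi = frame_gray[max(0, y - crop):y + crop, max(0, x - crop):x + crop]
    _, desc = orb.detectAndCompute(roi, None)
    if desc is None:
        return 0                              # ID 0: no object extracted
    best_id, best_count = 0, min_matches - 1
    for object_id, ref_desc in db:
        if ref_desc is None:
            continue
        matches = matcher.match(desc, ref_desc)
        if len(matches) > best_count:
            best_id, best_count = object_id, len(matches)
    return best_id

posters = register(["poster_%03d.jpg" % i for i in range(1, 101)])   # placeholder paths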
5. HIGHER REALIZATION OF DAILY USABILITY AND FUTURE POSSIBILITIES
From these case studies, it is concluded that information extraction by means of image processing requires the use of a wide-angle, high-resolution camera to provide more accurate information. However, it is difficult to mount such a device on a person's head. Moreover, the prototype of the infrared limbus tracker is very small, but the phototransistors obstruct the user's view. In this section, the combination of a wide-angle, high-resolution camera with a head-mounted camera, along with a limbus tracker structure without phototransistors, is discussed.

5.1 Combination with High Resolution Wide Angle Camera
Having a large camera, such as a commercially available USB camera, mounted on the head interferes with daily communication. Therefore, we embed a very small, low-resolution camera for capturing the surrounding information in the eye tracker. Hence, this camera can be integrated into the user's eye glasses and can capture the user's actual view. On the other hand, the small camera has a very poor performance, and it is difficult to obtain a high frame rate and a high resolution with such a camera. Therefore, the image processing of the information extraction methods is at times not possible. Consequently, we consider a strap-on camera (such as the SenseCam [7], which can be dangled around one's neck) that has fewer problems than a head-mounted camera. Strap-on cameras do not disturb communication and can be attached to the body more easily than a head-mounted camera. Therefore, we can use a high-resolution camera with a wide-angle lens. This prototype system compares the SURF descriptors between the head-mounted camera and the strap-on camera and then calculates the homography matrix. From the results, we can identify the focus of the head-mounted camera in the strap-on camera's images. As a result, a high-resolution image can be used for the information extraction, as shown in Figure 12.

Figure 12: An example image of the view point in the wide-angle camera given by the head-mounted camera. The gazed position in the head-mounted camera is known, thus it is possible to project the gaze position onto the high resolution camera image by using the positional relation of the two images.
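The hand-over between the two cameras can be sketched as follows: descriptors matched between the low-resolution head-mounted frame and the high-resolution strap-on frame give a homography, through which the gaze point is projected. ORB again stands in for SURF, and the function arguments are placeholders rather than the authors' code.

import numpy as np
import cv2

def project_gaze(head_gray, wide_gray, gaze_xy):
    orb = cv2.ORB_create(1000)
    kp1, d1 = orb.detectAndCompute(head_gray, None)
    kp2, d2 = orb.detectAndCompute(wide_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    if len(matches) < 4:
        return None
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    point = np.float32([[gaze_xy]])                  # shape (1, 1, 2)
    return tuple(cv2.perspectiveTransform(point, H)[0, 0])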
5.2 Improving Transparency of the Eye-tracker
Developing a system for daily use that is so comfortable that the user is not even aware of wearing it is our long-term objective. The infrared limbus tracker has a very simple mechanism; therefore, it offers more scope for modification than a camera-based system. This tracker does not require a lens or a fixed focal distance. A camera-based system can use a half mirror to see the eye image; however, the system has to be in front of the eyes, as shown in Figure 13.

Because of the above-mentioned reasons, we consider a transmissive sensor system. The infrared limbus tracker does not have a focal point, unlike a camera, and it is easy to design the light path, as explained in Figure 13. In this figure, acrylic boards (refractive index = 1.49) are chamfered at approximately 30◦, and an infrared reflection filter is placed in between. The infrared light reflected by the eye is totally reflected within the acrylic material, and then the light is received by the phototransistor, which is placed out of the user's view.

Figure 13: Illustrations of the transparent infrared corneal limbus tracker

5.3 Modulated Light for Robustness Improvement and Use for Information Transmission
Since the infrared corneal limbus tracker is affected by environmental light, the method needs to be devised such that the infrared light can be modulated for a lock-in amplifier (also known as a phase-sensitive detector) [16]. In other words, this tracker also allows the measurement of environmental light through the reflecting eye surface. In fact, the embedded phototransistor received the modulated backlight of a normal display in the user's view during the experiments. A lock-in amplifier can use this phenomenon to separate the light reflected from the modulated tracker source, which measures eye movements, from modulated environmental light. It is also possible to get information from objects when the user gazes at light sources, as studied in [10].
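The lock-in idea can be illustrated numerically: the sensor signal is multiplied by the reference used to modulate the IR source and then low-pass filtered, so slowly varying ambient light averages out. The frequencies and amplitudes below are arbitrary illustration values, not measurements from the prototype.

import numpy as np

fs, f_ref = 10_000, 1_000
t = np.arange(0, 0.2, 1 / fs)
reference = np.sin(2 * np.pi * f_ref * t)

eye_reflection = 0.3 * reference                 # modulated light reflected by the eye
ambient = 0.8 * np.sin(2 * np.pi * 50 * t)       # mains-frequency room light
signal = eye_reflection + ambient + 0.05 * np.random.randn(t.size)

mixed = signal * reference                       # demodulate with the reference
recovered = 2 * np.convolve(mixed, np.ones(500) / 500, mode="valid")
print(recovered.mean())                          # close to 0.3, the modulated amplitude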
6. CONCLUSIONS
In this research, we have described an infrared corneal limbus tracker system that measures eye activity to provide contextual information for information extraction from a lifelog database. It is possible to use the proposed method in daily life. In fact, we combined the low-accuracy, high-wearability eye tracker with image processing methods in our system. In the case study, we could detect eye blinks with a high accuracy and estimate the participant's concentration state. We then combined this tracker with image processing methods such as face detection, OCR, and object recognition. Our eye tracking system and the eye activity information successfully extracted significant information from the lifelog database.

Finally, we discussed the possibility of developing a transmissive sensor system with an infrared corneal limbus tracker and two cameras having different resolutions, towards our long-term objective of designing a system suitable for daily use. In addition, since the eyes follow objects even when the user's body moves, information about the eye direction can be used for image stabilization and can be effectively utilized in image extraction methods. We believe this research can contribute to the utilization of augmented human memory.

7. ACKNOWLEDGMENTS
This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for JSPS Fellows, 21-8596, 2009.

8. REFERENCES
[1] H. Bay, T. Tuytelaars, and L. V. Gool. SURF: Speeded up robust features. In 9th European Conf. on Computer Vision, May 2006.
[2] A. Bulling, D. Roggen, and G. Tröster. Wearable EOG goggles: eye-based interaction in everyday environments. In Proc. of the 27th int. conf. extended abstracts on Human factors in computing systems, pages 3259-3264, 2009.
[3] B. P. Clarkson. Life Patterns: structure from wearable sensors. Ph.D. thesis, 2002.
[4] S. M. Conde and S. L. Macknik. Windows on the mind. Scientific American, 297(2):56-63, 2007.
[5] A. Duchowski. Eye Tracking Methodology. Springer, 2007.
[6] J. M. Findlay and I. D. Gilchrist. Active Vision: The Psychology of Looking and Seeing. Oxford University Press, 2003.
[7] J. Gemmell, G. Bell, and R. Lueder. MyLifeBits: a personal database for everything. Commun. ACM, 49(1):88-95, 2006.
[8] R. J. K. Jacob. Eye movement-based human-computer interaction techniques: Toward non-command interfaces. In Advances in Human-Computer Interaction, pages 151-190. Ablex Publishing Co, 1993.
[9] D. Li, J. Babcock, and D. J. Parkhurst. openEyes: a low-cost head-mounted eye-tracking solution. In Proc. of the 2006 symp. on Eye tracking research & applications, pages 95-100, 2006.
[10] Y. Mitsudo. A real-world pointing device based on an optical communication system. In Proc. of the 3rd Int. Conf. on Virtual and Mixed Reality, pages 70-79, Berlin, Heidelberg, 2009. Springer-Verlag.
[11] T. Ohno. Freegaze: a gaze tracking system for everyday gaze interaction. In Proc. of the symp. on Eye tracking research & applications, 2002.
[12] J. Rekimoto, T. Miyaki, and T. Ishizawa. Life-Tag: WiFi-based continuous location logging for life pattern analysis. In 3rd Int. Symp. on Location- and Context-Awareness, pages 35-49, 2007.
[13] D. Robinson. A method of measuring eye movement using a scleral search coil in a magnetic field. IEEE Trans. on Bio-Medical Electronics, number 10, pages 137-145, 1963.
[14] W. M. Smith and J. Peter J. Warter. Eye movement and stimulus movement; new photoelectric electromechanical system for recording and measuring tracking motions of the eye. J. Opt. Soc. Am., 50(3):245, 1960.
[15] J. A. Stern, L. C. Walrath, and R. Goldstein. The endogenous eyeblink. Psychophysiology, 21(1):22-33, 1983.
[16] P. A. Temple. An introduction to phase-sensitive amplifiers: An inexpensive student instrument. American Journal of Physics, 43(9):801-807, 1975.
[17] D. J. Ward and D. J. C. MacKay. Artificial intelligence: Fast hands-free writing by gaze direction. Nature, 418:838, 2002.
[18] A. Yarbus. Eye movements and vision. Plenum Press, 1967.