You are on page 1of 8

Designing Attentive Interfaces

Roel Vertegaal
Human Media Lab, CISC
Queen's University
Kingston, ON K7L 3N6
Canada
roel @ a c m . o r g
www. hml. queensu, ca

ABSTRACT encountering eye based interaction, most people are


In this paper, we propose a tentative framework for genuinely impressed with the almost magical
the classification of Attentive Interfaces, a new window into the mind of the user that it seems t o
category o f user interfaces. An Attentive Interface provide. There are two reasons why this belief m a y
is a user interface that dynamically prioritizes the lead to subsequent disappointment. Firstly, although
information it presents to its users, such that current eye tracking equipment is far superior to
information processing resources o f both user and that used in the seventies and early eighties, it is by
system are optimally distributed across a set o f no means perfect. For example, there is still the
tasks. The interface does this on the basis o f tradeoff between the use o f an obtrusive head-based
knowledge - consisting o f a combination o f system or a desk-based system with limited head
measures and m o d e l s - of the past, present and movement. Such technical problems continue to
future state o f the user's attention, given the limit the usefulness o f eye tracking as a generic
availability of system resources. We will show how input. Secondly, there are real methodological
the Attentive Interface provides a natural problems regarding the interpretation o f eye input
extension to the windowing paradigm found in for use in graphical user interfaces. One example,
Graphical User Interfaces. Our taxonomy o f the "Midas Touch" problem, is observed in systems
Attentive Interfaces allows us to identify classes o f that use eye movements to directly control a mouse
user interfaces that would benefit most from the cursor [20]. When does the system decide that a
ability to sense, model and optimize the user's user is interested in a visual object? Systems t h a t
attentive state. In particular, we show how systems implement dwell time for this purpose run the risk
that influence user workflow in concurrent task o f disallowing visual scanning behavior, requiring
situations, such as those involved with m a n a g e m e n t users to control their eye movements for the
of multiparty communication, may benefit f r o m purposes o f output, rather than input. However,
such facilities. difficulties in the interpretation o f visual interest
remain even when systems use another input
modality for signaling intent. Another classic
Keywords methodological problem is exemplified by the
Attentive Interfaces, Nonverbal Computing, E y e application o f eye movement recording in usability
Tracking, Attention, User Interfaces. studies. Although eye fixations provide some o f the
best measures o f visual interest, they do not provide
1. I N T R O D U C T I O N a measure o f cognitive interest. It is one thing to
In recent years, there has been a resurge o f interest determine whether a user has observed certain visual
in the use o f eye tracking systems for interactive information, but quite another to determine
purposes. However, it is easy to be fooled by the whether this information has in fact been processed
interactive power o f eye tracking. When first or understood [19].
Permission to make digital or hard copies of all or part of Ibis work for Some o f our technological problems can and will be
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or cornmercial advantage and that solved. However, we believe that our
copies bear this notice and the full citaUon on the first page. To copy methodological issues point to a more fundamental
otherwise, to republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
problem: What is the nature o f the input
ETRA'02 New Odeans Louisiana USA information conveyed by eye movements and to
Copyflght ACM 2002 1-58113-447-3/02/03...$55.00

23
.A
what interactive functions can this information Esthetics
provide added value?
For now, it seems, the most successful applications
have been those in which the information c o n v e y e d
by eye m o v e m e n t data is a direct measure o f t h e
locus o f the user's (visual) attention. We argue t h a t
in m a n y o f these applications, the function o f this ~,iiiii!ii !~!:,
information is to optimize the attention o f t h e
user, the attention o f other users, or the processing Engineering Human Factors
resources o f the computing system itself. W e
believe such applications in fact constitute a new
category o f user interface: the Attentive Interface.
Esthetics
2. B A C K G R O U N D
The rationale for designing o f artifacts in such a
w a y that they optimize the attention o f their users
is not new. It existed long before personal
computers were born and was the backbone o f t h e
European abstract functional art m o v e m e n t Engineeri tman Factors
throughout much o f the 20 th century. According to
Bauhaus director and architect Ludwig Mies van der
Rohe, tools and other functional artifacts, such as
buildings, needed to be designed using only those
Figure 1. Folding a design space such that
materials necessary to keep the construction constraints function as complements rather than
structurally sound [3]. There were three m a i n tradeof~.
arguments to his "Less is More" rationale: 1) This is certainly the case with the eye, as t h e
relinquishing unnecessary embellishments would information bandwidth o f the retina is a function o f
improve the communication o f the artifact's the distance to the center o f the e y e ' s visual axis.
functionality; 2) simplifying the design would ease However, James [10], and later Deutch & Deutch
automated manufacturing while improving [4], emphasized the (semantic) filtering that occurs
structural soundness, thus cutting costs; 3) the focus after information has entered the brain. This is
on the essence rather than on the peripheral would evidenced in the eye by the so-called A t t e n t i o n a l
increase the esthetic value o f the artifact. Figure 1 Spotlight: that part o f the retina that the brain
visualizes the design space constituted by the above selects for visual processing [14]. The c o n t r o v e r s y
trade-offs between the ease o f use, engineering was not just a question o f whether a t t e n t i v e
requirements and esthetics o f a design. What Mies filtering occurs early or late in human i n f o r m a t i o n
van der Rohe tried to argue was that i f the right processing. It was also a question o f what
design choices are made, the above parameters need determines the selection o f information: is
not be counteracting. Instead, they may selection triggered by external stimuli or by
c o m p l e m e n t and strengthen each other, as anticipation through prior knowledge? This
visualized in Figure 1 by a folding o f the design controversy was only recently concluded with
space. Treisman's Feature Integration Theory: that
attentive selection is guided by both input and
It is exactly this principle o f e c o n o m y o f resources
cognition [17]. The question o f input-driven versus
that drives human attention. Although humans
have vast mental resources, they are o f course knowledge-driven selection is very relevant to t h e
design o f an Attentive Interface. M a n y o f t h e
limited. N o r m a n [13] argued that for this reason,
methodological problems associated with eye
humans require a m a n a g e m e n t system that selects
and filters only that information that is m o s t tracking applications are due to the fact that eye
tracking input alone provides insufficient
relevant. There has been much controversy as to
how this selection process might operate. Von information for selective filtering.
Helmholtz [8], and later Broadbent [2], considered
human attention mostly as a physical filter, t h a t
selects information before entry into the brain.

24
optimizes the information representation of the
system according to her own attention, but also
optimizes that o f the computer system itself. By
focusing processing priority on windows in the
foreground, the resources of the system are
optimally employed to comply with the current
intent o f the user. However, for the most part, it is
the user's responsibility to optimize the e c o n o m y
of attention between system and user through a
pointing device.
We do not have to look far to find examples o f
systems that seek information about the intent o f
their users through means that are much more
proactive and implicit. A sophisticated application
of this principle can be found in m o d e m traffic
light control systems. These actively seek
Figure 2. Passive attentive toilet design (Amsterdam information about the user's intent by implicitly
Schiphol Airport, The Netherlands). sensing the focus of their attention through coils in
2.1 Early Attentive Interfaces the road surface. Traffic control systems do n o t
The question of input-driven versus knowledge- rely solely on the input provided by their sensors.
driven information filtering also relates Mies van They apply knowledge rules based on statistical
der Rohe's design rationale to the economy of user measures o f traffic flow to decide which
information processing in the context of a set o f information to present to what user.
tasks. We should design each tool such that the The above are all examples of interfaces
minimum information is provided to trigger the predominantly occupied with the optimization o f
knowledge required to operate the tool [28]. We the user's current attention. A diary is a prime
can thus interpret Mies van der Rohe's adage as a example o f a tool that seeks to optimize the future
principle of information preservation. As any attention o f its user. As with traffic lights, the
designer knows, this information comes in the form automation and interconnection of diaries, e.g., in
of the user's sequencing of intent and the task hand held devices, has allowed the preemption o f
knowledge structures triggered by that intent [18]. user attention through active interruption.
An excellent example of how one can effectively However, as opposed to traffic lights, diaries know
influence the user's current intent on the basis o f little about the current state o f the user's attentive
information about the user's goal intent is provided resources. The same applies to telecommunication
by the Attentive Toilets found at Amsterdam's and networked information systems such as email.
Schiphol airport (see Figure 2). Indeed, the designer It is in the notification structure associated with
o f these toilets has found a very simple y e t these systems that an attentive interface would
effective mechanism to direct the intent o f the perhaps be most pertinent. Indeed, according to
user. However, in cases where the user's intent is in Goldhaber [7], the Internet itself can be seen as an
constant flux, artifacts need to have the capabilities economy o f the combined attentive resources o f its
to dynamically prioritize their information users. He argues that in this economy the attention
presentation on the basis of changes in that intent. span o f the consumers is a valuable limited resource.
Graphical User Interfaces were designed to do just This is demonstrated by the current practice o f
that. According to Smith et al. [16], the rationale internet based advertising agencies to sell the
for the design of windowing systems is to allow attention of web site users as measured by page
users to focus on the task with the highest priority, views.
in the context o f other tasks o f lower priority. In a
typical graphical user interface, windows relevant to
the present task will occupy the display space on
which the user is currently fixating. Tasks of lower
priorities are represented by icons that occupy
peripheral vision. By selecting and positioning on-
screen windows with a mouse, the user not only

25
3. F E A T U R E S O F A T T E N T I V E 4. C L A S S I F Y I N G A T T E N T I V E
INTERFACES INTERFACES
So what defines an attentive interface? A n However, the main classification provided by
attentive interface is a user interface t h a t Table 1 is that according to utility: the task the
dynamically prioritizes the information it presents Attentive Interface seeks to optimize. Our
to its users, such that information processing categories are ranked according to the level o f
resources o f both user and system are optimally mental processing required for the task.
distributed across a set o f tasks. The interface does Perhaps the most urgent category is that o f
this on the basis o f knowledge - consisting o f a attentive interfaces involved in the m a n a g e m e n t o f
combination o f measures and models - o f the past, group communication. According to Duncan [6],
present and future state o f the user's attention, turn taking in group conversations is very m u c h a
taking into account the availability o f system function o f the optimization o f group attention:
resources. As we have seen in the introduction, listeners choose to be silent such that e v e r y o n e can
there can be m a n y types o f attentive interfaces. focus all attentive resources on a single speaker.
Given a set o f examples, we will, however, try to Vertegaal et al. [24, 25] demonstrated that p e o p l e
identify a number o f c o m m o n themes, imposing t h e indeed use information about each other a t t e n t i v e
tentative classification shown in Table 1. Firstly, state, as given by their gaze direction, in this
attentive interfaces can be differentiated on the process. The G A Z E Groupware System [21] is a
basis o f their ability to actively monitor the user's video conferencing system that exploits this
attentive state. The attentive toilet shown in principle. It conveys the gaze direction o f
Figure 1 is an example o f a system that manipulates conversational partners across a network by
the attention o f its user without any external rotating images o f the participants toward t h e
measures. Although windowing systems o f graphical person they look at. The system measures w h o m
user interfaces have no means to measure the user's participants look at using an eye tracker. Other user
attentive state implicitly, they do allow the user t o interfaces that fall in this category are avatar-based
communicate their attention explicitly through t h e communication systems and multi-agent
manual organization o f windows. We t h e r e f o r e conversational systems such as FRED [23]. T h e
classify these as explicit. Systems that e m p l o y next category deals with the m a n a g e m e n t o f offline
sensing technology such as eye tracking to measure communication. The sheer volume o f today's email
the user's current state o f attention are classified as messaging has rendered the crude form o f
implicit. The next feature determines what is notification m a n a g e m e n t present in t o d a y ' s email
measured by such sensing technology: the locus o f a clients inappropriate. To solve this problem,
user's attention, or its span. Typically, a t t e n t i v e Horvitz [9] developed Priorities, a messaging tool
interfaces will employ a combination o f these that collects attentive information - e.g., about t h e
techniques. Another important descriptor o f an frequency with which users respond to messages
attentive interface is the type o f sensing from certain people - t o prioritize the delivery o f
technology employed. In most explicit systems, messages. Messages with a high priority rating are,
this will simply be the user's pointing device. for example, forwarded to an offfline user's pager,
According to Vertegaal [26], systems that e m p l o y while messages with low priority wait until the user
more implicit measures m a y use a combination o f checks them. Although Priorities builds a model o f
user presence, body orientation, head o r i e n t a t i o n the user's attentive behavior, its ability to obtain
and ultimately eye orientation to detect a t t e n t i v e information about the current state o f the users'
state. A n o t h e r important feature is whether t h e attention is limited. The third category deals with
system stores any o f the above measures as a the opening and closing o f dyadic c o m m u n i c a t i o n
model. For example, traffic control systems m a k e channels. The E y e R glasses developed b y Selker
their decisions on the basis o f rules about t h e [15] use infrared light to sense whether a user is
normal flow and volume o f traffic. Finally, we orienting his head towards another user. This
consider the way in which an attentive system m a y information is then applied to regulate the opening
increase or decrease the attentive load o f the user, and closing o f communication channels, f o r
system, or network. A cell phone commands t h e example with a robot dog. When the user orients
user" s immediate attention through active himself towards the robot dog, it starts barking.
interruption, yet does little to optimize the user's EyeContact, developed by Vertegaal [22] is a
mental load. similar system which applies long range wearable

26
eye tracking technology to detect fixations at the 5. D I S C U S S I O N
user by other people. It can be used to modify the The proposed classification allows us to reason
interruption state of incoming cell phone calls explicitly about the design attributes of attentive
according to whether the user is currently engaged interfaces. It allows us to identify the categories o f
in a conversation, thus creating attentive cell user interfaces that are most likely to require
phone technology [22]. attentive enhancements through tracking
Our next category pertains to what is perhaps the technologies and priority models. We have listed a
most basic feature o f any attentive interface: the number of these candidates in Table 1, some o f
optimization o f concurrent task execution. In the which we will now discuss. Likely candidates for
classic computer game Pong, the objective is to attentive enhancement in the category o f group
bounce a ball against a wall using a paddle. In communication systems are systems that represent
traditional systems, the paddle is controlled by a users in a virtual space in the form of avatars. Some
joystick, leading to considerable eye-hand systems in this category already employ a crude
coordination problems. LC Technologies developed form of attention management that uses co-
an eye-based pong that instead uses the horizontal alignment o f avatars to decide whether to open or
coordinate of eye movements to control the close audio connections [1]. However, systems in
position o f the paddle, making this a game you this category are very likely to benefit from the use
cannot lose [11]. The windowing systems employed o f more implicit sensing technologies, such as the
by graphical user interfaces are another example in measurement of eye, head or body orientation. In
this category. The most striking differences the category of dyadic communication
between Eye-based Pong and classic windowing management, cell phones are the most obvious
systems is of course the absence o f implicit candidates for improvement. Interruption o f
measures in the latter. The MAGIC pointing system multiparty conversation through cell phones has
described by Zhai et al. [27] provides a hybrid become such a problem that it has prompted the
between the two approaches. In MAGIC pointing, installation of cell phone jamming technology in
the user's eye coordinates are used to preposition certain meeting areas. Cell phones should be capable
the mouse cursor such that it reduces m o v e m e n t o f adjusting their interruption priorities according
time toward a visual target. to the user's current state o f attention [22].
Our final three categories pertain to the From our categorization it also becomes clear t h a t
management o f the user's basic attentive resources. there are very few systems indeed that employ a
Attentive interfaces involved in the m a n a g e m e n t model o f the user's past attention in their
of the user's physical transportation seek to prioritization o f current information. One reason
optimize the user's attentive resources at the for this is that the required statistical reasoning
presence level. As such, they regulate the physical techniques are still in their infancy [9]. As
movement of users. Traffic control systems are Microsoft's Office Assistant exemplifies, it is better
prime examples o f this category. More basic not to incorporate any model than it is to
systems in this category are automatic doors and incorporate a poor model. However, we believe t h a t
other presence-operated devices. as the technology improves, we will increasingly see
One should note that attentive interfaces need n o t statistical learning techniques incorporated in
optimize the attentive resources o f a user or system attentive interface design.
solely by a reduction in load. They can also
empower the user by enlarging their attentive
capacity. Indeed, most gaze contingent displays
seek to augment the basic attentive resources o f the
user or system. Attention-based video compression
[5] and gaze-contingent 3D level-of-detail
rendering [12] are examples o f this category. Our
final category pertains those interfaces with the
most basic of attentive tasks: to attract the user's
attention. The attentive toilet shown in Figure 1 is
a prime example o f this, as are most security
monitoring systems.

27
Active Locus/ Attention Active
Sensor Reduce Load
Span Model Interrupt
Group Communication
Management
GAZE Video Conferencing Implicit Locus Eye User
System
Network
Attentive Agents Implicit Both Eye User
System
Avatars Explic~ Locus Hand User
Network
Oflline Communication
Management
Priorities System lmp~c~ Span Hand User
Network
Email ExpHcit Span Hand None
Diary No User
Headline No User
Dyadic Communication
Management
EyeR Implic~ Both Head User
System
network
EyeContaet Sensor lmp~c~ Both Eye User
System
Network
Cellphone Explicit span Hand None

Concurrent T a s k M a n a g e m e n t
Eye-based Pong Implicit Locus User
MAGIC pointing Implicit Both Eye User
Windows Explicit Both Hand User
System
Physical Transportation
Management
Presence Operated Devices Implicit Both Body None
Traffic Control Systems Implicit Both Body User
System
Door No None

A u g m e n t A t t e n t i v e Resources
Gaze-Contingent Displays Implicit Locus Eye User
System
Attention-based Compression Implicit Both Eye User
Syxtem
Network
Canon EOS-5 ImplicH Locus Eye User
System

Direct A t t e n t i v e R e s o u r c e s
Securi W Monitoring Systems Implicit Locus Body User

Attentive Toilet No System

T a b l e 1. C l a s s i f i c a t i o n o f A t t e n t i v e I n t e r f a c e s w i t h e x a m p l e s .

28
6. C O N C L U S I O N S 5. Duchowski, A.T. and McCormick, B.H. Gaze-
In this paper, we proposed a tentative framework Contingent Video Resolution Degradation. In
for the classification of Attentive Interfaces, a new Human Vision and Electronic Imaging III. San
category o f user interfaces. We defined an Jose, CA USA: SPIE, 1998.
Attentive Interface as a user interface t h a t 6. Duncan, S. Some Signals and Rules for T a k i n g
dynamically prioritizes the information it presents Speaking Turns in Conversations. Journal o f
to its users, such that information processing Personal and Social Psychology 23, 1972.
resources of both user and system are optimally
7. Goldhaber, M. The Attention E c o n o m y and
distributed across a set o f tasks. The interface does
the Net. First Monday
this on the basis of knowledge - consisting o f a
http ://www.firstmonday. dk, 1997.
combination o f measures and models - o f the past,
present and future state of the user's attention, 8. H. Helmholtz, von. Helmholtz's Treatise o n
given the availability o f system resources. Measures Physiological Optics. Southall, J.P.C. (Ed.).
are provided by tracking presence, body New York: Dover Publications, 1962.
orientation, head orientation or eye orientation o f 9. Horvitz, E., Jacobs, A., and Hovel, D.
users. Models may consist of rules and heuristics Attention-Sensitive Alerting. In Proceedings
based on previously observed patterns of attentive o f UAI '99, Conference on Uncertainty and
behavior. We have shown how Attentive Interfaces Artificial Intelligence, Stockholm, Sweden.
provide a natural extension to the windowing Morgan Kaufmann: San Francisco, 1999, pp.
paradigm of the Graphical User Interface. One o f 305-313.
the main benefits of our classification o f A t t e n t i v e
10. James, W. The Principles o f Psychology. New
Interfaces is that it makes explicit those categories
York: Holt, 1890.
of user interfaces that would benefit most from t h e
ability to sense and model the user's attentive state. 11. LC Technologies, Inc. The Eyegaze
In particular, we showed how systems that influence Communication System. Fairfax, VA:
user workflow in concurrent task situations, such as http://www.eyegaze.com, 1997.
those involved with management o f multiparty 12. Murphy, H. and Duehowski, A. Gaze-
communication, may benefit from such facilities. Contingent Level of Detail Rendering. In
7. A C K N O W L E D G E M E N T S EuroGraphics 2001. Manchester, UK, 2001.
This work was supported by grants from NSERC, 13. Norman, D. Towards a Theory o f M e m o r y and
OIT, CITO & CFI of Canada and by LC Attention. Psychological Review 88, 1968, pp.
Technologies, Inc. We thank all members o f t h e 1-15.
Human Media Lab at Queen's, particularly C o n n o r 14. Posner, M.I. Orienting of Attention. Quarterly
Dickie, Changuk Sohn, Ryan Brown, Jeff Shell, Journal o f Experimental Psychology 32, 1980,
Anthony Legault and Ivo Weevers. We also t h a n k pp. 3-25.
Myron Flickner and David Koons o f IBM Almaden
15. Selker, T. Eye-R, a Glasses-Mounted E y e
Research Center for their support.
Motion Detection Interface. In Extended
8. R E F E R E N C E S Abstracts o f CHI'01. Seattle, WA: ACM,
1. Benford, S., Greenhalgh, C., Bowers, J., 2001.
Snowdon, S., and Fahl~n, L. User E m b o d i m e n t 16. Smith, D., Irby, C., Kimball, R., Verplank, B.,
in Collaborative Virtual Environments. In and Harslem, E. Designing the Star User
Proceedings o f CHI'95. Denver, Colorado: Interface. Byte 7(4), 1982, pp. 242-282.
ACM, 1995. 17. Treismann, A. and Gelade, G. A Feature-
2. Broadbent, D.E. Perception and Integration Theory of Attention. Cognitive
Communication. London: Pergamon Press, Psychology 12, 1980, pp. 97-136.
1958. 18. Van der Veer, G.C., Yap, F., Broos, D., Donau,
3. Carter, P. and Mies van der Robe, L. Mies Van K., and Fokke, M.J. ETAG - Some
Der Rohe at Work. Phaidon Press, 1999. Applications o f a Formal Representation o f
4. Deutsch, J.A. and Deutsch, D. Attention: Some the User Interface. In Proceedings o f
Theoretical Considerations. Psychological I N T E R A C T ' 9 0 . Amsterdam: North-Holland,
Review 70, 1963, pp. 80-90. 1990.

29
19. Velichkovsky, B. and Hansen, J.P. New of CHI'2000. The Hague, The Netherlands:
Technological Windows to Mind: There is ACM, 2000.
More in Eyes and Brains for Human-Computer 24. Vertegaal, R., Slagter, R., Van der Veer, G.C.,
Interaction. In Proceedings of CHI'96. and Nijholt, A. Eye Gaze Patterns in
Vancouver, Canada: ACM, 1996, pp. 496-503. Conversations: There is More to
20. Veliehkovsky, B., Sprenger, A., and Unema, P. Conversational Agents Than Meets the Eyes.
Towards Gaze-mediated Interaction: Collecting In Proceedings of CHI'01. Seattle, WA: ACM,
Solutions of the "Midas Touch Problem". In 2001.
Proceedings of INTERACT '97. Sydney, 25. Vertegaal, R., Van der Veer, G.C., and Vons, H.
Australia, 1997. Effects of Gaze on Multiparty Mediated
21. Vertegaal, R. The GAZE Groupware System: Communication. In Proceedings of Graphics
Mediating Joint Attention in Multiparty Interface 2000. Montreal, Canada: Morgan
Communication and Collaboration. In Kaufmann Publishers, 2000, pp. 95-102.
Proceedings of CHI'99. Pittsburg, PA: ACM, 26. Vertegaal, R., Veliehkovsky, B., and Van der
1999. Veer, G.C. Catching the Eye: Management o f
22. Vertegaal, R., Dickie, C., Sohn, C. and Flickner, Joint Attention in Cooperative Work. SIGCHI
M. Designing Attentive Cell Phones Using Bulletin 29(4), 1997.
Wearable EyeContact Sensors. Submitted to 27. Zhai, S., Morimoto, C., and Ihde, S. Manual
Extended Abstracts of CHI'02. Minneapolis: and Gaze Input Cascaded (MAGIC) Pointing.
ACM, 2002. In Proceedings of CHI'99. Pittsburgh, PA:
23. Vertegaal, R., Slagter, R., Van der Veer, G.C., ACM, 1999, pp. 246-253.
and Nijholt, A. Why Conversational Agents 28. Zhang, J. and Norman, D. Representations in
Should Catch the Eye. In Extended Abstracts Distributed Cognitive Tasks. Cognitive Science
18, 1994, pp. 87-122.

30

You might also like