
International Conference on Computer Systems and Technologies - CompSysTech’14

Eye Tracking as a Computer Input and Interaction Method

Virginio Cantoni, Marco Porta

Abstract: Eye tracking applications can be considered from two points of view: in the former, the eye tracker is a passive sensor that monitors the eyes to determine what the user is watching; in the latter, the eye tracker has an active role that allows the user to control a computer. As a computer input device, an eye tracker typically substitutes the mouse point-select operation with a look-select process to press buttons, select icons, follow links, etc. While look-select operations are naturally suited to eye input, controlling an interface element is not, because the eyes move continually by saccades – quick movements of the point of gaze from one location to another. Since the main task of the eyes is simply to see, if they are also used for interacting with the computer it may be difficult to decide, for example, whether a button is watched to understand its function or to trigger the associated action. In general, eye tracking systems present significant challenges when used for computer input, and much research has been carried out in this field.

Key words: Eye Tracking, Visual Attention, Pointing Devices, Human Computer Interaction.

INTRODUCTION
The eye (Figure 1) is the organ that allows images to be captured and transmitted to
the brain for further processing [10][13].
The outermost coating of the eye is formed of two curved elements: the cornea
(transparent, located in the frontal part and with a radius of almost one centimetre), and
the sclera (opaque and with a radius of about 12 millimetres). More internally, there are the
choroid, the iris (i.e. the “coloured” part of the eye), the ciliary body and, at the innermost
level, the retina. The lens, made of transparent fibres, is placed behind the cornea and has
the purpose of adjusting the focal distance so as to focus the sharpest possible image on the
retina. The pupil is an adjustable hole, situated in the centre of the iris, whose size controls
the amount of light that enters the eye.
The retina is a photosensitive membrane mainly composed of two kinds of cells,
namely cones (6 to 10 million, more sensitive to colours) and rods (about 120 million, more
sensitive to light). On the retina’s surface, opposite to the pupil, there is the fovea, an area
characterized by a very high visual acuity.
As in a camera, the lens produces an upside-down image on the retina (the "sensor"). Focus is achieved through the flattening or rounding of the lens (the accommodation process). In normal conditions, accommodation is not necessary to see distant objects, because the lens is able to focus them on the retina. To see closer objects, the lens is gradually "rounded" by the contraction of the ciliary body. Of course, we also need to consider that what we perceive depends not only on physical reality, but also on its interpretation by our brain.
Eye Movements
What we see is primarily processed by the retina. As already said, on its surface
there are two types of photoreceptors, rods and cones. Their distribution is however not
uniform, but varies according to the distance from the fovea, the area with the highest
spatial resolution: an object can therefore be seen sharply only if its image is formed on
the fovea.


Figure 1. A model of the human eye. The eye can be seen as a "ball" of about 2.4 cm in diameter which is not perfectly spherical, being slightly flattened in the front and back parts. There are two main axes, which form an angle of between 4° and 8°. The optical axis is the most direct line through the centres of the pupil, the lens, and the retina. The visual axis draws a line from the lens to the fovea and gives the best colour vision. The human visual field can be functionally divided into four areas: the foveal area, which provides the sharpest vision (when we are looking at something, we direct our eyes so that the image is projected onto the fovea); the parafoveal area, i.e. the covert attention region that previews foveal information; and the near-peripheral and peripheral areas, regions extending up to 60° and beyond with high sensitivity to motion (allowing us, for instance, to react quickly to flashing objects and sudden movements). Visual acuity decreases as the distance from the fovea increases. Light and colour are sensed by rods and cones, whose size is roughly 1/500th of a millimetre. Cones have the highest acuity, while rods operate in clusters, reducing spatial selectivity.

The function of eye movements is thus to shift the gaze onto the target, so that it can be perceived on the fovea, and to keep the image stable on the retina during the relative movement between the observer and the target itself. The gaze is the sum of the position of the eye in its orbit and of the head in space. Three pairs of muscles can compensate for all movements of the head. The eyes move all the time, even when we sleep.
Four basic types of eye movements can be identified [3]: saccades, smooth pursuits, fixations and nystagmus.
Saccades are very rapid movements made to bring the image of interest onto the fovea. They can be performed either voluntarily, to shift the gaze to a particular "target", or as an optokinetic or vestibular correction. The duration of saccades typically varies between 10 and 100 ms, a period during which the eye can be considered essentially "blind" (saccadic suppression). Saccades are ballistic: the end point of a saccade cannot be changed during the movement.
Vestibular and optokinetic corrections keep the perceived image steady on the retina: the former during brief head movements (angular or translational), on the basis of a kinaesthetic signal from the inner ear; the latter during prolonged movements, through the optical translation of the whole visual field.
Fixations are eye movements that stabilize the image of the object of interest on the retina (and, in particular, on the fovea). During fixations the eye is relatively stable, although slight movements may occur, mainly due to a sort of "noise" in the eye control system. Almost all the information from the scene is acquired during fixations. Fixations are interspersed with saccades.


Smooth pursuits are possible only when the eyes follow a moving target. Compared
to saccades, smooth pursuits are much slower, depending on the speed of the target.
Nystagmus movements, lastly, are characterized by a "saw-tooth" temporal evolution;
essentially, they are "tremors", in most cases involuntary (and therefore of pathological
nature), more rarely voluntary.
Eye movements can also be subdivided into vergence and version movements. Vergence movements are of equal amplitude but opposite direction, and are made in order to keep the images of a target that moves away from or towards the observer in corresponding positions on the two retinas. Version movements, instead, are performed by both eyes in the same direction, for example to follow an object that moves to the right or to the left.
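
In practice, the stream of gaze samples produced by an eye tracker (discussed in the next section) has to be segmented into fixations and saccades. The following sketch is a minimal, hypothetical Python implementation of a simple velocity-threshold classification (often referred to as I-VT); the sampling rate, velocity threshold and minimum duration are illustrative assumptions, not values prescribed by any specific device.

```python
# Minimal velocity-threshold (I-VT) segmentation of gaze samples into
# fixations. Input coordinates are assumed to be already expressed in degrees
# of visual angle and sampled at a constant rate; all thresholds are
# illustrative, not values prescribed by any particular device.
from dataclasses import dataclass

@dataclass
class Fixation:
    start_s: float      # start time (seconds)
    duration_s: float   # duration (seconds)
    x_deg: float        # mean horizontal position (degrees)
    y_deg: float        # mean vertical position (degrees)

def detect_fixations(xs, ys, rate_hz=60.0, velocity_threshold_deg_s=30.0,
                     min_duration_s=0.1):
    """Group consecutive low-velocity samples into fixations; samples faster
    than the threshold are treated as saccadic and break the current group."""
    dt = 1.0 / rate_hz
    fixations, current = [], []

    def close_group(group):
        if len(group) * dt >= min_duration_s:
            fixations.append(Fixation(
                start_s=group[0] * dt,
                duration_s=len(group) * dt,
                x_deg=sum(xs[j] for j in group) / len(group),
                y_deg=sum(ys[j] for j in group) / len(group)))

    for i in range(1, len(xs)):
        # Angular velocity between consecutive samples (degrees per second).
        velocity = ((xs[i] - xs[i - 1]) ** 2 + (ys[i] - ys[i - 1]) ** 2) ** 0.5 / dt
        if velocity < velocity_threshold_deg_s:
            current.append(i)          # still fixating: extend the group
        else:
            close_group(current)       # saccadic sample: close and reset
            current = []
    close_group(current)               # flush the last group, if any
    return fixations
```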

EYE TRACKING
The term eye tracking refers to the use of appropriate techniques and tools for the identification of a subject's gaze direction; in other words, eye tracking makes it possible to detect and record what users watch, typically, but not necessarily, on a screen. An eye tracker measures changes in gaze direction with respect to the measuring system: if this is head-mounted, then eye-in-head angles are measured (head position and eye-in-head direction are added to determine gaze direction); if it is table-mounted, then gaze angles are measured directly.
Over the years, several methods have been developed for the measurement of eye
movements, from the rather uncomfortable and inaccurate initial systems to current
"remote" devices, which do not require any contact between user and machine and ensure
good accuracy.
Eye Tracking Techniques
Four basic categories of eye tracking methods have developed over the years, namely electro-oculography (EOG), scleral contact lens / search coil, photo-oculography (POG) or video-oculography (VOG), and combined pupil-corneal reflection ([8], pp. 52-57).
Electro-oculography is a method, in use since the 1960s, that is based on the measurement of differences in the electrical potential detected on the skin. Four electrodes placed just above, below, to the left and to the right of the eye measure changes in voltage resulting from eyeball movements. Although the accuracy of this technique is not very high and gaze can only be estimated with reference to the position of the head, the solution is very cheap, and is also the only one that can be used for the study of eye movements during sleep.
The method based on scleral contact lenses and "search coil" is one of the most
accurate, but is also very invasive. It exploits a mechanical (or optical) reference object
mounted on a contact lens that covers both the cornea and the sclera. A stalk attached to
the lens is then connected to a mechanical or optical device, such as a coil that measures
the variation of an electromagnetic field. Like electro-oculography, this method too can only measure the position of the eye relative to the head.
Photo/video-oculography includes several techniques based on the measurement of distinguishable characteristics of the eyes during rotation and translation movements, such as the pupil's shape, the edge separating sclera and iris, and corneal reflections caused by one or more (usually infrared) light sources. Measurements are carried out on sequences of photos or video footage, and can be either automatic or manual (the latter especially in the past). Like the previous two methods, this technique also studies eye movements relative to the position of the head.
The combined use of corneal and pupillary reflections has the big advantage of allowing gaze direction to be estimated precisely, disambiguating head movements from eye rotations. This is obtained by using as references the corneal reflection generated by one or more light sources and the centre of the pupil. Light in the infrared range
is usually exploited, so as not to be perceived by the user. As will be explained in the next
section, there exist both "table-mounted" devices (very similar to ordinary LCD monitors)
and tools to be worn on the head (helmets or, more recently, special glasses). Once the
position of the pupil has been identified (e.g., via the so-called bright pupil technique, in
which the identification is simplified by the lighting of the fundus oculi through the light
source), gaze direction is obtained by analyzing corneal reflections (known as Purkinje
reflections or images). During an initial calibration phase, correspondences between points
observed by the subject (for example on a screen) and the position of reflections on the
cornea are searched for, so as to allow successive precise gaze estimation. When both
the eyes are considered, as is the case with most current eye trackers, devices are said to
be binocular.
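
To give a concrete, simplified idea of what the calibration phase computes, the sketch below fits a second-order polynomial mapping from the pupil-to-corneal-reflection vector (measured in the camera image) to screen coordinates, using the correspondences collected at the calibration points. This is only an illustrative assumption of how such a mapping can be obtained; commercial devices use richer eye and camera models.

```python
# Illustrative calibration: map the pupil-minus-corneal-reflection vector
# (vx, vy), measured in the camera image, to screen coordinates (sx, sy) with a
# second-order polynomial fitted by least squares. Real trackers use richer
# eye/camera models; this is only a sketch of the principle.
import numpy as np

def design_matrix(v):
    """Second-order polynomial terms in (vx, vy) for each row of v."""
    vx, vy = v[:, 0], v[:, 1]
    return np.column_stack([np.ones_like(vx), vx, vy, vx * vy, vx ** 2, vy ** 2])

def fit_calibration(pupil_glint_vectors, screen_points):
    """pupil_glint_vectors: (N, 2) image-space vectors recorded while the user
    fixates N known calibration targets (N >= 6, well spread over the screen);
    screen_points: (N, 2) target coordinates in pixels. Returns a (6, 2)
    coefficient matrix, one column per screen coordinate."""
    A = design_matrix(np.asarray(pupil_glint_vectors, dtype=float))
    b = np.asarray(screen_points, dtype=float)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

def gaze_to_screen(coeffs, pupil_glint_vector):
    """Apply the fitted mapping to a new measurement; returns (sx, sy)."""
    A = design_matrix(np.asarray([pupil_glint_vector], dtype=float))
    return (A @ coeffs)[0]
```
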
Kinds of Eye Trackers
Three main types of commercial eye trackers can currently be identified, namely:
• Monitor-based or remote. These are the most common eye trackers, typically built into 17'' or larger LCD monitors. Since they record the user's gaze direction with reference to the screen, they are suited to the study of user interfaces and to direct computer control (as assistive technologies).
• Mixed: "real world" + monitor. Unlike remote devices, these eye trackers do not have their own monitor, but use only infrared emitters and a camera. Although they require a relatively complex calibration procedure, their main advantage lies in the fact that they can be used with any screen and, especially, in the possibility of determining the gaze direction during the observation of any scene. For example, if the device is placed in front of a supermarket shelf, it is possible to detect which areas attract the most attention, thus potentially suggesting the best arrangement of products.
• Wearable. This category includes all the eye tracking systems that need to be worn by the user. While in the past head-mounted devices were very popular (and rather invasive), current wearable systems come mostly in the form of "hats" with infrared emitters and a camera fixed on the visor. These eye trackers allow the free observation of real-world scenes, and measure eye movements with reference to the position of the head. A recent variant of wearable systems is represented by special glasses, very similar to head-mounted devices but with the advantage of greater practicality and lightness.
Limitations of Current Eye Trackers
Even if modern eye trackers are much more sophisticated than in the past, they still suffer from some problems. As well described in [3], the aspects that require further progress mainly concern robustness, accuracy and price.
For example, devices using infrared illumination may sometimes have problems with people wearing glasses, because of lens reflections. Although the use of two or more light sources can usually overcome this drawback, an eye model that takes into account the presence of lenses would provide higher precision in all circumstances.
A major practical problem is also represented by the price of current eye tracking
systems (generally over USD 20,000), due on the one hand to the cost of components
(such as high-resolution cameras and high quality lenses) and, on the other hand, to the
very restricted market of these devices. Regarding the first aspect, the constant evolution
of sensors could soon solve this problem, providing optimal resolution at more competitive prices. As for the currently limited market of the eye tracking industry, there are clear signals that eye tracking technology could soon become, if not common, much less restricted than it is now. This is demonstrated both by the interest shown by large computer


companies and by the existence, for a few years now, of prototype portable devices incorporating gaze-controlled features (e.g. laptops [9]).

VISUAL ATTENTION
Attention is what enables us to process information about the world around us. Visual attention is the selective allocation of visual processing resources to a particular object of interest in the visual field. Eye movements allow the gaze to be centred on the object of interest. Since attention is also often shifted to the observed element, eye tracking can be considered, within certain limits, a means to study the evolution of attention. In this regard, several theories have been developed, often conflicting with each other. In the following, we present a very brief summary of them ([8], pp. 4-10), trying to highlight their essential characteristics.
According to Hermann von Helmholtz, the eyes "roam" continuously to distinctly
observe the different parts of the visual field. Attention can be consciously controlled, and
can also be focused on peripheral objects (via the so-called parafoveal visual attention).
Thus, eye movements reveal the explicit will to analyze details, according to an overt
attentional mechanism.
William James focuses instead on active and voluntary aspects of attention (foveal
visual attention), even though he acknowledges their sometimes passive and involuntary
nature. Although the approaches of von Helmholtz and James are often considered
conflicting, they can be seen as complementary. For example, when we look at an image
we initially perceive some of its elements through peripheral (parafoveal) vision, and then
we focus on them to observe the details (foveal vision).
According to James Jerome Gibson, attention implies an advance preparation to a
reaction (“how” this reaction will actually occur will then depend on the observer). Donald
Eric Broadbent supposes instead the existence of a selective filter, used by attention to sift
visual information (since the visual channel is limited). Anthony and Diana Deutsch
assume that visual information is initially analyzed without filters: proper "structures" will
then attribute different levels of importance to the acquired information.
Al'fred Luk'yanovich Yarbus was one of the first researchers to analyze in detail the
eye scanpaths obtained using eye tracking techniques, through experiments in which
different “areas of interest” of an image were observed in sequence.
According to Michael Posner, attention moves as a "spotlight" on the scene
(regardless of eye movements). Anne Treisman suggests that attention is the "glue" that
allows the "elements" in a specific area to be perceived as a whole (objects). For Stephen
Michael Kosslyn, a "window of attention", whose size varies incrementally, selects
elements within the visual memory.
Al'fred Luk'yanovich Yarbus, in particular, can be considered the father of eye
tracking studies (and his book, "Eye Movements and Vision", translated into English in
1967 [32], a milestone on the subject). Among other things, in a famous experiment
Yarbus discovered that eye movements of different subjects observing a painting were
similar but not identical; analogously, he found that, when the experiments were repeated at different times with the same subjects, eye movements were very similar, but never the same. It was clear, however, that the similarity in eye behaviours for a specific observer was greater than that between different observers. Yarbus also noted the different behaviour of subjects during long observations, with repeated "inspection cycles" in which the eyes stop and examine the most important elements of the scene [30].
Of particular interest is the so-called Eye-Mind Hypothesis [20][21], according to
which there is a direct correspondence between the user's gaze and his/her point of
attention. As demonstrated by several experiments, while it is possible to shift the focus
without moving the eyes, the opposite is much more difficult [25]. During the 1980s, the
eye-mind hypothesis was often questioned in light of covert attention, i.e. the attention to


something that is not being looked at (as frequently occurs). According to Hoffman [17],
“current consensus is that visual attention is always slightly (100 to 250 ms) ahead of the
eye”. However, as soon as attention moves to a new position, the eyes will want to follow.
The Eye-Mind Hypothesis, due to its simplicity, is widely exploited in the evaluation of user
interfaces, but also in other areas, such as in studies on reading.
Not to be neglected is also the fact that eye movement patterns are often a
consequence of a learning process, at different levels [15]. Actually, in the execution of
any task the observer has to learn which objects in the scene are relevant and, at a more
detailed level, their best positions in order to successfully achieve the specific objective.
Lastly, it is interesting to observe how the different hypotheses on visual attention are
often directly or indirectly connected with the so-called Gestalt Theory [7]. In essence, the
theory states that "the whole is more than the sum of its parts", and that what we perceive
is therefore the result of a visual composition process based on different principles, such
as proximity, similarity and symmetry.

THE BINDING PROBLEM


Going from retinal to central cortical neurons, the level of data abstraction gets higher, while precision in pattern location decreases. There is a semantic shift from 'where' towards 'what' in the scene. Indeed, as information passes through the visual pathway, the growing receptive field of neurons produces translation invariance. At late stages, the large receptive fields of neurons lead to mis-localisation and the consequent mis-combination of features. Moreover, the simultaneous presence of multiple objects increases the difficulty of determining which features belong to which objects. This difficulty has been termed the binding problem [4].
The binding problem may be solved by providing an attentional mechanism that only allows the information from a selected spatial location to act at higher stages. From the computational viewpoint, early visual information processing is characterized by two functionally distinct modes:
• Pre-attentive mode - the information is processed without the need for attentional resources. This execution is characterized by: i) complete independence from the number of elements; ii) an almost instantaneous execution; iii) the lack of sophisticated scrutiny; iv) a large visual field. This is, for example, the case of the pop-out effect, in which targets are characterized by a single feature not shared by the other elements (usually called distractors) in the scene. Possible primitive features (singletons) of the early vision stage are: colour and different levels of contrast, line curvature, acute angles, line tilt or mis-alignment, terminators and closure, size, direction of movement, stereoscopic disparity, shading, 3D interpretation of 2D shapes, etc.
• Attentive mode - for more complex analysis the allocation of attentional resources to specific locations or objects is needed. A sequence of regions of interest is scrutinized by a small aperture of focal attention. The object can be consciously identified only within this attentional aperture. While singletons covering the entire visual field can be detected in pre-attentive mode, the attentive, serial modality is necessary when target selection is based on more sophisticated analysis. Two particularly important cases are the following: i) target and distractors share features, and discriminability must be based on minor quantitative differences, e.g. target and distractors differ only by a small tilt; ii) target and distractors are distinguishable by a combination of features (hereafter, feature conjunction [31]), e.g. the target is a red L letter among distractors composed of blue L letters and red T letters (see Figure 2). Both these cases require a serial search, because attention must be narrowed to check one subset of items, or only a single item, at a time, with precision.


Figure 2. Three cases of regions composed of two different textures. In the first case, the region on the left is
composed of L letters, while the one on the right is composed of T letters. The second case contains L and T
letters distributed randomly, but the grey level of the left region is lower than that of the right one. Finally, the
third case contains L and T letters distributed randomly. Each letter can have one of two possible grey levels: in the left region, Ts are darker and Ls are brighter, while in the right region it is the opposite. The partition between the two textures is quite evident in the first and second cases, but is not easily detectable in the third one: shape and grey level each produce a "pop-out" effect, but their conjunction does not! (from [4]).
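
As a toy illustration of the difference between the two processing modes described above, the following sketch (a hypothetical Python example with purely invented timing constants, not empirical values) simulates reaction times for a feature ("pop-out") search and for a conjunction search: the former is roughly flat with set size, while the latter grows linearly because items must be inspected serially.

```python
# Toy simulation contrasting pre-attentive (feature, "pop-out") and attentive
# (conjunction, serial) search; all timing constants are invented for the
# purpose of the illustration, not empirical values.
import random

def search_time_ms(set_size, conjunction, base_ms=400.0, per_item_ms=40.0):
    """Simulated reaction time for one trial. Feature search is roughly flat
    with set size; conjunction search inspects items one at a time, so the
    time grows linearly with the number of items (on average, half of them
    are checked before the target is found)."""
    noise = random.gauss(0.0, 20.0)
    if not conjunction:                        # single feature pops out
        return base_ms + noise
    inspected = random.randint(1, set_size)    # serial, self-terminating scan
    return base_ms + per_item_ms * inspected + noise

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        feature = sum(search_time_ms(n, False) for _ in range(200)) / 200
        conj = sum(search_time_ms(n, True) for _ in range(200)) / 200
        print(f"set size {n:2d}: feature ~{feature:4.0f} ms, "
              f"conjunction ~{conj:4.0f} ms")
```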

Pre-attentive processes perform primitive analysis in order to segment or isolate areas of the visual field, and detect the potential regions of interest to which attention should subsequently be allocated for more sophisticated, resource-demanding processing.
Attentional dispatching can be voluntary, i.e. guided by strategic goals, or involuntary, i.e.
reacting to particular stimuli. These two spatial attention modes have been named
endogenous and exogenous, respectively.

VISUAL MEMORY
In this section we briefly consider the following research questions: Do we keep track of where we have looked? How does memory act in search across fixations? Are the processes involved random or systematic?
Horowitz and Wolfe [18] studied memory in visual search tasks. Experiments in which search stimuli were randomly repositioned across trials showed that the number of displayed items had no effect on reaction time. This suggests that visual search is a memoryless process that operates through random sampling. Other studies have found that relevant external information (e.g. elements distracting from the target) can direct both attention and search. These results can be easily explained if eye movements are considered in the context of an exploratory task aimed at selecting useful information. In this sense, eye movements have the purpose of exploring the environment, combining the perceived images with a cognitive framework.
An important first aspect to consider is that instead of methodically retrieving
information from memory, search sometimes depends on the external world, which serves
as its guide.
Random search is a symptom of an absence of memory, as opposed to highly
organized search patterns (such as left-to-right and top-to-bottom sequential scans). Both
computational theories and empirical results have demonstrated that unsystematic search
is often more efficient than a methodical exploration. It is also interesting to consider the
internal strategies that guide search, even if they are usually highly sensitive to external
sources. Internally driven mechanisms typically produce an erratic search (in which
fixations are independent of one another and all "places" have the same probability of
being explored), due to the strict relation among eye movements, attention and efficient
search.
Although much cognitive theory implies search strategies based on tagging and on
mechanisms inhibiting the return to already watched elements, random behaviours can be


generally noted across fixations. Since the visual-cognitive system is very flexible and
optimized to adjust to varying situations, it is reasonable to think that the contingency
characterizing visual search is central in selecting relevant information from a continuously
changing environment [1, 2]. These results also indicate the existence of a long-term
memory that is preserved across searches and which may entail the use of a plain
collection of rules characterized by self-organizing properties.

EYE TRACKING AS AN INPUT DEVICE


Using keyboard and mouse, we are all accustomed to interacting with the computer
through the WIMP paradigm, based on Windows, Icons, Menus and Pointing devices.
The so-called Fitts’ Law (see Figure 3) introduced an empirical model to estimate the time necessary to point at an object based on step-wise movements towards the target [11]. Every step consists of moving the pointer towards the target, and then estimating how close to the target the pointer has arrived. Two basic assumptions are made: i) the distance to the target after each step is proportional to the distance at the beginning of the step (scale invariance); ii) every step takes the same amount of time (constant information processing power) [16].
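
The step-wise model above can be sketched in a few lines of code. In the following hypothetical Python fragment, each step leaves a constant fraction of the remaining distance (0.2, as in the example of Figure 3) and takes a constant time, so the number of steps, and hence the movement time, grows logarithmically with d/s; the Shannon formulation MT = a + b * log2(d/s + 1) is also shown for comparison, with purely illustrative constants a and b.

```python
# Sketch of the step-wise pointing model described above. Each step leaves a
# constant fraction of the remaining distance and takes a constant time, so the
# number of steps (and hence the movement time) grows with log(d/s).
# All constants are illustrative assumptions.
import math

def steps_to_target(d, s, remaining_fraction=0.2):
    """Count the steps needed until the remaining distance falls within the
    target size s; remaining_fraction is the share of the distance still left
    after each step (0.2 as in the example of Figure 3)."""
    steps, remaining = 0, float(d)
    while remaining > s / 2.0:
        remaining *= remaining_fraction
        steps += 1
    return steps

def fitts_movement_time_ms(d, s, a_ms=50.0, b_ms=150.0):
    """Shannon formulation of Fitts' law: MT = a + b * log2(d/s + 1)."""
    return a_ms + b_ms * math.log2(d / s + 1.0)

if __name__ == "__main__":
    for d in (100, 400, 1600):                        # distances in pixels
        print(d, steps_to_target(d, s=20),
              round(fitts_movement_time_ms(d, s=20)), "ms")
```
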
The question is whether, when an eye tracker is used as an input device, the amplitude-time relation of saccades obeys Fitts’ law. Experiments have shown that an accuracy-speed trade-off in the sense of Fitts’ law does not exist in the eye tracking context, as there is a limit to the accuracy: the problem is due to the fact that the fovea's visual field is about two degrees, and single-pixel fixations on a computer display are simply not possible, due to the “eye jittering” problem. The eyes’ point of gaze is therefore never perfectly still. The smallest object size for mouse interaction is only 3 pixels (for example, it is possible to precisely position a text cursor between two small letters). Expressed as a visual angle, such a size is about 0.1°. Gaze interaction, however, needs a visual angle that is one order of magnitude larger.
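
The correspondence between on-screen sizes and visual angles is easy to make explicit. The following sketch assumes a typical desktop geometry (a viewing distance of about 60 cm and a pixel pitch of 0.25 mm, both assumptions made for the example): with these values, a 3-pixel target subtends roughly 0.07°, while a target of about one degree, the order of magnitude needed for gaze interaction, requires more than 40 pixels.

```python
# Converting an on-screen size to a visual angle (and back), to make the
# accuracy comparison above concrete. The screen geometry values are
# assumptions typical of a desktop setup, not figures from the paper.
import math

def pixels_to_degrees(pixels, viewing_distance_mm=600.0, pixel_pitch_mm=0.25):
    """Visual angle subtended by a given number of pixels."""
    size_mm = pixels * pixel_pitch_mm
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * viewing_distance_mm)))

def degrees_to_pixels(degrees, viewing_distance_mm=600.0, pixel_pitch_mm=0.25):
    """On-screen size (in pixels) covering a given visual angle."""
    size_mm = 2.0 * viewing_distance_mm * math.tan(math.radians(degrees) / 2.0)
    return size_mm / pixel_pitch_mm

if __name__ == "__main__":
    print(round(pixels_to_degrees(3), 2), "deg for a 3-pixel target")   # ~0.07
    print(round(degrees_to_pixels(1.0)), "pixels needed for ~1 deg")    # ~42
```
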
The existence of an accuracy limit for eye tracking interaction creates a situation which differs from the experiments conducted by Fitts. Furthermore, using targets whose sizes are much bigger than the accuracy of eye gazing does not provide any speed benefit. Moreover, when the target's size is bigger than the eye’s accuracy, the gaze is moved to the target with a single saccade. Therefore, due to the high speed of saccadic movements, the distance to objects does not contribute very much to the time necessary to hit a target in gaze-driven interfaces.
Target selection is another challenge for eye input to computers, as there is no
equivalent to the "button click" of the ordinary mouse [25, 23]. Instead of the point-select
operations performed with the mouse, the eye can look-select to activate buttons, icons,
links, or other interface elements. In order to use an eye tracking system as a computer
input device, it is thus necessary to implement the selection operation that typically follows
pointing in mouse-based interaction. The main difficulty is due to the fact that many eye movements are involuntary, which creates the so-called “Midas Touch” problem [19]: it is not possible to look at an element of the interface without activating it. One possible
solution is to use eye movements in combination with other input devices, to make
intentions clear. For example, speech commands can add extra context when eye
movements may be ambiguous, and vice versa [22]. Other possibilities for the selection of
a graphical element include exploiting key presses or staring at the element for a certain dwell time. This last solution is, however, time consuming, as typical dwell times are between 500 and 1000 ms. Several other solutions have been proposed and tested to address the accuracy and jitter problems that arise when using an eye tracker for computer input; two examples are the dynamic resizing of the cursor’s activation area [14] and expanding targets with grab-and-hold selection [24].
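
As an illustration of the dwell-time approach mentioned above, the following minimal sketch (a hypothetical Python example; the element geometry, gaze-sample format and 0.5 s threshold are assumptions made for the example) triggers a selection only after the gaze has rested on the same interface element for the whole dwell period, which is precisely what prevents the Midas Touch effect of casual glances.

```python
# Minimal dwell-time selection loop, one way of avoiding the "Midas Touch"
# problem discussed above. Element geometry, the gaze-sample format and the
# 0.5 s threshold are illustrative assumptions.
def element_at(x, y, elements):
    """Return the name of the interface element whose rectangle contains (x, y)."""
    for name, (left, top, width, height) in elements.items():
        if left <= x < left + width and top <= y < top + height:
            return name
    return None

def dwell_select(gaze_samples, elements, dwell_s=0.5):
    """gaze_samples: iterable of (timestamp_s, x, y) tuples. Yields an element
    name each time the gaze has rested on it for at least dwell_s seconds."""
    current, since = None, None
    for t, x, y in gaze_samples:
        hit = element_at(x, y, elements)
        if hit != current:                 # gaze moved to a different element
            current, since = hit, t
        elif hit is not None and t - since >= dwell_s:
            yield hit                      # dwell completed: trigger the action
            since = t                      # re-arm, so the element can repeat

if __name__ == "__main__":
    buttons = {"OK": (100, 100, 80, 40), "Cancel": (200, 100, 80, 40)}
    stream = [(i / 60.0, 120, 115) for i in range(90)]   # 1.5 s resting on "OK"
    print(list(dwell_select(stream, buttons)))            # -> ['OK', 'OK']
```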


Figure 3. a) Fitts’ Law states that the time to point at an object using a device increases with the distance d from the target object and decreases with the object’s size s: precisely, it depends only on the relative precision (d/s). b) Two step-wise movements towards the target with constant fraction λ = 0.2 (derived from [16]).

The gaze-contingency paradigm is a general framework for allowing the content of a computer display to change depending on what the user is looking at [8]. Due to an imperfect coupling between overt and covert attention, it is not possible to know exactly which visual information the viewer is processing based on the fixation locations. However, in gaze-contingent paradigms the stimulus display is continuously updated as a function of the observer's current gaze position: by controlling the information feed, it is possible to eliminate any ambiguity about what is fixated and what is processed.
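
A gaze-contingent "moving window" can be sketched as follows: everything outside a region centred on the current gaze position is degraded, so that only the information presented at the point of regard is available at full detail. The example below is an illustrative assumption (plain numpy, a circular window of fixed radius), not a description of any specific experimental setup.

```python
# Sketch of a gaze-contingent "moving window": only the region around the
# current gaze position is kept at full detail, the rest is coarsely pixelated.
# Plain numpy only; the window radius and block size are illustrative.
import numpy as np

def gaze_contingent_frame(image, gaze_xy, window_radius_px=100, block=16):
    """image: H x W x C uint8 array; gaze_xy: (x, y) in pixels.
    Returns a copy where everything outside a circular window centred on the
    gaze point is degraded, mimicking the loss of acuity away from the fovea."""
    h, w = image.shape[:2]
    # Coarse version: average over block x block tiles, then upsample again.
    cropped = image[:h - h % block, :w - w % block]
    coarse = cropped.reshape(h // block, block, w // block, block, -1).mean((1, 3))
    degraded = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
    out = np.array(image)
    out[:degraded.shape[0], :degraded.shape[1]] = degraded.astype(image.dtype)
    # Restore full resolution inside the window around the gaze point.
    ys, xs = np.ogrid[:h, :w]
    inside = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2 <= window_radius_px ** 2
    out[inside] = image[inside]
    return out

if __name__ == "__main__":
    frame = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
    updated = gaze_contingent_frame(frame, gaze_xy=(320, 240))
    print(updated.shape)   # (480, 640, 3): same frame, degraded away from gaze
```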

MAIN APPLICATIONS OF EYE TRACKING


Eye tracking is widely applied in different fields, mainly as a "measurement and
investigation" tool or as an active interface. In the following, we briefly discuss the
distinctive features of each category.
Usability is a branch of Human-Computer Interaction, a discipline (deriving from
Ergonomics) that studies how the interaction between users and computers occurs.
Usability also deals with the development of methodologies for the evaluation and
improvement of IT tools.
Compared to other methods for user interface evaluation, eye tracking can provide
potentially more "scientific", objective and quantifiable results. The aforementioned Eye-
Mind Hypothesis is the basis of studies of this kind, as it presupposes the existence of a
direct correspondence between the user's gaze direction and the elements to which his or
her attention is addressed. From eye movements, it is thus potentially possible to
investigate users' cognitive processes, and find out what they consider "interesting",
"important" or "unclear".
However, in order to draw correct conclusions on users' eye behaviour, and thus
properly evaluate an interface, eye parameters must be correctly interpreted. Considering
fixations, for example, in a usability study their duration may be interpreted differently
depending on the context: a long fixation might indicate that the fixated element is difficult
to understand, or, on the contrary, that it is particularly interesting. To interpret fixations in
the right way it is therefore necessary to always keep in mind the task assigned to the
tester: for example, if the goal is to activate a command from a menu, and the items of this
menu are fixated for a long time without being clicked, this probably means that the menu
is poorly designed.
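
In practice, usability analyses of this kind are usually based on simple aggregate metrics computed over areas of interest (AOIs), such as the number of fixations falling on an element and the total dwell time spent on it. The sketch below shows one possible way of computing such metrics; the AOI rectangles, the fixation format and the metric names are assumptions made for the example.

```python
# Aggregating fixations over areas of interest (AOIs): number of fixations and
# total dwell time per AOI. Data formats and AOI names are assumptions.
def aoi_metrics(fixations, aois):
    """fixations: list of (x, y, duration_s) tuples; aois: dict mapping a name
    to a (left, top, width, height) rectangle. Returns a dict mapping each AOI
    name to (fixation count, total dwell time in seconds)."""
    stats = {name: [0, 0.0] for name in aois}
    for x, y, duration in fixations:
        for name, (left, top, width, height) in aois.items():
            if left <= x < left + width and top <= y < top + height:
                stats[name][0] += 1            # one more fixation on this AOI
                stats[name][1] += duration     # accumulate dwell time
                break
    return {name: (count, round(total, 3)) for name, (count, total) in stats.items()}

if __name__ == "__main__":
    aois = {"menu": (0, 0, 200, 600), "content": (200, 0, 800, 600)}
    fixations = [(50, 120, 0.45), (60, 130, 0.80), (400, 300, 0.30)]
    print(aoi_metrics(fixations, aois))   # {'menu': (2, 1.25), 'content': (1, 0.3)}
```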


Figure 4. Eye tracking results (gazeplots above and heatmaps below) for three subjects (Sub. 1, Sub. 2 and
Sub. 3) observing the three cases shown in Figure 2. At first (upper left) the subject is presented with a
monochromatic combination of Ls (on the left) and Ts (on the right), like in the first case of Figure 2. Then
(upper right), a picture with mixed Ls and Ts (but different colours for the left and right parts) is displayed, like
in the second case of Figure 2. The third image (bottom left) contains Ls and Ts distributed randomly, like in
the third case of Figure 2. Each letter can have one of two possible colours (in the left region, Ts are blue and Ls are red, while in the right region it is the opposite). The fourth picture (bottom right), which is identical to the third one, is displayed after two other images (not shown here) in which the left and right parts, respectively, are displayed separately. The first subject (Sub. 1) did not identify any subdivision in either the bottom-left or the bottom-right image. The second subject (Sub. 2) recognized the subdivision only in the last image (bottom right, characterized by few fixations), after the display of the two halves showing the left and right parts. The third subject (Sub. 3), lastly, detected the subdivision from the third image onwards (bottom left, characterized by fixations that are mainly concentrated in the central part of the picture).

Perceptive Interfaces provide the computer with human perceptive capabilities, such
as sight and hearing. Vision-Based Perceptive Interfaces, in particular, use one or more
cameras to allow the computer to acquire implicit or explicit information about users and
their environment [26]. Eye tracking pertains to this category, and finds important
applications in the fields of assistive technologies and psychophysiology.
Controlling a computer through the eyes is especially useful in the case of diseases
that severely and progressively limit motor capabilities, such as amyotrophic lateral
sclerosis (ALS), muscular dystrophy, spinal muscular atrophy and cerebral palsy. Although
the effectiveness of eye tracking in these cases depends on the accuracy of the employed
device, it is essential that the associated software is sufficiently "intelligent" to properly
interpret eye inputs. Typical examples of eye tracking applications in the field of assistive
technologies include methods for text entry [27], cursor control [6][29] and web browsing
[28].


As regards the fields of psychophysiology and cognitive science, eye tracking is more and more widely exploited. In fact, scan patterns give information about mental processes, revealing what kind of visual information is going to be used (an example is given in Figure 4). The number of research papers on these subjects is reaching unexpected levels. An example is represented by the new field of soft biometrics, in which eye tracking is exploited for both authentication and identification purposes [5, 12].
Let us conclude by considering the automotive and aviation industries: eye tracking is increasingly being used for the analysis of drivers' and pilots' attention and mental workload.

REFERENCES
[1] Aks, D. J., Zelinsky, G. J., Sprott, J. C.: Memory across eye-movements: 1/f dynamic in visual search, Nonlinear Dynamics, Psychology, and Life Sciences, Vol. 6, No. 1, 2002, 1-25.
[2] Aks, D. J.: 1/f dynamic in complex visual search: Evidence for Self-Organized
Criticality in human perception. In Tutorials in contemporary nonlinear methods for the
behavioral sciences, Riley M. A., G. C. Van Orden (Eds.), Web Book, 2005, 326-359.
[3] Böhme, M., Meyer, A., Martinetz, T., Barth, E.: Remote Eye Tracking: State of the
Art and Directions for Future Development. In Proceedings of the 2nd Conference on
Communication by Gaze Interaction (COGAIN), “Gazing into the Future”, 2006, 12-16.
[4] Cantoni, V., Caputo, G., Lombardi, L.: Attentional Engagement in Vision Systems,
in Artificial Vision, Cantoni, V., Levialdi, S., Roberto, V. Eds., London: Academic Press,
1996, 3-42.
[5] Cantoni, V., Galdi, C., Nappi, M., Porta, M., Riccio, D.: GANT: Gaze ANalysis Technique for Human Identification. Pattern Recognition, 2014 (available online 13 March 2014).
[6] De Gaudenzi, E., Porta, M.: Gaze Input for Ordinary Interfaces: Combining
Automatic and Manual Error Correction Techniques to Improve Pointing Precision. In
Intelligent Systems for Science and Information, Studies in Computational Intelligence,
542, Chen, L., Kapoor, S., Bhatia, R. (Eds.), Springer International Publishing Switzerland,
2014, 197-212.
[7] Desolneux, A., Moisan, L., Morel, J. M.: From Gestalt Theory to Image Analysis: A
Probabilistic Approach. Interdisciplinary Applied Mathematics Series, Vol. 34, Springer,
2008, 11-25.
[8] Duchowski, A.T., Eye Tracking Methodology – Theory and Practice (2nd Ed.).
London: Springer-Verlag, 2007.
[9] Eisenberg, A.: Pointing With Your Eyes, to Give the Mouse a Break, New York
Times, 2011 (http://www.nytimes.com/2011/03/27/business/27novel.html?r=0, retrieved on
06/06/2014).
[10] Encyclopedia Britannica, Human Eye (retrieved from http://www.britannica.com/
EBchecked/topic/1688997/human-eye on 15 June 2014).
[11] Fitts, P. M.: The information capacity of the human motor system in controlling
the amplitude of movement, Journal of Experimental Psychology, 47, 1954, 381-391.
[12] Galdi, C., Nappi, M., Riccio, D., Cantoni, V., Porta, M.: A New Gaze Analysis Based Soft-Biometric. Proceedings of the 5th Mexican Conference on Pattern Recognition (MCPR 2013), Querétaro, Mexico, 2013, 136-144.
[13] Gonzalez, R. C., Woods R. E.: Digital Image Processing (3rd Edition). Prentice
Hall, 2008.
[14] Grossman, T. and Balakrishnan, R.: The bubble cursor: Enhancing target
acquisition by dynamic resizing of the cursor's activation area. Proceedings of the ACM
Conference on Human Factors in Computing Systems - CHI 2005, NY: ACM, 2005, 281-
290.
[15] Hayhoe, M. M., Droll, J., Mennie, N.: Learning Where to Look, in "Eye


Movements: A Window on Mind and Brain", edited by Van Gompel, R. P. G., Fischer, M.
H., Murray, W. S., Hill, R. L., Elsevier Ltd., 2007, 641-659.
[16] Drewes, H.: Eye Gaze Tracking for Human Computer Interaction, PhD Thesis, Ludwig-Maximilians-Universität, München, 2010.
[17] Hoffman, J. E.: Visual attention and eye movements. In Attention, H. Pashler (Ed.). Hove, UK: Psychology Press, 1998, 119-154.
[18] Horowitz, T. S., Wolfe, J. M.: Visual Search has no memory. Nature, 394(6693), 1998, 575-577.
[19] Jacob, R. J. K., Karn, K. S.: Eye tracking in Human-Computer Interaction and
usability research: Ready to deliver the promises. In The mind's eye: Cognitive and applied
aspects of eye movement research. Hyönä, J., Radach, R., Deubel, H. (Eds.). Amsterdam:
Elsevier, 2003, 573-605.
[20] Just, M. A., Carpenter, P. A.: Eye Fixations and Cognitive Processes, Cognitive Psychology, 8, 1976, 441-480.
[21] Just, M. A., Carpenter, P. A.: A Theory of Reading: From Eye Fixations to Comprehension, Psychological Review, 87, 1980, 329-354.
[22] Kaur, M., Tremaine, M., Huang, N., Wilder, J., Gacovski, Z., Flippo, F.,
Mantravadi, C. S.: Where is "it"? Event synchronization in gaze-speech input systems.
Proceedings of the Fifth International Conference on Multimodal Interfaces NY: ACM
Press, 2003, 151-158.
[23] MacKenzie, I. S.: Evaluating eye tracking systems for computer input. In Gaze
interaction and applications of eye tracking: Advances in assistive technologies,
Majaranta, P., Aoki, H., Donegan, M., Hansen, D. W., Hansen, J. P., Hyrskykari, A.,
Räihä, K.-J. (Eds.), Hershey, PA: IGI Global, 2012, 205-225.
[24] Miniotas, D., Špakov, O., and MacKenzie, I. S.: Eye gaze interaction with
expanding targets, Extended Abstracts of the ACM Conference on Human Factors in
Computing Systems - CHI 2004, NY: ACM, 2004, 1255-1258.
[25] Poole, A., Ball, L. J.: Eye Tracking in Human-Computer Interaction and Usability Research: Current Status and Future Prospects. In Encyclopedia of Human-Computer Interaction, Ghaoui, C. (Ed.), PA: Idea Group, Inc., 2005, 211-219.
[26] Porta, M.: Vision-Based User Interfaces: Methods and Applications. International
Journal of Human-Computer Studies, 57, 2002, 27-73.
[27] Porta, M., Turina, M.: Eye-S: a Full-Screen Input Modality for Pure Eye-based
Communication. In Proceedings of the 5th symposium on Eye Tracking Research &
Applications (ETRA 2008), Savannah, GA, USA, March 26-28, 2008, 27-34.
[28] Porta, M., Ravelli, A.: WeyeB, an Eye-Controlled Web Browser for Hands-Free
Navigation. In Proceedings of the 2nd IEEE-IES International Conference on Human
System Interaction (HSI 2009), Catania, Italy, 2009, pp. 210-215.
[29] Porta, M., Ravarelli, A., Spagnoli, G.: ceCursor, a Contextual Eye Cursor for General Pointing in Windows Environments. Proceedings of the 6th Eye Tracking Research & Applications Symposium (ETRA 2010), Austin, TX, USA, ACM Press, 2010, 331-337.
[30] Tatler, B. W., Wade, N. J., Kwan, H., Findlay, J. M., Velichkovsky, B. M.: Yarbus,
eye movements, and vision, i-Perception, Vol. 1, 2010, pp. 7–27.
[31] Treisman A., Gelade G.: A feature-integration theory of attention, Cognitive
Psychology, Vol. 12, 1980, 97-136.
[32] Yarbus, A. L., Eye Movements and Vision. NY: Plenum Press, 1967.

ABOUT THE AUTHORS


Virginio Cantoni, Marco Porta, Dipartimento di Ingegneria Industriale e dell’Informazione, Università di Pavia, E-mail: virginio.cantoni@unipv.it, marco.porta@unipv.it.

