
# Analysing Eye-Tracking Data

Hayward Godwin
University of Southampton

Outline
Part 1:
- Eye-tracking measures: an overview
- Data Viewer reports
- The Organise-Analyse-Visualise approach in R
Part 2:
- Try it yourself!

Eye-Tracking Measures: An Overview
For a detailed review, see Rayner (2009)

## Global versus Local Measures

Global measures are computed at the overall (or global) level of a trial and ignore what was being fixated at any point in time (e.g., mean fixation duration for a trial).

Local measures are computed for each object or stimulus in a trial, paying attention to what was being fixated at any point in time (e.g., mean fixation duration for target words in a reading study).

## Mean Fixation Duration (global)

(Mean duration of fixations)

[Figure: search display for a blue square target, with six fixations of 130, 125, 110, 90, 150, and 190 ms]

Mean Fixation Duration = (130 + 125 + 110 + 90 + 150 + 190) / 6 = 132.5 ms

## Mean Fixation Duration (local)

(Mean duration of fixations on a specific object type)

[Figure: the same search display; the target receives fixations of 110 and 190 ms]

Mean Fixation Duration for target = (110 + 190) / 2 = 150 ms

## Number of Fixations (global)

(Mean number of fixations)

[Figure: the same search display, with six fixations]

Number of fixations = 6

## Number of Fixations (local)

(Mean number of fixations on a specific object type)

[Figure: the same search display; two fixations land on the target]

Number of fixations for target = 2

## Total Gaze Duration (global)

(Sum of fixation durations)

[Figure: the same search display]

Total gaze duration = 130 + 125 + 110 + 90 + 150 + 190 = 795 ms

## Total Gaze Duration (local)

(Sum of fixation durations on a specific object type)

[Figure: the same search display]

Total gaze duration for target = 110 + 190 = 300 ms

## First Pass Gaze Duration (local)

(Sum of fixation durations on the first visit, or pass, of an object)

[Figure: the same search display]

First pass gaze duration for target = 110 ms (the second fixation of 190 ms duration occurs on the second pass, so it is excluded)

## Single Fixation Duration (local)

(Mean of fixation durations when an object is only ever fixated once)

[Figure: the same search display, with the two objects that are only ever fixated once highlighted]

This is one of the cleanest measures there are in eye-tracking, since only fixating an object once means we can chart the time taken to fully process that object. Here, only two objects are ever fixated once; these are highlighted in the figure.

## Proportion of Objects Fixated (global)

(Proportion of objects directly fixated)

[Figure: the same search display]

## Proportion of Objects Fixated (local)

(Proportion of objects directly fixated, broken down by object type)

[Figure: the same search display]

Proportion of distractors fixated = 2/4 = 0.5
Probability of fixating target = 1/1 = 1

## Saccade Latency (global)

(Time from display onset to the start of the first saccade)

[Figure: the same search display; the initial fixation lasts 130 ms]

Here, the saccade latency is 130 ms.

## Revisits (local)

(Mean number of times each object is visited)

[Figure: the same search display]

Count up the number of times each object is visited, then divide by the number of objects that were visited. Do NOT include zero values for unvisited objects.

(1 + 2 + 1) / 3 = 4 / 3 ≈ 1.3

## Mean Saccade Length (global)

(Mean length of all saccades)

[Figure: the same search display, with saccades of length 1.2, 1.4, 2.2, 0.2, and 3.4]

Mean length of all saccades = (1.2 + 1.4 + 2.2 + 0.2 + 3.4) / 5 = 1.68

## Verification Time (local)

(Time between first fixating the target and the button press)

[Figure: the same search display]

Find when the button press occurred. If we find that it occurred 150 ms into the second fixation (of 190 ms) on the target, then verification time = 110 + 90 + 150 + 150 = 500 ms (only the first 150 ms of the final 190 ms fixation are counted).

A better way to do this is to find the time at which the first fixation on the target starts and subtract this value from the RT.

## Scanpath Ratio

(Sum of saccade lengths to the target divided by the shortest distance to the target)

[Figure: the same search display; saccade lengths of 1.2, 1.4, 2.2, 0.2, and 3.4, with a straight-line distance of 5.2 to the target]

Scanpath ratio = (1.2 + 1.4 + 2.2 + 0.2 + 3.4) / 5.2 = 8.4 / 5.2 ≈ 1.62

Notes on Measures
There are many, many measures that can be run
Just because you can run these, it doesn't mean that you should
Focus on running only the measures that address your research questions, and avoid running or reporting additional ones for the sake of it (i.e., avoid fishing!)

## Data Viewer Reports

Fixation Report
One row of data for every fixation in your study (per trial, per participant)
You will typically need to use the fixation report if you are running visual search/scene perception studies
Use fixation reports to filter out fixations that coincide with other events, such as display changes, button-press responses, etc.
This can be done by filtering using the Interest Period (as you'll see in the tutorials), but often you'll end up removing some fixations you still want
Fixation reports can also be used to re-compute the size of interest areas and capture fixations that fell just outside of interest areas

## Fixation Report: Important Columns

RECORDING_SESSION_LABEL: the recording session ID
TRIAL_INDEX: trial number
CURRENT_FIX_INDEX: the fixation ID for the current fixation
CURRENT_FIX_DURATION: the duration of the current fixation
CURRENT_FIX_BUTTON_PRESS_X: the time during the current fixation that a button was pressed
CURRENT_FIX_INTEREST_AREA_LABEL: the interest area label of the current fixation ("." if the eyes are not on an IA)
CURRENT_FIX_NEAREST_INTEREST_AREA_LABEL: the nearest IA to the eyes
CURRENT_FIX_NEAREST_INTEREST_AREA_DISTANCE: the distance to the CENTRE of the nearest IA
You can also get NEXT_ and PREVIOUS_ versions of all of these measures

## Interest Area Report

One row of data for every interest area in your study (per trial, per participant)
Reading researchers typically use this type of report
They typically set the interest period to the time period of the trial itself, enabling the filtering out of any unnecessary fixations

## Interest Area Report: Important Columns

RECORDING_SESSION_LABEL: the recording session ID
TRIAL_INDEX: trial number
IA_DWELL_TIME: total time spent on the IA (sum of all fixations on the IA)
IA_FIRST_FIXATION_DURATION: often referred to as First Fixation Duration in reading research. The duration of the first fixation on the interest area (first pass only; if the target region is skipped, this will have no value)
IA_FIRST_RUN_DWELL_TIME: often referred to as Gaze Duration in reading research. The sum of all fixations on the IA during the first pass. You also use this column for calculating Single Fixation Duration, but remove all occurrences where the IA region was fixated more than once
IA_ID/IA_LABEL: the ID number and label for the interest area
IA_REGRESSION_IN: returns 0 or 1
IA_REGRESSION_IN_COUNT: returns the number of regressions into the IA
IA_REGRESSION_OUT: returns 0 or 1
IA_REGRESSION_OUT_COUNT: returns the number of regressions out of the IA
IA_REGRESSION_PATH_DURATION: often referred to as Go Past Time in reading research. The sum of all fixations that occur before passing to the right of the target interest area (to a greater-numbered IA_ID)
IA_SKIP: returns 0 or 1

Message Report
One row of data for every message that occurred during the study (per trial, per participant)
If you want an accurate view of when things happened during your study, the message report is the one to use
This is particularly important for gaze-contingent studies where display changes occur
You can technically get most of the messages that occur from the fixation report; however, some messages do get missed from the fixation report

## Message Report: Important Columns

RECORDING_SESSION_LABEL: the recording session ID
TRIAL_INDEX: trial number
CURRENT_MSG_LABEL: the message label
CURRENT_MSG_TEXT: the full message text
CURRENT_MSG_TIME: the time the message occurred

Sample Report
One row of data for every sample recorded by the eye-tracker during the study (per trial, per participant)
At a 1000 Hz sampling rate, that is 1,000 rows of data per second of recording
Sample reports are therefore typically tens of millions of rows in size
You'll only need to use a sample report if you have certain highly customised setups (e.g., moving displays) or want to get an idea of millisecond-by-millisecond pupil size (as is the case in pupillometry)

## The Organise-Analyse-Visualise Approach in R

In the past, data could easily be organised in Excel, analysed in SPSS, and visualised in SPSS/Excel/SigmaPlot
With the size and complexity of eye-tracking studies, this is no longer really possible
We can now do all three steps in R, making the transition between them easier:
Organise: data.table
Analyse: ezANOVA
Visualise: ggplot

Reproducible Results
However you do things, it's best to have a consistent approach to organising your R scripts
I have two types of script:
ORGANISE__XYZ.R scripts that organise the data
ANALYSE__XYZ.R scripts that analyse and visualise the data
However you set up your own R scripts, find an approach and stick to it
This makes it easier to copy and paste existing scripts, and being consistent means you can go back to old work and understand it more easily

## Organise: the data.table package

Why use data.table?
It does things very quickly
It extends (builds upon) data.frame objects, meaning that everything you can do to a data.frame object, you can do to a data.table
We are now going to go through some examples of what it can do and how to use it
I'll be giving out the example code later, so there is no need to type it or run through it now

Create a data.frame
Create a normal data.frame
It will look something like the example on the right
It lists different trials for a bunch of participants and gives you their RT (Reaction Time) in ms
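The slide's code screenshot isn't reproduced here, so as a sketch: the column names ppt, trialType, and RT come from later slides, while the values below are made up for illustration.

```r
# Build a small example data.frame of reaction times.
# Columns: ppt (participant ID), trial, trialType, RT (reaction time, ms).
myData <- data.frame(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(450, 620, 480, 650, 430, 600, 470, 640)
)
print(myData)
```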

## Convert data.frame to data.table

For large data sets you will want to set keys
When data are keyed, they can be processed faster
A key is set on one or more columns in your data.table
When a column is associated with a key, data.table will be able to group the data by that column more rapidly
In our example, let's set participant ID (ppt) and trialType as keys, using the setkey command, so we can group the data by these values more rapidly
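A sketch of the conversion and keying steps using data.table's as.data.table and setkey (the example data are hypothetical):

```r
library(data.table)

# Hypothetical example data (same columns as the earlier data.frame slide)
myData <- data.frame(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(450, 620, 480, 650, 430, 600, 470, 640)
)

# Convert the data.frame into a data.table
DT <- as.data.table(myData)

# Key on ppt and trialType so grouping by these columns is faster
setkey(DT, ppt, trialType)

key(DT)  # "ppt" "trialType"
```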

Basic Syntax
{WHERE} allows you to select only certain rows. In other words, you can get the command you run to focus only on the data cells WHERE certain conditions are met
{SELECT} is where you tell data.table what columns or values you want back. In other words, you SELECT certain values
{GROUPBY} allows you to group the output data in different ways. This is a bit like pivot tables in Excel
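In data.table's own documentation these three slots are called i, j, and by. A sketch mapping the slide's terms onto them (the example data are hypothetical):

```r
library(data.table)

DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(450, 620, 480, 650, 430, 600, 470, 640)
)

# General form: DT[WHERE, SELECT, GROUPBY] -- i.e., DT[i, j, by]
result <- DT[trialType == "hard",      # WHERE:   only the hard trials
             .(meanRT = mean(RT)),     # SELECT:  the mean of the RT column
             by = ppt]                 # GROUPBY: one row per participant
print(result)
```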

Getting means
How about the mean RT overall?
In other words, we are SELECTing the mean of the RT column
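The command from the slide isn't reproduced above; it would be something like this sketch (hypothetical data):

```r
library(data.table)

DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(450, 620, 480, 650, 430, 600, 470, 640)
)

# SELECT the mean of the RT column across the whole table
overallMean <- DT[, mean(RT)]
print(overallMean)  # a single overall mean (542.5 for these values)
```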

Getting means
Overall RT isn't that interesting. Let's GROUP BY trialType:
In other words, we are SELECTing the mean of the RT column but GROUPING BY the trialType column
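A sketch of the grouped version (hypothetical data):

```r
library(data.table)

DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(450, 620, 480, 650, 430, 600, 470, 640)
)

# GROUP BY trialType: one mean per trial type
byType <- DT[, .(meanRT = mean(RT)), by = trialType]
print(byType)
```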

Getting means
Now let's group by participant and trialType:
In other words, we are SELECTing the mean of the RT column but GROUPING BY the trialType and ppt columns
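A sketch with both grouping columns (hypothetical data):

```r
library(data.table)

DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(450, 620, 480, 650, 430, 600, 470, 640)
)

# GROUP BY both ppt and trialType: one mean per participant per trial type
byPptType <- DT[, .(meanRT = mean(RT)), by = .(ppt, trialType)]
print(byPptType)
```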

Getting means
But what if we want to obtain the means for trials 3 and 4 only? How do we do that?
We use WHERE!
(Reminder: == means "is equal to")
In other words, we are SELECTing the mean of the RT column, GROUPING BY the trialType and ppt columns, but only including values WHERE trial is 3 or 4
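A sketch combining all three slots (hypothetical data):

```r
library(data.table)

DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(450, 620, 480, 650, 430, 600, 470, 640)
)

# WHERE trial is 3 or 4, SELECT the mean RT, GROUP BY trialType and ppt
late <- DT[trial == 3 | trial == 4,
           .(meanRT = mean(RT)),
           by = .(ppt, trialType)]
print(late)
```

The same WHERE condition can also be written as `trial %in% c(3, 4)`.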

Adding new columns
data.table also offers more convenient syntax for adding new columns, via the := operator
Running a := command adds a newColumn column with a value of 1 to every row. You can also combine this with WHERE and GROUP BY commands
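A sketch of := in its plain form and combined with WHERE and GROUP BY (hypothetical data; the column names newColumn, hardFlag, and pptMeanRT are illustrative):

```r
library(data.table)

DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(450, 620, 480, 650, 430, 600, 470, 640)
)

# := adds a column by reference (no copy of the table is made)
DT[, newColumn := 1]                     # every row gets newColumn = 1

# Combined with WHERE: only the hard trials get flagged (others stay NA)
DT[trialType == "hard", hardFlag := 1]

# Combined with GROUP BY: a per-participant mean alongside the raw data
DT[, pptMeanRT := mean(RT), by = ppt]

print(DT)
```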

## Joins and Merges

Suppose we forgot to include information about which condition each participant was in. How do we get that in there?
We can use a join!
A join in data science is a special type of operation that combines two datasets
To do this, create a new data.table listing the participant ID and the condition, and follow the steps on the next slide
Joins (or merges) hunt down identical column names and then join the data from one table with the data from another

## Performing the Join

Create a new data.table (cDT) containing the condition information and set the keys
Joining cDT with DT then gives us our joined-up data, joinedDT
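The join itself isn't reproduced above, so here is a sketch using the slide's table names (DT, cDT, joinedDT); the condition labels are hypothetical:

```r
library(data.table)

DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(450, 620, 480, 650, 430, 600, 470, 640)
)
setkey(DT, ppt)

# cDT: one row per participant, listing their condition
cDT <- data.table(
  ppt       = c("p1", "p2"),
  condition = c("control", "experimental")
)
setkey(cDT, ppt)

# Join on the shared (keyed) ppt column: every row of DT
# picks up its participant's condition
joinedDT <- cDT[DT]
print(joinedDT)
```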

## Other Types of Join

We've just done our first join!
Note that we've just joined by one column, but there is no theoretical limit to how many columns you can join by at once
There are many types of join which you may want to use (e.g., left, right, natural, outer, full, Cartesian product, etc.)
The main point is making sure that the column names match in the tables you are trying to join, or else things will go horribly wrong

Analysing Data: Worked Example

## Worked Example: Mean Fixation Durations (global)

Let's begin by taking data from a fixation report
We'll analyse it, compute mean fixation durations (global), run an ANOVA, and then plot a graph
The data and scripts required are on the website, but let's walk through it together first

## Computing Mean Fixation Durations (global)

Example from a fixation report
First we compute the by-trial, by-participant means
This gives us the mean fixation duration for each participant and each trial
Then we take the mean of these to get means by participant
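The slide's code isn't reproduced here; a sketch of the two steps, using the Data Viewer column names from these slides (fixDT and its values are hypothetical stand-ins for a real fixation report):

```r
library(data.table)

# fixDT stands in for a fixation report read into a data.table
fixDT <- data.table(
  RECORDING_SESSION_LABEL = rep(c("s1", "s2"), each = 8),
  TRIAL_INDEX             = rep(rep(1:2, each = 4), times = 2),
  TRIAL_TYPE              = rep(rep(c("present", "absent"), each = 4), times = 2),
  CURRENT_FIX_DURATION    = c(130, 125, 110, 190,  90, 150, 200, 160,
                              140, 120, 100, 180,  95, 155, 210, 170)
)

# Step 1: by-trial, by-participant means
trialMeans <- fixDT[, .(meanFixDur = mean(CURRENT_FIX_DURATION)),
                    by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE, TRIAL_INDEX)]

# Step 2: the mean of those means, giving by-participant means
pptMeans <- trialMeans[, .(meanFixDur = mean(meanFixDur)),
                       by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE)]
print(pptMeans)
```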

## Computing Mean Fixation Durations (global)

Example from a fixation report
This is what we now have: each participant (RECORDING_SESSION_LABEL) grouped by TRIAL_TYPE, with a DV (mean fixation duration)
What next?

## Computing Mean Fixation Durations (global)

Example from a fixation report
Now we analyse the data using ezANOVA!
This is from the ez package
Note: make sure that all columns that are factors in your ANOVA are factors in R before proceeding

## Computing Mean Fixation Durations (global)

Example from a fixation report
ezANOVA syntax requires you to specify:
The dependent variable column
A list of within-subjects factors
A list of between-subjects factors
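A sketch of the ezANOVA call (the by-participant means here are hypothetical, and the factor conversions follow the slide's advice that all factor columns must be factors in R):

```r
library(data.table)
library(ez)

# Hypothetical by-participant means (4 participants x 2 trial types)
pptMeans <- data.table(
  RECORDING_SESSION_LABEL = rep(c("s1", "s2", "s3", "s4"), each = 2),
  TRIAL_TYPE              = rep(c("present", "absent"), times = 4),
  meanFixDur              = c(139, 150, 158, 166, 145, 149, 152, 160)
)

# Make sure all factor columns really are factors before running the ANOVA
pptMeans[, RECORDING_SESSION_LABEL := factor(RECORDING_SESSION_LABEL)]
pptMeans[, TRIAL_TYPE := factor(TRIAL_TYPE)]

model <- ezANOVA(
  data   = pptMeans,
  dv     = meanFixDur,              # the dependent variable column
  wid    = RECORDING_SESSION_LABEL, # who each observation belongs to
  within = .(TRIAL_TYPE)            # list of within-subjects factors
  # between = .(GROUP)              # between-subjects factors would go here
)
print(model$ANOVA)   # includes F, p, and ges (generalised eta-squared)
```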

## Computing Mean Fixation Durations (global)

Example from a fixation report
Here, we want to see if the within-subjects variable TRIAL_TYPE influences fixation durations, so we run the ANOVA with TRIAL_TYPE as the within-subjects factor and inspect the output
Most of this should be self-explanatory (it's significant!)
Note that "ges" is generalised eta-squared, a measure of effect size (remember: APA format wants effect sizes now). Cite this paper when you use it: http://www.uv.es/friasnav/Bakeman2005

## Computing Mean Fixation Durations (global)

Example from a fixation report
Let's plot it!
To produce a plot, we can use ezStats to first get the descriptive means
The nice thing here is that ezStats has the same syntax as ezANOVA (i.e., you can copy/paste)
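A sketch of the ezStats call, reusing the same (hypothetical) by-participant means and the same arguments as the ezANOVA sketch:

```r
library(data.table)
library(ez)

# Hypothetical by-participant means (4 participants x 2 trial types)
pptMeans <- data.table(
  RECORDING_SESSION_LABEL = factor(rep(c("s1", "s2", "s3", "s4"), each = 2)),
  TRIAL_TYPE              = factor(rep(c("present", "absent"), times = 4)),
  meanFixDur              = c(139, 150, 158, 166, 145, 149, 152, 160)
)

# Same syntax as ezANOVA, but returns descriptive statistics
stats <- ezStats(
  data   = pptMeans,
  dv     = meanFixDur,
  wid    = RECORDING_SESSION_LABEL,
  within = .(TRIAL_TYPE)
)
print(stats)   # one row per TRIAL_TYPE, with N, Mean, and SD columns
```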

## Computing Mean Fixation Durations (global)

Example from a fixation report
Now, let's plot it! We use ggplot to do the plotting
The key pieces are: the data.table containing the means for plotting; setting up the aesthetics of the plot, with x being the values plotted along the x-axis and y being the values plotted on the y-axis; drawing points (as opposed to bars/lines); and controlling the axes and making the plot APA format

## Graphing with ggplot

There's a very large number of options when plotting with ggplot
We will only cover very basic ones here; for more, see:
http://www.cookbook-r.com/Graphs/
http://ggplot2.org/
And elsewhere online

## Computing Mean Fixation Durations (local)

Example from a fixation report
Next, we want to see if the within-subjects variable TRIAL_TYPE influences fixation durations AND if fixation durations differ between interest area types
We have two types of interest area: TARGET and DISTRACTOR
We therefore compute local mean fixation durations, comparing target and distractor fixation durations
We also now need to remove fixations that did not fall on an interest area
The column to use is CURRENT_FIX_INTEREST_AREA_LABEL

## Computing Mean Fixation Durations (local)

Example from a fixation report
Same process as before: compute by-trial means and then by-participant means
The only difference now is that we're removing fixations that didn't land on an interest area (i.e., WHERE CURRENT_FIX_INTEREST_AREA_LABEL is ".")
We're also now GROUPING BY the CURRENT_FIX_INTEREST_AREA_LABEL column
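A sketch of the local version of the two-step computation (fixDT and its values are hypothetical; the column names are the Data Viewer ones used in these slides):

```r
library(data.table)

fixDT <- data.table(
  RECORDING_SESSION_LABEL = rep(c("s1", "s2"), each = 4),
  TRIAL_INDEX             = rep(c(1, 1, 2, 2), times = 2),
  TRIAL_TYPE              = rep("present", 8),
  CURRENT_FIX_INTEREST_AREA_LABEL =
    c("TARGET", ".", "DISTRACTOR_A", "TARGET",
      ".", "DISTRACTOR_A", "TARGET", "DISTRACTOR_A"),
  CURRENT_FIX_DURATION    = c(130, 125, 110, 190, 90, 150, 200, 160)
)

# WHERE: drop fixations that didn't land on an interest area (label ".")
# GROUP BY: now also by the interest area label
localTrialMeans <- fixDT[CURRENT_FIX_INTEREST_AREA_LABEL != ".",
  .(meanFixDur = mean(CURRENT_FIX_DURATION)),
  by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE, TRIAL_INDEX,
         CURRENT_FIX_INTEREST_AREA_LABEL)]

# Then the by-participant means, as before
localPptMeans <- localTrialMeans[, .(meanFixDur = mean(meanFixDur)),
  by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE,
         CURRENT_FIX_INTEREST_AREA_LABEL)]
print(localPptMeans)
```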

## Computing Mean Fixation Durations (local)

Example from a fixation report
Now it's time to run the ANOVA
This is done the same way as before, except that we now have one more within-subjects factor

## Computing Mean Fixation Durations (local)

Example from a fixation report
Next, we get the means as before
Again, we are now adding CURRENT_FIX_INTEREST_AREA_LABEL to our list of within-subjects grouping factor columns

## Sneak Peek at the Graph

Note that this graph has two panels, or in ggplot's language two "facets": one for DISTRACTOR_A objects and one for TARGET objects
How do we get it to do that?
The facet_wrap command will create facets for every level of CURRENT_FIX_INTEREST_AREA_LABEL
You're not limited to creating facets for only one column. Try out facet_wrap(TRIAL_TYPE~CURRENT_FIX_INTEREST_AREA_LABEL) and see what happens
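A sketch of the faceted plot (localMeans is a hypothetical stand-in for the by-participant local means computed earlier):

```r
library(ggplot2)

# Hypothetical by-participant local means
localMeans <- data.frame(
  TRIAL_TYPE = rep(c("absent", "present"), times = 2),
  CURRENT_FIX_INTEREST_AREA_LABEL = rep(c("DISTRACTOR_A", "TARGET"), each = 2),
  meanFixDur = c(150, 145, 180, 210)
)

p <- ggplot(localMeans, aes(x = TRIAL_TYPE, y = meanFixDur)) +
  geom_point(size = 3) +
  # one facet (panel) per level of the interest area label
  facet_wrap(~ CURRENT_FIX_INTEREST_AREA_LABEL) +
  labs(x = "Trial type", y = "Mean fixation duration (ms)")

print(p)
```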

Writing it up
When writing up eye-tracking data, don't just assume the reader knows why you examined each measure
Given the complexity and number of possible measures, it's vital that you are extremely clear, both in your own head and when you write things up, about why each measure was examined and what that measure is telling you
If people start complaining that you've explained it too much and that it's bordering on being patronising, then you're doing it right

Writing it up

## From Godwin, Hyde, Taunton, Calver, Blake & Liversedge (2013)

Simple approach:
Begin by stating what the measure has been shown to demonstrate in the past
Make a prediction for that measure
Then describe how you examined it
Finally, describe what it showed

Writing it up

## From Sheridan & Reingold (2013)

Writing it up
From Fitzsimmons & Drieghe (2013)


## The bigger picture

This approach forms part of a larger picture when writing up your work
Let's just note a few pointers before finishing

## The bigger picture

Introduction
First paragraph: give the general context of the work and prelude the main points
Middle paragraphs: cover existing research on the topic, highlighting what has been missed or not done (either at all or perfectly) before
Ending paragraphs: say how your work will overcome the limitations of previous work, clearly noting how what you have done fills a gap in the existing literature and human knowledge. Tell them why your work is awesome. State your research question(s). Applied relevance also gets noted if relevant

## The bigger picture

Results
First paragraph: describe what you are going to do in your results and why
Second paragraph: describe how you cleaned your eye-tracking data
Middle paragraphs: go through each of your measures in the same order as you predicted them in your introduction. For each one, state WHY you are analysing that one and WHAT it shows you, and whether it confirms or rejects your predictions

## The bigger picture

Discussion
First paragraph: re-state what you did in the study and remind the reader of your predictions
Middle paragraphs: go through each of your measures in the same order as you predicted them in your introduction. For each one, state WHY you analysed that one, what the outcome was, and WHAT THAT MEANS in relation to your predictions
Later paragraphs: draw the results together for an overall picture. State applied implications if necessary. Suggest future studies that would be cool. Never end by saying something along the lines of "more research is needed"

## The rest of today

Next up:
Head to the website (http://wiki.psychwire.co.uk/) and go through the "Part 4: Data Viewer" section
Then go through the "Part 5: Data Analysis" section, which will outline the bits we've gone through above plus some extra pieces here and there
That's it.