
Final Year Project: Keystroke Analytics

Candidate: 77415 - James Green


2013
Contents

1 Introduction
  1.1 Biometric Keystroke Authentication
  1.2 Overview of Existing Work
2 Research Goals & Hypotheses
  2.1 How People Intuitively Fragment a Passphrase
  2.2 When learning to type a passphrase, how long does it take an individual to produce a stable signature?
  2.3 Keystroke-biometric Authentication
  2.4 Approach
  2.5 Professional Considerations
    2.5.1 Public Interest
    2.5.2 Professional Competence and Integrity
    2.5.3 Duty to Relevant Authority
3 Method
  3.1 Design of Experiment
    3.1.1 High Level Design
    3.1.2 Construction of Passphrase
    3.1.3 Length, Frequency & Format of Experiment
    3.1.4 Display and entry of pass-phrase
    3.1.5 Entry devices
    3.1.6 Data Security
    3.1.7 Summary
  3.2 Implementation
    3.2.1 Technologies
  3.3 Running the Experiment
  3.4 Analysis Techniques
    3.4.1 Using Student's T-test for Authentication
    3.4.2 Using Self-organising Maps for Identification
4 Results
  4.1 Passphrase Fragmentation
  4.2 Learning & Stabilisation
    4.2.1 Entry Duration
    4.2.2 Passphrase Observation
    4.2.3 Travel-time Stability
  4.3 Identification & Authentication
    4.3.1 T-test Authentication
    4.3.2 Self-organising Maps
    4.3.3 Dwell Timings
5 Discussion
  5.1 Evaluation of Hypotheses
    5.1.1 How people intuitively fragment a passphrase
    5.1.2 When learning to type a passphrase, how long does it take an individual to produce a stable signature?
    5.1.3 Keystroke-biometric authentication
  5.2 Limitations
  5.3 Conclusion
6 Appendices
  6.1 Digraph Timing Charts
1 Introduction
1.1 Biometric Keystroke Authentication
The fact that computers regularly store private, sensitive and classified information makes it very important that we can confidently identify their users. Traditionally this has been achieved through password authentication systems. However, these systems are far from perfect. For instance, if a password becomes compromised it is no longer adequate for authenticating its rightful owner.
In the hope of improving on this, there exists ongoing research into utilising the idiosyncrasies of a user's interaction with a computer as a form of authentication. So far in this field the most promising techniques focus on patterns in the timing of a user's typing. We shall refer to this as biometric keystroke authentication. Compared to more conventional biometrics such as fingerprint or iris authentication, it offers the following advantages:
- It does not require special tools or hardware, only a conventional keyboard.
- It is non-invasive for the user.
- It can easily be deployed in conjunction with existing authentication systems.
- It can be collected without the user's knowledge.
In terms of combining with existing systems, much research has gone into investigating the viability of using the typing behaviour present upon the entry of password credentials as an additional layer of authentication. This technique could potentially overcome the shortcomings of passwords, as not only must the password be known, but it must be entered in the manner of the legitimate user. While this technique is promising, there has been little work that takes into account how a user learns to type credentials. Yet acknowledging this learning process could be a very important feature of a viable biometric keystroke authentication system.
To investigate this issue, this project will examine the timing patterns a user generates when they type a password-like phrase on a conventional keyboard. For each entry of a passphrase we will capture a temporal signature that is comprised of the precise timings of when each key is pressed, the duration it is held down for (known as dwell) and the point at which it is released.
To gather this data we shall get a group of volunteers to learn a passphrase by repeatedly typing it into a computer. Our experiment will be set up like a traditional password entry system, where the passphrase must be entered correctly for the volunteer to proceed. The computer will be running a program that records the precise timings of each keystroke, such that afterwards we can easily run the data through a series of analysis and visualisation techniques.
This data will be used to explore several questions relating to the way in which people learn to type passwords. In general we are hoping to answer the question: can we identify people by the way they type a password? But in terms of specific research goals, we are hoping to explore:
- How do users intuitively fragment a passphrase?
- When learning to type a passphrase, how long does it take people to produce a stable signature?
- For a given passphrase, can we distinguish between people who spent time learning it, and those who didn't?
- For a given passphrase, can we distinguish between two individuals who have both learned the passphrase?
It is hoped that answering these questions will provide valuable information for cognitive science, and practical insights for the development of biometric keystroke authentication systems.
1.2 Overview of Existing Work
Over the last few decades keystroke biometrics has been the subject of a variety of research.
Rather than attempt to give a complete overview of the field, we shall focus on work that
is pertinent to this report.
Gaines et al. [1] provide one of the first documented attempts at using typing to identify a user. This experiment saw seven professional typists enter three large bodies of text into a PDP-11 computer, which was running a program that recorded the timings between each keystroke. Using this dataset, they proceeded to investigate the time delays between certain pairs of consecutively typed letters, known as digraphs (e.g. "th" or "ng"). By performing statistical analysis on this data they showed that it was possible to use digraphs to distinguish between the different typists. While this is encouraging, its relevance to authentication is limited by the fact that the experiment was based upon a sizeable corpus of natural language, yet passwords are short and are not constrained to natural language.
Using digraphs is an approach that was expanded upon by Leggett et al. [2], who integrated this technique into a dynamic system that performed continuous authentication as the user performed their everyday tasks on the computer. While this situation is very different from the more static verification we are exploring, its accompanying report does demonstrate the importance of profiling users with as much data as possible when attempting to build a successful authentication system. In addition it also gives a good introduction to the ideas of the false rejection rate (the proportion of legitimate users locked out) and the false acceptance rate (the proportion of impostors let in). As might be expected, they showed that false rejection and false acceptance were linked such that reducing one would increase the other, and that this interaction can be tuned depending on the practical needs of a system.
While Gaines and Leggett made use of statistical analysis and modelling to solve the authentication problem, the work of Brown & Rogers [3] takes an approach closer to the domain of artificial intelligence. They showed that neural networks can be used to identify specific users based on the typing dynamics of just their name. Unlike previous studies, they showed that short phrases can be used for authentication. Interestingly, their data set was more complicated than simple digraphs, instead consisting of the timings of when each key was pressed and released. This is an approach that we may adopt in this experiment.
Another advantage in the work of Brown & Rogers is that, unlike simple statistical models, neural networks are capable of capturing non-linear dynamics. This is true of a range of machine learning techniques that include support vector machines, self-organising maps and genetic algorithms. An up-to-date and very comprehensive overview of these and other techniques is given by Banerjee & Woodard [4]. One of the most important distinctions they make is that some research is concerned with the task of authentication (verifying that a user matches a particular profile) while other research concentrates on the task of identification (identifying which of several profiles a user matches). This is something we must carefully consider in this report.
As previously mentioned, much of this existing research fails to address the learning process for passwords. A notable exception is Tapiador & Sigüenza [5], who studied this learning process in terms of temporal signatures. They found that over a period of approximately 20 trials these signatures do begin to stabilise. Not only do we hope to reproduce this in our experiment, but we shall be looking into the cognitive science that accounts for this trend.
2 Research Goals & Hypotheses
There are many issues, questions and investigations that could be pursued in this field. Due to the time constraints of this project and report we have chosen to look into three specific issues. In doing so, we wish to corroborate some of the existing research in this field, and perhaps investigate some of these questions in a novel way.
2.1 How People Intuitively Fragment a Passphrase
When people initially type a passphrase they have never seen before, the passphrase often
gets fragmented into smaller pieces. This is in the sense that the passphrase is entered
in a succession of quick bursts of keystrokes, separated by longer pauses. This is caused
by the way in which human cognition breaks information into chunks. Put more simply,
consider the situation where a computer user is presented with printed material displaying
a random alphanumeric passphrase that they must enter into a computer. Some people
will fail to hold the entire passphrase in working memory, so instead of entering it in one
complete motion, they look back and forth between the printed phrase and the computer,
entering the passphrase in stages.
When carrying out memorisation tasks like this, our performance, and hence the way in
which we fragment a passphrase, is underpinned by our ability to utilise working memory.
This link has been examined by many studies, but of particular relevance to passphrase-
like sequences is the work of McLean & Gregg [6], who showed how it is possible to use
the pauses in the verbal recital of a memorised sequence of random letters, to reveal the
structure of how the sequence is chunked in working memory.
Chunks are a fundamental principle of memory. They serve as a cognitive compression
mechanism that groups many pieces of information into a smaller number of larger items.
One might argue that the fragmentation of a passphrase seems exactly the opposite (the
decomposition of a large item into several smaller ones). However, if you consider each
letter in the phrase as an individual item, the task can instead be thought of as how we
merge these letters together into reasonably sized groups.
While we can certainly expect to find chunking patterns in the timing of a passphrase entry, it is still a reasonably open question as to how this chunking behaviour can be used to identify individuals. Existing work in keystroke biometrics mostly builds models around learned phrases that produce stable patterns and tends not to consider using the patterns found in the first few entries of a new passphrase as material for identification. Consequently, in this investigation we are going to study these initial patterns in an attempt to determine the extent to which the fragmentation process varies between individuals, and hence carries identifying information. Figure 2.1 shows how two people might type in the same passphrase differently.
Figure 2.1: Diagram showing two users performing fragmentation of text differently
Examining fragmentation behaviour could yield valuable insights for cognitive science. Certainly, results from this experiment should be examined in the context of existing research in this field. In terms of practical use, it is possible that authentication systems could benefit from an understanding of how users fragment passphrases they are unfamiliar with. In essence, it could comprise part of a profile of their typing behaviour.
There has already been some work that backs up this idea. For example, Banikowski et al. [7] give a comprehensive overview of how the ability to organise and digest information naturally differs from one individual to the next. They note that this performance is not fixed, and improves when the user is familiar with the domain involved. Certainly, in the context of our experiment, an individual who frequently has to memorise passphrase-like sequences is more likely to be efficient and swift in their adaptation to novel phrases. Indeed, Banikowski et al. demonstrate how this behaviour can be learnt and outline a variety of deliberate strategies for increasing the efficiency of working memory.
This leads to our first hypothesis: due to variations in human chunking abilities, the way in which we perform fragmentation of a passphrase varies between individuals. If this is true, then we can consider how we might model this behaviour such that it forms part of a user's biometric profile.
To get insight into how people perform this fragmentation process, we need to examine
a data set of the temporal signatures produced from volunteers typing in passwords they
have never seen before. Ideally we should include a diverse range of users. To perform
analysis, there should be a robust way of detecting fragmentation patterns.
2.2 When learning to type a passphrase, how long does it take an individual to produce a stable signature?
It has long been known that when acquiring a new skill, our progress tends to follow a
learning curve, where our performance of a task increases logarithmically with respect to
the number of times it is performed [8]. Typically after a certain number of repetitions,
performance will hit a plateau at which its rate of increase will become very small. It is at
this point where the task is often considered to be learned as well as it reasonably can be.
In the context of learning a passphrase, performance can be measured as the time taken to
perform a correct entry. By extension, this is a measurement of the speed and confidence
at which an individual recalls a passphrase from memory.
To recall a passphrase it must first be encoded into long term memory. In the case of
passwords, the standard technique is to use repetition. Not only does this keep the pass-
word active in working memory but it helps preserve the order of the characters in the
password. As typing a password is a physical task, its transition to long term memory
will be aided by what is known as procedural learning. This is a mechanism that encodes
physical tasks in long term memory as a series of implicit memories. These are implicit
in the sense that the repetitions will reinforce each other without us consciously recalling
previous repetitions when performing the task.
Once a password is learned in this manner, the task of typing it will involve minimal engagement of working memory, with less conscious awareness of the characters that constitute it. This effect should be particularly pronounced in skilled typists, whose speed of typing will eventually transform the high level task of memorisation and recall into a lower level subconscious motor task. This effect is observed by De Luca et al. [9], who demonstrated through trials that people entering a personal identification number (PIN) into a cash machine have a tendency not to remember the numbers, but instead to combine the spatial movements between individual numbers into a shape or pattern that is mentally superimposed onto the machine's number entry pad.
Given this form of learning, we can expect individuals to progress to a point where they no
longer have to refer to written material to enter the password and can recall it completely
from memory. As this progression happens, we can expect the entry time to decrease.
Furthermore, existing work in biometric keystroke authentication suggests that during this
process, the keystroke timings will converge onto reasonably consistent patterns. This is
corroborated by the theory of implicit memory, which indicates that the temporal patterns
should cumulatively reinforce each other by way of repeated motor actions.
This helps form our second hypothesis: Over several trials, the timing signatures pro-
duced by an individual learning to type a passphrase will stabilise by converging to a
certain length and pattern. If this is indeed the case, we shall attempt to determine ap-
proximately how long it takes to generate a stable signature, and if this time period is
roughly uniform between all volunteers.
Being able to model the learning process in terms of keystroke signatures could be valu-
able in augmenting password authentication. In particular, a machine learning technique
which performs authentication based on signatures might only want to form a model of the user's password entry once the user has reached a point of regularly entering the password in a stable fashion.
To investigate this, we need data of volunteers entering a password in a repeated fashion until they produce a stable signature. To help ensure this, we should use existing literature to establish a lower boundary for the number of trials to schedule. To make sure the results are not tied to one particular passphrase, it would be desirable to study different groups learning different passwords.
2.3 Keystroke-biometric Authentication
The essence of this investigation is to determine the extent to which it is possible to identify users based on the temporal signature generated when they type a passphrase on a keyboard, and, if so, whether this can be used as a biometric to augment computer authentication systems. This idea is based on the assumption that different users demonstrate noticeably different temporal patterns when typing the same text. In terms of authentication, it is hoped that a malicious user with stolen credentials will enter them in a way that is distinguishable from the genuine user.
The use of human interaction with technology as a basis for identification actually predates the modern computer. For instance, in World War II, military intelligence could identify the sender of a Morse code telegram based on the rhythm of the dots and dashes [4]. Each operator had a distinctive style, known as his fist, which according to investigations by Bryan & Harter [10] was typically enough to identify a sender. They reported that one experienced operator who worked with over 40 operators on a regular basis had little trouble distinguishing between their various rhythms, speeds and syncopations.
Operating a telegraph and typing on a keyboard share a lot in common. They are both learned motor skills that involve us translating human language into mechanical motions. The level of expressiveness these motions afford is surprisingly high. For instance, Epp et al. [11] showed that it is possible to determine the emotional state of a computer user to a reasonable degree of accuracy, based solely on their typing style.
As explored in the previous sections, existing research has demonstrated that this expressiveness found in typing behaviour has led to positive results for both authentication and identification using keystroke dynamics. Based upon this evidence we can expect to find strong patterns in the data we capture and hope to confirm that keystroke biometrics is a viable technique.
Of all the techniques that have been explored, the use of self-organising maps (SOMs) is one that has been relatively neglected. This is surprising as SOMs are naturally good at identifying patterns in high dimensional data and have the ability to cluster information in an unsupervised manner. Joshi et al. [12] successfully used self-organising maps for authentication by creating a small one-dimensional map for each user. However, there are still uses of SOMs that remain unexplored, such as placing several users on one SOM to reveal the differences between them. In this experiment we shall investigate the feasibility of this as a technique for user identification.
However, like other non-linear techniques such as neural networks, SOMs are not intuitive
in the sense that while they can produce impressive results, it is not always clear how or
why these results are achieved. Sometimes this is not so much an issue, but in this case we
feel it is important to get a more intuitive grasp of the shape and form of the data. To do
so we shall supplement the use of SOMs with statistical analysis and bespoke visualisation.
As previously mentioned, we aim to use these techniques to uncover idiosyncratic features of an individual's typing style. Such features are likely to be distinguished by two general properties. Firstly, they must be uniquely identifying of an individual, which may not be a problem in data sets with only a few people, but given a large number of people to identify, this is a serious consideration. Secondly, these features must be stable and not fluctuate wildly. In other words there has to be a good signal-to-noise ratio. Good robust features should be both unique and stable.
The most likely place to start looking for these properties is not in the raw timestamps of each key-press, but in the delay times between successive keystrokes. Like much of the existing literature, we shall refer to these timings as digraphs. The relationship between raw timings, dwells and digraphs can be seen in figure 2.2. To be able to properly address the question of authentication and identification we need a selection of passphrases, each of which is learned by several volunteers, who enter their chosen passphrase multiple times. In addition, it would be useful if for each of the passphrases we had a variety of people perform one-off entries. This would enable us to compare not just between people who have learned a passphrase, but between those who did and those who did not.
Figure 2.2: How digraphs are extracted from a temporal signature
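To make this feature extraction concrete, the following Scala sketch (the language we later use for analysis) derives dwell times and digraph delays from a single recorded entry. The KeyEvent representation and the choice of press-to-press delays for digraphs are illustrative assumptions, not the exact schema used in this project.

// Minimal sketch: extracting dwell times and digraph delays from a temporal
// signature. The KeyEvent field names are illustrative, not the project's schema.
case class KeyEvent(key: Char, pressed: Long, released: Long) // times in milliseconds

object SignatureFeatures {
  // Dwell: how long each key was held down.
  def dwells(signature: Seq[KeyEvent]): Seq[Long] =
    signature.map(e => e.released - e.pressed)

  // Digraph delay: time from one key press to the next key press,
  // labelled by the pair of characters involved.
  def digraphs(signature: Seq[KeyEvent]): Seq[(String, Long)] =
    signature.sliding(2).collect {
      case Seq(a, b) => (s"${a.key}${b.key}", b.pressed - a.pressed)
    }.toSeq

  def main(args: Array[String]): Unit = {
    val entry = Seq(
      KeyEvent('z', 0, 95), KeyEvent('m', 180, 260), KeyEvent('l', 340, 430))
    println(dwells(entry))    // List(95, 80, 90)
    println(digraphs(entry))  // List((zm,180), (ml,160))
  }
}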
To conclude this section, our third hypothesis is that it is possible to distinguish between
individuals using the temporal signatures produced from keystroke timings. (Biometric
Keystroke Authentication)
2.4 Approach
To test these hypotheses we shall experimentally gather the required data. We will
then analyse this data using a combination of visualisation, graphing, statistical tests and
self-organising maps. This data shall come from volunteers interacting with bespoke soft-
ware that we shall design and implement. To ensure that the data is close to what might
be used in a functioning keystroke authentication system, we shall attempt to provide a
realistic experience for the volunteers.
In terms of what techniques we are to use for each hypothesis, our justification is as follows:
Hypothesis 1 - Visualisation of fragment patterns: Fragmentation can be visualised very naturally. Doing so will allow us to get a quick and comprehensive understanding of how volunteers go about this process.
Hypothesis 1 - Collation of common timing patterns: This will allow us to get a quantitative measure of the diversity of fragmentation patterns.
Hypothesis 2 - Investigation of average entry times: This will provide a coarse but effective idea of volunteers' rate of progress. It will also allow us to easily examine our work in the context of established theories of the learning curve.
Hypothesis 2 - Visualisation of volunteers' passphrase observation behaviour: By looking at how and when the volunteers elect to observe the passphrase, we can get a good idea as to how their confidence develops when entering the passphrase.
Hypothesis 2 - Charts of digraph timings: This will allow us to investigate a volunteer's progression towards a stable passphrase entry. We will also be able to cross-reference this with our other visualisations to see how fragmentation patterns persist beyond the first few entries.
Hypothesis 3 - T-test based authentication: T-test based authentication is a well established way of performing authentication with temporal signatures. But as this investigation will consider the effect learning a phrase has on the resulting temporal signature, we can re-examine t-test authentication in the context of learned and unlearned impostors.
Hypothesis 3 - Self-organising maps: The dimensionality reducing abilities of self-organising maps make them a natural candidate for keystroke biometrics, yet there have been few studies that explore them in this context.
Like any experiment involving volunteers, there are certain legal and ethical frameworks
that we must operate within. We shall explore these issues in the next section.
2.5 Professional Considerations
This project must comply with the Code of Conduct and Code of Practice published by
the British Computer Society. The following sections address this project's applicability to each section of the code of conduct.
2.5.1 Public Interest
This project has little direct application to this section of the code of conduct. However, if
elements of this research are used in authentication software, then arguably this increased
security is in the public interest. Certainly, no one is harmed by developing keystroke
dynamics as a biometric. On the contrary, keystroke dynamics is much less intrusive than
traditional biometrics such as retinal or fingerprint scanning.
The only other relevant issue is that we must ensure there is no discrimination with
regards to who gets chosen for participation in the experiment.
2.5.2 Professional Competence and Integrity
This section applies to how we conduct this research in the context of academia. In particular we must respect and value the viewpoints of other academics and accept criticisms of our work. It is worth noting that we may offer volunteers entry into a prize draw for a small amount of money as an incentive to take part in this project. But this is in line with the University's ethics approval, and is purely optional for the volunteers.
2.5.3 Duty to Relevant Authority
The most applicable rule of authority in this experiment, other than the ethical guide-
lines produced by the University of Sussex, is the Data Protection Act in UK law. We
must be careful how volunteer data is stored, and how we share the results of our experi-
ment. We must also ensure volunteers are fully aware of how their data is going to be used.
While the experiment is running, the volunteer's email address will be used to authenticate and
identify them. But, when the experiment is over the data collected from the volunteers
will be anonymised such that it will not be possible to link identities or information back
to individual participants.
3 Method
3.1 Design of Experiment
3.1.1 High Level Design
To answer our particular set of hypotheses and research goals, we need an experimental
setup that includes these features:
- Volunteers should type in a passphrase they have not seen before.
- Volunteers must be given the chance to repeat entry of the phrase until they have learned it.
- There should be several passphrases in use.
- Each passphrase should be learned by many volunteers.
- Each passphrase should be attempted (a one-off entry) by even more volunteers.
In this case, the term entry refers to the typing of a passphrase on a standard computer keyboard. We shall elaborate on this, but first we shall consider the high level design of the experiment. To fully meet our requirements, the experiment will be constructed as follows:
- The volunteers will be divided into several groups.
- Each group will be assigned a primary phrase and a secondary phrase.
- The primary phrase will be randomly generated.
- The secondary phrase will be the primary phrase of another group.
- Each volunteer will be required to enter the primary phrase of their group a fixed number of times. At the end of the experiment they will act as an impostor for another group by entering the secondary phrase.
In this setup the independent and dependent variables are:
- Independent variables: individuals, passphrase.
- Dependent variable: temporal signature.
By varying the individuals across the same passphrase we increase the certainty that identifying patterns in the temporal signatures are due to the individual, not the passphrase. By having several groups, each learning a different phrase, we can verify that identifying phenomena are consistent across different passphrases. Having each group act as one-time impostors for another group allows comparison between those who have learned a phrase and those who have not.
This high level design provides a good framework for capturing the kind of data we are
interested in. However, there are still many details to be worked out. The following sections will discuss these issues and decide on reasonable solutions.
3.1.2 Construction of Passphrase
The volunteers are to be presented with a generated passphrase. There are several options as to how the passphrase is generated.
For instance, passphrases could include dictionary words. However, this is likely to be a bad idea, as certain volunteers may be familiar with typing particular words. This would add an extra layer of complexity and make analysis harder.
The best way to ensure that the passphrases are completely new to the volunteers is to create randomly generated sequences of characters. To keep the experiment as simple as possible, it is preferable to keep the passphrases lower-case and alphabetic only. This will also reduce the likelihood of volunteers getting thrown off by varying keyboard layouts.
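A minimal Scala sketch of such a generator is shown below; the length of 11 characters matches the phrases eventually used in this experiment, but it is otherwise just an illustrative parameter, and the names are hypothetical.

import scala.util.Random

// Sketch: generate a random lower-case alphabetic passphrase.
object PassphraseGenerator {
  def generate(length: Int = 11, rng: Random = new Random()): String =
    Seq.fill(length)(('a' + rng.nextInt(26)).toChar).mkString

  def main(args: Array[String]): Unit =
    println(generate())   // e.g. "qwhelsnrvcb"
}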
3.1.3 Length, Frequency & Format of Experiment
It was initially considered to have the volunteer participate in one or two sessions where they would sit down and repeatedly enter a passphrase until they had learnt it. However, while such a set-up allows complete experimental control, it is a very unnatural learning situation and is likely to feel excessively artificial or tedious. One solution would be to have a user periodically enter a password while they perform some other non-essential task (possibly of their choosing). However, this is still a fairly artificial way to learn and volunteers might not even manage to learn a password in the amount of time allotted to such an experiment.
A better solution would be to have the volunteers enter the passphrase only once or twice
a day, but over a period of one or two weeks. This would allow volunteers to perform the
learning in a much more natural way. One way to implement this would be to have regu-
lar meetings in which the volunteers could participate, but we feel that this is not practical.
Instead we decided that the experiment shall be distributed and remote. Users will take
part, in their own time, with their own computers. This shall be achieved by placing the
experiment on a website. Not only is this convenient, but it provides us with a more real-
istic dataset that is closer to what a real biometric keystroke authentication system would
have to deal with.
In terms of the length of the experiment, Tapiador and Sigüenza [5] demonstrated that a user can learn a password after approximately 20 entries. However, we also wish to capture a substantial number of signatures once the phrase has been learned. Given that this is the case, it was decided that 40 entries should give us enough data to work with. The last 4 of these entries will see the volunteer use their group's secondary phrase to act as an impostor for another group.
At each session, a volunteer will use their computer to connect to the experiment web site. To reinforce the learning, each of these sessions will consist of two entries. This means that 40 entries can be gathered over 20 sessions. In an effort to make the experiment easier to participate in, volunteers can decide when they complete these sessions, with only the following restrictions:
- There must be at least two hours between each session.
- Up to two sessions can be performed per day.
- Volunteers will be encouraged to try to complete sessions twice daily.
- Should a volunteer perform one or none on a given day, then the experiment will simply take more days to complete.
With this setup, the experiment will take a minimum of 10 days for each volunteer to
complete.
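For illustration, the session-spacing rules above could be checked on the server along the following lines. This is a hypothetical sketch; none of these names come from the project's actual codebase.

import java.time.{Duration, Instant, ZoneOffset}

// Sketch of the scheduling rules: a volunteer may start a new session only if
// at least two hours have passed since the last one and fewer than two sessions
// have been completed today. All names here are illustrative.
object SessionScheduler {
  val MinGap: Duration = Duration.ofHours(2)
  val MaxPerDay = 2

  def canStartSession(previousSessions: Seq[Instant], now: Instant): Boolean = {
    val today = now.atZone(ZoneOffset.UTC).toLocalDate
    val doneToday = previousSessions.count(_.atZone(ZoneOffset.UTC).toLocalDate == today)
    val longEnoughAgo = previousSessions.forall(s => Duration.between(s, now).compareTo(MinGap) >= 0)
    doneToday < MaxPerDay && longEnoughAgo
  }
}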
3.1.4 Display and entry of pass-phrase
As the experiment begins, the volunteers are going to need to be prompted with the passphrase. By prompting, we mean displaying the passphrase prominently on the screen. If we were simply to tell the volunteers their passphrase, and let them choose how to store it, many of them would write it down on paper or use a password manager. But as this behaviour is inconsistent, it would devalue our data. For this reason, all prompting of the passphrase shall take place on the screen, where we have full control of it.
However, there are two issues to consider here. Firstly, we must decide how long into the experiment we keep prompting the volunteer. Secondly, we must decide how the prompt displays the passphrase. In particular, we need to look at how we can stop the volunteer from typing in the passphrase as they read it off the screen. If they do this, then it is harder to see how the phrase is being chunked up and the volunteers are less likely to commit the phrase to memory.
Longevity of prompting
One option would be to prompt the volunteers progressively less as they learn the passphrase. But this might prove difficult to implement correctly. A better solution would be to stop prompting the volunteers by default after a fixed point in the experiment, and after this point have the volunteers elect to reveal the passphrase.
Method of prompting
If the volunteers are allowed to look at the prompt and type at the same time, there is a danger that those who are fluent touch typists will simply type as they read off the screen, without actually committing the password to working memory. A good solution is to design the experiment such that volunteers can toggle between viewing the prompt and typing, but can't type and look at the same time.
Repeated entry
As explored when discussing the frequency of the experiment, to speed up learning, it is desirable to have volunteers perform entry of the passphrase twice in one sitting. This shouldn't take much longer for volunteers to complete and it adds value to the experiment.
Display of entered text
Intuitively it might seem a reasonable suggestion to show the user what they have typed
so far in clear text. But it is worth considering if this is the right way to implement visual
feedback in this experiment.
When a user of a website enters a password it is usually displayed as a series of periods
or asterisks (to stop prying eyes from picking up their password). As this is an accepted
standard, implementing visual feedback in this way would make the experimental results
more applicable to password authentication. Furthermore, it forces the user to enter the
password in one go, with no going back to edit what has already been entered. If the
user wishes to do so, they must start again. Enforcing this will result in cleaner temporal
signatures that are easier to analyse.
Incorrect entry
Rather than reject individual mis-typed letters, it is easier to reject completed phrases and
allow the user to enter the entire phrase again. This is to ensure all temporal signatures
are the same and match the passphrase correctly.
However, it may still be worthwhile recording the keystrokes of the incorrect phrases, as this could provide insights into the learning process.
3.1.5 Entry devices
While we wish for all volunteers to use standard keyboards, the experiment is run remotely, so volunteers may attempt to enter the passphrase on a non-standard keyboard device (e.g. a mobile phone keyboard). The implementation of our experiment will try to prevent this.
3.1.6 Data Security
It would be unacceptable to lose experimental data, so due to the lengthy nature of this
experiment there should be regular backups of the entire system. In the event of the
primary instance of the experiment failing it should be possible to resume it from backup.
3.1.7 Summary
Amalgamating all these requirements together gives us an experiment that will get the data needed to investigate our hypotheses in a robust and secure fashion. Volunteers' progress through this experiment can be divided into three stages:
Stage Phrase Repetitions Prompt
1 Primary 20 Immediately
2 Primary 16 Upon choosing
3 Secondary 4 Immediately
In the above table the primary phrase refers to the main passphrase that each group
of volunteers will be learning. The secondary phrase refers to the phrase that each group
will enter at the end to act as an impostor for another group. In stages 1 and 3, when
a volunteer takes part in an entry session, they will be shown the passphrase they are
learning before entry. However, in stage 2 they will be asked to enter it from memory, and
will only be shown the passphrase if they elect to do so.
We have been careful to make sure that all the features needed to investigate our hypotheses are included in this experiment. The way in which the stages of the experiment correspond to each hypothesis is shown in figure 3.1. In the next section we will discuss how we implemented this design and explain what technologies we used.
Figure 3.1: Diagram showing the design of the experiment
3.2 Implementation
Using the design described in the above section the experiment software was successfully
built, tested and used to gather data. Rather than give an in-depth analysis of the code,
we shall provide a more informative overview of the technologies we used, and describe
the interaction with the software that volunteers experienced.
3.2.1 Technologies
We built the experiment software as a web application using the following technologies:
- HTML5 + JavaScript: for creating the information pages and the JavaScript client.
- Twitter Bootstrap: CSS & JavaScript library for styling the HTML pages.
- Node.js + Express middleware: web server to serve content and power the RESTful API.
- Jade templating engine: to allow dynamic rendering of HTML.
- MongoDB: database to provide persistence for the data gathered.
The volunteer was authenticated with the site using their email and a password (not to be confused with the passphrase they were learning). Once authenticated they were directed to a page containing a JavaScript client which allowed them to perform the experiment. This client queried the server's RESTful API to retrieve the passphrase to prompt the volunteer with, along with any other experimental parameters. Once the passphrase entry was completed, the recorded temporal signatures were submitted through the same API. These were then stored in JSON notation in the MongoDB store. The entire system was backed up remotely and automatically to another server using rsync. The design of the experiment server is shown in figure 3.2.
To make the experiment easier for volunteers to complete we used the middleware to create log-in sessions. We also used long term cookies to re-authenticate volunteers who returned with the same web browser on the same computer. This way volunteers were not required to log in very often.
The software was designed to be very modular. The core modules are shown in figure 3.3. This means that if similar experiments are required, the software can very easily be reconfigured with different groups, passphrases and schedules.
Figure 3.2: Diagram showing the design of the experiment server
Figure 3.3: Diagram showing the node.js components used in the experiment server
Path | Method | Resource | Functionality
/experiment | GET | HTML | Provides the experiment web client.
/data | GET | JSON | Returns the current experiment (if there is one) for the currently logged in volunteer.
/data | POST | JSON | Receives collected experimental data from the client.
/signup | GET | HTML | Provides the web page where volunteers can register to take part.
/signup | POST | JSON | Accepts registration data for new volunteers.
/login | GET | HTML | Provides a web page where volunteers can log in.
/login | POST | JSON | Accepts the username and password of a volunteer and logs them in.
/logout | GET | HTML | When navigated to, logs out the currently logged in volunteer.
/participation | GET | HTML | Returns a web page that lists information about participating.
/consent | GET | HTML | Returns a web page that records the volunteer's consent to take part.
/admin/ | GET | HTML | Returns a web page that provides an administrator's overview.
/admin/user | GET | HTML | Returns a web page that provides an overview of a volunteer's progress through the experiment.
/admin/data/summary | GET | JSON | Returns a summary of all volunteers' progress through the experiment.
/admin/data/user | GET | JSON | Returns all experiment data for a given volunteer.
/admin/data/everything | GET | JSON | Returns a database dump of all experimental data.
Figure 3.4: Table detailing the HTTP interface to the experiment server
3.3 Running the Experiment
To register to take part in the experiment, volunteers had to be given information about what they were taking part in and had to tick a box indicating that they agreed to our terms and conditions. After this they chose a username and password, then logged in for the first time. This process can be seen in figure 3.6.
When a volunteer registered they were allocated into one of the three phrase groups. In an attempt to keep the group sizes balanced, the software always allocated each volunteer to the group with the fewest members.
Once signed up and authenticated, volunteers could begin completing the experiment. As per the design, volunteers could toggle between entering the phrase and looking at it. This process is shown in figure 3.5.
Figure 3.5: The process for typing in a passphrase
To make it easy to keep track of volunteers, the experiment software included an administration page which listed volunteers and showed their progress. It also provided a visual indication that highlighted volunteers who had not completed an experiment in over 24 hours. This page, pictured in figure 3.7, was not accessible by any of the volunteers.
To perform preliminary analysis of the temporal signatures as they were being collected, we created another administration page that displayed the entries of each volunteer. This page, seen in figure 3.8, shows dwell times, digraphs and typing patterns.
Figure 3.6: The sign-up process for volunteers
Figure 3.7: An excerpt from the administration console
Figure 3.8: Screenshot of the volunteer info page
At the end of the experiment we had captured the data of 20 volunteers across 3 groups. A greater number of volunteers registered to take part, but many of them did not complete the experiment. Unfortunately, this meant that in our dataset of completed experiments the group sizes were uneven. These groups, the number of volunteers in each, and the phrases they learned can be seen in the following table:
Group ID Volunteers Primary Phrase Secondary Phrase
1 5 zmlntdzqdmg fhtvlqwlesf
2 8 vrfpebvvewe zmlntdzqdmg
3 7 fhtvlqwlesf vrfpebvvewe
To analyse this data, we decided to use the Scala programming language. Scala provides a hybrid of functional and object-oriented methodologies which makes it well suited to manipulating data. To get our JSON data into Scala, we used the Lift-web JSON parsing library. Custom visualisations were implemented using Processing.
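As a rough illustration of this pipeline, the sketch below parses a collected entry with lift-json. The parse and extract calls are the library's standard API, but the case class fields are assumptions about how the signatures might have been stored, not the project's actual schema.

import net.liftweb.json._

// Sketch of loading collected signatures with the Lift-web JSON library.
// The field names in these case classes are illustrative only.
case class KeyStroke(key: String, pressed: Long, released: Long)
case class Entry(volunteer: String, phrase: String, keystrokes: List[KeyStroke])

object LoadData {
  implicit val formats: Formats = DefaultFormats

  def main(args: Array[String]): Unit = {
    val raw =
      """{"volunteer":"v01","phrase":"zmlntdzqdmg",
        |"keystrokes":[{"key":"z","pressed":0,"released":95}]}""".stripMargin
    val entry = parse(raw).extract[Entry]
    println(entry.keystrokes.map(k => k.released - k.pressed)) // dwell times
  }
}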
3.4 Analysis Techniques
3.4.1 Using Student's T-test for Authentication
Student's t-test provides a measure of whether the means of two groups are statistically different from each other. Rather than just using the difference of the means, it takes into account the variability of each group. This makes it particularly useful for keystroke authentication, as different users may enter the passphrase with different degrees of variability. In essence, it provides an assessment of signal over noise.
The t-test of two distributions, a and b, is defined as follows:

t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\frac{\mathrm{var}_a}{n_a} + \frac{\mathrm{var}_b}{n_b}}}    (3.1)

where for each group \bar{x} is the mean, n is the size and var is the variance.
Usually the resulting value of a t-test is looked up in a statistical table that gives a measure of significance. However, in this work we shall be adapting the t-test into an authentication mechanism that determines whether or not a candidate signature belongs to a collection of temporal signatures that represent a user.
This authentication procedure will use the delay between pairs of key-presses, known as digraphs, that are present in each temporal signature. To do this we take all of an individual user's signatures, and extract a collection of timings for each digraph. This process is illustrated in figure 3.9. This means that we can take into account the variability of each individual digraph, which is advantageous as a user may vary certain digraphs more than others.
These collections of digraph timings form the user's model that we will authenticate against. To perform authentication of a candidate signature, we simply use the t-test to compare each digraph of the candidate signature with the corresponding collection of digraphs in the user's model. The values from each t-test are summed together to give a similarity score. If the score is below a certain threshold value, then the candidate signature is deemed as belonging to the user and it passes authentication. This process is shown in figure 3.10.
Figure 3.9: A diagram showing how digraphs are collated for use in authentication
Figure 3.10: A diagram showing how the t-test can be adapted to perform authentication
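A minimal Scala sketch of this scoring procedure is given below. It implements equation 3.1 directly; treating a single candidate timing as a group with zero variance, and summing absolute t values across digraphs, are simplifying assumptions rather than necessarily the exact choices made in this project.

import scala.math.{abs, sqrt}

// Sketch of the t-test similarity score: welchT implements equation 3.1, and
// score() compares the candidate's timing(s) for each digraph against the
// user's stored collection, summing the absolute t values.
object TTestAuth {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    if (xs.size < 2) 0.0 else xs.map(x => (x - m) * (x - m)).sum / (xs.size - 1)
  }

  // t-statistic for two groups (equation 3.1), guarded against a zero denominator.
  def welchT(a: Seq[Double], b: Seq[Double]): Double = {
    val denom = math.max(sqrt(variance(a) / a.size + variance(b) / b.size), 1e-9)
    (mean(a) - mean(b)) / denom
  }

  // Model: for each digraph label, the delays seen in the user's enrolled
  // signatures. Candidate: delays extracted from the signature under test.
  def score(model: Map[String, Seq[Double]], candidate: Map[String, Seq[Double]]): Double =
    candidate.collect {
      case (digraph, timings) if model.contains(digraph) =>
        abs(welchT(model(digraph), timings))
    }.sum

  // Accept the candidate if its summed score is below the chosen threshold.
  def authenticate(model: Map[String, Seq[Double]],
                   candidate: Map[String, Seq[Double]],
                   threshold: Double): Boolean =
    score(model, candidate) < threshold
}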
To assess the suitability of this t-test authentication method, we can perform authentication attempts using the data gathered from our experiments. For each volunteer we will simulate attempts from the volunteer in question, from impostors who have not learned the phrase (acquired from another group's secondary phrase), and from impostors who have learned the phrase (acquired from other volunteers in the same group). The proportion of incorrectly authenticated and rejected signatures will measure the performance of this technique. To get these measures for each volunteer we perform the following procedure:
- Decide on a threshold value to compare against the results of the t-test.
- Gather the volunteer's signatures that are usable for authentication.
- For each of the usable temporal signatures, exclude it such that it is not used for modelling. We then compare the remaining signatures to the excluded signature using t-test authentication and classify the result using the given threshold. The proportion of legitimate signatures rejected gives the false rejection rate.
- Gather the same number of naive impostor signatures as there are usable signatures, and compare each naive impostor signature to the collection of usable signatures using t-test authentication. As before, classify with the threshold. The proportion of signatures accepted gives us the false acceptance rate for naive impostors.
- Perform the same with impostors who have learned the phrase to get the false acceptance rate for trained impostors.
By averaging the false rejection and false acceptance rates, we can get a clear idea of the performance of t-test based authentication for that given threshold value. Then by repeating this process and varying the threshold value, we can experimentally determine the optimum threshold.
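The sketch below outlines this leave-one-out evaluation, reusing the hypothetical TTestAuth object from the previous sketch; the Signature representation is again an assumption made for illustration.

// Sketch of the leave-one-out evaluation described above. Each usable signature
// is held out in turn and tested against a model built from the remainder;
// impostor signatures are tested against the full model.
object TTestEvaluation {
  type Signature = Map[String, Seq[Double]] // digraph label -> delay(s) in this entry

  def buildModel(signatures: Seq[Signature]): Map[String, Seq[Double]] =
    signatures.flatMap(_.toSeq).groupBy(_._1).map { case (d, xs) => d -> xs.flatMap(_._2) }

  def falseRejectionRate(own: Seq[Signature], threshold: Double): Double = {
    val rejected = own.indices.count { i =>
      val model = buildModel(own.patch(i, Nil, 1)) // exclude the signature under test
      !TTestAuth.authenticate(model, own(i), threshold)
    }
    rejected.toDouble / own.size
  }

  def falseAcceptanceRate(own: Seq[Signature], impostors: Seq[Signature], threshold: Double): Double = {
    val model = buildModel(own)
    impostors.count(sig => TTestAuth.authenticate(model, sig, threshold)).toDouble / impostors.size
  }
}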
3.4.2 Using Self-organising Maps for Identification
Self-organising maps (SOM), invented by Teuvo Kohonen, provide a way to represent mul-
tidimensional data in a lower number of dimensions while still preserving the topological
layout of the data. SOMs are notable for being an unsupervised learning algorithm. Instead of using target values or categories to guide the learning process, they use the structures and relationships within the input data. Input data simply consists of a collection of vectors
with uniform length.
Figure 3.11: A diagram showing the components of a self-organising map.
As shown in figure 3.11, a SOM is comprised of a lattice of nodes, usually in two dimensions. Each node holds a weight vector of the same dimension as the input vectors. Initially all the weights in the network are randomised and the following steps are iteratively applied:
1. A random vector is chosen from the input data.
2. Every node is compared to this input vector using the Euclidean distance between the node's weights and the input vector. The node which is most like the given input is marked as the best matching unit (BMU).
3. The neighbourhood of the BMU is calculated. This is a collection of surrounding nodes that fall within a certain distance from the BMU. This distance, the radius of the neighbourhood, is a value that starts large but decreases with each iteration.
4. Each of the nodes in the neighbourhood has its weights altered to make them more like the input vector. Nodes that are closer to the BMU will have their weights altered to a greater extent. The influence of the BMU is shown in figure 3.12.
Figure 3.12: A diagram showing the influence of a BMU within its neighbourhood.
For time step t, the radius R is calculated using the following formula:

R(t) = R_0 \exp\left(-\frac{t}{\lambda}\right)    (3.2)

where \lambda is a time constant calculated from i, the total number of iterations to perform, and n, the radius of the lattice:

\lambda = \frac{i}{\log(n)}    (3.3)

When shifting a node's weights towards an input vector, the amount adjusted is a proportion of the difference between the input and the weight. This proportion is known as the learning rate. To make this relationship follow a Gaussian curve, the influence of the distance from the BMU, \Theta, at a given point in time is modelled as:

\Theta(t) = \exp\left(-\frac{\mathrm{dist}^2}{2R^2(t)}\right)    (3.4)

As we have just discussed, this influence decreases the further a node is from the BMU. However, the learning rate, L, also decreases with respect to time, much in the same way the radius does:

L(t) = L_0 \exp\left(-\frac{t}{\lambda}\right)    (3.5)

Taking all of this into account, for an input vector V, the new value of a node's weight vector is calculated as:

W(t+1) = W(t) + \Theta(t)\,L(t)\,(V(t) - W(t))    (3.6)
Once the map has been trained, it can then be visualised. Each input vector can be superimposed on the map, at the position of its best matching unit, revealing the structure of the data.
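As a rough sketch, the training loop described by equations 3.2 to 3.6 can be condensed into the following Scala class. The lattice size, iteration count and initial learning rate are illustrative defaults rather than values used in this project.

import scala.math.{exp, log, sqrt}
import scala.util.Random

// Sketch of SOM training following equations 3.2-3.6.
class Som(width: Int, height: Int, dim: Int, rng: Random = new Random()) {
  // One weight vector per lattice node, randomly initialised.
  val weights: Array[Array[Double]] =
    Array.fill(width * height)(Array.fill(dim)(rng.nextDouble()))

  private def pos(i: Int): (Int, Int) = (i % width, i / width)
  private def dist2(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def train(inputs: Seq[Array[Double]], iterations: Int, l0: Double = 0.1): Unit = {
    val r0 = math.max(width, height) / 2.0     // initial neighbourhood radius
    val lambda = iterations / log(r0)          // time constant (equation 3.3)
    for (t <- 0 until iterations) {
      val v = inputs(rng.nextInt(inputs.size)) // step 1: random input vector
      val radius = r0 * exp(-t / lambda)       // equation 3.2
      val rate = l0 * exp(-t / lambda)         // equation 3.5
      // Step 2: best matching unit is the node closest to the input.
      val bmu = weights.indices.minBy(i => dist2(weights(i), v))
      val (bx, by) = pos(bmu)
      // Steps 3 and 4: adjust every node inside the shrinking neighbourhood.
      for (i <- weights.indices) {
        val (x, y) = pos(i)
        val d2 = (x - bx) * (x - bx) + (y - by) * (y - by)
        if (sqrt(d2) <= radius) {
          val theta = exp(-d2 / (2 * radius * radius)) // equation 3.4
          for (k <- weights(i).indices)                // equation 3.6
            weights(i)(k) += theta * rate * (v(k) - weights(i)(k))
        }
      }
    }
  }
}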
4 Results
Now that we have collected some experimental data we shall analyse it. Each of the following sections will investigate one of our three hypotheses. As our volunteers were effectively users of our software, we shall use the terms volunteer and user interchangeably.
4.1 Passphrase Fragmentation
The first of our three hypotheses concerns the way in which a volunteer initially fragments a passphrase into sections. To explore this we need a way to visualise the patterns present in the first few experiments of each volunteer in such a way that they can be examined and compared. Figure 4.1 shows our attempt at creating such a visualisation.
Figure 4.1: Visualisation showing the fragmentation patterns present when volunteers are
confronted with new phrases
It shows the fragmentation patterns of the first four entries of both phrases, for all volunteers. Each rectangle represents a passphrase entry. The vertical lines indicate points at which the volunteer stopped entry to look at the passphrase. The spatial position of each line indicates at what point in the entry this occurred, where the leftmost side of the rectangle is the beginning of the passphrase, and the rightmost the end. Thus the collection of lines inside a rectangle represents the fragmentation pattern of that given entry. When this visualisation was first created it did not include the fragmentation patterns present in failed entries. However, sometimes the most informative patterns are displayed in failed entries, so they were introduced but displayed using a different colour line.
An initial examination of this visualisation reveals that volunteers use a wide variety of fragmentation patterns. Many entries have no fragmentation at all, while others are split into 4 pieces. This visualisation also reveals there is some degree of consistency between how they enter both phrases. In particular it shows that fragmentation of the first passphrase seems to be a strong indicator that the volunteer will fragment the second passphrase. However, some volunteers (6, 7, 8) perform little to no fragmentation upon the first phrase, but considerably fragment the second phrase. This could be due to the fact that these volunteers had become used to the format of the experiment and were surprised by the sudden change to a different passphrase, thus altering their behaviour.
While consistency between both phrases isn't perfect, there are plenty of examples where a particular pattern is repeated in both passphrases. For instance, volunteer 15 has a distinct way of breaking the passphrase into two chunks. On the other hand, there are a surprising number of volunteers who demonstrate very little fragmentation, so any perceived consistency is simply the lack of any pattern. Interestingly, volunteers will often fail entry one or more times, but still not resort to looking at the phrase. This is a surprising behaviour but it has unfortunately resulted in fewer fragmentation patterns than was hoped for.
However, some of the failed attempts did yield patterns. And when there is an entry that
has both patterns for failed and successful attempts, the two patterns tend to reinforce
each other. These can be seen as the purple lines on the visualisation. This demonstrates
how volunteers will repeat fragmentation strategies.
Another interesting pattern is that some volunteers demonstrate convergence of frag-
ments from one pattern to the next. This is in the sense that two separate fragments in
one phrase join to become a single fragment in the next. Good examples of this include
volunteers 1 and 6 on phrase one. This is revealing of how chunks of information are ma-
nipulated in memory during the learning process.
Figure 4.2: Graph showing the occurrence of different fragmentation patterns, taken from the first entry of each phrase
The chart in figure 4.2 shows the number of different fragmentation patterns present in the initial entry of each passphrase. Given 20 volunteers who entered 2 passphrases, we have 40 such patterns. Interestingly, the most common pattern is no pattern at all, accounting for 40% of all initial entries. The second most popular behaviour is to split the passphrase into two sections. Beyond this are a variety of other patterns. Generally, it seems that longer patterns (more chunks) are less popular, and even-sized chunks are preferred.
Some of the more unusual patterns could be due to the user getting familiar with our
experiment by exploring its functionality. If this is the case, then this is something we
should consider carefully for future experiments.
4.2 Learning & Stabilisation
Our second hypothesis focuses on the progression of the temporal signature as a volunteer learns to type a passphrase. In particular, we are hoping to examine how long it takes for them to form a stable entry, where they enter the passphrase in a consistent way. To investigate this we shall consider entry duration, passphrase prompting behaviour and digraph stability. As before, we do so using a mixture of visualisation and graphing techniques.
4.2.1 Entry Duration
Because of its simplicity, the time taken to enter a passphrase is a good first measure to investigate when considering the learning process. As we explored previously, we expect the entry time to decrease logarithmically, following what is known as the learning curve. Figure 4.3 shows the duration of each entry averaged across all volunteers. As indicated by the logarithmic trend line, the entry time does decrease following a learning curve.
Figure 4.3: The average entry time for each entry
One interesting feature of figure 4.3 is that it shows an increase in the average entry duration at entry 20, which is the point at which volunteers are no longer shown the phrase by default and must either recall it completely from memory or elect to be prompted with the passphrase once more. Interestingly, it looks like the pattern from entry 20 onwards might follow another, smaller exponential curve. It is likely this represents volunteers' adaptation to the change in the experiment.
Previously we mentioned that giving groups of volunteers different passphrases to learn would allow us to see if the rate of learning is consistent between different random passphrases of the same length. Figure 4.4 shows how the average duration of entry progresses for each of the three phrase groups. On the whole, it seems to substantiate this idea, as the three series follow similar logarithmic curves. However, there is an interesting discrepancy for the group fhtvlqwlesf on the second entry. Investigating this, it turns out there is an anomalous signature, where one volunteer began entry but, for reasons unknown, paused for over 60 seconds before completing it.
Figure 4.4: Average entry time for each group
In terms of finding the point at which passphrase entry is performed in a stable fashion, it seems that from approximately the 25th entry onwards the entry duration is reasonably consistent. While this is encouraging, one could argue that extrapolation of the trend line indicates that the duration will keep reducing. However, due to the nature of logarithmic curves, it's questionable whether any reduction in duration beyond this point would be observable above the noise in the data.
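For reference, the trend line and a rough stabilisation point can be derived from the averaged durations with a simple least-squares fit of a logarithmic curve, as sketched below. The tolerance used to declare stability is an arbitrary illustrative value rather than the one behind figure 4.3.

    import numpy as np

    def fit_learning_curve(mean_durations_ms):
        """Least-squares fit of duration ~ a + b*ln(entry number) to the averaged durations."""
        n = np.arange(1, len(mean_durations_ms) + 1)
        b, a = np.polyfit(np.log(n), mean_durations_ms, 1)   # slope b (negative if entries speed up)
        return a, b

    def stabilisation_entry(b, tol_ms=100):
        """First entry after which the fitted curve improves by less than tol_ms per entry."""
        n = 1
        while abs(b * np.log((n + 1) / n)) > tol_ms:
            n += 1
        return n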
4.2.2 Passphrase Observation
Another way in which we can explore volunteers' learning process is by looking at how and when they decide to observe the passphrase. As previously explained in detail, volunteers can switch between entering and viewing the phrase, but cannot do both at the same time. Initially the experiment defaults to displaying the passphrase, but from entry 21 onwards it defaults to not showing the passphrase, and volunteers have to choose whether they wish to reveal it again.
Figure 4.5: Visualisation of when and how volunteers elected to view the passphrase
The way in which volunteers view the passphrase is visualised in figure 4.5. This visualisation is advantageous as it allows us to follow each volunteer's progress through the different stages of the learning process. In this visualisation each row represents a volunteer and each column represents a particular entry number. As there were two entries per session, the entries are paired together in twos. Each rectangle indicates a single entry of the passphrase. The colour of this rectangle represents how the volunteer decided to view the passphrase. The significance of each colour is as follows:
Red indicates they saw the passphrase at the beginning and returned to look at it again after they started typing.
Orange indicates they looked at the passphrase at the start, and only at the start.
Pink indicates they looked at the passphrase, but not at the start; only after they had begun typing it.
Green indicates they completed entry without looking at the passphrase.
The blue line between entries 20 and 21 represents the point at which volunteers are no
longer prompted by default. It is worth noting that it is impossible for volunteers to get a
green mark before this point, as they will always be initially shown the passphrase.
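The colour assigned to an entry therefore follows directly from when the volunteer chose to view the prompt. A minimal classification sketch is shown below; the two boolean inputs are assumed to be derivable from the logged view events.

    def entry_colour(viewed_at_start, viewed_after_typing_began):
        """Map an entry's prompt-viewing behaviour onto the colours used in figure 4.5."""
        if viewed_at_start and viewed_after_typing_began:
            return "red"      # saw the phrase at the start and came back to it mid-entry
        if viewed_at_start:
            return "orange"   # only looked at the phrase before typing began
        if viewed_after_typing_began:
            return "pink"     # only looked at the phrase after typing had begun
        return "green"        # completed the entry entirely from memory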
There are several interesting trends displayed in this visualisation. The most general pattern is the progression from frequently looking at the phrase, to looking at it periodically, to recalling it completely from long-term memory.
Just from eyeballing the data, it seems that a little over half the volunteers managed
to really commit the phrase to memory. This can be judged by looking for consecutive
sequences of green boxes after approximately entry numbers 25 to 36. Interestingly, those
that repeatedly refer back to the phrase at around this point progress from red to orange
boxes (in the first half of the experiment) later than other volunteers. This implies that
they are further back in their learning progress and potentially learn this task at a slower
rate.
One volunteer is particularly anomalous as they never really seem to learn the phrase.
They consistently refer back to the prompt and only begin to show signs of learning it
near the very end. A possible explanation is that this volunteer simply formed a habit of
relying on being prompted, and felt little need to learn the passphrase.
Another common pattern displayed in this visualisation stems from the fact that each
session has two entries. Typically, in the first entry, the volunteer will look at the phrase
to remember it, and in the following second entry, they will enter it straight from working
memory. This pattern is particularly prominent in session 11, entries 21 and 22, where
there is an orange, green pattern.
Looking at this visualisation reveals a possible limitation in the design of this experiment: volunteers may have been able to enter the passphrase without looking at it (which would be represented by a green box) before entry 21, but because the experiment always showed them the phrase before this point, they were never offered the chance to demonstrate this.
In conclusion, this visualisation shows that most volunteers learn the passphrase somewhere between 20 and 25 entries. This is consistent with figure 4.3, which shows entry time stabilising at around this point.
4.2.3 Travel-time Stability
While duration gives us a reasonable indicator of stability, it overlooks the nuances of the temporal signatures. A better way is to examine the digraphs of the travel times between keys. To do this we produced charts that show these digraph times for each volunteer. The charts for all volunteers can be found in the appendix, but four particularly interesting charts are shown in figures 4.6 and 4.7. These charts can be read as follows:
Each series represents a certain digraph.
Each point signifies that digraph's time in a certain temporal signature.
A point's position on the y-axis signifies the time delay (gap) between the two letters in that digraph.
A point's position on the x-axis indicates which entry number the digraph belongs to. Therefore, if you added all the times at a single x-axis position, you would get the duration of that entry (minus dwell times).
Each chart is scaled to 3000 ms, because the dynamics we are interested in are below 3000 ms; this also allows us to compare different volunteers easily.
These charts are informative because if a series line is straight, it implies that the digraph it represents is stable. Conversely, an uneven series line indicates instability.
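For reference, the travel-time digraphs plotted in these charts can be computed from the raw key events roughly as follows. The per-event layout (key, key-down time, key-up time) is an assumption about how the temporal signatures are stored, and the stability measure given is just one simple option.

    import numpy as np

    def travel_times(key_events):
        """Travel-time digraphs: gap between releasing one key and pressing the next (ms).

        key_events: list of (key, down_ms, up_ms) tuples in typing order (hypothetical layout).
        Summing these gaps gives the entry duration minus the dwell times, as noted above.
        """
        return [down2 - up1 for (_, _, up1), (_, down2, _) in zip(key_events, key_events[1:])]

    def digraph_stability(per_entry_digraphs, window=5):
        """Standard deviation of each digraph over the last `window` entries; low values mean stable."""
        recent = np.array(per_entry_digraphs[-window:])
        return recent.std(axis=0)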
Looking at figure 4.6, we can see that the digraph times for volunteer 5 are initially very unstable. There are many high peaks on the chart, which are good indicators of where the passphrase was broken into fragments. For this volunteer, entry does not become particularly stable until about signature number 25. Interestingly, beyond this point digraph 5 is consistently larger than the others, a promising feature for identification. Volunteer 9, also shown in figure 4.6, contrasts with volunteer 5. In particular, their entry is comparatively much more stable, and becomes so from a much earlier point. In terms of identifying features, digraph 2 seems to be consistently longer than the other digraphs.
Figure 4.7 juxtaposes two volunteers with very different temporal signatures. Volunteer 11 forms a stable signature very quickly and has digraph timings that are remarkably consistent. On the other hand, the digraphs in the signatures from volunteer 15 are comparatively erratic. Volunteer 15 never really seems to learn the phrase, and nearly always enters it in two chunks.
In conclusion, these charts indicate that after a certain number of entries the digraph timings converge downwards towards reasonably stable values. An interesting trend is that digraphs that originally lie on fragment boundaries (where the volunteer stops entry and refers to the passphrase) tend to remain longer than other digraphs even once the phrase is learnt and the digraphs have relatively stabilised. In other words, the way in which a volunteer initially fragments a passphrase seems to set a precedent for how they will eventually learn to type it. Examples of digraphs that display this pattern include digraph 5 for volunteer 5, digraph 2 for volunteer 9 and digraph 4 for volunteer 11. Generally, it seems that digraphs inside the original fragments are consistently small, while digraphs that span the boundaries between fragments tend to be proportionally longer. We shall explore how this relates to chunking in a later section.
Figure 4.6: Digraph stabilities for volunteers 5 & 9
Figure 4.7: Digraph stabilities for volunteers 11 & 15
4.3 Identification & Authentication
The last of our three hypotheses is concerned with the extent to which we can use the idiosyncrasies and nuances in a volunteer's typing behaviour to distinguish them. We shall investigate two very different techniques: Student's t-test and self-organising maps.
4.3.1 T-test Authentication
As was explored in the previous section, most people do develop a typing behaviour that produces a stable temporal signature. Our first method of determining the usefulness of these signatures is Student's t-test. To correctly model each user, we had to devise a way of gathering usable signatures. It was found that signatures which the volunteer typed in one go without referring to the prompt, and which do not contain any digraphs greater than 1500 ms, were suitable for this purpose. It is worth noting that one volunteer was dropped from this section of the experiment for having too few usable signatures.
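A sketch of this filtering step is given below. The criteria (entered in one go, no digraph above 1500 ms) are those just described, but the field names on the signature records are illustrative assumptions.

    def is_usable(signature, max_digraph_ms=1500):
        """Keep only signatures typed in one go with no inter-key gap above 1.5 seconds.

        `signature` is assumed to be a dict with a 'viewed_prompt_mid_entry' flag and a
        'digraphs_ms' list of travel times; these field names are hypothetical.
        """
        return (not signature["viewed_prompt_mid_entry"]
                and all(t <= max_digraph_ms for t in signature["digraphs_ms"]))

    def build_profiles(signatures_by_volunteer):
        """Collect each volunteer's usable digraph vectors into a profile for later testing."""
        return {volunteer: [s["digraphs_ms"] for s in sigs if is_usable(s)]
                for volunteer, sigs in signatures_by_volunteer.items()}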
With the usable signatures from each volunteer, a series of authentication attempts was simulated to find the optimal threshold for the t-test. Figure 4.8 shows the results of these simulations.
Figure 4.8: A graph showing how changing the authentication threshold affects authentication rates.
The three series on the graph show the average error rates for three distinct authentication scenarios. The first series (blue) represents the false rejection rate (FRR) of legitimate users attempting to authenticate. In this case the error rate is based on the number of times users were falsely rejected. The second series (red) represents the false authentication rate (FAR) of naive impostors who have not learned the phrase. These simulated attempts at authentication were gathered from the final phase of the experiment, where each group acts as an impostor for another group. The error rate is the number of falsely authenticated impostors. The final series (yellow) represents the false authentication rate of learned impostors. In this case the simulated attempts were gathered from other volunteers inside the same group as the volunteer being tested.
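To make the simulated attempts concrete, the sketch below shows one way a per-digraph t-test comparison could drive an accept/reject decision: each digraph of the candidate is tested against the corresponding column of the user's enrolled signatures, and the resulting p-values are combined into a 0-100 dissimilarity score that is compared against the threshold. This particular scoring scheme is our illustrative assumption and not necessarily the exact adaptation used to produce figure 4.8.

    import numpy as np
    from scipy import stats

    def ttest_distance(profile_digraphs, candidate_digraphs):
        """Dissimilarity between a candidate signature and a user's enrolled digraph vectors.

        For each digraph position a one-sample t-test asks how plausible the candidate's timing
        is given the profile samples; low p-values mean a poor match, so the score grows as the
        match worsens. The 0-100 scaling is an assumption made for illustration.
        """
        profile = np.array(profile_digraphs)      # shape: (n_enrolled_entries, n_digraphs)
        candidate = np.array(candidate_digraphs)
        p_values = [stats.ttest_1samp(profile[:, i], candidate[i]).pvalue
                    for i in range(profile.shape[1])]
        return 100 * (1 - float(np.mean(p_values)))

    def authenticate(profile_digraphs, candidate_digraphs, threshold=74):
        """Accept the claimed identity when the dissimilarity stays below the threshold."""
        return ttest_distance(profile_digraphs, candidate_digraphs) <= threshold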
As expected, increasing the threshold decreases the FRR but increases the FARs. The optimal balance is where the sum of the FRR and the FAR is as low as possible. However, this point varies depending on the kind of impostor. When the impostor is naive and has not practised entering the phrase, the optimal threshold is around 74, with 6.6% of legitimate users falsely rejected and 10% of naive impostors falsely authenticated. However, if the impostor has learned the passphrase, then the optimal threshold is more strict, averaging around 52. In this case 17% of legitimate users are falsely rejected and 16% of trained impostors are falsely authenticated. This performance difference between the calibres of impostor is to be expected. A more interesting trend is that as the threshold is increased, the error rate for naive impostors seems to increase linearly, but the error rate for trained impostors seems to follow a logarithmic curve. As this was surprising, we ran the simulation again with higher threshold values. The resulting graph, displayed in figure 4.9, shows that the two false authentication rates do in fact both follow a curve, and that they eventually converge.
Figure 4.9: A graph showing false authentication rates for naive and trained impostors
converging
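For completeness, the error-rate curves in figures 4.8 and 4.9 can be reproduced by sweeping the threshold over pre-computed dissimilarity scores, under the same illustrative scoring assumption as the previous sketch; the optimal threshold is then whichever value minimises the sum of the two error rates of interest.

    def error_rates(genuine_distances, impostor_distances, thresholds):
        """FRR and FAR at each threshold, for dissimilarity scores where lower means a better match."""
        rates = []
        for threshold in thresholds:
            frr = sum(d > threshold for d in genuine_distances) / len(genuine_distances)     # genuine rejected
            far = sum(d <= threshold for d in impostor_distances) / len(impostor_distances)  # impostor accepted
            rates.append((threshold, frr, far))
        return rates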
4.3.2 Self-organising Maps
Self-organising maps are our second technique for exploring the usefulness of signatures. Given that SOMs are non-linear and allow unsupervised modelling, they contrast nicely with t-test authentication and give us a qualitatively different approach. As SOMs preserve the topological properties of their input, assessment of our data shall be performed by visualising the output of each SOM. As with the t-test authentication, we shall only include usable temporal signatures (using the same criteria).
To get the best results with a SOM, there are several parameters that must be tuned. The two most important of these are the learning rate and the initial neighbourhood size, so the first goal was to find good values for these parameters. It was found that increasing the learning rate increased the density of clusters in the map, and increasing the initial neighbourhood size increased how spread out the map was within the available space. It was determined that, depending on the input data, there were two combinations of parameters that seemed to produce particularly good results: learning rate 0.8 with initial neighbourhood size SOMSize/4, and learning rate 0.5 with initial neighbourhood size SOMSize/3. Because these parameters could be pre-tuned in a production environment, and for the sake of brevity, we shall present each SOM with its optimal parameters. All SOMs in this section were trained over 1000 iterations.
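For readers unfamiliar with the technique, a minimal SOM trainer in the spirit described above is sketched below. Only the headline parameters (an initial learning rate of 0.8 or 0.5, an initial neighbourhood size of SOMSize/4 or SOMSize/3, and 1000 iterations) follow the values reported here; the decay schedules and grid size are our own illustrative choices.

    import numpy as np

    def train_som(data, grid_size=20, n_iters=1000, lr0=0.8, sigma0=None, seed=0):
        """Train a square SOM on row-vector samples (e.g. 10-dimensional digraph timings)."""
        rng = np.random.default_rng(seed)
        sigma0 = sigma0 if sigma0 is not None else grid_size / 4   # initial neighbourhood size
        weights = rng.uniform(data.min(), data.max(), (grid_size, grid_size, data.shape[1]))
        coords = np.stack(np.meshgrid(np.arange(grid_size), np.arange(grid_size),
                                      indexing="ij"), axis=-1)     # lattice coordinates of each node
        for t in range(n_iters):
            frac = t / n_iters
            lr = lr0 * np.exp(-3 * frac)                   # decaying learning rate (illustrative schedule)
            sigma = max(sigma0 * np.exp(-3 * frac), 0.5)   # shrinking neighbourhood radius
            sample = data[rng.integers(len(data))]         # one randomly chosen training signature
            node_dists = np.linalg.norm(weights - sample, axis=-1)
            bmu = np.unravel_index(np.argmin(node_dists), node_dists.shape)   # best matching unit
            lattice_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
            influence = np.exp(-(lattice_dist ** 2) / (2 * sigma ** 2))       # Gaussian neighbourhood
            weights += lr * influence[..., None] * (sample - weights)
        return weights

    def best_matching_unit(weights, vector):
        """Grid coordinates of the node whose weights are closest (Euclidean) to the vector."""
        dists = np.linalg.norm(weights - vector, axis=-1)
        return np.unravel_index(np.argmin(dists), dists.shape)

Visualising a trained map then amounts to placing each usable entry on its best matching unit and colouring it by volunteer, which is how the figures in this section were produced.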
In our visualisation each node in the map's network is represented as a rectangle. Each filled-in rectangle represents a particular passphrase entry, its colour representing which volunteer the entry belongs to. Each of these passphrase entries is placed over the node that its digraph timings are closest to (using Euclidean distance). It is worth reiterating that as the SOM is being trained, it does not know which volunteer each entry belongs to; it simply arranges them according to the topography of the input. Thus the horizontal and vertical dimensions of the map do not correspond to any particular values. They are instead an approximate 2-dimensional representation of the spatial arrangement of the digraph vectors in higher-dimensional space.
Our first SOM, shown in figure 4.10, displays the 10-dimensional travel-time digraphs of all volunteers mapped into a 2-dimensional space. An initial look at this SOM is fairly promising. While not all entries are located in meaningful positions and some volunteers are scattered across the map, many of the entries do form distinguishable and interesting patterns. For instance, entries of certain volunteers form very clear clusters, while others occupy less dense but equally well-defined spaces. This implies that to be useful, a collection of temporal signatures doesn't have to be clustered around a central point. They can in fact vary to a greater extent than one might imagine and still remain suitable for authentication, as long as they form contiguous regions in hyperspace. In other words, the user can vary their input as long as the variations collectively form a gradual continuum and are not extremely different. This is more apparent in the SOM of the digraphs for passphrase group zmlntdzqdmg, pictured in figure 4.11, where one can see several different volunteers forming well-defined, but not circular, regions.
Figure 4.10: Self-organising Map of the digraph times from all volunteers
Figure 4.11: Self-organising Map of the digraph times from volunteers in group zmlntdzqdmg
Figure 4.12: Self-organising Map of the digraph times from volunteers in group fhtvlqwlesf
Figure 4.13: Self-organising Map of the digraph times from volunteers in group vrfpebvvewe
The SOM of group fhtvlqwlesf shown in figure 4.12 demonstrates reasonable separation between the different volunteers. On the other hand, the SOM of group vrfpebvvewe, shown in figure 4.13, demonstrates the weakest patterns of all three phrase groups. In particular, volunteer 9 has signatures that fall right across the entire map. This is surprising, as cross-referencing back to figures 4.5 and 4.6 indicates that this volunteer learns to type the phrase quickly and enters it in a very stable fashion. One explanation is that their entry of the passphrase is in fact too perfect and lacks any idiosyncratic features. This would indicate that some users are inherently easier to identify than others.
In conclusion, SOMs demonstrate very promising results. Not only do they provide a robust way of visualising higher-order data, but the patterns shown are clear enough to consider them a serious candidate for solving the identification problem. Furthermore, the output of SOMs could be used as the input for more conventional machine learning techniques such as k-means clustering and neural networks. We shall explore these possibilities in a later section.
4.3.3 Dwell Timings
Thus far the focus of this investigation has mainly been on the travel timings between keystrokes, known as digraphs. One feature of the temporal signatures that we have neglected up to this point is the duration each key is held down for, known as dwell. Because some existing work has asserted that dwell times carry important biometric information [13], we shall now take the time to explore this feature. To do this we shall repeat our investigation of t-test authentication and self-organising maps, but this time, rather than extract inter-key digraph times from the temporal signatures, we shall instead extract the key dwell times. Other than this, we require no further adaptations to our methods, as both digraph timings and dwell timings are simply numeric vectors.
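Extracting the dwell-time vectors mirrors the travel-time extraction sketched earlier; under the same assumed (key, key-down, key-up) event layout it is a one-line change.

    def dwell_times(key_events):
        """Dwell times: how long each key was held down (key-up minus key-down, in ms)."""
        return [up_ms - down_ms for (_, down_ms, up_ms) in key_events]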
Figure 4.14 shows the performance of t-test based authentication when using dwell times as feature vectors. As before, we consider the FRR of legitimate volunteers and the two FARs of naive and trained impostors. From this graph we can see that dwell times do indeed seem to carry enough idiosyncratic information to give moderate authentication performance. A threshold of 52 produces an FRR of 30% and yields 10% for both FARs. However, unlike with digraphs, there is only a marginal difference between the FARs of the two calibres of impostor, with naive impostors being only marginally easier to correctly reject. In other words, learning the phrase seems to have little effect on the dwell times. Furthermore, it seems much easier to exclude naive impostors with digraphs than it is with dwells.
Figure 4.14: A graph showing how changing the threshold affects authentication rates when using dwell times as features
Figure 4.15 shows a SOM that has been trained on the dwell times. Our technique is the same as before, the only difference being that instead of 10-dimensional digraph vectors we are using 11-dimensional dwell-timing vectors. As might be expected from the results of the t-test authentication, dwell timings do indeed seem to carry enough distinguishing information for a SOM to arrange the entries of many of the volunteers into discrete regions. The cohesiveness of these regions is not perfect, but the patterns shown in the map are far from random.
Figure 4.15: Self-organising Map of the dwell times from all volunteers
While this is a good indicator that dwell time is a useful feature, we feel that, due to the nature of this experiment, the perceived idiosyncratic features and distinguishable patterns may in fact be attributable to mechanical variations between the different keyboards used by our volunteers. Variations in key thicknesses and key springing mechanisms could well affect the dwell times captured by our experiment. However, as was seen in figure 4.14, there is a slight variation between the error rates of learned and naive impostors when using dwell timings, so there is still a possibility that dwell times are useful.
5 Discussion
5.1 Evaluation of Hypotheses
5.1.1 How people intuitively fragment a passphrase
We hypothesised that due to variations in human chunking abilities, the way in which volunteers fragment passphrases would vary between individuals. The evidence we collected goes some way towards supporting this. What was surprising, however, is the number of volunteers who did not perform any fragmentation at all. Although our dataset is small, there was a variety of different fragmentation patterns among the volunteers who decided to fragment the phrase. It was even apparent that some volunteers had particular strategies, such as always dividing the phrase into two roughly equal-sized sections. Because we built the experiment specifically to capture the way in which the user switches between looking at the phrase and entering it, we were able to capture behaviour like this particularly easily.
We also mentioned the possibility of using these behaviours as part of a user's biometric profile. While our data does show some volunteers with consistent behaviour, many of the volunteers behaved very differently on each passphrase they entered. To be more conclusive, we would need a dataset that captured volunteers being introduced to many more phrases.
In terms of the cognitive science of chunking, there are two interesting trends in the data. Firstly, it appears that volunteers prefer to digest the passphrase in as few fragments as possible. Given that fragments are a good indication of how the data is divided into chunks in working memory, this implies that people have an aversion to needlessly creating extra chunks in working memory. The second observation is that volunteers prefer chunking patterns that create even-sized chunks. This suggests that there is some cognitive mechanism for efficiently dividing up data. This is precisely the kind of phenomenon explored by Simon [14], who notes that we can increase the amount of information that working memory contains simply by building larger and larger chunks, each chunk containing more information than before.
To conclude, the hypothesis seems to be partially supported. However, while providing insights that help us understand the passphrase learning process, our results indicate that fragmentation patterns are of limited practical use for keystroke authentication.
5.1.2 When learning to type a passphrase, how long does it take
an individual to produce a stable signature?
We hypothesised that over the duration of the experiment the temporal signatures produced by volunteers would converge to stable values. We feel that our data supports this hypothesis. In particular, our use of charts to display the progression of digraph times across the span of the experiment showed series lines that tend towards stable values. We also verified that, for a random alphabetic phrase, it takes volunteers somewhere between 20 and 30 entries to reach a point at which entry is performed in a stable fashion. We confirmed that during this process the average entry time follows a logarithmic learning curve.
When looking at the visualisation which depicted the volunteers' observation behaviour, one could distinguish three distinct phases. Initially, volunteers look back and forth between prompt and entry very often. We assume that at this stage the passphrase mostly exists in working memory. Next, volunteers progress to the point where they have mostly learned the passphrase, but still need to be prompted to assist in recalling it. This indicates that, at the very least, the passphrase is by this stage encoded in short term memory. Finally, towards the end of the experiment volunteers reach a point at which the passphrase enters long term memory and typing it becomes an implicit motor task. One limitation with the design of the experiment is that, because it will always prompt volunteers up until a certain point, it is hard to detect the point at which volunteers progressed to this final stage.
Although more subtle than other patterns in the data, there was some evidence to suggest that the way in which a user initially fragments a phrase sets a precedent for how they will eventually type the passphrase. This is something that merits further investigation.
Out of all the people who took part, volunteer 14 was particularly fast and consistent
in how they learned to type the passphrase. In particular, they managed to complete
entry of the primary passphrase without ever having to fragment it. After the experiment
we talked to the volunteer and they claimed to have lots of experience working in retail,
where they have to frequently memorise the random sequences of numbers and letters on
barcodes. While anecdotal, this supports the idea that certain skills involving working
memory can be improved by practice. In this instance, it's quite possible that volunteer 14
has developed a learning schema that facilitates the memorisation of short alpha-numeric
codes.
In conclusion, there is fairly good evidence for progressive formation of stable typing
patterns. A glance at the digraph charts shown in the appendix reveals that many (but
not all) volunteers show this progression. Although the data is noisy there are still distinct
patterns that show digraphs converging on certain values. Thus we consider this hypothesis
to be supported reasonably well.
5.1.3 Keystroke-biometric authentication
We initially hypothesised that it would be possible to use temporal signatures to distinguish between individuals. As we researched and developed our techniques we realised that this task can be split into two specific sub-problems: authentication and identification.
The first problem is that of authenticating a user. This involves making a binary decision as to whether a candidate signature belongs to the profile of the user it claims to be from. We demonstrated that an adaptation of Student's t-test is reasonably effective at this task. In doing so we noted that there is a trade-off between the rate of falsely authenticated impostors and the rate of falsely rejected users. This is a parameter that could be tuned as desired in a real-life system.
We also showed that the t-test is a lot better at detecting naive impostors than trained ones. Much of the t-test's ability to catch out naive impostors comes from the fact that they enter the passphrase in a slow and unsure way. When it comes to distinguishing between users who have learned the same phrase, the task is much harder, as the only features to work with are subtle idiosyncrasies and small nuances. While in many ways this is a truer test of distinguishing between people, in a practical authentication scenario it is unlikely that the impostor will have really learned the phrase. Then again, if keystroke biometric authentication systems gained popularity then this would not necessarily be the case.
The second problem is the task of identifying a user by matching a candidate temporal
signature onto the correct owner in a collection of known user profiles. We demonstrated
that self-organising maps are reasonably capable of using digraphs to divide up volunteers
into zones and clusters in the output space.
While our investigation stopped here, we think that identification should be possible by finding the best matching unit (BMU) for a candidate signature and assigning it an identity based on the owners of the signatures that surround the BMU. In regions of the SOM that are populated by more than one user, this technique should be able to reflect this uncertainty by reporting that a signature matched more than one user.
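A sketch of this identification idea, reusing the best_matching_unit helper from the earlier SOM sketch, is given below; the one-node neighbourhood used to gather nearby signatures is an arbitrary illustrative choice.

    from collections import Counter

    def identify(weights, labelled_entries, candidate):
        """Suggest identities for a candidate signature from the owners of signatures near its BMU.

        labelled_entries: (volunteer_id, digraph_vector) pairs used to train the SOM.
        Returns (volunteer_id, count) pairs, so overlapping regions naturally report more
        than one possible identity; an empty list means no known entries lie nearby.
        """
        bmu = best_matching_unit(weights, candidate)
        placed = [(vid, best_matching_unit(weights, vec)) for vid, vec in labelled_entries]
        nearby = [vid for vid, node in placed
                  if abs(node[0] - bmu[0]) <= 1 and abs(node[1] - bmu[1]) <= 1]
        return Counter(nearby).most_common()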
With refinement, SOMs could be used in conjunction with other machine learning methods. For example, the output of a SOM could be used as the input for a clustering technique such as k-means clustering. Ideally, centroids could converge on clusters and be able to classify regions of the search space. However, some of the patterns are not circular, in which case the use of k-means presents a bias mismatch. One method that may have a more appropriate bias is back-propagating neural networks, which are good at finding striped or blob-shaped regions.
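As a rough sketch of the pipeline suggested here, the 2-D best-matching-unit coordinates of each entry could simply be fed into an off-the-shelf k-means implementation; the choice of k and the use of scikit-learn are illustrative assumptions and, as noted above, k-means may be a poor fit for non-circular regions.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_som_output(weights, entries, n_clusters):
        """Cluster entries by the grid position of their best matching unit on a trained SOM."""
        positions = np.array([best_matching_unit(weights, entry) for entry in entries], dtype=float)
        return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(positions)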
The most uncertain aspect of this section of the experiment was our investigation into the use of dwell timings for authentication. Our initial response was that they provide a useful feature which doesn't rely on exploiting whether or not someone has learned a phrase. However, after some consideration we came to the conclusion that this difference could well be caused by differences in the manufacture of volunteers' computer keyboards. This is a variable which was left uncontrolled because we wanted to test our authentication mechanisms with realistic data. The downside of this trade-off is that we can't conclude anything specific about this part of the experiment. If the field of keystroke biometric authentication is to be successful, then the effect of computer keyboards on temporal signatures is an investigation that must be carried out.
In conclusion, we feel that our hypothesis is supported from both the angles of authentication and identification. Our novel use of SOMs has shed light on several ideas which are worthy of further investigation.
5.2 Limitations
Although our experiment was largely successful, there were several miscellaneous elements
of our work that were limiting or could have been improved upon:
Firstly, while random alphabetic phrases are good for ensuring that every volunteer is presented with a phrase that they have not seen before, in a realistic situation we could expect to see a much wider variety of passwords. However, now that we have random alphabetic phrases as a baseline for comparison, we could improve the applicability of our conclusions by repeating the experiment with a variety of more realistic password phrases.
While our data was sufficient for analysis, ideally the experiment would have lasted longer and involved more volunteers. This would have allowed us to be more confident about our hypotheses, especially in our deductions relating to chunking. However, due to the time constraints on this project this was not possible.
As discussed during the evaluation of our hypotheses, our work has highlighted several opportunities for further development and research. In particular, it was regrettable that we did not manage to further investigate self-organising maps, as they show a lot of potential for keystroke identification.
In terms of the experiment software itself, there were some problems with volunteers being confused. These were all small issues, such as ambiguous fonts or slightly unclear instructions. We managed to iron these out by keeping in touch with the volunteers, but they are things to correct if the software were ever to be used again.
We have already explained the limitations surrounding the use of dwell timings. Had this not been as problematic, we would have explored combining digraph and key-dwell timings.
5.3 Conclusion
In terms of the scope of our results, there are several elements of our work that apply to the wider field. For instance, we demonstrated how the effectiveness of using digraph times for authentication changes depending on the calibre of impostor. This is a practical insight that is applicable to most work in keystroke authentication.
Another element of our work that supports the wider context of this field is our demonstration of SOMs. We showed that SOMs are a promising and novel way of identifying users. This is notable because, unlike many existing techniques that have been applied to keystroke identification, SOMs do not require supervised training. Yet the fact that SOMs managed to cluster typing data verifies that the idiosyncrasies in users' typing are enough to distinguish between individuals.
Although much of the focus of our report was to investigate keystroke authentication, our results also support several established theories in cognitive science. In particular, our work exploring how users fragment passphrases fits in well with the ideas surrounding chunking. This means that our deductions in this section should be applicable to many other similar memorisation tasks.
To conclude, we have created robust web software that allows investigations into typing dynamics to be carried out remotely across the internet. In terms of analysis, we feel that we have demonstrated a fresh approach to keystroke analysis that blends together statistics, data visualisation and machine learning.
Bibliography
[1] R. Stockton Gaines, William Lisowski, S. James Press, and Norman Shapiro. Authentication by keystroke timing: Some preliminary results. Technical report, DTIC Document, 1980.
[2] John Leggett, Glen Williams, Mark Usnick, and Mike Longnecker. Dynamic identity verification via keystroke characteristics. International Journal of Man-Machine Studies, 35(6):859-870, 1991.
[3] Marcus Brown and Samuel Joe Rogers. User identification via keystroke characteristics of typed names using neural networks. International Journal of Man-Machine Studies, 39(6):999-1014, 1993.
[4] Salil P. Banerjee and Damon L. Woodard. Biometric authentication and identification using keystroke dynamics: A survey. Journal of Pattern Recognition Research, 7:116-139, 2012.
[5] Marino Tapiador and Juan A. Sigüenza. Fuzzy keystroke biometrics on web security. In IEEE Workshop on Automatic Identification Advanced Technologies (AutoID 1999), pages 28-29. Citeseer, 1999.
[6] R. S. McLean and L. W. Gregg. Effects of induced chunking on temporal aspects of serial recitation. Journal of Experimental Psychology, 74(4):455-459, 1967.
[7] Alison K. Banikowski and Teresa A. Mehring. Strategies to enhance memory based on brain-research. Focus on Exceptional Children, 32(2):1-16, 1999.
[8] Louis E. Yelle. The learning curve: Historical review and comprehensive survey. Decision Sciences, 10(2):302-328, 1979.
[9] Alexander De Luca, Roman Weiss, and Heinrich Hussmann. PassShape: stroke based shape passwords. In Proceedings of the 19th Australasian Conference on Computer-Human Interaction: Entertaining User Interfaces, pages 239-240. ACM, 2007.
[10] William Lowe Bryan and Noble Harter. Studies in the physiology and psychology of the telegraphic language. Psychological Review, 4(1):27, 1897.
[11] Clayton Epp, Michael Lippold, and Regan L. Mandryk. Identifying emotional states using keystroke dynamics. In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems, pages 715-724. ACM, 2011.
[12] Shrijit S. Joshi and Vir V. Phoha. Competition between SOM clusters to model user authentication system in computer networks. In Communication Systems Software and Middleware, 2007 (COMSWARE 2007), 2nd International Conference on, pages 1-8. IEEE, 2007.
[13] John A. Robinson, V. W. Liang, J. A. Michael Chambers, and Christine L. MacKenzie. Computer user verification using login string keystroke dynamics. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 28(2):236-241, 1998.
[14] Herbert A. Simon. How big is a chunk? Science, 183(4124):482-488, 1974.
6 Appendices
6.1 Digraph Timing Charts