t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\frac{\mathrm{var}_a}{n_a} + \frac{\mathrm{var}_b}{n_b}}} \qquad (3.1)
where, for each group, x̄ is the mean, n is the sample size and var is the variance.
Usually the resulting value of a t-test is looked up in a statistical table that gives a
measure of significance. However, in this work we shall be adapting the t-test into an
authentication mechanism that determines whether or not a candidate signature belongs
to a collection of temporal signatures that represent a user.
This authentication procedure will use the delay between pairs of key-presses, known as
digraphs, that are present in each temporal signature. To do this we take all of an individual
user's signatures, and extract a collection of timings for each digraph. This process is
illustrated in figure 3.9. This means that we can take into account the variability of each
individual digraph, which is advantageous as a user may vary certain digraphs more than
others.
These collections of digraph timings form the user's model that we will authenticate
against. To perform authentication of a candidate signature, we simply use the t-test to
compare each digraph of the candidate signature with the corresponding collection of
digraph timings in the user's model. The values from each t-test are summed to give a
similarity score. If the score is below a certain threshold value, then the candidate signature
is deemed to belong to the user and it passes authentication. This process is shown
in figure 3.10.
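The scoring scheme just described can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this work: it assumes a one-sample form of the t statistic (the exact variant is not specified here), signatures stored as lists of digraph times, and a model stored as one list of observed timings per digraph position; all names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(timings, candidate_time):
    """One-sample t statistic: how far a candidate digraph time lies from
    the mean of the user's recorded timings for that digraph."""
    n = len(timings)
    return abs(mean(timings) - candidate_time) / (stdev(timings) / sqrt(n))

def authenticate(candidate, model, threshold):
    """Sum the per-digraph t statistics into a similarity score; a score
    below the threshold passes authentication."""
    score = sum(one_sample_t(timings, t)
                for timings, t in zip(model, candidate))
    return score <= threshold
```

A candidate close to the user's habitual timings accumulates a small summed score and passes, while a very different candidate accumulates a large score and is rejected.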
Figure 3.9: A diagram showing how digraphs are collated for use in authentication
Figure 3.10: A diagram showing how the t-test can be adapted to perform authentication
To assess the suitability of this t-test authentication method, we can perform authenti-
cation attempts using the data gathered from our experiments. For each volunteer we will
simulate attempts from the volunteer in question, from impostors who have not learned the
phrase (acquired from another group's secondary phrase), and from impostors who have
learned the phrase (acquired from other volunteers in the same group). The proportion
of incorrectly authenticated and rejected signatures will measure the performance of this
technique. To get these measures for each volunteer we perform the following procedure:
• Decide on a threshold value to compare against the results of the t-test.
• Gather the volunteer's signatures that are usable for authentication.
• For each of the usable temporal signatures, exclude it so that it is not used for
modelling. We then compare the excluded signature to the remaining signatures
using t-test authentication and classify the result using the given threshold. The
proportion of legitimate signatures rejected gives the false rejection rate.
• Gather the same number of naive impostor signatures as there are usable signatures,
and compare each naive impostor signature to the collection of usable signatures
using t-test authentication. As before, classify with the threshold. The proportion of
impostor signatures accepted gives us the false authentication rate for naive impostors.
• Perform the same procedure with impostors who have learned the phrase to get the
false authentication rate for trained impostors.
By averaging the false rejection and false authentication rates, we can get a clear idea of
the performance of t-test based authentication for that given threshold value. Then by
repeating this process and varying the threshold value, we can experimentally determine
the optimum threshold.
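The procedure above can be sketched as a leave-one-out simulation. This is a hedged sketch under the same assumptions as before (signatures as digraph-time lists, and a summed one-sample t score standing in for the t-test authentication); the helper names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def t_score(model, candidate):
    """Summed one-sample t statistics of a candidate signature against
    per-digraph collections of the user's timings."""
    return sum(abs(mean(ts) - t) / (stdev(ts) / sqrt(len(ts)))
               for ts, t in zip(model, candidate))

def error_rates(user_sigs, impostor_sigs, threshold):
    """False rejection rate (leave-one-out over the user's usable
    signatures) and false authentication rate (impostor signatures
    tested against a model built from all usable signatures)."""
    n_digraphs = len(user_sigs[0])

    # FRR: hold each legitimate signature out, model on the rest,
    # and count how often the held-out signature is rejected.
    rejected = 0
    for i, sig in enumerate(user_sigs):
        rest = user_sigs[:i] + user_sigs[i + 1:]
        model = [[s[d] for s in rest] for d in range(n_digraphs)]
        if t_score(model, sig) > threshold:
            rejected += 1
    frr = rejected / len(user_sigs)

    # FAR: model on all usable signatures, count impostors accepted.
    model = [[s[d] for s in user_sigs] for d in range(n_digraphs)]
    accepted = sum(1 for sig in impostor_sigs
                   if t_score(model, sig) <= threshold)
    far = accepted / len(impostor_sigs)
    return frr, far
```

Sweeping `threshold` over a range and averaging the resulting rates across volunteers traces out the trade-off from which an optimum can be read.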
3.4.2 Using Self-organising Maps for Identification
Self-organising maps (SOM), invented by Teuvo Kohonen, provide a way to represent mul-
tidimensional data in a lower number of dimensions while still preserving the topological
layout of the data. SOMs are notable for being an unsupervised learning algorithm: instead
of using target values or categories to guide the learning process, they use the structures and
relationships within the input data. The input data simply consists of a collection of vectors
with uniform length.
Figure 3.11: A diagram showing the components of a self-organising map.
As shown in figure 3.11, a SOM comprises a lattice of nodes, usually in two dimensions.
Each node holds a weight vector of the same dimension as the input vectors.
Initially all the weights in the network are randomised and the following steps are iteratively
applied:
1. A random vector is chosen from the input data.
2. Every node is compared to this input vector using the Euclidean distance between
the node's weights and the input vector. The node which is most like the given input
is marked as the best matching unit (BMU).
3. The neighbourhood of the BMU is calculated. This is a collection of surrounding nodes
that fall within a certain distance from the BMU. This distance - the radius of the
neighbourhood - is a value that starts large but decreases with each iteration.
4. Each of the nodes in the neighbourhood has its weights altered to make them more
like the input vector. Nodes that are closer to the BMU will have their weights
altered to a greater extent. The influence of the BMU is shown in figure 3.12.
Figure 3.12: A diagram showing the influence of a BMU within its neighbourhood.
For time step t, the radius R is calculated using the following formula:

R(t) = R_0 \exp\!\left(-\frac{t}{\lambda}\right) \qquad (3.2)
Here \lambda is a time constant that is calculated from i, the total number of iterations to perform,
and n, the radius of the lattice:

\lambda = \frac{i}{\log(n)} \qquad (3.3)
When shifting a node's weights towards an input vector, the amount adjusted is a proportion
of the difference between the input and the weight. This proportion is known as
the learning rate. To make this relationship follow a Gaussian curve, the influence of the
distance, \Theta, at a given point in time is modelled as:

\Theta(t) = \exp\!\left(-\frac{\mathrm{dist}^2}{2R^2(t)}\right) \qquad (3.4)
As we have just discussed, the influence on a node decreases the further it is from the
BMU. However, the learning rate, L, also decreases with respect to time, much in the same
way the radius does:

L(t) = L_0 \exp\!\left(-\frac{t}{\lambda}\right) \qquad (3.5)
Taking all of this into account, for an input vector V, the new value of a node's weight
vector is calculated as:

W(t + 1) = W(t) + \Theta(t)\,L(t)\,\big(V(t) - W(t)\big) \qquad (3.6)
Once the map has been trained, it can then be visualised. Each input vector can be
superimposed on the map, at the position of its best matching unit, revealing the structure
of the data.
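The full training loop, combining the four iterative steps with equations 3.2 to 3.6, can be sketched as follows. This is an illustrative implementation under stated assumptions (a square lattice, an initial radius of half the lattice size, and a fixed random seed), not the exact code used in this work.

```python
import math
import random

def train_som(data, size=10, iterations=1000, learning_rate0=0.5):
    """Minimal SOM training loop: pick a random input, find the best
    matching unit, then pull neighbouring weights towards the input
    with Gaussian falloff (eq. 3.4) and time decay (eqs. 3.2, 3.5)."""
    dim = len(data[0])
    rng = random.Random(0)
    # Lattice of nodes, each holding a randomised weight vector.
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(size) for y in range(size)}
    r0 = size / 2                        # initial neighbourhood radius
    lam = iterations / math.log(r0)      # time constant (eq. 3.3)

    for t in range(iterations):
        v = rng.choice(data)
        # Best matching unit: node with the smallest Euclidean distance.
        bmu = min(weights, key=lambda n: sum((w - x) ** 2
                                             for w, x in zip(weights[n], v)))
        radius = r0 * math.exp(-t / lam)              # eq. 3.2
        lrate = learning_rate0 * math.exp(-t / lam)   # eq. 3.5
        for node, w in weights.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            if d2 <= radius ** 2:
                theta = math.exp(-d2 / (2 * radius ** 2))  # eq. 3.4
                # Weight update (eq. 3.6).
                weights[node] = [wi + theta * lrate * (vi - wi)
                                 for wi, vi in zip(w, v)]
    return weights
```

After training, each input can be placed at its best matching unit to reveal the clustering structure of the data.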
4 Results
Now that we have collected some experimental data we shall analyse it. Each of the
following sections will investigate one of our three hypotheses. As our volunteers were
effectively users of our software, we shall use the terms volunteer and user interchangeably.
4.1 Passphrase Fragmentation
The first of our three hypotheses concerns the way in which a volunteer initially fragments
a passphrase into sections. To explore this we need a way to visualise the patterns
present in the first few experiments of each volunteer in such a way that they can be
examined and compared. Figure 4.1 shows our attempt at creating such a visualisation.
Figure 4.1: Visualisation showing the fragmentation patterns present when volunteers are
confronted with new phrases
It shows the fragmentation patterns of the first four entries for both of the two phrases,
for all volunteers. Each rectangle represents a passphrase entry. The vertical lines indi-
cate points at which the volunteer stopped entry to look at the passphrase. The spatial
position of each line indicates at what point in the entry this occurred, where the leftmost
side of the rectangle is the beginning of the passphrase, and the rightmost the end. Thus
the collection of lines inside a rectangle represents the fragmentation pattern of that given
entry. When this visualisation was first created, it did not include fragmentation patterns
present in failed entries. However, sometimes the most informative patterns are displayed
in failed entries, so they were introduced, but displayed using a different colour of line.
An initial examination of this visualisation reveals that volunteers use a wide variety
of fragmentation patterns. Many entries have no fragmentation at all, while others are
split into 4 pieces. This visualisation also reveals there is some degree of consistency be-
tween how they enter both phrases. In particular, it shows that fragmentation of the first
passphrase seems to be a strong indicator that the volunteer will fragment the second
passphrase. However, some volunteers (6,7,8) perform little to no fragmentation upon the
first phrase, but considerably fragment the second phrase. This could be due to the fact
that these volunteers had become used to the format of the experiment and were surprised
by the sudden change to a different passphrase, thus altering their behaviour.
While consistency between both phrases isn't perfect, there are plenty of examples where
a particular pattern is repeated in both passphrases. For instance, volunteer 15 has a dis-
tinct way of breaking the passphrase into two chunks. On the other hand, there are a
surprising number of volunteers who demonstrate very little fragmentation, so any per-
ceived consistency is simply the lack of any pattern. Interestingly, volunteers will often fail
entry one or more times, but still not resort to looking at the phrase. This is a surprising
behaviour, but it has unfortunately resulted in fewer fragmentation patterns than was hoped
for.
However, some of the failed attempts did yield patterns, and when an entry has patterns
for both failed and successful attempts, the two patterns tend to reinforce
each other. These can be seen as the purple lines on the visualisation. This demonstrates
how volunteers will repeat fragmentation strategies.
Another interesting pattern is that some volunteers demonstrate convergence of frag-
ments from one pattern to the next. This is in the sense that two separate fragments in
one phrase join to become a single fragment in the next. Good examples of this include
volunteers 1 and 6 on phrase one. This is revealing of how chunks of information are ma-
nipulated in memory during the learning process.
Figure 4.2: Graph showing the occurrence of different fragmentation patterns, taken from
the first entry of each phrase
The chart in figure 4.2 shows the number of different fragmentation patterns present
in the initial entry of each passphrase. Given 20 volunteers who each entered 2 passphrases,
we have 40 such patterns. Interestingly, the most common pattern is no pattern at all,
accounting for 40% of all initial entries. The second most popular behaviour is to split
the passphrase into two sections. Beyond this are a variety of other patterns. Generally,
it seems that longer patterns (more chunks) are less popular, and even-sized chunks are
preferred.
Some of the more unusual patterns could be due to the user getting familiar with our
experiment by exploring its functionality. If this is the case, then this is something we
should consider carefully for future experiments.
4.2 Learning & Stabilisation
Our second hypothesis focuses on the progression of the temporal signature as a
volunteer learns to type a passphrase. In particular, we are hoping to examine how long
it takes for them to form a stable entry, where they enter the passphrase in a consistent
way. To investigate this we shall consider entry duration, passphrase prompting behaviour
and digraph stability. As before, we do so using a mixture of visualisation and graphing
techniques.
4.2.1 Entry Duration
Because of its simplicity, time taken to enter a passphrase is a good first measure to
investigate when considering the learning process. As we explored previously, we expect
the entry time to decrease logarithmically, following what is known as the learning curve.
Figure 4.3 shows the duration of each entry averaged across all volunteers. As is indicated
by the logarithmic trend line, the entry time does decrease following a learning curve.
Figure 4.3: The average entry time for each entry
One interesting feature of figure 4.3 is that it shows an increase in the average entry
duration at entry 20, which is the point at which volunteers are no longer shown the phrase
by default, and must either recall it completely from memory, or elect to be prompted with the
passphrase once more. Interestingly, it looks like the pattern from entry 20 onwards might
follow another, smaller exponential curve. It is likely this represents volunteers' adaptation
to the change in the experiment.
Previously we mentioned that giving groups of volunteers different passphrases to learn
would allow us to see if the rate of learning is consistent between different random passphrases
of the same length. Figure 4.4 shows how the average duration of entry progresses for each
of the three phrase groups. On the whole, it seems to substantiate this idea as the three
series follow similar logarithmic curves. However, there is an interesting discrepancy for
the group fhtvlqwlesf on the second entry. Investigating this, it turns out there is an
anomalous signature, where one volunteer began entry, but for reasons unknown paused
for over 60 seconds before completing it.
Figure 4.4: Average entry time for each group
In terms of finding the point at which passphrase entry is performed in a stable fashion,
it seems that from approximately the 25th entry onwards, the entry duration is reasonably
consistent. While this is encouraging, one could argue that extrapolation of the trend line
indicates that the duration will keep reducing. However, due to the nature of logarithmic
curves, it is questionable whether any reduction in duration beyond this point will be
observable over noise in the data.
4.2.2 Passphrase Observation
Another way in which we can explore volunteers' learning process is by looking at how and
when they decide to observe the passphrase. As previously explained in detail, volunteers
can switch between entering and viewing the phrase, but not do both at the same time.
Initially the experiment defaults to displaying the passphrase, but from entry 21 onwards,
the experiment will default to not showing the passphrase and volunteers will have
to choose if they wish to reveal it again.
Figure 4.5: Visualisation of when and how volunteers elected to view the passphrase
The way in which volunteers view the passphrase is visualised in figure 4.5. This
visualisation is advantageous as it allows us to follow each volunteer's progress through the
different stages of the learning process. In this visualisation each row represents a volunteer
and each column represents a particular entry number. As there were two entries per
session, the entries are paired together in twos. Each rectangle indicates a single entry of
the passphrase. The colour of this rectangle represents how the volunteer decided to view
the passphrase. The significance of each colour is as follows:
• Red indicates they saw the passphrase at the beginning, and returned to look at it
again after they started typing.
• Orange indicates they looked at the passphrase at the start, and only the start.
• Pink indicates they looked at the passphrase, but not at the start; only after they
had begun typing it.
• Green indicates they completed entry without looking at the passphrase.
The blue line between entries 20 and 21 represents the point at which volunteers are no
longer prompted by default. It is worth noting that it is impossible for volunteers to get a
green mark before this point, as they will always be initially shown the passphrase.
There are several interesting trends displayed in this visualisation. The most general
pattern is the progression from frequently looking at the phrase, to looking at it periodi-
cally, to recalling it completely from long term memory.
Just from eyeballing the data, it seems that a little over half the volunteers managed
to really commit the phrase to memory. This can be judged by looking for consecutive
sequences of green boxes after approximately entry numbers 25 to 36. Interestingly, those
that repeatedly refer back to the phrase at around this point progress from red to orange
boxes (in the first half of the experiment) later than other volunteers. This implies that
they are further back in their learning progress and potentially learn this task at a slower
rate.
One volunteer is particularly anomalous as they never really seem to learn the phrase.
They consistently refer back to the prompt and only begin to show signs of learning it
near the very end. A possible explanation is that this volunteer simply formed a habit of
relying on being prompted, and felt little need to learn the passphrase.
Another common pattern displayed in this visualisation stems from the fact that each
session has two entries. Typically, in the first entry, the volunteer will look at the phrase
to remember it, and in the following second entry, they will enter it straight from working
memory. This pattern is particularly prominent in session 11, entries 21 and 22, where
an orange box is followed by a green one.
Looking at this visualisation reveals a possible limitation in the design of this experiment:
volunteers may have been able to enter the passphrase without looking
at it (which would be represented by a green box) before entry 21, but because the experiment always
showed them the phrase before this point, they were never offered the chance to.
In conclusion, this visualisation shows that most volunteers learn the passphrase in
somewhere between 20 and 25 entries. This is consistent with figure 4.3, which shows entry time
stabilising at around this point.
4.2.3 Travel-time Stability
While duration gives us a reasonable indicator of stability, it overlooks any of the nuances
of the temporal signatures. A better way is to examine the digraph travel times
between keys. To do this we came up with charts that show these digraph times for each
volunteer. The charts for all volunteers can be found in the appendix, but four particularly
interesting charts are shown in figures 4.6 and 4.7. These charts can be read as follows:
• Each series represents a certain digraph.
• Each point signifies that digraph's time in a certain temporal signature.
• A point's position on the y-axis signifies the time delay (gap) between the two letters
in that digraph.
• A point's position on the x-axis indicates which entry number the digraph belongs
to. Therefore, if you added all the times at a single x-axis position, you would get
the duration of that entry (minus dwell times).
• Each chart is scaled to 3000ms. This is because the dynamics we are interested in
are below 3000ms; it also allows us to compare different volunteers easily.
These charts are informative, because if a series line is straight, it implies that the di-
graph it represents is stable. Conversely, an uneven series line indicates instability.
Looking at figure 4.6, we can see that the digraph times for volunteer 5 are initially very
unstable. There are many high peaks on the chart, which are good indicators of where
the passphrase was broken into fragments. For this volunteer, entry does not become
particularly stable until about signature number 25. Interestingly, beyond this point digraph
5 is consistently larger than the others, a promising feature for identification. Volunteer 9,
also shown in figure 4.6, contrasts with volunteer 5. In particular, their entry is comparatively
much more stable, and becomes so from a much earlier point. In terms of identifying features,
digraph 2 seems to be consistently longer than the other digraphs.
Figure 4.7 juxtaposes two volunteers with very different temporal signatures. Volunteer
11 forms a stable signature very quickly and has digraph timings that are remarkably
consistent. On the other hand, the digraphs in the signatures from volunteer 15 are com-
paratively erratic. Volunteer 15 never really seems to learn the phrase, and nearly always
enters it in two chunks.
In conclusion, these charts indicate that after a certain number of entries, the digraph
timings converge downwards towards reasonably stable values. An interesting trend is that
digraphs that are originally on fragment boundaries (where the volunteer stops entry and
refers to the passphrase) tend to remain longer than other digraphs even when the phrase is
learnt and digraphs are relatively stabilised. In other words, the way in which a volunteer
initially fragments a passphrase seems to set a precedent for how they will eventually learn
to type it. Examples of digraphs that display this pattern include digraph 5 for volunteer 5,
digraph 2 for volunteer 9 and digraph 4 for volunteer 11. Generally, it seems that digraphs
inside the original fragments are consistently small, while digraphs that fall between
these fragments tend to be proportionally longer. We shall explore how this relates to
chunking in a later section.
Figure 4.6: Digraph stabilities for volunteers 5 & 9
Figure 4.7: Digraph stabilities for volunteers 11 & 15
4.3 Identification & Authentication
The last of our three hypotheses is concerned with the extent to which we can use the
idiosyncrasies and nuances in a volunteer's typing behaviour to distinguish them. We shall
investigate two very different techniques: Student's t-test and self-organising maps.
4.3.1 T-test Authentication
As was explored in the previous section, most people do develop a typing behaviour that
produces a stable temporal signature. Our first method of determining the usefulness of
these signatures is Student's t-test. To correctly model each user, we had to devise a way
of gathering usable signatures. It was found that signatures which the volunteer typed
in one go, without referring to the prompt, and which do not contain any digraphs greater than
1500ms, were suitable for this purpose. It is worth noting that one volunteer was dropped
from this section of the experiment for having too few usable signatures.
With the usable signatures from each volunteer, a series of authentication attempts was
simulated to find the optimal threshold for the t-test. Figure 4.8 shows the results of these
simulations.
Figure 4.8: A graph showing how changing the authentication threshold affects authentication
rates.
The three series on the graph show the average error rates for three distinct authentication
scenarios. The first series (blue) represents the false rejection rate (FRR) of legitimate
users attempting to authenticate. In this case the error rate is based on the number of times
users were falsely rejected. The second series (red) represents the false authentication rate
(FAR) of naive impostors who have not learned the phrase. The simulated attempts at
authentication were gathered from the final phase of the experiment, where each group
acts as an impostor for another group. The error rate is the number of falsely authenticated
impostors. The final series (yellow) represents the false authentication rate of learned
impostors. In this case the simulated attempts were gathered from other volunteers inside
the same group as the volunteer being tested.
As expected, increasing the threshold decreases the FRR but increases the FARs. The
optimal balance is where the sum of the FRR and the FAR is as low as possible.
However, this point varies depending on the kind of impostor. When the impostor is naive
and has not practised entering the phrase, the optimal threshold is around 74, with 6.6% of
legitimate users falsely rejected and 10% of naive impostors falsely authenticated. However,
if the impostor has learned the passphrase, then the optimal threshold is stricter,
averaging around 52. In this case 17% of legitimate users are falsely rejected and 16% of
trained impostors are falsely authenticated. This performance difference between the
calibres of impostor is to be expected. A more interesting trend is that as the threshold is
increased, the error rate for naive impostors seems to increase linearly, but the error rate
for trained impostors seems to follow a logarithmic curve. As this was surprising, we ran
the simulation again with higher threshold values. The resulting graph, displayed in figure
4.9, shows that the two false authentication rates do in fact both follow a curve, and that
they both eventually converge.
Figure 4.9: A graph showing false authentication rates for naive and trained impostors
converging
4.3.2 Self-organising Maps
Self-organising maps are our second technique for exploring the usability of signatures.
Given that SOMs are non-linear and allow unsupervised modelling, they contrast nicely
with t-test authentication and give us a qualitatively different approach. As SOMs preserve
the topological properties of their input, assessment of our data shall be performed by
visualising the output of each SOM. As with t-test authentication, we shall only include
usable temporal signatures (using the same criteria).
To get the best results with a SOM, there are several parameters that must be tuned.
The two most important of these are the learning rate and the initial neighbourhood
size, so the first goal was to find good values for these parameters. It was found that
increasing the learning rate increased the density of clusters in the map, and increasing
the initial neighbourhood size increased how spread out the map was within the available
space. It was determined that, depending on the input data, there were two combinations
of parameters that seemed to produce particularly good results: a learning rate of 0.8 with an
initial neighbourhood size of SOMSize/4, and a learning rate of 0.5 with an initial neighbourhood
size of SOMSize/3. Because these parameters could be pre-tuned in a production environment, and
for the sake of brevity, we shall present each SOM with its optimal parameters. All
SOMs in this section were trained over 1000 iterations.
In our visualisation, each node in the map's network is represented as a rectangle. Each
filled-in rectangle represents a particular passphrase entry, its colour representing which
volunteer the entry belongs to. Each of these passphrase entries is placed over the node
that its digraph timings are closest to (using Euclidean distance). It is worth reiterating
that as the SOM is being trained, it does not know which volunteer each entry belongs
to; it simply arranges them according to the topography of the input. Thus the horizontal
and vertical dimensions of the map do not correspond to any particular values. They are
actually an approximate 2-dimensional representation of the spatial arrangement of
the digraph vectors in higher-dimensional space.
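The superimposition step can be sketched directly. Here the trained map is assumed to be a dict from lattice coordinates to weight vectors, which is an assumed representation; the function name is illustrative.

```python
def place_entries(weights, entries):
    """Place each labelled entry at its best matching unit, returning a
    mapping from lattice node to the labels that landed there."""
    placement = {}
    for label, vector in entries:
        # Best matching unit: the node with minimal squared Euclidean
        # distance between its weights and the entry's digraph timings.
        bmu = min(weights,
                  key=lambda n: sum((w - x) ** 2
                                    for w, x in zip(weights[n], vector)))
        placement.setdefault(bmu, []).append(label)
    return placement
```

Colouring each placed entry by its volunteer then reveals whether that volunteer's entries occupy a contiguous region of the map.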
Our first SOM, shown in figure 4.10, displays the 10-dimensional travel-time digraphs
of all volunteers mapped into a 2-dimensional space. An initial look at this SOM
is fairly promising. While not all entries are located in meaningful positions and some
volunteers are scattered across the map, many of the entries do form distinguishable and
interesting patterns. For instance, entries of certain volunteers form very clear clusters,
while others occupy less dense but equally well-defined spaces. This implies that to be
useful, a collection of temporal signatures doesn't have to be clustered around a central
point. They can in fact vary to a greater extent than one might imagine and still remain
suitable for authentication, as long as they form contiguous regions in hyperspace. In other
words, the user can vary their input as long as the variations collectively form a gradual
continuum and are not extremely different. This is more apparent in the SOM of the
digraphs for passphrase group zmlntdzqdmg, pictured in figure 4.11, where one can see
several different volunteers forming well-defined, but not circular, regions.
Figure 4.10: Self-organising Map of the digraph times from all volunteers
Figure 4.11: Self-organising Map of the digraph times from volunteers in group zmlntdzqdmg
Figure 4.12: Self-organising Map of the digraph times from volunteers in group fhtvlqwlesf
Figure 4.13: Self-organising Map of the digraph times from volunteers in group vrfpebvvewe
The SOM of group fhtvlqwlesf, shown in figure 4.12, demonstrates reasonable separation
between the different volunteers. On the other hand, the SOM of group vrfpebvvewe, shown
in figure 4.13, demonstrates the weakest patterns of all three phrase groups. In particular,
volunteer 9 has signatures that fall right across the entire map. This is surprising, as
cross-referencing back to figures 4.5 and 4.6 indicates that this volunteer learns to type the
phrase quickly and enters it in a very stable fashion. One explanation is that their entry
of the passphrase is in fact too perfect and lacks any idiosyncratic features. This would
indicate that some users are inherently easier to identify than others.
In conclusion, SOMs demonstrate very promising results. Not only do they provide a
robust way of visualising higher-dimensional data, but the patterns shown are clear enough to
consider the maps worthy of solving the identification problem. Furthermore, the output of
SOMs could be used as the input for more conventional machine learning techniques such
as k-means clustering and neural networks. We shall explore these possibilities in a later
section.
4.3.3 Dwell Timings
Thus far the focus of this investigation has mainly been on the travel timings between
keystrokes, known as digraphs. One feature of the temporal signatures that we have
neglected up to this point is the duration each key is held down for, known as dwell. Because
some existing work has asserted that dwell times carry important biometric information
[13], we shall now take the time to explore this feature. To do this we shall repeat our
investigation of t-test authentication and self-organising maps, but this time, rather than
extract inter-key digraph times from the temporal signatures, we shall instead extract the
key dwell times. Other than this, we require no further adaptations to our methods, as
both digraph timings and dwell timings are simply vectors of numbers.
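The two feature extractions can be sketched side by side. This assumes keystrokes recorded as (key, press time, release time) tuples, which is an assumed representation; travel is measured release-to-press, consistent with the earlier remark that summing an entry's digraph times gives its duration minus the dwell times.

```python
def travel_times(events):
    """Digraph (travel) times: the gap between releasing one key and
    pressing the next; n keystrokes yield n - 1 values."""
    return [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]

def dwell_times(events):
    """Dwell times: how long each key is held down; n keystrokes
    yield n values."""
    return [release - press for _key, press, release in events]
```

For an 11-character phrase these give the 10-dimensional digraph vectors and 11-dimensional dwell vectors referred to in this section.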
Figure 4.14 shows the performance of t-test based authentication when using dwell times
as feature vectors. As before, we consider the FRR of legitimate volunteers and the two
FARs of naive and trained impostors. From this graph we can see that dwell times do
indeed seem to carry enough idiosyncratic information to give moderate authentication
performance. A threshold of 52 produces an FRR of 30% and yields 10% for both FARs.
However, unlike with digraphs, there is only a marginal difference between the FARs of the
two calibres of impostor, with naive impostors being only marginally easier to
correctly reject. In other words, learning the phrase seems to have little effect on the dwell
times. Furthermore, it seems much easier to exclude naive impostors with digraphs than
it is with dwells.
Figure 4.14: A graph showing how changing the threshold affects authentication rates when
using dwell times as features
Figure 4.15 shows a SOM that has been trained on the dwell times. Our technique is
the same as before, the only difference being that instead of using 10-dimensional digraph
vectors, we are instead using 11-dimensional dwell-timing vectors. As might be expected
from the results of the t-test authentication, dwell timings do indeed seem to carry enough
distinguishing information for a SOM to arrange the entries of many of the volunteers into
discrete regions. The cohesiveness of these regions is not perfect, but the patterns shown
in the map are far from random.
Figure 4.15: Self-organising Map of the dwell times from all volunteers
While this is a good indicator that dwell time is a useful feature, we feel that, due to
the nature of this experiment, perceived idiosyncratic features and distinguishable patterns
may in fact be attributable to the mechanical variations between the different keyboards
used by our volunteers. Variations in key thicknesses and key springing mechanisms could
well affect the dwell times captured by our experiment. However, as was seen in figure 4.14,
there is a very slight variation between the error rates of learned and naive impostors when
using dwell timings, so there is still a possibility that dwell times are useful.
5
Discussion
5.1 Evaluation of Hypotheses
5.1.1 How people intuitively fragment a passphrase
We hypothesised that due to variations in human chunking abilities, the way in which
volunteers fragment passphrases would vary between individuals. The evidence we collected
goes some way towards supporting this. What was surprising, however, is the number of
volunteers who did not perform any fragmentation at all. Although our dataset is small,
there was a variety of different fragmentation patterns among the volunteers who decided
to fragment the phrase. It was even apparent that some volunteers had particular strategies,
such as always dividing the phrase into two roughly equal-sized sections. Because we
built the experiment specifically to capture the way in which the user switches between
looking at the phrase and entering it, we were able to capture behaviour like this
particularly easily.
We also mentioned the possibility of using these behaviours as part of a user's biometric
profile. While our data does show some volunteers with consistent behaviour, many of
the volunteers behaved very differently on each passphrase they entered. To be more
conclusive, we would need a dataset that captured volunteers being introduced to many
more phrases.
In terms of the cognitive science of chunking, there are two interesting trends in the data.
Firstly, it appears that volunteers prefer to digest the passphrase in as few fragments
as possible. Given that fragments are a good indication of how the data is divided into
chunks in working memory, this implies that people have an aversion to needlessly creating
extra chunks in working memory. The second observation is that volunteers prefer chunking
patterns that create evenly sized chunks. This suggests that there is some cognitive
mechanism for efficiently dividing up data. This is precisely the kind of phenomenon
explored by Simon [14], who notes that we can increase the amount of information that
working memory contains simply by building larger and larger chunks, each chunk
containing more information than before.
To conclude, the hypothesis seems to be partially supported. While our results provide
insights that help us understand the passphrase learning process, they indicate that
fragmentation patterns have limited practical use for keystroke authentication.
5.1.2 When learning to type a passphrase, how long does it take
an individual to produce a stable signature?
We hypothesised that over the duration of the experiment the temporal signatures produced
by volunteers would converge to stable values. We feel that our data supports this
hypothesis. In particular, our use of charts to display the progression of digraph times
across the span of the experiment showed series lines that tend towards stable values. We
also verified that, for a random alphabetic phrase, it takes volunteers somewhere between
20 and 30 entries to reach a point at which entry is performed in a stable fashion. We
confirmed that during this process the average entry time follows a logarithmic learning
curve.
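This logarithmic fit is straightforward to check numerically. The sketch below fits t = a + b·ln(n) to a series of mean entry times by ordinary least squares on the log-transformed attempt number; it illustrates the idea rather than reproducing the exact fitting procedure used in our analysis:

```python
import math

def fit_log_curve(entry_times):
    """Least-squares fit of t = a + b*ln(n) to the mean entry time of the
    n-th attempt (n = 1, 2, ...). A negative b indicates entry times
    falling as practice accumulates, i.e. a logarithmic learning curve."""
    xs = [math.log(n) for n in range(1, len(entry_times) + 1)]
    ys = entry_times
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Standard simple-regression slope and intercept on (ln n, t) pairs.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b
```

A fitted slope close to zero over the last entries is one way to quantify the point at which a volunteer's entry has stabilised.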
When looking at the visualisation which depicted the volunteers' observation behaviour,
one could distinguish three distinct phases. Initially, volunteers look back and forth
between prompt and entry very often. We assume that at this stage the passphrase mostly
exists in working memory. Next, volunteers progress to the point where they have
mostly learned the passphrase, but still need to be prompted to assist in recalling it. This
indicates that, at this stage at least, the passphrase is encoded in short term memory.
Finally, towards the end of the experiment volunteers reach a point at which the
passphrase enters long term memory and typing it becomes an implicit motor task. One
limitation of the design of the experiment is that because it will always prompt volunteers
up until a certain point, it is hard to detect the point at which volunteers progressed
to this final stage.
Although more subtle than other patterns in the data, there was some evidence to suggest
that the way in which a user initially fragments a phrase sets a precedent for how they
will eventually type the passphrase. This is something that merits further investigation.
Out of all the people who took part, volunteer 14 was particularly fast and consistent
in how they learned to type the passphrase. In particular, they managed to complete
entry of the primary passphrase without ever having to fragment it. After the experiment
we talked to the volunteer, and they claimed to have a lot of experience working in retail,
where they frequently have to memorise the random sequences of numbers and letters on
barcodes. While anecdotal, this supports the idea that certain skills involving working
memory can be improved by practice. In this instance, it is quite possible that volunteer 14
has developed a learning schema that facilitates the memorisation of short alphanumeric
codes.
In conclusion, there is fairly good evidence for the progressive formation of stable typing
patterns. A glance at the digraph charts shown in the appendix reveals that many (but
not all) volunteers show this progression. Although the data is noisy, there are still distinct
patterns that show digraphs converging on certain values. Thus we consider this hypothesis
to be supported reasonably well.
5.1.3 Keystroke-biometric authentication
We initially hypothesised that it would be possible to use temporal signatures to distinguish
between individuals. As we researched and developed our techniques we realised that
this task can be split into two specific sub-problems: authentication and identification.
The first problem is that of authenticating a user. This involves making a binary decision
as to whether a candidate signature belongs to the profile of the user it claims to be. We
demonstrated that an adaptation of Student's t-test is reasonably effective at this
task. In doing so we noted that there is a trade-off between the rates of falsely
authenticated impostors and falsely rejected users. This is a parameter that could be tuned
as desired in a real-life system.
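Concretely, the scoring step can be sketched as below. The t statistic follows equation 3.1; the profile maps each digraph to its collection of stored timings, and we assume here that the candidate likewise supplies at least two timings per digraph so that both sample variances are defined (a simplification of the single-entry case):

```python
import math

def welch_t(a, b):
    """Two-sample t statistic from equation 3.1:
    t = (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b).
    Both samples need at least two values."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def similarity_score(candidate, profile):
    """Sum of |t| over all digraphs shared by the candidate and the stored
    profile. Both map a digraph (e.g. 'th') to a list of timings. Lower
    scores mean a closer match; a score below the chosen threshold
    passes authentication."""
    return sum(abs(welch_t(candidate[d], profile[d]))
               for d in candidate if d in profile)
```

Adjusting the pass threshold is then exactly the FRR/FAR trade-off described above: a lower threshold rejects more impostors but also more legitimate users.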
We also showed that the t-test is much better at detecting naive impostors than trained
ones. Much of the t-test's ability to catch out naive impostors comes from the fact
that they enter the passphrase in a slow and unsure way. Distinguishing between users
who have learned the same phrase is much harder, as the only features to work with are
subtle idiosyncrasies and small nuances. While in many ways this is a truer test of
distinguishing between people, in a practical authentication scenario it is unlikely
that the impostor will have really learned the phrase. Then again, if keystroke biometric
authentication systems gained popularity then this would not necessarily be the case.
The second problem is the task of identifying a user by matching a candidate temporal
signature to the correct owner in a collection of known user profiles. We demonstrated
that self-organising maps are reasonably capable of using digraphs to divide volunteers
into zones and clusters in the output space.
While our investigation stopped here, we think that identification should be possible
by finding the best matching unit (BMU) for a candidate passphrase and assigning it an
identity based on the owners of the signatures that surround the BMU. In the regions of
the SOM that are populated by more than one user, this technique should be able to
reflect the uncertainty by reporting that a signature matched more than one user.
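This identification step can be sketched as follows. It assumes the map has already been trained and each training signature recorded against its BMU; the voting radius and the tie-reporting behaviour are our own illustrative choices, not something we evaluated experimentally:

```python
from collections import Counter

def identify(candidate_bmu, node_owners, radius=1):
    """Identify the owner(s) of a candidate signature from the owners of
    map entries near its best matching unit. `node_owners` maps a grid
    position (x, y) to the labels of the training signatures that landed
    there. Returns all labels tied for the highest vote count, so regions
    shared by several users report the ambiguity rather than hiding it."""
    bx, by = candidate_bmu
    votes = Counter()
    for (x, y), owners in node_owners.items():
        if abs(x - bx) <= radius and abs(y - by) <= radius:
            votes.update(owners)
    if not votes:
        return []
    top = max(votes.values())
    return sorted(label for label, n in votes.items() if n == top)
```

A result containing more than one label is precisely the "matched more than one user" outcome described above.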
With refinement, SOMs could be used in conjunction with other machine learning methods.
For example, the output of a SOM could be used as the input for a clustering technique
such as k-means clustering. Ideally, centroids would converge on clusters and be able to
classify regions of the search space. However, some of the patterns are not circular, in
which case the use of k-means presents a bias mismatch. One method that may have
a more appropriate bias is back-propagating neural networks, which are good at finding
striped or blob-shaped regions.
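As a sketch of this follow-on step, plain Lloyd's-algorithm k-means could be run over the trained weight vectors of the map's nodes. The function below is illustrative only (we did not carry this out), and its centroid-based bias assumes roughly round clusters, which is exactly the mismatch noted above:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm) over a list of points, e.g. the
    trained weight vectors of a SOM's nodes. Initialisation and iteration
    count are illustrative choices."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            groups[i].append(p)
        # Move each centroid to the mean of its assigned points.
        for i, g in enumerate(groups):
            if g:
                centroids[i] = [sum(vals) / len(g) for vals in zip(*g)]
    return centroids
```

Each converged centroid would then label one region of the map, giving a crude classifier over the output space.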
The most uncertain aspect of this section of the experiment was our investigation into
the use of dwell timings for authentication. Our initial impression was that they provide
a useful feature which does not rely on exploiting whether or not someone has learned a
phrase. However, after some consideration we came to the conclusion that this difference
could well be caused by differences in manufacture between the volunteers' computer
keyboards. This is a variable which was left uncontrolled because we wanted to test our
authentication mechanisms with realistic data. The downside of this trade-off is that we
cannot conclude anything specific about this part of the experiment. If the field of keystroke
biometric authentication is to be successful, then the effect of computer keyboards on
temporal signatures is an investigation that must be carried out.
In conclusion, we feel that our hypothesis is supported from both the angles of
authentication and identification. Our novel use of SOMs has shed light on several ideas
which are worthy of further investigation.
5.2 Limitations
Although our experiment was largely successful, there were several miscellaneous elements
of our work that were limiting or could have been improved upon:
Firstly, while random alphabetic phrases are good for ensuring that every volunteer
is presented with a phrase that they have not seen before, in a realistic situation we
could expect to see a much wider variety of passwords. However, now that we have
random alphabetic phrases as a baseline for comparison, we could improve the
applicability of our conclusions by repeating the experiment with a variety of more
realistic password phrases.
While our data was sufficient for analysis, ideally the experiment would have lasted
longer and involved more volunteers. This would have allowed us to be more confident
about our hypotheses, especially our deductions relating to chunking. However, due to
the time constraints on this project this was not possible.
As discussed during the analysis of our hypotheses, our work has highlighted several
opportunities for further development and research. In particular, it was regrettable
that we did not manage to investigate self-organising maps further, as they show a lot
of potential for keystroke identification.
In terms of the experiment software itself, there were some problems with volunteers
being confused. These were all small issues, such as ambiguous fonts or slightly unclear
instructions. We managed to fix them by keeping in touch with the volunteers, but they
are things to correct if the software were ever to be used again.
We have already explained the limitations surrounding the use of dwell timings. Had
this not been as problematic, we would have explored combining digraph and key-dwell
timings.
5.3 Conclusion
In terms of the scope of our results, there are several elements of our work that apply to
the wider field. For instance, we demonstrated how the effectiveness of using digraph times
for authentication changes depending on the calibre of impostor. This is a practical insight
that is applicable to most work in keystroke authentication.
Another element of our work that supports the wider context of this field is our
demonstration of SOMs. We showed that SOMs are a promising and novel way of identifying
users. This is notable because, unlike many existing techniques that have been applied to
keystroke identification, SOMs do not require supervised training. Yet the fact that SOMs
managed to cluster typing data verifies that the idiosyncrasies in users' typing are enough
to distinguish between individuals.
Although much of the focus of our report was on investigating keystroke authentication,
our results also support several established theories in cognitive science. In particular, our
work exploring how users fragment passphrases fits in well with the ideas surrounding
chunking. This means that our deductions in this section should be applicable to many
other similar memorisation tasks.
To conclude, we have created robust web software that allows investigations into typing
dynamics to be carried out remotely across the internet. In terms of analysis, we feel that
we have demonstrated a fresh approach to investigating keystroke analysis that blends
together statistics, data visualisation and machine learning.
Bibliography
[1] R. Stockton Gaines, William Lisowski, S. James Press, and Norman Shapiro. Authentication
by keystroke timing: Some preliminary results. Technical report, DTIC Document, 1980.
[2] John Leggett, Glen Williams, Mark Usnick, and Mike Longnecker. Dynamic identity
verification via keystroke characteristics. International Journal of Man-Machine Studies,
35(6):859–870, 1991.
[3] Marcus Brown and Samuel Joe Rogers. User identification via keystroke characteristics
of typed names using neural networks. International Journal of Man-Machine Studies,
39(6):999–1014, 1993.
[4] Salil P. Banerjee and Damon L. Woodard. Biometric authentication and identification
using keystroke dynamics: A survey. Journal of Pattern Recognition Research, 7:116–139,
2012.
[5] Marino Tapiador and Juan A. Sigüenza. Fuzzy keystroke biometrics on web security. In
IEEE Workshop on Automatic Identification Advanced Technologies (AutoID 1999),
pages 28–29. Citeseer, 1999.
[6] R. S. McLean and L. W. Gregg. Effects of induced chunking on temporal aspects of
serial recitation. Journal of Experimental Psychology, 74(4):455–459, 1967.
[7] Alison K. Banikowski and Teresa A. Mehring. Strategies to enhance memory based on
brain-research. Focus on Exceptional Children, 32(2):1–16, 1999.
[8] Louis E. Yelle. The learning curve: Historical review and comprehensive survey. Decision
Sciences, 10(2):302–328, 1979.
[9] Alexander De Luca, Roman Weiss, and Heinrich Hussmann. PassShape: Stroke based
shape passwords. In Proceedings of the 19th Australasian Conference on Computer-Human
Interaction: Entertaining User Interfaces, pages 239–240. ACM, 2007.
[10] William Lowe Bryan and Noble Harter. Studies in the physiology and psychology of
the telegraphic language. Psychological Review, 4(1):27, 1897.
[11] Clayton Epp, Michael Lippold, and Regan L. Mandryk. Identifying emotional states
using keystroke dynamics. In Proceedings of the 2011 Annual Conference on Human
Factors in Computing Systems, pages 715–724. ACM, 2011.
[12] Shrijit S. Joshi and Vir V. Phoha. Competition between SOM clusters to model user
authentication system in computer networks. In Communication Systems Software and
Middleware, 2007. COMSWARE 2007. 2nd International Conference on, pages 1–8.
IEEE, 2007.
[13] John A. Robinson, V. W. Liang, J. A. Michael Chambers, and Christine L. MacKenzie.
Computer user verification using login string keystroke dynamics. Systems, Man and
Cybernetics, Part A: Systems and Humans, IEEE Transactions on, 28(2):236–241, 1998.
[14] Herbert A. Simon. How big is a chunk? Science, 183(4124):482–488, 1974.
6
Appendices
6.1 Digraph Timing Charts