
International Journal of Pure and Applied Mathematics

Volume 118 No. 5 2018, 769-783


ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu
Special Issue
ijpam.eu

A keystroke dynamic based biometric for person authentication


R. Abinaya1, Dr. AN. Sigappi2
1Research Scholar, 2Associate Professor
Department of Computer Science & Engineering, Annamalai University,
Chidambaram, Tamil Nadu, India
Abinayamalar@gmail.com, aucse_sigappi@yahoo.com

ABSTRACT
Keystroke dynamics is a vital biometric solution for person authentication. Based upon keystroke dynamics, this paper designs and
develops an online system, collects two public databases to support research on keystroke authentication, exploits the Discrete Cosine
Transform (DCT) to characterize keystroke dynamics, and reports results on the BeiHang keystroke dynamics databases for three popular
classification algorithms: the random forest classifier, the support vector machine, and the random tree classifier.

Keywords: Keystroke Dynamics, Biometrics, DCT, SVM, Random forest classifier, Random Tree Classifier, ROC.

I. INTRODUCTION
Today there is a major threat of misappropriation of personal as well as official data. Data that an unauthenticated user can easily
reach on a storage device is a major concern that needs to be eliminated. Dependency on computer systems and networks has undergone
a sea change. Over the years this dependency has grown tremendously, as computers now pervade every field, e.g. education, business,
and hospitality. Essential information such as criminal and medical records, personal letters, and bills is stored on computer devices.
The main issue is the security of these systems, and ensuring security requires user authentication, which means confirming the facts
given by a particular user. Authentication is the method of determining that a person is, in fact, who he or she claims to be.
The growing number of software tools and devices for hacking and cracking increases unauthorized access, which results in
manipulation of important data. Methods such as a user ID and password, which are the most commonly used security measures, are no
longer reliable and secure because of the rapid increase in hackers and crackers. This method also no longer provides consistent
security because passwords are prone to shoulder surfing and can be hacked. To gain secure and efficient access, a user must either
change the password frequently or use a strong password (a combination of letters, digits, and special symbols). Users do not respect
these conditions because they find them strict and difficult to apply. A solution to these problems is keystroke dynamics. Keystroke
dynamics is a behavioral biometric approach for strengthening computer access control. It verifies an individual by his or her typing
pattern. Keystroke biometrics is based on the assumption that the typing pattern of each user is unique. The objective of this review
paper is to summarize the well-known approaches used in keystroke dynamics [1].
Approaches of User Authentication:
1. Object Based: In this approach, a smart card or some key must be presented to recognize the user. Strong user identification
techniques, such as combining a password with the token, are needed to use them. The approach becomes more awkward if users forget
details such as PINs or make errors, and if the card gets locked after some number of incorrect attempts.
2. Knowledge Based: Passwords are the most commonly used tools for access, but the flaw is that individuals generally choose very
easy passwords that can be guessed or hacked by computer-literate professionals, so even a hacker can appear to be a genuine user.
3. Biometric Based: In this kind of system, the user has to present an individual physical characteristic such as a fingerprint. This is
the most accurate and suitable method, because a biometric is directly linked to and dependent on the user, while a password or token
can be used by some other person. This method is almost perfect, as it has no loopholes such as forgetting the password or carrying
the card [1].
What is Biometrics? Traditionally, authentication measures rested upon tools such as passwords and PINs. The main defect of these
methods is that they do not identify the individual; they only check the ability to supply the information demanded. A transition is
taking place in the corporate, education, and defence sectors, among others, where biometric systems have been introduced. A biometric
system compares the information that the user gives during login with a database of previously recorded information. It identifies the
person by recorded attributes, so it does not depend on the person's prior knowledge. Biometric authentication relies on physical
characteristics of the user. It falls into two distinct categories [2]: physiological biometrics and behavioural biometrics.
Physiological Biometrics: These are features that describe who the user is based on physical attributes, e.g. fingerprints and iris and
retina scanning. Additional hardware is required for this.


Behavioural Biometrics: These are based on typing pattern, voice recognition, and signature style. Behavioral characteristics can be
collected without any extra hardware [3]. This study focuses on a behavioral biometric technique, namely keystroke dynamics.

General Description of Biometrics Technologies:


1. Fingerprints: Fingerprint systems analyse the patterns found on fingertips. User identification is achieved by Automated
Fingerprint Identification Systems (AFIS) and fingerprint recognition systems.
2. Hand Geometry: It evaluates unique hand characteristics such as the length of the fingers, their thickness, and surface area. It is a
high-cost but highly accurate method. It is used as an entry method for secure areas of airports, hospitals, and government agencies.
3. Face Recognition: Face recognition analyses unique facial characteristics of the eyes, nose, lips, etc. A digital camera with motion
detection is needed to capture an image of the face. It is used for public applications such as ATMs or license verification.
4. Voice Recognition: These systems capture unique voice features such as pitch, tone, and frequency to authenticate a user. They are
used in telephony applications that allow users to log into financial and other systems.
5. Retina Scanning: This method identifies the unique blood vessel pattern at the back of the eye by directing low-intensity infrared
light through the pupil. Retinal scanning can be quite accurate. It requires the user to focus on a particular point.
6. Iris Scanning: Iris scanning analyzes the pattern of flecks on the iris, which is on the surface of the eye. It is also an expensive
method.
7. Hand Signature: Signature verification analyzes the way a user signs his or her name. Signing features such as speed, velocity, and
pressure are considered.
8. Keystroke Dynamics: This method analyzes the way a user types at a terminal by monitoring the keyboard input. Since the input
device is the existing keyboard, this approach is not expensive.
Overview of Keystroke Dynamics
This method focuses on the typing pattern of a user at a terminal and evaluates the input to identify a habitual typing rhythm.
Keystroke features are usually obtained from the timing of key-down, key-hold, and key-up events. The technique is known by different
names, such as typing biometrics and typing rhythms. The main advantage of keystroke dynamics is that it does not require any extra
hardware [4]. The two basic features used for keystroke dynamics are key hold time and inter-key time.

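[Figure: timeline of two consecutive keystrokes with press times t_p1, t_p2 and release times t_r1, t_r2, annotated with Hold, RP-Latency, and PP-Latency.]

Fig. 1. Keystroke dynamics events (press and release) and three of its most popular features: Hold Time, Release-Press (RP) Latency, and
Press-Press (PP) Latency.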
Dwell time is the duration that a key is held down.
Flight time is the duration between releasing a key and pressing the next key.
Dwell Time (DT): Dwell time, also known as key hold time, is how long a key is held down, i.e. the amount of time between
pressing and releasing a single key.
Flight Time (FT): Flight time is also known as latency time, inter-key time, or interval time. It refers to the amount of time
between releasing one key and pressing the next of two successive keys. It involves key events (press or release) from two keys,
which could be the same or different characters. When typing a text, flight time and dwell time are unique for each user and are
independent of overall typing speed. This is an important factor that is directly related to user acceptance of the technology.
The technology should offer the user as much comfort and transparency as possible by not burdening the user with long
inputs, memorization of complex strings, or large amounts of repetitive input.
ppTime (PP): the latency between the presses of two successive keys;


rrTime (RR): the latency between the releases of two successive keys;
prTime (PR): the duration from the press of one key to the release of the other;
rpTime (RP): the latency from the release of one key to the press of the other;
vector (V): the concatenation of the four previous timing values.
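The four latencies above can be computed directly from per-key press and release time stamps. The following is a minimal sketch, assuming time stamps in one consistent unit and reading PR as the time from the press of one key to the release of the next. The function name `latency_features` and the sample values are hypothetical and do not come from the paper.

```python
def latency_features(press, release):
    """Compute PP, RR, PR and RP latencies for successive key pairs.

    press[i] and release[i] are the press and release time stamps of the
    i-th key (hypothetical input layout, any consistent time unit).
    """
    if len(press) != len(release):
        raise ValueError("press and release must have the same length")
    pairs = range(len(press) - 1)
    pp = [press[i + 1] - press[i] for i in pairs]      # press to next press
    rr = [release[i + 1] - release[i] for i in pairs]  # release to next release
    pr = [release[i + 1] - press[i] for i in pairs]    # press to next release (assumed reading)
    rp = [press[i + 1] - release[i] for i in pairs]    # release to next press
    # V concatenates the four latency lists, following the definition of vector (V).
    return {"PP": pp, "RR": rr, "PR": pr, "RP": rp, "V": pp + rr + pr + rp}


# Made-up time stamps (milliseconds) for three keys.
features = latency_features(press=[0, 150, 320], release=[90, 260, 400])
print(features["PP"], features["RP"])
```

With n keys, each latency list has n - 1 entries, so V has 4(n - 1) entries.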
Objectives of this Project Work:
To use keystroke dynamics as a biometric trait for person authentication and to analyse the performance of various classifiers in
authentication experiments on selected biometric datasets.
Advantages of Keystroke Dynamics
Compared with written signatures, a typing pattern cannot be reproduced. Most security systems allow a limited number of
incorrect attempts; after a few incorrect attempts they block the account.
Compared with physiological biometric systems such as fingerprint and iris detection, keystroke dynamics does not require any extra
hardware. Thus implementation and deployment cost is low.

2 RELATED WORKS:
Since the concept of keystroke dynamics was introduced, the field has advanced considerably and several techniques have come
into existence. They are described below in detail, with their strengths and limitations:

N. Chourasia Nandini (2014) introduced keystroke dynamics as an additional layer of security for user authentication. The scheme can be implemented on Android phones or any other smartphones through which the Internet is accessible and online transactions can be performed. A data set was collected to measure performance, an evaluation procedure was developed, and a mathematical model was presented before implementation.
A. K. Hussain and M. M. Alnabhan (2014) presented an advanced keystroke authentication model with increased strength. The keystroke structure involved two components: first, the deviation in the user's typing time, and second, a unique user secret code. The system addressed the problem of large deviations in keystroke dynamics and provided an improved level of keystroke authentication [6].
K. Senathipathi and Krishnan Batri (2014) presented a comparative analysis of Particle Swarm Optimization (PSO) and the Genetic Algorithm with respect to keystroke dynamics. Feature selection in the proposed method uses the PSO algorithm. Typing rhythms are the rawest form of data stemming from the interaction between users and computers; when sampled and analyzed, they may become a useful tool to ascertain personal identity [7].
T. Maheswari and S. Anitha (2014) introduced a novel approach to authentication based on a biometric characteristic, namely the keystrokes of the password entry. The authors considered three phases: fingerprint, login credentials based on username and password, and keystroke dynamics. Two stages were also considered, training and testing: training was carried out during enrolment and testing during the verification period [8].
D. Rudrapal, S. Das, and S. Debbarma (2014) combined different metrics and computed the degree of disorder of keystroke latency, aiming to make authentication more secure than the usual password used in both offline and online transactions. Using empirical data, the authors argue that keystroke latency and duration alone are inadequate for user authentication, which motivates exploring other metrics. They combine different metrics and calculate the degree of disorder on keystroke latency as well as duration to generate a user profile, and evaluate the enhanced authentication process by statistical analysis of these metrics. The proposed method showed an FRR of 8% and an FAR of 2%, which improved on existing keystroke dynamics authentication results [9].
A. Ahmed and I. Traore (2014) presented a new approach to free-text analysis of keystrokes that combined monograph and digraph analysis. A neural network was used to predict missing digraphs based on the relation between the monitored keystrokes. The heterogeneous experiment involved 53 users, while the follow-up experiment in a homogeneous environment considered only 17 volunteers. The results were promising, with reduced error rates [10].
J. V. Monaco, N. Bakelman, S.-H. Cha, and C. C. Tappert (2014) developed and evaluated a new classification algorithm with a reduced error rate. The average and standard deviation of dwell and flight times were used for feature extraction. The vector-difference authentication model turned a multi-class problem into a two-class problem and was used for classification. Performance was characterised by Receiver Operating Characteristic (ROC) curves [11].
M. Rybnik, M. Tabedzki, M. Adamski, and K. Saeed (2013) used non-fixed text of various sizes for efficient user authentication with keystroke dynamics. The approach was tested on a small sample of users, with data collected through a web-based application over the Internet. Nine individuals participated, which corresponds to the conditions of use of a computer system in a home or small business. Each individual typed a long text of more than 250 characters twice in each of five sessions, giving ten samples per person. The authors stated that keystroke dynamics is a valuable biometric approach for user authentication that can be achieved with only a simple classifier and a few keystroke features [12].
S. I. Hassan, M. M. Selim, and H. Hala (2013) implemented a keystroke biometric system that addressed the problem of variations in samples by using an adaptive threshold. The proposed system was evaluated on the CMU dataset, and a new dataset was also created for the study. Four distance-based algorithms were implemented. The standard deviation of the training samples was also taken into account, and the Manhattan distance including standard deviation gave accurate results [13].
A. Schclar, L. Rokach, A. Abramson, and Y. Elovici (2012) proposed a novel method for authenticating users at login, based on two approaches. The first, called Cluster Representative, used a unique user as a representative of each cluster. The second, called Inner Cluster Representative, selects as representative the user whose biometric profile is the most similar to that of the examined


user [14].
Z. Syed, S. Banerjee, Q. Cheng, and B. Cukic (2011) presented a study of users' habituation to new passwords. They stated that when a simple password is entered, successive trials do not differ much, and there is no significant habituation improvement in total password typing time. For complex passwords, in contrast, there was significant variance in the habitual pattern as well as in total keystroke timings [15].
M. Karnan, M. Akila, and N. Krishnaraj (2011) discussed various feature extraction and classification methods related to keystroke dynamics, with a clear explanation of the statistical methods, pattern recognition algorithms, and neural networks that can be used for classification. Short sample texts were considered for the analysis [16].
A. Kolakowska (2011) compared the performance of a relative method, an absolute method, and a method based on feature distribution parameters for user authentication. A web application was created to collect training data, and a file was used to store keystroke features such as key hold time and key release time. The feature vectors representing the key timings comprised different values. With this modification, the quality of the methods improved compared with previous methods [17].
P. S. Teh, A. B. J. Teoh, C. Tee, and T. S. Ong (2010) proposed an approach to strengthen password authentication systems by incorporating multiple kinds of keystroke dynamics information. A two-layer fusion structure was proposed to combine this information. For feature matching, two methods, a Gaussian probability density function and a direction similarity measure, were proposed, and six fusion rules were employed for the case in which an authentic user cannot type at the usual speed because of a hand injury. The system obtained very good results [18].
Akash Sanghi and Y. D. S. Arya (2017) proposed a security system that analyzes user typing behavior and discriminates between authentic and non-authentic users. The approach provides a high level of security and is cost effective because no extra hardware is required. Two features, key hold time and key interchange time, were considered. Samples from both authentic and non-authentic users were recorded, and a neural network was trained on them. Finally, the user is authenticated on the basis of his or her typing pattern [19].
Jing and Zhang [20] proposed an approach to find discriminant bands (groups of coefficients) in the transformed space. Their approach searches for discriminant coefficients in the transformed space group by group. In this case, a discriminant coefficient that lies beside non-discriminant coefficients in a group may be lost [21].


3 PROPOSED RESEARCH WORK


In this project, persons are authenticated by keystroke dynamics using the BeiHang keystroke dynamics datasets, with a DCT
feature vector and random forest, random tree, and SVM classifiers.
The BeiHang Keystroke Dynamics Database
We have obtained two databases by using the proposed embedded system and the online system respectively. They can be used
by researchers to test their algorithms and eventually boost the development of keystroke dynamics. There are 209 subjects involved in
building the databases. It should be noted that 10 subjects of Dataset A of Database 2 are also in Dataset B of Database 2. The first
database, named BeiHang Keystroke Dynamics Database 1, was captured by the online system, and the second one, named BeiHang
Keystroke Dynamics Database 2, was collected from the embedded system. The databases contain registration data from genuine users
used as training samples, log-in data from genuine users, and log-in data from intruders. All the data are stored in text format; they can
be downloaded at.
In each folder of Database 1, the single training file contains 4 or 5 registration samples, and the file name is in the format of,
say, [12345] (-regliaoxiaoying).txt, meaning that this is the training file for ID 12345 and password liaoxiaoying, with
reg being the label of the file. All the testing files have the same format: [Year-Month-Day Hour.Min.Sec]ID(-loginPSW)_IsGenuine_IsPostive.txt,
where IsGenuine = 0 or 1 indicates data from a genuine user or an intruder, and IsPostive = y or n
indicates positive data from a user or negative data from an intruder. For example, the testing file [2009-12-30
14.07.01]12345(-loginliaoxiaoying)_1_n.txt indicates that the login time is 2009-12-30 14.07.01, the ID is 12345, the PSW is liaoxiaoying,
and it is negative data from an intruder.
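The testing-file naming scheme described above can be unpacked with a regular expression. This is a minimal sketch based only on the single example file name given in the text; the pattern, the function name `parse_test_filename`, and the returned keys are assumptions, and the two flags are returned as raw values without interpretation.

```python
import re
from datetime import datetime

# Pattern inferred from the example name in the text (an assumption, not an
# official specification of the database layout).
TEST_NAME = re.compile(
    r"^\[(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}\.\d{2}\.\d{2})\]"
    r"(?P<user_id>\d+)\(-login(?P<password>[^)]*)\)"
    r"_(?P<is_genuine>[01])_(?P<is_positive>[yn])\.txt$"
)


def parse_test_filename(name):
    """Split a Database 1 testing-file name into its fields."""
    match = TEST_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized testing-file name: {name!r}")
    return {
        "login_time": datetime.strptime(match["stamp"], "%Y-%m-%d %H.%M.%S"),
        "user_id": match["user_id"],
        "password": match["password"],
        "is_genuine_flag": int(match["is_genuine"]),  # raw IsGenuine field
        "is_positive_flag": match["is_positive"],     # raw IsPostive field
    }


info = parse_test_filename("[2009-12-30 14.07.01]12345(-loginliaoxiaoying)_1_n.txt")
print(info["user_id"], info["password"], info["login_time"].isoformat())
```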
The file names in Database 2 have been simplified. The folders are named by PSW or by the time when the data were collected.
In each folder, [.txt] stores the genuine user's registration data. All testing files are in the form time-index_IsGenuine_IsPostive.txt.
The BeiHang Keystroke Dynamics Database 1 includes 1902 test samples and 477 training samples from 117 subjects. Database 1
is divided into two subsets, Dataset A and Dataset B, collected in two different environments. Dataset A was collected
in an Internet cafe. It contains 49 subjects, 212 training samples, 157 testing samples from genuine users, and 996 testing samples from
intruders, as shown in Table 1. The developed commercial system was embedded into the login system of an online application.
Dataset B of Database 1 was collected online in a university lab. It contains 68 subjects, 265 training samples, 214 testing samples from
genuine users, and 535 testing samples from intruders. The BeiHang Keystroke Dynamics Database 2 was collected by the embedded
system and contains 5089 test samples and 478 training samples from 92 subjects. Datasets A and B of Database 2 are released for
research purposes. Dataset A of Database 2 contains 52 subjects, 228 training samples, 717 testing samples from genuine users, and
1468 testing samples from intruders. Dataset B of Database 2 contains 50 subjects, 250 training samples, 1103 testing samples from


genuine users, and 1801 testing samples from intruders. The details are given in Table 1. It is worth noting that 10 subjects
appear in both of these subsets, which contain data of stable typing rhythms.
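The per-dataset counts quoted in the text can be cross-checked against the stated database totals. The sketch below only re-adds the numbers given in the paragraphs above; the `Subset` class and the `totals` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    subjects: int
    training: int
    genuine_tests: int
    intruder_tests: int

    @property
    def tests(self):
        return self.genuine_tests + self.intruder_tests


# Counts as stated in the text for each dataset.
DATABASE_1 = {"A": Subset(49, 212, 157, 996), "B": Subset(68, 265, 214, 535)}
DATABASE_2 = {"A": Subset(52, 228, 717, 1468), "B": Subset(50, 250, 1103, 1801)}


def totals(database, shared_subjects=0):
    """Sum the subsets, counting subjects present in both subsets only once."""
    parts = database.values()
    return {
        "subjects": sum(p.subjects for p in parts) - shared_subjects,
        "training": sum(p.training for p in parts),
        "tests": sum(p.tests for p in parts),
    }


print(totals(DATABASE_1))                      # 117 subjects, 477 training, 1902 tests
print(totals(DATABASE_2, shared_subjects=10))  # 92 subjects, 478 training, 5089 tests
```

Together the two databases cover 117 + 92 = 209 subjects, which matches the figure stated in the text.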

Table 1: Description of dataset A and dataset B

                        DATASET A    DATASET B
Number of intruders     816          365
Number of training      417          685
Number of users         198          551
Total                   1428         1601

All the data in these datasets are as originally collected, without any manual modification. Generally, a password is represented by
the stream P1, R1, P2, R2, ..., Pn, Rn, where Pi and Ri represent the press and release times of the ith keystroke of the password.
The meaning of each file is indicated by its file name.

Database Access

To download the databases for research purposes, one can visit http://www.vmonaco.com/keystrokes (the BeiHang Keystroke
Dynamics datasets) or send an email to the corresponding author.

3.1 Benchmark Algorithms

The framework of our Keystroke Dynamics System is shown in Fig. 2. Feature extraction and the classification algorithm are the
main components and are discussed in detail in the following sections.

Fig. 2: Framework of the Keystroke Dynamics System

3.1.1 Feature Extractions


Suppose a password is represented by the following sequence:
$$P_1, R_1, P_2, R_2, \ldots, P_n, R_n \qquad (1)$$
where $P_i$ and $R_i$ represent the press and release times of the $i$th keystroke of the password.
The elements of the feature vector extracted from the original keystroke information fall into two categories:
dwelling time and flight time. The dwelling time is calculated as $R_i - P_i$, and the flight time as $P_i - R_{i-1}$.
Therefore, the feature extracted from the original sequence is represented as:
$$I = (R_1 - P_1,\; P_2 - R_1,\; R_2 - P_2,\; \ldots,\; P_n - R_{n-1},\; R_n - P_n). \qquad (2)$$
The above feature is also called the original feature. The number of registration samples collected in the training
procedure is 4 or 5.
In signal processing and object recognition, transformed features such as the Fast Fourier Transform (FFT), Discrete Cosine
Transform (DCT), and Gabor wavelets are extensively studied. This paper investigates these three transformation methods to further


enhance the performance of keystroke dynamics systems that use the original feature directly. Unlike most previous works,
which focus on classifier design, this paper designs better features to improve performance.
DCT (Discrete Cosine Transform):
DCT feature extraction approaches are considered and a new efficient approach is proposed. DCT feature extraction consists
of two stages. In the first stage, the DCT is applied to the entire data to obtain the DCT coefficients; in the second stage, some of the
coefficients are selected to construct feature vectors. The dimension of the DCT coefficient matrix is the same as that of the input. In
fact, the DCT by itself does not decrease the data dimension; rather, it concentrates most of the signal information in a small percentage
of the coefficients. Approaches based on the discrete cosine transform (DCT) [21–28] are important members of this class. Some special
properties of the DCT make it a powerful transform in image processing applications, including face recognition. The DCT is very close
to the Karhunen-Loève transform (KLT) and has a strong ability for data decorrelation [26]. There are fast algorithms for DCT
realization, which is not the case for the KLT. Combining statistical and deterministic transforms yields a third type of feature
extraction approach with the advantages of both. In this type,
DCT reduces the dimension of data to avoid singularity and decreases the computational cost of PCA and LDA. Chen et al. After
applying the DCT to an image, some coefficients are selected and the others are discarded to reduce the data dimension. The selection
of the DCT coefficients is an important part of the feature extraction process. Most approaches that use the DCT pay too little
attention to coefficient selection (CS). The coefficients are usually selected with conventional methods such as
zigzag or zonal masking. These conventional approaches are not necessarily efficient for all applications and all databases.

In general, the DCT coefficients are divided into three bands (sets), namely low frequencies, middle frequencies, and high
frequencies. Low frequencies are correlated with the illumination conditions, and high frequencies represent
noise and small variations (details). Middle-frequency coefficients contain useful information and form the basic structure of the
image. From the above discussion, it seems that the middle-frequency coefficients are more suitable candidates in face recognition.
Although coefficient selection, the second stage of feature extraction, is an important part of the process and
strongly influences recognition accuracy, it has received only scant attention.

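DCT and coefficient selection: For M × N data, where each sample corresponds to a 2D matrix, the DCT coefficients are
calculated as follows:

$$F(u, v) = \frac{1}{\sqrt{MN}}\,\alpha(u)\,\alpha(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \cos\left(\frac{(2x+1)u\pi}{2M}\right) \cos\left(\frac{(2y+1)v\pi}{2N}\right) \qquad (3)$$

where $u = 0, 1, \ldots, M-1$ and $v = 0, 1, \ldots, N-1$, and

$$\alpha(\omega) = \begin{cases} \dfrac{1}{\sqrt{2}}, & \omega = 0 \\ 1, & \text{otherwise} \end{cases} \qquad (4)$$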

$f(x, y)$ is the object intensity function and $F(u, v)$ is a 2D matrix of DCT coefficients. The DCT can be implemented block by block
or over the entire image. The entire-data DCT is used in this paper: the DCT is applied to the entire data to
obtain a frequency coefficient matrix of the same dimension. The DCT is mainly used in this paper to change the dimension and obtain
better results.
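As an illustration of the two stages (transform, then coefficient selection), the sketch below applies a one-dimensional analogue of Eq. (3) to a timing vector. Treating the keystroke feature as a 1-D signal, the function names `dct_1d` and `select_coefficients`, and the choice of keeping the first k coefficients are all assumptions made for this example; the paper states the 2-D form and does not specify its selection rule.

```python
import math


def dct_1d(values):
    """1-D DCT using the scaling of Eq. (3) reduced to a single dimension:
    F(u) = alpha(u) / sqrt(M) * sum_x f(x) * cos((2x + 1) * u * pi / (2M)).
    """
    m = len(values)
    if m == 0:
        raise ValueError("values must not be empty")
    coefficients = []
    for u in range(m):
        alpha = 1 / math.sqrt(2) if u == 0 else 1.0  # Eq. (4)
        total = sum(
            value * math.cos((2 * x + 1) * u * math.pi / (2 * m))
            for x, value in enumerate(values)
        )
        coefficients.append(alpha * total / math.sqrt(m))
    return coefficients


def select_coefficients(coefficients, k):
    """Second stage: keep the first k coefficients (hypothetical selection rule)."""
    return coefficients[:k]


# Made-up dwell/flight values; the transform output has the same length as the input.
timing = [110.0, 160.0, 70.0, 200.0, 90.0]
spectrum = dct_1d(timing)
print(len(spectrum), select_coefficients(spectrum, 3))
```

The output has as many coefficients as the input has values, which reflects the point made above that the DCT alone does not reduce dimension; the reduction happens only in the selection step.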

3.2 Feature Vector:


The password is represented as the following sequence:
$P_1, R_1, P_2, R_2, \ldots, P_n, R_n$
where $P_i$ and $R_i$ represent the press and release times of the $i$th keystroke of the password. The elements of the feature vector
extracted from the original keystroke information fall into two categories: dwelling time and flight time. Dwelling time is
calculated as $R_i - P_i$ and flight time as $P_i - R_{i-1}$.

3.2.1 Feature Vector Extraction:


Therefore, the feature extracted from the original sequence is represented as
$I = (R_1 - P_1,\; P_2 - R_1,\; R_2 - P_2,\; \ldots,\; P_n - R_{n-1},\; R_n - P_n)$
Example:
A password that yields 12 time stamps has 12/2 = 6 keystrokes, so n = 6. For i = 1 the feature is $R_1 - P_1$; for i = 2,
$P_2 - R_1$ and $R_2 - P_2$; for i = 3, $P_3 - R_2$ and $R_3 - P_3$; for i = 4, $P_4 - R_3$ and $R_4 - P_4$; for i = 5, $P_5 - R_4$ and
$R_5 - P_5$; and for i = 6, $P_6 - R_5$ and $R_6 - P_6$.
Example:
User 1 Data values:
0,117340,282839,353985,558993,647279,1004653,1095798,1227438,1312870,1534808,1637380
Where P denotes
𝑃1 =0,
𝑃2 = 282839,


𝑃3 =558993
𝑃4 =1004653
𝑃5 =1227438
𝑃6 =1534808
Where R denotes
𝑅1 =117340
𝑅2 =353985
𝑅3 =647279
𝑅4 =1095798
𝑅5 =1312870
𝑅6 =1637380

By this sequence
$$I = (R_1 - P_1,\; P_2 - R_1,\; R_2 - P_2,\; \ldots,\; P_n - R_{n-1},\; R_n - P_n) \qquad (2)$$
we get the feature vector V = 117340, 165499, 71146, 205008, 88286, 357374, 91145, 131640, 85432, 221938, 102572
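The extraction of dwelling and flight times can be reproduced mechanically from the raw stream. The sketch below applies the formula for I to the User 1 data values listed above; the function name `timing_features` is hypothetical, and the stream is assumed to be interleaved as P1, R1, P2, R2, and so on.

```python
def timing_features(stream):
    """Turn an interleaved press/release stream P1, R1, ..., Pn, Rn into
    (R1 - P1, P2 - R1, R2 - P2, ..., Pn - R(n-1), Rn - Pn)."""
    if not stream or len(stream) % 2 != 0:
        raise ValueError("stream must contain a press and a release per keystroke")
    features = []
    for i in range(0, len(stream), 2):
        press, release = stream[i], stream[i + 1]
        if i > 0:
            features.append(press - stream[i - 1])  # flight time: P_i - R_(i-1)
        features.append(release - press)            # dwelling time: R_i - P_i
    return features


# User 1 data values from the example in the text.
USER_1 = [0, 117340, 282839, 353985, 558993, 647279,
          1004653, 1095798, 1227438, 1312870, 1534808, 1637380]

vector = timing_features(USER_1)
print(len(vector), vector)
```

For n keystrokes the vector has 2n - 1 entries (n dwelling times and n - 1 flight times), so the six keystrokes of User 1 give 11 values.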

3.2.2 Classifier algorithm:


Fusion of Features and Classifiers:
Combining different expert decisions has been widely studied over the past twenty years. Combination methods can be
grouped by the level at which they operate. The simplest way is at the feature level, where different kinds of features are concatenated
into an extended feature vector. This combination inherits the advantages of the different features, and any classifier can easily be
used with them to build the final classification model. Combination can also be done at the level of the decision or output score, which
is called classifier-level combination. It is quite popular because the score is generally considered a new kind of feature. This paper
investigates both methods for performance improvement. For feature-level combination, we can easily obtain the new extended features,
here the DCT feature vector. Analogously to feature-level combination, classifier-level combination is built from the scores of the
classifiers. Suppose we have k classifiers; the vector of their scores on the DCT features is denoted the score-level fusion vector.
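The two combination levels can be sketched in a few lines. This is an illustrative outline only: the function names are hypothetical, each classifier is modeled as a plain callable that returns a score, and how the resulting score vector is finally turned into a decision is not specified in the text and is left out.

```python
def feature_level_fusion(*feature_vectors):
    """Concatenate several feature vectors into one extended feature vector."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


def score_level_fusion_vector(classifiers, features):
    """Collect the scores of k classifiers on the same features into one vector."""
    return [classifier(features) for classifier in classifiers]


# Toy stand-ins for trained classifiers (assumptions for illustration only).
def mean_score(features):
    return sum(features) / len(features)


def max_score(features):
    return max(features)


extended = feature_level_fusion([0.2, 0.4], [0.9])
print(extended)
print(score_level_fusion_vector([mean_score, max_score], extended))
```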
3.2.3 Experiment:
In this section, we present benchmark experimental results for several classification and feature extraction algorithms on the
BeiHang Keystroke Dynamics Databases. The extensive experiments evaluate different features, classifiers,
and their fusions.
We also show that the specific typing rhythms of different individuals can lead to high performance, which can be used in practical
applications such as password protection.
3.2.4 Evaluation criteria:
In the research experiments, we use the False Positive Rate and the True Positive Rate as evaluation metrics. The former is
the percentage of intruders who can enter the accounts by imitating the typing manner of genuine users. The latter is the percentage of
genuine users who can successfully log into the system with the right keystroke manner. By changing the threshold in the
classification procedure, we obtain a series of True Positive Rates and False Positive Rates, and then we use these results to draw a
ROC curve. The ROC curve is used to evaluate the various algorithms, including the random forest classifier and the random tree
classifier, with the original DCT feature. We also provide the Equal Error Rate (EER) for further evaluation of the different methods.
The EER is the error rate at the point where the False Positive Rate equals the False Negative Rate.
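The threshold sweep described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the scores are hypothetical, a higher score is assumed to mean "more likely genuine", and the function names are not taken from the paper.

```python
import numpy as np

def roc_points(genuine_scores, impostor_scores):
    """Sweep a decision threshold and return (thresholds, FPR, TPR) arrays."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # A sample is accepted when its score is at or above the threshold.
    tpr = np.array([(genuine >= t).mean() for t in thresholds])
    fpr = np.array([(impostor >= t).mean() for t in thresholds])
    return thresholds, fpr, tpr

def equal_error_rate(fpr, tpr):
    """Approximate the EER at the threshold where FPR and FNR are closest."""
    fnr = 1.0 - tpr
    i = int(np.argmin(np.abs(fpr - fnr)))
    return (fpr[i] + fnr[i]) / 2.0

# Hypothetical classifier scores for four genuine and four impostor attempts.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
thresholds, fpr, tpr = roc_points(genuine, impostor)
eer = equal_error_rate(fpr, tpr)
```

With these scores the two error rates meet at the threshold 0.6, where one of four impostors is accepted and one of four genuine users is rejected.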
3.2.5 Classifier, Features and Training:
This section explains the classifier that we used, the features it employed, and its training and testing. The MATLAB
programming environment (version 2013a) was used for the analyses, and WEKA 6 for the classifier analysis.
3.2.6 Classifier – support vector machine:
SVM generally applies to linear boundaries. Where a linear boundary is inappropriate, the SVM can map the input
vector into a high-dimensional feature space. By choosing a non-linear mapping, the SVM constructs an optimal separating hyperplane
in the higher-dimensional space. The function K is defined as the kernel function, which generates the inner products used to
construct machines with different types of non-linear decision surfaces in the input space.

$$K(x, x_i) = \Phi(x) \cdot \Phi(x_i) \tag{5}$$

The kernel function may be any symmetric function that satisfies Mercer's conditions (Courant and Hilbert, 1953). There are
several SVM kernel functions. The Gaussian inner-product kernel is

$$K(x, x_i) = \exp\left[-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right] \tag{6}$$


where $x$ is the input pattern, $x_i$ are the support vectors, $\sigma^2$ is the variance, $1 \le i \le N_s$, and $N_s$ is the number
of support vectors. $\beta_0$ and $\beta_1$ are constant values and $p$ is the degree of the polynomial; these parameters belong to the
sigmoid and polynomial kernels, respectively. Although support vector machines are often considered to be the best classifiers
currently available, random forests are strong competitors, frequently outperforming SVMs. The random forest classifier is generally
a good performer because it is robust against noise, and because its tree-classification rules enable it to find informative
signatures in small subsets of the data (i.e., automatic feature selection).
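As a small illustration of the Gaussian kernel of Eq. (6), the following Python sketch evaluates it for two vectors. The function name and the default width are assumptions, not values from the paper.

```python
import numpy as np

def gaussian_kernel(x, x_i, sigma=1.0):
    """Gaussian kernel exp(-||x - x_i||^2 / (2 * sigma^2)) between two vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_i, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

# Identical patterns give the maximum similarity of 1.
same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])
# Squared distance 2 with sigma = 1 gives exp(-1).
apart = gaussian_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0)
```

The kernel is symmetric in its two arguments and decays towards zero as the patterns move apart, with sigma controlling how quickly.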

3.2.7 Features used in the classifier


During typing, all key-press (key-down) and key-release (key-up) events were time-stamped and recorded. From these
events, each of the three features used in the random forest, SVM and random tree classifiers can be derived: (1) hold time (time
elapsed from key-down to key-up of a single key); (2) digram latency (time elapsed from the key-down of a character being typed to
the key-down of the next character); and (3) digram interval (time elapsed from the key-up of a character to the key-down of the next
character). For a ten-digit passcode, there are 11 hold times (including the return key), 10 key-down to key-down latencies, and 10
key-up to key-down intervals, which taken together form a 31-dimensional vector that represents each passcode repetition. All three
features were used, because they form a superset of the features commonly used by other researchers. Although some of these features
are linearly dependent, this is not a concern when using a random forest, because the random forest performs feature selection as
part of its training, thereby accommodating any linear dependencies among features.
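The derivation of the 31-dimensional vector from the recorded events can be sketched in Python as follows. The function name and the timestamps are hypothetical; only the three feature definitions are taken from the text above.

```python
def timing_features(key_down, key_up):
    """Build the timing vector from per-key timestamps given in typing order.

    key_down[i] and key_up[i] are the press and release times of the i-th key.
    """
    n = len(key_down)
    hold = [key_up[i] - key_down[i] for i in range(n)]               # key-down to key-up
    latency = [key_down[i + 1] - key_down[i] for i in range(n - 1)]  # key-down to key-down
    interval = [key_down[i + 1] - key_up[i] for i in range(n - 1)]   # key-up to key-down
    return hold + latency + interval

# Hypothetical repetition: 10 digits plus the return key, one key every 0.2 s,
# each key held for 0.08 s.
downs = [0.2 * i for i in range(11)]
ups = [t + 0.08 for t in downs]
vector = timing_features(downs, ups)   # 11 + 10 + 10 = 31 values
```

Note that each interval equals the corresponding latency minus the preceding hold time, which is the linear dependency mentioned above.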
4. RESULTS AND DISCUSSIONS
DATASET A:
Results for Dataset A

Table no. 3: EER results with different feature dimensions and classifiers on Dataset A (all values in %)

                        Random Forest Classifier    Random Tree Classifier    Support Vector Machine Classifier
Feature                 Accuracy    ERR Rate        Accuracy    ERR Rate      Accuracy    ERR Rate
DCT feature (DIM=10)    79.7        13.2            81.0        18.9          79.6        20.4
DCT feature (DIM=12)    80.3        15.6            80.5        19.4          83.2        16.8
DCT feature (DIM=14)    77.5        11.4            80.1        19.8          86.3        13.7

DATASET B:
Results for Dataset B

Table no. 3: EER results with different feature dimensions and classifiers on Dataset B (all values in %)

                        Random Forest Classifier    Random Tree Classifier    Support Vector Machine Classifier
Feature                 Accuracy    ERR Rate        Accuracy    ERR Rate      Accuracy    ERR Rate
DCT feature (DIM=10)    80.5        15.4            79.4        20.5          81.2        18.3
DCT feature (DIM=12)    80.5        15.6            80.5        19.4          78.8        21.2
DCT feature (DIM=14)    81.6        14.1            81.6        18.3          84.3        15.7

4.1 ROC Curve:

Receiver operating characteristic (ROC), also called relative operating characteristic: the ROC plot is a visual
characterization of the trade-off between the false match rate (FMR) and the false non-match rate (FNMR). In general, the matching
algorithm makes a decision based on a threshold that determines how close to a template the input needs to be for it to be considered
a match. The ROC curve is a plot of the true positive rate against the false positive rate for the different possible cut points of
a diagnostic test. An ROC curve demonstrates several things:


It shows the trade-off between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in
specificity). The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test. The slope of the tangent line at a
cut point gives the likelihood ratio (LR) for that value of the test. For example, in a diagnostic test based on the T4 value, the LR
for T4 < 5 is 52, which corresponds to the far-left, steep portion of the curve, and the LR for T4 > 9 is 0.2, which corresponds to
the far-right, nearly horizontal portion of the curve. The area under the curve is a measure of test accuracy.
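The area under the curve mentioned above can be approximated with the trapezoidal rule. The following Python sketch uses hypothetical ROC points and an assumed function name; it is not the procedure used to produce the figures in this paper.

```python
import numpy as np

def area_under_curve(fpr, tpr):
    """Trapezoidal area under an ROC curve given as (FPR, TPR) points."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    order = np.lexsort((tpr, fpr))          # sort by FPR, ties broken by TPR
    fpr, tpr = fpr[order], tpr[order]
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical ROC points of a good classifier, running from (0, 0) to (1, 1).
auc_good = area_under_curve([0.0, 0.0, 0.25, 0.25, 1.0], [0.0, 0.75, 0.75, 1.0, 1.0])
# The 45-degree diagonal corresponds to chance-level performance.
auc_chance = area_under_curve([0.0, 1.0], [0.0, 1.0])
```

A curve that hugs the left and top borders gives an area close to 1, while the diagonal gives 0.5.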
4.1.1 Comparison of the ROC Results for two Datasets (particularly SVM classifier)

Fig no: 3 A10_SVM_ROC_79.6

The curve above shows the ROC for Dataset A with dimension 10 using the SVM classifier; the overall accuracy is 79.6%.

Fig no: 4 A12_SVM_ROC_83.0

The curve above shows the ROC for Dataset A with dimension 12 using the SVM classifier; the overall accuracy is 83.0%.

Fig no: 5 A14_SVM_ROC_86.3


The curve above shows the ROC for Dataset A with dimension 14 using the SVM classifier; the overall accuracy is 86.3%.

Fig no: 6 B10_SVM_ROC_81.2

The curve above shows the ROC for Dataset B with dimension 10 using the SVM classifier; the overall accuracy is 81.2%.

Fig no: 7 B12_SVM_ROC_78.8

The curve above shows the ROC for Dataset B with dimension 12 using the SVM classifier; the overall accuracy is 78.8%.

Fig no: 8 B14_SVM_ROC_84.3

The curve above shows the ROC for Dataset B with dimension 14 using the SVM classifier; the overall accuracy is 84.3%.

4.1.2 Performance Evaluation:

The performance of a biometric system is generally characterized by the receiver operating characteristic (ROC) curve. It can
be summarized by the equal error rate (EER), the point on the curve where the false acceptance rate (FAR) and the false rejection rate
(FRR) are equal. Other system evaluation criteria include efficiency, adaptability, robustness, and convenience. Performance is
measured in terms of the acquisition and recognition errors of the biometric system. In order to evaluate the performance of


a biometric system, we generally need a test benchmark and performance metrics. According to the International Organization for
Standardization standard ISO/IEC 19795-1, the performance metrics are divided into three sets: acquisition performance metrics, such as
the Failure-To-Enroll rate (FTE); verification system performance metrics, such as the Equal Error Rate (EER); and identification
system performance metrics, such as the False-Negative and False-Positive Identification Rates (FNIR and FPIR, respectively).

Effectiveness: Effectiveness indicates the ability of a method to correctly differentiate genuine users from imposters. The
performance indicators employed by researchers are summarized as follows.

False Rejection Rate (FRR) refers to the percentage ratio of falsely denied genuine users to the total number of genuine
users accessing the system. It is occasionally known as the False Non-match Rate (FNMR) or type 1 error. A lower FRR implies fewer
rejections and easier access for genuine users.

False Acceptance Rate (FAR) is defined as the percentage ratio of falsely accepted unauthorized users to the total number
of imposters accessing the system. Terms such as False Match Rate (FMR) or type 2 error refer to the same measure. A smaller
FAR indicates that fewer imposters are accepted.

Equal Error Rate (EER) is used to determine the overall accuracy and serves as a comparative measure against other systems. It
is sometimes referred to as the Crossover Error Rate (CER). The result comparison presented in the next section is mainly expressed
in terms of FAR, FRR, and EER.
$$FRR = \frac{\text{Number of refused genuines}}{\text{Total number of genuines}}$$

$$FAR = \frac{\text{Number of accepted imposters}}{\text{Total number of imposters}}$$
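The two ratios can be computed directly from attempt counts. The following Python sketch expresses them as percentages, as in the definitions above; the function names and the counts are hypothetical and are not results from this paper.

```python
def false_rejection_rate(refused_genuines, total_genuines):
    """FRR in percent: refused genuine attempts over all genuine attempts."""
    return 100.0 * refused_genuines / total_genuines

def false_acceptance_rate(accepted_imposters, total_imposters):
    """FAR in percent: accepted imposter attempts over all imposter attempts."""
    return 100.0 * accepted_imposters / total_imposters

# Hypothetical counts: 12 of 200 genuine attempts refused,
# 9 of 300 imposter attempts accepted.
frr = false_rejection_rate(12, 200)
far = false_acceptance_rate(9, 300)
```

Moving the decision threshold trades one rate against the other; the EER is the common value at the threshold where they coincide.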
4.1.3 Confusion matrix for SVM - Gaussian:

One of the most important classification concepts is contained in the confusion matrix (or error matrix). This matrix is a
table that represents the performance of an algorithm and from which other metrics can be derived. The rows of the matrix represent
the classes to which the instances actually belong, and the columns represent the classes assigned by the classifier. The matrix
makes it easy to see whether the system is confusing two classes, hence the name. The matrix covers two categories (e.g. positive
and negative) and counts the correctly classified (true) and falsely classified (false) instances per class. A success occurs when
an instance is predicted correctly as a true positive (tp) or a true negative (tn). An error occurs when an instance's class is
predicted incorrectly, so that it is either a false positive (fp) or a false negative (fn).

Classification and actual class:

                       Classified positive      Classified negative
Actual positive        True positives (tp)      False negatives (fn)
Actual negative        False positives (fp)     True negatives (tn)

where P = tp + fn is the number of actual positive instances and N = fp + tn is the number of actual negative instances.

Table no. A10: Confusion matrix, SVM Gaussian

                  Classification
Actual class      436    176
                   67    749

Table no. A12: Confusion matrix, SVM Gaussian

                  Classification
Actual class      484    128
                   68    748

Table no. A14: Confusion matrix, SVM Gaussian

                  Classification
Actual class      422    190
                  102    714


Table no. B10: Confusion matrix, SVM Gaussian

                  Classification
Actual class      1159     77
                   224    141

Table no. B12: Confusion matrix, SVM Gaussian

                  Classification
Actual class      1173     63
                   217     88

Table no. B14: Confusion matrix, SVM Gaussian

                  Classification
Actual class      1171     65
                   186    179

Here tp denotes instances that originate from deceptive messages and are correctly classified as deceptive; fp denotes instances
that are supposedly truthful but are classified as deceptive; fn denotes instances that derive from deceptive messages but are
classified as truthful; and tn denotes instances that originate from truthful messages and are classified as such. The sum of the
actual deceptive (positive) instances is P = tp + fn, and the sum of the actual truthful (negative) instances is N = fp + tn. For the
SVM Gaussian classifier on B14, tp = 1171; the numbers of instances classified as positive and as negative are tp + fp = 1357 and
fn + tn = 244, respectively.
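As a worked check, the B14 confusion matrix can be evaluated in a few lines of Python. The sketch assumes the cell layout given earlier (first row tp, fn; second row fp, tn); the variable names are illustrative.

```python
# Cells of the B14 confusion matrix for the SVM Gaussian classifier,
# read as [[tp, fn], [fp, tn]] (layout assumed from the scheme above).
tp, fn = 1171, 65
fp, tn = 186, 179

actual_positive = tp + fn        # P
actual_negative = fp + tn        # N
classified_positive = tp + fp    # 1357, as stated in the text
classified_negative = fn + tn    # 244, as stated in the text

total = tp + fn + fp + tn
accuracy = (tp + tn) / total     # fraction of correctly classified instances
accuracy_percent = round(100.0 * accuracy, 1)
```

The resulting accuracy of 84.3% agrees with the value reported for the SVM on Dataset B with dimension 14.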

Chart No: 1 Accuracy (%) at different DCT dimension levels (DIM=10, 12, 14) for the Random Forest, Random Tree and Support
Vector Machine classifiers on Dataset A

The chart above shows the accuracy rate at each DCT dimension level for the various classifiers. The corresponding performance
metric is shown by the ROC curves.


Chart No: 2 Accuracy (%) at different DCT dimension levels (DIM=10, 12, 14) for the Random Forest, Random Tree and Support
Vector Machine classifiers on Dataset B

The chart above shows the accuracy rate at each DCT dimension level for the various classifiers. The corresponding performance
metric is shown by the ROC curves.

5 CONCLUSIONS AND FUTURE WORK


Two large databases have been collected and opened for public research. Different features and benchmark algorithms have
been tested and summarized. We designed both the datasets for device protection and an online keystroke dynamics system. The new
features include DCT and their combinations. The benchmark results are obtained with the one-class support vector machine, using a
Gaussian model as the classifier, applied to the original and extended features. Our future work will focus on boosting the
classifiers and promoting the applications.
REFERENCE:

1. R.Abinaya, AN.Sigappi “Survey of Biometric Systems: Evolution and Challenges” proceedings of the nationalconference
on Multimedia signal processing and applications (NCMSPA) ISBN NO: 978-81922221-6-5, 2016
2. Pin Shen Teh, Andrew Beng Jin Teoh and Shigang Yue1, “A Survey of Keystroke Dynamics Biometrics” Hindawi
Publishing Corporation The Scientific World Journal Volume 2013, Article ID 408280, 24 pages
http://dx.doi.org/10.1155/2013/408280
3. Shimaa I. Hassan , Mazen M. Selim , and Hala H. zayed “User Authentication with Adaptive Keystroke Dynamics”
IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 4, No 2, July 2013 ISSN (Print): 1694-0814 | ISSN
(Online): 1694-0784 [3], 2014
4. Roy A. Maxion and Kevin S. Killourhy “Keystroke Biometrics with Number-Pad Input” IEEE/IFIP International
Conference on Dependable Systems & Networks (DSN) .,2010.
5. Rosy Vinayak1 , Komal Arora “A Survey of User Authentication using Keystroke Dynamics ” International Journal of
Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882 Volume 4, Issue 4, April 2015
6. Hussain et al. (A. K. Hussain and M. M. Alnabhan, 2014” Advanced authentication scheme using a predefined keystroke
structure” International Journal of Computer,2014
7. Senathipathi, Krishnan Batri, “An analysis of Particle Swarm Optimization and Genetic Algorithm with respect to
keystroke dynamics” Green Computing Communication and Electrical Engineering (ICGCCEE), Electronic ISBN: 978-1-
4799-4982-3 ,DVD ISBN: 978-1-4799-4983-0 , 2014
8. Maheswari and S. Anitha, 2014) “Remote User Authentication Scheme: A Comparative Analysis and Improved
Behavioral Biometrics Based Authentication Scheme” Published in: Micro-Electronics and Telecommunication
Engineering (ICMETE), 2016
9. Rudrapal, S. Das, and S. Debbarma, “Improvisation of Biometrics Authentication and Identification through
Keystrokes Pattern Analysis” International Conference on Distributed Computing and Internet Technology
ICDCIT: Distributed Computing and Internet Technology pp 287-292, 2014
10. Ahmed and I. Traore “Biometric Recognition Based on Free-Text Keystroke Dynamics”.Referenced in: IEEE Biometrics
CompendiumIEEE RFIC Virtual JournalIEEE RFID Virtual Journal Page(s): 458 - 472 , Published in: IEEE Transactions on
Cybernetics ( Volume: 44, Issue: 4, April 2014
11. Monaco, N. Bakelman, S.-H. Cha,and C. C. Tappert, “One-handed Keystroke Biometric Identification Competition
“ Biometrics (ICB), 2015 International Conference on Electronic ISBN: 978-1-4799-7824-3 Print ISSN: 2376-4201

781
International Journal of Pure and Applied Mathematics Special Issue

12. Rybnik et al. (M. Rybnik, M. Tabedzki, M. Adamski, and K. Saeed. “User Authentication for Mobile Devices” Computer
Information Systems and Industrial Management pp 47-58, , 2013
13. Hassan et al. (S. I. Hassan, M. M. Selim, and H. Hala,” User Authentication with Adaptive Keystroke Dynamics” IJCSI
International Journal of Computer Science Issues, Vol. 10, Issue 4, No 2, July 2013 ISSN (Print): 1694-0814 | ISSN
(Online): 1694-0784,2013.
14. Schclar , L. Rokach, A. Abramson, and Y. Elovici “User Authentication Based on Representative Users” IEEE
Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) ( Volume: 42, Issue: 6, Nov. 2012 )
Page(s): 1669 – 167, , 2012
15. Syed, S. Banerjee, Q. Cheng, and B. Cukic, 2011 “Effects of User Habituation in Keystroke Dynamics on Password
Security Policy” High-Assurance Systems Engineering (HASE), 2011 IEEE 13th International Symposium on ISSN
Information: INSPEC Accession Number: 12461980
16. Karnan, M. Akila, and N. Krishnaraj, 2011 “Biometric personal authentication using keystroke dynamics: A review”
Applied Soft Computing “Volume 11, Issue 2, March 2011, Pages 1565-1573
17. Kolakowska” Computer Recognition Systems “ Advances in Intelligent and Soft Computing book series (AINSC,
volume 95), , 2011
18. Teh , A. B. J. Teoh, C. Tee, and T. S. Ong “Keystroke dynamics in password authentication enhancement” Expert
Systems with Applications Volume 37, Issue 12, December 2010, Pages 8618-8627
19. Akash Sanghi, Y.D.S. Arya “Survey, Applications and Security of Keystroke Dynamics for User Authentication
“International Journal of Computer & Mathematical Sciences IJCMS ISSN 2347 – 8527 Volume 6, Issue 9 September 2017.
20. X.Y. Jing, D. Zhang, “A face and palmprint recognition approach based on discriminant DCT feature extraction”,
IEEE Transactions on Systems, Man and Cybernetics 34 (6) (2004) 2405–2415
21. S. Dabbaghchian, A. Aghagolzadeh, M.S. Moin, “Reducing the effects of small sample size in DCT domain for face
recognition,” in: International Symposium on Telecommunication (IST), 2008
22. D. Ramasubramanian, Y.V. Venkatesh, “Encoding and recognition of faces based on the human visual model and
DCT,” Journal of Pattern Recognition 34 (2001) 2447–2458
23. C. Podilchuk, X. Zhang, “Face recognition using DCT based feature vectors” Proceeding of IEEE (1996)
24. R. Tjahyadi, W. Liu, S. Venkatesh,” Application of the DCT energy histogram for face recognition” International
Conference on Information Technology Applications (2004) 305–310.
25. D. Zhong, I. Defee, “Pattern recognition in compressed DCT domain” Proceeding of IEEE International Conference on
Pattern Recognition (ICPR) (2004) 2031–2034
26. Z.M. Hafed, M.D. Levine, “Face recognition using the discrete cosine transform” International Journal of Computer
Vision 43 (3) (2001) 167–188.
27. M.J. Er, W. Chen, S. Wu., “High speed face recognition based on discrete cosine transform and RBF neural networks”,
IEEE Transactions on Neural Networks 16 (3) (2005) 679–691.
28. Z. Pan, A.G. Rust, H. Bolouri,” Image redundancy reduction for neural network classification using discrete cosine
transforms”, Proceeding of International Joint Conference on Neural Networks (Como, Italy) 3 (2000) 149–154
