
Keystroke Behavioural Analysis

For Fraud Detection

Valerio Maggio
Data Scientist and Researcher
Fondazione Bruno Kessler (FBK)

Trento, Italy
@leriomaggio
Two Common Forms of Fraud

Account Hijacking

Card Faking
Account Hijacking

User Identification
Keystroke Dynamics
Identifying an individual based on their way of typing on a physical or virtual keyboard

Keystroke dynamics analyses the way a user types by monitoring keyboard input
thousands of times per second and processing this data with an algorithm that defines a
pattern for future comparison
Keystroke Dynamics Analysis

Up-Up Time: time between two consecutive key releases

Dwell Time: time between the press and the release of the same key

Flight Time: time between the release of one key and the press of the next

Down-Down Time: time between two consecutive key presses
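The four timing features above can be sketched in a few lines of Python. This is only an illustration: the `KeyEvent` record and the `timing_features` helper are hypothetical names, and the sketch assumes each keystroke arrives with a press and a release timestamp in milliseconds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KeyEvent:
    """One keystroke with press/release timestamps in ms (hypothetical schema)."""
    key: str
    down: float  # time the key was pressed
    up: float    # time the key was released


def timing_features(events: List[KeyEvent]) -> Dict[str, List[float]]:
    """Compute dwell, flight, up-up and down-down times for a typed sequence."""
    feats: Dict[str, List[float]] = {
        "dwell": [], "flight": [], "up_up": [], "down_down": []
    }
    # Dwell time is defined per key: release minus press.
    for ev in events:
        feats["dwell"].append(ev.up - ev.down)
    # The other three features are defined on consecutive key pairs.
    for prev, curr in zip(events, events[1:]):
        feats["flight"].append(curr.down - prev.up)       # release -> next press
        feats["up_up"].append(curr.up - prev.up)          # release -> next release
        feats["down_down"].append(curr.down - prev.down)  # press -> next press
    return feats


# Invented toy timestamps, for illustration only.
events = [
    KeyEvent("a", 0.0, 80.0),
    KeyEvent("b", 120.0, 190.0),
    KeyEvent("c", 260.0, 350.0),
]
features = timing_features(events)
```

Note that flight time can be negative when a fast typist presses the next key before releasing the previous one; the sketch keeps such values as they are.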
Keystroke Patterns Learning: State of the Art
Data Pipeline: (1) Data Collection

Data Pipeline: (2) Feature Extraction
Time-shift key presses if deletions happen

Only data leading to a successful login

Extracted features:
Up-Up Time: time between two key releases
Down-Down Time: time between two key presses
Flight Time: time between one release and the next press
Dwell Time: time between one press and its release
Feature Analysis & Data Preparation
1. Analyse Feature Distribution

2. Rank users accordingly


Up-Up Time - Username Field - web vs app
Up-Up Time - Password Field - web vs app
Dwell Time - Username Field - web vs app
Dwell Time - Password Field - web vs app
Data Cleaning

Complexity-Invariant Distance Measure
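The slide names the measure without showing how it is computed or applied. As a rough sketch, the complexity-invariant distance (CID) as usually defined in the time-series literature multiplies the Euclidean distance by a correction factor, the ratio of the larger to the smaller complexity estimate, where the complexity estimate of a series is the length of the line obtained by "stretching" it. The function names below are hypothetical, and the fallback for constant series is an assumption of this sketch, not part of the talk.

```python
import math
from typing import Sequence


def complexity_estimate(series: Sequence[float]) -> float:
    """Square root of the summed squared differences of consecutive points."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(series, series[1:])))


def euclidean(q: Sequence[float], c: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def cid_distance(q: Sequence[float], c: Sequence[float]) -> float:
    """Euclidean distance scaled by the ratio of the two complexity estimates."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    # Assumption of this sketch: if either series is constant (complexity 0),
    # skip the correction and return the plain Euclidean distance.
    correction = high / low if low > 0 else 1.0
    return euclidean(q, c) * correction


# Invented toy series: q oscillates, c has a single step.
q = [0.0, 1.0, 0.0, 1.0]
c = [0.0, 0.0, 1.0, 1.0]
distance = cid_distance(q, c)
```

For data cleaning, one plausible use is to flag typing samples whose CID from a user's typical profile is unusually large; the slides do not say whether this is how it was used.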
Feature Scaling
Original Feature Data

Standard Scaling

MinMax Scaling
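The two scalings compared on this slide can be sketched with the standard library alone. The helper names are hypothetical; the standard scaling here uses the population standard deviation, which is an assumption of this sketch, and both helpers map a constant feature to zeros to avoid dividing by zero.

```python
import statistics
from typing import List, Sequence


def standard_scale(values: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]


def minmax_scale(values: Sequence[float]) -> List[float]:
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - low) / (high - low) for v in values]


# Invented dwell times in ms, for illustration only.
dwell_ms = [80.0, 70.0, 90.0, 100.0]
standardised = standard_scale(dwell_ms)
normalised = minmax_scale(dwell_ms)
```

In practice the scaling parameters (mean and standard deviation, or minimum and maximum) should be estimated on the training portion only and then reused on held-out data.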
Data Preparation
All Feature Combinations

HDF5 format
Data Analysis Protocol (DAP)
Reduce the Selection Bias!!

80%: Use separately for HyperParams Search

20%: Don’t Mix
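One way to read this protocol is sketched below, under the assumption that the 80% portion is used for training and hyperparameter search (here with repeated K-fold cross-validation) while the 20% portion is held out and only touched for the final evaluation. The function names, the fold count and the number of repeats are all illustrative choices, not values from the talk.

```python
import random
from typing import Iterator, List, Tuple


def holdout_split(n_samples: int, test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices once and split them into dev and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def repeated_kfold(indices: List[int], k: int = 5, repeats: int = 3,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; reshuffle before each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i, validation in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
            yield train, validation


# 100 hypothetical samples: 80 for hyperparameter search, 20 held out.
dev_idx, test_idx = holdout_split(100)
splits = list(repeated_kfold(dev_idx, k=5, repeats=3))
```

Every train and validation index comes from `dev_idx`, so the held-out 20% never influences model or hyperparameter selection.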
Keystroke Patterns (Classical) Machine Learning

… …

Deep Keystroke Learning
One AutoEncoder + FC Network
Outlier Detector (per user)
Deep AutoEncoder

Classification Deep Network

Encoder Decoder
Deep Keystroke Learning

… User Identification

Classification Confusion Matrix

One AutoEncoder + FC Network: Avg. Accuracy Score 0.999090

Outlier Detector (per user): Avg. FPR 0.002246
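For reference, the two metrics reported above can be derived from a confusion matrix as in the sketch below. The matrix is an invented toy example, not the talk's data, and the one-vs-rest false positive rate per user is an assumption about how the per-user figure might be computed; the slides do not give the exact procedure.

```python
from typing import List


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (rows: true user, cols: predicted)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def false_positive_rates(cm: List[List[int]]) -> List[float]:
    """One-vs-rest FPR per user: FP / (FP + TN)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for user in range(n):
        # Samples of other users that were attributed to this user.
        false_pos = sum(cm[true][user] for true in range(n) if true != user)
        # All samples that do not belong to this user.
        negatives = total - sum(cm[user])
        rates.append(false_pos / negatives if negatives else 0.0)
    return rates


# Invented toy confusion matrix for three users, 10 samples each.
toy_cm = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
toy_accuracy = accuracy(toy_cm)
toy_fprs = false_positive_rates(toy_cm)
toy_avg_fpr = sum(toy_fprs) / len(toy_fprs)
```

In a fraud setting, a false positive for a user means another person's typing was accepted as that user's, which is why the rate is tracked per user.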
Outlier Detection
rf.fit(X,y_DL)

Feature Importance
Conclusions and Takeaways
• Data Processing and Cleaning is never painless

• 80% of the time goes into Data Processing

• 20% goes into Machine/Deep Learning code

• 90% of which is spent looking for optimal HyperParameters (esp. for Deep Learning)

• Use Unsupervised Approaches to get useful insights on the data

• Feature Scaling is paramount

• Beware of the Selection Bias (repeat K-Fold CV multiple times)

• DL is not a silver bullet


Thanks a lot for your kind attention

@leriomaggio vmaggio@fbk.com

+ValerioMaggio it.linkedin.com/in/valeriomaggio
