
Pattern Recognition Letters 144 (2021) 13–20


A novel spatio-temporal Siamese network for 3D signature recognition

Souvik Ghosh a, Spandan Ghosh a, Pradeep Kumar b,∗, Erik Scheme b, Partha Pratim Roy c

a Institute of Engineering and Management, Kolkata, India
b University of New Brunswick, Fredericton, Canada
c Indian Institute of Technology, Roorkee, India

∗ Corresponding author. E-mail address: pradeep.kumar@unb.ca (P. Kumar).
Handled by Associate Editor Umapada Pal.

Article history: Received 11 December 2019; Revised 9 August 2020; Accepted 17 January 2021; Available online 21 January 2021

Keywords: Siamese network; Spatio-temporal relationships; CNN; LSTM; 3D signature recognition

Abstract

Signature forgery is at the centre of several fraudulent activities and legal battles. The introduction of 3D signatures, the virtual signing of one's name in the air, has the potential to restrict forgers due to the absence of visual cues that can be easily copied. Existing 3D signature recognition approaches, however, have not leveraged the inherent spatial and temporal information, making it difficult to handle the diminished separability and reproducibility of these signatures. In this paper, we propose a novel spatio-temporal adaptation of the Siamese Neural Network, wherein one branch extracts spatial features using a 1D Convolutional Neural Network (CNN) while the other processes the input in the temporal domain using Long Short-Term Memory networks (LSTMs). Unlike conventional deep learning networks, Siamese networks are an application of one-shot learning, allowing them to learn from the small amounts of data that are typical of real-life problems. They employ a distance metric that is forced to be small for like samples (signatures from the same person) and large for different samples (from different persons). The proposed approach, termed ST-SNN, is compared to other baseline classification architectures and demonstrated using a publicly available biometric 3D signature benchmark dataset, yielding a True Positive Rate (TPR) of 94.63% with a 4.1% False Acceptance Rate (FAR).

https://doi.org/10.1016/j.patrec.2021.01.012
© 2021 Elsevier B.V. All rights reserved.

1. Introduction

Signatures have long been accepted as one of the most reliable modes of authentication. They encode unique behavioral traits derived from repetition and habit, resulting in unique information across people. However, with proper practice and training, it is possible for skilled forgers to replicate signatures to within the visible acuity of the human eye. Nevertheless, conventional two dimensional (2D) online and offline signature mechanisms remain widely accepted in practice. A wide variety of approaches have been proposed to automatically recognize and assess such signatures, as summarized in Galbally et al. [14] and Zhang et al. [42].

It has been shown that online signatures are more robust than their offline counterparts because, while an imposter can remember or mimic the shape of a signature, it is difficult to copy the dynamic behavioral information captured by online modes [23]. The extension to 3D signatures, otherwise known as air-writing, adds another dimension to these online signatures, making them more difficult to forge for would-be imposters. This temporal, gestural nature of 3D signatures also yields advantages over other existing verification systems that rely on static images (e.g. fingerprint and face verification systems [38]). Considering the rapid growth in technologies that have enabled remote and untethered interactions, such as virtual or augmented reality environments, touchless forms of user verification may become essential for access control and authentication. The methodologies and advantages of 3D signature recognition are discussed at length in Behera et al. [4].

In air-writing, the signer is free to write in 3D space along any direction over any plane [25]. This makes the automated recognition process a challenging task. The lack of confinement to a writing plane and of visual feedback can lead to large variations between signatures from the same person. Such a high amount of variation can wreak havoc on pattern recognition systems that rely on high inter-class separability. An example of these variations can be seen in Fig. 1, which shows two signatures performed by the same individual asked to sign the same way twice. To address these challenges, researchers have proposed solutions using handcrafted features and temporal classifiers such as Hidden Markov Models (HMMs) [4]. Nigam et al. [32] proposed a 3D signature recognition mechanism using 3D Histogram of Oriented Optical Flow (HOOF) and Histogram of Oriented Trajectories (HOT) features, with recognition governed by a sum-rule on the scores obtained using Support Vector Machine (SVM) and Naive-Bayes classifiers. A multi-matcher on-line signature verification system has also been proposed that fuses verification scores from pen-position and pen-movement angle features decomposed by the Discrete Wavelet Transform (DWT) [31].

Fig. 1. An example of the within-class variations between signatures from the same individual.

Even these advanced features and conventional systems, however, are not able to compensate for the reduced separability between classes and the poor repeatability within a class. Recently, automatic feature extraction methods such as Convolutional Neural Networks (CNNs) have been shown to outperform conventional feature engineering methods in many fields, including the recognition of 3D air-writing [15]. While CNNs have been shown to model the spatial information in the data, they do not inherently capture the temporal relationships within it. Conversely, Long Short-Term Memory networks (LSTMs) have become widely used to model temporal data in handwriting, gesture and action recognition problems [25,29].

Motivated by these observations, we propose a novel extension of Siamese Neural Networks (SNN) [24] capable of capturing both the temporal and spatial information needed for robust 3D signature recognition. Siamese models are known to be quite powerful in face recognition and signature verification tasks because they learn to differentiate between inputs by focusing on their similarities. Using two identical branches, hence the term Siamese, they learn to minimize the distance between similar samples while maximizing the distance between dissimilar pairs using a distance metric learning architecture. Here, however, instead of using two identical branches, we replace one with 1D CNNs and the other with LSTMs. In this way, we ensure that the inputs are processed in both the spatial and temporal domains. Effectively, we treat the X, Y, Z coordinates of each signature as a sequence of 3D vectors and simultaneously learn both the spatial and temporal relationships in these coordinates using two parallel branches, which we later collapse into a single vector. The main contributions of the paper are therefore as follows:

• We propose a novel spatio-temporal version of a Siamese Neural Network (termed ST-SNN) for the recognition of 3D signatures using CNNs and LSTMs. This ST-SNN architecture exercises one-shot learning and uses a small fraction of the data used by its counterparts. It processes the input in both the spatial and temporal domains to automatically learn discriminating features.
• We propose a robust yet lightweight model for one-shot learning with data that contains both spatial and temporal information.
• We stress that data acquisition is costly, motivating the use of one-shot learning based approaches, or similar, in real-life applications.
• We demonstrate the approach using a publicly available dataset and show that it outperforms existing architectures.

2. Related works

Signature verification and recognition are widely considered among the most reliable ways of authenticating a user. In the literature, the process is carried out either on pen-and-paper (offline) or online signatures. Earlier, offline signature verification was carried out using similarity matching algorithms, in which various similarity functions are used to compare a query against existing information for the verification and classification of signatures [26–28]. The time series representation of signatures can be directly matched using elastic distance measures such as Dynamic Time Warping (DTW) [31]. An automatic handwritten signature verification system was proposed in Drouhard et al. [11], which used a directional Probability Density Function (PDF) together with neural networks for offline verification. Shanker and Rajagopalan [37] proposed a signature verification system based on a DTW matching scheme and a modified version thereof. The methodology was tested on a signature database of 100 people, yielding error rates of 2% for the modified DTW as compared to 29% for the basic DTW method. More recently, Hafemann et al. [17] developed a deep CNN model to learn writer-independent features. The authors achieved an Area Under Curve (AUC) of 0.94 and 0.92 on the GPDS-160 and GPDS-300 datasets, respectively. In [18], CNNs were used to learn features for offline handwritten signature verification and achieved a then state-of-the-art equal error rate of 1.72% on GPDS-160. Similarly, a pixel matching technique for offline signature verification and recognition was proposed in Bhattacharya et al. [5] and compared to ANN and SVM classifiers. True acceptance rates of 0.98 and 0.78 were recorded using the ANN and SVM classifiers, respectively, although no corresponding false acceptance rates were reported. A more advanced signature verification scheme, the Siamese neural network, was proposed in Dey et al. [9]. This network achieved an accuracy of 100% on the CEDAR dataset, and 84% and 86% on Hindi and Bengali script datasets. Ensemble learning has also been used for offline signature verification in Das et al. [8], where the authors used an XGBoost classifier along with horizontal stacking of probabilities to outperform basic verification models.

In online modes, real-time temporal data are acquired using digitizing tablets on which the user signs with a pen or stylus, and from which various features are extracted for recognition [16]. For signatures captured by a digitizing tablet, an online signature verification method was proposed in Jain et al. [21], where the authenticity of a writer is determined by DTW-based similarity matching. Another identity authentication system for online signatures using SVMs was developed by Kholmatov and Yanikoglu [23] using the local features of the points on the signature trajectory. A combination of static image and dynamic information for signature verification using online and offline recognition approaches was proposed in Alonso-Fernández et al. [2]. They presented methods which used information from global and local coordinate systems and showed a fusion approach based on linear logistic regression. Signature verification based on geometric features is discussed in [12]; the features were extracted using the signature envelope and stroke distribution, and were later tested with different classification schemes. HMMs have also been used for online signature verification to process the data sequentially [10]; more recently, however, Recurrent Neural Networks (RNNs) have been explored for online signature verification. In [39], the authors present a Siamese network using LSTM and GRU units which outperformed previous DTW-based verification systems. Likewise, Ahrabian and Babaali [1] used autoencoders and Siamese networks to build an online handwritten signature verification system. The authors used a stack of fully connected networks with dropout along with an attention mechanism, which improved their learning. They also evaluated different sampling rates for the inputs, with the best result of 91.35% being obtained with a sampling rate of 40 Hz. Siamese network architectures have also been explored for online signature verification systems; Sekhar et al. [36] developed an online signature verification system using Siamese CNNs and reported an accuracy of 100% on the MCYT-330 dataset. Most recently, Wu et al. [40] incorporated Siamese networks with DTW to preserve the local structures along with the alignment conditions and recorded an EER of 2.11% on the MCYT-100 dataset.
Despite the numerous approaches proposed, very few of these advanced authentication schemes have been developed for 3D interaction systems. Lu and Huang [30] proposed a login framework that uses 3D in-air handwriting. They used deep neural networks to achieve user identification accuracies of 96.7% and 94.3% using the Data Glove and Leap Motion systems, respectively. A 3D magnetic signature system for a mobile device was proposed using a Multi-Layer Perceptron in Ketabdar et al. [22]. Recently, Behera et al. [3] proposed a deep CNN based architecture for air signature recognition. However, the authors extracted geometrical features based on convex hulls or polygonal structures and fed them into the CNNs for classification purposes. Similarly, a 3D sensor and time order stroke context based writing system was proposed by [7], who reported an authentication accuracy of 93.6% when using DTW and hierarchical clustering. A conditional mutual information maximization scheme has also been used to select the optimal feature set for a gesture based password system using Leap Motion, as proposed in Chahar et al. [6]. The authors performed a weighted fusion approach to record a TPR of 81.17% with 1% FAR on a dataset consisting of 150 users. Likewise, Nigam et al. [32] proposed a combination of 3D Histogram of Oriented Optical Flow (HOOF) and Histogram of Oriented Trajectories (HOT) features. The results were then combined with a local binary pattern based face verification algorithm, yielding a TPR of 91% and an FAR of 1.4%, respectively. A video camera-based air-writing recognition system for numerical digits was proposed in Rahman et al. [34]. The authors employed a sliding window approach to segment the string of numerals using RNNs, which were later recognized using similar RNN-based architectures. However, their approach was based on the use of a marker scheme to extract the 2D writing trajectory from the videos. Another method of hand segmentation and trajectory extraction from videos can be found in Zhang et al. [41], where the authors used Microsoft Kinect-based color and depth video sequences. Later, the authors extracted directional hand-crafted features and performed recognition of 3D digits and characters. In contrast, our proposed method employs a spatio-temporal architecture for automatic feature learning and user verification purposes.

3. Methods

3.1. CNNs and LSTMs

CNNs are a class of deep neural networks that are widely applied to analyze visual imagery. These networks are highly capable of automatically learning robust and discriminative features by exploring deep architectures at multiple levels of abstraction from the raw data, without requiring domain knowledge [20]. CNNs have been successfully applied in a number of applications including image/video recognition, recommender systems and natural language processing. However, CNN-based architectures often ignore the rich temporal information contained in the 3D signature sequences. Conversely, LSTMs are now commonly used to model time series data [39]. LSTMs are designed to capture long-term dependencies in sequences, do not suffer from problems such as vanishing or exploding gradients, and are easier to train. Therefore, in this work, we leverage both CNNs and LSTMs in a Siamese network configuration for the recognition of 3D signatures.

3.2. Siamese neural networks

Although significant advances have been made in deep learning, a major shortcoming is the large amount of data required to achieve state-of-the-art results. In many applications, especially those requiring human-machine interaction, learning from small datasets is a very real challenge for deep learning models. Conversely, one-shot learning is an approach that has been found to yield good results even with small amounts of training data. Arguably the most widely known implementation of one-shot learning is the Siamese Neural Network (SNN), which has gained popularity in facial recognition [24]. Instead of directly classifying a sample, an SNN learns to find the similarity and dissimilarity between the inputs using three identical networks with the same weights and parameters. Three inputs, the anchor, positive and negative samples, are passed through the networks to obtain encoded versions of the inputs; a distance is then calculated between the anchor and the positive sample and between the anchor and the negative sample. The distance measure d is calculated using (1). Here, we use the Euclidean distance (also known as the L2 distance) as the distance function.

d(x_1, x_2) = \lVert f(x_1) - f(x_2) \rVert_2^2    (1)

where x_1 and x_2 are two data points.

L(a, x_p, x_n) = \sum_{i=1}^{N} \max\big( d(a, x_p) - d(a, x_n) + \alpha,\ 0 \big)    (2)

where a is the anchor, x_p is a positive sample with respect to a, and x_n is a negative sample with respect to a.

The loss function used is the triplet loss (2), as introduced in Schroff et al. [35]. The loss has three parameters as described above: the anchor, the positive sample and the negative sample. Inspired by nearest neighbour clustering, it clusters the positive samples with respect to an anchor while ensuring that they are separated from the negative samples by some minimum margin α.
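As a concrete illustration of Eqs. (1) and (2), the following Python sketch computes the squared Euclidean distance between embeddings and the resulting triplet loss. It is written with PyTorch purely for illustration (the paper does not prescribe a framework), and the function names, batch handling and the margin value of 2 (reported later in Section 4.1) are our assumptions:

```python
import torch
import torch.nn.functional as F

def squared_l2_distance(f_x1, f_x2):
    # Eq. (1): d(x1, x2) = ||f(x1) - f(x2)||_2^2, computed per sample in the batch
    return torch.sum((f_x1 - f_x2) ** 2, dim=1)

def triplet_loss(f_anchor, f_positive, f_negative, margin=2.0):
    # Eq. (2): sum over the batch of max(d(a, x_p) - d(a, x_n) + alpha, 0)
    d_pos = squared_l2_distance(f_anchor, f_positive)
    d_neg = squared_l2_distance(f_anchor, f_negative)
    return torch.sum(F.relu(d_pos - d_neg + margin))

# toy check with random 266-dimensional embeddings (the embedding size reported in Section 3.3)
a, p, n = (torch.randn(8, 266) for _ in range(3))
print(triplet_loss(a, p, n).item())
```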
3.3. Spatio-temporal Siamese neural networks

In this work, we extend the SNN to incorporate elements of CNNs and LSTMs by using parallel branches to form a Spatio-Temporal Siamese Neural Network (ST-SNN). The first branch is an LSTM with subsequent convolutional layers, used to learn the temporal relationships between the sequential 3D vectors. The second branch is a CNN model used to learn the spatial relationships between the coordinates in the signatures. As a result, the input vector is fed to both the LSTM branch and the CNN branch, as shown in Fig. 2. The outputs of these two branches are then concatenated into a single vector, which is passed through further layers of convolution to collapse it into a single vector of dimension (batch_size, 266). Three such outputs are taken, for the anchor, the positive and the negative samples. At this stage the triplet loss is evaluated, as suggested in conventional Siamese models [24].

Fig. 2. The proposed Spatio-temporal Siamese Neural Network (ST-SNN) architecture for 3D signature recognition. One subnetwork consists of an LSTM layer to preserve temporal features, whereas the other uses CNNs to extract spatial features.

Fully connected layers were avoided to limit the number of training parameters [33]. Dropout was used to prevent overfitting, which was a high risk given the small amount of data used. Indeed, an important goal throughout the modeling process was to exercise one-shot learning for these tasks so that they may actually be of use in practical life, without requiring a person to sign multiple times for enrollment.
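Beyond the (batch_size, 2200, 3) input and the 266-dimensional embedding, the paper does not report exact layer sizes, so the PyTorch sketch below should be read as one plausible realization of the described design (an LSTM branch followed by a convolution, a parallel 1D-CNN branch, concatenation, and a convolutional collapse with dropout and no fully connected layers), not as the authors' configuration; all kernel sizes, channel counts, the hidden size and the dropout rate are assumptions:

```python
import torch
import torch.nn as nn

class STSNNEncoder(nn.Module):
    """Shared encoder sketch: the same weights are applied to anchor, positive and negative inputs."""

    def __init__(self):
        super().__init__()
        # temporal branch: LSTM over the (time, xyz) sequence, then a 1D convolution
        self.lstm = nn.LSTM(input_size=3, hidden_size=64, batch_first=True)
        self.temporal_conv = nn.Conv1d(64, 32, kernel_size=7, stride=4)   # hypothetical sizes
        # spatial branch: 1D convolution applied directly to the coordinate sequence
        self.spatial_conv = nn.Conv1d(3, 32, kernel_size=7, stride=4)     # hypothetical sizes
        # convolutional "collapse" of the concatenated branches into a 266-value embedding
        self.collapse = nn.Sequential(
            nn.Conv1d(64, 16, kernel_size=7, stride=4), nn.ReLU(), nn.Dropout(0.3),
            nn.Conv1d(16, 2, kernel_size=4),  # 2 channels x 133 steps -> 266 values
        )

    def forward(self, x):                                          # x: (batch, 2200, 3)
        t, _ = self.lstm(x)                                        # (batch, 2200, 64)
        t = torch.relu(self.temporal_conv(t.permute(0, 2, 1)))     # (batch, 32, 549)
        s = torch.relu(self.spatial_conv(x.permute(0, 2, 1)))      # (batch, 32, 549)
        z = self.collapse(torch.cat([t, s], dim=1))                # (batch, 2, 133)
        return z.flatten(start_dim=1)                              # (batch, 266)

encoder = STSNNEncoder()
print(encoder(torch.randn(4, 2200, 3)).shape)   # torch.Size([4, 266])
```

The same encoder instance is applied to the anchor, positive and negative signatures, and its three outputs feed the triplet loss shown in Section 3.2.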
3.4. Dataset

In this work, we demonstrate the performance of ST-SNN using a publicly available 3D signature dataset by Behera et al. [4]. The dataset consists of 1600 air-written signatures performed by 80 individuals and recorded using a Leap Motion sensor. Each individual repeated their signature at least 20 times. Fig. 3 shows a set of representative examples obtained from different participants.

Fig. 3. Examples of 3D signature samples from different signers.

4. Results

4.1. Data preparation & training

Each signature in the dataset consists of a text file with three space-separated columns representing the coordinate axes over time [4]. A sequence of these coordinates forms the 3D signature trajectory. Each signature was treated as a time-series sequence and zero-padded to length 2200. Out of the 20 signatures for each individual, the first ten (1–10) were selected as positive samples (x_p), one of them was considered as the anchor signature (a), and 10 negative signature samples (x_n) were selected randomly from different users. These were used as triplets in the triplet loss function to minimize the distance between anchor and positive samples and maximize the distance between anchor and negative samples. The results were computed using a 2-fold cross validation scheme by dividing the prepared triplets into two sets. Thus, for every sample in every batch of every training iteration, three input tensors of size (batch_size, 2200, 3) were obtained, with the anchor tensors remaining the same per user. The aim, conceptually, was to generate a distinct cluster about this anchor. We used a margin of α = 2. These three samples are then passed through the entire model, which is trained end-to-end. Later, keeping the anchor fixed, the model performance was tested on the remaining dataset, ensuring that no triplets from the training set were repeated. Contrastive loss, using only the positive and negative samples, was considered as another choice of loss function, but smoother convergence was obtained using the triplet loss, as depicted in Fig. 4.

Fig. 4. Learning curve for (a) Triplet and (b) Contrastive loss functions. Note that Contrastive loss is similar to Triplet loss, but with no anchor.
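A minimal sketch of the data preparation just described, assuming each signature file holds three space-separated columns of coordinates over time; the helper names, file handling and random-seed choice are ours, not the authors':

```python
import numpy as np

MAX_LEN = 2200  # every signature is zero-padded to this length (Section 4.1)

def load_signature(path):
    """Read one signature file (x, y, z columns over time) and zero-pad it to MAX_LEN rows."""
    coords = np.loadtxt(path, dtype=np.float32)        # shape (T, 3)
    padded = np.zeros((MAX_LEN, 3), dtype=np.float32)
    n = min(len(coords), MAX_LEN)
    padded[:n] = coords[:n]
    return padded

def make_triplets(anchor, positives, negatives, seed=0):
    """Pair the fixed per-user anchor with each positive sample and a randomly drawn negative."""
    rng = np.random.default_rng(seed)
    return [(anchor, pos, negatives[rng.integers(len(negatives))]) for pos in positives]
```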

4.2. Identification

For each class, the anchor elements were passed through the network and their embeddings were stored. Subsequently, each sample from the dataset was also passed through the network. The distributions of their mean distances from true and negative anchors are shown in Fig. 5. We observed that around 2, our chosen margin for the triplet loss, a decision boundary can be fixed in the distance space. We recorded an accuracy of 94.63% with a distance threshold of 2.3 using 5 genuine signatures.

Fig. 5. Mean distances of samples from their corresponding true (positive) and false (negative, imposter) anchors.

Results were also computed when varying the number of signatures in the training set from 1 to 10, as presented in Table 1, with the maximum recognition results being obtained when using 5 signatures.

Table 1
ST-SNN recognition performance for different numbers of training signature samples.

Signatures/user    Recognition performance (%)
1                  73.42
3                  86.57
5                  94.63
8                  94.62
10                 94.36

Note: The bold values are used to denote the most effective performances of the system.

The ROC curve in Fig. 6 shows the trade-off between TPR and FPR for the two loss functions at different distance thresholds. It can be noted that the triplet loss converges to its best TPR faster than the contrastive loss approach, yielding an FAR of 0.041 and an FRR of 0.053.

Fig. 6. ROC curves denoting the tradeoff between TPR and FPR for the triplet and contrastive loss functions across threshold values.
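Conceptually, identification then reduces to comparing a test embedding against the stored anchor embeddings and applying a distance threshold. The sketch below is an illustrative variant (nearest anchor, then the 2.3 threshold reported above); the authors' exact decision rule is not spelled out in the paper, so treat this as an assumption:

```python
import torch

def identify(sample_embedding, anchor_embeddings, threshold=2.3):
    """Match a signature embedding to the closest stored user anchor; reject the match
    if even the smallest squared L2 distance exceeds the decision threshold."""
    distances = {user: torch.sum((sample_embedding - emb) ** 2).item()
                 for user, emb in anchor_embeddings.items()}
    best_user = min(distances, key=distances.get)
    if distances[best_user] < threshold:
        return best_user, distances[best_user]
    return None, distances[best_user]
```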
Standard SVM and random forest classification models were also tested for comparison. As features for these classifiers, the mean, mode, and standard deviation were extracted from the sequential data of all three axes. The SVM was implemented using a Gaussian radial basis kernel and the random forest classifier used an ensemble of 100 decision trees, whereas the results of Hidden Markov Models (HMM) and Dynamic Time Warping (DTW), computed on the raw data, are included as added context [4]. The performance of these classifiers is presented in Table 2 in comparison to that of the HMM, DTW, and the proposed ST-SNN using raw data as inputs. Note that the proposed ST-SNN outperforms the DTW, the best of the other approaches, by almost 5%.

Table 2
Comparative performance analysis with existing methods when using raw data as input to the classifiers. Note: 'Acc' is used for 'Accuracy'.

Methods          Acc     FAR     FRR     #Train/user
SVM              0.542   0.244   0.642   18
Random Forest    0.726   0.263   0.322   18
HMM [4]          0.768   –       –       18
DTW [4]          0.899   –       –       18
ST-SNN           0.954   0.037   0.051   18
ST-SNN           0.946   0.041   0.053   5

Note: The bold values are used to denote the most effective performances of the system.
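The baseline classifiers can be reproduced with standard scikit-learn components. A hedged sketch follows, where the feature extraction mirrors the mean/mode/standard-deviation description above and the training arrays are assumed to be prepared elsewhere (names such as train_signatures are hypothetical):

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def summary_features(signature):
    """Mean, mode and standard deviation of each coordinate axis -> 9-dimensional feature vector."""
    mean = signature.mean(axis=0)
    mode = np.ravel(stats.mode(signature, axis=0).mode)
    std = signature.std(axis=0)
    return np.concatenate([mean, mode, std])

svm = SVC(kernel="rbf")                            # Gaussian radial basis kernel
forest = RandomForestClassifier(n_estimators=100)  # ensemble of 100 decision trees
# X_train = np.stack([summary_features(s) for s in train_signatures]); y_train = train_labels
# svm.fit(X_train, y_train); forest.fit(X_train, y_train)
```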
Results were also computed using a set of temporal features discussed in Behera et al. [4]. From these features, the mean, mode, and standard deviation were again extracted as input to the SVM and random forest classifiers. The results of this comparison are shown in Table 3. Note that, in contrast to the previous results, the performance of the ST-SNN is inferior to that of the HMM in this case. This is likely because the spatial and temporal embeddings that the network was designed to learn from the 3D data were lost during the initial feature extraction stage.

Table 3
Comparative performance analysis with existing methods when using extracted temporal features as input to the classifiers.

Methods          Acc     FAR     FRR     #Train/user
SVM              0.610   0.381   0.392   18
Random Forest    0.798   0.220   0.182   18
HMM [4]          0.920   –       –       18
DTW + k-NN [4]   0.971   –       –       18
ST-SNN           0.967   0.037   0.051   18
ST-SNN           0.948   0.041   0.053   5

Note: The bold values are used to denote the most effective performances of the system.

The proposed architecture was also compared with conventional Siamese networks with identical sub-networks, as shown in Table 4. The SNN with an LSTM framework alone yielded inferior performance (82.3%), likely struggling to learn the spatial information inherent in the data. An SNN with CNNs was also evaluated, after reshaping the data into (batch_size, 3, 2200) and treating X, Y, Z as separate channels. Similarly, this spatial architecture struggled (58.1%) due to the loss of temporal information. The performance of the proposed ST-SNN was again found to be superior when employing the triplet loss function (94.6%) than with the contrastive loss function (90.1%). In comparison to our ST-SNN, which requires 5 genuine signatures for training, we included 10 signatures for the contrastive loss case, which forms positive-positive and positive-negative pairs. Note that all of these results were computed using the raw data, given the observed loss of information when using the extracted features (Table 3).

Table 4
Comparative analysis with existing Siamese networks designed for recognition with limited amounts of data.

Methods                     Acc     FAR     FRR     #Train/user
SNN with LSTM               0.823   0.203   0.201   10
SNN with CNN                0.581   0.368   0.498   10
ST-SNN (Contrastive loss)   0.901   0.093   0.104   10
ST-SNN                      0.944   0.041   0.052   10
ST-SNN                      0.946   0.048   0.053   5

Note: The bold values are used to denote the most effective performances of the system.

It is important to note that the above dataset does not include forgery attempts, and so the results represent recognition performance alone. Consequently, to further evaluate the generalizability of the approach in the verification context, the ST-SNN model was also trained on another publicly available online signature dataset (SVC2004 task-2) [13]. This dataset consists of signatures from 40 individuals, each with 20 genuine signatures and 20 attempted forgeries. Because the proposed network was designed for 3D inputs, pen-pressure information was employed as the third input channel along with the x, y information. As before, ten genuine and forged signatures were selected per user and verification results were computed using a 2-fold cross validation. An average accuracy of 86.7% with a signature verification FAR of 0.115, FRR of 0.173, and EER of 13.7% was recorded using the proposed ST-SNN framework. This dataset has previously been tested with a variety of features and classifiers, but with different evaluation frameworks, making a direct comparison difficult. Nevertheless, He et al. [19] recently used curvature and torsion features to develop an online signature verification system with 5 randomly selected training samples. The authors recorded an EER of 9.83% using a Hausdorff distance matching scheme. Despite being designed and optimized for 3D signature inputs, it can be seen that the proposed ST-SNN architecture produced competitive results with no additional tuning for this verification task.
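For the SVC2004 experiment, the x, y and pen-pressure channels simply need to be stacked into the same (time, 3) layout used for the 3D signatures. A small illustrative helper follows, assuming the three channels have already been parsed into equal-length arrays (the SVC2004 file parsing itself is omitted and the function name is hypothetical):

```python
import numpy as np

def svc2004_to_3d(x, y, pressure, max_len=2200):
    """Stack x, y and pen pressure into a zero-padded (max_len, 3) array, standing in
    for the x, y, z channels expected by the ST-SNN."""
    seq = np.stack([x, y, pressure], axis=1).astype(np.float32)
    padded = np.zeros((max_len, 3), dtype=np.float32)
    n = min(len(seq), max_len)
    padded[:n] = seq[:n]
    return padded
```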

To better understand the distribution of signature samples that


and Random forest classifiers. The results of this comparison are were not correctly identified by the system, a histogram of per user
shown in Table 3. Note that, in contrast to the previous results, the accuracies was created, as shown in Fig. 7. As noted from the fig-
performance of the ST-SNN is inferior to that of the HMM in this ure, the signatures of 61 signers were classified with an accuracy of
case. This is likely because the spatial and temporal embeddings greater than 90%, whereas the signatures of 2 signers yielded less
that the network was designed to learn from the 3D data were that 50% classification results. Through visual inspection of these
lost during the initial feature extraction stage. subjects, large within signer variations were observed, as were un-
The proposed architecture was also compared with a conven- usually long or chaotic signatures.
tional Siamese network with identical sub networks, as shown in Two such anchor signatures are shown (left column) of Fig. 8(a)
Table 4. The SNN with an LSTM framework alone yielded inferior and (b) along with the anchor signatures with which they were er-

18
S. Ghosh, S. Ghosh, P. Kumar et al. Pattern Recognition Letters 144 (2021) 13–20

Fig. 8. Example of the anchor signatures that were incorrectly recognized. The users in the first column are being confused with the users in the second. (a) User 29 (left)
is confused with User 26 (Right), (b) User 30 (left) is confused with User 10 (Right).

roneously matched (in the right column). Although from different [2] F. Alonso Fernández, J. Fiérrez, M. Martínez Díaz, J. Ortega-García, Fusion of
signers, it can be seen that these pairs are very similar in structure. static image and dynamic information for signature verification, in: IEEE Inter-
national Conference on Image Processing, 2009.
It is likely that the anchors were close to each other in decision [3] S.K. Behera, A. Dash, D.P. Dogra, P.P. Roy, Air signature recognition using deep
space, making it difficult to set a threshold that could differentiate convolutional neural network-based sequential model, in: 24th International
between these signatures. As a result, further work is warranted Conference on Pattern Recognition, 2018, pp. 3525–3530.
[4] S.K. Behera, D.P. Dogra, P.P. Roy, Analysis of 3D signatures recorded using leap
on differentiating between these specific cases. motion sensor, Multimed. Tools Appl. 77 (11) (2018) 14029–14054.
[5] I. Bhattacharya, P. Ghosh, S. Biswas, Offline signature verification using pixel
matching technique, Procedia Technol. 10 (2013) 970–977.
5. Conclusion
[6] A. Chahar, S. Yadav, I. Nigam, R. Singh, M. Vatsa, A leap password based veri-
fication system, in: IEEE 2015 IEEE 7th International Conference on Biometrics
In this work, we present an extension of Siamese Neural Net- Theory, Applications and Systems (BTAS) 1–6, 2015.
works for the recognition of 3D signatures. By exploiting similarity [7] L.W. Chiu, J.W. Hsieh, C.R. Lai, H.F. Chiang, S.C. Cheng, K.C. Fan, Person authen-
tication by air-writing using 3D sensor and time order stroke context, in: In-
metrics using parallel CNN and LSTM branches, the proposed net- ternational Conference on Smart Multimedia, Springer, 2018, pp. 260–273.
work automatically extracts spatio-temporal information to learn [8] S.D. Das, H. Ladia, V. Kumar, S. Mishra, Writer independent offline signature
discriminative signature recognition features. The model was eval- recognition using ensemble learning, arXiv preprint arXiv:1901.06494 (2019).
[9] S. Dey, A. Dutta, J.I. Toledo, S.K. Ghosh, J. Lladós, U. Pal, SigNet: convolutional
uated using a publicly available 3D signature dataset of 80 indi- Siamese network for writer independent offline signature verification, arXiv
viduals, yielding a recognition performance of 94.63% using only 5 preprint arXiv:1707.02131 (2017).
training samples per user, outperforming previously reported clas- [10] J. Dolfing, E.H. Aarts, J. Van Oosterhout, On-line signature verification with hid-
den Markov models, in: International Conference on Pattern Recognition, 1309,
sification schemes. IEEE, 1998.
Our future work will seek to design more enriched models us- [11] J.P. Drouhard, R. Sabourin, M. Godbout, A neural network approach to off–
ing small amounts of training data to continue to improve the per- line signature verification using directional pdf, Pattern Recognit. 29 (1996)
415–424.
formance of the system and to learn to cluster and differentiate
[12] M.A. Ferrer, J.B. Alonso, C.M. Travieso, Offline geometric parameters for auto-
between the signatures that are most similar to each other. matic signature verification using fixed-point arithmetic, IEEE Trans. Pattern
Anal. Mach. Intell. 27 (6) (2005) 993–997.
[13] J. Fierrez, J. Ortega-Garcia, D. Ramos, J. Gonzalez-Rodriguez, HMM-based on–
Declaration of Competing Interest line signature verification: feature extraction and signature modeling, Pattern
Recognit. Lett. 28 (2007) 2325–2334.
The authors declare that they have no known competing finan- [14] J. Galbally, M. Diaz-Cabrera, M.A. Ferrer, M. Gomez-Barrero, A. Morales, J. Fier-
rez, On-line signature recognition through the combination of real dynamic
cial interests or personal relationships that could have appeared to data and synthetically generated static data, Pattern Recognit. 48 (2015)
influence the work reported in this paper. 2921–2934.
[15] J. Gan, W. Wang, K. Lu, In-air handwritten chinese text recognition with tem-
poral convolutional recurrent network, Pattern Recognit. 97 (2020) 107025.
References
[16] D. Guru, H. Prakash, Online signature verification and recognition: an approach
based on symbolic representation, IEEE Trans. Pattern Anal. Mach. Intell. 31
[1] K. Ahrabian, B. Babaali, Usage of autoencoders and Siamese networks for on- (2009) 1059–1073.
line handwritten signature verification, Neural Comput. Appl. 31 (12) (2019) [17] L.G. Hafemann, R. Sabourin, L.S. Oliveira, Writer-independent feature learning
9321–9334.

19
S. Ghosh, S. Ghosh, P. Kumar et al. Pattern Recognition Letters 144 (2021) 13–20

for offline signature verification using deep convolutional neural networks, in: [30] D. Lu, D. Huang, Fmcode: a 3D in-the-air finger motion based user login frame-
2016 International Joint Conference on Neural Networks (IJCNN), IEEE, 2016, work for gesture interface, arXiv preprint arXiv:1808.00130 (2018).
pp. 2576–2583. [31] I. Nakanishi, H. Sakamoto, N. Nishiguchi, Y. Itoh, Y. Fukui, Multi-matcher on–
[18] L.G. Hafemann, R. Sabourin, L.S. Oliveira, Learning features for offline hand- line signature verification system in DWT domain, IEICE Trans. Fund. Electron.
written signature verification using deep convolutional neural networks, Pat- Commun. Comput. Sci. 89 (2006) 178–185.
tern Recognit. 70 (2017) 163–176. [32] I. Nigam, M. Vatsa, R. Singh, Leap signature recognition using hoof and hot fea-
[19] L. He, H. Tan, Z.C. Huang, Online handwritten signature verification based on tures, in: 2014 IEEE International Conference on Image Processing (ICIP), IEEE,
association of curvature and torsion feature with Hausdorff distance, Mul- 2014, pp. 5012–5016.
timed. Tools Appl. 78 (2019) 19253–19278. [33] Z. Qian, T.L. Hayes, K. Kafle, C. Kanan, Do we need fully connected output lay-
[20] M. He, S. Zhang, H. Mao, L. Jin, Recognition confidence analysis of handwritten ers in convolutional networks?, arXiv preprint arXiv:2004.13587 (2020).
chinese character with CNN, in: 13th International Conference on Document [34] A. Rahman, P. Roy, U. Pal, Continuous motion numeral recognition using
Analysis and Recognition, 2015, pp. 61–65. RNN architecture in air-writing environment, in: Asian Conference on Pattern
[21] A.K. Jain, F.D. Griess, S.D. Connell, On-line signature verification, Pattern Recog- Recognition, Springer, 2019, pp. 76–90.
nit. 35 (2002) 2963–2972. [35] F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: a unified embedding for face
[22] H. Ketabdar, K.A. Yüksel, A. Jahnbekam, M. Roshandel, D. Skripko, Magisign: recognition and clustering, in: Proceedings of the IEEE Conference on Com-
user identification/authentication, in: Proc. Of UBICOMM’10, 2010. puter Vision and Pattern Recognition, 2015, pp. 815–823.
[23] A. Kholmatov, B. Yanikoglu, Identity authentication using improved online sig- [36] C. Sekhar, P. Mukherjee, D.S. Guru, V. Pulabaigari, OsvNet: convolutional
nature verification method, Pattern Recognit. Lett. 26 (20 05) 240 0–2408. Siamese network for writer independent online signature verification, arXiv
[24] G. Koch, R. Zemel, R. Salakhutdinov, Siamese neural networks for one-shot im- preprint arXiv:1904.00240 (2019).
age recognition, ICML Deep Learning Workshop, 2015. [37] A.P. Shanker, A. Rajagopalan, Off-line signature verification using DTW, Pattern
[25] P. Kumar, R. Saini, P.P. Roy, D.P. Dogra, Study of text segmentation and recog- Recognit. Lett. 28 (2007) 1407–1414.
nition using leap motion sensor, IEEE Sens. J. 17 (2016) 1293–1301. [38] C. Sousedik, C. Busch, Presentation attack detection methods for fingerprint
[26] Q. Li, X. Zhou, A. Gu, Z. Li, R.Z. Liang, Nuclear norm regularized convolutional recognition systems: a survey, Iet Biom. 3 (2014) 219–233.
max pos@ top machine, Neural Comput. Appl. 30 (2018) 463–472. [39] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, J. Ortega-Garcia, Exploring recurrent
[27] R.Z. Liang, L. Shi, H. Wang, J. Meng, J.J.Y. Wang, Q. Sun, Y. Gu, Optimizing top neural networks for on-line handwritten signature biometrics, IEEE Access 6
precision performance measure of content-based image retrieval by learning (2018) 1–7.
similarity function, in: 23rd International Conference on Pattern Recognition, [40] X. Wu, A. Kimura, S. Uchida, K. Kashino, Prewarping Siamese network: learn-
2016, pp. 2954–2958. ing local representations for online signature verification, in: ICASSP 2019-2019
[28] R.Z. Liang, W. Xie, W. Li, H. Wang, J.J.Y. Wang, L. Taylor, A novel transfer learn- IEEE International Conference on Acoustics, Speech and Signal Processing
ing method based on common space mapping and weighted domain matching, (ICASSP), IEEE, 2019, pp. 2467–2471.
in: 28th International Conference on Tools with Artificial Intelligence, 2016, [41] X. Zhang, Z. Ye, L. Jin, Z. Feng, S. Xu, A new writing experience: finger writing
pp. 299–303. in the air using a Kinect sensor, IEEE Multimed. 20 (2013) 85–93.
[29] J. Liu, A. Shahroudy, D. Xu, G. Wang, Spatio-temporal LSTM with trust gates for [42] X.Y. Zhang, Y. Bengio, C.L. Liu, Online and offline handwritten chinese character
3D human action recognition, in: European Conference on Computer Vision, recognition: a comprehensive study and new benchmark, Pattern Recognit. 61
2016, pp. 816–833. (2017) 348–360.
