
2018 24th International Conference on Pattern Recognition (ICPR)

Beijing, China, August 20-24, 2018

Air Signature Recognition using Deep Convolutional Neural Network-Based Sequential Model

S. K. Behera, A. K. Dash and D. P. Dogra
School of Electrical Sciences
Indian Institute of Technology
Bhubaneswar 752050, India
Email: {sb29, akd10, dpdogra}@iitbbs.ac.in

P. P. Roy
Department of Computer Science & Engineering
Indian Institute of Technology
Roorkee 247667, India
Email: proy.fcs@iitr.ac.in

Abstract—Deep convolutional neural networks are extremely popular in classification, especially when the inputs are non-sequential in nature. Though it may seem unrealistic to adopt such networks as sequential classifiers, researchers have started to use them for applications that primarily deal with sequential data. This is possible if the sequential data can be represented in the conventional form in which inputs are provided to CNNs. Signature recognition is one of the important tasks in biometric applications, since signatures represent the signer's identity. Air signatures can make traditional biometric systems more secure and robust than conventional pen-paper or stylus-guided interfaces. In this paper, we propose a new set of geometrical features to represent 3D air signatures captured using the Leap Motion sensor. The features are then arranged such that they can be fed to a deep convolutional neural network architecture with application-specific tuning of the model parameters. It has been observed that the proposed features in combination with the CNN architecture can act as a good sequential classifier when tested on a moderate-size air signature dataset. Experimental results reveal that the proposed biometric system performs better than the state-of-the-art geometrical features, with an average accuracy improvement of 4%.

I. INTRODUCTION

In biometric authentication systems, "signature" recognition is one of the important modes of identification [1], [2]. Due to its specific behavioral properties, every person's signature bears uniqueness. Usability is one of the main advantages of signature-based biometric authentication systems. Handwritten inputs, which can be collected easily on paper or through electronic devices, are therefore quite popular in authentication [3], [4]. However, for security reasons, it is always desirable to enhance the robustness of traditional signature acquisition systems by other means. This has led to new ways of acquiring signatures, including stylus/smart-pen based interfaces and 3D air signature setups. In 3D air signature setups, people are encouraged to put their signatures in the air, and the inputs are captured through sensors built with visible-light cameras, IR cameras, or depth-measuring devices. Air signature recognition in 3D has two advantages over traditional 2D acquisition systems: it is robust to shoulder surfing, and 3D features are stronger due to the presence of the third dimension in the input data [5], [6], [7].

In offline mode [4], a scanner is used for digitization, and the shapes and contours of the signatures are then analyzed. Online or dynamic signatures [3] use a digitizing tablet (a pen-sensitive computer display) where the users sign with a smart pen or stylus. Sensors such as Leap Motion, Kinect, or RealSense are also being used to capture air signatures in 3D. Such setups are portable and can easily be integrated with modern handheld devices. In all of the above acquisition systems, features such as position along the x-, y-, and z-axes, slope, direction, curvature, velocity, acceleration, pressure, and the azimuth or altitude angles of the pen with respect to the device are extracted from the stored data for recognition [8], [9]. In the next stage, sequential classifiers are trained on the extracted features to build models that are used during recognition [2], [10]. In dynamic time warping (DTW)-based approaches, signatures are directly matched using elastic distance [11]. In the case of Hidden Markov Models (HMM), finite state machines are used to generate a probabilistic model that is used during matching [12]. Matching techniques based on DTW [11], Minimal Variance Matching (MVM) [13], Support Vector Machines (SVM) [14], and Neural Networks (NN) [15] are commonly used for signature recognition.

Apart from the above classifiers, deep learning frameworks are rapidly becoming popular as baseline classifiers. In particular, deep neural networks are often used for learning models that aim to capture high-level abstractions in data [16], [17], [18]. Owing to their multiple layers, deep learning architectures are built for automated feature learning: data are represented by a hierarchy of features from lower level to higher level. Even architectures that are suited to representing two- or higher-dimensional data such as images or video are being used for sequential classification tasks [19], [20], [21]. For example, the human activity recognition (HAR) system proposed by Yang et al. [22] successfully applied such a deep learning framework to multi-channel time series signals acquired from a set of body-worn inertial sensors that capture

978-1-5386-3788-3/18/$31.00 ©2018 IEEE 3525

Fig. 1. Proposed methodology of air signature recognition system.

predefined human activities. Therefore, we have intuitively adopted such a deep convolutional neural network architecture for the classification of 3D air signatures.

However, selection of an appropriate feature-set is highly important for accurate recognition. Therefore, researchers have proposed various types of features that preserve static [6], [23] as well as dynamic [24] characteristics of human signatures. It has been observed that geometrical characteristics can be quite handy for defining the uniqueness of signatures. In this paper, we introduce a new geometrical feature-set that is capable of preserving the uniqueness of human signatures. The features are then arranged such that they can be fed to a deep convolutional neural network architecture with application-specific tuning of the model parameters. It has been observed that the proposed features in combination with the CNN architecture can act as a good sequential classifier. This has been done with the primary goal of designing a robust biometric authentication system that is better than existing systems in terms of accuracy and ease of implementation. Fig. 1 presents an overview of the proposed air signature recognition system. In the quest of accomplishing the above goal, we have made the following technical contributions in this field of study:
• We propose a new geometrical feature-set that preserves the internal structure of a signature with the help of 3D orientations and normalized distances of the boundary points from the center of the signature.
• We introduce the application of a deep convolutional neural network as a sequential classifier to recognize 3D air signatures by arranging the geometrical features.
• We present an exhaustive experimental evaluation of the proposed geometrical feature-set and the customized architecture in the context of 3D air signature recognition.
The rest of the paper is organized as follows. In the next section, we present the feature extraction steps. The architecture of the deep CNN as a sequence classifier applicable to signature recognition is discussed in Section III. Experimental results and comparative evaluations are presented in Section IV. Finally, conclusions are presented in Section V.

II. EXTRACTION OF GEOMETRICAL FEATURES

Fig. 2. A typical experimental setup using Leap motion interface that has been used for air signature acquisition during biometric authentication.

A Leap motion-based 3D signature acquisition setup as depicted in Fig. 2 has been used for collecting air signatures from the volunteers involved in this study. The Leap Motion sensor is equipped with an infrared camera that creates an inverted-pyramid field of view over the surface of the sensor. Therefore, when the palm is placed within this interaction space, the sensor's built-in API can track fingertip movements during air signatures. During signature registration, the users are asked to sign with their index finger, as one typically does while performing air signatures over the interaction space. Instantaneous fingertip positions are obtained, and the 3D trajectories are taken as the signature of the intended user. In the pre-processing stage, coordinate values are scaled between 0 and 1 and then re-sized to a fixed length using the process depicted in Fig. 3.

Fig. 3. Pre-processing of the raw signature data.

A typical signature with N sequential points is represented by S as given in (1), where p_i = (x_i, y_i, z_i) represents the

instantaneous position of the fingertip at the ith instance.

S = [p_1, p_2, p_3, . . . , p_N]^T   (1)

A. Geometrical Features

Though raw coordinates can be used for training, these values are often corrupted by noise such as inaccurate tracking or perspective errors. Therefore, high-level features are often preferred to the raw coordinates. High-level features such as local direction, curvature, skew, lineness, velocity, or acceleration are often used to extract meaningful information about the curvy signatures [6], [23], [24]. Geometrical features based on well-known structures such as convex hulls or polygonal structures can also be extracted to represent a signature in a high-dimensional space. In this work, we have emphasized designing a feature-set that is suitable for 3D air signatures. We have intuitively developed this feature-set assuming that an observer is positioned at the center of the 3D structure; such a viewing perspective may provide valuable information. The concept is depicted in Fig. 4.

Fig. 4. A pictorial depiction of perspective parameter estimation and geometrical feature extraction.

Our method starts by locating the centroid (c) of the 3D signature and then estimating distance and angular parameters over the entire curve. To avoid data loss, we estimate the local features at each point (p_i) of the signature curve (S). First, we draw a straight line from c to p_i. The normalized distance (d_i) between these points is recorded as depicted in Fig. 4. The normalized distance is assumed to be unique for the same user due to its rotational invariance property. Next, we compute the direction cosines of this line segment with respect to the three coordinate axes. Let the line segment cp_i make angles ∠α, ∠β, and ∠γ with the x-axis, y-axis, and z-axis, respectively. We estimate the direction cosines of the line and represent the triplet as given in (2), where l_i, m_i, and n_i are the values of the direction cosines.

<l_i, m_i, n_i> = <cos α, cos β, cos γ>   (2)

We then estimate the change in direction cosines between successive points on the signature curve. It is done as follows. Let p_i and p_(i+1) represent two consecutive points on the curve such that p_i = (x_i, y_i, z_i) and p_(i+1) = (x_(i+1), y_(i+1), z_(i+1)). The changes in direction cosines along each axis are computed and represented as follows:

δl_i = l_i − l_(i+1)
δm_i = m_i − m_(i+1)
δn_i = n_i − n_(i+1).

Next, we construct a four-dimensional high-level feature representation of the signature at the ith position as given in (3), where the δ values are defined above.

ξ_i = [d_i, δl_i, δm_i, δn_i]   (3)

Finally, we obtain the complete high-level feature representation of the signature as given in (4). This is then used to prepare the input for the deep convolutional neural network, which is discussed in the following section.

F_S = [ξ_1, ξ_2, ξ_3, . . . , ξ_N]   (4)

III. SEQUENTIAL MODEL OF CNN FOR SIGNATURE RECOGNITION

This section presents the proposed methodology for recognition of 3D air signatures captured using the Leap Motion sensor and represented using (4). We call such representations high-level features. These feature vectors are passed through the sequential model of a deep convolutional neural network for recognition. The proposed architecture of the network is described hereafter.

A. Convolutional Neural Network

The traditional architecture of a Convolutional Neural Network (CNN) usually consists of two major components. The first part, known as the feature extractor stage, is meant for self-learning features from the raw data. The second part performs the classification through a fully connected multilayer perceptron (MLP). The feature extractor stage is a collection of stages of similar layer structure, where each stage consists of two cascaded layers, convolution and activation, followed by a pooling layer. However, such networks are mainly suitable for processing image data. Therefore, representing sequential data in an appropriate form is extremely important. This is discussed in the next section.
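For concreteness, the pre-processing of Fig. 3 together with the feature extraction of (1)-(4) can be sketched in NumPy. This is an illustrative sketch, not the authors' code: the resampling by linear interpolation, the normalization of d_i by the maximum distance, and dropping the last point (which has no successor for the δ terms) are assumptions, and the target length of 128 follows the input size reported in Section IV.

```python
import numpy as np

def preprocess(points, target_len=128):
    # Scale each coordinate axis into [0, 1], then resample to a fixed
    # length. Linear interpolation over the point index is an assumption;
    # the paper only says the signature is "re-sized to a fixed length".
    pts = np.asarray(points, dtype=float)
    span = pts.max(axis=0) - pts.min(axis=0)
    span[span == 0] = 1.0                      # guard degenerate axes
    pts = (pts - pts.min(axis=0)) / span
    old = np.linspace(0.0, 1.0, len(pts))
    new = np.linspace(0.0, 1.0, target_len)
    return np.column_stack([np.interp(new, old, pts[:, a]) for a in range(3)])

def geometrical_features(sig):
    # sig: (N, 3) pre-processed points p_i; returns rows xi_i = [d_i, dl_i, dm_i, dn_i].
    c = sig.mean(axis=0)                       # centroid of the 3D signature
    v = sig - c                                # vectors from c to each p_i
    d = np.linalg.norm(v, axis=1)
    d_norm = d / d.max()                       # normalized distance d_i (normalization assumed)
    safe = np.where(d == 0, 1.0, d)[:, None]
    cosines = v / safe                         # direction cosines (l_i, m_i, n_i), eq. (2)
    delta = cosines[:-1] - cosines[1:]         # (dl_i, dm_i, dn_i) between successive points
    return np.column_stack([d_norm[:-1], delta])   # eq. (3); last point has no successor

raw = np.cumsum(np.random.randn(200, 3), axis=0)   # a fake 200-point air trajectory
F = geometrical_features(preprocess(raw))
print(F.shape)    # (127, 4)
```

Each row of F is one ξ_i = [d_i, δl_i, δm_i, δn_i], so every signature becomes a fixed-size sequence of 4-dimensional vectors, which is the form of input assumed by the CNN described next.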

Fig. 5. Architecture view of the signature classification based on deep convolutional neural networks.

1) Adopted Architecture: Unlike image classification tasks using CNNs, where the input is given in the form of 2D image pixels, the input in this work is one-dimensional: a collection of high-level feature points extracted from the 3D air signatures. In order to fit the architecture of the CNN model shown in Fig. 5, the input data representing the 3D air signature has been restructured; the subsequent stages are described below.

Input: The inputs are represented by x ∈ R^(n×1×m), where n is the number of samples and m is the number of data values corresponding to each sample.

Convolution Layer: This layer performs the convolution operation between the input received from the previous layer and the kernel (filter). Generally, multiple kernels are used in a convolution layer. Let there be n_i kernels in a convolution layer C_i, and let κ denote a kernel. The output of the ith convolution layer with input x_i is given in (5), where ⊗ is the convolution operation and b_n^i is the bias.

x_(i+1) = κ_n ⊗ x_i + b_n^i,  with n = 1, 2, . . . , n_i   (5)

Activation Layer: To bring non-linearity into the neural network, activation functions are used; they help the network learn complex models. The most widely used activation functions are sigmoid, tanh, and ReLU. In this paper, ReLU activation has been used across all layers. The ReLU activation is defined by

ReLU(x) = max(0, x).

Pooling Layer: Pooling is most widely used to reduce the spatial size of the output of a convolution layer without affecting its depth. The purpose of the pooling layer is to reduce the number of parameters and hence the computational load. Max pooling and average pooling are the two most widely used pooling techniques, of which max pooling has been adopted in this paper.

Flattening and Fully Connected Layer: After a series of convolution, activation, and pooling layers, the feature extractor stage is complete. The next stage performs the classification using a fully connected layer. The output of the last pooling layer is converted into a vector that is given as input to the fully connected layer. The outputs of the fully connected layer are generally converted into probabilities, and the popular way to achieve this is through the softmax function.

IV. RESULTS AND DISCUSSIONS

A. Data Description

A signature dataset has been created for conducting the experiments and verifying the proposed methodology. The dataset has been made available to the research community¹. In our study, 50 users were involved in recording air signatures using the above-mentioned experimental setup, where each user registered their signature 14 times, of which 12 samples were used for training and the rest for testing. Therefore, a total of 600 signatures (50 × 12 = 600) were used for training and 100 signatures (50 × 2 = 100) were used for testing.

¹ https://goo.gl/f9Cao5
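The layer operations just described can be sketched in a few lines of NumPy, wired together the way the adopted architecture uses them: stacked conv-ReLU-maxpool stages, then flattening, a fully connected layer, and softmax. This is an illustrative sketch rather than the authors' TensorFlow implementation: the kernel width of 3, 'same' zero padding, pool size 2, and random weights are assumptions, while the kernel counts 32, 64, 128, 256 and the 128 × 4 input shape are the values reported later in Section IV.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, kernels, bias=0.0):
    # Eq. (5): one output channel per kernel, with 'same' zero padding (assumed).
    # x: (length, in_ch); kernels: (n_k, width, in_ch) -> output (length, n_k).
    n_k, width, in_ch = kernels.shape
    pad = width // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty((x.shape[0], n_k))
    for t in range(x.shape[0]):
        out[t] = np.tensordot(kernels, xp[t:t + width], axes=([1, 2], [0, 1])) + bias
    return out

def relu(x):
    return np.maximum(0.0, x)             # ReLU(x) = max(0, x)

def maxpool(x, size=2):
    n = x.shape[0] // size                # halves the length, keeps the depth
    return x[:n * size].reshape(n, size, -1).max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())               # shift by the max for numerical stability
    return e / e.sum()

x = rng.standard_normal((128, 4))         # one signature: length 128, 4 features
for n_k in (32, 64, 128, 256):            # the four feature-extractor stages
    k = rng.standard_normal((n_k, 3, x.shape[1])) * 0.1
    x = maxpool(relu(conv1d_same(x, k)))  # conv -> ReLU -> pool per stage
print(x.shape)                            # (8, 256): length 128 -> 8, depth 256

flat = x.reshape(-1)                      # flattening: 8 * 256 = 2048 values
w = rng.standard_normal((50, flat.size)) * 0.01
probs = softmax(w @ flat)                 # 50 class probabilities, one per user
print(probs.shape)                        # (50,)
```

Note how pooling alone shrinks the sequence (128 → 64 → 32 → 16 → 8) while the convolution layers set the depth, which matches the 8 × 256 size quoted before the fully connected layer.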

TABLE I
ACCURACY RESULTS USING THE PROPOSED AND EXISTING FEATURES

Process      Feature-Set                     Accuracy (%)
Validation   Existing feature                93
             Proposed feature                94
             (Existing + Proposed) feature   98
Testing      Existing feature                91
             Proposed feature                92
             (Existing + Proposed) feature   95

One of the key objectives of our method is to test whether a small number of signature samples can actually work in a CNN framework. We were surprised to get exciting results even with so few samples per class. This is a positive aspect of our design, since registering more signatures per user is not a practical approach; a user may not be comfortable providing a large number of signature samples. Typical CNN architectures, however, require thousands of samples per class. Therefore, building a network with substantially fewer samples is a significant achievement of the present research. In the next section, we present the results obtained using the above dataset.

B. Signature Recognition Results

The TensorFlow deep learning framework has been used for implementing the CNN model described in Section III. In the training process, the input to the first layer is of size 128 × 1 × 4 (sequence length 128 with 4 high-level features). In the output layer, we obtain 50 class probability values. Between the input and output layers, 4 groups of layers comprising convolution, activation, and pooling have been used alternately. The numbers of kernels (filters) used in the 4 convolution layers are n1 = 32, n2 = 64, n3 = 128, and n4 = 256, respectively. The max-pooling operations of the 4 pooling layers reduce the size of the input from 128 to 8 without changing the depth. Thus, the input to the fully connected layer is reduced to 8 × 256. We obtained 94% validation accuracy during the training process. While testing with the remaining data, we recorded an accuracy of 92%. A summary of the results is presented in Table I.

Apart from the above results, we have also experimented with a few popular existing high-level features, such as the writing direction and curvature at each point of the signature. These features were proposed by Jaeger et al. [23] and Behera et al. [6]. We obtained 93% (validation) and 91% (testing) accuracy, respectively. Finally, we applied the proposed feature-set combined with the existing features. In this setting, we recorded significantly better performance: 98% in validation during training and 95% in testing. These results are described in Table I.

Figs. 6 and 7 show how the accuracy and loss vary with the number of iterations during training, using both the proposed and the existing features. We have also tested different classifiers to establish the robustness of the proposed feature-set. We recorded better performance using the proposed deep learning framework with the proposed high-level feature-set as compared to some of the popular classifiers. These comparative results are presented in Table II.

Fig. 6. Accuracy variation during training with the increase in number of iterations.

Fig. 7. Loss values during training with the increase in number of iterations.

C. Results by Varying Number of Samples

We have also conducted experiments by varying the number of signatures per user in the training phase. Fig. 8 suggests that both

TABLE II
TEST ACCURACY RESULTS USING DIFFERENT CLASSIFIERS

Process   Feature-Set                     HMM   LSTM   CNN
Testing   Existing feature                90    88     91
          Proposed feature                91    90     92
          (Existing + Proposed) feature   93    92     95

validation and testing accuracy increase with the number of samples per user. Therefore, we can assume that, using the proposed approach, performance can be improved by increasing the number of signatures per user in the training process. However, it may also be observed that beyond 10 samples the rate of increase in accuracy is not significant. Therefore, our goal of keeping a small number of samples (given the impracticality of collecting more samples per user) fits the present context.

Fig. 8. Accuracy by varying the number of signatures/user.

V. CONCLUSION

The paper proposes a new high-level feature-set for recognition of air signatures captured using the Leap Motion sensor. The features are designed with the help of geometrical analysis of the curved signature structures. We have used a CNN-based sequential model for recognition of the signatures. Experimental results reveal that, using the proposed feature-set with the adopted deep CNN architecture, the performance of signature recognition improves. Moreover, we have observed that the number of samples per user need not be as high as is typically required in neural network setups. The rate of improvement is not significant beyond a certain number of samples per user, which is an added advantage of the proposed system.

Several extensions of the present work are possible, for example, developing robust biometric techniques, designing algorithms for automatic gesture recognition, and proposing novel human-computer interfaces.

REFERENCES

[1] D. Impedovo and G. Pirlo. Automatic signature verification: the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, 38(5):609–635, 2008.
[2] R. Sabourin, G. Genest, and F. Preteux. Off-line signature verification by local granulometric size distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(9):976–988, 1997.
[3] D. S. Guru and H. N. Prakash. Online signature verification and recognition: An approach based on symbolic representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6):1059–1073, 2009.
[4] A. Shanker and A. Rajagopalan. Off-line signature verification using DTW. Pattern Recognition Letters, 28(12):1407–1414, 2007.
[5] S. K. Behera, D. P. Dogra, and P. P. Roy. Fast recognition and verification of 3D air signatures using convex hulls. Expert Systems with Applications, 100:106–119, 2018.
[6] S. K. Behera, D. P. Dogra, and P. P. Roy. Analysis of 3D signatures recorded using Leap motion sensor. Multimedia Tools and Applications, pages 1–26, 2017.
[7] S. K. Behera, P. Kumar, D. P. Dogra, and P. P. Roy. Fast signature spotting in continuous air writing. In Proc. of the Fifteenth IAPR International Conference on Machine Vision Applications (MVA), pages 314–317. IEEE, 2017.
[8] A. K. Jain, F. Griess, and S. Connell. On-line signature verification. Pattern Recognition, 35:2963–2972, 2002.
[9] L. L. Lee, T. Berger, and E. Aviczer. Reliable on-line signature verification systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):643–649, 1996.
[10] V. Nguyen, M. Blumenstein, and G. Leedham. Global features for the off-line signature verification problem. In ICDAR, pages 1300–1304, 2009.
[11] I. Nakanishi, H. Sakamoto, N. Nishiguchi, Y. Itoh, and Y. Fukui. Multi-matcher on-line signature verification system in DWT domain. IEICE Transactions on Fundamentals, pages 178–185, 2006.
[12] E. M. Nel, J. du Preez, and B. Herbst. Estimating the pen trajectories of static signatures using hidden Markov models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27:1733–1746, 2005.
[13] L. J. Latecki, V. Megalooikonomou, Q. Wang, and D. Yu. An elastic partial shape matching technique. Pattern Recognition, 40(11):3069–3080, 2007.
[14] A. Kholmatov and B. Yanikoglu. Identity authentication using improved on-line signature verification method. Pattern Recognition Letters, 26(18):2400–2408, 2005.
[15] J. P. Drouhard, R. Sabourin, and M. Godbout. A neural network approach to on-line signature verification using directional PDF. Pattern Recognition, 29:415–424, 1996.
[16] Y. Bengio et al. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
[17] L. Deng. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing, 3, 2014.
[18] X. Chai, Z. Liu, F. Yin, Z. Liu, and X. Chen. Two streams recurrent neural networks for large-scale continuous gesture recognition. In Proc. of the 23rd International Conference on Pattern Recognition (ICPR), pages 31–36. IEEE, 2016.
[19] P. Wang, W. Li, S. Liu, Z. Gao, C. Tang, and P. Ogunbona. Large-scale isolated gesture recognition using convolutional neural networks. In Proc. of the 23rd International Conference on Pattern Recognition (ICPR), pages 7–12. IEEE, 2016.
[20] P. Wang, W. Li, S. Liu, Y. Zhang, Z. Gao, and P. Ogunbona. Large-scale continuous gesture recognition using convolutional neural networks. In Proc. of the 23rd International Conference on Pattern Recognition (ICPR), pages 13–18. IEEE, 2016.
[21] N. C. Camgoz, S. Hadfield, O. Koller, and R. Bowden. Using convolutional 3D neural networks for user-independent continuous gesture recognition. In Proc. of the 23rd International Conference on Pattern Recognition (ICPR), pages 49–54. IEEE, 2016.
[22] J. Yang, M. N. Nguyen, P. P. San, X. Li, and S. Krishnaswamy. Deep convolutional neural networks on multichannel time series for human activity recognition. In Proc. of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI), pages 3995–4001, 2015.
[23] S. Jaeger, S. Manke, and A. Waibel. NPen++: An on-line handwriting recognition system. In 7th International Workshop on Frontiers in Handwriting Recognition, pages 249–260, 2000.
[24] S. Rashidi, A. Fallah, and F. Towhidkhah. Authentication based on signature verification using position, velocity, acceleration and jerk signals. In Proc. of the IEEE 9th International ISC Conference on Information Security and Cryptology (ISCISC), pages 26–31. IEEE, 2012.

