Vision Graph Convolutional Network For Writer-Independent Offline Signature Verification
Abstract—As a biometric feature, handwritten signatures have various applications in finance, law, and business. The existing signature verification methods are mostly based on convolutional

writer-independent (WI). In the case of WD, the system must be retrained for each new user signature added. It makes
2023 International Joint Conference on Neural Networks (IJCNN) | 978-1-6654-8867-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/IJCNN54540.2023.10192006
Authorized licensed use limited to: Sungkyunkwan University. Downloaded on August 28,2023 at 02:57:43 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Framework of SigGCN. Given a pair of test signature images, SigGCN first transforms each image into graph-structured data using a CNN, then feeds each graph into a multi-layer GCN for graph representation learning to obtain a representation of each signature. It then measures the distance between the two representations with the defined metric function and compares that distance with a threshold value for verification.
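The final decision step described in the Fig. 1 caption can be illustrated with a minimal sketch. The function names, the toy embeddings, and the rule "accept when the distance does not exceed the threshold" are assumptions made for illustration; the caption only states that the distance is compared with a threshold.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def verify(reference_embedding, query_embedding, threshold):
    """Hypothetical decision rule: accept the query as genuine when its
    distance to the reference does not exceed the threshold."""
    return euclidean_distance(reference_embedding, query_embedding) <= threshold


# Toy two-dimensional embeddings, purely illustrative.
print(verify([0.0, 0.0], [0.3, 0.4], threshold=0.6))
```

In practice the threshold would be chosen on held-out pairs; the value used here is only a placeholder.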
Fig. 2. Demonstration of Graphical Image. The signature image is passed through CNN to get N feature vectors of dimension D and the position encoding
with the same shape. Then the K nearest neighbors algorithm finds the K nearest vectors and adds an undirected edge to form a graph with N vertices.
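The graph construction in the Fig. 2 caption can be sketched as below, assuming the CNN has already produced N feature vectors of dimension D. The function name `knn_graph` and the brute-force neighbour search are illustrative choices, not the authors' implementation.

```python
import math


def knn_graph(features, k):
    """Return the undirected edge set of a K-nearest-neighbour graph.

    features: list of N vectors (each of length D), one per vertex.
    Each vertex is linked to its k closest vertices by Euclidean
    distance; edges are stored as (i, j) with i < j so that they are
    undirected and deduplicated.
    """
    n = len(features)
    edges = set()
    for i in range(n):
        # Sort all other vertices by their distance to vertex i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Four toy vertices with D = 2 forming two tight clusters.
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(sorted(knn_graph(feats, k=1)))  # [(0, 1), (2, 3)]
```

Because edges are undirected, a vertex can end up with more than K neighbours when other vertices select it as one of their own nearest neighbours.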
layers can get deeper because DeepGCNs [4] utilizes residual structures. Han et al. [5] used a graph neural network model to outperform representative CNN (ResNet [6]), MLP (CycleMLP [7]), and Transformer (Swin-T [8]) models on vision tasks such as image classification and object detection. There appears to be no work based on graph neural networks for writer-independent offline handwritten signature verification. The only article [9] we found that uses graph neural networks for handwritten signature verification follows a writer-dependent setup.

While the bulk of studies [1], [2], [10] in this domain are based on handcrafted feature analyses, various deep learning-based approaches have been proposed. Dey et al. [11] proposed SigNet, which can be said to have opened the door to deep learning in the OSV task. In SigNet, a Siamese neural network was used to extract the latent representation of a signature. This framework computes a similarity metric involving the Euclidean distance between two signatures and is trained with the contrastive loss function. Following SigNet, a large amount of research based on deep learning was published. One of the most notable works is IDN [12], which outperformed other methods of the time by a wide margin with a model containing only about 3 million parameters. The number of model parameters is reduced by using multiplexing, while the identification accuracy is greatly improved by using a four-way attention mechanism. To further improve model performance, MA-SCN [13] also employs a multiplexed attention mechanism and increases the multi-scale nature of the inputs by sampling the same signature 3 times at different scales. Unlike the previous two CNN-based approaches, 2C2S [14] is a Transformer-based model architecture that merges two signatures on the channel dimension before they are fed to the model, avoiding the need for multiple attention mechanisms to achieve ground feature fusion as in CNN-based models and thus reducing model parameters. Since we found no published GNN-based writer-independent OSV work, we propose our SigGCN model to fill the gap in OSV research on graph neural networks.

III. THE PROPOSED METHOD

This section details the proposed SigGCN method. The SigGCN framework is shown in Fig. 1. It consists of a CNN for graphical image construction, a multi-layer GCN for graph representation learning, and a metric function for measurements.

A. Graphical Image

The blank background of the signature image accounts for much of the image, so if a spatial division-based grid is applied as input, the model must process a lot of redundant information. As shown in Fig. 2, transforming images into
graph-structured data can significantly reduce the redundant information from blank backgrounds.

Given a signature image with the shape $(C, H, W)$, we use CNNs to encode it into $h \times w$ blocks of fixed dimension $D$, treating each block as an independent graph vertex. Defining $N = h \times w$, we obtain an $(N, D)$ tensor, that is, $N$ vertices, each of dimension $D$. Fig. 2 demonstrates the entire procedure. A vertex is not connected to all $N$ nodes. Considering the primarily blank background of the signature image, each vertex in the graph is connected to only a few vertices by undirected edges. We directly use the KNN algorithm to select the $K$ vertices closest to the current vertex in the $D$-dimensional latent space and add an undirected edge to each of them, drawn in red in the figure. After processing all the vertices, we obtain a graph representation of the signature.

By converting the image into graph-structured data through a CNN, we can use a GNN to obtain a representation of the whole image. We add trainable position embeddings to the encoded graph vertices in order to prevent the loss of spatial information.

B. Graph Convolutional Network

Graph convolution [18] updates each vertex by aggregating its information with that of its neighboring nodes. The propagation rule between layers of a multi-layer GCN is as follows:

$G_{l+1} = \sigma(F(G_l, W^l))$, (1)

Here, $W^l$ are trainable parameters, $\sigma$ denotes the activation function, $G_l$ is the graph of layer $l$, and $G_{l+1}$ is the graph of the next layer. By combining the features of neighboring nodes, the aggregation function $F$ computes a node's representation, and the update operation further combines the aggregated feature:

$x_i^{l+1} = h(x_i^l, g(x_i^l, N(x_i^l), W_{aggregate}), W_{update})$, (2)

where $x_i^l$ represents the $i$th node in $G_l$, and $N(x_i^l)$ is the set of neighbor nodes of $x_i^l$. A mean aggregator [18], a max-pooling aggregator [19], and an attention aggregator [20] are a few examples of aggregation functions. In this paper, we adopt the parameter-free max-relative graph convolution [4] for simplicity and efficiency:

$g(\cdot) = \max(\{x_j^l - x_i^l \mid j \in N(x_i^l)\})$, (3)

A multi-layer perceptron, a gated network [21], etc., can serve as the update function. We use an MLP as the update function $h$:

$h(\cdot) = x_i^l W_{update}$, (4)

It is worth noting that as the number of layers of the GCN increases, the values of all nodes become closer and closer and finally lose diversity due to the over-smoothing phenomenon [22], [23]. To alleviate this, we add a fully connected layer and an activation layer, e.g., GELU [24], between each layer of the graph convolution. As in DeepGCNs [4], we introduce residual connections to the GCNs:

$G_{l+1} = \sigma(F(G_l, W_{in}^l)) W_{out} + G_l$, (5)

C. Loss Function

During training, a sample is a pair of signature images $(x_1, x_2)$ and a ground-truth label $y$, where 1 denotes a genuine-genuine signature pair and 0 denotes a genuine-forgery signature pair. Because the output for a signature, after it is encoded into graph-structured data and passed through the GCN, is a representation of that signature, we need to define a function that measures the feature distance between two signatures. In this case, we directly use the Euclidean distance $d$. All that remains is to define a loss function for model training. Inspired by focal loss [25], we use a similar approach, a margin-based focal loss function, defined as follows:

$\mathrm{loss} = y \cdot \sigma(s(d - a)) \cdot \max(0, d - m_a)^2 + (1 - y) \cdot \sigma(s(b - d)) \cdot \max(0, m_b - d)^2$. (6)

Here, $\sigma$ is the sigmoid function, $s$ is a scalar that adjusts the loss weight, and $a$, $m_a$, $b$, $m_b$ denote the margins for avoiding over-fitting during the training phase. $y$ is the ground-truth label mentioned above, and $d$ is the distance between the two signature embeddings.

IV. EXPERIMENT AND ANALYSIS

We have validated our method on three datasets: CEDAR, BHSig260-Bengali, and BHSig260-Hindi. These three datasets were selected for two reasons. First, they are all completely open to the public, free to download, and do not even call for a signed license. Second, compared to other publicly available datasets, more experimental work has been done on these three datasets, so there are more options for comparative work.

All our experiments are run on a single workstation with an Intel Xeon Gold 6126 CPU @ 2.6 GHz, an RTX 3090 24 GB GPU, and 256 GB RAM. For parameters, we set $s = 10$, $a = 0.3$, $m_a = 0.3$, $b = 0.6$, $m_b = 0.9$ in the loss function, and we adopt the AdamW optimizer with lr = 0.01 and a cosine annealing learning rate schedule.

We set up the network with different numbers of layers for experimentation and selected the best results for reporting. The results are reported for a 12-layer GCN on the CEDAR and Bengali datasets and for a 6-layer GCN on the Hindi dataset. The training time for 20,000 iterations on each dataset is approximately 5 GPU hours.

A. Datasets and Protocol

CEDAR: Each of the 55 samples in this dataset contains 24 genuine signatures and 24 forged signatures. Each sample can be constructed with 276 genuine-genuine pairs and 276 genuine-forged pairs. We randomly select 5 samples for testing and use the rest of the samples for training.

BHSig260-Bengali: Each of the 100 samples in this dataset contains 24 genuine signatures and 30 forged signatures. Each
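The max-relative aggregation of Eq. (3) and the residual connection of Eq. (5) in Section III-B can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the weights are passed as nested lists, and the aggregation F is assumed to apply the input weight matrix to the max-relative aggregate, which the text does not spell out.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matmul(rows, weight):
    """Multiply an (N, D_in) list of rows by a (D_in, D_out) matrix."""
    return [
        [sum(r[k] * weight[k][c] for k in range(len(weight)))
         for c in range(len(weight[0]))]
        for r in rows
    ]


def max_relative(features, neighbors):
    """Eq. (3): element-wise max of (x_j - x_i) over the neighbours j of i.

    neighbors[i] lists the neighbour indices of vertex i and must be
    non-empty, which holds for a KNN graph with K >= 1.
    """
    out = []
    for i, x_i in enumerate(features):
        if not neighbors[i]:
            raise ValueError("every vertex needs at least one neighbour")
        diffs = [[a - b for a, b in zip(features[j], x_i)] for j in neighbors[i]]
        out.append([max(column) for column in zip(*diffs)])
    return out


def residual_gcn_layer(features, neighbors, w_in, w_out):
    """Eq. (5): G_{l+1} = sigma(F(G_l, W_in)) W_out + G_l.

    Assumption: F multiplies the max-relative aggregate by W_in, and
    sigma is GELU. The output width of W_out must equal D.
    """
    hidden = matmul(max_relative(features, neighbors), w_in)
    activated = [[gelu(v) for v in row] for row in hidden]
    projected = matmul(activated, w_out)
    return [
        [p + x for p, x in zip(p_row, x_row)]
        for p_row, x_row in zip(projected, features)
    ]


feats = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
nbrs = [[1, 2], [0], [0, 1]]
print(max_relative(feats, nbrs))  # [[3.0, 2.0], [-1.0, -2.0], [-2.0, 1.0]]
```

The residual term means that a layer whose output projection is zero leaves the vertex features unchanged, which is the property that helps deeper stacks avoid collapsing toward identical node values.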
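The margin-based focal loss of Eq. (6) can be written for a single pair as in the sketch below, using the parameter values reported in Section IV as defaults. The function and argument names are illustrative (the margins ma and mb appear as `m_a` and `m_b`), and the numerically stable sigmoid is an implementation choice rather than something the paper specifies.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def margin_focal_loss(d, y, s=10.0, a=0.3, m_a=0.3, b=0.6, m_b=0.9):
    """Eq. (6) for one signature pair.

    d: Euclidean distance between the two embeddings.
    y: 1 for a genuine-genuine pair, 0 for a genuine-forgery pair.
    """
    # Genuine pairs are penalised only when they lie farther apart than m_a.
    genuine_term = y * sigmoid(s * (d - a)) * max(0.0, d - m_a) ** 2
    # Forgery pairs are penalised only when they lie closer than m_b.
    forgery_term = (1 - y) * sigmoid(s * (b - d)) * max(0.0, m_b - d) ** 2
    return genuine_term + forgery_term


# A close genuine pair and a distant forgery pair incur no loss.
print(margin_focal_loss(0.2, 1), margin_focal_loss(1.0, 0))
```

The sigmoid factor plays the role of the focal weighting: pairs far on the wrong side of a or b receive a weight close to 1, while pairs that are already well placed are down-weighted.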
TABLE I
COMPARISON OF RESULTS OBTAINED FROM EXPERIMENTS ON THREE DATASETS (%).
Fig. 3. The ROC curve of Different Combination Loss function on Three Datasets.
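ROC curves such as those in Fig. 3 plot the false positive rate, which for verification is the false acceptance rate (FAR), against the true positive rate, which is 1 minus the false rejection rate (FRR). The sketch below computes these quantities from pair distances, assuming a pair is accepted when its distance does not exceed a threshold; the function names and toy distances are illustrative.

```python
def far_frr(genuine_distances, forgery_distances, threshold):
    """Return (FAR, FRR) when pairs with distance <= threshold are accepted."""
    false_accepts = sum(d <= threshold for d in forgery_distances)
    false_rejects = sum(d > threshold for d in genuine_distances)
    return (
        false_accepts / len(forgery_distances),
        false_rejects / len(genuine_distances),
    )


def roc_points(genuine_distances, forgery_distances):
    """ROC points (FPR, TPR) obtained by sweeping the threshold over all
    observed distances, with FPR = FAR and TPR = 1 - FRR."""
    points = [(0.0, 0.0)]
    for t in sorted(set(genuine_distances) | set(forgery_distances)):
        far, frr = far_frr(genuine_distances, forgery_distances, t)
        points.append((far, 1.0 - frr))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    pts = sorted(points)
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


# Toy distances: genuine pairs are close, forgery pairs are far apart.
genuine = [0.1, 0.2, 0.3]
forgery = [0.7, 0.8, 0.9]
print(area_under_curve(roc_points(genuine, forgery)))
```

For perfectly separable distances, as in this toy example, the area under the curve is 1 up to floating-point rounding.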
Fig. 4. Kernel Density Estimate of Cross-language Evaluation Result. X-axis is the distance metric between each test pair, Y-axis is the probability density.
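A kernel density estimate of the kind plotted in Fig. 4 can be computed from a list of pair distances as sketched below. The Gaussian kernel, the fixed bandwidth, and the toy distances are assumptions; the paper does not state which kernel or bandwidth was used.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian
    kernel of the given bandwidth (both choices are illustrative)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Toy distances for genuine-genuine test pairs.
genuine_density = gaussian_kde([0.20, 0.25, 0.30], bandwidth=0.05)
print(genuine_density(0.25) > genuine_density(0.90))  # True
```

Plotting one such density for genuine pairs and one for forgery pairs on the same distance axis shows how much the two distributions overlap.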
TABLE III
COMPARISON OF RESULTS OBTAINED FROM ABLATION STUDY OF LOSS FUNCTION ON ACCURACY (%).

Combination      CEDAR   Bengali   Hindi
Plain            86.70   92.68     79.59
+Focal           92.93   93.84     83.06
+Margin          94.17   94.03     86.24
+Focal+Margin    100     95.99     87.88

TABLE IV
ACCURACY OF CROSS-LANGUAGE COMPARISONS WITH EXISTING METHODS (%).

Train\Test   Method              CEDAR   Bengali   Hindi
CEDAR        SigNet [11]         100     55.61     64.15
CEDAR        IDN [12]            95.98   50.36     50.01
CEDAR        ILNNeck+TCI [16]    90.79   80.83     84.93
CEDAR        Ours                100     80.05     77.87
Bengali      SigNet [11]         50.00   86.81     52.78
Bengali      IDN [12]            50.00   95.32     74.30
Bengali      ILNNeck+TCI [16]    80.34   94.28     79.21
Bengali      Ours                95.72   95.99     79.21
Hindi        SigNet [11]         59.57   60.65     84.64
Hindi        IDN [12]            50.00   95.32     74.30
Hindi        ILNNeck+TCI [16]    79.76   87.57     95.79
Hindi        Ours                98.36   84.96     87.88

and body does not change just because the spatial location has changed. However, handwritten signatures have strong spatial relationships. For example, the strokes on the right side of the signature do not appear on the left side of the signature, and because of this, adding spatial position embedding to the embedded graph-structured data will improve the model performance.

To verify the effectiveness of our proposed training loss function, we split the loss function into two parts: the focal loss and the margin. We then test 4 combinations on 3 datasets. The detailed results are listed in Table III and Fig. 3. We can see from the results that the combination of focal loss and margin has a positive return. Removing either of the two parts leads to a decrease in model performance. In Fig. 3, the X-axis and the Y-axis represent FPR and TPR, where FPR = FAR and TPR = 1 - FRR. The larger the area under the ROC curve, the better the model's performance. The results shown in the ROC curves are consistent with Table III, and the model performs best only when we use the margin-based focal loss function.

E. Cross-Language Test

Verification of a handwritten signature is based on the writing style of the signer. To test the efficacy of our suggested method on the cross-lingual verification task, the characteristics of human writing should be comparable. We evaluated across three datasets in different languages, training on one dataset and testing on each of the three. The experimental results are shown in Table IV. We can see a degradation in the performance of the model for cross-language signature verification tasks in all cases. Nevertheless, our SigGCN achieved the best performance in 5 out of 9 experiments. As the models trained on the other two datasets reach an accuracy of over 95% when tested directly on the CEDAR dataset, the CEDAR dataset appears to be very easy for our model to distinguish. Our model performs better than existing CNN-based methods in several cross-language tests. This suggests that graph-structured data can capture features that cannot be extracted by CNNs and deal with complex relationships in images more flexibly.

Turning to Fig. 4, the X-axis of each plot is the distance metric between each test pair, and the Y-axis is the probability density. The three plots in the first column show models trained on different datasets and tested on CEDAR. We find that the distinction between genuine and forgery distances is good, and a clear division can essentially be made. This shows that CEDAR is relatively simple for our model compared to BHSig260. From the rest of the figures, we recognize that cross-language handwritten signature verification is open to further research.

V. CONCLUSION AND FUTURE WORKS

In this work, we propose an end-to-end method to perform writer-independent offline handwritten signature verification. Receiving inspiration from human visual cognition, we transform the signature images into graph-structured data. At the same time, to retain the original spatial information, we add location encoding to the transformed graph data to retrieve the valid location information. Feature extraction is performed on the graph data using a GCN to obtain the graph representation features of the signature. The model is effectively trained to extract the signature graph representation with the margin-based focal loss function proposed in this paper. Comprehensive experiments show that the method proposed in this paper outperforms other known methods on the CEDAR and BHSig260-Bengali datasets.

Self-supervised learning and pre-training have exhibited promising performance in many areas. They can address the fact that supervised learning requires a large amount of manually labeled data, which is highly costly. This paper shows that GNNs have the same capability as attention mechanisms and CNNs in pairwise verification tasks. It is also shown in the discussion that models trained with supervised learning perform poorly on cross-language signature verification tasks. In the future, we will consider exploring self-supervised learning and pre-training GNN models for cross-language signature verification tasks.

REFERENCES

[1] M. K. Kalera, S. Srihari, and A. Xu, “Offline signature verification and
identification using distance statistics,” International Journal of Pattern
Recognition and Artificial Intelligence, vol. 18, no. 07, pp. 1339–1360,
2004.
[2] S. Pal, A. Alaei, U. Pal, and M. Blumenstein, “Performance of an
off-line signature verification method based on texture features on a large
indic-script signature dataset,” in 2016 12th IAPR workshop on document
analysis systems (DAS). IEEE, 2016, pp. 72–77.
[3] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in
neural information processing systems, vol. 30, 2017.
[4] G. Li, M. Muller, A. Thabet, and B. Ghanem, “Deepgcns: Can gcns
go as deep as cnns?” in Proceedings of the IEEE/CVF international
conference on computer vision, 2019, pp. 9267–9276.
[5] K. Han, Y. Wang, J. Guo, Y. Tang, and E. Wu, “Vision gnn: An image
is worth graph of nodes,” arXiv preprint arXiv:2206.00272, 2022.
[6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[7] S. Chen, E. Xie, C. Ge, D. Liang, and P. Luo, “Cyclemlp: A mlp-
like architecture for dense prediction,” arXiv preprint arXiv:2107.10224,
2021.
[8] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and
B. Guo, “Swin transformer: Hierarchical vision transformer using shifted
windows,” in Proceedings of the IEEE/CVF International Conference on
Computer Vision, 2021, pp. 10012–10022.
[9] S. Roy, D. Sarkar, S. Malakar, and R. Sarkar, “Offline signature
verification system: a graph neural network based approach,” Journal
of Ambient Intelligence and Humanized Computing, pp. 1–11, 2021.
[10] A. K. Bhunia, A. Alaei, and P. P. Roy, “Signature verification approach
using fusion of hybrid texture features,” Neural Computing and Appli-
cations, vol. 31, no. 12, pp. 8737–8748, 2019.
[11] S. Dey, A. Dutta, J. I. Toledo, S. K. Ghosh, J. Lladós, and U. Pal,
“Signet: Convolutional siamese network for writer independent offline
signature verification,” arXiv preprint arXiv:1707.02131, 2017.
[12] P. Wei, H. Li, and P. Hu, “Inverse discriminative networks for handwrit-
ten signature verification,” in Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, 2019, pp. 5764–5772.
[13] X. Zhang, Z. Wu, L. Xie, Y. Li, F. Li, and J. Zhang, “Multi-path siamese
convolution network for offline handwritten signature verification,” in
2022 The 8th International Conference on Computing and Data Engi-
neering, 2022, pp. 51–58.
[14] J.-X. Ren, Y.-J. Xiong, H. Zhan, and B. Huang, “2c2s: A two-channel
and two-stream transformer based framework for offline signature ver-
ification,” Engineering Applications of Artificial Intelligence, vol. 118,
p. 105639, 2023.
[15] R. Kumar, J. Sharma, and B. Chanda, “Writer-independent off-line
signature verification using surroundedness feature,” Pattern recognition
letters, vol. 33, no. 3, pp. 301–308, 2012.
[16] X. Cairang, D. Zhaxi, X. Yang, Y. Hou, Q. Zhao, D. Gao, D. Pubu, and
D. Gesang, “Learning generalisable representations for offline signature
verification,” in 2022 International Joint Conference on Neural Networks
(IJCNN). IEEE, 2022, pp. 1–7.
[17] A. Dutta, U. Pal, and J. Lladós, “Compact correlated features for
writer independent signature verification,” in 2016 23rd international
conference on pattern recognition (ICPR). IEEE, 2016, pp. 3422–3427.
[18] T. N. Kipf and M. Welling, “Semi-supervised classification with graph
convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
[19] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on
point sets for 3d classification and segmentation,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2017, pp.
652–660.
[20] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Ben-
gio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
[21] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, “Gated graph
sequence neural networks,” arXiv preprint arXiv:1511.05493, 2015.
[22] Q. Li, Z. Han, and X.-M. Wu, “Deeper insights into graph convolu-
tional networks for semi-supervised learning,” in Thirty-Second AAAI
conference on artificial intelligence, 2018.
[23] K. Oono and T. Suzuki, “Graph neural networks exponentially
lose expressive power for node classification,” arXiv preprint
arXiv:1905.10947, 2019.
[24] D. Hendrycks and K. Gimpel, “Gaussian error linear units (gelus),” arXiv
preprint arXiv:1606.08415, 2016.
[25] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss
for dense object detection,” in Proceedings of the IEEE international
conference on computer vision, 2017, pp. 2980–2988.