
2018 16th International Conference on Frontiers in Handwriting Recognition

Signature and Logo Detection using Deep CNN for Document Image Retrieval

Nabin Sharma∗ , Ranju Mandal∗ , Rabi Sharma† , Umapada Pal† and Michael Blumenstein∗
∗ University of Technology Sydney, Broadway, Ultimo, NSW 2007, Australia.
Email: {Nabin.Sharma, Ranju.Mandal, Michael.Blumenstein}@uts.edu.au
† CVPR Unit, Indian Statistical Institute, Kolkata, India 700108. Email: umapada@isical.ac.in

Abstract—Signature and logo as a query are important for content-based document image retrieval from a scanned document repository. This paper deals with signature and logo detection from a repository of scanned documents, which can be used for document retrieval using signature and/or logo information. A large intra-category variance among signature and logo samples poses challenges to traditional hand-crafted feature extraction-based approaches. Hence, the potential of deep learning-based object detectors, namely Faster R-CNN and YOLOv2, was examined for automatic detection of signatures and logos in scanned administrative documents. Four different network models, namely ZF, VGG16, VGG-M and YOLOv2, were considered for analysis to identify their potential in document image retrieval. The experiments were conducted on the publicly available "Tobacco-800" dataset. The proposed approach detects signatures and logos simultaneously. The results obtained from the experiments are promising and on par with existing methods.

Keywords-Faster R-CNN; Deep Learning; Document retrieval; Signature detection; Logo detection

I. INTRODUCTION

It is a common organisational practice nowadays to store and maintain large databases in an effort to move towards a paperless office via digitization. Digitization implies scanning and storing hard-copy paper documents in digital form (i.e. electronic copies of documents). The major advantages of such a digitization process are improved preservation and easy access to organizational documents. Consequently, large quantities of administrative documents are often scanned and archived as images (e.g. the "Tobacco-800" [6] dataset) without adequate index information. As a result, a tremendous demand has been created for robust ways to access and manipulate the information that these images contain. Obtaining information resources relevant to the query information from such repositories is the main objective of document retrieval. Sample scanned documents from the "Tobacco-800" dataset are shown in Fig. 1. It can be seen that the documents are quite noisy, e.g. Fig. 1(a, c). Additionally, there can be multiple signatures in a document, e.g. Fig. 1(b), as well as handwritten notes, e.g. Fig. 1(d). Handwritten notes are often confused with signatures, making signature detection a challenging task.

Figure 1: Samples of printed documents from the "Tobacco-800" dataset. (a) Very noisy document with logo and signature, (b) document with multiple signatures, (c) noisy document with logo and signature, (d) document with signature and handwritten text.

Signatures and logos are found in a wide range of documents, such as administrative documents, legal documents, bank cheques, etc. Hence, signatures and logos can be used as key information for searching and retrieval of documents. Fig. 2 shows some sample signatures and logos from the "Tobacco-800" dataset. Signature and logo-based information will therefore undoubtedly add an advantage for document indexing and searching. Detection and recognition of signatures and logos in documents is very significant because of their various applications. Signatures and logos provide rich information as a query in document images because of their unique properties. Thus, retrieval of relevant documents from a large document repository can be achieved using signature and logo-based information.

In the literature, segmentation and recognition of signatures from scanned documents have been found to be a very challenging task. Separation of handwritten annotations/words [4], [14], [15], [16], [17], [18] from scanned documents is addressed in most of the earlier works. Signature detection in scanned documents is discussed in previous works [12].

978-1-5386-5875-8/18/$31.00 ©2018 IEEE 416


DOI 10.1109/ICFHR-2018.2018.00079
Figure 2: Sample images of logos and signatures from the 'Tobacco-800' document repository. (a-d) Samples of logos; (e-h) samples of signatures.

Zhu et al. [12] proposed a multi-scale structural saliency approach that captures dynamic curvature using a signature production model for signature detection and segmentation. Signature segmentation techniques for machine-printed documents have been proposed in several works [7], [8], [1]. To segment signatures from bank cheques and other documents, Madasu et al. [7] proposed an approach based on a sliding window that calculates the entropy and finally fits the window to the signature block. A major drawback of this technique is that a priori information about the location of the signature is assumed. Ahmed et al. [1] proposed a Speeded-Up Robust Features (SURF) based approach for signature segmentation from document images.

Signature-based document retrieval methods have been discussed in a few proposed works [3], [10], [5], [9], [13]. Recent works on logo detection methods include Wang [11] and Alaei and Delalandre [2]. From the literature, it was observed that separate methods have been applied to signature and logo detection. However, signatures and logos are sometimes present in a single document. No method proposed to date has considered a single pipeline for detecting signatures and logos in an end-to-end fashion. To address this issue, end-to-end deep learning-based approaches for object detection are investigated in this paper to handle signatures and logos together in scanned administrative documents.

In this paper, we propose to use state-of-the-art Deep Convolutional Neural Networks (CNN) for detecting signatures and logos in scanned administrative documents. Specifically, we analyze the potential of Faster Region-based Convolutional Neural Networks (R-CNN) [21] and YOLOv2 [28] for detecting the areas of interest and adapt them to the document retrieval problem. Four different network architectures, namely ZF [22], VGG16 [23], VGG_CNN_M_1024 [24] and YOLOv2 [28], are used in this study. The primary intention is to explore and model the signature and logo detection task as a standard object detection problem. Additionally, real-time detection of signatures and logos in a single pipeline makes the approach more applicable to document retrieval.

The rest of the paper is organized as follows. Related works on signature and logo detection and on deep CNN-based object detection are discussed in Section II. In Section III, we explain the proposed signature and logo detection methodology. The experimental results are presented in Section IV. Finally, conclusions are presented in Section V.

II. RELATED WORKS

In this section, the current state-of-the-art methods for signature and logo detection and for object detection using Deep Convolutional Neural Networks (CNN) are discussed. Related works on signature and logo detection are discussed in Section II-A. In Section II-B, recent deep CNN-based object detection methods are reviewed.

A. Signature and Logo Detection methods

Signature-based document retrieval methods have been discussed in a few proposed works [3], [10], [5], [9], [13]. Chalechale et al. [3] described a method for document image decomposition and retrieval based on connected component analysis and geometric properties of the labelled regions. Documents bearing Arabic/Persian signatures were considered for the experiments. Srinivasan and Srihari [10] proposed a method for signature-based retrieval of scanned documents. A model based on Conditional Random Fields (CRF) was used to label extracted segments of scanned documents as machine-printed, signature or noise. Next, a classification technique based on a Support Vector Machine (SVM) was used to remove noise and printed text overlapping the signature images. Finally, a global shape-based feature was computed for each signature image. In [9], Roy et al. presented a signature-based document retrieval technique for documents with cluttered backgrounds. Zernike moment features were extracted from each blob, and the K-means clustering algorithm was used to create a codebook of blobs. During retrieval, the Generalized Hough Transform (GHT) was used to detect the query signature, and votes were cast to find possible locations of the query signature in a document.

Wang [11] proposed an algorithm for logo detection and recognition using a Bayesian model. A multi-level, step-by-step approach was used for logo recognition, and the logo matching process involved a logo database. Here, a region adjacency graph (RAG), which models the topological relations between regions, was used to represent logos. Finally, Bayesian belief networks were employed in a logo detection and recognition framework. Recently, Alaei and Delalandre [2] proposed a system for detection and recognition of logos in document images. A Piece-wise Painting Algorithm (PPA) and probability features along with a decision tree were used for logo detection, and a template-based recognition approach was proposed to recognize the logos.

The methods proposed to date have focused on detecting either signatures or logos in two separate pipelines. Most of the existing methods are based on connected component analysis (CCA) and do not work in real time due to the many customized processing steps involved. Hence, considering deep CNNs is quite intuitive given their recent success on fundamental computer vision problems. This paper investigates the possibility of modelling the signature and logo detection task as a standard object detection problem.

B. CNN-based object detection methods

In this section, the current state-of-the-art methods for object detection using Deep Convolutional Neural Networks (CNN) are discussed. In particular, a brief overview of R-CNN [20], Fast R-CNN [19], Faster R-CNN [21] and YOLOv2 [28] is presented.

Figure 3: Object detection frameworks. (a) Faster R-CNN object detection framework, (b) YOLOv2 framework.

Recent advances in object detection techniques presented the community with the Region-based Convolutional Neural Network (R-CNN) and its successors (Fast and Faster R-CNN). R-CNN [20] uses Selective Search (SS) to compute (nearly 2K) object proposals at different scales and positions. Each of these proposed image regions is warped to a fixed size of 227x227 pixels. The warped image regions are then fed to the CNN for detection. The network architecture uses a classification head to classify each region into one of the classes. SS does not necessarily provide perfect proposals; therefore, to make up for slightly wrong object proposals, a regression head uses linear regression to map the predicted bounding boxes to the ground-truth bounding boxes. R-CNN is very slow at test time, where every individual object proposal is passed through the CNN. The extracted features are cached to disk, and finally a classifier such as an SVM is trained in an offline manner. Therefore, the weights of the CNN cannot update themselves in response to this offline part of the network. Moreover, the training pipeline of R-CNN is complex.
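To make the role of the R-CNN regression head concrete, the sketch below implements the standard box parameterization used by the R-CNN family: scale-invariant offsets of the box center and log-space ratios of the width and height. This is an illustrative sketch, not code from the authors; the function names are ours.

```python
import math

def regression_targets(proposal, gt):
    """Targets (tx, ty, tw, th) that map a proposal box onto its
    ground-truth box; boxes are (x1, y1, x2, y2) corner tuples."""
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    px, py = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    # Scale-invariant shift of the center, log-space scaling of the size.
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))

def apply_deltas(proposal, deltas):
    """Inverse operation: refine a proposal with predicted deltas."""
    tx, ty, tw, th = deltas
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    px, py = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    cx, cy = px + tx * pw, py + ty * ph
    w, h = pw * math.exp(tw), ph * math.exp(th)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)
```

At training time the network regresses towards these targets for positive proposals; at test time the predicted deltas are turned back into refined boxes before non-maximum suppression.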

In Fast R-CNN [19], the order of extracting region proposals and running the CNN is exchanged. In this architecture, the whole image is passed once through the CNN, and the regions are extracted from the convolutional feature map using Region of Interest (RoI) pooling. This change in architecture reduces the computation time by sharing the computation of the convolutional feature map between region proposals. Each region proposal is projected onto the corresponding spatial part of the convolutional feature volume. Finally, since the fully connected layers expect a fixed-size feature vector, the projected region is divided into a grid and Spatial Pyramid Pooling (SPP) is performed to obtain a fixed-size vector. SPP deals with the variable window size of the pooling operation, which makes end-to-end training of the network very hard. Generating the region proposals is the bottleneck at test time. In the above-mentioned approaches, the CNN was used only for regression and classification. The idea was further extended to use the CNN for region proposals as well.

The latest offspring of the R-CNN family, Faster R-CNN [21], proposed the idea of a small CNN network called the Region Proposal Network (RPN), built on top of the convolutional feature map. A sliding window is placed over the feature map in reference to the original image. The notion of anchor boxes is used to capture objects at multiple scales. The centers of the anchor boxes, which have different aspect ratios and sizes, coincide with the center of the sliding window. The RPN generates region proposals of different sizes and aspect ratios at various spatial locations. The RPN is a two-layered network that adds little to the computation of the overall network. Finally, regression provides finer localization with reference to the sliding-window position.

Although Fast R-CNN and its predecessors achieve high accuracy, they are computationally very expensive and time-consuming, making them undesirable for real-time applications. Faster R-CNN [21] works at a rate of 7 frames per second while maintaining high accuracy. A brief overview of the object detection frameworks is shown in Fig. 3.

Another class of object detector is YOLOv2 [28], which uses a single feature map and detects objects in a single pass, hence the name 'You Only Look Once' (YOLO). It treats detection as a regression problem to spatially separated object bounding boxes and their class probabilities. Although YOLOv2 detects objects quite fast and in real time, its accuracy is lower than that of its counterparts on various problems and applications.

Based on this brief investigation of the state of the art, Faster R-CNN and YOLOv2 were considered in this study for experiments on signature and logo detection. Different CNN architectures were used with Faster R-CNN and YOLO for analysis.

III. PROPOSED METHODOLOGY

Faster R-CNN [21], with the Caffe [26] deep learning library, and YOLOv2 [28] were considered for our experiments. Caffe-based pre-trained models are publicly available for most object detectors. The dataset contains too few images for a deep learning system to learn from scratch. Hence, to take full advantage of the network architectures, transfer learning from ImageNet [25] was used to fine-tune our models. The fine-tuning process helps our system converge faster and perform better. We used various network architectures, namely ZF [22], VGG16 [23] and VGG_CNN_M_1024 [24], to train the system and evaluate the performance on the dataset. ZF is an 8-layer architecture containing 5 convolutional layers and 3 fully-connected layers, whereas VGG16 is a much deeper architecture with 16 layers, comprising 13 convolutional layers and 3 fully-connected layers.

The YOLOv2 network considered for the experiments consists of 22 convolutional layers and 5 max-pooling layers. The input image size was 416x416. The network was fine-tuned using weights pre-trained on ImageNet.

The ground truth, i.e. the annotations for the signatures and logos present in the scanned documents of the Tobacco-800 dataset, is available in the standard PASCAL VOC XML format.

IV. RESULTS AND DISCUSSION

The Tobacco-800 dataset used for the experiments comprises 1290 scanned administrative documents. Four different network architectures, namely ZF, VGG_CNN_M_1024, VGG16 and YOLOv2, were used for the experiments. The implementation details and the detection results are presented in the subsections below.

A. Implementation Details

We trained the Faster R-CNN models on an Nvidia Quadro P6000 GPU (24 GB) in an Ubuntu server (Core i7 processor, 64 GB RAM) with a learning rate of 0.001 and a batch size of 64. The batch size of the Region Proposal Network (RPN) was kept constant at 128. Region proposal networks were trained end-to-end using backpropagation and stochastic gradient descent (SGD). In order to reduce redundancies arising from the RPN proposals, non-maximum suppression (NMS) was applied to the proposals based on their class scores. The performance of each network architecture at different numbers of iterations was also analysed. For training with the YOLOv2 architecture, a batch size of 64 and a learning rate of 0.0001 were used. In the training phase, snapshots of the trained models were saved at intervals of 10k iterations. Detections with an overlap greater than the 50% Intersection over Union (IoU) threshold with the corresponding ground-truth bounding box are considered true positives, and all other detections are false positives, where IoU is defined in Eq. (1) [27]:

    IOU = area(BBox_pred ∩ BBox_gt) / area(BBox_pred ∪ BBox_gt)    (1)

where BBox_pred and BBox_gt denote the predicted bounding box and the ground-truth bounding box, respectively. Ground-truth boxes with no matching detection are counted as false negatives. To evaluate the detection performance, we use the Average Precision (AP) calculated from the area under the Precision-Recall (PR) curve [27], while the mean Average Precision (mAP) is used for a set of detections and is the mean, over classes, of the interpolated AP for each class.

To analyze the effect of transfer learning, different sets of experiments were conducted. The Tobacco-800 dataset was divided into four different sets based on the number of documents used for training, as detailed in Table I.

Table I: Dataset split for different experiments.

Dataset  Train %  Validation %  Test %
Set 1    20       10            70
Set 2    30       10            60
Set 3    40       10            50
Set 4    50       10            40

B. Performance of Faster R-CNN

Signature and logo detection results obtained from the different experiments using three different network architectures are detailed in Table II. The mean average precision (mAP) obtained for each network is given in the respective column of the table. Experiments were conducted on each of the sets (refer to Table I), and the highest mean average precision obtained on the corresponding test sets is presented in Table II.

Table II: Mean Average Precision (mAP) obtained from various network architectures.

Dataset            ZF     VGG_M  VGG16
Set 1 (20% train)  0.887  0.846  0.89
Set 2 (30% train)  0.882  0.886  0.894
Set 3 (40% train)  0.885  0.888  0.877
Set 4 (50% train)  0.888  0.888  0.895

It can be noted that there was no significant increase in accuracy when increasing the number of documents in the training set. A mean average precision of 0.894 was obtained on Set 2 using VGG16 after training/fine-tuning with just 30% of the total samples in the Tobacco-800 dataset. The average precision obtained for each class (i.e. signature and logo) on Set 2 is detailed in Table III. It can be noted from Table III, and from the average precision graph in Figure 4, that VGG16 converged faster (40K iterations) than the other network architectures, with a higher mAP.

Table III: Performance of various network architectures on individual classes using Set 2 (30% train).

Class             ZF         VGG_M      VGG16
Signature         0.884      0.892      0.896
Logo              0.879      0.879      0.892
mAP               0.882      0.886      0.894
Iterations        60K        60K        40K
Time (per image)  0.044 sec  0.048 sec  0.130 sec

Figure 4: AP analysis of each class using VGG16 on Set 2 (30% train).

Sample signature and logo detection results obtained with VGG16 are shown in Figure 5. The performance of VGG16 was very good considering the very noisy documents, as shown in Figure 5(a, c). The multiple signatures in Figure 5(b) were also detected successfully. Moreover, the CNN did not confuse signatures much with the other handwritten text present in a document, and successfully detected the signature, as shown in Figure 5(d).

C. Performance of YOLOv2

The performance of YOLOv2 on Set 2 (30% train) is detailed in Table IV. A mean Average Precision of 0.778 was achieved at 90K iterations, and there was no significant performance change at higher iterations. The average precision for each class is also detailed in the first two rows of Table IV. The performance of YOLOv2 is significantly lower than that of Faster R-CNN with the VGG16 network.

Table IV: Performance of the YOLOv2 CNN architecture on individual classes using Set 2 (30% train).

Class             YOLOv2
Signature         0.788
Logo              0.769
mAP               0.778
Iterations        90K
Time (per image)  0.04 sec

The mAP analysis for all four network architectures is presented in Figure 6. The graph shows that VGG16 converged much faster than the others, with a higher mAP.

D. Comparative study

In this section, we attempt to compare the previously proposed methods for signature detection on the Tobacco-800 dataset. Table V shows the performance of the previously proposed approaches for signature detection in documents. An accuracy of 92.8% was reported on the 'Tobacco' dataset for signature detection using a multi-scale structural saliency-based [5] approach. A recall of 78.4% and a precision of 84.2% were reported by Srinivasan and Srihari [10] for the signature-based document retrieval task. In a previously proposed approach [8], 95.58% accuracy was achieved on signature component detection. In contrast, using the proposed deep CNN-based approach, we achieved 89.6% mAP. In principle, a fair comparison with the previously reported methods is difficult because of the different samples and dataset splits used in the experiments.
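Looking back at the evaluation protocol of Section IV-A, the IoU criterion of Eq. (1) and the greedy non-maximum suppression applied to the RPN proposals can be sketched as below. This is a minimal illustration with corner-format boxes (the default threshold of 0.7 is our assumption, not a setting reported in the paper), not the actual Caffe/YOLO implementation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes, as in Eq. (1)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: keep the highest-scoring box, drop boxes that
    overlap it beyond the threshold, and repeat on the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

A detection is then counted as a true positive when its IoU with a ground-truth box exceeds 0.5, as stated in Section IV-A.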

Table V: Comparison of signature detection performance on the 'Tobacco' document repository.

Approach                                     Dataset        Accuracy (%)
Multi-scale structural saliency [5]          Tobacco-800    92.80
Conditional Random Field [10]                101 documents  91.20
Gradient-based feature with SVM [8]          Tobacco-800    95.58
Proposed Deep CNN (Faster R-CNN with VGG16)  Tobacco-800    89.6

Although the performance of the proposed method is lower than the previously reported accuracies, it is comparable, because an end-to-end system is used in the current study and two classes (signature and logo) were considered for the experiments, rather than one class.

Figure 5: Sample signature and logo detection results from the "Tobacco-800" dataset. (a) Very noisy document with logo and signature, (b) document with multiple signatures, (c) noisy document with logo and signature, (d) document with signature and handwritten text.

Table VI shows a comparative study of logo detection performance on the 'Tobacco' document dataset. A fair comparison with the existing methods is not possible due to the different samples and dataset splits used in the experiments. Additionally, none of the previously reported methods considered both signatures and logos in their experiments.

Table VI: Comparison of logo detection performance on the 'Tobacco' document repository.

Approach                                   Detection Accuracy (%)
Alaei and Delalandre [2]                   99.31
Wang [11]                                  94.70
Proposed Method (Faster R-CNN with VGG16)  89.2

Figure 6: Mean Average Precision analysis of all the CNN networks on Set 2 (30% train).

V. CONCLUSION

In this study, the detection of signatures and logos in administrative documents is modeled as a standard object detection problem. State-of-the-art deep CNN-based object detectors are analysed in this paper in order to understand their potential for real-time document image retrieval. Given the complexity involved in detecting signatures and logos in noisy documents, the performance of the deep CNNs is promising. Moreover, previous work based on handcrafted features required multiple customized steps for detection. To the best of our knowledge, none of the previous works considered detecting signatures and logos using a single pipeline in an end-to-end fashion. This study shows that an end-to-end object detection technique can be adapted and fine-tuned to detect objects of interest in scanned documents, with various potential applications. The results obtained from the experiments are very encouraging. The outcome of the present study forms the basis of future research to refine the detection results by eliminating unwanted text and touching characters present within the detected bounding boxes. This will facilitate signature and logo recognition with higher accuracy.

REFERENCES

[1] S. Ahmed, M.I. Malik, M. Liwicki and A. Dengel, Signature segmentation from document images, International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 425-429, 2012.

[2] A. Alaei and M. Delalandre, A complete logo detection/recognition system for document images, International Workshop on Document Analysis Systems (DAS), pp. 324-328, 2014.

[3] A. Chalechale, G. Naghdy and A. Mertins, Signature-based document retrieval, International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 597-600, 2003.

[4] J. Guo and M. Ma, Separating handwritten material from machine printed text using hidden Markov models, International Conference on Document Analysis and Recognition (ICDAR), pp. 439-443, 2001.

[5] G. Zhu, Y. Zheng, D. Doermann and S. Jaeger, Signature detection and matching for document image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 31, No. 11, pp. 2015-2031, 2009.

[6] http://legacy.library.ucsf.edu/: The Legacy Tobacco Document Library (LTDL), University of California, San Francisco, 2007.

[7] V. K. Madasu, M. H. M. Yusof, M. Hanmandlu and K. Kubik, Automatic extraction of signatures from bank cheques and other documents, Digital Image Computing: Techniques and Applications (DICTA), pp. 591-600, 2003.

[8] R. Mandal, P. P. Roy and U. Pal, Signature segmentation from machine printed documents using contextual information, International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), Vol. 26, No. 7, 2012.

[9] P. Roy, S. Bhowmick, U. Pal and J. Y. Ramel, Signature-based document retrieval using GHT of background information, International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 225-230, 2012.

[10] H. Srinivasan and S. N. Srihari, Signature-based retrieval of scanned documents using conditional random fields, Computational Methods for Counterterrorism, pp. 17-32, 2009.

[11] H. Wang, Document logo detection and recognition using Bayesian model, International Conference on Pattern Recognition (ICPR), pp. 1961-1964, 2010.

[12] G. Zhu, Y. Zheng, D. Doermann and S. Jaeger, Multi-scale structural saliency for signature detection, CVPR, pp. 1-8, 2007.

[13] R. Mandal, P. P. Roy, U. Pal and M. Blumenstein, Signature segmentation and recognition from scanned documents, Intelligent Systems Design and Applications (ISDA), pp. 80-85, 2013.

[14] Y. Zheng, H. Li and D. Doermann, Machine printed text and handwriting identification in noisy document images, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 26, No. 3, pp. 337-353, 2004.

[15] J. Kumar, R. Prasad, H. Cao, W. Abd-Almageed, D. Doermann and P. Natarajan, Shape codebook based handwritten and machine printed text zone extraction, SPIE 7874, 787406, 2011; doi:10.1117/12.876725.

[16] X. Peng, S. Setlur, V. Govindaraju, R. Sitaram and K. Bhuvanagir, Markov random field based text identification from annotated machine printed documents, International Conference on Document Analysis and Recognition (ICDAR), pp. 431-435, 2009.

[17] X. Peng, S. Setlur, V. Govindaraju and R. Sitaram, Overlapped text segmentation using Markov random field and aggregation, International Workshop on Document Analysis Systems (DAS), pp. 129-134, 2010.

[18] X. Peng, V. Govindaraju, S. Setlur and R. Sitaram, Text separation from mixed documents using a tree-structured classifier, International Conference on Pattern Recognition (ICPR), pp. 241-244, 2010.

[19] R. Girshick, Fast R-CNN, IEEE International Conference on Computer Vision (ICCV), pp. 1440-1448, 2015.

[20] R. Girshick, J. Donahue, T. Darrell and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580-587, 2014.

[21] S. Ren, K. He, R. Girshick and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems (NIPS), pp. 91-99, 2015.

[22] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, European Conference on Computer Vision (ECCV), pp. 818-833, Springer, 2014.

[23] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), 2014.

[24] K. Chatfield, K. Simonyan, A. Vedaldi and A. Zisserman, Return of the devil in the details: Delving deep into convolutional nets, British Machine Vision Conference (BMVC), 2014.

[25] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A large-scale hierarchical image database, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248-255, 2009.

[26] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama and T. Darrell, Caffe: Convolutional architecture for fast feature embedding, ACM International Conference on Multimedia, pp. 675-678, ACM, 2014.

[27] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, The PASCAL visual object classes challenge: A retrospective, International Journal of Computer Vision, pp. 98-136, 2015.

[28] J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, http://arxiv.org/abs/1612.08242, 2016.
