OCR For Bank Cheques
Prabhat Kumar
Department of Computer Science and Engineering
National Institute of Technology Patna
Patna, India
prabhat@nitp.ac.in
Abstract—In spite of the rapid evolution of digital technologies, a huge number of applications still rely on paper-based media. This is especially true for the processing of bank cheques. The pre-printed account number and cheque number are easily readable and can be processed automatically. However, the handwritten text on a cheque is usually processed manually, at a significant cost in time and money. An attempt has been made in this paper to develop a bank cheque handwritten text recognition (BCHWTR) system for cheques of Indian banks by recognizing the handwritten characters present in the 'payee name', 'courtesy amount (both in words and figures)' and 'date' fields by using image processing techniques on handwritten cheque images. Images of bank cheques are fed as input to the proposed system. There are four stages in the proposed system: cropping the image at a specific location; segmentation of handwritten lines, words and characters; feature extraction from individual characters and digits using the Histogram of Oriented Gradients (HOG) method and Grey Level Co-occurrence Matrix (GLCM) texture features; and recognition of isolated characters and digits using Support Vector Machine (SVM) based classification. The performance of the present BCHWTR system is evaluated on a self-generated dataset of bank cheques and it has provided a promising result.

Index Terms—Handwritten text recognition, Bank cheques, Line segmentation, HOG feature, GLCM texture features, SVM based classification.

I. INTRODUCTION

Automatic bank cheque processing is a field of interest in the banking industry, as a large share of cheques is still processed manually, which involves reading the cheques and keying their respective values into the computer. Bank cheques are still widely used all over the world for financial transactions, and they are usually processed manually in almost all countries. In such manual verification, handwritten text portions such as the payee name, courtesy amount (both in words and figures), date and signature of each cheque are verified through observation by the bank employees. This makes the process not only time consuming but also error-prone and costly. As many countries use cheque truncation systems (CTS) nowadays, much time, effort and money can be saved if this entire process of recognition, verification and data entry is done automatically using images of cheques.

The account number and the cheque number are printed on the cheques in magnetic ink (MICR). Since the MICR character set is a special type font, these fields can be automatically read using magnetic machines or optical character recognition (OCR) systems. Recognizing handwritten characters computationally is a hard problem because handwriting style and character shapes vary between individuals. Moreover, the lack of a specific pattern for characters makes it a daunting task. Differences in the size and style of cheques, low image contrast and complex backgrounds often make automatic text extraction extremely challenging. The basic goal of the present research work is to develop a fast and accurate bank cheque handwritten text recognition (BCHWTR) system able to read the handwritten text (payee name, courtesy amount and date) in a cheque automatically with minimal errors. This would largely automate cheque processing, reducing workload, time and cost per transaction. The present system has been proposed for Indian bank cheques. The character recognition process has been carried out using a Support Vector Machine (SVM) based classification technique. The overall framework of the proposed BCHWTR system is shown in Fig. 1.

The rest of the paper is organized as follows. Section II details the related works. The process of data collection and the pre-processing steps are discussed in Section III. The proposed methods of line, word and character segmentation in bank cheques are discussed in Section IV. Section V details the proposed feature extraction approach. Performance of the proposed system is detailed in Section VI. Section VII presents the conclusion of the paper.
978-1-5386-8215-9/18/$31.00 ©2018 IEEE
2018 Conference on Information and Communication Technology (CICT’18)
Fig. 6. Extracted characters of the payee first name

V. PROPOSED APPROACH OF FEATURE EXTRACTION

Features are distinctive properties of input patterns that help in differentiating between the classes of input patterns. Since the patterns in the proposed system are basic characters and digits of Latin script, we have extracted Histogram of Oriented Gradients (HOG) [18] features along with Grey Level Co-occurrence Matrix (GLCM) texture features [19] from each basic character and digit. Fig. 7 gives an overview of the proposed feature extraction approach.

Fig. 7. Block diagram of the proposed feature extraction approach

Initially, the luminance gradient is calculated at each pixel of the binary image of the segmented character, and a gradient orientation histogram is created for each cell. Here, a cell is an area of 8 x 8 pixels. Next, features are normalized for each descriptor block. The luminance gradient represents the change in luminance in terms of magnitude m and orientation θ. So, luminance magnitude m of (x, y) coordinates of any

$$\text{Correlation} = \sum_{i,j=0}^{N-1} P_{ij}\,\frac{(i-\mu)(j-\mu)}{\sigma^{2}} \tag{3}$$

$$\text{Homogeneity} = \sum_{i,j=0}^{N-1} \frac{P_{ij}}{1+|i-j|} \tag{4}$$

where i and j denote the row and column of an entry of the GLCM respectively, N is the number of grey levels in the image as set by the number of quantization levels, σ² is the variance of the intensities of all reference pixels and µ is the GLCM mean.

VI. EXPERIMENTAL RESULTS AND ANALYSIS

In the proposed system, an SVM classifier has been used for training and testing to recognize the handwritten text of bank cheques. Different SVM kernels, namely linear, polynomial and (Gaussian) radial basis function (RBF), have been tested. Since the system is trained on character-level classes, in the testing phase each character is first segmented from a word, and HOG and GLCM texture features (described in Section V) are then extracted from each character and digit. Next, the HOG and GLCM feature values are combined into a single feature vector. Finally, this feature vector is fed to the SVM classifier to obtain the label of the character or digit.

A. Results of character segmentation

The accuracy of line segmentation in handwritten text depends heavily on the type of text. If there is a sufficient gap between text lines and the document is scanned properly, line segmentation performs better. In the present system, the results of the proposed line segmentation algorithm on the cropped cheque image are satisfactory. Similarly, accuracy of word segmentation from a line depends upon the nature of the text obtained from
a line. If the space between words is large, the words are easily extracted. In the present system, except for some noise, the proposed word segmentation algorithm performed satisfactorily. The accuracy was improved by applying the noise removal technique before word extraction. Finally, despite some faulty segmentations, the proposed character segmentation algorithm achieved an accuracy of 91.83%.

B. Text recognition performance using SVM

Different SVM kernels, namely linear, polynomial and (Gaussian) RBF, have been tested to evaluate the text recognition performance of the proposed system. The text recognition accuracies obtained with the various kernels are shown in Table I. The optimal values of the SVM hyper parameters are shown in Table II. This optimal set has been found using Bayesian optimization.

TABLE I
TEXT RECOGNITION RESULTS OF THE PROPOSED SYSTEM USING VARIOUS KERNELS OF SVM

(Gaussian) RBF Kernel | Linear Kernel | Polynomial Kernel
85.42% | 86.36% | 89.73%

TABLE II
OPTIMAL VALUES OF VARIOUS HYPER PARAMETERS OF SVM

Soft-margin constant (C) | Width of (Gaussian) RBF Kernel (γ) | Degree of Polynomial Kernel
8 | 2.2 | 2

C. Comparative performance analysis

The present study is not directly comparable with existing studies, because the proposed system has been evaluated on a self-generated dataset rather than on a publicly available one. Nevertheless, to give an indication of relative performance, the recognition results of the present work are listed alongside those of a few existing studies in Table III.

TABLE III
COMPARATIVE PERFORMANCE ANALYSIS WITH FEW EXISTING STUDIES

Reference | Accuracy | Rejection | Error
Tang et al. [16] | 60% | 39% | 1%
Kim et al. [15] | 72% | 0% | 28%
Yu et al. [14] | 74% | 15.6% | 10.4%
Proposed method | 89.73% | 2.13% | 8.14%

VII. CONCLUSION

Automatic handwritten text recognition in bank cheques is an interesting field of research from both scientific and commercial points of view. In this paper, an SVM based BCHWTR system has been proposed for Indian bank cheques. The proposed system has provided promising results, with a character segmentation accuracy of 91.83% and a best recognition accuracy of 89.73% using the polynomial kernel. This research work can further be extended to the verification of signatures present on bank cheques as well as to handwritten text recognition for cheques of various foreign banks. Applying the proposed system in a real-world scenario will be our real challenge.

REFERENCES

[1] Arica, N., Yarman-Vural, F. T., "An overview of character recognition focused on off-line handwriting", IEEE Transactions on Systems, Man, and Cybernetics, Volume 31, Issue 2, 2001, pp. 216-233.
[2] Marinai, S., Marino, E., Soda, G., "Font adaptive word indexing of modern printed documents", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 28, Issue 8, 2006, pp. 1187-1199.
[3] Roy, P. P., Dey, P., Roy, S., Pal, U., Kimura, F., "A novel approach of Bangla handwritten text recognition using HMM", Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, 2014, Island of Crete, Greece, pp. 661-666.
[4] Madasu, V. K., Mohd. Hafizuddin Mohd. Yusof, Hanmandlu, M., Kubik, K., "Automatic Extraction of Signatures from Bank Cheques and Other Documents", Proceedings of the 7th International Conference on Digital Image Computing: Techniques and Applications, 2003, Sydney, Australia, pp. 591-600.
[5] Roy, P. P., Bhunia, A. K., Das, A., Dey, P., Pal, U., "HMM-based Indic handwritten word recognition using zone segmentation", Pattern Recognition, Volume 60, 2016, pp. 1057-1075.
[6] Pal, U., Roy, P. P., Tripathy, N., Lladós, J., "Multi-oriented Bangla and Devanagari text recognition", Pattern Recognition, Volume 43, Issue 12, 2010, pp. 4124-4136.
[7] Roy, P. P., Pal, U., Lladós, J., "Document seal detection using GHT and character proximity graphs", Pattern Recognition, Volume 44, Issue 6, 2011, pp. 1282-1295.
[8] Morita, M., Sabourin, R., Bortolozzi, F., Suen, C. Y., "A recognition and verification strategy for handwritten word recognition", Proceedings of 7th International Conference on Document Analysis and Recognition, 2003, Edinburgh, UK, pp. 482-486.
[9] Guillevic, D., Suen, C. Y., "Cursive script recognition applied to the processing of bank cheques", Proceedings of the 3rd International Conference on Document Analysis and Recognition, 1995, Quebec, Canada, pp. 11-14.
[10] Pal, U., Datta, S., "Segmentation of Bangla unconstrained handwritten text", Proceedings of the 7th International Conference on Document Analysis and Recognition, 2003, Edinburgh, UK, pp. 35-40.
[11] Djeziri, S., Nouboud, F., Plamondon, R., "Extraction of items from checks", Proceedings of the 4th International Conference on Document Analysis and Recognition, 1997, Ulm, Germany, pp. 749-752.
[12] Suen, C. Y., Lam, L., Guillevic, D., Strathy, N. W., Cheriet, M., Said, J. N., Fan, R., "Bank check processing system", International Journal of Imaging Systems and Technology, Volume 7, Issue 4, 1998, pp. 392-403.
[13] Freitas, C. O. A., Yacoubi, A. E., Bortolozzi, F., Sabourin, R., "Brazilian Bank Check Handwritten Legal Amount Recognition", Proceedings of the 13th Brazilian Symposium on Computer Graphics and Image Processing, 2000, Gramado, Brazil, pp. 97-104.
[14] Yu, M. L., Kwok, P. C. K., Leung, C. H., Tse, K. W., "Segmentation and recognition of Chinese bank check amounts", International Journal on Document Analysis and Recognition, Volume 3, 2001, Gramado, Brazil, pp. 207-217.
[15] Kim, K. K., Kim, J. H., Chung, Y. K., Ching, Y. S., "Legal Amount Recognition Based on the Segmentation Hypotheses for Bank Check Processing", Proceedings of International Conference on Document Analysis and Recognition, 2001, Seattle, USA, pp. 964-967.
[16] Tang, H., Emmanuel, A., Ching, Y. S., Baret, O., Cheriet, M., "Spiral Recognition Methodology and Its Application for Recognition of Chinese Bank Checks", Proceedings of 9th International Workshop on Frontiers in Handwriting Recognition, 2004, Tokyo, Japan, pp. 263-268.
[17] Jayadevan, R., Pal, U., Kimura, F., "Recognition of words from legal amounts of Indian bank cheques", Proceedings of the International Conference on Frontiers in Handwriting Recognition, 2010, Kolkata, India, pp. 166-171.
[18] Dalal, N., Triggs, B., "Histograms of oriented gradients for human detection", Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, San Diego, USA, pp. 886-893.
[19] Haralick, R. M., Shanmugam, K., "Textural features for image classification", IEEE Transactions on Systems, Man, and Cybernetics, Volume 6, 1973, pp. 610-621.
[20] Meyer, D., Leisch, F., Hornik, K., "The support vector machine under test", Neurocomputing, Volume 55, Issue 1, 2003, pp. 169-186.
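As a supplementary illustration of the GLCM texture features of Section V, the correlation and homogeneity of Eqs. (3) and (4) can be computed roughly as follows. This is a minimal sketch and not the authors' implementation. The function names `glcm` and `glcm_features`, the horizontal one-pixel offset, the symmetric counting of pixel pairs (so that a single mean µ and variance σ² describe both indices, as Eq. (3) requires) and the convention of returning zero correlation for a flat image are all assumptions made for the example.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric grey-level co-occurrence matrix P.

    image  : list of rows of integer grey levels in [0, levels)
    dx, dy : pixel offset between reference and neighbour (assumed: 1, 0)
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                # Count the pair in both directions so that P is symmetric.
                counts[a][b] += 1
                counts[b][a] += 1
                total += 2
    return [[v / total for v in row] for row in counts]


def glcm_features(P):
    """Return (correlation, homogeneity) following Eqs. (3) and (4)."""
    n = len(P)
    cells = [(i, j) for i in range(n) for j in range(n)]
    # GLCM mean and variance over the reference-pixel index i.
    mu = sum(i * P[i][j] for i, j in cells)
    var = sum((i - mu) ** 2 * P[i][j] for i, j in cells)
    if var > 0:
        correlation = sum(P[i][j] * (i - mu) * (j - mu) for i, j in cells) / var
    else:
        # Flat image: Eq. (3) is undefined; returning 0.0 is an assumed convention.
        correlation = 0.0
    homogeneity = sum(P[i][j] / (1 + abs(i - j)) for i, j in cells)
    return correlation, homogeneity


# Small 4-level test image, used only to exercise the functions.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = glcm(image, levels=4)
correlation, homogeneity = glcm_features(P)
print(correlation, homogeneity)
```

In a pipeline like the one described in Section VI, these two values would be appended to the other texture and HOG features of a character before classification.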
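The HOG step outlined in Section V (per-pixel gradient, one orientation histogram per 8 x 8 cell, then normalization) can likewise be sketched in a few lines. Only the 8 x 8 cell size comes from the text. The central-difference gradient, the magnitude and orientation formulas (the usual Dalal and Triggs [18] definitions), the nine unsigned orientation bins, and the names `gradients`, `cell_histograms` and `l2_normalize` are assumptions for illustration, since the corresponding equations are not reproduced here.

```python
import math

CELL = 8   # cell size in pixels, as stated in Section V
BINS = 9   # assumed number of unsigned orientation bins over [0, 180)


def gradients(img):
    """Per-pixel gradient magnitude m and unsigned orientation theta (degrees)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border (assumed).
            fx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            fy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(fx, fy)
            ang[y][x] = math.degrees(math.atan2(fy, fx)) % 180.0
    return mag, ang


def cell_histograms(img):
    """Magnitude-weighted orientation histogram for every full 8 x 8 cell."""
    mag, ang = gradients(img)
    h, w = len(img), len(img[0])
    bin_width = 180.0 / BINS
    cells = []
    for cy in range(0, h - CELL + 1, CELL):
        row = []
        for cx in range(0, w - CELL + 1, CELL):
            hist = [0.0] * BINS
            for y in range(cy, cy + CELL):
                for x in range(cx, cx + CELL):
                    b = min(int(ang[y][x] / bin_width), BINS - 1)
                    hist[b] += mag[y][x]
            row.append(hist)
        cells.append(row)
    return cells


def l2_normalize(vec, eps=1e-6):
    """L2-normalize one descriptor block (a concatenation of cell histograms)."""
    norm = math.sqrt(sum(v * v for v in vec) + eps ** 2)
    return [v / norm for v in vec]


# 16 x 16 binary test image: left half 0, right half 1 (a vertical edge).
img = [[0] * 8 + [1] * 8 for _ in range(16)]
cells = cell_histograms(img)
# Treat the 2 x 2 group of cells as one descriptor block (assumed block size).
block = [v for cell_row in cells for hist in cell_row for v in hist]
descriptor = l2_normalize(block)
print(len(descriptor))
```

The normalized block vector is what would be concatenated with the GLCM texture features to form the single feature vector passed to the classifier.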