Dhana Wade 2020

Proceedings of the Fourth International Conference on Computing Methodologies and Communication (ICCMC 2020)
IEEE Xplore Part Number:CFP20K25-ART; ISBN:978-1-7281-4889-2
Open CV based Information Extraction from Cheques

2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC) 978-1-7281-4889-2/20/$31.00 ©2020 IEEE 10.1109/ICCMC48092.2020.ICCMC-00018
Aniket Dhanawade, Abhishek Drode, Gifty Johnson, Aadesh Rao and Dr. Savitha Upadhya
Department of Electronics and Telecommunication Engineering, Fr. C.R.I.T.
Vashi, Navi Mumbai, India

dhanawadeaniket72@gmail.com, drodeabhishek1997@gmail.com, gifty.johnson02@gmail.com,
aadeshrao5@gmail.com, savitha.upadhya@fcrit.ac.in
Abstract – In general, bank cheques are used extensively for financial transactions in various organizations. Cheques are always verified manually. The traditional verification process always includes the date, signature, legal information, and payment written on the cheque. In this paper, the legal information is extracted from a captured cheque image by preprocessing the image, extracting the required information, and then recognizing and verifying the handwritten fields. Image processing techniques like thinning, median filtering, and dilation, as well as verification techniques, are also employed in this approach.

Keywords – OCR, TESSERACT, OPENCV

I. INTRODUCTION

Money transfer is a crucial part of all institutions. There are numerous ways to transfer money, like direct cash, online transactions, demand drafts, cheques etc. Out of these, money transfer using cheques is the most widely used method. Every institution or organization performs confidential money transfers using cheques or demand drafts. The reason is that the use of cheques is more secure compared to other methods. But even today, the details of these cheques are entered manually. For example, in schools, when students deposit their cheques, the officials manually enter their data in an Excel sheet. This task tends to be laborious. The same procedure is followed in banks as well, where thousands of cheques are dealt with each day. The signatures are verified manually, i.e. the employee visually verifies whether the signature is genuine. This type of manual checking is error-prone and time-consuming. So, there is a need for automating the above steps. Here, a system is proposed that scans a cheque, extracts all the necessary information, and makes a list of all the transactions, along with software that verifies the signature against an available database.

II. LITERATURE REVIEW

Handwritten documents are one of the most reliable sources of information. One of them is the cheque, a written order directing a bank to pay a particular amount of money or to withdraw money. This medium of service can easily be exploited when the modes of verification are insecure. Therefore, it is necessary to verify the signature.

There are two types of signature verification methods.

1. Offline Methods
These use static information for verification purposes. Offline signature schemes take an image of the signature as input, and such images usually contain noise. To remove this noise, filters such as the median filter are applied to the input image.

2. Online Methods
These make use of dynamic information for verification purposes. Online signature verification is carried out using pressure-sensitive tablets and webcams that extract features from a signature. [1]

Handwriting recognition is one of the most active research areas in which deep neural networks are used. Training a neural network model requires a large dataset. The method of identifying handwritten characters is divided into two systems. [2]

a) Android application: The Android program allows the user, by using their mobile device, to select a picture of text to be recognized. To extract the image information, this image is processed by a Python script running on a server.

b) A server: This is the backend, a machine that can run a Python script. It is needed because an Android smartphone does not have the computational power required to run neural networks and perform image processing operations.

A. OCR (Optical Character Recognition) [3]

OCR is a technology that identifies text within an image and is used to detect text from scanned documents. OCR processes an image by discovering and identifying characters. It can export the text or convert the characters to editable text directly in the image. Advanced OCR can also export the size, format and layout of the text on a page. [3]
B. TESSERACT [4]

Python-tesseract is a Python tool for Optical Character Recognition (OCR). It is used to recognize and read the text embedded in images. It can read the image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif and tiff. If used as a script, Python-tesseract prints the recognized text rather than writing it to a file [4].

C. OPENCV [5]

OpenCV stands for Open Source Computer Vision. It is a library of programming functions aimed at real-time computer vision. It consists of more than 2500 algorithms which support computer vision applications. It has C++, Python, MATLAB and Java interfaces, supports Windows, Linux and macOS, and also runs on various mobile operating systems such as Android, iOS and BlackBerry. [5]

III. IMPLEMENTATION

Fig. 1 shows the block diagram of the processing of cheques using image processing.

Fig. 1. Block diagram of image processing

A. Image acquisition

In image acquisition, the recognition system takes a scanned image as the input image. The image should be in a specific format such as JPEG. This image is taken through a scanner or a camera. Fig. 2 is a scanned image of a cheque.

Fig. 2. Scanned cheque image

B. Cropping of input image

The acquired image is then cropped using the Pillow library, an imaging library available for Python. All cheques have a standard format: the date, payee, signature and amount are all in the same place. So manual cropping is performed, i.e. the coordinates of the regions to be cropped are given manually. This gives cropped images of all the essential data. Fig. 3 shows the name of the bank. Fig. 4 shows the amount. Fig. 5 shows the account number. Fig. 6 shows the bank address. Fig. 7 shows the date that has been extracted.

Fig. 3. Bank name
Fig. 4. Amount on cheque
Fig. 5. Account number
Fig. 6. Address
Fig. 7. Date
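The fixed-coordinate cropping described above can be illustrated with a minimal sketch. It is not the authors' code: the image is modelled as a plain list of pixel rows so that the example runs without Pillow, the names `crop`, `extract_fields` and `FIELD_BOXES` are hypothetical, and the coordinates are invented for a toy image. The boxes follow the same (left, upper, right, lower) convention as Pillow's `Image.crop`.

```python
# Sketch only: an image is a list of pixel rows, so no imaging library is needed.
# Boxes use the (left, upper, right, lower) convention, as in Pillow's Image.crop.

def crop(image, box):
    """Return the region of `image` inside box = (left, upper, right, lower)."""
    left, upper, right, lower = box
    return [row[left:right] for row in image[upper:lower]]


def extract_fields(image, boxes):
    """Crop every named field region out of the cheque image."""
    return {name: crop(image, box) for name, box in boxes.items()}


# Hypothetical layout for a tiny 6x4 "cheque"; real coordinates would be
# measured once on a sample cheque, since the fields sit in fixed places.
FIELD_BOXES = {
    "bank_name": (0, 0, 3, 1),
    "date": (4, 0, 6, 1),
    "amount": (3, 2, 6, 3),
}

# Toy image whose pixel value encodes its position: value = row * 10 + column.
cheque = [[r * 10 + c for c in range(6)] for r in range(4)]
fields = extract_fields(cheque, FIELD_BOXES)
print(fields["amount"])  # [[23, 24, 25]]
```

Keeping the coordinates in one table means that supporting a differently laid out cheque would only require a new set of boxes, not new cropping code.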
Fig. 8. Detected text from image

C. Pre-processing

Pre-processing is used to generate data on which the OCR system can operate accurately. The OCR system needs a clear image with no noise. Without pre-processing, the system gives false output. For example, the lines in between the dates have to be removed to extract the proper date. This is achieved by performing binary thresholding. A reference threshold is set; pixel values below it are made black and the ones above it are made white. This helps in removing the bars, which are at a higher intensity than the written date.

Pre-processing involves: [6]

a. Noise removing or Normalization

In normalization, a noise reduction filter such as the median filter is applied to the binary scanned image to remove single white pixels on a black background and vice versa. Fig. 9 is the original image and Fig. 10 is the normalized image.

Fig. 9. Original image [7]
Fig. 10. Normalized image

The linear normalization is performed using the given formula:

$$I_N = (I - \mathrm{Min}) \, \frac{\mathrm{newMax} - \mathrm{newMin}}{\mathrm{Max} - \mathrm{Min}} + \mathrm{newMin}$$

where $I$ is the grayscale image with intensity values in the range (Min, Max) and $I_N$ is the new image with intensity values in the range (newMin, newMax). [8]

b. Binarization [9]

Image binarization is used to reduce the amount of image information, such as the color and background of the image, so that the output image is black and white. This is done by setting a threshold value and classifying all the pixels above the threshold as white and those below the threshold as black. The binarized image is shown in Fig. 11.

Fig. 11. Binarized image

c. Morphological Operations [10]

The signature area is separated from the background through a segmentation process of vertical and horizontal projection. Morphological operations like erosion and dilation are applied to perform this operation. Fig. 12 shows the cropped image after thinning. It is obtained by subtracting the hit-and-miss transform of the image with the structuring element from the image.

Fig. 12. Thinned image [10]
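The normalization, median filtering and binarization steps of this section can be sketched in a few lines. This is an illustrative pure-Python version on lists of pixel rows, not the authors' implementation, which would more likely rely on OpenCV routines such as `cv2.medianBlur` and `cv2.threshold`. The function names, the 3x3 window size and the threshold of 128 are assumptions made for the example.

```python
from statistics import median_low


def normalize(img, new_min=0, new_max=255):
    """Linearly rescale intensities from [Min, Max] to [new_min, new_max]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: avoid division by zero
        return [[new_min for _ in row] for row in img]
    scale = (new_max - new_min) / (hi - lo)
    return [[round((p - lo) * scale + new_min) for p in row] for row in img]


def median_filter(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            # median_low always returns a value that occurs in the window
            row.append(median_low(window))
        out.append(row)
    return out


def binarize(img, threshold=128):
    """Pixels above the threshold become white (255), the rest black (0)."""
    return [[255 if p > threshold else 0 for p in row] for row in img]


# A single white pixel on a black background is removed by the median filter.
noisy = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
print(median_filter(noisy))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Contrast stretching followed by thresholding.
gray = [[50, 100], [150, 200]]
stretched = normalize(gray)
print(stretched)             # [[0, 85], [170, 255]]
print(binarize(stretched))   # [[0, 0], [255, 255]]
```

The median filter is used here because it removes isolated pixels without blurring the strokes as much as an averaging filter would. The threshold itself would have to be tuned to the scanned cheques.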
The Hough transform of this image is then taken. The Hough transform is used to detect forgeries: the Hough transform peaks are used for comparison. Fig. 14 shows the Hough transform of the input image. [11][12]

Fig. 14. Hough Transform

Fig. 15 shows the pre-processed date obtained using the above steps.

Fig. 15. Pre-processed date

IV. RESULT

A. Image text conversion

The text from the cropped image is now read using the tesseract library in Python, which performs OCR on the images and extracts the text from them. The string obtained can be stored in a variable and used later. Fig. 8 shows the detected text from the image.

B. Converting to .csv file

The obtained text is stored as a variable and now needs to be documented. For this, the Pandas library has been used. Pandas is mainly used for data analysis. It helps manage large data systematically. It creates a .csv file; csv stands for Comma-Separated Values. Many columns can be defined and the data can be segregated. This type of presentation is clear and precise and is easy for the user to understand. The obtained csv file is shown in Fig. 13.

Fig. 13. Csv File

After extracting the cheque features such as the bank name, cheque amount, account number, bank address and the date, the next step is to authenticate the signature, which can be done using Neural Network or Machine Learning algorithms.

V. CONCLUSION

A. Conclusion

The objective of the work presented in this paper is to make an automated system which will extract information from cheques, validate the information and keep a record of all the information present. It can be used by financial institutes to reduce the human effort in the verification of signatures, which is done manually, and it will also keep a record of all the transactions.

In this paper, the bank name, address, account number, date and amount were extracted from the cheque using different libraries in Python such as OpenCV and tesseract. By using Pandas, the data obtained was stored in tabular form in a .csv file.

VI. REFERENCES

[1] M. Jasmine Pemeena Priyadarsini, K. Murugesan, Srinivasa Rao Inbathini, A. Jabeena, K. Sai Tej, “Bank Cheque Authentication using Signature”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 5, May 2013.
[2] Rohan Vaidya, Darshan Trivedi, Sagar Satra, “Handwritten Character Recognition Using DeepLearning”, International Conference on Inventive Communication and Computational Technologies (ICICCT), 2018.
[3] https://techterms.com/definition/ocr
[4] https://pypi.org/project/pytesseract/
[5] https://en.wikipedia.org/wiki/OpenCV
[6] Ankit Arora, Aakanksha S. Choubey, “Offline Signature Verification and Recognition using Neural Network”, International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064, Volume 2 Issue 8, August 2013.
[7] https://github.com/alankarmisra/SwiftSignatureView
[8] https://en.wikipedia.org/wiki/Normalization_(image_processing)
[9] Dr. Neeraj Bhargava, Anchal Kumawat, Dr. Ritu Bhargava, “Threshold and binarization for document image analysis using Otsu’s Algorithm”, International Journal of Computer Trends and Technology (IJCTT), volume 17 Number 5, Nov 2014.
[10] Abhishek, Lakshmesha K.N, “Thinning Approach in Digital Image Processing”, International Journal of Latest Trends in Engineering and Technology, Special Issue SACAIM 2017.
[11] Dipti Verma, Sipi Dubey, “Static Signature Recognition System for User Authentication Based Two Level Cog, Hough Transform and Neural Network”, International Journal of Engineering Sciences & Emerging Technologies, ISSN: 2231 – 6604, Volume 6, Dec. 2013.
[12] Rahul Dubey, Deeraj Agarwal, “Offline Signature Recognition using Hough Transform and Neural Network”, International Journal of Engineering Sciences & Research Technology, ISSN: 2277-9655, July 2013.