
Proceedings of the Fourth International Conference on Computing Methodologies and Communication (ICCMC 2020)

IEEE Xplore Part Number:CFP20K25-ART; ISBN:978-1-7281-4889-2

978-1-7281-4889-2/20/$31.00 ©2020 IEEE, DOI: 10.1109/ICCMC48092.2020.ICCMC-00018

Open CV based Information Extraction from Cheques
Aniket Dhanawade, Abhishek Drode, Gifty Johnson, Aadesh Rao and Dr. Savitha Upadhya
Department of Electronics and Telecommunication Engineering, Fr. C.R.I.T., Vashi, Navi Mumbai, India


dhanawadeaniket72@gmail.com, drodeabhishek1997@gmail.com, gifty.johnson02@gmail.com,
aadeshrao5@gmail.com, savitha.upadhya@fcrit.ac.in
Abstract – Bank cheques are used extensively for financial transactions in various organizations, and they are still verified manually. The traditional verification process covers the date, signature, legal information, and payment amount written on the cheque. In this paper, the legal information is extracted from a captured cheque image by preprocessing the image, extracting the required information, and then recognizing and verifying the handwritten fields. Image processing techniques such as thinning, median filtering, and dilation, together with verification techniques, are employed in this approach.

Keywords – OCR, TESSERACT, OPENCV

I. INTRODUCTION

Money transfer is a crucial part of all institutions. There are numerous ways to transfer money, such as direct cash, online transactions, demand drafts, and cheques. Out of these, money transfer using cheques is the most widely used method. Every institution or organization performs confidential money transfers using cheques or demand drafts, because the use of cheques is more secure than other methods. Even today, however, the details of these cheques are entered manually. For example, in schools, when students deposit their cheques, the officials manually enter the data in an Excel sheet. This task tends to be laborious. The same procedure is followed in banks, where thousands of cheques are dealt with each day. The signatures are verified manually, i.e. an employee visually checks whether the signature is genuine. This type of manual checking is error-prone and time-consuming, so there is a need to automate the above steps. This paper proposes a system that scans a cheque, extracts all the necessary information, and makes a list of all the transactions, along with software that verifies the signature against an available database.

II. LITERATURE REVIEW

Handwritten documents are one of the most reliable sources of information. One of them is the cheque, a written order directing a bank to pay a particular amount of money or to withdraw money. This medium of service can easily be abused when the modes of verification are insecure. Therefore, it is necessary to verify the signature.

There are two types of signature verification methods.

1. Offline Methods
These use static information for verification. Offline signature schemes take the signature as an input image, which usually contains noise. To remove this noise, filters such as the median filter are applied to the input image.

2. Online Methods
These make use of dynamic information for verification. Online signature verification is carried out using pressure-sensitive tablets and webcams that extract features from a signature. [1]

Handwriting recognition is one of the most active areas of science where deep neural networks are used. Training a neural network model requires a large dataset. The method of identifying handwritten characters is divided into two systems. [2]

a) Android application: The Android program allows the user to select, on their mobile device, a picture of the text to be recognized. To extract the information from the image, the image is processed by a Python script running on a server.

b) A server: This is the backend machine, which can run a Python script. It is needed because an Android smartphone does not have the computational power required to run neural networks and perform image processing operations.

A. OCR (Optical Character Recognition) [3]

OCR is a technology that identifies text within an image and is used to detect text in scanned documents. OCR processes an image by locating and identifying characters. It can export the text or convert the characters to editable text directly in the image. Advanced OCR can also export the size, format and layout of the text on a page. [3]

B. TESSERACT [4]

Python-tesseract is a Python tool for Optical Character Recognition (OCR). It is used to recognize and read the text embedded in images. It can read the image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif and tiff. If used as a script, Python-tesseract prints the recognized text rather than writing it to a file [4].

C. OPENCV [5]

OpenCV stands for Open Source Computer Vision. It is a library of programming functions aimed at real-time computer vision. It consists of more than 2500 algorithms that support computer vision applications. It has C++, Python, MATLAB and Java interfaces, supports Windows, Linux and macOS, and also runs on mobile operating systems such as Android, iOS and BlackBerry. [5]

III. IMPLEMENTATION

Fig. 1 shows the block diagram of the processing of cheques using image processing.

Fig. 1. Block diagram of image processing

A. Image acquisition

In image acquisition, the recognition system takes a scanned image as its input. The image should be in a specific format such as JPEG and is captured with a scanner or a camera. Fig. 2 is a scanned image of a cheque.

Fig. 2. Scanned cheque image

B. Cropping of input image

The acquired image is then cropped using Pillow, an imaging library available for Python. All cheques have a standard format: the date, payee, signature and amount are all in the same place. Cropping is therefore performed manually, i.e. the coordinates of each region to be cropped are given by hand. This gives cropped images of all the essential data. Fig. 3 shows the name of the bank, Fig. 4 the amount, Fig. 5 the account number, Fig. 6 the bank address, and Fig. 7 the date that has been extracted.

Fig. 3. Bank name

Fig. 4. Amount on cheque

Fig. 5. Account number

Fig. 6. Address

Fig. 7. Date
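
The manual cropping step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the field names and pixel coordinates in FIELD_BOXES are hypothetical placeholders, and the helper crop_fields is introduced only for this example. The only library behaviour it relies on is the (left, upper, right, lower) box convention of Pillow's Image.crop.

```python
# Hypothetical crop boxes, given as (left, upper, right, lower) in pixels.
# These numbers are placeholders: real values depend on the cheque layout
# and scan resolution and have to be measured by hand.
FIELD_BOXES = {
    "bank_name": (40, 20, 520, 80),
    "bank_address": (40, 80, 520, 130),
    "date": (780, 20, 1080, 70),
    "amount": (820, 150, 1080, 210),
    "account_number": (120, 230, 480, 280),
}


def crop_fields(image, boxes):
    """Crop every named region out of `image`.

    `image` can be any object with a Pillow-style crop(box) method,
    for example the object returned by PIL.Image.open.
    """
    crops = {}
    for name, box in boxes.items():
        left, upper, right, lower = box
        # Reject boxes with no area before handing them to the image.
        if right <= left or lower <= upper:
            raise ValueError(f"empty crop box for {name!r}: {box}")
        crops[name] = image.crop(box)
    return crops
```

With Pillow installed, a call such as `crop_fields(Image.open("cheque.jpg"), FIELD_BOXES)` (the file name is also an assumption) would return one cropped image per field. Keeping the coordinates in a single table makes it easy to adjust them when a different cheque layout is scanned.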


Fig. 8. Detected text from image

C. Pre-processing

Pre-processing is used to generate data on which the OCR system can operate precisely. The OCR system needs a clear image with no noise; without pre-processing, the system gives false output. For example, the lines in the date field have to be removed to extract the date properly. This is achieved by binary thresholding: a reference threshold is set, the pixel values below it are made black, and the ones above it are made white. This helps in removing the bars, which are at a higher intensity than the written date.

Pre-processing involves: [6]

a. Noise removing or Normalization

In normalization, a noise reduction filter such as the median filter is applied to the binary scanned image to remove single white pixels on a black background and vice versa. Fig. 9 is the original image and Fig. 10 is the normalized image.

Fig. 9. Original image [7]

Fig. 10. Normalized image

The linear normalization is performed using the formula:

$$I_N = (I - \mathrm{Min}) \cdot \frac{\mathrm{newMax} - \mathrm{newMin}}{\mathrm{Max} - \mathrm{Min}} + \mathrm{newMin}$$

where $I$ is the grayscale image with intensity values in the range (Min, Max) and $I_N$ is the new image with intensity values in the range (newMin, newMax). [8]

b. Binarization [9]

Image binarization reduces the amount of image information, such as the color and background of the image, so that the output image is black and white. This is done by setting a threshold value and classifying all pixels above the threshold as white and all pixels below it as black. The binarized image is shown in Fig. 11.

Fig. 11. Binarized image

c. Morphological Operations [10]

The signature area is separated from the background through a segmentation process based on vertical and horizontal projection. Morphological operations such as erosion and dilation are applied to perform this operation. Fig. 12 shows the cropped image after thinning. The thinned image is obtained by subtracting from the image its hit-and-miss transform with the structuring element.

Fig. 12. Thinned image [10]
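
The normalization, median filtering and binarization steps of this section can be illustrated with a small library-free sketch. It is not the authors' implementation: images are represented as nested lists of grayscale values, the function names are chosen for this example, the 3x3 window size and the default threshold of 128 are assumptions, and thinning is left out. In practice, OpenCV functions such as cv2.medianBlur and cv2.threshold would typically replace these loops.

```python
from statistics import median_low


def normalize(img, new_min=0, new_max=255):
    """Linear normalization: map the range [Min, Max] onto [new_min, new_max]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: avoid division by zero
        return [[new_min for _ in row] for row in img]
    return [
        [round((p - lo) * (new_max - new_min) / (hi - lo) + new_min) for p in row]
        for row in img
    ]


def median_filter_3x3(img):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    Border pixels use only the neighbours that exist.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            # median_low always returns a value that occurs in the window.
            row.append(median_low(window))
        out.append(row)
    return out


def binarize(img, threshold=128):
    """Pixels above the threshold become white (255), all others black (0)."""
    return [[255 if p > threshold else 0 for p in row] for row in img]
```

A single white pixel on a black background is removed by `median_filter_3x3`, because every window that contains it is dominated by black pixels; this is the behaviour the text describes for isolated noise pixels.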


The Hough transform of this image is then taken. The Hough transform is used to detect forgeries: its peaks are used for comparison. Fig. 14 shows the Hough transform of the input image. [11][12]

Fig. 14. Hough Transform

Fig. 15 shows the pre-processed date obtained using the above steps.

Fig. 15. Pre-processed date

IV. RESULT

A. Image text conversion

The text in the cropped image is now read using the tesseract library in Python, which performs OCR on the images and extracts the text from them. The string obtained can be stored in a variable and used later. Fig. 8 shows the detected text from the image.

B. Converting to .csv file

The obtained text, which is stored in a variable, now needs to be documented. For this, the Pandas library has been used. Pandas is mainly used for data analysis and helps manage large data systematically. It is used here to create a .csv file; CSV stands for Comma Separated Values. Many columns can be defined and the data can be segregated. This type of presentation is clear and precise and is easy for the user to understand. The obtained csv file is shown in Fig. 13.

Fig. 13. Csv File

After extracting the cheque features such as the bank name, cheque amount, account number, bank address and date, the next step is to authenticate the signature, which can be done using Neural Network or Machine Learning algorithms.

V. CONCLUSION

The objective of the work presented in this paper is to build an automated system that extracts information from cheques, validates it, and keeps a record of all the information present. Financial institutions can use it to reduce the human effort in signature verification, which is currently done manually, and to keep a record of all transactions.

In this paper, the bank name, address, account number, date and amount were extracted from the cheque using Python libraries such as OpenCV and tesseract. Using Pandas, the data obtained was stored in the form of an Excel sheet.

VI. REFERENCES

[1] M. Jasmine Pemeena Priyadarsini, K. Murugesan, Srinivasa Rao Inbathini, A. Jabeena, K. Sai Tej, “Bank Cheque Authentication using Signature”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 5, May 2013.

[2] Rohan Vaidya, Darshan Trivedi, Sagar Satra, “Handwritten Character Recognition Using Deep Learning”, International Conference on Inventive Communication and Computational Technologies (ICICCT), 2018.

[3] https://techterms.com/definition/ocr
[4] https://pypi.org/project/pytesseract/
[5] https://en.wikipedia.org/wiki/OpenCV

[6] Ankit Arora, Aakanksha S. Choubey, “Offline Signature Verification and Recognition using Neural Network”, International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064, Volume 2 Issue 8, August 2013.

[7] https://github.com/alankarmisra/SwiftSignatureView

[8] https://en.wikipedia.org/wiki/Normalization_(image_processing)

[9] Dr. Neeraj Bhargava, Anchal Kumawat, Dr. Ritu Bhargava, “Threshold and binarization for document image analysis using Otsu’s Algorithm”, International Journal of Computer Trends and Technology (IJCTT), Volume 17 Number 5, Nov 2014.

[10] Abhishek, Lakshmesha K.N, “Thinning Approach in Digital Image Processing”, International Journal of Latest Trends in Engineering and Technology, Special Issue SACAIM 2017.

[11] Dipti Verma, Sipi Dubey, “Static Signature Recognition System for User Authentication Based Two Level Cog, Hough Transform and Neural Network”, International Journal of Engineering Sciences & Emerging Technologies, ISSN: 2231 – 6604, Volume 6, Dec. 2013.

[12] Rahul Dubey, Deeraj Agarwal, “Offline Signature Recognition using Hough Transform and Neural Network”, International Journal of Engineering Sciences & Research Technology, ISSN: 2277-9655, July 2013.
