
Project Report

On
HCR system to transform handwritten text to printed
format and evaluation of various models

Submitted
In partial fulfilment
For the award of the Degree of

PG-Diploma in Big Data Analytics


(C-DAC, ACTS (Pune))

Guided By:
Mrs. Swapna Yennishetti

Submitted By:
Swati Ighare (220340128013)
Swalini Mary M (220340128047)
Vasudev Lambhate (220340128050)
Avinash Vhanmane (220340128051)
Vivek Singh Tomar (220340128055)

Centre for Development of Advanced Computing (C-DAC), ACTS (Pune-411008)
Acknowledgement

This is to acknowledge our indebtedness to our project guide, Mrs. Swapna Yennishetti, C-DAC ACTS, Pune, for her constant guidance and helpful suggestions in preparing this project, HCR system to transform handwritten text to printed format and evaluation of various models. We express our deep gratitude towards her for the inspiration, personal involvement, and constructive criticism that she provided us, along with technical guidance, during the course of this project.
We take this opportunity to thank the Head of the Department, Mr. Gaur Sunder, for providing us with such great infrastructure and an environment for our overall development.
We express sincere thanks to Mrs. Namrata Ailawar, Process Owner, for her kind cooperation and extended support towards the completion of our project.
It is our great pleasure to express sincere and deep gratitude towards Mrs. Risha P R (Program Head) and Mrs. Srujana Bhamidi (Course Coordinator, PG-DBDA) for their valuable guidance and constant support throughout this work, and for their help in pursuing additional studies.
Also, our warm thanks to C-DAC ACTS Pune, which provided us this opportunity to carry out this prestigious project and enhance our learning in various technical fields.

Swati Ighare (220340128013)
Swalini Mary M (220340128047)
Vasudev Lambhate (220340128050)
Avinash Vhanmane (220340128051)
Vivek Singh Tomar (220340128055)
ABSTRACT

Handwritten character recognition is an image-based sequence recognition task within computer vision. Traditional approaches rely on lexical segmentation, complex feature extraction techniques, and considerable knowledge in the domain of linguistics. In our project we used a novel approach to handwriting recognition: a Convolutional Recurrent Neural Network (CRNN) combined with Connectionist Temporal Classification (CTC), alongside the EasyOCR library, which is widely used for text recognition, together with a comparative study of the CRNN and EasyOCR models. The implemented methods have the advantage of not depending on lexical segmentation or manual feature extraction. Moreover, the applied methods are symbol- and character-independent, making the model globally trainable and suitable for application to multiple languages.
Table of Contents

Front Page
Acknowledgement
Abstract
Table of Contents
1 Introduction
1.1 Introduction
1.2 Objective and Specifications
2 Literature Review
3 Methodology/Techniques
3.1 Approach and Methodology/Techniques
3.2 Dataset
3.3 Model Description
4 Implementation
4.1 Implementation
5 Results
5.1 Results
6 Conclusion
6.1 Conclusion
7 References
7.1 References

Chapter 1
Introduction

1.1 Introduction
In a society that is now digitally enhanced, we depend on computers to process huge
amounts of data. Various economic and business requirements demand a fast inputting of
huge volumes of data into the computers. This cannot be achieved by manually typing the
data and entering it into the computers, as it is very time-consuming. Hence mechanizing
the manual process plays an important role. Much research has been carried out in the character
recognition area, where optical character recognition (OCR) has made its mark. Detection
of text regions either from handwritten or printed document images containing various
non-textual information is a difficult task, and it can be more challenging to locate the
position of the text regions when we deal with a doctor’s prescription.

Optical character recognition (OCR) is the process of reading text from documents, both
printed and handwritten, and converting it into a form that computers can operate on.
Optical character recognition is the translation of handwritten, typewritten, or printed
paper into machine-editable text using a scanning device or software. It is a field of
research in pattern recognition, machine vision, and artificial intelligence. Each year,
this technology helps us free up large amounts of physical storage space once given over
to file cabinets and boxes of paper documents.

In our project work, we have proposed a model that uses CRNN and EasyOCR to convert
scanned images of input documents into machine-editable text. Neural networks learn and
remember what they have learned, enabling them to predict classes or values for new
datasets; what makes CRNNs different is that, unlike normal neural networks, CRNNs
rely on information from the previous output to predict the upcoming input. Firstly,
the input documents are converted into an image format, which is then classified into
printed, handwritten cursive, handwritten discrete, and semi-printed by an input classifier.
Text in the image is predicted using the corresponding models. The predicted text is thereafter
available as machine-editable text, which can be retrieved easily whenever necessary.

EasyOCR is a Python package that converts images to text. It is by far the easiest way
to implement OCR and supports over 70 languages, including English, Chinese, Japanese,
Korean, and Hindi, with more being added. EasyOCR is created by the Jaided AI company.

1.2 Objective
The objectives of the project work are as follows:
● To test a system capable of recognizing English alphabets.
● To gain a better understanding of CRNNs and apply them to character recognition.
● To gain a better understanding of different digital image processing tools.
● To understand the process and steps involved in the development of EasyOCR.
The study will put emphasis on testing the CRNN software using computer-printed and
handwritten English alphabets, as the system is capable of learning and recognizing a
single character at a time. The duration of training the system will therefore be long,
because handwritten characters have more complex factors to consider, such as alignment
and differing writing styles.


Chapter 2
LITERATURE REVIEW

Dhar et al. [1] proposed a method to classify printed and handwritten texts found in
doctors' prescriptions. Since the proposed method successfully classifies the printed and
handwritten texts in the documents with very low complexity, it can easily be embedded
with a recognition module at little additional resource cost. The dataset used in this
model contains handwritten prescriptions, which are used to classify whether the text is
printed or handwritten.

AL-Saffar et al. [2] proposed a Dynamically Configurable Convolutional Recurrent Neural
Network (DC-CRNN) for the handwriting recognition sequence modeling task. They conducted
their experiments on two well-known datasets, IAM and IFN/ENIT, which include both the
Arabic and English languages.

Chammas et al. [3] presented a state-of-the-art CRNN system for text-line recognition of
historical documents. They showed how to train such a system with little labeled text-line
data, and improved the performance of the system by augmenting the training set with
specially crafted synthetic data at multiple scales. Finally, they proposed a model-based
normalization scheme by introducing the notion of variability in the writing scale to the
test data.


Shabana Mehfuz et al. [4] note that handwritten character recognition has long been a
frontier area of research in the field of pattern recognition and image processing, and
that there is large demand for optical character recognition of handwritten documents.
This paper provides a comprehensive review of existing work in handwritten character
recognition based on soft computing techniques during the past decade.

Shkarupa et al. [5] evaluated two RNN architectures for handwritten text recognition,
based on Connectionist Temporal Classification and the sequence-to-sequence learning
approach. The obtained results are comparable, with an 81.5% average recognition rate
over all manuscripts. Both methods showed promising results; the CTC model consistently
outperformed the Seq2Seq model on both the training and test datasets.

Sueiras et al. [6] combined deep neural networks with sequence-to-sequence networks, also
called an encoder-decoder. The proposed architecture aims to identify characters and
conceptualize them with their neighbors to recognize a given word. For training and
testing they used IAM and RIMES; these datasets consist of handwritten texts on a white
background from many writers. The error rate on the test set is 12.7% on IAM and 6.6% on
RIMES. This method is more efficient for language translation and speech-to-text
conversion than for handwriting recognition.


Chapter 3
Methodology and Techniques

3.1 Methodology:
3.1.1 CRNN
The outline of the proposed work is represented by the block diagram shown in Fig. 1.
The first step in the process is training. The dataset is passed through CNN and RNN
layers, and the obtained output and the ground-truth text are passed through the CTC
layer to obtain the trained model. The trained model is then used to recognize the text
in the input image.

The input handwritten image is pre-processed by adjusting the resolution. The first step in
recognition is to break down the paragraph image into line images. Then line images are
further segmented into word images. The word images are then preprocessed and passed
through the same CNN and RNN layers that were used in training.

The output of the RNN layers is given to the CTC layer (decoding level) to decode the
output text with the help of the trained model. The method of extracting word images

from a paragraph and combining CNN, RNN, and CTC techniques to train the NN model is
very effective for implementation. As a whole, we propose an end-to-end handwritten text
recognition system implemented using CRNN and EasyOCR techniques.


Fig.1 Proposed Work

3.1.2 EasyOCR
EasyOCR is a Python package that uses PyTorch as its backend. Like any other OCR engine
(such as Google's Tesseract), EasyOCR detects text in images; in our experience it is the
most straightforward way to detect text from images, and having a high-end deep learning
library (PyTorch) supporting it in the backend makes its accuracy more credible. EasyOCR
supports 42+ languages for detection purposes and was created by the company Jaided AI.
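A minimal usage sketch follows. The file name `sample_note.png` is a hypothetical example, and note that `easyocr.Reader` downloads its detection and recognition models on first use:

```python
def read_image_text(image_path, langs=("en",)):
    """Detect and return the text strings found in an image with EasyOCR."""
    import easyocr  # imported lazily; PyTorch-backed OCR package

    reader = easyocr.Reader(list(langs))   # loads/downloads models for the given languages
    results = reader.readtext(image_path)  # list of (bounding box, text, confidence) tuples
    return [text for _box, text, _conf in results]


if __name__ == "__main__":
    import os
    if os.path.exists("sample_note.png"):  # hypothetical input image
        print(read_image_text("sample_note.png"))
```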


Fig 2. EasyOCR Framework

3.2 Dataset
The IAM dataset has been used for training the model. The IAM database consists of
handwritten English sentences. It is based on the Lancaster-Oslo/Bergen (LOB) corpus.
The database serves as a basis for a variety of recognition tasks, particularly useful in
recognition tasks where linguistic knowledge beyond the lexicon level is used, as this
knowledge is automatically derived from the underlying corpus.

The IAM database also includes a few image-processing procedures for extracting the
handwritten text from the forms and the segmentation of the text into lines and words.
The training of the model is done using the IAM dataset. Along with the IAM dataset,
custom handwritten paragraph images collected from random people were used for testing.
These custom images were captured under normal lighting with a 5 MP camera at a
resolution in the range of 800 to 1000 dpi.

3.3 Model Description


Preprocessing-
The preprocessing phase can be considered as the first stage of the recognition system.

The main goal of this step is to modify the images in a way that will make it easier and
faster for the recognizer to learn from them. Images of similar size are nowadays
commonly used in image recognition with convolutional networks. Our main goal was rapid
prototyping of various RNN architectures, and therefore the size of the input data was an
important factor affecting the total training time.

We considered using convolutional layers for feature extraction, but initial experiments
showed that they provided only a minor improvement in accuracy and considerably increased
the training time. We also tried simple image enhancement techniques (Gaussian blur for
noise reduction, and binarization) as additional preprocessing steps, which provided a
small increase in accuracy for the Seq2Seq approach but did not substantially affect the
accuracy of the CTC approach.

Recurrent Neural Network (RNN) –

LSTMs are specially constructed RNN nodes that preserve long-lasting dependencies.
They consist of a self-connected memory cell, comparable to the classical RNN node,
and three gates that control the input and output of the node. Each gate is in fact a
sigmoid function of the input to the LSTM node.

The first gate is the input gate, which controls whether new input is available to the
node. The second gate is the forget gate, which makes it possible for the node to reset
the activation values of the memory cell. The last gate is the output gate, which
controls which parts of the cell output are available to the next nodes.

Further improvement to the RNN models based on LSTM is achieved by the use of two layers
operating in opposite directions, the so-called Bidirectional Long Short-Term Memory
(BLSTM). The goal of the forward layer is to learn the context of the input by processing
the sequence from the beginning to the end, while the backward layer performs the opposite
operation by processing the sequence from the end to the beginning. It was demonstrated
that this architecture performs better than a simple uni-directional LSTM.
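In Keras, the forward/backward pairing described above is expressed with the `Bidirectional` wrapper. The sketch below is illustrative; the layer widths and class count are assumptions, not the trained configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_blstm_stack(time_steps=32, features=64, units=128, num_classes=80):
    """Two stacked BLSTM layers over a feature sequence.

    Each Bidirectional wrapper runs one LSTM forward and one backward over
    the sequence and concatenates their outputs, so the per-step output
    width is 2 * units. All sizes here are illustrative.
    """
    inputs = layers.Input(shape=(time_steps, features))
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(inputs)
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
    # Per-time-step class scores (num_classes characters + 1 CTC blank).
    outputs = layers.Dense(num_classes + 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```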

Connectionist Temporal Classification approach (CTC)-


The feed-forward approach is similar to the original recurrent neural networks in that
both architectures require a direct alignment between the input features and the target
variables. However, in real-world handwriting recognition problems it is much easier to
segment text into words than into individual characters. Achieving direct alignment
between the image of an input character and its target label would require a prior
segmentation step.

Segmentation being a very hard problem by itself, its complexity keeps increasing over
time, since there is a high tendency to encounter errors already at the segmentation
step. As a result, this would also limit the context of the data learned by the RNN. To
address this problem the Connectionist Temporal Classification (CTC) approach was
introduced, originally for speech recognition and afterwards also for handwriting
recognition.

CTC makes it possible to avoid the previously mentioned direct alignment between the
input variables and the target labels by interpreting the output of the network as a
probability distribution over all possible label sequences on the given input sequence.
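At inference time, that distribution over label sequences is commonly read out with the best-path (greedy) rule: take the most likely symbol at each time step, collapse consecutive repeats, and drop blanks. A small self-contained NumPy sketch of that collapse rule (an illustration, not the project's decoder):

```python
import numpy as np

BLANK = 0  # index reserved for the CTC blank symbol (an assumed convention)

def best_path_decode(probs):
    """Greedy CTC decoding for a (time_steps, num_symbols) probability matrix:
    take the argmax at each time step, collapse consecutive repeats, drop blanks."""
    path = np.argmax(probs, axis=1)  # most likely symbol at each time step
    decoded = []
    prev = None
    for p in path:
        if p != prev and p != BLANK:  # collapse repeats, skip blanks
            decoded.append(int(p))
        prev = p
    return decoded
```

For example, with symbols {blank=0, 'a'=1, 'b'=2}, a per-step path of a, a, blank, a, b, b collapses to "aab" — the blank between the two a's is what allows a doubled letter to survive the collapse.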


Chapter 4
Implementation

1. Use of the Python platform for writing the code, with Keras, TensorFlow, and OpenCV.
2. Hardware and Software Configuration:
Hardware Configuration:
● CPU: 8 GB RAM, quad-core processor
● GPU: NVIDIA GTX 1080 Ti, 16 GB RAM

Software Required:
● Anaconda: A package management system and free, open-source distribution of the
Python and R programming languages for scientific computing (data science, machine
learning applications, large-scale data processing, predictive analytics, etc.)
that aims to simplify deployment.
● Jupyter Notebook:
Jupyter is a web-based interactive development environment for Jupyter
notebooks, code, and data.
Jupyter is flexible: configure and arrange the user interface to support a
wide range of workflows in data science, scientific computing, and
machine learning.
Jupyter is extensible and modular: write plugins that add new components
and integrate with existing ones.
● Spyder: Spyder, the Scientific Python Development Environment, is a free and
open-source integrated development environment (IDE) included with Anaconda. It is
written in Python, for Python, and designed by and for scientists, engineers, and
data analysts.
It includes editing, interactive testing, debugging, and introspection features
with the data exploration, interactive execution, deep inspection, and
beautiful visualization capabilities of a scientific package.


CRNN Model:
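The screenshot of the model code is not reproduced here. As a sketch, a typical CRNN of the kind described in Chapter 3 — a CNN feature extractor feeding a BLSTM, with a per-step softmax over the characters plus the CTC blank — can be written in Keras as follows. The layer sizes and class count are illustrative assumptions, not the exact trained configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_crnn(img_w=128, img_h=32, num_classes=80):
    """Sketch of a typical CRNN: convolutional feature extraction, BLSTM
    sequence modeling, per-step class scores (characters + CTC blank)."""
    inputs = layers.Input(shape=(img_h, img_w, 1), name="image")
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)   # -> (h/2, w/2)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)   # -> (h/4, w/4)
    # Treat the width axis as the time axis for the RNN.
    x = layers.Permute((2, 1, 3))(x)     # (width, height, channels)
    x = layers.Reshape((img_w // 4, (img_h // 4) * 64))(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    outputs = layers.Dense(num_classes + 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs, name="crnn_sketch")
```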


Model Summary –


CTC Layer-
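The CTC layer screenshot is likewise not reproduced. A sketch of such a layer, in the style of the Keras handwriting-recognition example cited in reference [10], is shown below; it is an assumption about the shape of the project's code, built on `tf.nn.ctc_loss`:

```python
import tensorflow as tf
from tensorflow.keras import layers

class CTCLayer(layers.Layer):
    """Keras layer that attaches the CTC loss during training and passes
    the per-step class scores through unchanged at inference time."""

    def call(self, labels, logits):
        batch = tf.shape(logits)[0]
        time_steps = tf.shape(logits)[1]
        label_len = tf.shape(labels)[1]
        loss = tf.nn.ctc_loss(
            labels=tf.cast(labels, tf.int32),
            logits=logits,
            label_length=tf.fill([batch], label_len),
            logit_length=tf.fill([batch], time_steps),
            logits_time_major=False,
            blank_index=-1,  # last class index is the CTC blank
        )
        self.add_loss(tf.reduce_mean(loss))
        return logits
```

In training, the layer is inserted between the recognizer's output and the label input so that `model.fit` minimizes the CTC loss; at prediction time the layer is simply bypassed.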

User Interface –


Chapter 5
Results
Loss Curve (loss vs. epochs) –


Some Predicted Results –


Accuracy Score –


Chapter 6
Conclusion
6.1 Conclusion

● In this Python project, we have built a handwritten character recognition system.
We used OpenCV for image preprocessing and word segmentation, and then proposed an
efficient handwritten character recognition approach using a CRNN model to predict
handwritten text.
● The experimental results showed that convolutional recurrent neural networks
perform better at recognition for handwritten documents.
● We could tweak the layers and activations with new/augmented data accommodating
the different handwriting styles of various persons, and work towards a better
performer for deployment in different applications.
The model can be adapted for scalability, better convergence, and better accuracy.

6.2 Future Enhancement –


● Segmenting words from lines and paragraphs can be improved.
● We can add more CNN and RNN layers to improve the accuracy.
● The project can be generalized to recognize both handwritten and printed text
images.


Chapter 7
References
[1] Dhar, Dibyajyoti & Garain, Avishek & Singh, Pawan & Sarkar, Ram. (2021).
HP_DocPres: a method for classifying printed and handwritten texts in doctor’s
prescription. Multimedia Tools and Applications. 80. 1-34.
10.1007/s11042-020-10151-w.

[2] AL-Saffar, A.; Awang, S.; AL-Saiagh, W.; AL-Khaleefa, A.S.; Abed, S.A. A
Sequential Handwriting Recognition Model Based on a Dynamically Configurable
CRNN. Sensors 2021, 21, 7306. https://doi.org/10.3390/s21217306

[3] Chammas, Edgard & Mokbel, Chafic & Likforman-Sulem, Laurence. (2018).
Handwriting Recognition of Historical Documents with Few Labeled Data. 43-48.
10.1109/DAS.2018.15.

[4] Shabana Mehfuz, Gauri Katiyar, "Intelligent Systems for Off-Line Handwritten
Character Recognition: A Review", International Journal of Emerging Technology and
Advanced Engineering, Volume 2, Issue 4, April 2012. Access date: 09/07/2015.

[5] Shkarupa, Yaroslav & Mencis, Roberts & Sabatelli, Matthia. (2016). Offline
Handwriting Recognition Using LSTM Recurrent Neural Networks. The 28th Benelux
Conference on Artificial Intelligence, November 10-11, 2016, Amsterdam (NL). 1. 88.

[6] J. Sueiras, V. Ruiz, A. Sanchez and J.F. Velez, “Offline Continuous Handwriting
Recognition using Sequence to Sequence Neural Networks”, Neurocomputing, Vol. 289,
pp. 119-128, 2018.

[7] https://github.com/Breta01/handwriting-ocr

[8] https://github.com/githubharald/SimpleHTR

[9] https://github.com/solivr/tf-crnn

[10] https://keras.io/examples/vision/handwriting_recognition/
