
HANDWRITTEN TEXT RECOGNITION USING

PYTHON

A MINOR PROJECT REPORT [INTERNSHIP


REPORT]

Submitted by

Shubham Goel [RA1811033010004]


Prashant Kaundal [RA1811033010001]

Under the guidance of


Dr. R Siva
(Assistant Professor
Department of Computational Intelligence)

in partial fulfillment for the award of the degree


of

BACHELOR OF TECHNOLOGY
in

COMPUTER SCIENCE & ENGINEERING


of

FACULTY OF ENGINEERING AND TECHNOLOGY

S.R.M. Nagar, Kattankulathur, Chengalpattu District

MAY 2022
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Under Section 3 of UGC Act, 1956)

BONAFIDE CERTIFICATE

Certified that 18CSP107L minor project report [18CSP108L internship


report] titled “HANDWRITTEN TEXT RECOGNITION USING PYTHON”
is the bonafide work of “SHUBHAM GOEL [RA1811033010004], PRASHANT

KAUNDAL [RA1811033010001]” who carried out the minor project work[internship]


under my supervision. Certified further, that to the best of my knowledge the
work reported herein does not form any other project report or dissertation on
the basis of which a degree or award was conferred on an earlier occasion on
this or any other candidate.

SIGNATURE SIGNATURE

Dr. R SIVA Dr. R. ANNIE UTHRA


GUIDE HEAD OF THE DEPARTMENT
Assistant Professor Professor
Dept. of Computational Intelligence Dept. of Computational Intelligence
School of Computing School of Computing
SRM Institute of Science and Technology SRM Institute of Science and Technology
Kattankulathur, Tamil Nadu - 603203 Kattankulathur, Tamil Nadu - 603203

Signature of Internal Examiner Signature of External Examiner


Department of Computational Intelligence
SRM Institute of Science and Technology
Own Work* Declaration Form

This sheet must be filled in (each box ticked to show that the condition has been
met). It must be signed and dated along with your student registration number
and included with all assignments you submit – work will not be marked unless
this is done.
To be completed by the student for all assessments

Degree/ Course : Bachelor of Technology in Computer Science Eng.

Student Name : Shubham Goel, Prashant Kaundal

Registration Number : RA1811033010004, RA1811033010001

Title of Work : Handwritten Text Recognition using Python

I hereby certify that this assessment complies with the University’s Rules and
Regulations relating to Academic misconduct and plagiarism**, as listed in the
University Website, Regulations, and the Education Committee guidelines.

I confirm that all the work contained in this assessment is my own except where
indicated, and that I have met the following conditions:

● Clearly referenced / listed all sources as appropriate


● Referenced and put in inverted commas all quoted text (from books, web, etc)
● Given the sources of all pictures, data etc. that are not my own
● Not made any use of the report(s) or essay(s) of any other student(s) either past
or present
● Acknowledged in appropriate places any help that I have received from
others (e.g. fellow students, technicians, statisticians, external sources)
● Complied with any other plagiarism criteria specified in the Course
handbook / University website

I understand that any false claim for this work will be penalized in accordance
with the University policies and regulations.

DECLARATION:
I am aware of and understand the University’s policy on Academic misconduct and plagiarism and I
certify that this assessment is my / our own work, except where indicated by referring, and that I have
followed the good academic practices noted above.

If you are working in a group, please write your registration numbers and sign with the date for
every student in your group.
ABSTRACT

There are many important things that we all have in common. However,
several distinctions mark the identity of each individual. Apart from DNA,
fingerprints, and other biometrics, recent studies on handwriting analysis
have identified handwriting as another distinctive feature. Although
handwriting can arguably be duplicated and forgery is a serious problem, several
factors such as the way the pen is held, the pressure applied, and the type of
strokes give handwritten text its uniqueness. Handwriting recognition is becoming
more important in various fields, such as verifying signatures in banks and
scanning ZIP codes on letters, as technologies steadily attain better levels of
perception and a better ability to identify distinctions. Digitalizing the
verification and attestation work in banks and other sectors can save a great deal
of manpower and money. Since each person has his or her own
way of presenting things on paper, training a system to recognize handwritten
text is a complex task. The aim of digitalized text recognition is to
account for these variations in order to identify the text clearly. Reading
handwriting requires particular skills. People commonly complain that they cannot
read even their own handwriting, so expecting a computer
to do so may seem foolhardy. Distinct algorithms, each with its own advantages and
disadvantages, are available for building CR models and for feature extraction.[1] Our
main aim is to develop a model with a very high precision rate, minimal
complexity, and an optimal fit.

ACKNOWLEDGEMENT

We take this opportunity to thank our Dean (College of Engineering & Technology), Prof. T. V.
Gopal, SRM Institute of Science and Technology for providing the facilities that were needed for this
project. We would also like to thank Dr. R. Annie Uthra, Professor & Head of Department of
Computational Intelligence, SRM IST for giving us a conducive atmosphere for fostering the project.

We would like to thank our project coordinator Dr. K. Kottilingam for his guidance during
project development. He has been instrumental in making our project a success by giving us
valuable advice from time to time.

We want to express our deepest gratitude to our guide Dr. R. Siva, Assistant Professor,
Sr.G, for his important guidance, consistent encouragement, personal care, timely assistance and
for providing us with an excellent atmosphere for doing the research. All through the work, in spite of
his busy schedule, he extended his support to us toward the completion of the project.
Lastly, we thank everyone who was involved during the incubation of this project.
TABLE OF CONTENTS

ABSTRACT

TABLE OF CONTENTS

LIST OF FIGURES

ABBREVIATIONS

1 INTRODUCTION
2 LITERATURE SURVEY
2.1 Existing Systems
3 METHODOLOGY
3.1 Dataset
3.2 Techniques Used
3.3 Image Preprocessing
3.4 Approach
3.5 Tools Used
4 ARCHITECTURE & IMPLEMENTATION
4.1 CNN Layer
4.2 RNN Layer
4.3 CTC Layer
5 RESULTS AND DISCUSSIONS
6 CONCLUSION AND FUTURE ENHANCEMENT
6.1 Conclusion
REFERENCES
LIST OF FIGURES

3.1 Image from IAM Dataset
3.2 Image from IAM Dataset
3.3 Pictorial Representation of CNN
3.4 Pictorial Representation of RNN
3.5 Basic Representation of LSTM
3.6 Mathematical Interpretation of Loss
3.7 Pictorial Representation of Algorithm
4.1 Network Architecture of implemented model
4.2 Transcript decoding process
5.1 Outputs received for some input images
5.2 Outputs received for some input images
ABBREVIATIONS

SVM Support Vector Machines

MLP Multilayer Perceptron

CNN Convolutional Neural Network

RNN Recurrent Neural Network

CSV Comma Separated Values


ReLU Rectified Linear Unit
CHAPTER 1

INTRODUCTION

In the areas of image processing and pattern recognition, handwritten text recognition is now
one of the most engaging and fascinating fields of study. It enables better interaction between
humans and computers and contributes substantially to the advancement of automation.
“Handwriting Recognition is the process of transforming a language that
has been previously introduced in its unique spatial type of graphical engravings into a meaningful
representation”[2].

The core of this research, offline text recognition, has been an important area over the past
decade and has reached new heights of progress. A wide array of techniques has been proposed and
is being experimented with around the world to advance the field further. The workings of the
systems built for this purpose are generally described through their architecture[3][4]. The goal
of this project is to study the task of recognizing handwritten text and to convert the decoded
text into digital form. Handwritten text is a broad term, so we limit the scope of this research
by focusing on decoding the content of the text. In this project we take up the challenge of
handling images of handwritten text, whether written in cursive or block form.

Breaking an image down and processing it is a very easy task for the human brain, which can also
identify all the constituents of the image. We routinely learn to perform an action accurately by
practicing it often and storing it in memory. The purpose of this project is to accurately
recognize text in images from a collection of thousands of handwritten samples, to experiment
with various systems and judge their effectiveness ourselves, and to see the advantages and
disadvantages of several algorithms. Despite the abundance of technological writing tools, many people
still choose to take their notes traditionally with pen and paper.

However, there are drawbacks to hand-writing text. It’s difficult to store and access physical
documents in an efficient manner, search through them efficiently and to share them with others.
Thus, a lot of important knowledge gets lost or does not get reviewed because of the fact that
documents never get transferred to digital format. We have thus decided to tackle this problem in our
project because we believe the significantly greater ease of management of digital text compared to
written text will help people more effectively access, search, share, and analyze their records, while
still allowing them to use their preferred writing method.

CHAPTER 2

LITERATURE SURVEY

2.1 Existing Systems


“Handwritten word recognition based on structural characteristics and lexical
support”[5]: This work proposes a handwritten character recognition algorithm based on structural
features. Each character is normalized into a 32x32 matrix and represented as a 280-dimensional
vector by combining the well-known horizontal and vertical histograms with the newly introduced
radial histogram, out-in radial profile, and in-out radial profile.

In the pre-processing stage, isolated character images are produced, which serve as input
to the character recognition module. The recognition process is supported by a lexical component
based on acyclic FSAs (finite-state automata). The lexicon built for the system comprises
230,000 Greek words (average word length: 9.5 characters; alphabet size: 36).

The proposed procedure focuses on extracting the features that best describe a handwritten
character, resulting in a novel radial histogram and two novel profiles, in-out and out-in.
Together with the well-known horizontal and vertical histograms, these features form a solid
representation of a handwritten character. The technique was tested on two different data sets,
with recognition precision ranging from 72.8 percent to 98.8 percent.

Handwritten Word Recognition with Character and Inter-Character Neural Networks[6]: An offline
handwritten word recognition system is presented. Images of handwritten words are matched
against sets of candidate strings (a lexicon). The image of a word is segmented into primitives.
The optimal match between sequences of unions of primitives and a lexicon string is found using
dynamic programming. Neural networks assign the match scores between characters and segments:
they assign a confidence that a given union of primitives represents a given character, and this
confidence is integrated into the dynamic programming.

The segmentation process first identifies connected components. Some simple grouping and
noise removal is performed. The results are referred to as the initial segments.
Character confidence assignment refers to the following process: given an image s and a character
class c, assign a value to s that indicates how well s represents c. This differs from the
character recognition problem, which is: given an image s and a set of character classes, determine
which class s belongs to. The algorithm is implemented using a match matrix approach, described
first for the case in which only character neural networks are used. For each string in
the lexicon, an array is formed. The rows of the array correspond to the characters in the
string. The columns of the array correspond to primitive segments.

The paper introduces a word recognition algorithm based on dynamic programming,
neural-network-based character recognition, and neural-network-based inter-character
compatibility scores. It demonstrates that inter-character information can yield a
significant improvement in performance.

Handwritten Character Recognition using Artificial Neural Network[7]: This project was developed
to use neural networks for the recognition of handwritten characters. It
builds a suitable neural network and trains it appropriately. For training purposes, the target
output can be labelled and the characters can be extracted by the program.

For recognition, after automatic processing of the image, the training
data set is used to build a classification engine. The code is written in MATLAB and supported by a
GUI[8]. The input is processed by the network and passes through each neuron, which compares
the input image with itself and assigns a value based on the degree of similarity
between the input image and the neuron.

The neuron closest to the input image is assessed as having the highest output and
being the best match for the input. For this purpose, a set of noisy characters is generated
by automatically adding noise with a non-zero mean and variance[9].

CHAPTER 3

METHODOLOGY

3.1 Dataset
The IAM Handwriting Database contains forms of handwritten English text which can be used to
train and test handwritten text recognizers and to perform writer identification and verification
experiments. It holds more than 13000 images of handwritten sentences produced by more than 650
individual writers, who copied the sentences from a collection of English texts. The dataset is
labelled at several levels, and the entire collection is available online on the fki website[10].

The database was first published at ICDAR 1999. Using this database, an HMM-based
recognition system for handwritten sentences was developed and published at ICPR 2000. The
segmentation scheme used in the second version of the database was published at ICPR 2002.
We use the database extensively in our own work.

The database contains forms of unconstrained handwritten text, which were scanned at a resolution
of 300dpi and saved as PNG images with 256 gray levels. The figures below provide samples of a
complete form, a text line and some extracted words.

The IAM Handwriting Dataset is a collection of handwritten passages by several writers. It is
generally used to classify writers according to their writing styles. A traditional way of solving
such a problem is to extract features such as the spacing between letters and curvatures, and to
feed them into Support Vector Machines. We, however, wanted to solve this problem with deep
learning using Keras and TensorFlow. For this purpose we do not need the full IAM Handwriting
Dataset, only a representative subset that can be used for training, such as the images written
by the 50 persons who contributed the most to the dataset.

Given its breadth, depth, and quality, this database tends to serve as the basis for many handwriting
recognition tasks, and for those reasons we chose the IAM Handwriting Dataset as the
source of our training, validation, and test data. Last but not least, large datasets are very
important in deep learning, even with many pre-trained models available, and this dataset,
containing over 100K word instances, met that requirement (a deep learning model needs at least
10^5 to 10^6 training examples to be in a position to perform well, notwithstanding transfer learning).

Fig 3.1: Image from IAM dataset

Fig 3.2: Image from IAM dataset

We trained our model on the IAM dataset together with some additional data that we created. For
training, we put the contents of the code file model.zip into the model directory.

To train the model from scratch

To validate the model

To predict

To run on the web with Flask

3.2 Techniques Used
● Convolutional Neural Network (ConvNet or CNN)

Convolutional neural networks are a distinguished class of neural networks that have
proven highly effective for image recognition and classification. In addition to
setting milestones in the fields of self-driving automobiles and robotics, they have performed very
well in areas such as object localization, detection, retrieval, and segmentation.

Fig 3.3: Pictorial Representation of CNN

● Recurrent neural network (RNN)


Recurrent neural networks are a class of ANN in which the connections between nodes form a
directed graph along a temporal sequence. This enables them to exhibit temporal dynamic
behavior. Derived from feedforward neural networks, RNNs use their internal
state (memory) to process variable-length sequences of inputs. As a result, they are
well suited to tasks such as unsegmented, connected handwriting recognition and
speech recognition.

Fig 3.4: Pictorial Representation of RNN

● Long short-term memory (LSTM)


It is an artificial recurrent neural network architecture. Unlike standard feedforward
neural networks, LSTM has feedback connections. LSTM can process not only single
data points (such as images) but also entire sequences of data (such as speech or
video). For example, LSTM is useful for tasks such as unsegmented, connected
handwriting recognition, speech recognition, and anomaly detection in intrusion
detection systems.

A Long Short-Term Memory network is an advanced RNN, a sequential network that allows
information to persist. It is capable of handling the vanishing gradient problem faced by
RNNs. RNNs remember previous information and use it when processing the current input.
Their shortcoming is that they cannot remember long-term dependencies because of vanishing
gradients. LSTMs are explicitly designed to avoid this long-term dependency problem. Just like
a simple RNN, an LSTM has a hidden state, where H(t-1) represents the hidden state of
the previous time step and H(t) is the hidden state of the current time step. In addition,
an LSTM has a cell state, represented by C(t-1) and C(t) for the previous and current
time steps respectively.

Here is an example of how an LSTM works. We have two sentences separated
by a full stop. The first sentence is “Bob is a nice person” and the second is “Dan,
on the other hand, is evil”. Clearly, the first sentence is about Bob, and
as soon as we encounter the full stop (.) we start talking about Dan. As we move from the
first sentence to the second, the network should realize that we are no longer talking
about Bob; the subject is now Dan. The forget gate of the network allows it to forget
about Bob.
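
The roles of the hidden state H(t), the cell state C(t) and the forget gate can be sketched with a single LSTM time step. This is a minimal illustration with a hidden size of one, not the BLSTM used in our model; the function name lstm_step and the weight dictionary layout are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Logistic function, squashes a value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, weights):
    """One time step of a scalar LSTM cell (hidden size 1).

    `weights` maps a gate name to (input weight, recurrent weight, bias).
    The layout of this dictionary is a choice made for this sketch.
    Returns the new hidden state H(t) and cell state C(t).
    """
    def gate(name, activation):
        w_x, w_h, b = weights[name]
        return activation(w_x * x_t + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of C(t-1) to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the cell to expose

    c_t = f * c_prev + i * g           # forget old content, add new content
    h_t = o * math.tanh(c_t)
    return h_t, c_t


# A strongly negative forget bias closes the forget gate, so the old
# cell content (for example "the subject is Bob") is dropped.
weights = {
    "forget": (0.0, 0.0, -100.0),
    "input": (0.0, 0.0, 100.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 100.0),
}
h_t, c_t = lstm_step(x_t=0.5, h_prev=0.0, c_prev=0.9, weights=weights)
print(h_t, c_t)
```

With the forget gate closed, C(t) depends only on the current input; with a strongly positive forget bias and a closed input gate, C(t) would instead carry C(t-1) forward unchanged.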

Fig 3.5: Basic Representation of LSTM

● Connectionist temporal classification (CTC)


Connectionist temporal classification is a type of neural network output and
associated scoring function used to train recurrent neural networks (RNNs), such as LSTMs, on
sequence problems where the timing is variable. It is commonly used for tasks such as on-line
handwriting recognition and recognising phonemes in speech audio. The input is a sequence of
observations, and the output is a sequence of labels, which can include blank outputs. The
difficulty of training comes from there being many more observations than there are labels.

Since we don't know the alignment of the observed sequence with the target labels we predict a
probability distribution at each time step. A CTC network has a continuous output (e.g. softmax),
which is fitted through training to model the probability of a label. CTC does not attempt to learn
boundaries and timings: Label sequences are considered equivalent if they differ only in alignment,
ignoring blanks. Equivalent label sequences can occur in many ways, which makes scoring a
non-trivial task, but there is an efficient forward–backward algorithm for that.

The issue with methods that do not use CTC is what to do when a character takes more than one
time-step in the image. Non-CTC methods would fail here and output duplicate characters. To solve
this issue, CTC merges all repeating characters into a single character. For example, suppose the
word in the image is ‘hey’, where ‘h’ takes three time-steps and ‘e’ and ‘y’ take one time-step
each. The raw output of the network will then be ‘hhhey’, which, as per our encoding scheme, gets
collapsed to ‘hey’.
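
The collapsing rule can be sketched in a few lines of Python. This is a minimal illustration rather than the code of our model; the function name ctc_collapse and the use of '-' as the blank symbol are assumptions made for the example. Note that a genuinely doubled letter, as in 'hello', survives only if a blank separates the two occurrences.

```python
from itertools import groupby

BLANK = "-"  # symbol chosen for this sketch to stand for the CTC blank


def ctc_collapse(path, blank=BLANK):
    """Collapse a per-time-step path into the final label string.

    Step 1: merge runs of identical consecutive symbols.
    Step 2: drop the blank symbol.
    """
    merged = (symbol for symbol, _ in groupby(path))
    return "".join(symbol for symbol in merged if symbol != blank)


print(ctc_collapse("hhhey"))    # the example from the text
print(ctc_collapse("hel-lo"))   # the blank keeps the double 'l'
print(ctc_collapse("hello"))    # without a blank the two 'l's merge
```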

If we aren’t careful, the CTC loss can be very expensive to compute. We could try the
straightforward approach and compute the score for each alignment, summing them all up as we go.
The problem is that there can be a massive number of alignments. For an output Y of length U
without any repeated characters and an input X of length T, the number of alignments is
$\binom{T+U}{T-U}$. For T = 100 and U = 50 this number is almost $10^{40}$. For most problems
this would be too slow. Thankfully, we can compute the loss much faster with a dynamic
programming algorithm. The key insight is that if two alignments have reached the same output at
the same step, then we can merge them.
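
The dynamic programming idea can be sketched as follows. The function ctc_label_probability computes the total probability of a label sequence with the standard CTC forward recursion, and brute_force_probability sums over every alignment explicitly so that the two can be compared on a tiny input. Both function names, the toy probability table and the choice of index 0 for the blank are assumptions made for this sketch; it is not the TensorFlow implementation used for training.

```python
import math
from itertools import product


def ctc_label_probability(probs, label, blank=0):
    """Probability of `label` given per-step class probabilities.

    probs[t][k] is the probability of class k at time step t.
    Alignments that reach the same position at the same step are merged.
    """
    extended = [blank]
    for symbol in label:
        extended += [symbol, blank]      # blank, l1, blank, l2, ..., blank
    size = len(extended)

    alpha = [0.0] * size
    alpha[0] = probs[0][blank]
    if size > 1:
        alpha[1] = probs[0][extended[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]                       # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]              # advance by one position
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[s - 2]              # skip the blank in between
            new_alpha[s] = total * probs[t][extended[s]]
        alpha = new_alpha

    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def collapse(path, blank=0):
    """Merge repeated indices, then drop blanks."""
    out, previous = [], None
    for k in path:
        if k != previous and k != blank:
            out.append(k)
        previous = k
    return out


def brute_force_probability(probs, label, blank=0):
    """Sum the probability of every alignment that collapses to `label`."""
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(label):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


# Toy input: 3 time steps, classes 0 = blank, 1 = 'a', 2 = 'b'.
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
p = ctc_label_probability(probs, [1, 2])
print(p, brute_force_probability(probs, [1, 2]))
print("CTC loss:", -math.log(p))
```

The brute-force version enumerates every path and is only usable on toy inputs, whereas the forward recursion needs work proportional to the number of time steps times the label length.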

3.3 Image Preprocessing


● Image pre-processing is delegated to a separate module.
● The module applies image augmentation to individual images, for example by scaling
their width by a random factor.
● Width scaling is used because techniques such as rotation and the addition of noise cause a
significant loss of information and leave the image unusable.
● In the next phase, contrast stretching is applied to the image produced in the previous
stage.
● After this, the morphological operation of erosion is applied to sharpen the text in the
image. As a result, the accuracy of our program improves.
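
The steps listed above can be sketched with NumPy as follows. This is a simplified illustration, not our pre-processing module: the function names, the nearest-neighbour width scaling and the 3x3 window are assumptions made for the example, and a library such as OpenCV would normally be used instead. For dark ink on a light background, erosion (a local minimum filter) thickens the strokes.

```python
import numpy as np


def stretch_width(img, factor):
    """Scale the image width by `factor` using nearest-neighbour sampling."""
    height, width = img.shape
    new_width = max(1, int(round(width * factor)))
    columns = np.minimum((np.arange(new_width) / factor).astype(int), width - 1)
    return img[:, columns]


def stretch_contrast(img):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    pixels = img.astype(np.float64)
    low, high = pixels.min(), pixels.max()
    if high == low:                      # flat image, nothing to stretch
        return np.zeros_like(img, dtype=np.uint8)
    return np.round((pixels - low) / (high - low) * 255.0).astype(np.uint8)


def erode(img, size=3):
    """Grayscale erosion: each pixel becomes the minimum of its window."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    height, width = img.shape
    windows = [
        padded[dy:dy + height, dx:dx + width]
        for dy in range(size)
        for dx in range(size)
    ]
    return np.min(np.stack(windows), axis=0)


# Toy 4x6 "image" with a faint dark stroke on a light background.
rng = np.random.default_rng(0)
image = np.full((4, 6), 200, dtype=np.uint8)
image[1:3, 2] = 90
factor = rng.uniform(0.75, 1.25)         # random width factor (assumed range)
processed = erode(stretch_contrast(stretch_width(image, factor)))
print(processed.shape, processed.min(), processed.max())
```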

3.4 Approach
To begin, a convolutional recurrent neural network extracts the important features from the
image of a handwritten text line. The output taken before the CNN's fully connected (FC) layer
(512x100) is passed to the BLSTM, which handles sequence dependency and time-sequence operations.
The output of the BLSTM is 100x80, which corresponds to 100 time steps and 80 characters,
including the blank.

The RNN is then trained with the CTC loss of Alex Graves, which eliminates the alignment problem
in handwriting, since every writer aligns text differently. We supply only the text written in
the image (the ground truth text) and the BLSTM output; the loss is then calculated simply as
-log("gtText"), that is, the negative log-probability of the ground truth text, and the aim is
to minimize it. CTC finds the possible paths for the given labels.

Finally, at prediction time, CTC decoding is used to decode the network output.

Fig 3.6: Mathematical Interpretation of Loss

Fig 3.7: Pictorial Representation of Algorithm

3.5 Tools Used

Python 3 was chosen as the programming language for this project. It is widely used in
deep learning projects and is fairly easy to install and work with.
Moreover, it has a very large user community, so any problems or issues are easier to resolve
with the community's help. It also provides packages for a wide range of tasks.
The following Python packages have been used:

● Numpy
○ NumPy is the fundamental package for scientific computing in Python.
○ It is a Python library that provides a multidimensional array object, various derived
objects (such as masked arrays and matrices), and an assortment of routines for fast
operations on arrays, including mathematical, logical, shape manipulation, sorting,
selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical
operations, random simulation and much more.
○ At the core of the NumPy package, is the ndarray object. This encapsulates
n-dimensional arrays of homogeneous data types, with many operations being
performed in compiled code for performance.

● Flask
○ Flask is a micro web framework written in Python.
○ It is classified as a microframework because it does not require particular tools or
libraries.
○ It has no database abstraction layer, form validation, or any other components where
pre-existing third-party libraries provide common functions.
○ However, Flask supports extensions that can add application features as if they were
implemented in Flask itself.
○ Extensions exist for object-relational mappers, form validation, upload handling,
various open authentication technologies and several common framework related
tools.

● OpenCV 3
○ OpenCV (Open Source Computer Vision Library) is a library of programming
functions mainly aimed at real-time computer vision.
○ Originally developed by Intel, it was later supported by Willow Garage, then Itseez
(which was later acquired by Intel).
○ The library is cross-platform and free for use under the open-source Apache 2 License.
○ Since 2011, OpenCV has featured GPU acceleration for real-time operations.
○ OpenCV is written in C++ and its primary interface is in C++, but it still retains a less
comprehensive though extensive older C interface.

● TensorFlow 1.8.0
○ TensorFlow is a free and open source software library for machine learning
and artificial intelligence.
○ It can be used across a range of tasks but has a particular focus on training and
inference of deep neural networks.
○ TensorFlow was developed by the Google Brain team for internal Google use
in research and production. The initial version was released under the Apache
License 2.0 in 2015.
○ Google released the updated version of TensorFlow, named TensorFlow 2.0,
in September 2019.
○ TensorFlow can be used in a wide variety of programming languages, most
notably Python, as well as JavaScript, C++, and Java. This flexibility lends itself
to a range of applications in many different sectors.
○ Starting in 2011, Google Brain built DistBelief as a proprietary machine
learning system based on deep learning neural networks. Its use grew rapidly
across diverse Alphabet companies in both research and commercial applications.
○ Google assigned multiple computer scientists, including Jeff Dean, to simplify
and refactor the codebase of DistBelief into a faster, more robust
application-grade library, which became TensorFlow.
○ In 2009, the team, led by Geoffrey Hinton, had implemented generalized
backpropagation and other improvements, which allowed generation of neural
networks with substantially higher accuracy, for instance a 25% reduction in
errors in speech recognition.
○ TensorFlow computations are expressed as stateful dataflow graphs. The name
TensorFlow derives from the operations that such neural networks perform on
multidimensional data arrays, which are referred to as tensors.
○ During the Google I/O Conference in June 2016, Jeff Dean stated that 1,500
repositories on GitHub mentioned TensorFlow, of which only 5 were from Google.

CHAPTER 4

ARCHITECTURE & IMPLEMENTATION

4.1 CNN Layers


The input image is fed into the convolutional neural network layers. These layers are
trained to extract the most relevant features of the image. Each layer consists of a
convolution, a Rectified Linear Unit (ReLU) activation, and a max-pooling operation.
Convolution is a mathematical operation that combines two sources of information: it applies
a filter (kernel) to an input to produce a transformed output. Convolutions have long been
used in image processing to blur and sharpen images, and also to perform other operations
such as enhancing edges and embossing. CNNs enforce a local connectivity pattern
between neurons of adjacent layers. In deep learning, a convolutional neural network (CNN or
ConvNet) is a class of deep neural networks typically used to recognize patterns in
images, but also used for spatial data analysis, computer vision, natural language processing,
signal processing, and various other purposes. The architecture of a convolutional network
resembles the connectivity pattern of neurons in the human brain and was inspired by the
organization of the visual cortex. This type of artificial neural network gets its name from
one of the most important operations in the network: convolution.
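
The three building blocks named above (convolution, ReLU and max-pooling) can be sketched with NumPy on a toy image. This is an illustration only, not the network of Fig 4.1: the function names, the hand-written 3x3 edge kernel and the 6x6 input are assumptions made for the example, whereas a real CNN learns its kernels during training. As in most deep learning libraries, the "convolution" here slides the kernel without flipping it.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the products."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    out = np.empty((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(image[y:y + k_h, x:x + k_w] * kernel)
    return out


def relu(x):
    """Rectified Linear Unit: negative responses are set to zero."""
    return np.maximum(x, 0.0)


def max_pool2x2(x):
    """Keep the maximum of every non-overlapping 2x2 block."""
    h = x.shape[0] // 2 * 2
    w = x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


# Toy image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = max_pool2x2(relu(conv2d_valid(image, kernel)))
print(features)
```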

Fig 4.1: Network Architecture of implemented model

4.2 RNN Layers
After this, the recurrent neural network (RNN) layers take over and receive their input from the
CNN. An RNN was selected because the task depends on the sequence and time order of the input. The
RNN implementation used is a bidirectional Long Short-Term Memory (LSTM) network. The RNN takes its
input from the CNN, and its output then travels through a fully connected (FC) layer. This input is
of the type (512 x 100), whereas the RNN provides output of the type (100 x 80). A recurrent neural
network (RNN) is a class of artificial neural networks where connections between nodes form a
directed or undirected graph along a temporal sequence. This allows it to exhibit temporal dynamic
behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to
process variable-length sequences of inputs.
This makes them applicable to tasks such as unsegmented, connected handwriting recognition or
speech recognition. Recurrent neural networks are theoretically Turing complete and can run
arbitrary programs to process arbitrary sequences of inputs.
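
The idea of an internal state carried along a sequence, and of reading that sequence in both directions, can be sketched as follows. This is a simplified illustration, not the project's model: it uses a plain tanh recurrent cell instead of an LSTM to stay short, the weights are random, and hidden_size = 16 is an arbitrary choice. Reading the shapes in the text as 100 time steps of 512 features in and 100 time steps of 80 character scores out is also an assumption.

```python
import numpy as np

def rnn_forward(inputs, w_in, w_rec, bias):
    """Run a vanilla (Elman) RNN over `inputs` of shape (time_steps, features)."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x_t in inputs:
        # The new state depends on the current input and on the previous state.
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden + bias)
        states.append(hidden)
    return np.stack(states)

def bidirectional_rnn(inputs, fwd_params, bwd_params):
    """Concatenate a left-to-right pass with a right-to-left pass, per time step."""
    forward = rnn_forward(inputs, *fwd_params)
    backward = rnn_forward(inputs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
# Assumed reading of the shapes in the text; hidden_size is hypothetical.
time_steps, features, hidden_size, num_classes = 100, 512, 16, 80

def random_params():
    """Untrained weights, for shape illustration only."""
    return (rng.normal(0, 0.1, (hidden_size, features)),
            rng.normal(0, 0.1, (hidden_size, hidden_size)),
            np.zeros(hidden_size))

cnn_features = rng.normal(size=(time_steps, features))  # stand-in for the CNN output
states = bidirectional_rnn(cnn_features, random_params(), random_params())

# Project every time step onto the character classes.
w_out = rng.normal(0, 0.1, (num_classes, 2 * hidden_size))
scores = states @ w_out.T
print(scores.shape)  # (100, 80)
```

Because the loop in rnn_forward simply iterates over the time steps, the same weights handle a sequence of any length, which is the property that makes RNNs suitable for unsegmented handwriting.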

4.3 Connectionist Temporal Classification (CTC) Layer


Connectionist temporal classification (CTC) is a type of neural network output and
associated scoring function for training recurrent neural networks (RNNs) such as LSTM networks
to tackle sequence problems where the timing is variable. During training, the CTC layer takes the
output matrix produced by the RNN together with the ground truth text and uses them to compute the
loss. During inference, the CTC layer is given only that matrix and produces the final text by
decoding it.

CTC is therefore helpful for sequence problems such as handwriting and speech recognition, where
the timing varies. Using CTC ensures that one does not need an aligned dataset, that is, the
training data need not mark where each character sits in the image, which makes the training
process more straightforward.
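
The decoding of the output matrix at inference time can be illustrated with best-path (greedy) decoding: take the most likely class at every time step, merge consecutive repeats, then drop the blank symbol. The report does not state which decoder was used, so this choice is an assumption, as are the two-letter alphabet, the position of the blank class and the function name ctc_greedy_decode.

```python
import numpy as np

def ctc_greedy_decode(probabilities, alphabet, blank_index):
    """Best-path decoding: argmax per time step, merge repeats, drop blanks.

    `probabilities` has shape (time_steps, len(alphabet) + 1).
    """
    best_path = np.argmax(probabilities, axis=1)
    decoded = []
    previous = blank_index
    for index in best_path:
        # Emit a character only when it differs from the previous step
        # and is not the blank symbol.
        if index != previous and index != blank_index:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)

alphabet = "to"
blank = len(alphabet)               # assumption: the blank is the last class
path = [0, 0, blank, 1, blank, 1]   # "t t - o - o" over six time steps
probabilities = np.eye(len(alphabet) + 1)[path]  # one-hot rows as a toy matrix

print(ctc_greedy_decode(probabilities, alphabet, blank))  # too
```

The blank between the two "o" steps is what allows a genuinely doubled letter to survive the merging of repeats; without it, "o o" would collapse to a single "o".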

Fig 4.2: Transcript decoding process

CHAPTER 5

RESULTS AND DISCUSSIONS

A collection of images from the IAM dataset, readily available from its official archive, was used
for training and testing the project. Around 1000 images were used. After completion, the system
was evaluated on around 350 images, and the training and testing accuracies were found to be 80%
and 88% respectively.
The Levenshtein distance metric was used in this project to measure the performance of the system.
The Levenshtein distance between two words is the smallest number of single-character edits
(insertions, deletions or substitutions) needed to turn one word into the other.
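
The Levenshtein distance can be computed with the standard dynamic-programming recurrence, sketched below. The function character_error_rate is a hypothetical helper showing one common way to turn the distance into a score; the report does not say how the distance was converted into the accuracy figures above.

```python
def levenshtein(source, target):
    """Minimum number of single-character insertions, deletions and substitutions."""
    # previous[j] holds the distance between the processed prefix of `source`
    # and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def character_error_rate(recognized, ground_truth):
    """Edit distance normalised by the length of the ground truth (hypothetical helper)."""
    return levenshtein(recognized, ground_truth) / max(len(ground_truth), 1)

print(levenshtein("kitten", "sitting"))  # 3
```

For example, a recognized word "helo" against the ground truth "hello" needs one insertion, giving a character error rate of 1/5.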

Fig 5.1: Outputs received for some input images

Fig 5.2: Outputs received for some input images

CHAPTER 6

CONCLUSION

6.1 Conclusion
In this work, an attempt has been made to recognize various address blocks written in
cursive and small characters. The algorithms we used have the advantage of being able to identify
text whose characters are partly joined. Owing to the program's optimized design, it has been
shown that a CRNN (CNN + LSTM) can recognize handwritten sentences in images. The system uses
seven CNN layers and two LSTM layers to construct a matrix of character probabilities. The CRNN
recognizes the image of a handwritten text line without dividing it into words or letters. The CTC
loss function is then employed for learning. The system achieved an accuracy of around 85 percent
during training and 80 percent during testing.

REFERENCES

[1] Plamondon, Réjean, and Sargur N. Srihari. "Online and offline handwriting recognition: a
comprehensive survey." Pattern Analysis and Machine Intelligence, IEEE Transactions on 22.1
(2000): 63-84.
[2] Liu, Xia, and Zhixin Shi. "A format-driven handwritten word recognition system." 2013 12th
International Conference on Document Analysis and Recognition. Vol. 2. IEEE Computer Society,
2003.
[3] Park, Jaehwa, Venu Govindaraju, and Sargur N. Srihari. "OCR in a hierarchical feature space."
Pattern Analysis and Machine Intelligence, IEEE Transactions on 22.4 (2000): 400-407.
[4] Singh, Sameer, and Mark Hewitt. "Cursive Digit and Character Recognition on Cedar Database."
Pattern Recognition, 2000. Proceedings. 15th International Conference on. Vol. 2. IEEE, 2000.
[5] A. Graves and N. Jaitly, “Towards end-to-end speech recognition with recurrent neural networks,”
in Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 1764–1772.
[6] U. Marti and H. Bunke, “The IAM-database: an English sentence database for offline handwriting
recognition,” International Journal on Document Analysis and Recognition, vol. 5, no. 1, pp. 39–46,
2002.
[7] B. Shi, X. Bai, and C. Yao, “An End-to-End Trainable Neural Network for Image-based Sequence
Recognition and its Application to Scene Text Recognition,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 39, no. 4, pp. 2298–2304, 2016.
[8] E. Kavallieratou, K. Sgarbas, N. Fakotakis and G. Kokkinakis, “Handwritten Word Recognition
based on Structural Characteristics and Lexical Support”, by Wire Communications Lab, University
of Patras, 26500 Patras, 2003.
[9] Paul D. Gader, Magdi Mohamed, and Jung-Hsien Chiang, “Handwritten Word Recognition with
Character and Inter-Character Neural Networks”, IEEE Transactions on Systems, Man and Cybernetics,
Part B (Cybernetics), 1997.
[10] Research Group on Computer Vision and Artificial Intelligence — Computer Vision and
Artificial Intelligence (heia-fr.ch)
[12] https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/

APPENDIX A

PUBLICATION DETAILS

We submitted our research paper for publication at the International Conference on Big Data,
Machine Learning and IoT.

Figure B.1: Submission Notification

ORIGINALITY REPORT

Similarity index: 3%
Internet sources: 1%
Publications: 2%
Student papers: 1%

PRIMARY SOURCES

1. ir.lib.uwo.ca (Internet source): 1%
2. Sepp Hochreiter, Jürgen Schmidhuber. "Long Short-Term Memory", Neural Computation, 1997
   (Publication): 1%
3. Submitted to Moulton College (Student paper): 1%
4. P.D. Gader, M. Mohamed, Jung-Hsien Chiang. "Handwritten word recognition with character and
   inter-character neural networks", IEEE Transactions on Systems, Man and Cybernetics, Part B
   (Cybernetics), 1997 (Publication): 1%
5. www.visionbib.com (Internet source): <1%

Exclude quotes: On
Exclude matches: < 10 words
Exclude bibliography: On
