You are on page 1of 7

International Conference on Latest Trends in Science, Engineering and Technology (ICLTSET’22) on May 20&21, 2022

organized by Karpagam Institute of Technology, Coimbatore

Deep Convolution Neural Network for Automated


robust Alphanumeric character Recognition
Suriya A K,Vengadesan K,Vasanthakumar S Dr.J.Ramya Associate Professor
ECE department,Hindusthan college of engineering and
From final year students of ECE,Hindusthan college technology
Of engineering and technology.
aksuriya2001@gmail.com

Abstract:The goal is to investigate and develop a acquisition, feature extraction, classification, and
Deep Convolution Neural Network for handwritten recognition. Handwriting recognition is the ability of a
character recognition. Handwritten Character machine to receive and interpret the handwritten input
Recognition is the technique of recognising text in from an external source like image. the most aim of this
digital photos an documents and processing it for use project is to style a system that may efficiently recognize
in applications like machine translation and Character the actual character of format employing a neural
recognition.Using several layers, the model extracts network. Neural computing could be a comparatively
features from photos. These are then used to train the new field, and style components are therefore less
model, which is trained on word-images from the IAM well-specified than those of other architecture. Neural
dataset,allowing it to recognise characters.This computers implement data parallelism. A neural
investigates the use of DCNN for more accurate computer is operated in a very way that's completely
detection and recognition of handwritten text images. different from the operation of a standard computer.
The DCNN model is validated based on its Neural computers are trained and that they don't seem to
performance on English handwritten characters and be programmed. so that the given data is compared to the
following characters “!”#&’()*+,./0123456789:;?ABC trained system and provides the acceptable output text to
DEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop the user. A handwriting recognition system handles
qrstuvwxyz”,the RNN propagates relevant information formatting, performs correct segmentation into
through this sequence. The popular Long Short-Term characters, and finds the foremost plausible words.
Memory (LSTM) implementation of RNNs is used, as it Off-line handwriting recognition involves the automated
is able to propagate information through longer conversion of the text in a picture into letter codes that
distances and provides more robust training are usable within computer and text-processing
characteristics than vanilla RNN.Convolution neural applications. the info obtained by this manner is
networks (CNNs) are very effective in perceiving the considered a static representation of handwriting.
structure of handwritten characters/words in ways that
help in automatic extraction of distinct features and
make CNN the most suitable approach for solving II. PROPOSED WORK
handwriting recognition problems. This system will be A. Handwritten text recognition:
applied to detect the writings of different format
Handwritten Text Recognition (HTR) systems use
Keywords: -Handwritten Character Recognition, scanned images of handwritten text, as demonstrated in
Convolutional Neural Network, Feature extraction, Figure 1. We will construct a Neural Network (NN) that
Recurrent neural network, Long Short-Term Memory will is trained using the IAM dataset's word-image
(LSTM),Connectionist Temporal Classification (CTC) because the input layer (and therefore also all the
I. INTRODUCTION opposite layers) are often kept small for word-images,
NN-training is possible on the CPU (of course, a GPU
Image processing could be a manipulation of images
would be better). For the implementation of HTR, the
within the computer vision. With the event of
minimum requirement is TF
technology, there are many techniques for the
manipulation of the photographs. The text recognition
includes a huge role in many areas. But it's difficult to
try and do such a task by a machine. For recognition,
we've to coach the system to acknowledge the text. The Fig. 1: Image of word taken from IAM Dataset
character recognition involves several steps like
International Conference on Latest Trends in Science, Engineering and Technology (ICLTSET’22) on May 20&21, 2022
organized by Karpagam Institute of Technology, Coimbatore
B. Model Overview: each layer, feature channels are added, so that the output
feature sequence has a size of 32×256.
For our task, we employ a NN. Convolutional neural
network (CNN) layers, recurrent neural network (RNN) RNN: The feature sequence consists of 256 features
layers, and a Connectionist Temporal Classification layer per time-step, the RNN propagates relevant information
make up the system layer(CTC) We used 5 CNN (feature through this sequence. The favored Long Short-Term
extraction) layers, two RNN layers, and a CTC layer in Memory (LSTM) implementation of RNNs is employed
this project (calculate the loss). To begin, we must first because it is in a position to propagate information
preprocess the images in order to remove the noise. The through longer distances and provides more robust
NN can alternatively be thought of as a function (see Eq. training-characteristics than vanilla RNN. The RNN
1) that transfers a picture (or matrix) M of size WH to a output sequence is mapped to a matrix of 32×80. The
personality sequence (c1, c2,...) with a length between 0 IAM dataset contains 79 different characters, further one
and L. As you can see, the text is identified at the additional character is required for the CTC operation
character level, therefore words or texts not contained (CTC blank label), so that there are 80 entries for every
within the training data is recognized too (as long of the 32 time-steps.
because the individual characters get correctly
classified). CTC: while training the NN, the CTC is given the
RNN output matrix and also the ground truth text and it
computes the loss value. While inferring, the CTC is just
given the matrix and it decodes it into the ultimate text.
Both the bottom truth text and also the recognized text
are often at the most 32 characters long.

Data:

Input: It is a gray-value image of size 128×32.


Usually, the pictures from the dataset don't have exactly
this size, therefore we resize it (without distortion) until
it either contains a width of 128 or a height of 32. Then,
Fig. 2: Green indicates the operations of NN and Pink indicate the we place the image in a (white) target image of size
dataflow through NN. 128×32. This process is shown in Fig. 3. Finally, we will
normalize the gray-values of the image so that it could
simplify the task for the NN. Data augmentation can
easily be integrated by copying the image to random
positions rather than aligning it to the left or by
randomly resizing the image.

Eq. 1: The Neural Network was written as a mathematical function


that maps an image M to a character sequence (c1, c2, …).

C. Operations:

CNN: The input image is given to the CNN layers.


These layers are trained to take out relevant features
from the image. Each layer consists of three operations. Fig. 4: Left: a picture from the dataset with an arbitrary size. it's
First, the convolution operation, 5×5 filter is used in the scaled to suit the target image of size 128×32, the empty a part of the
target image is crammed with white color.
first two layers and 3×3 filter used in the last three
layers to the input. Then, the non-linear RELU function CNN output: Figure. 5 displays the output of the
is applied. At last, a pooling layer summarizes image CNN layers which may be a sequence of length 32. Each
regions and outputs a downsized(smaller) version of the layer entry contains 256 features. All these features are
input. While the height of image size is reduced by 2 in further process has been done by the RNN layers,
International Conference on Latest Trends in Science, Engineering and Technology (ICLTSET’22) on May 20&21, 2022
organized by Karpagam Institute of Technology, Coimbatore
however, some features already showed high correlation
with certain high-level properties of the input image:
there are some features which have a high correlation
with characters for example "e", or with duplicate
characters for example "ll", otherwise properties of
character like loops as already present in handwritten

Fig. 6: Top: output matrix of the RNN layers. Middle: input image.
Bottom: Probabilities for the characters “l”, “i”, “t”, “e” and therefore
the CTC blank label.

D. Implementation using TF:


The implementation consists of 4 modules:
1. SamplePreprocessor.py: prepares the pictures from
the IAM dataset for the NN
Fig. 5: Top: 256 feature per time-step are computed by the CNN 2. DataLoader.py: reads samples, put them into batches
layers. Middle: input image. Bottom: plot of the 32nd feature, which and provides an iterator-interface to travel through the
has a high correlation with the occurrence of the character “e” in the info
image.
3. Model.py: creates the model as described above,
RNN output: Fig. 6 shows a visualization of the RNN loads and saves models, manages the TF sessions and
output matrix for a picture containing the text “little”. provides an interface for training and inference
The matrix shown within the top-most graph consists of 4. main.py: puts all previously mentioned modules
the scores for the characters included in the together
Connectionist Temporal Classification blank label as its We only have a look at Model.py, because the other
last entry. the opposite matrix entries, from top to source files are concerned with basic file IO
bottom, correspond to the subsequent characters: “ (DataLoader.py) and image processing
!”#&’()*+,-./ABCDEFGHIJKLMNOPQRSTUVWXYZ (SamplePreprocessor.py).
abcdefghijklmnop qrstuvwxyz0123456789:;?”. Only the
last character “e” isn't aligned. But this can be OK CNN: For each CNN layer, create a kernel of size k×k
because the CTC operation is segmentation-free and to be utilized in the convolution operation. Then, RELU
doesn't care about absolute positions. From the operation again to the pooling layer with size px×py and
bottom-most graph showing the scores for the characters step-size sx×sy with results of the convolution. These
“l”, “i”, “t”, “e” and also the CTC blank label, the text steps are repeated for all layers during a for-loop. RNN:
can easily be decoded: we just take the foremost Create and stack the two
probable character from each time-step, this forms the
so-called best path, then we throw away repeated RNN: layers consisting of 256 units each. Then, create a
characters and at last all blanks: “l---ii--t-t--l-…-e” → bidirectional RNN from it, such the input sequence is
“l---i--t-t--l-…-e” → “little”. traversed from front to back and therefore the other way
round. As a result, we get two output sequences forward
and backward of size 32×256, which we later
concatenate along the feature-axis to create a sequence
International Conference on Latest Trends in Science, Engineering and Technology (ICLTSET’22) on May 20&21, 2022
organized by Karpagam Institute of Technology, Coimbatore

of size 32×512. Finally, it's mapped to the output passing on the IAM and Bentham HTR datasets. An
sequence (or matrix) of size 32×80 which is fed into the open-source implementation is provided.
CTC layer.
III.THE ARCHITECTURE OF THE PROPOSED
NETWORK
CTC: For loss calculation, we feed both the bottom truth
text and therefore the matrix to the operation. the bottom The proposed network has different layers which are as
shown in the figure below:
truth text is encoded as a sparse tensor. The length of the
input sequences must be given to both CTC operations.

We now have all the input files to make the loss


operation and therefore the decoding operation.

Training: The mean of the loss values of the batch


elements is employed to coach the NN: it's fed into an
optimizer like RMSProp.

E. Improving the model: In case you want to feed


complete text-lines as shown in Fig. 6 instead of
word-images, you have to increase the input size of the
NN

If you want to improve the recognition accuracy, you can


follow one of these hints:
• Data augmentation: increase dataset-size by applying
further (random) transformations to the input images
• Remove cursive writing style in the input images
• Increase input size (if an input of NN is large enough,
complete text-lines can be used)

Word Beam Search: Fig 7:Architecture of proposed network


Recurrent Neural Networks (RNNs) are used for
sequence recognition tasks such as Handwritten Text A. CNN LAYERS
Recognition (HTR) or speech recognition. If trained with
the Connectionist Temporal Classification (CTC) loss
function, the output of such a RNN is a matrix
containing character probabilities for each time-step. A
CTC decoding algorithm maps these character
probabilities to the final text. Token passing is such an
algorithm and is able to constrain the recognized text to
a sequence of dictionary words. However, the running
time of token passing depends quadratically on the
dictionary size and it is not able to decode arbitrary
character strings like numbers. This paper proposes word Fig 8:CNN layer
beam search decoding, which is able to tackle these
problems. It constrains words to those contained in a CNN is meant to imitate human visual processing, and it's
dictionary, allows arbitrary non-word character strings highly optimized structures to process 2D images. Further, it
between words, optionally integrates a word-level can effectively learn the extraction and abstraction of 2D
language model and has a better running time than token features. In detail, the max-pooling layer of CNN is extremely
effective in absorbing shape variations. Moreover, a sparse
reference to tied weights makes CNN involve with fewer
passing. The proposed algorithm outperforms best path
decoding, vanilla beam search decoding and token
International Conference on Latest Trends in Science, Engineering and Technology (ICLTSET’22) on May 20&21, 2022
organized by Karpagam Institute of Technology, Coimbatore
parameters than a totally connected network with similar size.
most significantly, CNN is trainable with the gradient-based
learning algorithm and suffers less from the diminishing ● The formula for applying Activation function(tanh):
gradient problem. providing the gradient-based algorithm
trains the entire network to attenuate a blunder criterion
directly, CNN can produce highly optimized weights and good
generalization performance.
● Formula for calculating output:
RNN layer
RNN have a “memory” which remembers all information
about what has been calculated. It uses the same parameters
for each input as it performs the same task on all the inputs or
hidden layers to produce the output. This reduces the
complexity of parameters, unlike other neural networks. CTC layer:

CTC may be a loss function that is employed to coach neural


networks. there's no must align data because that may assign a
probability for any label.CTC may be a align free.it works on
the summing over the probability of all possible alignments
between the input and also the label.

Here we have an input of size 9 and also the correct


transmission of its iPhone. we force our system to assign an
output character to every input step and so we collapse the
repeats, which ends within the input.

Blank token:

There is how around that called the Blank token. It does mean
anything and simply it gets removed before the final word
output is produced. Now, the algorithm can assign a
personality or blank token to each input step.
• RNN converts the independent activations into dependent
activations by providing the same weights and biases to all the 1. CTC network assigns the character with the simplest
layers, thus reducing the complexity of increasing parameters probability of every input sequence.
and memorizing each previous output by giving each output as 2. Repeats not having a blank token in between get
input to the next hidden layer. merged
3. Lastly, the blank token gets removed.
• Hence these three layers can be joined together such that the
weights and bias of all the hidden layers are the same, into a The CTC network can then give the probability label to the
single recurrent layer. input. By summing all the probabilities of the characters of
each time step.The CTC algorithm is alignment-free — it
● The formula for calculating current state: doesn’t require alignment between the input and thus the
output. However, to induce the probability of output given an
input, CTC works by summing over the probability of all
possible alignments between the two. we'd wish to know what
these alignments are so on know the way the loss function is
ultimately calculated. To motivate the precise sort of the CTC
alignments, first consider a naive approach. Let’s use an
International Conference on Latest Trends in Science, Engineering and Technology (ICLTSET’22) on May 20&21, 2022
organized by Karpagam Institute of Technology, Coimbatore

example. Assume that the given input has length six and Y Future Work: In future we are planning to extend this study
=Y= [c, a, t]. a way to align XX and YY is to assign an output to a larger extent where different embedding models can be
character to each input step and collapsed repeats. Experiment considered on large variety of the datasets. The future is
results.After the text edit has been completed, the paper is completely based on technology no one will use the paper and
ready for the template. Duplicate the template file by using the pen for writing. In that scenario they used write on touch pads
Save As command, and use the naming convention prescribed so the inbuilt software which can automatically detects text
by your conference for the name of your paper. In this newly which they writing and convert into digital text so that the
created file, highlight all of the contents and import your searching and understanding very much simplified.
prepared text file. You are now ready to style your paper; use
the scroll down window on the left of the MS Word ACKNOWLEDGMENT
Formatting toolbar.
This work is partially supported by the grant from the
Summary of Dataset: Hindusthan college of engineering and technology.We thank
Dr.J.Ramya( Associate Professor) for her guidance in this
The IAM Handwriting Database contains forms of project.
handwritten English text which can be used to train and test
handwritten text recognizers and to perform writer REFERENCES
identification and verification experiments. Characteristics of
IAM Dataset.
1. Nurseitov, D., Bostanbekov, K., Kanatov, M.,
● 657 writers contributed samples of their writings Alimova, A., Abdallah, A., & Abdimanap, G. (2021).
● 1532 pages of scanned text. Classification of handwritten names of cities and
● 5685 isolated and labeled sentences. handwritten text recognition using various deep
● 13353 isolated and labeled text lines. learning models. arXiv preprint arXiv:2102.04816.
● 115320 isolated and labeled words. 2. Fischer, A., Frinken, V., Bunke, H.: Hidden Markov
models for off-line cursive handwriting recognition,
in C.R. Rao (ed.): Handbook of Statistics 31, 421 –
442, Elsevier, 2013
3. Frinken, V., Bunke, H.: Continuous handwritten
script recognition, in Doermann, D., Tombre, K.
(eds.): Handbook of Document Image Processing and
Recognition, Springer Verlag, 2014
RESULTS 4. Manoj Sonkusare and Narendra Sahu “A SURVEY
ON HANDWRITTEN CHARACTER
In this project we have given image as an input then it predicts RECOGNITION (HCR) TECHNIQUES FOR
the output by loading the model which is already previously ENGLISH ALPHABETS” Advances in Vision
created and saved. Computing: An International Journal (AVC) Vol.3,
No.1, March 2016
5. N. Venkata Rao and Dr. A.S.C.S.Sastry - ―Optical
The above image is the input given to the neural network to CharacterRecognition Technique Algorithmsǁ-2016
predict the solution. Journal of Theoretical and Applied Information
Technology.
6. Sonkusare, M., & Sahu, N. (2016). A survey on
handwritten character recognition (HCR) techniques
for English alphabets. Advances in Vision
Computing: An International Journal (AVC), 3(1).
7. Manchala, Sri Yugandhar, Jayaram Kinthali,
Kowshik Kotha, J. J. K. S. Kumar, and Jagilinki
CONCLUSION AND FUTURE SCOPE : Jayalaxmi. "Handwritten text recognition using deep
learning with Tensorflow." International Journal of
In this project classification of characters takes place. The Engineering and Technical Research 9, no. 5 (2020).
project is achieved through the conventional neural network. 8. Ray, A., Rajeswar, S., & Chaudhury, S. (2015,
The accuracy we obtained in this is above 90.3%. This January). Text recognition using deep BLSTM
algorithm will provide both the efficiency and effective result networks. In 2015 eighth international conference on
for the recognition. The project gives best accuracy for the text
which has less noise. The accuracy completely depending on advances in pattern recognition (ICAPR) (pp. 1-6).
the dataset if we increase the data, we can get more accuracy. IEEE.
If we try to avoid cursive writing then also its best results.
International Conference on Latest Trends in Science, Engineering and Technology (ICLTSET’22) on May 20&21, 2022
organized by Karpagam Institute of Technology, Coimbatore
9. Om Prakash Sharma, M. K. Ghose, Krishna Bikram 12. U.-V. Marti and H. Bunke. Text line segmentation
Shah, “An Improved Zone Based Hybrid Feature and word recognition in a system for general writer
Extraction Model for Handwritten Alphabets independent handwriting recognition. In Proc. 6th Int.
Recognition Using Euler Number”, International Conf. on Document Analysis and Recognition, pages
Journal of Soft Computing and Engineering (ISSN: 159–163.
2231 - 2307), Vol. 2, Issue 2,pp. 504- 508, May 2012. 13. [5] M. Liwicki and H. Bunke, “Iam-ondb - an on-line
10. J.Pradeep, E.Srinivasan and S.Himavathi English sentence database acquired from the
–―Diagonal based feature extraction for handwritten handwritten text on a whiteboard,” in ICDAR, 2005
alphabets recognition system using neural networkǁ -
International Journal of Computer Science &
14. A. Graves and J. Schmidhuber, “Offline handwriting
Information Technology (IJCSIT), Vol 3, No 1, Feb recognition with multidimensional recurrent neural
2011. networks,” in Advances in neural information
processing systems, 2009, pp. 545–552.
11. S. Günter and H. Bunke. A new combination scheme
for HMM-based classifiers and its application to
handwriting recognition. In Proc. 16th Int. Conf. on
Pattern Recognition, volume 2, pages 332–337.
IEEE,2002.

You might also like