HAND WRITTEN PATTERN RECOGNITION USING
CONVOLUTIONAL NEURAL NETWORK
By
ENGINEERING
Institute of Technical Education and Research
SIKSHA ‘O’ ANUSANDHAN (Deemed to be) UNIVERSITY
Bhubaneswar, Odisha, India
(May, 2020)
CERTIFICATE
This is to certify that the project titled “ Hand Written Pattern Recognition using
/ our supervision and guidance. The project work, in my / our opinion, has reached
the requisite standard, fulfilling the requirements for the degree of Bachelor of
Technology.
The results contained in this thesis have not been submitted in part or full to
any other university or institute for the award of any degree or diploma.
ITER ITER
CONTENTS
CHAPTERS
Chapter 5: RESULTS
Chapter 8: REFERENCES
DECLARATION
We hereby declare that this written submission report represents our ideas in our own
words, and where others' ideas and words have been included, we have adequately
cited and referenced the original sources. We also declare that we have adhered to all
the principles of academic honesty and integrity and have not misrepresented or
We understand that any violation of the above will be cause for disciplinary action
by the University and can also evoke penal action from the sources which have thus
not been properly cited or from whom proper permission has not been taken when
needed.
Date: 18.05.20
REPORT APPROVAL
This project report entitled “Hand Written Pattern Recognition using Convolutional
Neural Network” by Shibani Khatua and Shivani Kumari is approved for the
degree
Examiners
Supervisors
H.O.D.
ABSTRACT
Handwritten character recognition has been one of the active and challenging
research areas in the field of image processing and pattern recognition. A neural
network is a feed forward neural network used for classification and recognition of
which are reading aid for blind, bank cheques and conversion of any hand written
characters for English alphabets without feature extraction using multilayer Feed
Forward neural network. Each character data set contains alphabets. Fifty different
character data sets are used for training the neural network. In the proposed
system, each character is resized to 30 × 20 pixels and is directly subjected to
training. That is, each resized character has 600 pixels, and these pixels are taken as
features for training the neural network. The results show that the proposed
system yields good recognition rates which are comparable to that of feature
INTRODUCTION:-
Handwritten characters are easy for humans to understand because humans have the
ability to learn. Artificial intelligence and machine learning give machines a similar
ability. The field that deals with these characters is known as OCR (optical
identifying characters from images. The ultimate objective of handwritten character
recognition is to simulate human reading capabilities so that the computer can read,
understand, edit and work with text as humans do. Handwriting recognition has been
one of the most fascinating and challenging research areas in the field of image processing
and pattern recognition in recent years. It contributes immensely to the
advancement of automation and improves the interface between man and
machine in numerous applications. Several research works have focused on new
techniques and methods that would reduce the processing time while providing higher
recognition accuracy. Character recognition is mainly of two types: online and offline.
In online character recognition, data is captured during the writing process with the
Generally all printed or type-written characters are classified in Offline mode. Off-line
document that have been scanned from a surface such as a sheet of paper and are
stored digitally in gray scale format. The storage of scanned documents have to be
bulky in size and many processing applications as searching for a content, editing,
The online mode of recognition is mostly used to recognize only handwritten
characters. In this mode, the handwriting is captured and stored in digital form via
different means. Usually, a special pen is used in conjunction with an electronic surface.
As the pen moves across the surface, the two-dimensional coordinates of successive points
are represented as a function of time and are stored in order. Recently, due to
keyboard. The online handwriting recognition has great potential to improve user and
continues to be an active area for research towards exploring the newer techniques that
LITERATURE SURVEY
An early notable attempt in the area of character recognition research is by
Grimsdale in 1959. The origin of a great deal of research work in the early sixties
Eden in 1968. The great importance of Eden's work was that he formally proved
that all handwritten characters are formed by a finite number of schematic features,
a point that was implicitly included in previous works. This notion was later used
the structural part of the optical model has been modelled with Markov chains, and
paper, different techniques are applied to remove slope and slant from handwritten
text and to normalize the size of text images with supervised learning methods.
The key features of this recognition system were to develop a system having high
In the literature, T. Som has discussed a fuzzy membership function based approach for
image) is formed from 10 images of each character. Bounding box around character
thing is performed and thinned image is placed in one by one row of 100 X 100
canvas. Similarity score of test image is matched with fusion image and characters
are classified.
WORKING PRINCIPLE:-
Handwriting recognition is normally divided into six phases: image acquisition,
pre-processing, segmentation, feature extraction, classification and post processing.
The block diagram of the basic character recognition system is shown below:-
Image Acquisition
Pre-processing
Segmentation
Feature Extraction
Classification
Post Processing
A. Image Acquisition-- A digital image is initially taken as input. The most common
input device is the electronic tablet or digitizer. These devices use a digital pen.
Input images for handwritten characters can also be taken by using other
a stylus.
crucial for a good recognition rate. The main objective of the pre-processing steps is to
normalize strokes and remove variations that would otherwise complicate recognition
and reduce the recognition rate. These distortions include irregular text size,
missing points during pen movement collection, jitter present in text, left or right
are segmented using row histogram. From each row, words are extracted using column
D. Feature Extraction-- The main aim of the feature extraction phase is to extract the
pattern that is most pertinent for classification. Feature extraction techniques like
applied to extract the features of individual characters. These features are used to train
the system.
E. Classification-- When an input image is presented to the system, its features are
extracted and given as input to a trained classifier such as an artificial neural network
or a support vector machine. Classifiers compare the input features with stored patterns and
recognition. Language information can increase the accuracy obtained by pure shape
recognition.
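
The row-histogram segmentation mentioned in the segmentation phase can be sketched as follows. This is a minimal illustration, assuming the page has already been binarized into a list of rows in which 1 marks an ink pixel; the function name `segment_rows` and the sample page are hypothetical and are not taken from the project code.

```python
def segment_rows(image):
    """Split a binary image (list of rows, 1 = ink) into text-line bands.

    Returns a list of (start_row, end_row) pairs, with end_row exclusive.
    """
    # Row histogram: number of ink pixels in each row.
    histogram = [sum(row) for row in image]

    bands = []
    start = None
    for index, count in enumerate(histogram):
        if count > 0 and start is None:
            start = index                  # a text line begins here
        elif count == 0 and start is not None:
            bands.append((start, index))   # the text line ended on the previous row
            start = None
    if start is not None:                  # text line touching the bottom edge
        bands.append((start, len(histogram)))
    return bands


# Tiny illustrative page: two text lines separated by one blank row.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
lines = segment_rows(page)
```

The same idea applied to the columns of each band would give word or character boundaries.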
Work | Reported accuracy | Aim
OCR for cursive handwriting. | 88.8% for lexicon size 40,000 | To implement segmentation and recognition algorithms for cursive handwriting.
Recognition of handwritten numerals based upon fuzzy model | 95% for Hindi and 98.4% for English numerals overall | The aim is to utilize the fuzzy technique to recognize handwritten numerals for Hindi and English numerals.
Combining decision of multiple connectionist classifiers for Devanagari numeral recognition. | 89.6% overall. | To use a reliable and efficient technique for classifying numerals.
Hill climbing algorithm for handwritten character recognition. | 93% for uppercase letters. | To implement hill climbing algorithm for selecting feature subset.
Optimization of feature selection for recognition of Arabic characters | 88% for numbers and 70% for letters. | To apply a method of selecting the features in an optimized way.
Handwritten numeral recognition for six popular Indian scripts. | 99.56% for Devanagari, 98.99% for Bangla, 99.37% for Telugu, 98.40% for Oriya, 98.71% for Kannada and 98.51% for Tamil overall. | To find out the recognition rate for the six popular Indian scripts.
CONCEPT GENERATION
Handwritten pattern recognition has played a big role in the technology world. It has also
played an important role in the storage and recovery of critical handwritten
information. Handwriting recognition has helped ensure accurate medical care and has
reduced storage costs. It ensured that an essential field of research remains available to
there like National ID number recognition, postal office automation with code number
time.
TRADITIONAL TECHNIQUES
written sometimes in the past. This means the individual characters contained
in the scanned image would need to be extracted. Tools existed that were
imperfections in this step. The most common was when characters that
caused a major problem in the recognition stage. Yet many algorithms were
(b) CHARACTER RECOGNITION
However, programmers had to manually determine the properties they felt were
important. This approach gave the recognizer more control over the properties
Modern techniques
which were able to learn visual Features, avoiding the limiting feature engineering
features over several overlapping windows of a text line image which an RNN used to
Online recognition :-
On-line handwriting recognition involves the automatic conversion of text as it is
written on a special digitizer or PDA, where a sensor picks up the pen-tip movements
as well as pen-up/pen-down switching. This kind of data is known as digital ink and can
converted into letter codes which were usable within computer and text-processing
(b) a touch sensitive surface, which may be integrated with, or adjacent to, an output
display.
(c) a software application which interpreted the movements of the stylus across the
writing surface, translating the resulting strokes into digital text. And an off-line
General process
The process of online handwriting recognition can be broken down into a few general
steps:
(a) pre-processing
(b) feature extraction
(c) classification
The purpose of preprocessing was to discard irrelevant information in the input data,
that could negatively affect the recognition. This concerned speed and accuracy.
and denoising. The second step was feature extraction. Out of the two- or more-
dimensional data was extracted. The purpose of this step was to highlight important
information for the recognition model. This data might include information like pen
pressure, velocity or changes of writing direction. The last big step was
classification. In this step, various models were used to map the extracted features to
different classes, thus identifying the characters or words the features represent.
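
As an illustration of the feature extraction step for online data, the sketch below derives velocity and change of writing direction from a sequence of pen samples. It assumes each sample is an (x, y, t) tuple stored in capture order; the function name `stroke_features` and the sample stroke are hypothetical, and pen pressure is left out because no pressure format is described here.

```python
import math


def stroke_features(points):
    """Compute simple per-segment features from pen samples.

    points: list of (x, y, t) tuples in the order they were captured.
    Returns (speeds, directions, turns), where directions are in radians
    and turns are the raw differences between consecutive directions
    (not wrapped to the range -pi..pi).
    """
    speeds, directions = [], []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dx, dy, dt = x1 - x0, y1 - y0, t1 - t0
        # Speed is distance travelled divided by elapsed time.
        speeds.append(math.hypot(dx, dy) / dt if dt > 0 else 0.0)
        # Writing direction of this segment.
        directions.append(math.atan2(dy, dx))
    # Change of writing direction between consecutive segments.
    turns = [b - a for a, b in zip(directions, directions[1:])]
    return speeds, directions, turns


# Illustrative stroke: a diagonal segment followed by a vertical one.
stroke = [(0.0, 0.0, 0.0), (3.0, 4.0, 1.0), (3.0, 8.0, 2.0)]
speeds, directions, turns = stroke_features(stroke)
```

A classifier would then receive these values (or statistics computed from them) as its input features.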
CONCEPT SELECTION
research area of pattern recognition due to vast applications and ambiguity in the
high accuracy and low computational speed for handwritten pattern recognition
process. The aim of the proposed attempt was to make the path toward
different layers for recognizing (encoding and decoding) and classifying the given
for training and testing and deep learning framework for handwritten pattern
addition, the proposed system reduces computational time significantly for training
CONVOLUTIONAL NEURAL NETWORK
The name “convolutional neural network” indicates that the network employs a
linear operation. Convolutional networks are simply neural networks that use
ARCHITECTURE
multiple hidden layers. The hidden layers of a CNN typically consist of a series of
convolutional layers that convolve with a multiplication or other dot product. The
normalization layers, referred to as hidden layers because their inputs and outputs
Though the layers are colloquially referred to as convolutions, this is only by
correlation. This has significance for the indices in the matrix, in that it affects
Convolutional
When programming a CNN, the input is a tensor with shape (number of images) x
(image width) x (image height) x (image depth). Then after passing through a
convolutional layer, the image becomes abstracted to a feature map, with shape
(number of images) x (feature map width) x (feature map height) x (feature map
following attributes:
The depth of the Convolution filter (the input channels) must be equal to
Pooling
the underlying computation. Pooling layers reduce the dimensions of the data by
combining the outputs of neuron clusters at one layer into a single neuron in the
next layer. Local pooling combines small clusters, typically 2 x 2. Global pooling
acts on all the neurons of the convolutional layer. In addition, pooling may
compute a max or an average. Max pooling uses the maximum value from each of
a cluster of neurons at the prior layer. Average pooling uses the average value
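
Local max pooling and average pooling over 2 x 2 clusters can be sketched as below. This is a minimal illustration on a single feature map stored as a list of rows; the function name `pool2d` and the sample values are assumptions made for the example.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping local pooling over size x size clusters.

    feature_map: list of rows of numbers.
    mode: "max" for max pooling, anything else for average pooling.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values of one cluster of neurons.
            cluster = [feature_map[r + i][c + j]
                       for i in range(size) for j in range(size)]
            if mode == "max":
                pooled_row.append(max(cluster))
            else:
                pooled_row.append(sum(cluster) / len(cluster))
        pooled.append(pooled_row)
    return pooled


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 3, 7, 8],
]
max_pooled = pool2d(fmap, mode="max")       # 4 x 4 input becomes 2 x 2
avg_pooled = pool2d(fmap, mode="average")
```

Each 2 x 2 cluster collapses to one value, so width and height are both halved.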
Fully connected
Fully connected layers connect every neuron in one layer to every neuron in
neural network (MLP). The flattened matrix goes through a fully connected layer
Receptive field
In neural networks, each neuron receives input from some number of locations in
the previous layer. In a fully connected layer, each neuron receives input from
every element of the previous layer. In a convolutional layer, neurons receive input
from only a restricted subarea of the previous layer. Typically the subarea is of a
square shape (e.g., size 5 by 5). The input area of a neuron is called its receptive
field. So, in a fully connected layer, the receptive field is the entire previous layer.
In a convolutional layer, the receptive field is smaller than the entire previous
layer.
Weights
function to the input values coming from the receptive field in the previous layer.
PROJECT MODELLING:-
network which memorizes the features of input image which covers its entire
region during scanning through vertical and horizontal sliding filters. It adds a bias
for every region followed by evaluation of scalar product of both filter values and
max(0, x), sigmoid or tanh, is applied to the output of this layer via the rectified
linear unit.
Spatial arrangement
Three hyperparameters control the size of the output volume of the convolutional
The depth of the output volume controls the number of neurons in a layer
Stride controls how depth columns around the spatial dimensions (width
and height) are allocated. When the stride is 1 then we move the filters one
the columns, and also to large output volumes. When the stride is 2 then the
Sometimes it is convenient to pad the input with zeros on the border of the
The spatial size of the output volume can be computed as a function of the
input volume size W, the kernel field size of the convolutional layer
neurons K, the stride with which they are applied S, and the amount of zero
padding P used on the border. The formula for calculating how many
neurons fit in a given volume is:

$$\frac{W - K + 2P}{S} + 1$$

If this number is not an integer, then the strides are incorrect and the
neurons cannot be tiled to fit across the input volume in a symmetric way.
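
The output-size rule (W − K + 2P)/S + 1, together with the integer check just described, can be written as a short helper. This is a sketch; the function name `conv_output_size` is an assumption, and the example sizes are illustrative only.

```python
def conv_output_size(w, k, p, s):
    """Spatial output size of a convolutional layer.

    w: input size, k: kernel size, p: zero padding per border, s: stride.
    Raises ValueError when the neurons cannot be tiled across the input.
    """
    numerator = w - k + 2 * p
    if numerator < 0 or numerator % s != 0:
        raise ValueError("stride does not tile the input evenly")
    return numerator // s + 1


size_a = conv_output_size(28, 5, 0, 1)   # 28-wide input, 5-wide kernel
size_b = conv_output_size(7, 3, 1, 2)    # padded 7-wide input, stride 2
```

With W = 10, K = 3, P = 0 and S = 2 the quotient is 3.5, so the helper raises an error instead of returning a size.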
that the input volume and output volume will have the same size spatially.
However, it's not always completely necessary to use all of the neurons of
the previous layer. For example, a neural network designer may decide to
Second comes the pooling layer, which is also called the max pooling layer or
subsampling layer. In the pooling layer (PL), the volume of data is reduced for
easier and faster network computation. Max pooling and average pooling are the main
tools for implementing pooling. This layer obtains the maximum value or average value
for each region of the input data by applying vertical and horizontal sliding filters
The pooling layer operates independently on every depth slice of the input and resizes
it spatially. The most common form is a pooling layer with filters of size 2×2 applied
with a stride of 2 downsamples at every depth slice in the input by 2 along both width
ReLU Layer
ReLU is the abbreviation of rectified linear unit, which applies the non-saturating
activation map by setting them to zero. It increases the nonlinear properties of the
decision function and of the overall network without affecting the receptive fields of
Finally, there is a fully connected layer after the convolution and pooling layers in the
standard neural network (separate neuron for each pixel), which is comprised of n
For example, there are ten neurons for ten classes (0–9) in the digit classification
problem. However, there should be 26 neurons for 26 classes (a–z) for English
and parameters. Therefore, training the network with a very small number of samples
is a very difficult task. In a convolutional neural network, only a small set of parameters
is needed for training the system. So, the convolutional neural network is a key
solution capable of correctly mapping datasets for both input and output with high
accuracy by varying the trainable parameters and the number of hidden layers. Hence, in
framework has been considered the best fit for character recognition from
handwritten pattern images. For the experiments and the verification of the
and this structure of network is also used under autoencoder structure for pattern
recognition.
Neural networks with multiple hidden layers can be useful for solving
classification problems with complex data, such as images. Each layer can learn
One way to effectively train a neural network with multiple layers is by training
one layer at a time. You can achieve this by training a special type of network also
This experiment focuses on how to train a neural network with two hidden layers
to classify digits in images. First, you train the hidden layers individually in an
unsupervised fashion using encoding and decoding under the hidden layers. Then
you train a final softmax layer, and join the layers together to form a stacked
DATASET
The MNIST dataset is an acronym that stands for the Modified National
and 9. The task is to classify a given image of a handwritten digit into one of
used and deeply understood dataset and, for the most part, is “solved.” Top-
achieve a classification accuracy of above 99%, with an error rate between 0.4
The labels for the images are stored in a 10-by-5000 matrix, where in every
column a single element will be one to indicate the class that the digit belongs
to, and all other elements in the column will be zero. It should be noted that if
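
The label layout described above, one column per image with a single one marking the class, can be built as in the following sketch. The function name `one_hot_columns` and the three sample labels are illustrative assumptions; the real matrix would have 5000 columns.

```python
def one_hot_columns(labels, num_classes=10):
    """Build a num_classes-by-len(labels) matrix stored as a list of rows.

    Column j holds a single 1 in the row given by labels[j]
    and zeros everywhere else.
    """
    matrix = [[0] * len(labels) for _ in range(num_classes)]
    for column, label in enumerate(labels):
        matrix[label][column] = 1
    return matrix


# Three example digits; row index equals the class index in this sketch.
targets = one_hot_columns([3, 0, 9])
```

How a class index maps to a digit (for example, which row stands for the digit 0) is a convention of the dataset files and is not fixed by this sketch.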
The stacked neural network is a simple three-layer neural network including an
encoding layer and a decoding layer for the system function where output units
are directly connected back to input units that shown in Figure 1. The proposed
sparse neural network was trained on the X ln raw inputs, X lm hidden layer
number of hidden neuron and l is number of sparse neural network. The output
layer maps the input vector I ln to the hidden layer H lm with a non-linear function
S:
(1)
where W li
input unit and hidden unit. bm are the biases in the hidden layer. S(v) is the sigmoid
function, defined as:

$$S(v) = \frac{1}{1 + e^{-v}} \qquad (2)$$
The output layer Y ln has the same number of units with the input layer and
define as:
(3)
where W j
denote the parameters (or weights) associated with the connection between hidden
unit and output unit. bn are the biases in the output layer. S is a sigmoid function
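
The encoding and decoding pass described above can be sketched as follows. This is a minimal illustration that assumes the common form hidden = S(W x + b) for the encoding layer and output = S(W' hidden + b') for the decoding layer, with S the sigmoid function. The function names, layer sizes and weight values are hypothetical and are not taken from the project code.

```python
import math


def sigmoid(v):
    """Sigmoid function S(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


def dense_sigmoid(weights, biases, inputs):
    """One layer: weighted sum of the inputs plus bias, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def autoencoder_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode the input into hidden features, then decode them again."""
    hidden = dense_sigmoid(w_enc, b_enc, x)       # assumed encoding form
    output = dense_sigmoid(w_dec, b_dec, hidden)  # assumed decoding form
    return hidden, output


# Illustrative sizes: 3 input units, 2 hidden units, 3 output units.
x = [0.0, 1.0, 0.5]
w_enc = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
b_enc = [0.0, 0.0]
w_dec = [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.1]]
b_dec = [0.0, 0.0, 0.0]
hidden, output = autoencoder_forward(x, w_enc, b_enc, w_dec, b_dec)
```

The output has the same number of units as the input, which is what allows the network to be trained to reproduce its input.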
STACKED NETWORK
classification. The first sparse network structure contains the input layer X ln
to learn primary features on the raw input, as illustrated in the figure below. The
first sparse structure produces the primary features (I). The primary features H 1m
are fed as the input layer to the second trained sparse network, which produces the
secondary features (II). The figures below show the primary features being used as
the raw input to the next sparse network to learn the secondary features. Then, the
FIG NO-8 OVERVIEW OF HIDDEN LAYERS SHOWING PRIMARY
AND SECONDARY FEATURE
SOFTMAX CLASSIFIER LAYER
OVERALL STRUCTURE
FIG NO-10 OVERVIEW OF PROPOSED NETWORK STRUCTURE
FIG NO-12 OVERVIEW OF FLOW DIAGRAM FOR CHARACTER
RECOGNITION
TRAINING
The training begins by a sparse neural network on the training data without
replicate its input at its output. Thus, the size of its input will be the same as
the size of its output. When the number of neurons in the hidden layer is less
than the size of the input, it learns a compressed representation of the input.
Neural networks have weights randomly initialized before training. Therefore,
the results from training are different each time. To avoid this behavior,
explicitly set the random number generator seed.
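
The effect of fixing the seed can be shown with a small sketch. It uses Python's standard `random` module rather than the MATLAB generator used in the project; the function name `init_weights` and the weight range are assumptions for illustration.

```python
import random


def init_weights(rows, cols, seed):
    """Random initial weights drawn from a generator with an explicit seed."""
    rng = random.Random(seed)  # private generator, so the seed is always explicit
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(rows)]


first = init_weights(2, 3, seed=1)
second = init_weights(2, 3, seed=1)   # same seed, same starting weights
other = init_weights(2, 3, seed=2)    # different seed, different weights
```

Because both runs start from identical weights, training on the same data gives the same result each time.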
After training the first sparse network, you train the second network in a similar
way. The main difference is that you use the features that were generated from the
first network as the training data in the second sparse network. Also, you decrease
the size of the hidden representation to 50, so that the encoder in the second
STRUCTURE
SOFTMAX LAYER
Train a softmax layer to classify the 50-dimensional feature vectors. Unlike the
sparse network, you train the softmax layer in a supervised fashion using labels for
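
What the softmax layer computes can be sketched as below: class scores are turned into probabilities, and the labelled class is used to measure the error during supervised training. The function names and the three example scores are illustrative assumptions; the weights that turn the 50 features into class scores are omitted.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_class):
    """Supervised loss: negative log-probability of the labelled class."""
    return -math.log(probabilities[true_class])


probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))           # class with the highest probability
loss = cross_entropy(probs, true_class=0)
```

Training adjusts the layer's weights so that this loss becomes small on the labelled examples.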
The full neural network is formed by combining all the network layers with the
softmax layer, which produces the final classification output.
With the full network formed, you can compute the results on the test set. To use
images with the stacked network, you have to reshape the test images into a
matrix. You can do this by stacking the columns of an image to form a vector, and
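
The column-stacking reshape can be sketched as follows. This is a minimal illustration on 2 x 2 images stored as lists of rows; the function names `image_to_column` and `images_to_matrix` are hypothetical.

```python
def image_to_column(image):
    """Stack the columns of a 2-D image (list of rows) into one vector."""
    rows, cols = len(image), len(image[0])
    return [image[r][c] for c in range(cols) for r in range(rows)]


def images_to_matrix(images):
    """Build a matrix with one column per image."""
    vectors = [image_to_column(image) for image in images]
    # Transpose so that each image vector becomes a column.
    return [list(row) for row in zip(*vectors)]


vector = image_to_column([[1, 2], [3, 4]])
matrix = images_to_matrix([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
```

For 28-by-28 images each column would hold 784 values.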
RESULTS :-
These are the outputs produced by our MATLAB code, visualized as a
confusion matrix:-
CONCLUSION
The aim of our project is to make an interface that can be used to recognize users'
handwritten characters. We approached our problem using convolutional neural
networks in order to get a higher accuracy. Using modern-day techniques like
neural networks to implement deep learning to solve basic tasks that any human does
in the blink of an eye, such as text recognition, is just scratching the
surface of the potential behind machine learning. There are infinite possibilities
work similar to a biometric device. Photo sensor technology was used to gather the
match points of physical attributes and then convert them into a database of known
types.
But with the help of modern-day techniques like convolutional neural networks, we
are able to scan and understand words with higher accuracy than these earlier
techniques allowed.
neural network are there like reading postal addresses, bank check amounts, and
Through this project I learned as an individual team member that
and information.
I should be enthusiastic.
When a group of individuals works together, compared to one person working alone,
they produce work more efficiently and are able to complete tasks faster because
many minds are focused on the same goals and objectives. Working in a
team enables us to learn from one another’s mistakes. One is able to avoid future
errors, gain insight from differing perspectives, and learn new concepts from more
experienced teammates.
REFERENCES :-
Anita Pal and Davashankar Singnh,”Handwritten English Character