
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
Mahaboob Basha.Sk¹, Vineetha.J², Sudheer.D³
¹ aruna@nriit.edu.in, Associate Professor, Dept. of IT, NRI Institute of Technology, A.P., India-521212.
² jonnakutivineetha0341@gmail.com, UG Scholar, NRI Institute of Technology, A.P.-521212.
³ baythapudisaikrishna@gmail.com, UG Scholar, NRI Institute of Technology, Andhra Pradesh.

Abstract - In this Handwritten Digit Recognition project, we will recognize handwritten digits, i.e., the digits 0 to 9. Handwritten digit recognition is the ability of computers to recognize human handwritten digits. It is one of the practically important problems in pattern recognition applications. The applications of digit recognition include postal mail sorting, bank check processing, form data entry, etc. The heart of the problem lies in developing an efficient algorithm that can recognize handwritten digits submitted by users by way of a scanner, tablet, or other digital devices.

Key Words: Digit recognition, Dataset, Convolutional Neural Networks, Machine learning, Deep learning, Support Vector Machine, K Nearest Neighbors, Euclidean distance, Visualization, Accuracy, marginal hyperplane

1. INTRODUCTION
Humans can see and visually sense the world around them by using their eyes and brains. Computer vision aims to enable computers to see and process images in the same way that human vision does. Several algorithms have been developed in the area of computer vision to recognize images. The goal of our work is to create a model that can identify the handwritten digit in an image with better accuracy. We aim to accomplish this by using the concepts of Convolutional Neural Networks and the MNIST dataset. Though the goal is to create a model that recognizes digits, it can be extended to letters and then to a person's handwriting. Through this work, we aim to learn and practically apply the concepts of Convolutional Neural Networks. Convolutional Neural Networks (CNNs) have recently become one of the most appealing approaches and have been a key factor in a variety of recent successes in challenging machine learning applications such as the ImageNet challenge, object detection, image segmentation, and face recognition. Therefore, we chose CNNs for our challenging task of image classification. We apply them to handwritten digit recognition, a problem of high academic and business interest. There are many real-life applications of handwritten digit recognition. For instance, it can be used in banks for
reading checks, post offices for sorting letters, and many other related works. The MNIST Dataset (Modified National Institute of Standards and Technology Dataset) is a handwritten digit dataset. It can be used for training various image processing systems, and it is also widely used for training and testing in the field of machine learning. It has 60,000 training and 10,000 testing examples. Each image has a fixed size of 28x28 pixels. It is a Dataset for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting. We will use this Dataset in our experiment. Convolutional neural networks are deep artificial neural networks. They can be used to classify images (e.g., name what they see), cluster them by similarity (photo search), and perform object recognition within scenes. They can be used to identify faces, individuals, street signs, tumors, platypuses, and many other aspects of visual data. The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or kernels) that have a small receptive field but extend through the full depth of the input volume. During the forward pass, each filter is convolved across the width and height of the input volume, computing the dot product and producing a 2-dimensional activation map of that filter. As a result, the network learns filters that activate when they detect some specific type of feature at some spatial position in the input. The activation maps are then fed into a down-sampling layer, which, like the convolution, is applied one patch at a time. A CNN also has a fully connected layer that classifies the output, with one label per node.

Handwritten digit recognition has an active community of academics studying it. Much important work on convolutional neural networks has been done for handwritten digit recognition [1,6,8,10]. There are many active areas of research, such as Online Recognition, Offline Recognition, Real-Time Handwriting Recognition, Signature Verification, Postal-Address Interpretation, Bank-Check Processing, and Writer Recognition.

Deep Learning has emerged as a central tool for self-perception problems such as understanding images, understanding the human voice, and robots exploring the world. We aim to implement the concept of a Convolutional Neural Network for digit recognition. Understanding CNNs and applying them to the handwritten digit recognition system is the target of the proposed model. A Convolutional Neural Network extracts feature maps from 2D images. Convolutional Neural Networks (CNNs) are a well-known class of deep learning algorithms used to process images. A CNN assigns weights and biases to various parts of the image and is very capable of differentiating one image from another image of the same kind. Good accuracy has been achieved for handwritten digit recognition by using Convolutional Neural Networks. The CNN architecture takes the mammalian visual system into consideration; it is inspired by the 1962 studies of the visual cortex by D. H. Hubel and T. N. Wiesel. The model is trained with gradient descent, using backpropagation to compute the gradients. Character images of handwritten digits are used as input. An Artificial Neural Network (ANN) consists of one input layer, one output layer, and some layers in between the input layer and the output layer; these middle layers are hidden layers. CNNs and ANNs are very similar to each other; the CNN is a deep learning algorithm designed for the analysis of visual images. CNNs can be used in applications such as object detection, face identification, robotics, video processing, segmentation, pattern recognition, natural language processing, spam detection, categorization, speech identification, and digital image classification. Recognizing handwritten digits is a hard task for machines because handwritten digits are not perfect and can be written in many different handwriting styles. Handwritten digit recognition addresses this problem by taking the image of a digit and recognizing the digit present in the image.

2. Literature survey
A few state-of-the-art approaches that use handwritten digit recognition for digit identification are summarized here.
2.1 Handwritten Digit Recognition for Banking System
The aim of a handwritten digit recognition system is to convert handwritten digits into machine-readable formats. The main objective of this work is to ensure effective and reliable approaches for the recognition of handwritten digits and to make banking operations easier and error free. A Handwritten Digit Recognition (HDR) system is meant for receiving and interpreting handwritten input in the form of pictures or paper documents. Traditional systems of handwriting recognition have relied on handcrafted features and a large amount of prior
knowledge. Training an Optical Character Recognition (OCR) system based on these prerequisites is a challenging task. Convolutional neural networks (CNNs) are very effective in perceiving the structure of handwritten characters/words in ways that help in the automatic extraction of distinct features, which makes the CNN the most suitable approach for solving handwriting recognition problems. Our aim in the proposed work is to recognize written characters on cash deposit, withdrawal, and other transaction slips: we propose to develop an automatic banking deposit number recognition system that can recognize the handwritten account number and amount on the cash deposit slip and thus automate the cash deposit process at a bank counter.
2.2 Survey on Handwritten Digit Recognition using Machine Learning
Machine learning and deep learning play an important role in computer technology and artificial intelligence. With the use of deep learning and machine learning, human effort can be reduced in recognizing, learning, predicting, and many more areas. This paper presents the recognition of handwritten digits (0 to 9) from the famous MNIST dataset, comparing classifiers such as KNN, PSVM, NN, and convolutional neural networks on the basis of performance, accuracy, time, sensitivity, positive predictivity, and specificity, using different parameters with the classifiers.
2.3 Recognition of Handwritten Digits using Machine Learning Techniques
This paper illustrates the application of optical character recognition (OCR) using template matching and machine learning techniques to solve the problem of handwritten character recognition. In this paper, the recognition task is performed using Template Matching, Support Vector Machine (SVM), and Feed Forward Neural Network. Template matching is an image processing technique that breaks the image into smaller parts and then matches them to a template image. A Multi-Class SVM classifier and a Neural Network are used to classify the image. The dataset is used to train the classifier, followed by feature extraction, and finally the classifiers are applied to recognize the digits.
2.4 EXISTING SYSTEM
In this section, we discuss a few algorithms that have been used so far for handwritten digit recognition:
a. KNN (K Nearest Neighbors)
b. SVM (Support Vector Machine)
2.4.1 KNN (K Nearest Neighbors)
KNN is a non-parametric method used for classification as well as regression problems. It is a lazy (late) learning algorithm, in which all computation is deferred until the last stage, classification. It is also an instance-based learning algorithm, in which the approximation takes place locally. It is one of the simplest algorithms to implement: there is no explicit training phase beforehand, and the algorithm does not perform any generalization of the training data. KNN predicts a categorical value by a majority vote of the K nearest neighbors. The value of K can differ, and changing K can change the outcome of the vote. Take our handwriting training dataset as an example: each digit has been prepared with a variety of samples. For example, for the digit 0, we have 188 samples, each representing a variation of the handwriting of 0. Each digit sample is saved as a text file. We also need to label each sample with its class. In our case, the labels are the digits 0 to 9, assigned to each digit sample in our handwriting training set. When we are given a new digit sample text file, we ask our KNN algorithm to identify the digit in it and label it as one of the classes 0 to 9. The idea of k-NN is to take the new sample and convert it to a feature vector. In our case, the digit is a 32x32 image formatted as a 0/1 text file. To convert it into a feature vector, we load the image file into a vector like [0, [1,0,0,1]], where the length of the inner vector is 1024 (32x32). We then measure the distance from this new sample vector to every sample vector in the training set. This is computationally intensive, so k-NN is not very efficient for large training sets. The Euclidean distance measurement is an extension of the Pythagorean theorem; since knowing its mathematics is not essential for understanding KNN, we assume the reader knows how it works. We then take the most similar pieces of sample data, in the sense of distance (the nearest neighbors), and look at their labels (0 to 9). We look at the top k most similar pieces of sample data from our known dataset; this is where the k comes from (k is an integer, usually less than 20). Lastly, we take a majority vote among the k most similar pieces of data, and the majority class (0 to 9) is the class we assign to the new data we were asked to classify.
Implementation of KNN
1. Compute the Euclidean distance between the test data point and all the training data.
2. Sort the calculated distances in ascending order.
3. Get the k nearest neighbors by taking the top k rows from the sorted array.
4. Find the majority class of these rows.
5. Return the predicted class.
Finally, compute the accuracy score to check whether the predictions are correct.
1) Calculation of Euclidean distance
Euclidean distance is the square root of the sum of the squared differences between the coordinates of two points.
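Written out for two points p and q with n features each (n = 1024 for the 32x32 digit vectors described above), the formula referred to in Fig 2.1 is the standard Euclidean distance; a small worked example with n = 2 follows it:

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

% Worked example with n = 2, p = (1, 2), q = (4, 6):
d\big((1, 2), (4, 6)\big) = \sqrt{(1 - 4)^2 + (2 - 6)^2} = \sqrt{9 + 16} = 5
```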
Fig 2.1 Euclidean Formula
Two points are passed to the function (here row1 and row2). The euclidean_distance function calculates the squared difference between corresponding coordinates of the two points, sums these squares, and finally returns the square root of the sum.
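A minimal sketch of such a function in plain Python, assuming each row is a sequence of numeric features of equal length. The name euclidean_distance follows the text; the body, the type hints, and the length check are our own illustration rather than the authors' code.

```python
import math
from typing import Sequence


def euclidean_distance(row1: Sequence[float], row2: Sequence[float]) -> float:
    """Return the Euclidean distance between two equal-length feature vectors."""
    if len(row1) != len(row2):
        raise ValueError("rows must have the same number of features")
    # Sum the squared differences of corresponding coordinates.
    squared_sum = sum((a - b) ** 2 for a, b in zip(row1, row2))
    # The distance is the square root of that sum.
    return math.sqrt(squared_sum)


print(euclidean_distance([1, 2], [4, 6]))  # 5.0
```

For two flattened 32x32 digit samples, the same function simply runs over 1024 coordinates.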
2) Get the k nearest neighbors after sorting the distances
To find the neighbors, we first sort the distances in ascending order. np.argsort() returns the indices that would sort the distance array, so its first entry is the index of the minimum distance. We then arrange the training data according to the sorted indices and slice it according to the number of neighbors, k.
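The sorting-and-slicing step can be sketched with NumPy as follows. This assumes the training features are stored as a 2D array with one row per sample and the labels as a parallel 1D array; the function name get_neighbors and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def get_neighbors(train_X: np.ndarray, train_y: np.ndarray, test_row: np.ndarray, k: int):
    """Return the features and labels of the k training rows closest to test_row."""
    # Euclidean distance from the test row to every training row (one value per row).
    distances = np.sqrt(((train_X - test_row) ** 2).sum(axis=1))
    # np.argsort gives the indices that sort the distances in ascending order.
    order = np.argsort(distances)
    # Arrange the data by the sorted indices and keep only the first k rows.
    nearest = order[:k]
    return train_X[nearest], train_y[nearest]


train_X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
train_y = np.array([0, 0, 1, 1])
_, labels = get_neighbors(train_X, train_y, np.array([0.2, 0.1]), k=2)
print(labels)  # [0 0]
```

Computing all distances at once with array operations avoids a Python loop over the training rows, but the cost noted earlier remains: one distance is still computed per training sample.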

Fig 2.2 K-Nearest Neighbors Classification

3) Predicting the class of the new data point
The test data point is assigned to the class that receives the majority of the votes. To find it, we use the max() function: its key argument maps each class present among the neighbors to a count of how many neighbors belong to that class. Finally, max() returns the class with the majority of votes, which is the predicted class of the test data.
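The majority vote described above can be sketched in a few lines of plain Python. The function name predict_classification is hypothetical; the max(set(...), key=...count) idiom is one common way to realize the "key groups the neighbors by class, count counts them" description in the text.

```python
from typing import Hashable, Sequence


def predict_classification(neighbor_labels: Sequence[Hashable]) -> Hashable:
    """Return the label that occurs most often among the k nearest neighbors."""
    labels = list(neighbor_labels)
    # max() over the distinct classes, keyed by how many neighbors carry each class.
    return max(set(labels), key=labels.count)


print(predict_classification([7, 1, 7, 7, 1]))  # 7
```

If two classes receive the same number of votes, max() returns whichever tied class it meets first while iterating over the set, so ties are broken arbitrarily; choosing k to reduce ties, or falling back to the single nearest neighbor, are common remedies.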
4) Accuracy calculation
Accuracy measures how close the predicted values are to the true values. It is calculated by dividing the number of correctly classified samples by the total number of samples.

The higher the accuracy, the better the model's predictions.
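As a sketch of this calculation, assuming the true and predicted labels are available as two parallel sequences (the name accuracy_metric is hypothetical):

```python
from typing import Hashable, Sequence


def accuracy_metric(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    # Count the correctly classified samples.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    # Divide by the total number of samples.
    return correct / len(actual)


print(accuracy_metric([0, 1, 2, 3], [0, 1, 2, 8]))  # 0.75
```

Multiplying the result by 100 gives the accuracy as a percentage, which is the form used for the 90% target in the non-functional requirements.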


3. SYSTEM REQUIREMENT ANALYSIS
3.1 Hardware and Software Requirements
Table 3.1 Hardware and Software Requirements
3.2 Functional Requirements
The system should process the input given by the user only if it is an image file (JPG, PNG, etc.). The system shall show an error message to the user when the input is not in the required format. The system should detect characters present in the image. The system should retrieve characters present in the image and display them to the user.
3.3 Non-Functional Requirements
• Performance: Handwritten digits in the input image will be recognized with an accuracy of about 90% and above.
• Functionality: This software will deliver on the functional requirements mentioned in this document.
• Availability: This system will retrieve the handwritten text regions only if the image contains written text in it.
• Flexibility: It allows users to load the image easily.
• Ability: The software is very easy to use and reduces the learning effort.
3.4 Feasibility Study
3.4.1 Technical Feasibility
3.4.1.1 Pandas
Python Pandas is an open-source library that provides high-performance data manipulation in Python. Pandas is built on top of the NumPy package, which means NumPy is required for operating Pandas. Before Pandas, Python was capable of data preparation, but it provided only limited support for data analysis. Pandas came into the picture and enhanced the capabilities of data analysis. It can perform five significant steps required for processing and analysis of data, irrespective of the origin of the data: load, manipulate, prepare, model, and analyze.
3.4.1.2 Sklearn
Scikit-learn (Sklearn) is one of the most useful and robust libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, via a consistent interface in Python. This library, which is largely written in Python, is built upon NumPy, SciPy, and Matplotlib.
3.4.1.3 CNN
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. The pre-processing required by a ConvNet is much lower than for other classification algorithms. While filters are hand-engineered in primitive methods, ConvNets can learn these filters/characteristics with enough training.
3.4.1.4 Matplotlib
Matplotlib is a popular visualization library in Python for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. One of the greatest benefits of visualization is that it gives us visual access to huge amounts of data in easily digestible form. Matplotlib provides several plot types, such as line, bar, scatter, and histogram plots.
3.4.1.5 Tkinter
Tkinter is one of the most commonly used libraries for developing GUIs (Graphical User Interfaces) in Python. It is the standard Python interface to the Tk GUI toolkit shipped with Python. Since Tk and Tkinter are available on most Unix platforms as well as on Windows, Tkinter is one of the fastest and easiest ways to develop GUI applications.
3.4.1.6 Keras
Keras is an open-source high-level Neural Network library written in Python that is capable of running on Theano, TensorFlow, or CNTK. It was developed by Google engineer Francois Chollet. It is designed to be user-friendly, extensible, and modular, facilitating faster experimentation with deep neural networks. It supports Convolutional Networks and Recurrent Networks individually as well as in combination. It does not handle low-level computations itself, so it uses a backend library for them.
3.4.2 Economic Feasibility:
• Datasets used for analysis are from online platforms, which are free of cost.

4. PROPOSED SYSTEM
In this project, we will use the TensorFlow library to build, train, and test our models, and the data we will use is the MNIST Dataset. The algorithm used will be a Convolutional Neural Network (CNN).
4.1 TensorFlow
TensorFlow is a free and open-source library for data flow and differentiable programming across a range of tasks. It is a symbolic math library and is also used in machine learning applications such as neural networks. It is used for both research and production at Google. TensorFlow was developed by the Google Brain team for internal Google use. It was released under the Apache License 2.0 on November 9, 2015. TensorFlow offers multiple levels of abstraction, so we can choose the right one for our needs. We can build and train models by using the high-level Keras API, which makes getting started with TensorFlow and machine learning easy. If we need more flexibility, eager execution allows for immediate iteration and intuitive debugging. For large ML training tasks, we can use the Distribution Strategy API for distributed training on different hardware configurations without changing the model definition.
4.2 MNIST Dataset
The MNIST Dataset (Modified National Institute of Standards and Technology Dataset) is a large Dataset of handwritten digits that is commonly used for training various image processing systems. The Dataset is also widely used for training and testing in the field of machine learning. The MNIST Dataset contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset. Several scientific papers have reported attempts to achieve the lowest error rate on it.
Fig 4.1 Sample images from MNIST test dataset
4.3 The Algorithm: CNN
To recognize the handwritten digits, a seven-layered convolutional neural network with one input layer followed by five hidden layers and one output layer is designed. A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. The pre-processing required by a ConvNet is much lower than for other classification algorithms. While filters are hand-engineered in primitive methods, ConvNets can learn these filters/characteristics with enough training. The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field. A collection of such fields overlaps to cover the entire visual area. An image is nothing but a matrix of pixel values, so why not just flatten the image (e.g., a 3x3 image matrix into a 9x1 vector) and feed it to a Multi-Layer Perceptron for classification? In cases of extremely basic binary images, this method might show an average precision score when predicting classes, but it would have little to no accuracy on complex images having pixel dependencies throughout. A ConvNet can successfully capture the spatial and temporal dependencies in an image through the application of relevant filters. The architecture fits the image dataset better owing to the reduction in the number of parameters involved and the reusability of weights. In other words, the network can be trained to understand the sophistication of the image better.
Fig 4.4 4x4x3 RGB Image
In the figure, we have an RGB image that has been separated into its three color planes: Red, Green, and Blue. There are several such color spaces in which images exist: Grayscale, RGB, HSV, CMYK, etc.
Convolution Layer: The Kernel
Fig 4.5 Convolving a 5x5x1 image with a 3x3x1 kernel to get a 3x3x1 convolved feature
Image Dimensions = 5 (Height) x 5 (Breadth) x 1 (Number of channels, e.g. RGB)
In the above demonstration, the green section represents our 5x5x1 input image, I. The element involved in carrying out the convolution operation in the first part of a
Convolutional Layer is called the Kernel/Filter, K, represented in the color yellow. We have selected K as a 3x3x1 matrix.
5. SOFTWARE DESIGN
5.1 Working Architectural Design
Fig 5.1 Working Architectural Design
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. The pre-processing required by a ConvNet is much lower than for other classification algorithms. While filters are hand-engineered in primitive methods, ConvNets can learn these filters/characteristics with enough training. The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field. A collection of such fields overlaps to cover the entire visual area. An image is nothing but a matrix of pixel values, so why not just flatten the image (e.g., a 3x3 image matrix into a 9x1 vector) and feed it to a Multi-Layer Perceptron for classification? In cases of extremely basic binary images, this method might show an average precision score when predicting classes, but it would have little to no accuracy on complex images having pixel dependencies throughout.
5.2 Block Diagram
Fig 5.2 Block Diagram of the system
Pre-Processing: The pre-processing step performs various tasks on the input image. It enhances the image, making it suitable for segmentation. The fundamental motivation behind pre-processing is to extract the pattern of interest from the background. For the most part, noise filtering, smoothing, and standardization are done in this stage. Segmentation: Once the pre-processing of the input images is completed, sub-images of individual digits are formed from the sequence of images. Pre-processed digit images are segmented into sub-images of individual digits, and each digit is assigned a number. Each digit is resized to a fixed size in pixels. In this step, an edge detection technique is used for the segmentation of dataset images. Feature Extraction: After the completion of the pre-processing and segmentation stages, the pre-processed images are represented as matrices of pixels, which are very large. It is therefore valuable to represent the digits in the images with features that contain only the necessary information. This activity is called feature extraction. In the feature extraction stage, redundancy is removed from the data. Classification and Recognition: In the classification and recognition step, the extracted feature vectors are taken as individual input to each of the following classifiers.
5.3 FLOW CHART
Fig 5.3 Flow chart representation
5.4 UML Diagrams
A UML diagram is a diagram based on the UML (Unified Modeling Language) used to visually represent a system along with its main actors, roles, actions, artifacts, or classes, in order to better understand, alter, maintain, or document information about the system.
5.4.1 Class Diagram
Class UML diagram is the most common diagram type for software documentation. Since most
software being created nowadays is still based on
the Object-Oriented Programming paradigm, using
class diagrams to document the software turns out
to be a common-sense solution. This happens
because OOP is based on classes and the relations
between them.
