
Bangla Handwritten Characters Recognition

Using Convolutional Neural Network


This project is submitted to the Department of Computer Science and Engineering,
Hajee Mohammad Danesh Science and Technology University in partial fulfillment
Of the requirements for the degree of
B.Sc. (Engineering) in Computer Science and Engineering

By
Most. Masura Parvin
Student ID: 1702014
Level: 4 Semester: II
B.Sc. (Engineering) in CSE
Session: 2017

Mst. Salma Akter Rani
Student ID: 1702049
Level: 4 Semester: II
B.Sc. (Engineering) in CSE
Session: 2017

Afrin Naher
Student ID: 1702063
Level: 4 Semester: II
B.Sc. (Engineering) in CSE
Session: 2017

Department of Computer Science and Engineering


Faculty of Computer Science and Engineering
Hajee Mohammad Danesh Science and Technology University
Dinajpur-5200, Bangladesh.
October 2022

CERTIFICATE

This is to certify that the work entitled “Bangla Handwritten Characters Recognition
Using Convolutional Neural Network” by Most. Masura Parvin, Mst. Salma Akter Rani and
Afrin Naher has been carried out under our supervision. To the best of our knowledge, this work
is original and has not been submitted anywhere for a diploma or a degree.

Co-supervisor
.......................
(Md Rashedul Islam)
Assistant Professor
Department of Computer Science and Engineering
Hajee Mohammad Danesh Science and Technology University, Dinajpur-5200,
Bangladesh.

Supervisor
........................
(Md Abu Marjan)
Lecturer
Department of Computer Science and Engineering
Hajee Mohammad Danesh Science and Technology University,
Dinajpur-5200, Bangladesh.

DECLARATION

The work entitled “Bangla Handwritten Characters Recognition Using Convolutional Neural
Network” has been carried out in the Department of Computer Science and Engineering,
Hajee Mohammad Danesh Science and Technology University. It is original and conforms to
the regulations of this University.

We understand the University’s policy on plagiarism and declare that no part of this thesis has
been copied from other sources or been previously submitted elsewhere for the award of any
degree or diploma.

…………………………… ...................................... ..........................................


Name: Most. Masura Parvin Name: Afrin Naher Name: Salma Akter Rani
Student ID: 1702014 Student ID: 1702063 Student ID: 1702049
Session: 2017 Session: 2017 Session: 2017
Acknowledgements
First, we express our gratitude to the Almighty Allah for His endless blessing and kindness
in keeping us mentally and physically fit to complete our first thesis work for our B.Sc.
(Engineering) in CSE degree. Alhamdulillah. We thank the Department of Computer
Science and Engineering, Hajee Mohammad Danesh Science and Technology University, for
giving us the opportunity to pursue our B.Sc. (Engineering) in CSE degree. We then express our
indebtedness to our honorable supervisor Md. Abu Marjan sir, lecturer of the CSE department, for
his helpful contribution, necessary guidance, suggestions and encouragement. He
inspired us to delve into the work of several researchers on this topic so that we could
understand the efforts made so far. We also thank our co-supervisor
Md. Rashedul Islam sir.
Finally, we thank our parents, family members, relatives and friends for their
continuous support, and all the teachers of the CSE department of HSTU who helped us by providing
guidelines for this work.
Contents
List of Figures
List of Tables
Abstract
Chapter 1
Introduction
1.1 Introduction to Handwritten Character Recognition (HCR)
1.2 Various methods for HCR
1.3 Motivation
1.4 Problem definition
1.5 Objectives
1.6 Challenges
1.7 Contribution
1.8 Thesis Organization
Chapter 2
Related Work
2.1 Overview
2.2 Related Work
2.3 Conclusion
Chapter 3
Thesis Related Study
3.1 General Terminologies
3.1.1 Artificial Intelligence (AI)
3.1.2 Machine Learning (ML)
3.1.3 Gray level image
3.1.4 Color (RGB) Image
3.1.5 Image Conversion
3.1.6 Optimizer and Learning rate
3.1.7 Data Augmentation
3.2 Neural Network
3.2.1 Deep Learning vs Neural Network
3.2.2 Working Process of Neural Networks
3.2.3 Types of Neural Network
3.2.4 Activation Function
3.3 Convolutional Neural Network (CNN)
3.3.1 Introduction
3.3.2 Importance of CNN
3.3.3 Few Definitions
Chapter 4
Methodology
4.1 Dataset Description
4.2 Preparation of dataset
4.3 Preprocessing
4.4 RGB to Gray
4.5 Resizing and Rescaling
4.6 Train, Test, and Validation split
4.7 Proposed Methodology
4.8 Overview of the proposed model’s architecture
4.9 Block diagram of the proposed model’s architecture
Chapter 5
Result Evaluation
5.1 Environmental Setup
5.2 Training the model
5.3 Model performance
5.4 Output Summary of our Proposed Model
5.5 Performance Metrics
5.5.1 Accuracy
5.5.2 Confusion Matrix
5.5.3 Precision
5.5.4 Recall
5.5.5 F1 Score
5.6 Discussion
Chapter 6
Conclusion
References
List of Figures
Figure 3.1: Relationship between AI, ML and DL
Figure 3.2: Neural Network
Figure 3.3: Binary step function
Figure 3.4: Linear Activation function
Figure 3.5: Logistic Regression
Figure 3.6: Tanh
Figure 3.7: ReLU Function
Figure 3.8: Softmax function
Figure 3.9: Down Sampling
Figure 3.10: Convolution Operation
Figure 3.11: Visualization of Convolution
Figure 3.12: Convolution with stride 1
Figure 3.13: Stride 1 with Padding 1
Figure 3.14: After Applying Padding
Figure 3.15: Different Layers of CNN
Figure 3.16: Pooling Layers
Figure 4.1: Sample images of used datasets
Figure 4.2: Block diagram of proposed methodology
Figure 4.3: Architecture of our proposed model
Figure 5.1: Accuracy graph of proposed model
Figure 5.2: Loss graph of proposed model
Figure 5.3: Confusion matrix of proposed model
List of Tables
Table 5.1: Proposed model’s performance summary
Table 5.2: Proposed architecture summary
ABSTRACT

Handwriting recognition is one of the most interesting problems at present because of its
varied applications, and it helps to digitize old forms and records reliably. Working with
handwritten scripts is a big challenge because every person has a unique writing style, and
characters vary in shape and size. This work therefore proposes a model that recognizes the
50 basic Bangla handwritten characters (39 consonants and 11 vowels). The proposed model
was trained and validated on the Ekush dataset and tested on the BanglaLekha-Isolated
dataset. We tuned several parameters to obtain the highest accuracy, trying different
optimizers and learning rates. Of the two optimizers we used, SGD (Stochastic Gradient
Descent) and Adam, Adam performed better, with a learning rate of 0.001. After 50 epochs of
training, the proposed method achieved a training accuracy of 99.38% and a validation
accuracy of 95.19% on the Ekush dataset, and a testing accuracy of 94.47% on the
BanglaLekha-Isolated dataset.

Keywords:

HCR, Machine Learning, Deep Learning, CNN, Computer Vision, Pattern Recognition.

Chapter 1

Introduction
1.1 Introduction to Handwritten Character Recognition (HCR)

HCR is a mechanism that translates different types of documents into analyzable,
editable and searchable data. The ultimate aim of HCR is to emulate human reading capabilities
so that a machine can read, edit and interact with text as a human does, in a short time.
HCR has drawn the attention of numerous researchers for over half a century,
and many great achievements have been made in this field. With recent technical innovations,
handwriting recognition carried out by vision sensors that capture the position and
movement of fingers writing in 3D space has attracted remarkable interest. Although
significant progress has been made on HCR performance in past years, HCR remains a
challenging task because of the great diversity of handwriting styles, the existence of many
similar characters and the large number of character categories.

1.2 Various methods for HCR


In recent years, the convolutional neural network (CNN) has become popular for complex visual
recognition because of its architecture, and a lot of research uses deep CNNs [18] to recognize
handwritten digits, characters, etc. We explore the power of deep CNNs for the
classification of handwritten Bangla characters. Handwriting recognition is important for its
various applications, such as Optical Character Recognition (OCR). Several studies
have been carried out on character recognition for Greek [1], Chinese [2] and Arabic [19]
handwritten digits and alphabets. Automatic recognition of handwriting nevertheless remains a
great challenge: the performance of layout analysis, word/character segmentation, and
recognition is still far behind human recognition capability. Handwritten character
or digit recognition is always a big challenge because of variation in shape and size, and
Bangla text adds complexity of its own, such as many easily confused characters. In recent
years, a few works on Bangla handwritten digit recognition using CNNs have had some success,
and this type of recognition attracts researchers because of its educational and economic
applications. Bangla script differs from the scripts of English and other languages
because it descends from the Sanskrit script, which is completely different. It has its own
alignment, and some characters resemble one another, differing only by a
small dot or line. There are 50 basic characters (11 vowels and 39 consonants), 10
modifiers, 10 numerals and more than 300 compound characters. All these facts make it
difficult to achieve good results and performance in Bangla handwritten
character recognition. If we overcome these challenges and build a model, we can improve
many handwriting-recognition-based applications, such as Bangla handwritten-character-based
OCR (Optical Character Recognition), picture-to-text-to-speech, Bangla ID card
reading, number plate reading, vehicle tracking and post office automation.

1.3 Motivation
Bangla is the mother language of Bangladesh. It is the official language of
Bangladesh and is also used officially in the Indian states of West Bengal, Tripura, Assam
and Jharkhand, and it is reported to have honorary status in Sierra Leone, a West African
country. About 250 million people speak Bangla, and it is often described as one of the most
beautiful languages in the world. Considering these circumstances and the use of
technology in different sectors in these regions, Bangla handwriting
recognition plays an important role, and its challenges should be overcome. However, compared
with scripts such as Latin, Chinese and Japanese, for which robust models have achieved great
success in machine learning and deep learning applications, only a few studies address
handwritten characters of the Bangla script.

Besides, Bengali is the fifth most-spoken native language and the seventh most-spoken
language by total number of speakers in the world. Still, efficient handwritten Bangla
character recognition systems are lacking. Many institutions hold thousands of old documents
and handwritten notes that are still not in computerized format.

1.4 Problem definition


We have two Bangla handwritten character datasets: BanglaLekha-Isolated and
Ekush. We have to design a CNN model and train it on the Ekush dataset in
such a way that it can identify handwritten Bangla characters efficiently.

1.5 Objectives
The objectives of this work are:

• To provide an automated system that understands handwritten data efficiently.
• To save time and money by replacing the manual process of entering data with an automatic
process.

1.6 Challenges
• Handwriting recognition tends to have problems with accuracy. People can
struggle to read others' handwriting, so the task is even harder for a computer. There is
a wide range of handwriting, good and bad, which makes it difficult to
provide enough examples of how every character might look. Besides,
some characters look very similar, making them hard for a computer to recognize
accurately.
In handwriting recognition from photos, there are also awkward angles to
consider. The angle at which the photo is taken could obscure the character, making it
harder for the computer to identify. So, recognizing Bangla handwritten characters
accurately and efficiently is a challenge.
• Working with two different datasets efficiently.

1.7 Contribution
Our contribution will be:
1. Design a new architectural model for Bangla HCR using CNN.
2. Proposed model will be trained and validated with one dataset and tested with another
dataset.

1.8 Thesis Organization


This study is organized into 6 chapters, including the introduction chapter.

Chapter 1 includes the introduction to Handwritten Character Recognition (HCR),
various methods for HCR, motivation, problem definition, objectives, challenges, and contribution.

Chapter 2 includes a review of the related works on Handwritten Character Recognition
(HCR) with their findings and limitations.

Chapter 3 includes the thesis related study or the background study.

Chapter 4 includes descriptive details about the proposed methodology.

Chapter 5 includes the experimental results and descriptions based on our proposed
methodology.

Chapter 6 includes the conclusion.

Chapter 2

Related Work
2.1 Overview

Recognition of handwritten characters has gained significant popularity in the field of pattern
recognition and machine learning because of its use in various fields. Various techniques
have been proposed for character recognition in handwriting recognition systems, including
OCR technology and machine learning and deep learning algorithms such as SVM, MLP, KNN
and CNN.

Many studies and papers describe techniques for converting textual content from a
paper document into machine-readable form. Character recognition systems may serve as a key
factor in creating a paperless environment by digitizing and processing existing paper
documents in the coming days. The following sections review research done by individuals
and groups on HCR.

2.2 Related Work

Past studies report many successful works on handwritten character recognition in different
languages, such as Latin [1], Chinese [2] and Japanese [3]. A few works
are available for Bangla handwritten basic character, digit and compound character
recognition. Some literature on Bangla character recognition has appeared in past years,
such as “A complete printed Bangla OCR system” [4] and “On the development of an optical
character recognition (OCR) system for printed Bangla script” [5]. There is also some
research on handwritten Bangla numeral recognition that reaches the desired recognition
accuracy. Pal et al. have conducted exploratory work on recognizing handwritten Bangla
characters, namely “Automatic recognition of unconstrained offline Bangla handwritten
numerals” [6], “A system towards Indian postal automation” [7] and “Touching numeral
segmentation using water reservoir concept” [8]. The proposed schemes are mainly based on
features extracted using a concept called the water reservoir. Apart from these, several other
works on Bangla handwritten character recognition have achieved fairly good success.

Halima Begum et al., in “Recognition of Handwritten Bangla Characters using Gabor Filter and
Artificial Neural Network” [9], worked with their own dataset collected from 95 volunteers;
their proposed model achieved recognition rates of around 68.9% without feature extraction
and 79.4% with feature extraction. “Recognition of Handwritten Bangla
Basic Character and Digit Using Convex Hull Basic Feature” [10] achieved accuracies of 76.86%
for Bangla characters and 99.45% for Bangla numerals. “Bangla Handwritten Character Recognition
using Convolutional Neural Network” achieved 85.36% test accuracy using the authors' own dataset.
“Handwritten Bangla Basic and Compound character recognition using MLP and SVM
classifier” [11] proposed recognizing handwritten Bangla basic and compound characters with
MLP and SVM classifiers, which achieved recognition rates of around 79.73% and 80.9%,
respectively.

Research contributions relating to OCR of handwritten Bangla script may be categorized into
two major approaches: first, an MLP based single step approach, as proposed by Bhowmik
et al. [12], Roy et al. [13] and Basu et al. [14], and second, a multistage approach, as proposed
by Rehman et al. [15] and Bhattacharya et al. [16–17].

Most of the aforesaid approaches use MLP based classifiers to classify 50 Basic characters of
Bangla script. In the work of Bhowmik et al. [12], the feature set is constructed from the stroke
features of characters. The dataset used for testing recognition performances of 49 different
classes included characters collected from only 20 different writers. In the work of Roy et al.
[13], the authors have used a quadratic discriminant function. In this work, pattern classes are
grouped together intuitively on the basis of observable similarity, to form 35 pattern groups.
For forming the feature vector for this work, each character image is divided into 4×4 = 16 and
7×7 = 49 sub-images and 4-directional chain code techniques are used for computing the
directional frequencies of the contour pixels in each sub-image.

In the work of Basu et al. [14], 24 modified shadow features, 8 pairs of octant
centroid features and 36 longest-run features, computed on 9 overlapped sub-images, are used
for each character image to classify it into one of the 50 character classes using an MLP
based classifier. The work described in [15] involves a two stage hierarchical approach for
OCR of handwritten Bangla alphabetic characters, in which multiple experts are employed in
the second stage, i.e., after coarse classification, for final classification of a pattern of
an unknown class. The major features used for recognition of Basic Bangla characters by this
approach include Matra, upper part of the character, disjoint section of the character,
vertical line and double vertical line. Classification decisions in the second stage are
mainly based on the consensus among multiple classifiers but, to reach the consensus, sample
confidences of the experts are considered instead of the majority voting method. Sample
confidences are certain probabilistic measures defined for determining class membership of
sample patterns by the experts. Failing
to reach a consensus, certain other probabilistic measures, formed from the past performances
of the participating experts, are further considered. A sample pattern is rejected if all the
prescribed confidence measures fail to meet the passing criteria. This is, in a nutshell, how
the classification decisions of multiple experts are finally combined in the work described in
[15].

In the work of Bhattacharya et al. [16], a two-stage approach is adopted to classify 50
handwritten Basic characters and 10 numeric digits of Bangla script. In this approach also, a
group based coarse classification of an unknown pattern in the first stage is followed by a
finer classification in the second stage. Based
on the similarity of shapes, 57 pattern classes are identified for final classification. These
pattern classes are clustered into 11 groups for coarse classification.

An MLP based classifier is employed in the first stage to decide about the group of an unknown
pattern. In the second stage, the pattern is subjected to another MLP based classifier, specific
to its group, for final classification. In another work, Bhattacharya et al. [17] have proposed a
similar two stage approach for recognition of 50 Basic characters of handwritten Bangla script.
64 chain code-frequency features, as used in [12] and [16], are also used here for classification
through MLP based classifiers.

2.3 Conclusion

According to the review, analysis and comparison of previous work, implementing and
upgrading technology for HCR is always challenging and requires
careful consideration and planning. This review suggests that HCR provides an opportunity for
solving some traditional problems but also introduces new concerns. It also discusses
some typical features and technological solutions of HCR and provides an overview of the
weaknesses and strengths of this technology. Finally, this technology still leaves room
for improvement in efficiency and usability for future uses.

Chapter 3

Thesis Related Study


3.1 General Terminologies
3.1.1 Artificial Intelligence (AI)

Artificial Intelligence (AI) is intelligence demonstrated by machines, unlike the natural
intelligence displayed by humans and animals. AI also refers to the simulation of human
intelligence in machines that are programmed to think like humans and mimic their actions.
AI programming focuses mainly on three cognitive skills: learning, reasoning and
self-correction. The term AI can also be applied to any machine that exhibits characteristics
associated with the human mind, such as learning and problem solving. AI is the superset of
machine learning.

3.1.2 Machine Learning (ML)

Machine learning (ML) is the field of study that gives computers the ability to learn without
being explicitly programmed. ML is a subset of AI. ML is based on the idea that systems
can learn from data, identify patterns and make decisions with negligible human
intervention. Basically, in ML the training data is given to a learning algorithm. Generally,
ML is of three types:

Supervised learning: Data sets are labeled, and the desired output is given so that the
patterns can be detected and used to label new data sets. For example: insurance
underwriting, fraud detection, etc.

Unsupervised learning: Data sets aren’t labeled; the algorithm is asked to identify patterns in
the input data and sort the data according to similarities and differences. For example: customer
clustering, association rule mining, etc.

Reinforcement learning: Data sets aren’t labeled but, after performing an action at each step,
the AI system receives feedback or a reward for its action. For example: game AI, complex
decision problems, reward systems, etc.

Deep learning (DL) is a subset of ML that can learn useful features from data with little
human intervention. DL mimics the workings of the human brain in processing
data, and it can be used for detecting objects, recognizing speech, translating languages and
making decisions. DL is an evolution of ML and part of the broader family of ML methods; the
word "deep" refers to the number of layers through which the machine learns. DL uses neural
networks such as the artificial neural network
(ANN), the convolutional neural network (CNN) and so on. DL is also known as deep neural
learning or deep neural network.

Figure 3.1: Relationship between AI, ML and DL

3.1.3 Gray level Image

In a gray-scale image, each pixel value represents the brightness of that pixel. The most
common pixel format is the byte image, where this number is stored as an 8-bit integer, giving
a range of possible values from 0 to 255. Typically, 0 is taken to be black and 255 is taken
to be white.

3.1.4 Color (RGB) Image

An RGB image, sometimes referred to as a true-color image, is stored as an m-by-n-by-3
data array (where m is the number of rows and n is the number of columns) that defines
Red, Green, and Blue color components for each individual pixel. An RGB array can be of
class double, uint8, or uint16. In an RGB array of class double, each color component is a value
between 0 and 1. A pixel whose color components are (0, 0, 0) displays as black, and
a pixel whose color components are (1, 1, 1) displays as white.

3.1.5 Image Conversion

For many applications, it is required to convert the image from one type to another, such as RGB
to Gray, Gray to RGB, Gray to Binary, RGB to Binary, RGB to Indexed, Gray to Indexed, etc.
In RGB to Gray, the true-color RGB image is converted into a Gray-scale image, which
discards a lot of information that is not required for processing. In Gray to RGB, it is
required to generate the image in three channels, i.e., m-by-n-by-3. For Binary image
generation, any type of image is converted into a binary image, which holds only two values,
0 and 1, one assigned to white and the other to black. A number of issues are
associated with converting an image from one format to another. Image conversion also includes
conversion between CMY and RGB color images. For this, the RGB values of the standard
(true-color) image are subtracted from {1, 1, 1}. The RGB model is obtained by an additive
process, whereas the CMY model is obtained by a subtractive process. CMY is the complementary
model of the RGB color model.
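
The conversions described above can be sketched with NumPy as follows. This is a minimal illustration, not the code used in this work: the function names, the luma weights (the commonly used ITU-R BT.601 values), the fixed binary threshold of 0.5 and the choice of 1 for white are all assumptions made for the example.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an m-by-n-by-3 RGB array with values in [0, 1] to gray-scale."""
    # Assumed weights: ITU-R BT.601 luma coefficients, a common convention.
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

def gray_to_rgb(gray):
    """Replicate the gray channel three times to get an m-by-n-by-3 array."""
    return np.stack([gray, gray, gray], axis=-1)

def gray_to_binary(gray, threshold=0.5):
    """Map each pixel to 1 (white) or 0 (black) with an assumed fixed threshold."""
    return (gray >= threshold).astype(np.uint8)

def rgb_to_cmy(rgb):
    """CMY is the complement of RGB: subtract the RGB values from (1, 1, 1)."""
    return 1.0 - rgb

# A 2-by-2 test image: black, white, pure red, pure blue.
image = np.array([[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
                  [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]])

gray = rgb_to_gray(image)        # shape (2, 2)
binary = gray_to_binary(gray)    # only the white pixel passes the threshold
cmy = rgb_to_cmy(image)          # red (1, 0, 0) becomes cyan (0, 1, 1)
```

Because CMY is the complement of RGB, applying the same subtraction to a CMY image gives back the RGB image.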

3.1.6 Optimizer and Learning rate

Optimization algorithms help CNN models minimize the error. The proposed model uses the
Adam optimizer. Adam is an optimization algorithm that updates the network weights
iteratively based on the training data. It is an extension of the stochastic gradient descent
algorithm. Because of its good performance, it is widely used in computer vision research. The
proposed model uses the Adam optimizer with a learning rate of 0.001.

Adam optimizer: $\theta_{t+1} = \theta_t - \dfrac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$
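
A single Adam update can be sketched in NumPy as follows. This is an illustrative sketch rather than the training code of this work: the function name `adam_step` is hypothetical, and the decay rates `beta1 = 0.9`, `beta2 = 0.999` and `eps = 1e-8` are the commonly used defaults, assumed here because the text does not state them. In the code, `m_hat` and `v_hat` are the bias-corrected estimates of the first and second moments of the gradient, and `lr` is the learning rate.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameters and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (an assumption for illustration): minimize f(theta) = theta^2.
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)

theta_after_first, _, _ = adam_step(theta, 2 * theta, m, v, t=1)

for t in range(1, 1001):
    grad = 2 * theta                          # gradient of theta^2
    theta, m, v = adam_step(theta, grad, m, v, t)
```

On the first step the update has magnitude close to the learning rate regardless of the gradient scale, which is one reason Adam is less sensitive to gradient magnitude than plain SGD.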

To calculate the error that the optimizer minimizes, we used the categorical cross entropy
function. Recent research suggests that cross entropy performs better than other functions
such as classification error and mean squared error.

Categorical cross entropy function: $L_i = -\sum_j y_{i,j} \log(\hat{y}_{i,j})$

where $L_i$ = sample loss value, $i$ = i-th sample in a set, $j$ = label/output index, $y$ = target
values, $\hat{y}$ = predicted values.
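
As a small worked example of this loss, the sketch below computes the per-sample categorical cross entropy for one-hot targets. The function name and the clipping constant `eps` (used only to avoid taking the logarithm of zero) are assumptions for illustration.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Return the loss L_i for each sample i, summing over the class index j."""
    y_pred = np.clip(y_pred, eps, 1.0)   # guard against log(0)
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Two samples, three classes; targets are one-hot encoded.
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.10, 0.80, 0.10],
                   [0.50, 0.25, 0.25]])

losses = categorical_cross_entropy(y_true, y_pred)
# With one-hot targets only the true class contributes:
# losses[0] = -log(0.8) and losses[1] = -log(0.5).
```

The more confident correct prediction (0.8) gives the smaller loss, which is the behavior the optimizer exploits during training.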
The learning rate is one of the most important hyper-parameters to tune when training convolutional
neural networks. If the learning rate is low, the classification can be more accurate, but the
optimizer takes more time to reduce the loss and reach the optimum. If the learning rate is high,
the training may fail to converge and may sometimes diverge. Choosing a good learning rate is
therefore difficult. To overcome this challenge, we use an automatic learning rate reduction method.
For faster computation, we start with a comparatively high learning rate of 0.001, which is
automatically reduced by monitoring the validation accuracy.
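
The idea of reducing the learning rate when the validation accuracy stops improving can be sketched as follows. The thesis does not state the reduction factor, the patience or the lower bound, so `factor`, `patience` and `min_lr` are hypothetical values, and the class is only a simplified illustration of this kind of schedule.

```python
class ReduceOnPlateau:
    """Reduce the learning rate when validation accuracy stops improving."""

    def __init__(self, lr=0.001, factor=0.5, patience=2, min_lr=1e-5):
        self.lr = lr
        self.factor = factor      # multiplier applied on each reduction (assumed)
        self.patience = patience  # epochs without improvement before reducing (assumed)
        self.min_lr = min_lr      # lower bound for the learning rate (assumed)
        self.best = float("-inf")
        self.wait = 0

    def step(self, val_accuracy):
        """Call once per epoch; returns the learning rate for the next epoch."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr
```

With these settings, two consecutive epochs without a new best validation accuracy halve the learning rate.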

3.1.7 Data augmentation

Deep learning techniques perform better when more data is available. For this reason, data
augmentation is used to produce more data artificially. Data augmentation is especially helpful for
handwritten character recognition because a single person can write the same character in
different variations. For data augmentation, the images are randomly shifted by 20% in height,
in width, or in both, and 20% rotation and 20% zoom are also applied.

3.2 Neural Network

Neural networks are among the most significant developments in machine learning. They can
solve problems that are difficult to solve with explicitly programmed algorithms, and they are the
essence of deep learning. Neural networks, also known as artificial neural networks (ANNs) or
simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep
learning algorithms. Their name and structure are inspired by the human brain, mimicking the way
that biological neurons signal to one another.

Figure 3.2: Neural Network (input layer, hidden layer, output layer)

Artificial neural networks (ANNs) consist of node layers: an input layer,
one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to
another and has an associated weight and threshold. If the output of any individual node is
above the specified threshold value, that node is activated, sending data to the next layer of the
network. Otherwise, no data is passed along to the next layer of the network. Neural networks
rely on training data to learn and improve their accuracy over time. However, once these
learning algorithms are fine-tuned for accuracy, they are powerful tools in computer science
and artificial intelligence, allowing us to classify and cluster data at a high velocity. Tasks in
speech recognition or image recognition can take minutes versus hours when compared to the
manual identification by human experts. One of the most well-known neural networks is
Google’s search algorithm.

3.2.1 Deep Learning vs Neural Network

Deep learning and neural networks tend to be used interchangeably in conversation, which can
be confusing. It is therefore worth noting that the "deep" in deep learning just refers to
the depth of layers in a neural network. A neural network that consists of more than three
layers, inclusive of the input and the output layers, can be considered a deep
learning algorithm. A neural network that only has two or three layers is just a basic neural
network.

3.2.2 Working Process of Neural Network

A neural network has many layers. Each layer performs a specific function, and the more complex
the network is, the more layers it has. This is why a neural network is also called a multi-layer
perceptron. Before getting into how neural networks work, we need to be familiar with their
parts. The simplest form of a neural network has three layers of nodes:

1. The input layer

2. The hidden layer

3. The output layer

As the names suggest, each of these layers has a specific purpose. These layers are made up of
nodes. There can be multiple hidden layers in a neural network according to the requirements.
The input layer picks up the input signals, which it gathers from the outside world, and transfers
them to the next layer. The hidden layer performs all the back-end tasks of calculation. A
single-layer perceptron has no hidden layer, whereas a multi-layer neural network has at least one
hidden layer. The output layer transmits the final result of the calculation performed by the hidden
layer. Like other machine learning applications, a neural network has to be trained with some
training data before we provide it with a particular problem. Before going into more depth on how
a neural network solves a problem, we should understand how perceptron layers work:

How do Perceptron Layers Work: A neural network is made up of many perceptron layers, which
is why it has the name multi-layer perceptron. These layers are also called hidden layers or
dense layers. They are made up of many perceptron neurons, the primary units that work
together to form a perceptron layer. Each neuron receives information as a set of numerical
inputs. It combines these inputs with a group of weights and a bias to produce a single
value: the combination function multiplies each input by its weight, sums the results and adds
the bias. It works through the following equation:

combination = bias + weights * inputs

After this, the activation function produces the output with the following equation:

output = activation(combination)

The choice of activation function determines how the neuron responds to its combined input.
Neurons of this kind together form the layers of the network.
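
The two equations above can be sketched for a single neuron as follows. The function names and the choice of ReLU as the example activation are illustrative assumptions.

```python
def neuron_output(inputs, weights, bias, activation):
    """Compute activation(bias + sum of weights * inputs) for one neuron."""
    combination = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(combination)


def relu(z):
    """Example activation: passes positive values and replaces negatives with zero."""
    return max(0.0, z)
```

For instance, inputs (1.0, 2.0) with weights (0.5, 0.25) and bias 1.0 give a combination of 2.0, which ReLU passes through unchanged.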

We can focus on the working of a neural network. Here’s how it works:

1. Information is fed into the input layer which transfers it to the hidden layer.

2. The interconnections between the two layers assign weights to each input randomly.

3. Each input is multiplied by its weight, and a bias is added to the resulting weighted sum.

4. The weighted sum is transferred to the activation function.

5. The activation function determines which nodes it should fire for feature extraction.

6. The model applies an activation function to the output layer to deliver the output.

7. The error is back-propagated, and the weights are adjusted to minimize it.

The model uses a cost function to measure the error. It compares its output with the expected
result and adjusts the weights in every iteration. This process is repeated over the training data to
reduce the error and improve the accuracy of the output.

3.2.3 Types of Neural Network

There are different kinds of deep neural networks – and each has advantages and disadvantages,
depending upon the use. Examples include:

1. Convolutional neural networks (CNNs) contain five types of layers: input, convolution,
pooling, fully connected and output. Each layer has a specific purpose, like summarizing,
connecting or activating. Convolutional neural networks have popularized image classification
and object detection. However, CNNs have also been applied to other areas, such as natural
language processing and forecasting.

2. Recurrent neural networks (RNNs) use sequential information such as time-stamped data
from a sensor device or a spoken sentence, composed of a sequence of terms. Unlike traditional
neural networks, the inputs to a recurrent neural network are not independent of each other, and
the output for each element depends on the computations of its preceding elements. RNNs are
used in forecasting and time series applications, sentiment analysis and other text applications.

3. Feedforward neural networks, in which each perceptron in one layer is connected to every
perceptron from the next layer. Information is fed forward from one layer to the next in the
forward direction only. There are no feedback loops.

4. Autoencoder neural networks are used to create abstractions called encoders, created from
a given set of inputs. Although similar to more traditional neural networks, autoencoders seek
to model the inputs themselves, and therefore the method is considered unsupervised. The
premise of autoencoders is to desensitize the irrelevant and sensitize the relevant. As layers are
added, further abstractions are formulated at higher layers (layers closest to the point at which
a decoder layer is introduced). These abstractions can then be used by linear or nonlinear
classifiers.

3.2.4 Activation Function

An Activation Function decides whether a neuron should be activated or not. This means that
it will decide whether the neuron's input to the network is important or not in the process of
prediction using simpler mathematical operations.

Importance of an Activation Function: The purpose of an activation function is to add
non-linearity to the neural network. Activation functions introduce an additional step at each layer
during the forward propagation, but the computation is worth it. Here is why:

Let’s suppose we have a neural network working without the activation functions. In that case,
every neuron will only be performing a linear transformation on the inputs using the weights
and biases. It’s because it doesn’t matter how many hidden layers we attach in the neural
network; all layers will behave in the same way because the composition of two linear functions
is a linear function itself. Although the neural network becomes simpler, learning any complex
task is impossible, and our model would be just a linear regression model. The three main types
of neural network activation functions are discussed below.
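
The claim that stacked linear layers collapse into one can be checked numerically. The sketch below uses small hand-picked matrices (hypothetical values) and plain Python lists; the helper names are illustrative.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def linear_layer(W, b, x):
    """A layer without activation: W x + b."""
    return [s + bi for s, bi in zip(matvec(W, x), b)]


# Two linear layers applied one after the other.
W1, b1 = [[1, 2], [3, 4]], [1, -1]
W2, b2 = [[0, 1], [2, -1]], [5, 0]
x = [2, 3]
two_layers = linear_layer(W2, b2, linear_layer(W1, b1, x))

# One equivalent layer: W = W2 W1 and b = W2 b1 + b2.
W = matmul(W2, W1)
b = [s + bi for s, bi in zip(matvec(W2, b1), b2)]
one_layer = linear_layer(W, b, x)
```

Both computations give the same output, which is why a non-linear activation is needed between layers.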

1. Binary Step Function

Binary step function depends on a threshold value that decides whether a neuron should be
activated or not. The input fed to the activation function is compared to a certain threshold; if
the input is greater than it, then the neuron is activated, else it is deactivated, meaning that its
output is not passed on to the next hidden layer.

Figure 3.3: Binary Step Function

Mathematically it can be represented as:

$$f(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x \geq 0 \end{cases}$$

Here are some of the limitations of the binary step function:

• It cannot provide multi-value outputs; for example, it cannot be used for multi-class
classification problems.
• The gradient of the step function is zero, which causes a hindrance in the
backpropagation process.

2. Linear Activation Function

The linear activation function, also known as "no activation," or "identity function" (multiplied
x1.0), is where the activation is proportional to the input. The function doesn't do anything to
the weighted sum of the input, it simply spits out the value it was given.

Mathematically it can be represented as: f(x) = x

However, a linear activation function has two major problems:

• Backpropagation cannot learn anything useful from it, as the derivative of the function is a
constant and has no relation to the input x.
• All layers of the neural network will collapse into one if a linear activation function is
used. No matter the number of layers in the neural network, the last layer will still be a
linear function of the first layer. So, essentially, a linear activation function turns the
neural network into just one layer.

Figure 3.4: Linear Activation Function

3. Non-Linear Activation Functions

The linear activation function shown above is simply a linear regression model. Because of its
limited power, this does not allow the model to create complex mappings between the
inputs and outputs of the network. Non-linear activation functions solve the following limitations
of linear activation functions:

• They allow backpropagation because now the derivative function would be related to
the input, and it is possible to go back and understand which weights in the input
neurons can provide a better prediction.
• They allow the stacking of multiple layers of neurons as the output would now be a
non-linear combination of input passed through multiple layers. Any output can be
represented as a functional computation in a neural network.

Now, let’s have a look at some different non-linear neural networks activation functions:

1. Sigmoid / Logistic Activation Function

This function takes any real value as input and outputs values in the range of 0 to 1. The
larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller
the input (more negative), the closer the output will be to 0.0, as shown in Figure 3.5.
The sigmoid/logistic activation function is one of the most widely used functions because:

• It is commonly used for models where we have to predict a probability as an output. Since
a probability always lies between 0 and 1, sigmoid is a natural choice because of its range.
• The function is differentiable and provides a smooth gradient, i.e., it prevents jumps in
output values. This is represented by the S-shape of the sigmoid activation function.

The derivative of the function is f'(x) = sigmoid(x)*(1-sigmoid(x)).

Figure 3.5: Logistic Regression

Mathematically it can be represented as: $f(x) = \dfrac{1}{1 + e^{-x}}$
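
A minimal sketch of the sigmoid function and of its derivative f'(x) = sigmoid(x)*(1-sigmoid(x)) is given below; the function names are illustrative.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the sigmoid, expressed through the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

The output is exactly 0.5 at x = 0, where the derivative reaches its maximum of 0.25.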

2. Tanh Function (Hyperbolic Tangent)

Tanh function is very similar to the sigmoid/logistic activation function, and even has the same
S-shape with the difference in output range of -1 to 1. In Tanh, the larger the input (more
positive), the closer the output value will be to 1.0, whereas the smaller the input (more
negative), the closer the output will be to -1.0.

Figure 3.6: Tanh

Advantages of using this activation function are:

• The output of the tanh activation function is zero-centered; hence we can easily map
the output values as strongly negative, neutral, or strongly positive.
• It is usually used in hidden layers of a neural network as its values lie between -1 and 1;
therefore, the mean for the hidden layer comes out to be 0 or very close to it. It helps in
centering the data and makes learning for the next layer much easier.

Mathematically it can be represented as: $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

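
The formula can be written directly in code, as in the sketch below. It follows the definition literally and is therefore only suitable for moderate input values, since `math.exp` overflows for very large arguments; in practice the standard library function `math.tanh` would be used.

```python
import math


def tanh(x):
    """Hyperbolic tangent computed from its definition (moderate x only)."""
    ex, enx = math.exp(x), math.exp(-x)
    return (ex - enx) / (ex + enx)
```
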
3. ReLU Function

ReLU stands for Rectified Linear Unit. Although it gives an impression of a linear function,
ReLU has a derivative function and allows for backpropagation while simultaneously making
it computationally efficient. The main catch here is that the ReLU function does not activate
all the neurons at the same time. The neurons will only be deactivated if the output of the linear
transformation is less than 0.

The advantages of using ReLU as an activation function are as follows:

• Since only a certain number of neurons are activated, the ReLU function is far more
computationally efficient when compared to the sigmoid and tanh functions.
• ReLU accelerates the convergence of gradient descent towards the global minimum of the
loss function due to its linear, non-saturating property.

Figure 3.7: ReLU Function

Mathematically it can be represented as: f(x) = max(0, x)
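
ReLU and its derivative can be sketched as follows. The derivative at exactly x = 0 is undefined; returning 0 there is a common convention and an assumption of this sketch.

```python
def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x)."""
    return max(0.0, x)


def relu_derivative(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0
```
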

4. Softmax Function

The softmax function, also known as the normalized exponential function, converts a vector of K
real numbers into a probability distribution of K possible outcomes. It is a generalization of
the logistic function to multiple dimensions, and is used in multinomial logistic regression.

Figure 3.8: Softmax Function

The softmax function can be described as a combination of multiple sigmoids. It calculates relative
probabilities. Similar to the sigmoid/logistic activation function, the softmax function returns the
probability of each class. It is most commonly used as an activation function for the last layer of the
neural network in the case of multi-class classification.

Mathematically it can be represented as: $\mathrm{softmax}(z_i) = \dfrac{\exp(z_i)}{\sum_j \exp(z_j)}$
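
A minimal sketch of the softmax computation is shown below. Subtracting the maximum score before exponentiation is a standard numerical-stability step that is added here as an assumption; it does not change the result.

```python
import math


def softmax(z):
    """Convert a list of real-valued scores into probabilities that sum to 1."""
    m = max(z)                               # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

For two classes with scores (x, 0), the first softmax output equals sigmoid(x), which illustrates why softmax is described as a generalization of the logistic function.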

3.3 Convolutional Neural Network (CNN)


3.3.1 Introduction

Computer vision is evolving rapidly, and one of the reasons is deep learning. When we talk about
computer vision, the term convolutional neural network (abbreviated as CNN) comes to mind
because CNNs are heavily used in this field. Examples of CNN applications in computer vision are
face recognition and image classification. A CNN is similar to a basic neural network and also has
learnable parameters, i.e., weights and biases.

3.3.2 Importance of CNN

Suppose we are working with the MNIST dataset. Each image in MNIST is 28 x 28 x 1 (a grayscale
image contains only 1 channel). The total number of neurons in the input layer will be 28 x 28 = 784,
which is manageable. If the size of the image is 1000 x 1000, however, we need 10⁶ neurons in the
input layer. Such a large number of neurons makes the operation computationally inefficient. This is
where the Convolutional Neural Network, or CNN, comes in. In simple words, a CNN extracts the
features of an image and converts it into a lower dimension without losing its characteristics. In the
following example, the initial size of the image is 224 x 224 x 3. Without convolution, we would need
224 x 224 x 3 = 150,528 neurons in the input layer, but after applying convolution the input tensor
dimension is reduced to 1 x 1 x 1000. This means that we only need 1000 neurons in the first layer of
the feedforward neural network.

Figure 3.9: Down-sampling (legend: Convolution + ReLU, Max pooling, Fully connected + ReLU, Softmax)

3.3.3 Few Definitions

There are a few definitions we should know before understanding CNN.

1. Image Representation

Images are encoded into color channels: the image data is represented by the intensity of each
color channel at a given point. The most common encoding is RGB, which stands for Red, Green and
Blue. The information contained in an image is the intensity of each channel color across the width
and height of the image. The intensity of the red channel at each point can therefore be represented
as a matrix, and the same goes for the green and blue channels, so we end up with three matrices.
When these are combined, they form a tensor.

2. Edge Detection

Every image has vertical and horizontal edges, which combine to form the image. The
convolution operation is used with some filters for detecting edges. Suppose we have a grayscale
image of dimension 6 x 6 and a filter of dimension 3 x 3. When the 6 x 6 grayscale image is
convolved with the 3 x 3 filter, we get a 4 x 4 image. First, the 3 x 3 filter matrix is multiplied
element-wise with the first 3 x 3 region of the grayscale image and the products are summed. Then
we shift the filter one column to the right until we reach the end of the row, after which we shift
down one row, and so on.

Figure 3.10: Convolution operation

The convolution operation can be visualized in the following way. Here the image dimension
is 4 x 4 and the filter is 3 x 3, so the output after convolution is 2 x 2.

Figure 3.11: Visualization of convolution

If we have an N x N image and an F x F filter, then the result after convolution will be:

(N x N) * (F x F) = (N-F+1) x (N-F+1)

For the case above, N = 4 and F = 3, which gives a (4-3+1) x (4-3+1) = 2 x 2 output.
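
The sliding-window operation and the resulting output size can be sketched in plain Python. As is usual in CNNs, the filter is applied without flipping (strictly speaking a cross-correlation). The image and the vertical edge filter in the example are illustrative values, not taken from the figures.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding.

    An N x N image and an F x F kernel give an (N-F+1) x (N-F+1) output.
    """
    n_rows, n_cols = len(image), len(image[0])
    f_rows, f_cols = len(kernel), len(kernel[0])
    output = []
    for i in range(n_rows - f_rows + 1):
        row = []
        for j in range(n_cols - f_cols + 1):
            # Element-wise product of the kernel and the current image region, summed.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(f_rows) for b in range(f_cols)))
        output.append(row)
    return output


# 6 x 6 image: bright left half, dark right half (a vertical edge in the middle).
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
# 3 x 3 vertical edge filter.
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
edges = conv2d_valid(image, kernel)
```

The 4 x 4 result is non-zero only in the two middle columns, which is where the filter window covers the edge.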

3. Stride and Padding

Stride denotes how many steps the filter moves at each step of the convolution. By default it is
one. Stride is a parameter that works in conjunction with padding, the feature that adds blank,
or empty, pixels to the frame of the image to minimize the reduction of size in the
output layer. Roughly, padding is a way of increasing the size of an image to counteract the fact
that convolution and stride reduce the size.

Padding is added to the outer frame of the image so that the kernel has more space to cover
when it processes the image, including the pixels at the border. Adding padding to an
image processed by a CNN allows for a more accurate analysis of images.

Figure 3.12: Convolution with Stride 1

We can observe that the size of output is smaller than input. To maintain the dimension of
output as in input, we use padding. Padding is a process of adding zeros to the input matrix
symmetrically. In the following example, the extra grey blocks denote the padding. It is used
to make the dimension of output same as input.

Figure 3.13: Stride 1 with Padding 1

Suppose p is the padding. Initially (without padding):

(N x N) * (F x F) = (N-F+1) x (N-F+1) ---(1)

After applying padding:

Figure 3.14: After applying padding (an N x N input padded to (N+2p) x (N+2p) and convolved with an F x F filter gives an N x N output)

If we apply an F x F filter to the padded (N+2p) x (N+2p) input matrix, the output matrix has
dimension (N+2p-F+1) x (N+2p-F+1). Since the padding is chosen so that the output has the same
dimension as the original N x N input, we have

(N+2p-F+1) x (N+2p-F+1) equivalent to N x N

N+2p-F+1 = N ---(2)

p = (F-1)/2 ---(3)

Equation (3) shows that, for a stride of 1, the required padding depends only on the dimension of the filter.
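
The relations above can be checked with a small helper. The version below also accepts a stride s, using the general output size floor((N + 2p - F) / s) + 1, which reduces to N + 2p - F + 1 for a stride of 1. The function names are illustrative.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output width/height for input size n, filter size f, padding p, stride s."""
    return (n + 2 * p - f) // s + 1


def same_padding(f):
    """Padding p = (F - 1) / 2 that keeps the size unchanged for stride 1 (odd F only)."""
    if f % 2 == 0:
        raise ValueError("symmetric same padding requires an odd filter size")
    return (f - 1) // 2
```

For a 3 x 3 filter this gives p = 1, so a 64 x 64 input stays 64 x 64 after convolution.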

4. Layers in CNN

There are six different layers in a CNN:

• Input layer
• Convo layer (Convo + ReLU)
• Pooling layer
• Fully connected (FC) layer
• Softmax/logistic layer
• Output layer

Figure 3.15: Different layers of CNN (Input, Convo, Pooling, FC, Output)

i. Input Layer

The input layer of a CNN contains the image data. Image data is represented by a three-dimensional
matrix, as we saw earlier. It needs to be reshaped into a single column. Suppose we have an image
of dimension 28 x 28 = 784; we need to convert it into 784 x 1 before feeding it into the input. If we
have m training examples, then the dimension of the input will be (784, m).

ii. Convo Layer

The Convo layer is sometimes called the feature extractor layer because the features of the image
are extracted within this layer. First, a part of the image is connected to the Convo layer to perform
the convolution operation described earlier, which calculates the dot product between the receptive
field (a local region of the input image that has the same size as the filter) and the filter. The
result of the operation is a single number of the output volume. Then we slide the filter over the
next receptive field of the same input image by a stride and do the same operation again. We
repeat this process until we have gone through the whole image. The output
will be the input for the next layer. The Convo layer also contains a ReLU activation that sets all
negative values to zero.

iii. Pooling Layer

Figure 3.16: Pooling layers (max pooling of a single depth slice with 2*2 filters and stride 2)

The pooling layer is used to reduce the spatial volume of the input image after convolution. It is
used between two convolution layers. If we apply an FC layer after a Convo layer without applying
pooling, it will be computationally expensive. Max pooling is therefore a common way
to reduce the spatial volume of the input image. In the above example, we
have applied max pooling to a single depth slice with a stride of 2. We can observe that the 4 x 4
input is reduced to a 2 x 2 output.

There is no parameter in pooling layer but it has two hyperparameters — Filter(F) and Stride(S).
In general, if we have input dimension W1 x H1 x D1, then
W2 = (W1−F)/S+1

H2 = (H1−F)/S+1

D2 = D1

Where W2, H2 and D2 are the width, height and depth of output.
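
Max pooling on a single depth slice can be sketched as follows. The output size follows the formulas above, and the 4 x 4 input in the example uses illustrative values.

```python
def max_pool2d(x, f=2, s=2):
    """Max pooling of a 2D list x with an f x f window and stride s."""
    out_h = (len(x) - f) // s + 1      # H2 = (H1 - F) / S + 1
    out_w = (len(x[0]) - f) // s + 1   # W2 = (W1 - F) / S + 1
    return [[max(x[i * s + a][j * s + b] for a in range(f) for b in range(f))
             for j in range(out_w)]
            for i in range(out_h)]


slice_4x4 = [[1, 1, 2, 4],
             [5, 6, 7, 8],
             [3, 2, 1, 0],
             [1, 2, 3, 4]]
pooled = max_pool2d(slice_4x4)
```

Each 2 x 2 block is replaced by its largest value, so the 4 x 4 input becomes a 2 x 2 output without any learnable parameter being involved.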

iv. Fully Connected Layer (FC)

The fully connected layer involves weights, biases, and neurons. It connects neurons in one layer
to neurons in another layer. It is used to classify images into different categories through training.

v. Softmax / Logistic Layer

The softmax or logistic layer is the last layer of the CNN. It resides at the end of the FC layer.
Logistic is used for binary classification and softmax is used for multi-class classification.

vi. Output Layer

The output layer contains the label in one-hot encoded form.
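
One-hot encoding of a class label can be sketched as below. The use of integer class indices starting at 0 and the function names are assumptions of this illustration.

```python
def one_hot(label, num_classes):
    """Return a list with 1 at the index of the label and 0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range 0..num_classes-1")
    vector = [0] * num_classes
    vector[label] = 1
    return vector


def decode(vector):
    """Recover the class index as the position of the largest entry."""
    return max(range(len(vector)), key=lambda i: vector[i])
```

With 50 character classes, each label becomes a vector of length 50 that contains a single 1.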

Chapter 4

Methodology
4.1. Dataset Description

The proposed model used a dataset named Ekush for training and validation and another dataset
named BanglaLekha-Isolated for testing. The BanglaLekha-Isolated dataset is a collection of
Bangla handwritten isolated character samples. It contains samples of 50 Bangla basic
characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting
samples for each of the 84 characters were collected, digitized and pre-processed. After
discarding mistakes and scribbles, 166,105 handwritten character images were included in the
final dataset. The dataset also includes information about the age and gender of the subjects
from whom the handwriting samples were collected. This information is mapped to each
individual image. A separate spreadsheet gives an assessment of the aesthetic quality of the
handwriting samples, collected from three independent assessors. This assessment is done on
groups of 84 characters and not on individual characters.

The Ekush dataset has a total of 368,776 images: 155,570 alphabet characters, 151,607 compound
characters, 30,830 digits and 30,769 modifiers. The image resolution in the Ekush dataset depends
on the character size. Most of the images have little padding, with a black background and the
character in white.

(a) BanglaLekha-Isolated (b) Ekush

Figure 4.1: Sample images of used datasets

4.2 Preparation of dataset

Data preparation plays an important role in deep learning. Data is everywhere, however, the
problem is the lack of processed data. Our proposed model used Ekush dataset and
BanglaLekha-Isolated dataset.

In this work, we propose a model for classifying Bangla handwritten characters covering the
50 basic Bangla characters (11 vowels and 39 consonants). We kept 65,000
images from the two datasets, of which 40,000 images are for training, 10,000 images for
validation and 15,000 images for testing. We discarded the rest of the data
to simplify our work and to reduce the computation time. The 65,000 images
cover 50 classes. Each class contains 800 images for training, 200 images for
validation and 300 images for testing the model. We used 50,000 images from the Ekush dataset
and 15,000 images from the BanglaLekha-Isolated dataset.

The Ekush dataset images have a white background and black characters. First, we inverted all the
images to make the background black and the characters white. Black pixels have the value
0, which reduces the amount of computation. The images of the Ekush dataset vary in height and
width, since each image is sized to its character to reduce unnecessary information.

4.3 Preprocessing

Images are preprocessed before they are fed to machine learning algorithms. We preprocessed the
Ekush dataset to make it similar to the testing dataset, BanglaLekha-Isolated,
because the two datasets are completely different. BanglaLekha-Isolated has a black background
with white characters, and Ekush has a white background with black
characters. The original images were in RGB format, which means that each image has three layers:
red, green and blue. We first converted the images of the Ekush dataset into grayscale. A grayscale
image has only one layer of n rows and m columns. The images were then resized to 64*64
pixels. After that, we inverted each image and used the Canny edge detector to make the images
of the two datasets similar for the HCR purpose. During training, the image pixel values are
normalized by dividing them by 255.

After preprocessing our datasets, we have created a model using Convolutional Neural
Network. So, before showing our proposed model we want to represent some important
information about Neural network and Convolutional Neural Network.

4.4 RGB to Gray


We imported our images using the cv2 library. An RGB image can be converted to grayscale
manually, but convenient library functions already exist. We used the cv2 library to convert the
images to grayscale. A grayscale image has only one channel.

4.5 Resizing and Rescaling


The images are resized to 64*64 pixels. Then we performed scaling by dividing each pixel value
by 255. Now, our dataset is ready to be fed to machine learning models.
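
To make the individual steps concrete, the sketch below performs the grayscale conversion, the color inversion and the scaling manually on a small image given as nested lists of (R, G, B) tuples. It is not the cv2-based code used in this work: resizing and Canny edge detection are omitted, the function names are hypothetical, and the weights 0.299, 0.587 and 0.114 are the standard luminance weights that OpenCV documents for its RGB-to-gray conversion.

```python
def rgb_to_gray(pixel):
    """Weighted sum of the three channels, rounded to an integer in 0..255."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def invert(gray):
    """Swap dark and bright: a white background (255) becomes black (0)."""
    return 255 - gray


def scale(gray):
    """Normalize a pixel value to the range 0 to 1."""
    return gray / 255.0


def preprocess(image_rgb):
    """Apply grayscale conversion, inversion and scaling to every pixel."""
    return [[scale(invert(rgb_to_gray(px))) for px in row] for row in image_rgb]
```

A white background pixel ends up as 0.0 and a black character pixel as 1.0, which matches the black-background, white-character convention described above.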
4.6 Train, Test and Validation split

To measure the model performance, training, validation and test splits were created. The training
set is used to train the model with the known output. The validation set is used to check the model
performance during training and helps to tune the hyper-parameters. The test
set is used to check the final model performance after training. For training and validation,
we used the Ekush dataset, from which we took 50,000 character images: 10,000
images (20% of the total) were used for validation and 40,000 images (80%) were used to train the
model. For testing the model, BanglaLekha-Isolated was used, which comes from a completely
different distribution. From BanglaLekha-Isolated we took 15,000 basic character images, and all
15,000 images were used to measure the model performance.
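
One possible way to obtain such a split is sketched below. It assumes that the 1,000 Ekush images kept per class are shuffled randomly before 800 are assigned to training and 200 to validation; the exact selection procedure is not described in this work, so the shuffling, the seed and the function name are assumptions.

```python
import random


def split_per_class(samples_by_class, n_train=800, n_val=200, seed=0):
    """Split each class into training and validation samples.

    samples_by_class maps a class label to a list of sample identifiers.
    Returns two lists of (sample, label) pairs.
    """
    rng = random.Random(seed)
    train, val = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        train += [(s, label) for s in shuffled[:n_train]]
        val += [(s, label) for s in shuffled[n_train:n_train + n_val]]
    return train, val
```

With 50 classes of 1,000 images each, this yields the 40,000 training and 10,000 validation images stated above.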

4.7 Proposed Methodology

Dataset 01 -> Preprocessing (RGB to Gray, Resizing, Color inversion, Canny edge detection, Scaling) -> CNN Training -> Trained CNN model
Dataset 02 -> Preprocessing (RGB to Gray, Resizing, Canny edge detection, Scaling) -> Testing (with the trained CNN model) -> Result

Figure 4.2: Block diagram of the proposed methodology

We worked with two datasets. Before applying the CNN algorithm, we preprocessed both
datasets. After preprocessing, we trained the CNN on Dataset 01, named Ekush, and obtained a
new model for Bangla handwritten character recognition. Then we tested the model using
our second dataset, named BanglaLekha-Isolated, and after tuning different parameters we
finally obtained the desired result.

4.8 Overview of the proposed model’s architecture

The proposed model uses a multilayer CNN for classifying Bangla handwritten characters. The
model uses convolution, max pooling, fully connected dense and dropout layers. It
has 16 layers: the first two layers are convolutional layers and the third layer is a
max-pooling layer, and this arrangement of three layers is repeated four times. After that, the 13th
layer is a flatten layer, the 14th is a dense layer, the 15th is a dropout layer and the last layer is
also a dense layer.

Layers 1 and 2 are convolutional layers with 32 filters and a kernel size of 3. These two
layers use ReLU activation with same padding. Their output is
connected to max pooling layer 3.

The output of layer 3 then goes to layer 4. Layers 4 and 5 are convolutional layers with 64 filters
and a kernel size of 3. These two layers also use ReLU activation with same padding.
Their output is connected to max pooling layer 6.

Similarly, layers 7 and 8 are convolutional layers with 128 filters, a dilation rate of 2 and a
kernel size of 3. These two layers use ReLU activation with same padding. Their output is
connected to max pooling layer 9.

Layers 10 and 11 both have 256 filters; their dilation rate, activation function and padding
are the same as in the previous layers. Their output is connected to max pooling layer
12. After these 12 operations, the output is flattened into an array by layer 13 and passed through
the fully connected dense layer 14 with 256 hidden units, which is regularized with 25% dropout
in layer 15.

The output of layer 15 is connected to the fully connected dense layer 16, which has 50 nodes with
softmax activation and is also the output layer of the model. Figure 4.3 shows the
proposed architecture.
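
As a consistency check of this description, the number of trainable parameters can be computed by hand. The sketch below assumes a single-channel 64 x 64 input, same padding in every convolution and 2 x 2 max pooling with stride 2 (so that each pooling layer halves the width and height); the function names are hypothetical. Dilation does not change the number of weights, and the pooling, flatten and dropout layers have no parameters.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights (kernel x kernel x in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """One weight per connection plus one bias per output unit."""
    return in_units * out_units + out_units


def count_parameters(input_size=64, input_channels=1,
                     conv_filters=(32, 64, 128, 256),
                     dense_units=256, num_classes=50):
    """Total parameters of four (Conv, Conv, MaxPool) blocks followed by two dense layers."""
    total = 0
    size, channels = input_size, input_channels
    for filters in conv_filters:
        total += conv2d_params(channels, filters)  # first convolution of the block
        total += conv2d_params(filters, filters)   # second convolution of the block
        channels = filters
        size //= 2                                 # 2 x 2 max pooling halves each side
    flattened = size * size * channels             # flatten layer output length
    total += dense_params(flattened, dense_units)
    total += dense_params(dense_units, num_classes)
    return total
```

Under these assumptions the count is 2,233,362, which agrees with the total number of parameters reported in Section 5.4.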

4.9 Block diagram of the proposed model’s architecture

Input -> Conv2D -> Conv2D -> MaxPooling2D -> Conv2D -> Conv2D -> MaxPooling2D ->
Conv2D -> Conv2D -> MaxPooling2D -> Conv2D -> Conv2D -> MaxPooling2D -> Flatten ->
Dense -> Dropout -> Dense -> Output

Figure 4.3: Architecture of our proposed model

Chapter 5
Result Evaluation
5.1 Environmental Setup
Colaboratory, or Colab for short, is a Google Research product that allows developers to
write and execute Python code through their browser. Google Colab is well suited to
deep learning tasks as well as to basic machine learning models. It is a hosted Jupyter
notebook that requires no setup and has a free version that gives access to
Google computing resources such as GPUs and TPUs. We used Google Colab with a GPU to run our
experimental model. It provided 12 GB of RAM and 40 GB to 300 GB of storage.

5.2 Training the model

The proposed model was trained on the Ekush dataset with a batch size of 128. After 50 epochs,
the model reached good accuracy. The automatic learning rate reduction method helps the
optimizer to converge faster by reducing the learning rate. By the end of training, the learning rate
had been reduced from 0.001 to 1.5x10^-5.

5.3 Model performance

We trained our CNN model several times with different numbers of epochs and observed its
performance. After 10 epochs, the model gives 97.30% training accuracy, 94.21% validation
accuracy and 92.55% test accuracy. After 20 epochs, the model gives 99.04% training accuracy,
94.66% validation accuracy and 93.34% test accuracy. We also measured the performance after
30, 40, and 50 epochs.

Epochs  Training      Validation    Test          Precision  Recall  F1-Score
        Accuracy (%)  Accuracy (%)  Accuracy (%)  (%)        (%)     (%)
10      97.30         94.21         92.55         93         93      93
20      99.04         94.66         93.34         94         93      93
30      99.23         94.73         93.88         94         94      94
40      99.55         95.09         93.93         94         94      94
50      99.38         95.19         94.47         95         94      94
Table 5.1: Proposed Model’s Performance Summary

After 50 epochs the model gives 99.38% training accuracy, 95.19% validation accuracy and
94.47% test accuracy. Table 5.1 lists all values, including precision, recall and F1-score.
We tried different optimizers and learning rates. Of the SGD and Adam optimizers, Adam
performed better, with a learning rate of 0.001.

We used categorical cross-entropy as the model's loss function and accuracy as the
performance metric.
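
To make the loss concrete, here is a minimal plain-Python sketch of a softmax output followed by categorical cross-entropy for a single sample. The logits and the 3-class example are hypothetical. Only the number of classes (50) comes from the model description.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot_target, probabilities, eps=1e-12):
    """Loss for one sample: -sum_k t_k * log(p_k)."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, probabilities))


# Hypothetical 3-class example: the true class is class 0.
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_cross_entropy([1, 0, 0], probs)

# Reference point: a uniform guess over 50 classes has loss ln(50).
uniform_loss = categorical_cross_entropy([1] + [0] * 49, [1 / 50] * 50)
print(loss, uniform_loss)
```

Because the target is one-hot, the loss reduces to the negative log-probability assigned to the correct class. A model that guesses uniformly over the 50 classes therefore starts near ln(50) ≈ 3.91.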

Figure 5.1 plots the training accuracy and validation accuracy against the number of epochs.
Both curves are stable.

Figure 5.1: Accuracy graph of proposed model


Figure 5.2 plots the training and validation loss. The training loss curve is smooth and
stable, whereas the validation loss curve is less stable.

Figure 5.2: Loss graph of proposed model

5.4 Output Summary of our Proposed Model

In this thesis we used a 2D CNN built from five types of layers: convolution, max-pooling,
flatten, dense and dropout. Table 5.2 summarizes the proposed architecture. The model has
2,233,362 parameters in total, all of which are trainable (0 non-trainable parameters).

Layer (type)                     Output Shape          Parameters
conv2d_1_1 (Conv2D)              (None, 64, 64, 32)    320
conv2d_1_2 (Conv2D)              (None, 64, 64, 32)    9248
max_pooling2d_1 (MaxPooling2D)   (None, 32, 32, 32)    0
conv2d_2_1 (Conv2D)              (None, 32, 32, 64)    18496
conv2d_2_2 (Conv2D)              (None, 32, 32, 64)    36928
max_pooling2d_2 (MaxPooling2D)   (None, 16, 16, 64)    0
conv2d_3_1 (Conv2D)              (None, 16, 16, 128)   73856
conv2d_3_2 (Conv2D)              (None, 16, 16, 128)   147584
max_pooling2d_3 (MaxPooling2D)   (None, 8, 8, 128)     0
conv2d_4_1 (Conv2D)              (None, 8, 8, 256)     295168
conv2d_4_2 (Conv2D)              (None, 8, 8, 256)     590080
max_pooling2d_4 (MaxPooling2D)   (None, 4, 4, 256)     0
Flatten                          (None, 4096)          0
Dense                            (None, 256)           1048832
Dropout                          (None, 256)           0
Dense                            (None, 50)            12850

Table 5.2: Proposed Architecture Summary
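
The parameter counts in Table 5.2 can be checked by hand. The sketch below recomputes them in plain Python, assuming a 64 × 64 single-channel input, 3 × 3 kernels with "same" padding and 2 × 2 pooling. These settings are inferred from the output shapes and the first layer's 320 parameters rather than stated in this section.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


# Assumed input: 64x64 grayscale image; 3x3 kernels; "same" padding.
height, width, channels = 64, 64, 1
rows = []
for filters in (32, 64, 128, 256):
    for _ in range(2):  # two convolutions per block
        rows.append(("Conv2D", (height, width, filters),
                     conv2d_params(3, 3, channels, filters)))
        channels = filters
    height, width = height // 2, width // 2  # 2x2 max pooling halves each side
    rows.append(("MaxPooling2D", (height, width, channels), 0))

flat = height * width * channels
rows.append(("Flatten", (flat,), 0))
rows.append(("Dense", (256,), dense_params(flat, 256)))
rows.append(("Dropout", (256,), 0))
rows.append(("Dense", (50,), dense_params(256, 50)))

total_params = sum(params for _, _, params in rows)
for name, shape, params in rows:
    print(f"{name:<14}{str(shape):<18}{params:>10,}")
print(f"Total: {total_params:,}")
```

Under these assumptions the per-layer counts and the total of 2,233,362 agree with the table. Almost half of the parameters sit in the first dense layer (4096 × 256 + 256 = 1,048,832).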

5.5 Performance Metrics


To evaluate the performance or quality of the model, different metrics are used, and these
metrics are known as performance metrics or evaluation metrics.

5.5.1 Accuracy

Accuracy is one of the simplest classification metrics to implement. It is the ratio of the
number of correct predictions to the total number of predictions:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$

5.5.2 Confusion Matrix

Accuracy measures the overall classification ability of the model, but it does not show
per-class detail. The confusion matrix compares the predicted results with the actual values,
so it clearly shows the prediction details for each category.

In the confusion matrix of our model, the diagonal entries are the largest values. Each
diagonal entry counts the samples of a class that were classified correctly, that is, the
true positives for that class. The high diagonal values indicate that the model performs well.

Figure 5.3: Confusion matrix of our proposed model
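
As an illustration of how such a matrix is built and how accuracy follows from it, here is a minimal plain-Python sketch. The labels form a hypothetical 3-class toy example, not the 50-class results shown in the figure. Rows are actual classes and columns are predicted classes, which is a convention assumed here.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[i][j] counts samples of actual class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(true_labels, predicted_labels):
        matrix[actual][predicted] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical toy labels for three classes.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = accuracy_from_matrix(cm)
print(cm, acc)
```

Off-diagonal entries show which pairs of classes are confused with each other, which a single accuracy figure cannot reveal.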

5.5.3 Precision

Precision is the ability of a classifier not to label as positive an instance that is actually
negative. For each class, it is defined as the ratio of true positives (TP) to the sum of true
positives and false positives (FP). In other words, precision is the accuracy of the positive
predictions.

$$\text{Precision} = \frac{TP}{TP + FP}$$

5.5.4 Recall

Recall is the ability of a classifier to find all positive instances. For each class, it is
defined as the ratio of true positives (TP) to the sum of true positives and false negatives
(FN). In other words, recall is the fraction of positives that were correctly identified.

$$\text{Recall} = \frac{TP}{TP + FN}$$

5.5.5 F1 score

The F1 score is the harmonic mean of precision and recall; the best score is 1.0 and the
worst is 0.0. Because it combines precision and recall, the F1 score is often lower than
accuracy. As a rule of thumb, the weighted average of the per-class F1 scores, rather than
global accuracy, should be used to compare classifier models.

$$F_1 = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$$
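
The three definitions above can be applied per class directly to a confusion matrix. The following plain-Python sketch does this for a hypothetical 3-class matrix (rows are actual classes, columns are predicted classes) and also computes the support-weighted average. The matrix values are invented for illustration only.

```python
def per_class_metrics(matrix):
    """Return precision, recall, F1 and support for every class."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(matrix[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * recall * precision / (recall + precision)
              if recall + precision else 0.0)
        results.append({"precision": precision, "recall": recall,
                        "f1": f1, "support": tp + fn})
    return results


def weighted_average(results, key):
    """Average a metric over classes, weighting each class by its support."""
    total = sum(r["support"] for r in results)
    return sum(r[key] * r["support"] for r in results) / total


# Hypothetical confusion matrix for three classes.
example_matrix = [[1, 1, 0],
                  [0, 2, 0],
                  [1, 0, 2]]
metrics = per_class_metrics(example_matrix)
weighted_f1 = weighted_average(metrics, "f1")
print(metrics, weighted_f1)
```

In a multi-class setting, each class is treated in turn as the positive class and all other classes as negative.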

5.6 Discussion
We evaluated our model with two different datasets. The Ekush dataset was used for training
and validation of the model. We split it 80% : 20%, with 80% used for training and 20% used
for validation. The BanglaLekha-Isolated dataset was used for testing the model. After 50
epochs of training, we obtained 99.38% training accuracy, 95.19% validation accuracy and
94.47% testing accuracy.
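
A minimal sketch of an 80% : 20% split is shown below in plain Python. The shuffling, the fixed seed and the stand-in sample list are assumptions for illustration. The text does not state how the split was drawn or whether it was stratified by class.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=42):
    """Shuffle indices reproducibly, then cut off the validation share."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(samples) * validation_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [samples[i] for i in train_idx], [samples[i] for i in val_idx]


# Stand-in for a list of labelled images (hypothetical data).
samples = list(range(1000))
train, val = train_validation_split(samples)
print(len(train), len(val))
```

With many classes, a stratified split that preserves the class proportions in both parts may be preferable.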

Chapter 6

Conclusion
Bangla handwritten character recognition (HCR) is the ability of a computer to receive and
intelligently interpret Bangla handwritten characters as input from sources such as paper
documents, photographs, touch-screens and other devices. In this thesis, we used character
images for Bangla HCR. We used two Bangla handwritten character datasets: Ekush and
BanglaLekha-Isolated. The Ekush dataset was used for training and validation of our model.
For this purpose we split it 80% : 20%, with 80% used for training and 20% used for
validation. The BanglaLekha-Isolated dataset was used for testing the model. We designed a
CNN model and trained it on the Ekush dataset so that it can identify handwritten Bangla
characters efficiently. To this end, the Bangla character images were collected and passed
through several stages: pre-processing, feature extraction using the CNN, and classification.

We tuned different parameters to obtain the highest accuracy, including the optimizer and the
learning rate. Of the SGD (Stochastic Gradient Descent) and Adam optimizers, Adam performed
better, with a learning rate of 0.001. After 50 epochs of training, we obtained 99.38%
training accuracy, 95.19% validation accuracy and 94.47% testing accuracy, and we are
satisfied with the performance of our model.

References
[1] E. Kavallieratou, N. Liolios, E. Koutsogeorgos, N. Fakotakis, G. Kokkinakis,
"The GRUHD Database of Greek Unconstrained Handwriting", ICDAR, 2001,
pp. 561-565.

[2] F. Yin, Q.-F. Wang, X.-Y. Zhang, and C.-L. Liu. ICDAR 2013 Chinese
Handwriting Recognition Competition. 2013 12th International Conference on
Document Analysis and Recognition (ICDAR), pages 1464–1470, 2013.

[3] B. Zhu, X.-D. Zhou, C.-L. Liu, and M. Nakagawa. A robust model for on-line
handwritten Japanese text recognition. IJDAR, 13(2):121–131, Jan. 2010.

[4] B. B. Chaudhuri, U. Pal, “A complete printed Bangla OCR system,” Pattern
Recognition, vol. 31, pp. 531–549, 1998.

[5] U. Pal, “On the development of an optical character recognition (OCR) system
for printed Bangla script,” Ph.D. Thesis, 1997.

[6] U. Pal, B. B. Chaudhuri, “Automatic recognition of unconstrained offline
Bangla hand-written numerals,” T. Tan, Y. Shi, W. Gao (Eds.), Advances in
Multimodal Interfaces, Lecture Notes in Computer Science, vol. 1948, Springer,
Berlin, pp. 371–378, 2000.

[7] K. Roy, S. Vajda, U. Pal, B.B. Chaudhuri, “A system towards Indian postal
automation,” Proceedings of the Ninth International Workshop on Frontiers in
Handwritten Recognition (IWFHR-9), pp. 580–585, October 2004.

[8] U. Pal, A. Belad, Ch. Choisy, “Touching numeral segmentation using water
reservoir concept,” Pattern Recognition Lett. vol. 24, pp. 261-272, 2003.

[9] Halima Begum et al, Recognition of Handwritten Bangla Characters using
Gabor Filter and Artificial Neural Network, International Journal of Computer
Technology & Applications, Vol 8(5), 618-621, ISSN: 2229-6093.

[10] Nibaran Das, Sandip Pramanik, “Recognition of Handwritten Bangla Basic
Character and Digit Using Convex Hull Basic Feature”. 2009 International
Conference on Artificial Intelligence and Pattern Recognition (AIPR-09).

[11] N. Das, B. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, “Handwritten
Bangla Basic and Compound character recognition using MLP and SVM
classifier,” Journal of Computing, vol. 2, Feb. 2010.

[12] T.K. Bhowmik, U. Bhattacharya, S.K. Parui, Recognition of Bangla
handwritten characters using an MLP classifier based on stroke features, in:
Proceedings of ICONIP, Kolkata, India, 2004, pp. 814–819.

[13] K. Roy, U. Pal, F. Kimura, Bangla handwritten character recognition, in:
Proceedings of the Second Indian International Conference on Artificial
Intelligence (IICAI), 2005, pp. 431–443.

[14] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, Handwritten
Bangla alphabet recognition using an MLP based classifier, in: Proceedings of
the Second National Conference on Computer Processing of Bangla, Dhaka,
2005, pp. 285–291.

[15] A.F.R. Rahman, R. Rahman, M.C. Fairhurst, Recognition of handwritten
Bengali characters: a novel multistage approach, Pattern Recognition 35 (2002)
997–1006.

[16] U. Bhattacharya, S.K. Parui, M. Sridhar, F. Kimura, Two-stage recognition of
handwritten Bangla alphanumeric characters using neural classifiers, in:
Proceedings of the Second Indian International Conference on Artificial
Intelligence (IICAI), 2005, pp. 1357–1376.

[17] U. Bhattacharya, M. Sridhar, S.K. Parui, On recognition of handwritten Bangla
characters, in: Proceedings of the ICVGIP-06, Lecture Notes in Computer
Science, vol. 4338, 2006, pp. 817–828.

[18] AKM Shahariar Azad Rabby, Sadeka Haque, Md. Sanzidul Islam, Sheikh
Abujar and Syed Akhter Hossain, On bornoNet: Bangla Handwritten
Characters Recognition Using Convolutional Neural Network, in: ICACC-2018.

[19] Ahmed El-Sawy, Mohamed Loey, Hazem EL-Bakry, On arabic Handwritten
Characters Recognition using Convolutional Neural Network, in: WSEAS
Transactions on Computer Research, January 2017.
