Sufi An 2020

Journal of King Saud University – Computer and Information Sciences xxx (xxxx) xxx
Contents lists available at ScienceDirect
Journal of King Saud University –

Computer and Information Sciences
journal homepage: www.sciencedirect.com
BDNet: Bengali Handwritten Numeral Digit Recognition based on

Densely connected Convolutional Neural Networks
Abu Sufian a,⇑, Anirudha Ghosh a, Avijit Naskar a, Farhana Sultana a, Jaya Sil b, M.M. Hafizur Rahman c
a
Dept. of Computer Science, University of Gour Banga, West Bengal, India
b
Dept. of Computer Science & Technology, IIEST Shibpur, West Bengal, India
c
Dept. of Computer Networks and Communications, CCSIT, King Faisal University, Al Ahsa 31982, Saudi Arabia
a r t i c l e i n f o a b s t r a c t
Article history: Images of handwritten digits are different from natural images as the orientation of a digit, as well as sim-
Received 31 December 2019 ilarity of features of different digits, makes confusion. On the other hand, deep convolutional neural net-
Revised 20 February 2020 works are achieving huge success in computer vision problems, especially in image classification. Here,
Accepted 1 March 2020
we propose a task-oriented model called Bengali handwritten numeral digit recognition based on densely
Available online xxxx
connected convolutional neural networks (BDNet). BDNet is used to classify (recognize) Bengali hand-
written numeral digits. It is end-to-end trained using ISI Bengali handwritten numeral dataset. During
Keywords:
training, untraditional data preprocessing and augmentation techniques are used so that the trained
Bengali digit recognition
CNN
model works on a different dataset. The model has achieved the test accuracy of 99.78% (baseline was
Dataset 99.58%) on the test dataset of ISI Bengali handwritten numerals. So, the BDNet model gives 47.62% error
Deep learning reduction compared to previous state-of-the-art models. Here we have also created a dataset of 1000
Handwritten numerals images of Bengali handwritten numerals to test the trained model, and it giving promising results.
Image classification Codes, trained model and our own dataset are available at C :https://github.com/Sufianlab/BDNet.
Ó 2020 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an
open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction The recent success of deep learning, especially Convolutional Neu-

ral Network (CNN) for computer vision (Ghosh et al., 2019;
Bangla (Bengali) is the second most spoken language in India. It Krizhevsky et al., 2012; LeCun et al., 2015; Gu et al., 2018
ranks fifth in Asia and it is also in the top ten spoken languages in Sultana et al., 2018) has inspired many researchers to use the
the world (Majumder et al., 2007). So, a huge number of people CNN to recognize handwritten characters and digits as a computer
depend on this language for their day to day communication. vision task. The BDNet, we proposing through this paper, is a task-
Therefore, automatic recognition of Bengali handwritten charac- oriented deep CNN based model to classify Bengali numeral digits.
ters and numeral digits are needed to be digitized for making the This model trained using untraditional preprocessing and data
communication smoother. Many research works and models have augmentation techniques for trained with generalization. It has
been proposed to recognize Bengali handwritten characters and achieved new baseline accuracy on benchmark datasets
numeral digits so far, but still, a huge scope is there to improve this (Bhattacharya and Chaudhuri, 2009). Here we have also proposed
task in terms of accuracy and applicability. Most of the previously a new dataset of 1000 images of Bengali handwritten numerals.
proposed models are based on traditional pattern recognition and This own dataset used to test the trained model, and it giving
machine learning techniques where human expertise is required promising results but this new dataset not used during training.
for feature engineering (Bag and Harit, 2013; Pal et al., 2012). The working pipeline of the BDNet is designed by an inspiration
of DenseNet (Huang et al., 2017) which is one of the state-of-the-
art deep CNN algorithm for image classification. The conceptual
⇑ Corresponding author. view of the BDNet has shown in Fig. 1 and details of the model
E-mail address: sufian.csa@gmail.com (A. Sufian). have explained in Section 3.
Peer review under responsibility of King Saud University.
1.1. Contributions of this paper
A deep CNN working model, called BDNet, is designed to recog-

Production and hosting by Elsevier nize Bengali handwritten numerals.
https://doi.org/10.1016/j.jksuci.2020.03.002
1319-1578/Ó 2020 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Please cite this article as: A. Sufian, A. Ghosh, A. Naskar et al., BDNet: Bengali Handwritten Numeral Digit Recognition based on Densely connected Con-
volutional Neural Networks, Journal of King Saud University – Computer and Information Sciences, https://doi.org/10.1016/j.jksuci.2020.03.002
2 A. Sufian et al. / Journal of King Saud University – Computer and Information Sciences xxx (xxxx) xxx
In Pal et al. (2006) proposed a scheme where unconstrained off-

line Bengali handwritten numerals were recognized. This scheme
has recognized different handwritten styles. The scheme selects
the required features using the concept of water overflow from
the reservoir, and also collect topological and structural features
of the numerals. They applied this scheme on their own collected
dataset of size 12000 and obtained recognition accuracy of around
92.8%. U. Bhattacharya and B. B. Choudhury presented a handwrit-
ten numeral database along with the Devanagari database and pro-
posed a classifier model (Bhattacharya and Chaudhuri, 2009). Their
database contains 23392 handwritten Bengali numeral images.
Their classifier model is a multi-stage cascaded recognition scheme
where they used wavelet-based multi-resolution representations
and multilayer perception as classifiers. They have mentioned
99.14% training and 98.20% testing accuracy on this dataset.
Cheng-Lin Liu and Ching Y. Suen proposed a benchmark model
(Liu and Suen, 2009) on the ISI numeral dataset (Bhattacharya
and Chaudhuri, 2009) along with a Farsi numeral database. They
preprocessed the dataset into gray-scale images and applied many
traditional feature extraction models. This benchmark model
Fig. 1. Overview of the BDNet model.
achieved the highest test accuracy of 99.40%. Ying Wen and
Lianghua He proposed a classifier model (Wen and He, 2012) for
The BDNet is trained end-to-end with unorthodox data prepro- Bengali handwritten numeral recognition. This model tried to solve
cessing and augmentation. large dataset high dimensionality problem. They combined Baye-
The proposed model has achieved the highest test accuracy sian discriminant with kernel approach with UCI dataset and
with 47.62% error reduction on baseline result (99.78% whereas another dataset such as MNIST (Leun, 2020). The rate of error is
the previous best was 99.58%) on the test data of ISI Bengali 1.8%, the recognition rate is 99.08% and recognition time is
handwritten numeral dataset. 7.46 ms. Local region identification, where optimal unambiguous
A new dataset is created with 1000 samples of handwritten features are extracted, is one of the crucial tasks in the field of
Bengali numerals for testing performance of the trained BDNet character recognition. This idea is adopted by N. Das et al. in their
where the model gives promising results. handwritten digit recognition technique (Das et al., 2012) based on
a genetic algorithm(GA). GA is applied to seven sets of local
regions. For each set, GA selects a minimal local region group with
1.2. Organization of the paper
a Support Vector Machine (SVM) based classifier. The whole digit
images are used for global feature extraction whereas local fea-
The rest of the paper is organized as follows: In Section 2, a lit-
tures are extracted for shape information. The number of global
erature review has done. Model details of BDNet are explained in
features is constant whereas the number of local features depends
Section 3. In Section 4, the dataset and preprocessing of data are
on the number of the local regions. The test accuracy rate was
explained. Training details are explained in Section 5 and result
95.50% for this model. M. K. Nasir and M. S. Uddin proposed a
analysis is in Section 6. Finally, the paper is concluded in Section 7.
scheme (Nasir and Uddin, 2013) where they used K-Means cluster-
ing, Bayes’ theorem and Maximum a Posteriori for feature extrac-
2. Literature Review tion, and for classification, SVM is used. After converting the
images into binary values, some points are found, which was dis-
For background analysis, we have reviewed two relevant carded using a flood fill algorithm. The plinth steps are Clipping,
things: one is existing research works on Bengali handwritten Segmentation, Horizontal, and Vertical Thinning Scan. Here test
numeral recognition and another is the advancements of deep accuracy rate was 99.33%. In Rahman et al. (2015) M. M. Rahaman
learning models for image classification. The first part of this et al. proposed a CNN based model. This method normalizes the
review is for baseline results and domain knowledge of target written character images and then employed CNN to classify
application, whereas later part is for the idea about the latest individual characters. It does not employ any feature extraction
trends of deep learning-based state-of-the-art algorithms. In this method like previously mentioned works. The major steps are
section, we have reviewed these two things in the following two pre-processing of raw images by converting them into a
subsections: gray-scale images and then training the model. In this case, test
accuracy was 85.36%. On another paper, we have seen the exis-
2.1. Existing works on Bengali handwritten numeral recognition tence of auto-encoder for unsupervised pre-training through Deep
CNN which consists of more than one hidden layer with 3 convo-
Bengali handwritten numeral recognition is one of the oldest lutional layers. Each layer was followed by 2 2 max-pooling
pattern recognition problems. Many researchers have been work- layer. This scheme Shopon et al. (2017) was proposed by Md
ing in this field since the 90s of the last century (Dutta and Shopon et al. The layers have 32 3 3 number of kernels. In
Chaudhury, 1993; Pal and Chaudhuri, 2000). Through this subsec- the same manner, the decoder has an architecture with each con-
tion, we have reviewed most notable works on this Bengali hand- volutional layer with 5 neurons, rather than 32. The ReLU (Maas
written numeral recognition. Subhadip Basu et al. proposed et al., 2013) activation is present in all layers. For training pur-
Handwritten Bangla Digit Recognition using Classifier Combination poses, the model enhanced the training dataset by randomly rotat-
Through Dempster-Shafer (DS) Technique (Basu et al., 2005). They ing each image between 0 degrees and 50 degrees and also by
have used the DS technique and MLP classifier for classification and shifting vertically by a random amount between 0 pixels and 6 pix-
also used 3-fold cross-validation on the training dataset of 6000 els. This model was trained in 3 various setups SCM, SCMA, and
handwritten samples. Their scheme achieved 95.1% test accuracy. ACMA. They have achieved a test accuracy of 99.50%. Another
A. Sufian et al. / Journal of King Saud University – Computer and Information Sciences xxx (xxxx) xxx 3
Fig. 2. Building blocks of a classical CNN model (Sultana et al., 2018).
model (Akhand et al., 2016), proposed by M. A. H. Akhand et al. recognition (Ghosh et al., 2019). Therefore, many successive
used pre-processing by using a simple rotation based approach state-of-the-art models came within a short span of time since
to produce patterns and it also makes all images of ISI handwritten 2012 (Sultana et al., 2018; Lim and Keles, 2018). In this subsection,
database into the same resolution, dimension, and size. CNN struc- we briefly reviewed the development of deep learning especially
ture of this model has two convolutional layers with 5 5 sized Convolutional Neural Networks in the field of image classifications.
local receptive fields and two sub-sampling layers with 2 2 sized CNN is a special type of multi-layer neural network inspired by the
local averaging areas along with input and output layers. The input vision mechanism of the animal (Ghosh et al., 2019). Hubel and
layer contains 784 receptive fields for 28 28 pixels image. The Wiesel experimented and said that visual cortex cells of animal
first convolutional operation produces six feature maps. Convolu- detect light in the small receptive field (Fukushima, 1980). Kuni-
tion operation with kernel spatial dimension of 5 reduces 28 spa- hiko Fukushima got motivation from this experiment and pro-
tial dimension to 24 (i.e., 28 þ 1 5) spatial dimension. posed a multi-layered neural network, called NEOCOGNITRON
Therefore, each first level feature map size is 24 24. The accuracy (Hubel and Wiesel, 1968), capable of recognizing visual patterns
rate of the testing is 98.45% on ISI handwritten Bengali numerals. hierarchically through learning. This model is considered as the
In Choudhury et al. (2018), A. Choudhury et al. proposed a his- inspiration for CNN. A classical CNN model is composed of one or
togram of oriented gradient (HOG) and color histogram for the more blocks of convolutional and sub-sampling or pooling layer,
selection of feature algorithms. Here, HOG is used as the feature then single or multiple fully connected layers, and an output layer
set to represent each numeral item at the feature space and SVM function as shown in Fig. 2. The benefits of using CNN are auto-
is used to produce the output from input. The test accuracy of this mated features extraction, parameter sharing and many more
algorithm is 98.05% on CMATERDB 3.1.1 dataset (which is a bench- (Lecun et al., 1998; Krizhevsky et al., 2012; Nanni et al., 2017).
mark Bengali handwritten numeral database created by CMATER Classical CNN is modified in many different ways according to tar-
lab of Jadavpur University, India). M. M. Hasan et al. proposed a get domain (Sultana et al., 2018; Liang et al., 2017; Baldominos
Bengali handwritten digit recognition model based on ResNet (He et al., 2018; Sultana et al., 2019). Yann LeCun et al. introduced
et al., 2016). Their ensemble model from their six best models, the first complete CNN model, called LeNet-5 (Lecun et al., 1998),
applied on the NumtaDB dataset (Alam, 2020), achieved to classify English handwritten digit images. It has 7 layers among
99.3359% test accuracy. In Noor et al. (2018) R. Noor et al. proposed which 3 convolutions, 2 average pooling, 1 fully connected, and 1
an ensemble model based Convolutional Neural Network for recog- output layer. They used SIGMOID function as the activation func-
nizing Bengali handwritten numerals. They train their model in tion for non-linearity before an average pooling layer. The output
many noisy conditions using customized NumtaDB dataset layer used Euclidean Radial Basis Function(RBF) for classification
(Alam, 2020). In all cases, their model achieved more than 96% test of MNIST (Leun, 2020) dataset. The weights of each layer were
accuracy on this NumtaDB dataset. A recent Bengali handwritten trained using the back-propagation algorithm (Rumelhart et al.,
numeral recognition work (Rabby et al., 2019) was there, proposed 1986). AlexNet (Krizhevsky et al., 2012) was the first CNN based
by AKM S. A. Rabby et.al. Here authors used a lightweight CNN model which won the ILSVRC challenge (Russakovsky et al.,
model to classify the handwritten numeral digits. This model is 2015) in 2012 with a significant reduction of error. AlexNet’s error
trained using ISI handwritten Bengali numeral (Bhattacharya and rate was 16.4% whereas the second best error rate was 26.17%. This
Chaudhuri, 2009) and CMATERDB 3.1.1 databases with 20% data model was proposed by Alex Krizhevsky et al. and it is trained by
for validation. Their work achieved validation and test accuracy using ImageNet dataset (Deng et al., 2009), this dataset contains
as 99.74% and 99.58% on ISI handwritten numerals, which is the 15 million high resolution labeled images over 22 thousand cate-
previous best accuracy on that dataset. Their model also shows gories. AlexNet has 11 trainable layers, and the structure is almost
good accuracy on other benchmark datasets such as CMATERDB similar to LeNet-5, but here max-pooling used instead of the
3.1.1 etc. M.S. Islam et.al proposed Bayanno-Net (Islam et al., average pooling, ReLU activation in place of the SIGMOID function,
2019), where they tried to recognize Bangla handwritten numerals softmax function in place of RBF, and 11 11 in-place of 5 5
using CNN. They worked on NumtaDB dataset (Alam, 2020). Their filter size in the first layer. In addition, for the first time dropout
model achieved 97% accuracy. A recent review article (Hoq et al., strategy, Srivastava et al. (2014) and GPU were used to train the
2020) written by M.N. Hoq et.al given a comparative overview of model. In Zeiler and Fergus (2013) Zeiler and Fergus presented
existing classification algorithms to recognized Bengali handwrit- ZFNet which was the winner of the ILSVRC challenge in 2013.
ten numerals. The building blocks of ZFNet is almost similar to AlexNet with
few changes such as the first layer filter size is 7 7 instead of
2.2. Advancements of deep learning for image classification 11 11 in AlexNet. Authors of ZFNet explained how CNN works
with the help of Deconvolutional Neural Networks (DeconvNet).
After the success of AlexNet (Krizhevsky et al., 2012), a deep DeconvNet is just the opposite of CNN. The error rate of ZFNet
learning-based model for image classification, many researchers was 11.7%. K. Simonyan and A. Zisserman proposed VGGNet
shifted to this area of research of computer vision and pattern (Simonyan and Zisserman, 2014), which is like a deeper version
of AlexNet. Here, authors used small filters 3 3 sizes for all layers. between channels. SENet has won the ILSVRC-2017 challenge with
They have used a total 6 different CNN configurations with differ- an error rate of 2.252%. Recently a new concept in convolutional
ent weight layers. This VGGNet secured 2nd place in ILSVRC chal- neural networks proposed through CapsNet (Sabour et al., 2017).
lenge in 2014 with an error rate of 7.3% just 0.6% more than the Here sub-sampling concepts are abolished and classification is pos-
error rate of the winner GoogLeNet (Szegedy et al., 2015). GoogLe- sible of images of different orientations. Though the algorithm is
Net, Going Deeper with Convolutions (Szegedy et al., 2015), is pro- not light for dynamic routing among capsules but it is extremely
posed by Christian Szegedy et al. which was a research team of promising (Patrick et al., 2019).
Google. The Structure of GoogLeNet is different from traditional
CNN, it is wider and deeper than previous models but computa-
tionally efficient. Through inception architecture, multiple parallel 3. BDNet Model Details
filters with different sizes are used, and for this, problems of van-
ishing gradient and over-fitting were tackled. Fully connected lay- The network architecture of the BDNet model has shown in Fig. 3.
ers are not used in GoogLeNet but the average pooling layer is used The BDNet consists of three Dense Blocks and two Transition Blocks
before the classifier. This model won the ILSVRC challenge 2014 followed by Batch Normalization(BN), Rectified Linear Unit(ReLU)
with an error rate of 6.7%. The increasing layer could give more activation, Average Pooling(Avg. POOLING), Fully Connected(FC)
accuracy but will suffer from vanishing gradient problem. To tackle Layer, Softmax Function with Output Layer. Each Dense Block is
this problem, Kaiming He et al. from Microsoft Research proposed made up of 6 bottleneck blocks. Structure of each bottleneck
ResNet (He et al., 2016). ResNet is a very deep model where each block is as follows: . . .! BatchNorm ! ReLU ! ConV2dð1 1Þ !
layer has a residual block with a skip connection to the layer before BatchNorm ! ReLU ! ConV2dð3 3Þ ! . . .. The number of bottle-
the previous layer. ResNet is the winner of the ILSVRC challenge neck blocks(NBL) per dense block has calculated using Eq. (1).
with an error rate of 3.57% and this is a success of beyond the
1 n4
human level. Gao Hunag et al. proposed DenseNet (Huang et al., NBL ¼ b c ð1Þ
2 3
2017), where every layer is connected to all previous layers of
the model. DenseNet overcomes the vanishing gradient problem where n is the number of layers of the network model. Dense connec-
as well as it collects required features of all layers and propagates tivity is present among bottleneck blocks of each dense block i.e. out-
to all successive layers in feed-forward fashions for features reuse. put of each bottleneck block is forwarded to all other successive
Therefore, this model requires less number of parameters to blocks for features propagation. The number of feature maps that will
achieve accuracy, so it is computationally efficient. Inspired by be forwarded depends on the growth rate, and here the growth rate is
the success of ResNet, Jie Hu et al. proposed SENet (Hu et al., 12. In between two dense blocks, we have used one transition block
2018) with the main focus to increase channel relationship which consists of: BatchNorm ! ReLU ! ConV: ! Av g:Polling. To
between successive layers. SENet has added ‘‘Squeeze-and-Excita make the model compact we reduce the number of feature maps.
tion” (SE) block into each block (ResNet Block), and for this, the Some hyper-parameters of the model has explained below and others
model adaptively re-calibrates channel-wise feature responses which are directly related to the training in Section 5.
Fig. 3. Structure of the BDNet.
Fig. 4. Typical Bengali handwritten numeral digits and corresponding printed values.
Number of hidden layers and units: It is preferably good to 4. Dataset and Preprocessing of the Dataset
add more layers when the test error is no longer decreasing in
existing layers. A small number of layers may lead to The Bengali language is mainly derived from the Brahmi script
under-fitting, on the other hand, having more layers is usually and Devanagari script in the 11th Century AD. The structural view
not suitable with appropriate regularization. But adding more lay- of each character and numeral of this language are very complex.
ers makes the model more complex and computation time will So, train a model using the Bengali digit is more difficult compared
increase. After careful experiments, we have used 39 hidden layers, to the English numeral digit as the English digits have a less com-
one fully connected(FC) layer in our model. Then we have used plex structure. In addition, English numerals datasets are easily
softmax function to output 10 classes. Model details are shown available in terms of quantity and quality such as MNIST (Leun,
in Fig. 3. Here, softmax function transforms predicted scores to 2020) but it is not easy for Bengali numeral datasets. Bengali digit
predicted probability scores using the following Eq. (2). also has some high similarity features for different numerals such
as numeral 1 (in Bengali) and numeral 9 (in Bengali) has high sim-
ezi
yî ¼ ð2Þ ilarity features, similarly numeral 5 (in Bengali) and 6 (in Bengali)
X
10
has high similarity features. The typical Bengali handwritten
e zj
j¼1
numerals and corresponding printed values as shown in Fig. 4.
where yî denotes prediction score of i-th digit or class. 4.1. Used dataset
Optimizer: BDNet trained through back-propagation
(Rumelhart et al., 1986) using optimizer. Here, weights updated The ISI Bengali numeral off-line handwritten dataset
using SGD (Stochastic Gradient Descent) (Bottou et al., 2010). The (Bhattacharya and Chaudhuri, 2009) is one of the largest popular
mathematical loss function JðwÞ as in Eq. (3) and it is basically datasets of handwritten Bengali numerals. This dataset consists
the difference between the updated internal parameters of a model of 23392 black and white image data written by 1106 persons col-
which are used for computing the values (yi ) from the set of inputs lected from postal mail and job application forms. Among these
(xi ) used in the model and the desired output (yî ). 23392 data, 19392 are training data and 4000 are testing data.
The entire dataset represents 10 classes for numeral 0 to 9. Some
X
n X
n
typical data items of this dataset shown in Fig. 5.
ðyi yî Þ
2
JðwÞ ¼ 1n J i ðwÞ ¼ 1n ð3Þ
i¼1 i¼1
4.2. Preprocessing of the datasets
As mentioned we have used the ISI Handwritten numeral data-

The working flow of optimizer for BDNet is as follows:
set to train the BDNet. But the data items that we have chosen for
Step 1: Initialization of the vector of parameters w and learning
this task is very untidy and cannot be used directly for our purpose.
rate g.
All the data were raw images in .tif format of different sizes. First,
Step 2: Repeat until an approximate minimum is found:
we have converted the raw images into gray-scale images of size
Step 2.1: Randomly shuffle items in the training set.
28 28, then inverted the colors in a way that the background
Step 2.2: for i ¼ 1 to n do:
became black and the font became white. After that gray-scale
w ¼ w grJ i ðwÞ
images are converted to RGB images of size 32 32 for better fea-
BDNet used random order of the training dataset. As the coeffi-
ture extraction using 3 channels. For the convenience to use the
cients are updated after each training data-sample, so the updates,
BDNet, we have created a CSV file to access the data samples.
as well as the lost function will be randomly jumping all over the
Fig. 6 is showing steps of preprocessing, and how converted data
place. By this randomized updates to the coefficients, it reduces
looks different from actual data after the preprocessing. Distribu-
random walk and avoids distraction.
tion of entire ISI handwritten numerals database as shown in
Activation Function: We have used Rectified Linear Unit
Table 1. Among the training datasets, 10% of data is used for 10-
(ReLU) (Maas et al., 2013) as activation function. ReLU function
fold cross-validation. As we have used unorthodox data augmenta-
as in Eq. 4 works for non-linearity.
tion techniques, so the size of the dataset virtually becomes large,
f ðxÞ ¼ maxð0; xÞ and for large dataset, 10-fold cross-validation worked fine.
Here x denotes the value of a pixel. ReLU removes negative values 4.3. Own dataset
from an activation map by setting them to zero. It increases the
nonlinear properties of the decision function and removes the We have also created our own dataset of 1000 images. Among
chances of vanishing gradient of BDNet without affecting the recep- these 1000 images, 100 images per digit are there for each Bengali
tive fields of the convolution layer. numeral digit from digits zero to nine. This dataset is created by 5
Table 1
Used Dataset Distribution.
Digit Training Sets Test Set

0 1933 400
1 1945 400
2 1945 400
3 1956 400
4 1945 400
5 1933 400
6 1930 400
7 1928 400
8 1932 400
9 1945 400
phone camera. Datasets are created with the focus to make it as

natural as the common people write the Bengali numerals in their
daily life. Each image of the dataset is then set to 28 28 pixels.
We have used this dataset only for testing to check the generic per-
formance of the trained BDNet.
5. Training Details
Fig. 5. Sample handwritten ISI numeral image data.
We have started our experiment by training the model using

the preprocessed labeled dataset mentioned above. Before describ-
laboratory members of this work with the help of some students. It
ing the training details, we have presented the system environ-
has done by writing the digits in standard pages using black or blue
ment and resources used for our work in Table 2. The design of
pens. Then we have scanned the written digits using the mobile
BDNet is inspired by the state-of-the-art algorithm DenseNet. A
new set of hyper-parameters here established and set the required
value for each required hyper-parameter is a very difficult task. It
could be done by trial and error method with careful observation
of the pattern of the data as well as by some mathematical analy-
sis. In a similar fashion, we have done hyper-parameter tuning of
required hyper-parameters of BDNet. Some are mentioned in Sec-
tion 3 others are described below.
Number of Epochs: As it is the number of time the entire train-
ing dataset passes through the model network. So, we could
increase the number of epochs until the training error becomes
small and the validation error is noticeable. For BDNet model, the
number epoch was set to 150 but the model converges around
125 epochs, and it took 37.68 s per epoch to train and validate
simultaneously(in our system mentioned in Table 2).
Weight initialization: We have initialized the weights with
small random numbers (between 0 and 1) to prevent dead neu-
rons, but not too small numbers to avoid zero gradients. Uniform
distribution usually works very well. Here, we have used seedð1Þ
as a python function to initialize the weights randomly.
Batch size: Mini-batch is usually preferable in the place where a
dataset is very large. It is usually used to create a partition in
between the dataset. Batch size doesn’t contribute much to the
precision but helps to control the training speed. We have used
the batch size as 32 and the batch size was 64 for testing.
Learning rate: In BDNet we have used the learning rate as 0.009
and after 80 epoch it has been changed as in Eq. (5).
Table 2
Used system specifications.
Resources Specifications
CPU R
Intel XeonR CPU @2:3GHz with 45 MB Cache
RAM 12.72 GB available
DISK 1 TB (Partially used)
GPU 1Nvidia Tesla T4 having 2560 CUDA cores,
16 GB(14.72 GB available) GDDR6 VRAM
Languages & Packages Python with PyTorch (Paszke et al., 2017)
Training & Validation Time 37.68 Seconds per Epoch
Fig. 6. Pre-processing steps.
g ¼ ðIntial gÞ ð0:15Þ ð5Þ

Weight decay: This is another hyper-parameter tuning where
each step’s current weights(W) are multiplied by a number slightly
less than 1. Weight decay is a regularization term that prevents
growing the number of parameters into a large number. It is
updated as Eq. (6)
@J
Wi ¼ Wi g gkW i ð6Þ
@W i
where J is the current loss, g is the learning rate and k is the weight
decay. After an experiment with several values, finally, the value of
the weight decay was set 1e-5.
Momentum: The concept of momentum is that previous
changes in the weights should influence the current direction of
movement in weight space. Sometimes these weights changes
stuck in a local minimum. To avoid these local minima, BDNet
has used momentum in the objective function, which is a value
between 0 and 1 that increases the size of the steps taken towards
the minimum by trying to jump from a local minimum. Here the
momentum value set is 0.9, and for this reason, speed and accuracy
improve.
Dropout for Regularization: Few techniques ideas make deep
learning popular and usable, and the dropout for regularization
(Srivastava et al., 2014) is one of them. Dropout is used for BDNet
to avoid over-fitting during training. The method simply drops out
some neurons randomly in the neural networks in each iteration of
training according to a threshold probability. Here we used the
dropout threshold probability as 0.09 which is a small probability
i.e. only 9% neuron dropout in each epoch.
Data Augmentation: Data augmentation is one of the impor-
tant parts which gives more versatility to extracted features and
helps to train the deep learning model more accurately. Here, we
have used data augmentation as the size of the dataset is not as
large as required. But the idea has been used with a slightly non-
traditional way as shown in Fig. 7. As an image of the handwritten
Fig. 7. Data augmentation steps.
numeral digits have some problems as we can not crop or rotate
largely. Here, we did augmentation on the training set for each
epoch as follows: adjusted the contrast of training samples by value of training loss decreased drastically and later it slowed
choosing the adjusting factor randomly from the list down as shown in Fig. 8. After 120 epochs, the error rate became
½1; 0:2; 0:3; 0:5; 0:6; 0:7; 0:8; 0:9; 1:3; 1:5, random rotation of train- very small almost between 0.008 to 0.006.
ing images from 15 to +15 degrees, and random zooming up to
9.1%. For this type of data augmentation, a slightly new dataset is
passed through the network in every iteration or epoch. 6.2. Training and validation accuracy
Cross-Validation: To train BDNet 10-fold random cross-
validation is used. We used 10% of training data only for 10-fold The training and validation accuracy has been observed simul-
cross-validation to validate the model during training for general- taneously. Here validation are used to reduced over-fitting the
ization without over-fitting. After one epoch, the training set is re- model during training. As we can see in Fig. 9, the increasing rate
sampled with 10-fold cross-validation. The cross-validation result of accuracy was very high during the initial training period and it
is mentioned in Section 6. gradually became very low. After 125 epochs, it was almost satu-
rated. We can also see from this figure that the training accuracy
was mostly dominated by validation accuracy, and it is happened
6. Result Analysis
because of data augmentation. Maximum training accuracy was
recorded 99.82% after epoch number 118, and maximum valida-
It has mentioned that the BDNet is trained using a pre-
tion accuracy was recorded 100% in many times such as after
processed ISI handwritten Bengali numeral database with data
97th, 107th, 118th, 120th epoch. Although validation accuracy(test
augmentation and 10-fold cross-validation. BDNet is tested using
accuracy is more vital for any task-oriented model) is not sufficient
the test dataset from the same database mentioned in Section 4.
for any model but here great validation recorded. This was
The trained model also tested using our own dataset described in
achieved because of data augmentation and preprocessing.
that section. The following subsections are showing some results
of the BDNet, found during training, validation and testing.
6.3. Test accuracy
6.1. Number of epoch vs training loss
It is mentioned that BDNet has achieved record-breaking test
At first, when the training started, the amount of error or train- accuracy on ISI Bengali handwritten numeral test dataset. BDNet
ing loss was very high and the value of error rate was between 0.95 achieved 99% test accuracy after 45 epochs, and at 125th epoch,
to 1.88. But with the increasing number of training epochs, the it achieved 99.78% test accuracy as shown in Fig. 10, which is the
Fig. 8. Number of epoch vs training loss.
Fig. 9. Number of epoch vs Training and Validation Accuracy.
current best result on the ISI handwritten numeral dataset. Details A total of 9 images are wrongly classified among all the test
of the testing results described in subSection 6.4. images of the dataset which are shown in Table 4. In the confusion
matrix, it has shown that which wrongly predicted image is classi-
fied into which class. After careful observation of the patterns of
6.4. Analysis through confusion matrix these 9 images, the possible reasons behind these wrong classifica-
tion could be understood.
Testing result of the BDNet on the test dataset of ISI handwrit-
ten Bengali numeral can be presented in the confusion matrix as 6.5. Standard deviation of test results
shown in Table 3. Clearly, we can see that among 10 numeral digits
of 4000 test images, only 9 were wrongly predicted or classified. To find how spread out the test accuracy in case of training the
Among 400 test images of the digit 0, one is incorrectly predicted model multiple times with the same hyper-parameter configura-
as 3, and one is 7. Similarly, 3 images of the digit 1, 1 image of tions, we calculate the standard deviation (r) by using Eq. (7).
the digit 2, 2 images of the digit 5, and 1 image of the digit 9 were vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
u N
predicted incorrectly. The details of wrong predictions are shown u1 X
in the confusion matrix in Table 3. The F1 score for this confusion
r¼t ðxi lÞ2 ð7Þ
N i¼1
matrix is 0.99775.
Fig. 10. Number of epoch vs Test Accuracy.
Table 3
Confusion matrix of the test result of the ISI handwritten Bengali numerals test dataset.
Predicted Class
Numerals 0 1 2 3 4 5 6 7 8 9 Accuracy(%)
Actual Class 0 398 0 0 1 0 0 0 1 0 0 99.50
1 0 397 1 0 0 0 0 0 0 2 99.25
2 0 0 399 0 0 1 0 0 0 0 99.75
3 0 0 0 400 0 0 0 0 0 0 100.00
4 0 0 0 0 400 0 0 0 0 0 100.00
5 0 0 0 0 1 398 1 0 0 0 99.50
6 0 0 0 0 0 0 400 0 0 0 100.00
7 0 0 0 0 0 0 0 400 0 0 100.00
8 0 0 0 0 0 0 0 0 400 0 100.00
9 0 0 0 0 0 1 0 0 0 399 99.75
Table 4 Table 6
Wrongly Classified Images. Notable Bengali handwritten numerals recognition models and corresponding test
accuracy in the benchmark dataset (Bhattacharya and Chaudhuri, 2009).
Digit Images Actual Class Predicted Class
0 3 Models Test accuracy
0 7 Bhattacharya and Chaudhuri (2009) 98.20%

Liu and Suen (2009) 99.40%
1 2
Das et al. (2012) 97.70%
1 9 Wen and He (2012) 99.40%
Akhand et al. (2016) 98.98%
1 9
Shopon et al. (2017) 99.35%
2 5 Rabby et al. (2019) 99.58%
BDNet (Proposed in this paper) 99.78%
5 4
5 6
9 5 where l is the mean of N values.

Now, we run our model 5 times with the same hyper-parameter
configurations and the achieved test accuracy has shown in Table 5
and we get l ¼ 99:75. So finally we get the standard deviation (r)
Table 5 of test accuracy as 0:015. Codes along with training details of all
Test accuracy of our model in 5-classes cases when we run the model with same
those 5 cases are available at: https://github.com/Sufianlab/BDNet.
hyper-parameter configurations.
Test Accuracy (%)

Cases 1 99.75 6.6. Comparison of test results with base-line-models
2 99.73
3 99.75 The most notable models proposed by researchers for Bengali
4 99.78
handwritten numerals recognition has been discussed in subSec-
5 99.75
tion 2.1. We have compared BDNet with some notable models
Fig. 11. Comparison of notable models and corresponding test accuracy.
Table 7
Confusion matrix of the test result on own test dataset.
Predicted Class
Numerals 0 1 2 3 4 5 6 7 8 9 Accuracy(%)
Actual Class 0 100 0 0 0 0 0 0 0 0 0 100
1 0 100 0 0 0 0 0 0 0 0 100
2 0 0 100 0 0 0 0 0 0 0 100
3 0 0 0 98 0 0 1 0 1 0 98
4 0 0 0 0 100 0 0 0 0 0 100
5 1 0 0 0 0 95 3 1 0 0 95
6 0 0 0 0 0 0 100 0 0 0 100
7 0 0 1 1 0 0 0 98 0 0 98
8 0 0 1 0 0 0 1 0 98 0 98
9 0 1 0 0 0 0 0 0 0 99 99
which are also worked on this benchmark dataset (Bhattacharya tion through image classification. The BDNet is trained using ISI
and Chaudhuri, 2009). Before BDNet, the previous two best models Bengali handwritten numerals dataset and the trained model has
(Liu and Suen, 2009; Rabby et al., 2019) achieved the test accuracy achieved a new benchmark accuracy on a test dataset. The trained
of 99.40% and 99.58% (authors of Rabby et al., 2019 claimed it) BDNet also tested using our own dataset to see the generalization
respectively whereas BDNet achieved 99.78%. All the notable mod- of the model, where a good result has been found.
els and corresponding test accuracies are shown in Table 6. Graph-
ical comparison of said models has shown in Fig. 11 where X-axis Declaration of Competing Interest
presents the models and Y-axis showed the corresponding test
accuracy in the benchmark dataset (Bhattacharya and Chaudhuri, The authors declare that they have no known competing finan-
2009). cial interests or personal relationships that could have appeared
to influence the work reported in this paper.
6.7. Test result on our own dataset
Acknowledgment
Our own test dataset is described in subSection 4.3 which has
First of all, the authors of the BDNet are thankful to CVPR Unit,
10 classes with 100 images per class. This test dataset is not used
Indian Statistical Institute, Kolkata, for providing the dataset for
during training for evaluating the generalization capability of the
academic research. The authors also thankful to the editor and
trained BDNet on a new dataset. As a result, the BDNet has got
reviewers of this journal for mentioned some comments on initial
98.80% test accuracy. The entire result has been shown in the con-
submission to implement in the revised version.
fusion matrix mentioned in Table 7.
7. Conclusion
References
Aim of the work was to propose a practical task-oriented model
to recognize Bengali handwritten numerals with good accuracy, Akhand, M.A.H., Ahmed, M., Rahman, M.M.H., 2016. Convolutional neural network
based handwritten bengali and bengali-english mixed numeral recognition. I.J.
and through this paper, we propose BDNet. BDNet is a densely con- Image Graph. Signal Process. 9. https://doi.org/10.5815/ijigsp.2016.09.06. 40–
nected deep CNN model for handwritten Bengali numeral recogni- 40.
Alam, S., Reasat, T., Doha, R.M., Humayun, A.I. Numtadb-assembled bengali Liu, C.-L., Suen, C.Y., 2009. A new benchmark on the recognition of handwritten
handwritten digits. URL:http://arxiv.org/abs/1806.02452. bangla and farsi numeral characters. Pattern Recogn. 42 (12), 3287–3295.
Bag, S., Harit, G., 2013. A survey on optical character recognition for bangla and https://doi.org/10.1016/j.patcog.2008.10.007.
devanagari scripts. Sadhana 38 (1), 133–168. https://doi.org/10.1007/s12046- Maas, A.L., Hannun, A.Y., Ng, A.Y., 2013. Rectifier nonlinearities improve neural
013-0121-9. network acoustic models. In: Proceedings of the 30th International Conference
Baldominos, A., Saez, Y., Isasi, P., 2018. Evolutionary convolutional neural networks: on Machine Learning. Atlanta, Georgia, USA.
an application to handwriting recognition. Neurocomputing 283, 38–52. Majumder, P., Mitra, M., Parui, S.K., Bhattacharyya, P., 2007. Initiative for indian
https://doi.org/10.1016/j.neucom.2017.12.049. language ir evaluation. In: The First International Workshop on Evaluating
Basu, S., Sarkar, R., Das, N., Kundu, M., Nasipuri, M., Basu, D.K., 2005. Handwritten Information Access (EVIA), pp. 14–16.
bangla digit recognition using classifier combination through ds technique. In: Nanni, L., Ghidoni, S., Brahnam, S., 2017. Handcrafted vs. non-handcrafted features
Pattern Recognition and Machine Intelligence, pp. 236–241. for computer vision classification. Pattern Recogn. 71, 158–172. https://doi.org/
Bhattacharya, U., Chaudhuri, B.B., 2009. Handwritten numeral databases of indian 10.1016/j.patcog.2017.05.025.
scripts and multistage recognition of mixed numerals. IEEE Trans. Pattern Anal. Nasir, M.K., Uddin, M.S., 2013. Hand written bangla numerals recognition for
Mach. Intell. 31 (3), 444–457. https://doi.org/10.1109/TPAMI.2008.88. automated postal system.https://doi.org/10.9790/0661-0864348.
Bottou, L., 2010. Large-scale machine learning with stochastic gradient descent. In: Noor, R., Mejbaul Islam, K., Rahimi, M.J., 2018. Handwritten bangla numeral
Lechevallier, Y., Saporta, G. (Eds.), Proceedings of COMPSTAT’2010, Physica- recognition using ensembling of convolutional neural network. In: 21st ICCIT.
Verlag HD, Heidelberg. pp. 177–186. pp. 1–6.https://doi.org/10.1109/ICCITECHN.2018.8631944.
Choudhury, A., Rana, H.S., Bhowmik, T., 2018. Handwritten bengali numeral Pal, U., Chaudhuri, B.B., 2000. Automatic recognition of unconstrained off-line
recognition using hog based feature extraction algorithm. In: 2018 5th bangla handwritten numerals. In: Tan, T., Shi, Y., Gao, W. (Eds.), Advances in
International Conference on Signal Processing and Integrated Networks Multimodal Interfaces—ICMI 2000. Springer, Berlin Heidelberg, Berlin,
(SPIN), pp. 687–690. https://doi.org/10.1109/SPIN.2018.8474215. Heidelberg, pp. 371–378. https://doi.org/10.1007/3-540-40063-X_49.
Das, N., Sarkar, R., Basu, S., Kundu, M., Nasipuri, M., Basu, D.K., 2012. A genetic Pal, U., Chaudhuri, B.B., Belaid, A., 2006. A complete system for bangla handwritten
algorithm based region sampling for selection of local features in handwritten numeral recognition. IETE J. Res. 52 (1), 27–34. https://doi.org/10.1080/
digit recognition application. Appl. Soft Comput. 12 (5), 1592–1606. https://doi. 03772063.2006.11416437.
org/10.1016/j.asoc.2011.11.030. Pal, U., Jayadevan, R., Sharma, N., 2012. Handwriting recognition in indian regional
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., 2009. Imagenet: a large-scale scripts: a survey of offline techniques 11(1), 1:1–1:35.https://doi.org/10.1145/
hierarchical image database. In: The IEEE Conference on Computer Vision and 2090176.2090177.
Pattern Recognition (CVPR). Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison,
Dutta, A., Chaudhury, S., 1993. Bengali alpha-numeric character recognition using A., Antiga, L., Lerer, A., 2017. Automatic differentiation in pytorch. In: 31st
curvature features. Pattern Recogn. 26 (12), 1757–1770. https://doi.org/ Conference on Neural Information Processing Systems (NIPS 2017). Long Beach,
10.1016/0031-3203(93)90174-U. CA, USA.
Fukushima, K., 1980. Neocognitron: a self-organizing neural network model for a Patrick, M.K., Adekoya, A.F., Mighty, A.A., Edward, B.Y. Capsule networks a survey. J.
mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. King Saud Univ. Comput. Inf. Sci.https://doi.org/10.1016/j.jksuci.2019.09.014.
36 (4), 193–202. Rabby, A.S.A., Abujar, S., Haque, S., Hossain, S.A., 2019. Bangla handwritten digit
Ghosh, A., Sufian, A., Sultana, F., Chakrabarti, A., De, D., 2019. Fundamental concepts recognition using convolutional neural network. In: Abraham, A., Dutta, P.,
of convolutional neural network. In: et al., Recent Trends and Advances in Mandal, J.K., Bhattacharya, A., Dutta, S. (Eds.), Emerging Technologies in Data
Artificial Intelligence and Internet of Things. Springer, Cham. doi: 10.1007/978- Mining and Information Security. Springer, Singapore, pp. 111–122.
3-030-32644-9_36. Rahman, M.M., Akhand, M.A.H., Islam, S., Shill, P.C., Rahman, M.M.H., 2015. Bangla
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., handwritten character recognition using convolutional neural network. I.J.
Cai, J., Chen, T., 2018. Recent advances in convolutional neural networks. Image Graph. Signal Process. 8, 42–49. https://doi.org/10.5815/
Pattern Recogn. 77, 354–377. https://doi.org/10.1016/j.patcog.2017.10.013. ijigsp.2015.08.05.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-
In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). propagating errors. Nature 323. https://doi.org/10.1038/323533a0. 533–436.
Hoq, M.N., Islam, M.M., Nipa, N.A., Akbar, M.M., 2020. A comparative overview of Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy,
classification algorithm for bangla handwritten digit recognition. In: A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L., 2015. Imagenet large scale
Proceedings of International Joint Conference on Computational Intelligence, visual recognition challenge. Int. J. Comput. Vision 115 (3), 211–252. https://
pp. 265–277. doi.org/10.1007/s11263-015-0816-y.
Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q., 2017. Densely connected Sabour, S., Frosst, N., Hinton, G.E., 2017. Dynamic routing between capsules. In:
convolutional networks. In: The IEEE Conference on Computer Vision and Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S.,
Pattern Recognition (CVPR). Garnett, R. (Eds.), Advances in Neural Information Processing Systems 30.
Hubel, D.H., Wiesel, T.N., 1968. Receptive fields and functional architecture of Curran Associates Inc, pp. 3856–3866.
monkey striate cortex. J. Physiol. (London) 195, 215–243. https://doi.org/ Shopon, M., Mohammed, N., Abedin, M.A., 2017. Image augmentation by blocky
10.1007/BF00344251. artifact in deep convolutional neural network for handwritten digit recognition.
Hu, J., Shen, L., Sun, G., 2018. Squeeze-and-excitation networks. In: The IEEE In: 2017 IEEE International Conference on Imaging, Vision Pattern Recognition
Conference on Computer Vision and Pattern Recognition (CVPR). (icIVPR). pp. 1–6.
Islam, M.S., Foysal, M.F.A., Noori, S.R.H., 2019. Bayanno-net: Bangla handwritten Simonyan, K., Zisserman, A. Very deep convolutional networks for large-scale image
digit recognition using convolutional neural networks. 2019 IEEE Region 10 recognition, CoRR abs/1409.1556. URL:http://arxiv.org/abs/1409.1556.
Symposium (TENSYMP), 23–27. https://doi.org/10.1109/ Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014.
TENSYMP46218.2019.8971167. Dropout: a simple way to prevent neural networks from overfitting. J. Mach.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep Learn. Res. 15, 1929–1958.
convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Sultana, F., Sufian, A., Dutta, P., 2018. Advancements in image classification using
Weinberger, K.Q. (Eds.), Advances in Neural Information Processing Systems convolutional neural network. In: 2018 Fourth International Conference on
25. Curran Associates Inc, pp. 1097–1105. Research in Computational Intelligence and Communication Networks
Lecun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to (ICRCICN), pp. 122–129. https://doi.org/10.1109/ICRCICN.2018.8718718.
document recognition. Proceedings of the IEEE, vol. 86, pp. 2278–2324. https:// Sultana, F., Sufian, A., Dutta, P. A review of object detection models based on
doi.org/10.1109/5.726791. convolutional neural network, CoRR abs/1905.01614. URL:http://arxiv.org/abs/
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444. https:// 1905.01614.
doi.org/10.1038/nature14539. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke,
Leun, Y. The mnist database of handwritten digits. URL:https://ci.nii.ac.jp/naid/ V., Rabinovich, A., 2015. Going deeper with convolutions. Comput. Vis. Pattern
10027939599/en/. Recogn. URL:http://arxiv.org/abs/1409.4842.
Liang, T., Xu, X., Xiao, P., 2017. A new image classification method based on Wen, Y., He, L., 2012. A classifier for bangla handwritten numeral recognition.
modified condensed nearest neighbor and convolutional neural networks. Expert Syst. Appl. 39 (1), 948–953. https://doi.org/10.1016/j.eswa.2011.07.092.
Pattern Recogn. Lett. 94, 105–111. https://doi.org/10.1016/j.patrec.2017.05.019. Zeiler, M.D., Fergus, R. Visualizing and understanding convolutional networks. CoRR
Lim, L.A., Keles, H.Y., 2018. Foreground segmentation using convolutional neural abs/1311.2901.
networks for multiscale feature encoding. Pattern Recogn. Lett. 112, 256–262.
https://doi.org/10.1016/j.patrec.2018.08.002.

Sufi An 2020

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Sufi An 2020

Uploaded by

Copyright:

Available Formats

Journal of King Saud University – Computer and Information Sciences xxx (xxxx) xxx

Contents lists available at ScienceDirect

Journal of King Saud University –

BDNet: Bengali Handwritten Numeral Digit Recognition based on

1. Introduction The recent success of deep learning, especially Convolutional Neu-

1.1. Contributions of this paper

A deep CNN working model, called BDNet, is designed to recog-

In Pal et al. (2006) proposed a scheme where unconstrained off-

Fig. 2. Building blocks of a classical CNN model (Sultana et al., 2018).

Fig. 3. Structure of the BDNet.

As mentioned we have used the ISI Handwritten numeral data-

Digit Training Sets Test Set

phone camera. Datasets are created with the focus to make it as

We have started our experiment by training the model using

g ¼ ðIntial gÞ ð0:15Þ ð5Þ

Fig. 8. Number of epoch vs training loss.

Fig. 9. Number of epoch vs Training and Validation Accuracy.

Fig. 10. Number of epoch vs Test Accuracy.

0 7 Bhattacharya and Chaudhuri (2009) 98.20%

9 5 where l is the mean of N values.

Test Accuracy (%)

Fig. 11. Comparison of notable models and corresponding test accuracy.

You might also like