978-1-5386-1150-0/17/$31.00 ©2017 IEEE
Fig. 1: Dissimilarity between patterns in the same handwritten Bangla character written by different individuals.

have the repetition of the same pattern, for example “ ”, “ ”, “ ” and “ ”, “ ”, “ ”, which makes the classification task more challenging. Finally, writing style varies from person to person, and the geometric structure of the characters fluctuates in size and angle; for example, in Fig. 1 the same letter has a different shape depending on the writer. To solve the aforementioned challenges, in this paper we propose a modified ResNet-18 architecture for the Bangla HCR problem. The contribution of this paper is two-fold: 1) we propose a modified ResNet-18 architecture which is capable of end-to-end learning and achieves state-of-the-art classification performance on relatively large datasets; 2) we provide a comparative analysis of the performance of several state-of-the-art deep learning architectures on Bangla HCR, which can be used as a baseline for comparison in the future.

The rest of the paper is organized as follows. In Section II, we discuss the related work on Bangla HCR. The proposed method is described in Section III. The experimental results and relevant analysis are provided in Section IV. Finally, the conclusion is given in Section V.

II. RELATED WORK

In this section, we briefly discuss the related work on Bangla HCR. Roy et al. [9] pioneered Bangla optical character recognition (OCR) and introduced Bangla character recognition research. Following their path, many researchers investigated several methods for improving the performance of Bangla OCR [10], [11], [12], [13], [14], [15], [16], [17]. Hasnat et al. [18] proposed a domain-specific OCR which classifies machine-printed as well as handwritten characters. For feature extraction they apply the Discrete Cosine Transform (DCT) over the input image, and for classification a Hidden Markov Model (HMM) is used. They also used a simple error-correcting module that can correct splitting errors caused by a combination of over-thresholding and segmentation problems. Wen and Lui [19] proposed a Bangla numeral recognition method using Principal Component Analysis (PCA) and a Support Vector Machine (SVM). Liua et al. [20] proposed a method for recognizing Bangla and Farsi numerals. In [21], a local binary pattern based feature extraction method was used together with a K-NN classifier. Nibaran et al. [22] proposed a feature set representation for Bangla handwritten alphabet recognition. Their feature set is a combination of 24 shadow features, 8 distance features, 16 centroid features, and 84 quad-tree based longest-run features, with which they achieved 85.40% accuracy on a 50-class character dataset. The above mentioned methods mainly used handcrafted features extracted from small datasets, which made them impractical for deployment in commercial applications.

With recent methods [23], [24] using CNNs, Bangla handwritten character and digit recognition got a performance boost on relatively large-scale datasets. Sharif et al. [25] proposed a method for Bangla handwritten numeral classification which bridges handcrafted HOG features with a CNN. Das et al. [26] proposed a two-pass soft-computing approach for Bangla HCR; the first pass combines the highly misclassified classes so that they receive finer treatment in the second pass. Sarkhel et al. [27] formulated the problem from a multi-objective perspective, training an SVM classifier on the most informative regions of the characters. Although the recent methods achieve higher accuracies than the earlier approaches, a significantly large margin is left for improving the performance.

III. PROPOSED METHOD

The shapes of Bangla handwritten characters and digits are geometrically horizontal, structurally less rectangular, and more spiral. Additionally, most of the conjunct characters have convoluted edges and similar patterns with small differences, which sometimes makes them hard to distinguish even for the human eye (especially when a character appears in isolation). In order to classify such a widely varied yet strongly similar character set, we need to deploy a classifier which is robust in discriminating similar patterns. As ResNet is a proven architecture for classifying a large number of classes, we propose a modified version of the ResNet-18 architecture which is particularly robust in classifying isolated Bangla handwritten characters.

A. Modified ResNet-18 Architecture

A deep residual network (ResNet) is composed of stacked entities with identity loops, referred to as modules. Each module consists of multiple convolutional layers that learn features from the input space. ResNet is a proven architecture, having won the 2015 edition of the ImageNet challenge. In this paper, we use a ResNet architecture with a modification that makes it particularly robust in classifying Bangla characters. The proposed modified ResNet architecture is as follows. A typical ResNet module takes an input x and generates F(x) through pairs of convolutional and ReLU layers. The generated F(x) is then added to the input x, computed as H(x) = F(x) + x. In the modified ResNet module, we add a Dropout [28] layer after the second convolutional layer. By adding the Dropout layer, each module produces a more generalized output with increased regularization. In the literature, Dropout is used by many architectures and is mainly applied to layers having a large number of parameters, to prevent feature co-adaptation and overfitting. Although it is sometimes used as a substitute for batch normalization [29], some work, such as [30], showed that batch normalization together with Dropout generalizes better. Inspired by their findings, we apply the above mentioned modification in the proposed architecture. We keep the max pooling at 3 × 3
Fig. 2: Proposed modified ResNet-18 architecture for Bangla HCR. In the diagram, conv stands for convolutional layer, Pool stands for max-pooling layer, batch norm stands for batch normalization, Relu stands for the rectified linear unit activation layer, Sum stands for the addition in ResNet, and FC stands for the fully connected hidden layers. In this architecture, we have eight ResNet modules, which are modified by adding a dropout layer after the second convolutional layer.
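To make the modified module concrete, the residual computation H(x) = F(x) + x with dropout on the residual branch can be sketched in plain NumPy. This is an illustrative sketch, not the paper's implementation: the two convolutional layers are stood in for by dense weight matrices, and the function name and shapes are hypothetical. The default drop rate of 0.2 matches the rate selected experimentally later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def modified_residual_block(x, w1, w2, drop_rate=0.2, training=True):
    """One modified ResNet module: two weight layers (dense stand-ins for
    the convolutions), dropout after the second one, then the identity
    shortcut H(x) = F(x) + x."""
    out = relu(x @ w1)          # first conv + ReLU (dense stand-in)
    out = out @ w2              # second conv (dense stand-in)
    if training:
        # inverted dropout: zero out ~drop_rate of activations, rescale the rest
        mask = (rng.random(out.shape) >= drop_rate).astype(out.dtype)
        out = out * mask / (1.0 - drop_rate)
    return relu(out + x)        # add the identity shortcut, final ReLU

# With zero weights the branch F(x) vanishes and the block reduces to ReLU(x),
# illustrating why the shortcut makes very deep stacks easy to optimize.
x = rng.standard_normal((4, 8))
zeros = np.zeros((8, 8))
assert np.allclose(modified_residual_block(x, zeros, zeros), relu(x))
```

Note that the shortcut path is left untouched by dropout; only the learned residual F(x) is regularized before the addition.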
TABLE I: Configuration detail of the convolutional layers in the ResNet-18 architecture.

Layer     Output        Layer Information
Conv1     112 × 112     7 × 7, 64, stride 2
Conv2.1   56 × 56       3 × 3 maxpool, stride 2; 3 × 3, 64
Conv2.2   56 × 56       3 × 3, 64
Conv3.1   28 × 28       3 × 3, 128
Conv3.2   28 × 28       3 × 3, 128
Conv4.1   14 × 14       3 × 3, 256
Conv4.2   14 × 14       3 × 3, 256
Conv5.1   7 × 7         3 × 3, 512
Conv5.2   7 × 7         3 × 3, 512
          1 × 1         Average pooling, 84-d FC, Softmax
FLOPs     1.8 × 10^9

with a stride of 2 × 2, since decreasing the pooling size or stride does not enhance the performance when the input image size is larger than 100 × 100 pixels. We use Softmax after the fully connected layers as default. Figure 2 shows the proposed modified ResNet-18 architecture and Table I shows the configuration detail of the ResNet-18 architecture. We experimented with Root Mean Square Propagation (RMSProp), Adam [31], and Stochastic Gradient Descent (SGD) optimizers to minimize the categorical cross-entropy loss function. The Adam optimizer is a memory-efficient and computationally fast optimization technique based on adaptive estimates of lower-order moments. Experimentally, the Adam optimizer outperformed both RMSProp and SGD.

B. Input Processing

In order to have a wider variation in the input, for the purpose of generalized network performance, we preprocess the input images by inverting them, removing noise with a median filter, applying an edge-thickening filter, and resizing each image to a square shape with appropriate padding, as done by default in [7]. Our input images are also diversified by adding elastic distortions. Inspired by [32], data augmentation was done on the datasets using elastic distortions with width and height shifting; the range was kept at 0.4 for this shifting. Data augmentation adds variety to the datasets, which ensures that the network observes different samples during the training phase.

IV. EXPERIMENTAL RESULTS

We present the experimental results and performance analysis in this section. We conduct experiments on an Ubuntu machine with an Intel Core i3-2120 (3.30 GHz) CPU, 12 GB RAM, and an Nvidia 1050Ti 4GB GPU. The proposed modified ResNet-18 architecture is implemented in Keras [33] with the Tensorflow [34] backend.

A. Datasets

In order to train and measure the performance of the proposed method, we use two recently introduced large datasets, the BanglaLekha-Isolated dataset [7] and the CMATERdb dataset [8]. The BanglaLekha-Isolated dataset is the latest publicly available Bangla handwritten character dataset with 84 classes, where 50 classes are vowels and consonants, 10 classes are numerals, and 24 classes are frequently used conjunct characters. This dataset contains 166,105 images in total, where the training set consists of 132,884 images and the test set of 33,221 images. In particular, it contains 98,950 simple vowels and consonants, 19,748 digits, and 47,407 commonly appearing conjunct consonants. The image sizes in this dataset vary from 110 × 110 to 220 × 220 pixels. The handwriting was collected from the 4 to 27 year age group, and a small portion of the samples was collected from physically disabled individuals. Figure 3 (a) shows a few examples taken from the dataset. The CMATERdb dataset is another large dataset with 231 classes of images. Among the classes, 50 belong to simple vowels and consonants, 10 to numerals, and the remaining 172 to conjunct consonant classes. Figure 3 (b) shows a few examples taken from the CMATERdb dataset. The BanglaLekha-Isolated and CMATERdb datasets are considerably larger than the dataset proposed in [35].

Fig. 3: Example images of Bangla characters taken from a) the BanglaLekha-Isolated dataset, b) the CMATERdb3 dataset.

B. Experiments

As the image size varies significantly in the BanglaLekha-Isolated dataset, selecting the right input image size is crucial for achieving the optimum classification performance. We conduct an experiment to find the optimum input size. The experimental results are given in Fig. 4. As can be seen, for image size 112 × 112 the proposed modified ResNet-18 architecture performs the best.

Fig. 4: Classification performance of the proposed modified ResNet-18 architecture using different input image sizes.

The performance of the proposed method is fine-tuned using two hyperparameters. Firstly, we experiment with the effect of different optimizers on the classification performance. In this experiment, we measure the performance using three state-of-the-art optimizers, namely RMSProp, Adam, and SGD, on 110 × 110 input images. The experimental results are given in Fig. 5 (a). As can be seen, using the Adam optimizer we achieve 0.4% and 0.1% performance boosts over RMSProp and SGD respectively. Based on this analysis, we decided to use the Adam optimizer in the rest of the experiments. Secondly, we investigate the performance of the proposed method with different dropout rates. As can be seen in Fig. 5 (b), the proposed method performs best with a dropout rate of 0.2. The rest of the experiments are done using a dropout rate of 0.2.

Fig. 5: Fine-tuning the performance by selecting the best hyperparameters. a) Classification performance of the proposed method using different optimizers. b) Classification performance with changing dropout rate. In this experiment, we use 112 × 112 input images.

The proposed ResNet-18 architecture is applied to the above mentioned two datasets to measure the performance. The training and validation curves are reported in Fig. 6 and Fig. 7 for the BanglaLekha-Isolated and CMATERdb datasets respectively.

We further investigate the performance of Bangla character recognition using several state-of-the-art CNN models. In this investigation, we use VGGNet-16, VGGNet-19, ResNet-18, ResNet-34, and the proposed method on the BanglaLekha-Isolated dataset. The classification performance is reported in Fig. 8. As can be seen, and as expected, the VGGNets perform worse than the ResNets. Using VGGNet-16 and VGGNet-19 we achieve 91.0% and 92.11% classification accuracies respectively, while we achieve 94.52% and 94.59% classification accuracies using ResNet-18 and ResNet-34 respectively. Even though the ResNet architectures are performing significantly
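As an aside on the width/height-shift augmentation described in the Input Processing subsection, the following is a minimal NumPy sketch, assuming shifts are drawn uniformly from ±40% of each dimension (the range of 0.4 stated in the paper) with zero-filled background; the function name and zero-fill choice are illustrative assumptions, and the elastic distortion step is omitted. In Keras, a comparable effect is obtained with ImageDataGenerator(width_shift_range=0.4, height_shift_range=0.4).

```python
import numpy as np

def random_shift(img, shift_range=0.4, rng=None):
    """Randomly shift a 2-D image along width and height by up to
    shift_range of each dimension; vacated pixels are zero-filled."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    dy = int(rng.uniform(-shift_range, shift_range) * h)
    dx = int(rng.uniform(-shift_range, shift_range) * w)
    out = np.zeros_like(img)
    # source and destination slices for the overlapping region
    ys, yd = (slice(-dy, h), slice(0, h + dy)) if dy < 0 else (slice(0, h - dy), slice(dy, h))
    xs, xd = (slice(-dx, w), slice(0, w + dx)) if dx < 0 else (slice(0, w - dx), slice(dx, w))
    out[yd, xd] = img[ys, xs]
    return out
```

Applying such a transform independently on every epoch means the network rarely sees the exact same pixel layout twice, which is the stated purpose of the augmentation.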
Figure 9 shows the confusion matrix of the classification. A strong spot around the lower right corner signifies the strong confusion between class 61 (“ ”) and class 72 (“ ”). In the experiment, we found that out of 405 test samples, 25 instances of class 61 are misclassified as class 72 (6.2% interclass confusion), and out of 397 test samples, 81 instances
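For completeness, the interclass confusion figure quoted above follows directly from the confusion-matrix counts. The sketch below uses a hypothetical 2 × 2 slice of the matrix restricted to classes 61 and 72, filled in from the reported totals (405 and 397 test samples); the helper name and the on-diagonal counts are illustrative assumptions.

```python
import numpy as np

def interclass_confusion(cm, true_cls, pred_cls):
    """Fraction of samples of true_cls predicted as pred_cls,
    given a confusion matrix cm[true, predicted]."""
    return cm[true_cls, pred_cls] / cm[true_cls].sum()

# Hypothetical slice for classes 61 (row 0) and 72 (row 1):
# 405 samples of class 61, 25 of them predicted as class 72;
# 397 samples of class 72, 81 of them predicted as class 61.
cm = np.array([[380, 25],
               [81, 316]])
print(f"{interclass_confusion(cm, 0, 1):.1%}")  # → 6.2%
```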