You are on page 1of 6

International Conference on Bangla Speech and Language Processing (ICBSLP), 21-22 September, 2018

Bangla Handwritten Digit Recognition Using Deep


CNN for Large and Unbiased Dataset
Ashadullah Shawon Md. Jamil-Ur Rahman
Department of Computer Science and Engineering Department of Computer Science and Engineering
Rajshahi University of Engineering and Technology Rajshahi University of Engineering and Technology
Rajshahi, Bangladesh Rajshahi, Bangladesh
shawonashadullah@gmail.com jamilruet13@gmail.com

Firoz Mahmud M.M Arefin Zaman


Department of Computer Science and Engineering Department of Computer Science and Engineering
Rajshahi University of Engineering and Technology Rajshahi University of Engineering and Technology
Rajshahi, Bangladesh Rajshahi, Bangladesh
fmahmud.ruet@gmail.com arefinzamansifat@gmail.com

Abstract—Bangla handwritten digit recognition is a convenient This paper is organized as follows: The related works are
starting point for building an OCR in the Bengali language. presented in Section II. Our dataset and proposed approach
Lack of large and unbiased dataset, Bangla digit recognition was are described in Section III. Section IV shows the
not standardized previously. But in this paper, a large and preprocessing technique of images. We have shown the
unbiased dataset known as NumtaDB is used for Bangla digit architecture of the deep convolutional neural network in
recognition. The challenges of the NumtaDB dataset are highly Section V. Section VI represents our experiments and result
unprocessed and augmented images. So different kinds of analysis and finally conclusion and future works are briefed in
preprocessing techniques are used for processing images and Section VII.
deep convolutional neural network (CNN) is used as the
classification model in this paper. The deep convolutional II. RELATED WORKS
neural network model has shown an excellent performance,
securing the 13th position with 92.72% testing accuracy in the There are several research works based on Bangla
Bengali handwritten digit recognition challenge 2018 among 57 handwritten digit recognition using deep learning. But most of
participating teams. A study of the network performance on the the research works used biased dataset like CMATERDB
MNIST and EMNIST datasets were performed in order to 3.1.1 [4] because NumtaDB dataset was not available then.
bolster the analysis. Recently some researchers have shown a better accuracy of
99.50% using the auto encoder and deep convolutional neural
Keywords— NumtaDB, CNN, Bangla Handwritten Digit network for CMATERDB 3.1.1 dataset. The local binary
Recognition, Large Unbiased Dataset, Computer Vision pattern was also used for Bangla digits recognition [5]. The
first CNN architecture known as Le-net was also used by some
I. INTRODUCTION researchers for Bangla digit recognition [6]. The
Bangla handwritten digit recognition is a classical Convolutional neural network was introduced for better
problem in the field of computer vision. There are various supervised learning and accuracy [7]. So recent researchers
kinds of practical application of this system such as OCR, of Bangla handwritten digit recognition are using deep CNN
postal code recognition, license plate recognition, bank checks architecture. There are some other classifiers like Support
recognition etc. [1]. Recognizing Bangla digit from Vector Machine (SVM), Neural Network (NN) etc. for
documents is becoming more important [2]. The unique handwritten digit recognition. But the performance of CNN is
number of Bangla digits are total 10. So the recognition task better than other classifiers. As a result, CNN has become the
is to classify 10 different classes. The critical task of recent trend handwritten digit recognition. Modified CNN
handwritten digit recognition is recognizing unique architecture by adding more layers or more nodes has become
handwritten digits. Because every human has his own writing a way to break the state of the art accuracy. This strategy is
styles. But our contribution is for the more challenging task. followed by some research works [8]. Besides Bangla, there
The challenging task is about getting robust performance and are some significant research works in English handwritten
high accuracy for large, unbiased, unprocessed, and highly digit recognition. MNIST [9] and EMNIST [10] are the most
augmented NumtaDB [3] dataset. The dataset is a popular dataset for English handwritten digit and character
combination of six datasets that were gathered from different recognition. EMNIST [10] presented handwritten digit
sources and at different times containing blurring, noise, recognition for both balanced and imbalanced dataset. We
rotation, translation, shear, zooming, height/width shift, researched considering MNIST and EMNIST dataset
brightness, contrast, occlusions, and superimposition. We previously. The research work of this paper is related to our
have not processed all kinds of augmentation of this dataset. previous research and that was English handwritten digit
We have processed blur and noisy images mainly. Then our recognition. We achieved better results from our previous
processed images are classified by a deep convolutional neural research than the previous research [10] and it motivates us to
network. contribute in Bangla handwritten digit recognition too.

978-1-5386-8207-4/18/$31.00 ©2018 IEEE


III. DATASET AND PROPOSED APPROACH EMNIST dataset [10] is the extended version of MNIST
dataset where we used EMNIST digits and EMNIST
A. DATASET
balanced dataset for digit and letter recognition. Fig. 2
In our proposed approach, we give importance to the illustrates the visual breakdown of EMNIST digits dataset
NumtaDB dataset which consists of 85,000+ Bangla where it consists of 280,000 samples with 10 classes and
handwritten digit images. Because NumtaDB is a standard, 30,000 samples for each class. Fig. 3 visualize EMNIST
unbiased (in terms of geographic location, age, and gender), balanced dataset having 47 classes where each class consists
large, unprocessed, and reviewed dataset [3, Sec. I]. The of 3,000 examples.
NumtaDB dataset can verify our proposed approach
performance perfectly and we hope that our proposed
approach will get approximately the same accuracy for real-
life handwritten digit recognition. The images of the
NumtaDB dataset are real-world images without any
preprocessing. The NumtaDB dataset is an assembled dataset
from six different sources. According to NumtaDB, “The Fig. 2. Visualization of EMNIST digits dataset [10, Fig.2].
sources are labeled from 'a' to 'f'. The training and testing sets
have separate subsets depending on the source of the data
(training-a, testing-a, etc.). All the datasets have been
partitioned into training and testing sets so that handwriting
from the same subject/contributor is not present in both.
Dataset-f had no corresponding metadata for contributors for
which all of it was added to the testing set (testing-f)” [11].
Each image of the dataset is about 180×180 pixels. The
sample images of the NumtaDB dataset are shown in Fig. 1.

Fig. 3. Visualization of EMNIST balanced dataset [10, Fig.2].

B. Proposed Approach
We propose to classify NumtaDB handwritten digits
following two major steps. These steps are:

• Preprocessing of images.
• Deep convolutional neural network.

The details of our two major steps are described in Section


Fig. 1. Bangla handwritten digit images from NumtaDB. IV and V.
IV. PREPROCESSING OF IMAGES
According to NumtaDB, “Two augmented datasets
(augmented from test images of dataset 'a' and 'c') are A. Resizing and Grayscaling
appended to the testing set which consists of the following The original size of NumtaDB images are 180×180 pixels
augmentations” [11]. which are too large for preprocessing efficiently. So we reduce
the size of images to 32×32 pixels. We also convert all RGB
• Spatial Transformations: Rotation, Translation, images to GRAY scale images. The color channel is converted
Shear, Height/Width Shift, Channel Shift, Zoom. to 1 channel from 3 channel.
• Brightness, Contrast, Saturation, Hue shifts, Noise. B. Interpolation
• Occlusions. Images can lose much important information due to
• Superimposition (to simulate the effect of text being resizing. Inter-area interpolation is preferred method for
visible from the other side of a page). image decimation. This method is resampling using pixel
area relation. We use inter-area interpolation after resizing
So, NumtaDB dataset has made Bangla handwritten digit images.
recognition more challenging by augmented images. .
MNIST dataset [9] is one of the most popular balanced C. Removing Blur from Images
dataset in English handwritten digit and it has a training set We use Gaussian blur to add blur at first and then subtract
of 60,000 samples and a test set of 10,000 samples. For each the blurred image from the original image. Then we add a
class there are 6,000 training examples and 1000 testing weighted portion of the mask to get de-blurred image [12].
examples.
( , ) = ( , − ′( , ) (1)
( , )= ( , ∗ ( , ) (2) layers [8]. That’s why we have chosen the deep learning for
Bangla handwritten digit recognition. We have built a
Here ′( , ) is the blurred image and k is a weight for custom architecture for deep learning. The architecture is
generality. illustrated in the subsection A.
D. Sharpening Images A. Architecture of Model
There are many filters for sharpening images. In this paper, Our proposed architecture consists of 6 convolutional
we use the Laplacian filter. Our filer is a 3×3 matrix. layers and 2 fully connected dense layer. The first two layers
have 32 filters and each filter size is 5×5. The middle two
layers have 128 filters and each filter size is 3×3. The last two
−1 −1 −1 layers have 256 filters and each filter size is 3×3. Rectified
−1 9 −1 Linear unit (ReLu) [14] is used as an activation function for
−1 −1 −1 all layers. Maxpooling layers and Batch normalization are
E. Removing Noise from Images used after every two layers. The pool size of maxpooling
layer is 2×2. Batch normalization is used for speed up
We remove salt and pepper noise from NumtaDB images. learning [15]. Dropout (20%) is added after first dense layer
We use the median filter to remove salt and pepper noise. to reduce overfitting. Among the two fully connected layers,
After preprocessing, the images become clear, sharp and salt the first one has 64 filters and the last one has 10 filters for
and pepper noiseless. the 10 digits. The last activation function is a softmax
function for the classification. We use the Adam [16]
optimizer to update weights. Fig. 6 shows the design of our
proposed deep CNN architecture. From input to output every
configuration is marked properly.

Fig. 4. Preprocessed images.

Images can be more specified by applying segmentation


or thresholding. Otsu’s method [13] is used for global
threshold selection method.
Fig. 6. Design of proposed deep CNN architecture.

Table I Shows the whole model summary of our proposed


deep CNN architecture.

TABLE I. MODEL SUMMARY OF OUR DEEP CNN ARCHITECTURE

Layer Output Shape


Conv2D_1 ( None, 32, 32, 32 )
Conv2D_2 ( None, 32, 32, 32 )
Batch Normalization_1 ( None, 32, 32, 32 )
MaxPooling2D_1 ( None, 16, 16, 32 )
Conv2D_3 ( None, 16, 16, 128 )
Conv2D_4 ( None, 16, 16, 128 )
Batch Normalization_2 ( None, 16, 16, 128 )
MaxPooling2D_2 ( None, 8, 8, 128 )
Conv2D_5 ( None, 8, 8, 256 )
. Conv2D_6 ( None, 8, 8, 256 )
Fig. 5. Preprocessed images after thresholding. Batch Normalization_3 ( None, 8, 8, 256 )
. MaxPooling2D_3 ( None, 4, 4, 256 )
V. DEEP CONVOLUTIONAL NEURAL NETWORK Flatten_1 ( None, 4096 )
Dense_1 ( None, 64 )
Deep learning has been providing the outstanding
Activation_1 ( None, 64 )
performance in the field of handwritten digit recognition
since the last few years. Deep learning is more efficient Dropout_1 ( None, 64 )
learning technique than others because deep learning is the Dense_2 ( None, 10)
combination of feature extraction and deep classification Activation_2 ( None, 10 )
B. Training Model Parameters analyzing the result. Table IV shows precision, recall, F1
The training model of our architecture is dependent on score and WAA of testing result.
some parameters. The parameters of the training model are TABLE IV. PRECISION, RECALL, F1 SCORE AND WAA
given in Table II.
Precision Recall F1 Score WAA
TABLE II. PARAMETERS OF TRAINING MODEL
94.28% 94.17% 94.19% 94.17%
Parameters Name Value
Learning rate 10-3
Fig. 7 shows the confusion matrix of result for 10 classes.
Batch Size 64
Epoch 30
Shuffle True

VI. EXPERIMENT AND RESULT ANALYSIS


A. Experimental Environment
Our experimental environment is configured with high
level Intel core-i9 processor, GeForce GTX 1080ti GPU and
32GB of RAM. This configuration helped us to reduce the
training time of our experiment.
B. Training, Validation, and Testing
Among 85000+ images, train and test split ratio of Fig. 7. Confusion Matrix
NumtaDB dataset is 85%-15% [3]. In the experiment, we
split the training data into training and validation keeping the We normalize the confusion matrix for clear
split ratio about 80%-20%. We use the validation data to understanding. The competition authority [18] provided us
evaluate our model performance. At last, our final result is testing accuracy for every separate dataset (a-f and
augmented-a, augmented-c) among the assembled dataset.
measured by testing dataset.
Table V shows the testing accuracy for every dataset.
C. Evaluation
TABLE V. TESTING ACCURACY FOR EVERY SEPARATE DATASET
According to Bengali handwritten digit recognition
challenge [17], the test datasets of NumtaDB are from six Dataset Accuracy
different sources (codename: a, b, c, d, e, f) and additionally a 99.23%
two augmented datasets were produced from test set A and
test set C. Un-weighted average accuracy (UAA) is used as b 100%
an evaluation metric [17]. c 98.99%
d 98.67%
1
= (3) e 97.91%
8
f 83.84%
Here is the model accuracy of th dataset. Augmented-a 79.06%
D. Result Analysis Augmented-c 84.09%
We observe mainly three results from our experiment.
These are training accuracy, validation accuracy, and testing
accuracy. After 30 epochs we get this result. Table III shows After observing Table V, we find that our approach gets
our training, validation result as well as the testing result for comparatively low accuracy in f, augmented-a, and
un-weighted average accuracy. augmented- c testing dataset. So we find the reason for low
accuracy and detect misclassified digits from these datasets.
TABLE III. UN-WEIGHTED AVERAGE ACCURACY Fig. 8 shows some misclassified digit images.
Training Validation Testing
99.59% 98.57% 92.72%

The processing time of training is also measured. It takes


15 seconds per epoch. As there are total 30 epochs, the total True Label: 7 True Label: 6 True Label: 1 True Label: 6
training time is 30×15=450 seconds. Predicted: 0 Predicted: 8 Predicted: 2 Predicted: 3
We also observe precision, recall, F1 score and weighted Fig. 8. Misclassified digits
average accuracy (WAA) for better understanding and
The misclassified images are highly augmented as it is also VII. CONCLUSION
difficult for the human brain to recognize correctly. According In this paper, we have presented a deep CNN based
to result, it is cleared that our approach cannot detect all kinds Bangla digit recognition system for a standard and challenging
of augmented images. Because we did not preprocess the dataset. We have achieved 92.72% testing accuracy which is
rotated, shifted, zoomed, superimposition and occlusion a good result for large and unbiased NumtaDB dataset
images. Table VI shows the comparison of testing accuracy comparing to other biased datasets. We observed that only
considering augmented images and without augmented deep classification model cannot increase performance. All
images. kinds of preprocessing of images are also very important
TABLE VI. ACCURACY COMPARISON BETWEEN AUGMENTED AND NON- before training. We use some preprocessing technique for blur
AUGMENTED DATASET and noisy images but these are not enough for high
Augmented Non-Augmented performance. As a future work, we think the advanced data
augmentation technique and more advanced CNN model can
92.72% 96.44% overcome the problems of detecting augmented images. We
recommend researchers to continue our research according to
future work direction.
We also apply our proposed model to the previous dataset
that is used by some research papers [19] and compares its REFERENCES
result with our experimental result. BanglaLekha-Isolated [1] B. B. Chaudhuri, and U. Pal, "A complete printed Bangla OCR
Numerals [20] is another independent dataset that is a system", Pattern recognition, vol. 31.5, pp. 531-549, 1998.
combination of CMATERDB Numerals and ISI Numerals [2] U. Pal, and B. B. Chaudhuri, "OCR in Bangla: an Indo-Bangladeshi
language", Pattern Recognition 1994. Vol. 2-Conference B: Computer
dataset for Bangla handwritten digits. But this dataset contains Vision and Image Processing. Proceedings of the 12th IAPR
only 19748 digit images and there are no highly augmented International. Conference on, vol. 2, pp. 269-273, 1994.J. Clerk
images like NumtaDB. It is mentioned in NumtaDB paper that Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol. 2.
dataset ‘e’ of NumtaDB is actually BanglaLekha-Isolated Oxford: Clarendon, 1892, pp.68–73.
Numerals. Table VII shows the result comparison between the [3] S. Alam, T. Reasat, R.M. Doha, and A.I Humayun, “NumtaDB-
NumtaDB and BanglaLekha-Isolated. Assenbled Bengali handwritten digits”, arXiv:1806.02452 [cs.CV], 6
June, 2018.
TABLE VII. RESULT COMPARISON BETWEEN NUMTADB AND [4] M. Shopon, N. Mohammed, and M. A. Abedin, "Bangla handwritten
BANGLALEKHA-ISOLATED NUMERALS digit recognition using autoencoder and deep convolutional neural
network," 2016 International Workshop on Computational Intelligence
Dataset Number of Accuracy/ (IWCI), Dhaka, 2016, pp. 64-68.
Images Result [5] T. Hassan, and A.H. Khan, “Handwritten Bangla numeral recognition
using Local Binary Pattern,” In Electrical Engineering and Information
NumtaDB 85000+ 92.72% Communication Technology (ICEEICT), 2015 International
Conference on, pp. 1-4. IEEE, 2015.
BanglaLekha- 19748 97.91%
Isolated [6] U. Bhattacharya, and B. B Chaudhuri, “Handwritten numeral databases
of indian scripts and multistage recognition of mixed numerals”, In
Numerals IEEE transactions on pattern analysis and machine intelligence, 31(3)
on pp.444-457. IEEE,2009.
[7] M.I.H.R.I. M.A.H. Akhand, and M. Ahmed, “Convolutional neural
We have applied the same preprocessing and deep CNN network training with artificial pattern for bangla handwritten numeral
model on both datasets. But we get a better result on recognition”, ICIEV, vol. 1, no. 1, pp. 16, 2016
BanglaLekha-isolated Numerals than NumtaDB. Because [8] M. Alom, P. Sidike, T. Taha, and V. Asari, “Handwritten Bangla Digit
Recognition Using Deep Learning”, arXiv:1705.02680v1 [cs.CV], 7
NumtaDB has more complex images which are hard to May, 2017.
recognize. That’s why NumtaDB is a challenging dataset. [9] Y. Lecun, and C. Cortes, “The MNIST database of handwrittendigits,”
Finally, we implemented the same procedure and found an 1998. [Online]. Available: http://yann.lecun.com/exdb/mnist/
accuracy of 99.70% accuracy for MNIST digit recognition [10] G. Cohen, S. Afshar, and J. Tapson, A. Schaik, “EMNIST: An
extension of MNIST to handwritten letters.”,
whereas the state of the art of MNIST dataset is 99.79% [21]. arXiv:1702.05373[cs.CV], 2017.
We also found 99.79% accuracy for EMNIST digit
[11] Bengali Ai , “NumtaDB: Bengali Handwritten Digits”, 2018. [Online].
recognition and 90.59% accuracy for EMNIST letter Available: https://www.kaggle.com/BengaliAI/numta. [Accessed: 6-
recognition which has 47 balanced letter classes. In 08-2018].
comparison with an earlier work on EMNIST digits and [12] R. Gonzalez, and R.E. Woods, “Digital Image Processing”, Third
balanced dataset using linear and OPIUM classifier [10], Edition, pp. 162-163.
Table VIII shows that our proposed CNN model produce [13] N. Otsu, “A Threshold Selection Method from Gray Level
better accuracy. Histograms”, In IEEE transactions on system, men and cybernetics, vol
smc-9, no1, January 1979.
TABLE VIII. RESULT COMPARISON IN EMNIST DATASET [14] V.Nair, and Hinton, G. E. “Rectified linear units improve restricted
boltzmann machines.” Proceedings of the 27th International
Dataset Linear [10] OPIUM Proposed Conference on Machine Learning (ICML-10). 2010.
[10] CNN [15] S. Ioffe, and C. Szegedy, “Batch Normalization: Accelerating deep
Digits 84.70% 95.90% 99.79% network training by reducing internal covariance shift”,
arXiv:1502.03167[cs.LG], 2 March, 2015.
Balanced 50.93% 78.02% 90.59% [16] D. Kingma, and J. Ba, “Adam: A method of stochastic optimization”,
arXiv:1412.6980[cs.LG], 2015.
[17] “Bengali handwritten digit recognition challenge”, 2018. [Online].
Available: https://www.kaggle.com/c/numta. [Accessed: 8-08-2018].
[18] “Evaluation of Bengali handwritten digit recognition challenge”,
2018.[Online].Available:https://www.kaggle.com/c/numta#evaluation
. [Accessed: 8-08-2018].
[19] S. Sharif, and M. Mahboob, “Evil Method: A deep cnn for bangla
handwritten numeral classification”, In 4th Internation Conference on
Advance In Electrical Engineering (ICAEE), 2017.
[20] M. Biswas, R. Islam, G. K. Shom, M. Shopon, N. Mohammed, S.
Momen, and A. Abedin, “Banglalekha-isolated: A multi-purpose
comprehensive dataset of handwritten bangla isolated characters,” Data
in brief , vol. 12, pp. 103–107, 2017.
[21] Wan, Li, et al. "Regularization of neural networks using dropconnect."
International Conference on Machine Learning. 2013.

You might also like