
2020 IEEE International Conference for Convergence in Engineering

Comparison of deep CNN and ResNet for Handwritten Devanagari Character Recognition

Suprava Patnaik, School of Electronics, KIIT, Bhubaneswar, India, Suprava.patnaikfet@kiit.ac.in
Saloni Kumari, School of Electronics, KIIT, Bhubaneswar, India, 1704211@kiit.ac.in
Shreya Das Mahapatra, School of Electronics, KIIT, Bhubaneswar, India, 1704218@kiit.ac.in

Abstract—Handwritten optical character recognition (OCR) is a lush area of research and is used in various real-time applications. This research presents a comparative analysis of handwritten OCR using a deep CNN and a ResNet for Devanagari script, a regional script. A Devanagari character contains two elements: diacritics and the main grapheme. A key challenge associated with Devanagari script is that different characters often look similar; a second challenge is that the same character is written differently by different individuals. The proposed ResNet manages the vanishing-gradient issue and improves the capability of a traditional deep CNN. It uses a dynamic flow of activation: ResNet identity blocks help to overcome vanishing gradients. The proposed architecture scored close to 99% accuracy on the DHCD, which is better than other state-of-the-art results. The training time of the proposed model is reasonably less than that of many other variants of deep CNN.

Keywords—CNN, confusion matrix, handwritten Devanagari character, residual neural network

I. INTRODUCTION

Devanagari script is an Indic script and forms a basis for over 100 languages. It consists of 47 primary alphabets, 14 vowels, 33 consonants, and 10 digits. Apart from the individual characters, the alphabets are modified when a vowel is added to a consonant. Apart from being a royal script, Devanagari is the mother of more than 20 auxiliary scripts. Basic Devanagari characters are called aksharas. Handwritten aksharas are dynamic in both morphological and diacritic forms. Morphological variations are noticed in width, height, curvature, etc. Diacritic variations arise from the glyptic art of writing, for example stroke size, stroke direction, retracing, ascending or descending motions, size of dots, size of loops, etc.

Fig. 1. Handwritten Devanagari characters from DHCD

Ongoing research on handwritten script recognition intends to improve the readability of documents, typographic translation, content analysis, and the integration of handwritten documents into various language-processing applications. Unlike English, Devanagari aksharas follow static phoneme rules. Each alphabet carries an assonant and well-distinguishable pronunciation and sound, which is relatively convenient for phoneme synthesis. That is why Devanagari handwriting recognition is not only instrumental for document analysis but can also be serviceable for text-to-audio translation. Writing with ink on paper is inevitable, therefore handwritten character recognition is indispensable. A number of smart devices are being embedded with the readiness to digitize scribbling, either as we write or afterwards by capturing an image of the script. Converting scribbled scripts into typed notebook pages, which can be shared in electronic form, will make a big difference to smart-device utilization [14, 15].

To our knowledge, research outcomes and literature on Devanagari or its auxiliary-script recognition have been available since the late 1990s. Like many other computer-vision applications, for over a decade the initial focus of optical character recognition (OCR) research was confined to hand-crafted feature extraction. A significant number of publications address OCR using moment features, skew correction, connectivity analysis, stroke or curve estimation, profile histograms, gradient angles, chain codes, and so on [2]. Input pre-processing, feature extraction, and classification are the three major steps in conventional handwritten character recognition systems, and the conventional practice is to classify one or another of these hand-crafted features with machine-learning algorithms.

Around the year 2012 a milestone was set by AlexNet [1], which is now considered the beginning of a new generation of machine learning (ML) algorithms. Since then, ML applications have gone through revolutionary changes, and so have handwritten character recognition techniques. Deep learning models like the Convolutional Neural Network (CNN) [3, 6, 12], Long Short-Term Memory (LSTM) [4, 5], and their variants have taken leading roles in almost all ML applications. A key achievement and comfort of using deep models for OCR is that the models are fed raw forms of data, rather than hand-crafted features such as loops, stroke angles, or eccentricity of curves. CNNs are popular for weight sharing and are invariant to feature translation. Deep CNN (DCNN) models have been tested and approved in state-of-the-art applications for Chinese and Latin languages [11, 13], and comparatively more literature is available for those scripts. Devanagari and Arabic script recognition appears less developed and more challenging because of inter-character similarity and irregularity in dynamic features while penning. In [7, 8, 9, 10] the authors have reported techniques for Indic scripts in general and Devanagari in particular. In this work we compare a conventional DCNN and a ResNet model for Devanagari character recognition.
978-1-7281-7340-5/20/$31.00 ©2020 IEEE

II. DEEP RESNET

A. Deep NN for handwritten character recognition

In the CNN framework, features are extracted and manifested in a hierarchical manner: higher-level abstractions are constructed from combinations of lower-level abstractions. A standard practice is to adapt a pre-trained DNN to a desired application, which is known as transfer learning. In [10] the authors report 95.46% accuracy for Devanagari character recognition. In [18] it is reported that, besides transfer learning, a large dataset along with dropout layers can improve the performance of a DCNN significantly. Dropout introduces regularization on the network's activation flow by randomly skipping some connections with a certain probability, which improves the generalization ability of the model. An implementation of the deep Capsule Network (CapsNet) is reported in [18] with 99% accuracy; however, that model took approximately five hours to train on a 2.5 GHz, 8 GB machine. CapsNet is a variant of DCNN which addresses the vanishing-gradient problem by introducing dynamic routing between capsules. Capsules are groups of neurons that encode spatial information as well as the probability that an object is present, with reference to a specific type of entity (object parts). The length and orientation of capsules are altered in order to realize the whole object. Each capsule is associated with an activity vector whose length corresponds to the probability that the capsule's entity exists; the capsule's orientation represents the instantiation parameters of the entity.

B. Why ResNet

Fig. 2. Schematic of DCNN and ResNet: (a) schematic of DCNN, (b) ResNet identity block, (c) ResNet convolutional block

Deep networks are mostly used to represent complex functions requiring proper discernment at different levels of abstraction. However, a DNN does not always work satisfactorily. One of the barriers in training deep network parameters is vanishing gradients, which makes gradient-descent learning prohibitively slow. The Residual Network (ResNet) is one solution: it allows activations to skip a few intermediate layers and join a deeper layer. Two main types of blocks are used in a ResNet, namely the "identity block" and the "convolutional block". An identity block is a skip path parallel to the main sequential flow path that holds identical dimensions at its splitting and merging nodes. A convolutional block is almost the same as an identity block but holds a convolutional layer in the skip path; the role of this layer is similar to a simple matrix multiplication, altering dimensions so that the two paths match.

A weight matrix, similar to a Conv2D block, is used in the shortcut path to resize the activation so that the flow matches the size of the main path at the starting and terminating layers. However, no non-linear activation is used in the Conv2D layer on the shortcut path. The shortcut and main-path values are added up ahead of the ReLU activation.

Traditionally DCNNs are popular due to their inherent feature-extraction and discrimination quality, and performance improvement has remained the main target of various deep algorithms. Gradually, however, the focus is shifting from parameter optimization to network architecture, such as connection readjustments, channel boosting, attention-based information processing, and so on. Among deep networks, models with multi-path or cross-layer connectivity, such as ResNet and CapsNet, have proved efficient for many applications. In this work a simple 5-layered ResNet architecture is proposed and tested on a publicly available Devanagari dataset. The objective is to accomplish CapsNet-like performance but with a simpler model and quicker training.

III. FRAMEWORK

The goal of our work is to improve accuracy but with a simpler model that involves less training time. Table I demonstrates the parametric as well as performance comparison between the two models exploited in this work.

A. About the Dataset

Results reported in this work are based on training done on the publicly available Devanagari Handwritten Character Dataset (DHCD), taken from [16]. In the related literature [17] the authors report 98.47% accuracy. DHCD consists of a total of 92,000 images. Reports presented in this work are based on recognizing the 72,000 consonant characters. Each consonant has contributed 1,700 samples as training images and 300 samples as test images. The training samples are further divided into training and validation sets with an 80%/20% split. The images are 32x32-pixel characters written in square boxes. However, some characters have similar shapes, and different characters are written in a similar way by many subjects. Some similar, and therefore difficult to recognize, cases are shown in Fig. 5.
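The two ResNet blocks described in Section II-B can be sketched as plain forward passes. The following is an illustrative NumPy sketch, not the authors' code: 1x1 convolutions (a per-pixel channel mix) stand in for the full convolutional layers so that the dimension bookkeeping stays visible, and all weights are assumed to be given.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel channel mix:
    # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out).
    return x @ w

def identity_block(x, w1, w2):
    # The skip path holds identical dimensions at the splitting and
    # merging nodes, so the input is added back unchanged.
    y = relu(conv1x1(x, w1))
    y = conv1x1(y, w2)
    return relu(y + x)  # shortcut merged before the final ReLU

def convolutional_block(x, w1, w2, w_shortcut):
    # The skip path holds a convolution with no non-linear activation;
    # its only job is to resize the activation so the addition is
    # dimension-consistent with the main path.
    shortcut = conv1x1(x, w_shortcut)
    y = relu(conv1x1(x, w1))
    y = conv1x1(y, w2)
    return relu(y + shortcut)
```

With zero weights the identity block reduces to relu(x): the shortcut passes the activation through untouched, which is exactly the mechanism that lets gradients bypass the stacked layers.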


B. Proposed Deep Model

TABLE I. ARCHITECTURE AND PERFORMANCE COMPARISON BETWEEN DCNN AND RESNET

Model                       | Deep CNN | ResNet
Total trainable parameters  | 711,940  | 71,132
Non-trainable parameters    | 0        | 1,884
Epochs                      | 100      | 40
Training samples            | 48,960   | 48,960
Validation samples          | 12,240   | 12,240
Training accuracy           | 92.56%   | 99.31%
Validation accuracy         | 91.73%   | 98.67%
Optimizer                   | rmsprop  | rmsprop

Deep CNN training was carried out over 100 epochs, while that of ResNet was carried out for only 40 epochs. Experiments were carried out using the rmsprop optimizer and categorical cross-entropy as the loss function for both models. rmsprop tries to dampen oscillations and also chooses a different learning rate for each parameter. The number of parameters trained for the DCNN was approximately ten times more than for the ResNet; however, both models took close to two hours to complete training.

Fig. 3. ResNet: (a) loss curve, (b) accuracy curve, (c) performance report
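The training configuration above can be made concrete with a small sketch. The split arithmetic follows the dataset description (36 consonant classes, 1,700 training images each, split 80/20), while the loss and the optimizer update are the textbook definitions of categorical cross-entropy and rmsprop, not the authors' code; the learning rate and decay constant are assumed defaults.

```python
import numpy as np

# Split arithmetic from Section III-A and Table I:
# 36 consonant classes x 1,700 training images, split 80/20.
n_total = 36 * 1700            # 61,200 training images
n_train = int(n_total * 0.8)   # 48,960 (Table I)
n_val = n_total - n_train      # 12,240 (Table I)

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # Mean negative log-probability assigned to the true class;
    # y_true is one-hot, y_pred holds predicted class probabilities.
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-8):
    # rmsprop keeps a running average of squared gradients and divides
    # each update by its square root, damping oscillations and giving
    # every parameter its own effective learning rate.
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache
```

The per-parameter scaling by the running squared-gradient average is what the text means by rmsprop "choosing a different learning rate for each parameter": parameters with consistently large gradients take proportionally smaller steps.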
IV. RESULTS AND CONCLUSION

A. Epochs and training accuracy

After monitoring the trend of accuracy and loss values for both validation and training samples, the number of epochs was set so that the trade-off between accuracy and epoch count is optimized.

ResNet reached 100% training accuracy, although its validation accuracy improved slightly more sluggishly. In comparison to ResNet, the CNN was slower in both training and validation accuracy. The above experiment was conducted for different combinations of blocks. Accuracy and loss curves are shown in Fig. 3 and Fig. 4 for the ResNet and DCNN respectively. ResNet accuracy is found to be slightly better than that of the DCNN. It is noticed that the DCNN confuses class 'tha' with class 'yaw' in the maximum number of instances; no such discrepancy is noticed for the ResNet model.

B. Confusion Matrix and performance report

The loss curve, accuracy curve, and performance report for the 36 classes are shown in Fig. 3 and Fig. 4 for ResNet and DCNN respectively.

Fig. 4. Deep CNN: (a) loss curve, (b) accuracy curve, (c) performance report
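The confusion indices reported below in Table II are off-diagonal entries of an ordinary confusion matrix. A minimal sketch of how such a matrix and a per-class report could be computed (class indices here are illustrative, not the paper's label ordering):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # cm[i, j] counts samples whose true class is i but which were
    # predicted as class j; the off-diagonal entries are "confusion
    # index" values of the kind listed in Table II.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_recall(cm):
    # Diagonal over row sums: the fraction of each true class
    # that was recognized correctly.
    return np.diag(cm) / cm.sum(axis=1)
```

For the 36 DHCD consonant classes this yields a 36x36 matrix; scanning its largest off-diagonal entries is how character pairs such as 'tha'/'yaw' are identified as systematically confused.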


TABLE II. SOME WRONGLY CLASSIFIED CHARACTERS

True Label      | Predicted Label | Confusion Index (Deep CNN) | Confusion Index (ResNet)
'tha'           | 'yaw'           | 11 | 2
'waw'           | 'tra'           | 9  | 2
'dha'           | 'petchiryakha'  | 6  | 1
'patalosaw'     | 'ra'            | 5  | 1
'tra'           | 'waw'           | 8  | 2
'ra'            | 'petchiryakha'  | 8  | 1
'petchiryakha'  | 'dha'           | 4  | 1
'tha'           | 'tra'           | 6  | 2
'ga'            | 'tha'           | 6  | 0
'ja'            | 'chhya'         | 5  | 0
'waw'           | 'pha'           | 8  | 2

Fig. 5. Different characters written similarly

C. Conclusion and scope for future work

In the present work, the performance of ResNet on the DHCD has been reported, and the results are compared with the state-of-the-art literature.

TABLE III. COMPARISON WITH STATE-OF-THE-ART LITERATURE

Work                 | Model                       | Accuracy % | Dataset
DCNN of this work    | DCNN                        | 97.49      | DHCD
ResNet of this work  | ResNet                      | 99.38      | DHCD
Acharya et al. [17]  | DCNN                        | 98.47      | DHCD
Gupta et al. [18]    | CapsNet                     | 99.02      | DHCD
Sonawane et al. [10] | AlexNet (transfer learning) | 95.4       | Self dataset

Table III shows that the ResNet results are better than those of the DCNN, and conventionally ResNet's learning complexity is less than CapsNet's. In the presence of artifacts or unintentional variations among the training samples of a DNN, residual values counterbalance the rate of gradient optimization. Vanishing gradients often lead to degradation of the character recognition rate, and this degradation phenomenon is likely to be prominent and significant for large datasets. Although the ResNet recognition results are on par with many state-of-the-art achievements, there are still a few errors which need meticulous analysis. Input augmentation and salient feature analysis may help in improving the model performance. In the future, our work will focus on improving automatic recognition accuracy by using suitable concepts and more impressive network architectures.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Proc. Neural Information Processing Systems, 2012.
[2] Ø. D. Trier, A. K. Jain, and T. Taxt, "Feature extraction methods for character recognition: a survey," Pattern Recognition, 29(4), pp. 641–662, 1996.
[3] B. Balci, D. Saadati, and D. Shiferaw, "Handwritten text recognition using deep learning," Convolutional Neural Networks for Visual Recognition, Stanford University project, 2017.
[4] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[5] P. Voigtlaender, P. Doetsch, and H. Ney, "Handwriting recognition with large multidimensional long short-term memory recurrent neural networks," in Proc. 15th International Conference on Frontiers in Handwriting Recognition, pp. 228–233, 2016.
[6] Z. Xie, Z. Sun, L. Jin, Z. Feng, and S. Zhang, "Fully convolutional recurrent network for handwritten Chinese text recognition," in Proc. 23rd International Conference on Pattern Recognition, pp. 4011–4016, 2016.
[7] A. Chaudhuri, K. Mandaviya, P. Badelia, et al., "Optical character recognition systems for Hindi language," in Studies in Fuzziness and Soft Computing, Springer, Cham, 1st edn., 2017, pp. 193–216.
[8] D. Khandja, N. Nain, and S. Panwara, "Hybrid feature extraction algorithm for Devanagari script," ACM Trans. on Asian and Low-Resource Language Information Processing, 15(1), pp. 2:1–2:10, 2015.
[9] H. B. Kekre, S. D. Thepade, S. P. Sanas, et al., "Devnagari handwritten character recognition using LBG vector quantization with gradient masks," in Proc. Int. Conf. Advances in Technology and Engineering (ICATE), Mumbai, India, Jan. 2013, pp. 1–4.
[10] P. K. Sonawane and S. Shelke, "Handwritten Devanagari Character Classification using Deep Learning," in Proc. International Conference on Information, Communication, Engineering and Technology (ICICET), Pune, India, pp. 29–31, Aug. 2018.
[11] X. Wei, S. Lu, and Y. Lu, "Compact MQDF classifiers using sparse coding for handwritten Chinese character recognition," Pattern Recognition, 76, pp. 679–690, 2018.
[12] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet, "Multi-digit number recognition from street view imagery using deep convolutional neural networks," in Proc. Comput. Sci., 2014, pp. 1–13.
[13] X.-Y. Zhang, Y. Bengio, and C.-L. Liu, "Online and offline handwritten Chinese character recognition: a comprehensive study and new benchmark," Pattern Recognition, 61, pp. 348–360, 2017.
[14] Y. Weng and C. Xia, "A new deep learning-based handwritten character recognition system on mobile computing devices," Mobile Netw. Appl., vol. 7, pp. 1–10, Mar. 2019.
[15] H. Gao, Y. Duan, L. Shao, and X. Sun, "Transformation-based processing of typed resources for multimedia sources in the IoT environment," Wireless Networks, vol. 26, pp. 1–17, Nov. 2019.
[16] https://archive.ics.uci.edu/ml/machine-learning-databases/00389/
[17] S. Acharya, A. K. Pant, and P. K. Gyawali, "Deep Learning Based Large Scale Handwritten Devanagari Character Recognition," in Proc. 9th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), pp. 121–126, 2015.
[18] S. Gupta and R. K. Mohapatra, "Performance Improvement in Handwritten Devanagari Character Classification," in Proc. Women Institute of Technology Conference on Electrical and Computer Engineering (WITCON ECE), 2019.

