
Improving MDLSTM for Offline Arabic Handwriting Recognition Using Dropout at Different Positions

Rania Maalej¹ and Monji Kherallah²

¹ Research Group on Intelligent Machines, National School of Engineers of Sfax, Sfax University, Sfax, Tunisia
rania.mlj@gmail.com
² Faculty of Sciences, Sfax University, Sfax, Tunisia
monji.kherallah@enis.rnu.tn

Abstract. RNNs, and LSTMs in particular, are now state-of-the-art models that achieve very good performance on various machine learning tasks such as handwritten Arabic word recognition. This field remains an ongoing research problem due to the cursive appearance of the script, the variety of writers and the diversity of writing styles. In this work, we propose a new offline Arabic handwriting recognition system based on a particular RNN, the MDLSTM, to which we apply the dropout technique at different positions: before, after or inside the MDLSTM layers. This regularization technique protects our system against overfitting and reduces the recognition error rate. We carried out experiments on the well-known IFN/ENIT database.

Keywords: Dropout · LSTM · MDLSTM · Offline Arabic handwriting recognition

1 Introduction

Recurrent Neural Networks (RNNs) are among the most powerful sequence learners. In particular, Long Short-Term Memory (LSTM) has achieved remarkable success in various machine learning tasks including language modeling [1], speech recognition [2], machine translation [3] and image captioning [4]. LSTM overcomes the vanishing and exploding gradient problems of traditional RNNs. These units have been shown to give state-of-the-art performance on handwriting recognition: they have been used as a stacked bidirectional LSTM for online recognition [5] and as a stacked multidirectional LSTM for the offline task [6], the latter system having been tested on the IFN/ENIT corpus [7]. However, with such a huge number of parameters, overfitting can occur. To protect the network against this problem, dropout is applied at different positions. This technique consists in temporarily removing some units from the network; the removed units are randomly selected, and only during the training stage. This regularization can improve network performance and significantly reduce the error rate.


This paper is organized as follows. Section 2 presents relevant previous work. Section 3 describes our contribution, and in Sect. 4 we report experimental results. Finally, conclusions and future work are drawn in Sect. 5.

2 Related Work

The traditional recognition procedure is based on six major steps: image acquisition, pre-processing, segmentation, feature extraction, classification and post-processing. The feature extraction stage demands considerable time and expertise because it has to be redesigned for each alphabet. To avoid this complex step, we suggest training the system directly on pixel data. Such a system, which belongs to the holistic approaches, can recognize a number of languages with the same degree of difficulty. Moreover, the main advantage of using raw images during training is the ability to learn the visual and the sequential aspects of cursive handwriting at the same time.
In recent years, most research has been based either on HMMs [8] or on the combination of HMMs with neural networks [9]. Although successful, HMMs have some drawbacks, such as poor discrimination and a limited ability to handle long-term dependencies in sequences, since they follow a first-order assumption. The solution adopted by several researchers was the use of Recurrent Neural Networks (RNNs) [5]. Indeed, RNNs have proved their efficiency for modeling time series: they can be trained discriminatively and they do not require prior knowledge of the data. However, RNNs suffer from vanishing and exploding gradients. Fortunately, these problems can be solved with a particular node called the Long Short-Term Memory (LSTM), which yields better results in both speech recognition [10] and online handwriting recognition [11]. For the latter task, bidirectional LSTM was suggested, as it makes it possible to integrate the context on both sides of each letter in the input sequence. For offline handwriting recognition, this architecture is not the suitable option, since the input data is no longer one-dimensional. Consequently, we choose to apply the MDLSTM.
Multidimensional Long Short-Term Memory (MDLSTM) [12, 13] combines a Multi-Dimensional Recurrent Neural Network (MDRNN) with LSTM nodes: it is a recurrent network in which the single recurrent connection is replaced by one connection per spatio-temporal dimension, so that all dimensions of the input data can be represented. Despite MDLSTM's success, overfitting can occur in this network because of the large number of hidden layers and the enormous number of parameters. This drawback can be overcome with dropout [13], which consists in momentarily removing from the network some units that are randomly chosen, only during the training stage. This regularization can both improve network performance and significantly decrease the error rate.
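As an illustration only (not the implementation used in our system), the following Python sketch shows the basic dropout operation described above: each unit is zeroed with probability p during training, while at test time the activations pass through unchanged. The inverted-dropout scaling by 1/(1−p) is an assumption of this sketch, added so that no rescaling is needed at test time.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Sketch of dropout: zero each unit with probability p during training.

    Inverted dropout: kept units are scaled by 1/(1-p) so the expected
    activation is unchanged and no test-time rescaling is needed.
    """
    if not training:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p  # keep each unit with probability 1-p
    return activations * mask / (1.0 - p)
```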

Table 1. Error recognition rate reduction with dropout

Authors            | Network | Dataset  | Error rate reduction with dropout
Maalej et al. [18] | BLSTM   | ADAB     | 8.12 %
Maalej et al. [19] | MDLSTM  | IFN/ENIT | 4.88 %

Dropout has been successfully applied to several types of deep neural networks, bringing significant improvements in recognition rate [14–17], and it has also been exploited in RNNs, mainly in BLSTM. In particular, it reduced the label error rate by more than 8 % on the ADAB dataset for online Arabic handwriting recognition [18] and by more than 4.88 % on IFN/ENIT for offline Arabic handwriting [19] (Table 1).
In previous RNN-based systems [17–19], dropout was applied only to certain layers, away from the recurrent connections, so as not to harm them and to keep the RNN able to model long input sequences. In this work, building on these systems, units are dropped at different positions in the MDLSTM network: before, after or inside the MDLSTM layers.

3 System Overview

In this section, we present the architecture of the offline Arabic handwriting recognition system based on MDLSTM and CTC (see Fig. 1). MDLSTM is a robust method that allows flexible modeling of the multidimensional context by providing recurrent connections along every spatio-temporal dimension of the input data. These connections make MDLSTM robust to local distortions in the input image (e.g. rotations, shears …). The principal issue with this method is how to obtain one-dimensional label sequences from the two-dimensional images. We therefore push the data through a hierarchy of MDLSTM layers, with subsampling windows added after each level, so that the two-dimensional images are incrementally collapsed into one-dimensional sequences that are finally labeled by the output layer.

Fig. 1. Architecture of the recognition system based on MDLSTM
To prevent the network from overfitting, dropout is applied at different positions: in the implementation, we add dropout layers at different locations around the MDLSTM layers. A dropout layer passes its input through unchanged, except at the dropped nodes, whose outputs are set to zero. In our system, 50 % of the nodes are randomly dropped. Figure 2 shows a dropout layer added before the MDLSTM layers; in this case we choose to drop the same input units for all directions. Figure 3 illustrates a dropout layer added after the MDLSTM layers, while in Fig. 4 units are dropped inside the MDLSTM layers.
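A minimal sketch of the three placements is given below. It uses PyTorch purely for illustration, with a standard stacked nn.LSTM standing in for the MDLSTM layers (which PyTorch does not provide); the class name and the position argument are our own naming, not part of the original implementation.

```python
import torch
import torch.nn as nn

class RecurrentBlockWithDropout(nn.Module):
    """Sketch: dropout placed before, inside, or after a recurrent layer.

    nn.LSTM is a 1-D stand-in for the MDLSTM layers; 'inside' reuses the
    LSTM's built-in dropout applied between its stacked layers.
    """
    def __init__(self, n_in, n_hidden, position="before", p=0.5):
        super().__init__()
        assert position in ("before", "inside", "after")
        self.position = position
        self.drop = nn.Dropout(p)  # only active in training mode
        self.rnn = nn.LSTM(n_in, n_hidden, num_layers=2,
                           dropout=p if position == "inside" else 0.0,
                           batch_first=True)

    def forward(self, x):
        if self.position == "before":
            x = self.drop(x)        # drop input units fed to the recurrent layer
        out, _ = self.rnn(x)
        if self.position == "after":
            out = self.drop(out)    # drop units produced by the recurrent layer
        return out

# Example: dropout before the recurrent layer, 50 % of nodes dropped
block = RecurrentBlockWithDropout(n_in=20, n_hidden=50, position="before", p=0.5)
y = block(torch.randn(4, 100, 20))  # (batch, timesteps, features)
```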

Fig. 2. Dropout applied before MDLSTM layers

Fig. 3. Dropout applied after MDLSTM layers



Fig. 4. Dropout applied inside MDLSTM layers

4 Experimental Results

The IFN/ENIT database [7], with 32492 images of Arabic words written by more than 1000 writers, is used to validate our system. The words are the names of 937 Tunisian towns and villages. The IFN/ENIT database is divided into five sets (see Table 2) and has been used by more than 50 research groups, notably in the offline Arabic handwriting recognition competition at ICDAR 2009 [20].
Our system is trained on the 19724 words gathered in sets a, b and c, while sets d and e, which contain 12768 words, are used for testing.
Some of the network's parameters are set automatically and others are hand-tuned. We use three levels in the network hierarchy, separated by two feedforward layers with the tanh activation function (see Fig. 1). Each level of the MDLSTM hierarchy contains four hidden layers for our two-dimensional data. These hidden layers are recurrently connected: all input units are connected to all hidden units, and all hidden units are connected to both the output units and the hidden units. For the LSTM units, the gate activation function is the logistic sigmoid, while the cell input and output functions are both tanh; for more details, we refer the reader to [21]. Training uses online steepest descent with a momentum of 0.9 and a learning rate of 1e−4. The number of LSTM blocks is 2 in the first level, 10 in the second level and 50 in the third level. The sizes of the two feedforward layers separating the hidden levels are 6 and 20, and the dimensions of the three subsampling windows are (3, 4), (3, 4) and (2, 4), where each pair gives the width and the height of the corresponding window.

Table 2. The IFN/ENIT database

Set   | Words | Characters
a     | 6537  | 51984
b     | 6710  | 53862
c     | 6477  | 52155
d     | 6735  | 54166
e     | 6033  | 45169
Total | 32492 | 257336

Table 3. Dropout's effect on the label error rate, tested at different positions around MDLSTM

Label error rate (%) on the IFN/ENIT database
Dropout before MDLSTM | Dropout inside MDLSTM | Dropout after MDLSTM | MDLSTM w/o dropout
11.62                 | 11.88                 | 12.09                | 16.97
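For readability, the hyperparameters listed above can be summarized as follows; this is a descriptive sketch of the configuration (the field names are ours), not code from the actual system.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MDLSTMConfig:
    """Summary of the hierarchy and training settings described above."""
    lstm_blocks: List[int]                    # LSTM blocks per hierarchy level
    feedforward_sizes: List[int]              # tanh feedforward layers between levels
    subsample_windows: List[Tuple[int, int]]  # (width, height) of each subsampling window
    dropout_rate: float
    momentum: float
    learning_rate: float

config = MDLSTMConfig(
    lstm_blocks=[2, 10, 50],
    feedforward_sizes=[6, 20],
    subsample_windows=[(3, 4), (3, 4), (2, 4)],
    dropout_rate=0.5,
    momentum=0.9,
    learning_rate=1e-4,
)
```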
The output layer is based on the CTC method [22]. This technique involves a softmax layer that computes a probability distribution Pr(k|t) at each step of the input sequence. This distribution covers the 120 target labels plus one extra blank symbol representing a non-output, so the total size of this softmax layer is 121. At every timestep the network chooses whether or not to emit a label. All these decisions define a distribution over alignments between the input and target sequences. Then, using the forward-backward algorithm, CTC sums over all possible alignments and yields the normalized probability Pr(z|x) of the target sequence given the input sequence. This makes CTC well suited to unsegmented cursive handwriting recognition.
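As a hedged illustration of this output layer, the following PyTorch sketch builds a log-softmax over 121 classes (120 labels plus the blank) and uses the library's CTC loss, which performs the forward-backward summation over alignments; the tensor sizes are placeholders, not values taken from the actual system.

```python
import torch
import torch.nn as nn

T, N, num_labels = 150, 8, 120        # timesteps, batch size, target labels (T and N assumed)
blank = num_labels                    # one extra blank symbol -> 121 output classes
ctc_loss = nn.CTCLoss(blank=blank)

log_probs = torch.randn(T, N, num_labels + 1).log_softmax(dim=2)  # softmax layer of size 121
targets = torch.randint(0, num_labels, (N, 12))                   # dummy label sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

# CTC sums over all alignments between input and target via forward-backward
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
```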
The error measure used as the early-stopping criterion on the validation set is the label error rate: convergence is reached when the label error rate on the validation set has not decreased by more than a given threshold for a given number of iterations. Training therefore stops if the label error rate has not decreased appreciably for 20 epochs.
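This stopping rule can be sketched as follows; the function name and the threshold value are assumptions for illustration, and only the 20-epoch patience comes from our setup.

```python
def should_stop(val_label_error_rates, patience=20, min_delta=0.0):
    """Stop when the validation label error rate has not improved by more than
    min_delta for `patience` consecutive epochs (min_delta is an assumed threshold)."""
    if len(val_label_error_rates) <= patience:
        return False
    best_before = min(val_label_error_rates[:-patience])
    best_recent = min(val_label_error_rates[-patience:])
    return best_recent > best_before - min_delta
```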
Dropout is employed to regularize the network's parameters and was found to boost its performance. It is tested at different places in the network: before, inside and after the MDLSTM layers. According to [13, 23], the dropout rate that yields the maximum amount of regularization is 0.5.
After training, we test the best network obtained on sets d and e. As reported in Table 3, the label error rate does not exceed 11.62 % when dropout is applied before the MDLSTM layers, compared to 11.88 % obtained with the same architecture when units are dropped inside the MDLSTM layers, and 12.09 % when dropout layers are added after the MDLSTM layers. All these results are better than those obtained without applying dropout during training.

5 Conclusion

In this paper, we have proposed to improve a powerful offline Arabic handwriting recognizer based on MDLSTM. To this end, we used a successful regularization method called dropout and showed how it can be applied to the MDLSTM network. Dropout consists in temporarily removing some units from the network; we tested this technique by zeroing units at different positions in the network: before, after or inside the MDLSTM layers.
Experimental results show that applying dropout before the MDLSTM layers gives the best results: it successfully improves network performance, both by preventing overfitting and by significantly reducing the label error rate, by more than 5.35 %. We also achieved good results when dropout is added after or inside the MDLSTM layers. However, both randomly dropping out units during training and repeatedly sampling a random subset of input features make the training stage much slower. As future work, we therefore aim to explore fast dropout [24, 25] training for the MDLSTM network.

References
1. Sundermeyer, M., Schlüter, R., Ney, H.: LSTM neural networks for language modeling. In:
INTERSPEECH, pp. 194–197, September 2012
2. Sak, H., Senior, A.W., Beaufays, F.: Long short-term memory recurrent neural network
architectures for large scale acoustic modeling. In: INTERSPEECH, pp. 338–342,
September 2014
3. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In:
Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
4. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption
generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 3156–3164 (2015)
5. Graves, A., Liwicki, M., Bunke, H., Schmidhuber, J., Fernández, S.: Unconstrained on-line
handwriting recognition with recurrent neural networks. In: Advances in Neural Information
Processing Systems, pp. 577–584 (2008)
6. Graves, A.: Offline arabic handwriting recognition with multidimensional recurrent neural
networks. In: Märgner, V., El Abed, H. (eds.) Guide to OCR for Arabic Scripts, pp. 297–
313. Springer, London (2012)
7. Pechwitz, M., Maddouri, S.S., Märgner, V., Ellouze, N., Amiri, H.: IFN/ENIT-database of
handwritten Arabic words. In: Proceedings of the CIFED, vol. 2, pp. 127–136, October 2002
8. Slimane, F., Zayene, O., Kanoun, S., Alimi, A.M., Hennebert, J., Ingold, R.: New features
for complex Arabic fonts in cascading recognition system. In: 2012 21st International
Conference on Pattern Recognition (ICPR), pp. 738–741. IEEE, November 2012
9. Dreuw, P., Doetsch, P., Plahl, C., Ney, H.: Hierarchical hybrid MLP/HMM or rather MLP
features for a discriminatively trained gaussian HMM: a comparison for offline handwriting
recognition. In: 2011 18th IEEE International Conference on Image Processing (ICIP),
pp. 3541–3544. IEEE, September 2011
10. Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural
networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pp. 6645–6649. IEEE, May 2013
11. Kozielski, M., Doetsch, P., Ney, H.: Improvements in RWTH’s system for off-line
handwriting recognition. In: 2013 12th International Conference on Document Analysis and
Recognition (ICDAR), pp. 935–939. IEEE, August 2013
12. Graves, A.: Supervised sequence labelling, pp. 5–13. Springer, Heidelberg (2012)
13. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving
neural networks by preventing co-adaptation of feature detectors (2012). arXiv preprint:
arXiv:1207.0580
14. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a
simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–
1958 (2014)

15. Miao, Y., Metze, F.: Improving low-resource CD-DNN-HMM using dropout and
multilingual DNN training (2013)
16. Zhang, S., Bao, Y., Zhou, P., Jiang, H., Dai, L.: Improving deep neural networks for LVCSR
using dropout and shrinking structure. In: 2014 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pp. 6849–6853. IEEE, May 2014
17. Pham, V., Bluche, T., Kermorvant, C., Louradour, J.: Dropout improves recurrent neural
networks for handwriting recognition. In: 2014 14th International Conference on Frontiers in
Handwriting Recognition (ICFHR), pp. 285–290. IEEE, September 2014
18. Maalej, R., Tagougui, N., Kherallah, M.: Online Arabic handwriting recognition with
dropout applied in deep recurrent neural networks. In: 2016 12th IAPR International
Workshop on Document Analysis Systems (DAS), pp. 418–421. IEEE, April 2016
19. Maalej, R., Tagougui, N., Kherallah, M.: Recognition of handwritten Arabic words with
dropout applied in MDLSTM. In: Campilho, A., Karray, F. (eds.) ICIAR 2016. LNCS, vol.
9730, pp. 746–752. Springer, Heidelberg (2016). doi:10.1007/978-3-319-41501-7_83
20. El Abed, H., Märgner, V.: ICDAR 2009-Arabic handwriting recognition competition. Int.
J. Doc. Anal. Recogn. (IJDAR) 14(1), 3–13 (2011)
21. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
22. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal
classification: labelling unsegmented sequence data with recurrent neural networks, June
2006
23. Baldi, P., Sadowski, P.J.: Understanding dropout. In: Advances in Neural Information
Processing Systems, pp. 2814–2822 (2013)
24. Wang, S.I., Manning, C.D.: Fast dropout training. In: ICML, vol. 2, pp. 118–126 (2013)
25. Bayer, J., Osendorfer, C., Korhammer, D., Chen, N., Urban, S., van der Smagt, P.: On fast
dropout and its applicability to recurrent networks (2013). arXiv preprint: arXiv:1311.0701
