

IJDAR (2009) 12:97–108
DOI 10.1007/s10032-009-0084-x

ORIGINAL PAPER

SVM-based hierarchical architectures for handwritten Bangla character recognition

Tapan Kumar Bhowmik · Pradip Ghanty · Anandarup Roy · Swapan Kumar Parui

Received: 19 March 2008 / Revised: 11 November 2008 / Accepted: 26 February 2009 / Published online: 26 March 2009
© Springer-Verlag 2009

Abstract  We propose support vector machine (SVM) based hierarchical classification schemes for recognition of handwritten Bangla characters. A comparative study is made among the multilayer perceptron, the radial basis function network and the SVM classifier for this 45 class recognition problem. The SVM classifier is found to outperform the other classifiers. A fusion scheme using the three classifiers is proposed which is marginally better than the SVM classifier. It is observed that there are groups of characters having similar shapes. These groups are determined in two different ways on the basis of the confusion matrix obtained from the SVM classifier. In the former, the groups are disjoint while they are overlapped in the latter. Another grouping scheme is proposed based on the confusion matrix obtained from the neural gas algorithm; the groups are disjoint here. Three different two-stage hierarchical learning architectures (HLAs) are proposed using the three grouping schemes. An unknown character image is classified into a group in the first stage. The second stage recognizes the class within this group. Performances of the HLA schemes are found to be better than those of the single stage classification schemes. The HLA scheme with overlapped groups outperforms the other two HLA schemes.

Keywords  SVM · RBF · MLP · Handwritten character recognition · Bangla · Fusion · Grouping of classes · Hierarchical learning architectures

T. K. Bhowmik
Read-Ink Technologies Pvt. Ltd., Indiranagar, Bangalore 560 008, India
e-mail: tkbhowmik@gmail.com

P. Ghanty
Praxis Softek Solutions Pvt. Ltd., Module 616, SDF Building, Sector V, Salt Lake City, Kolkata 700 091, India
e-mail: pradip.ghanty@gmail.com

A. Roy · S. K. Parui (B)
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata 700 108, India
e-mail: swapan@isical.ac.in

A. Roy
e-mail: roy.anandarup@gmail.com

1 Introduction

Though optical character recognition (OCR) systems for some Indian scripts are available [1], there has not been much work on recognition of handwritten Indian scripts. The present paper deals with recognition of handwritten Bangla characters. Bangla is the fifth most popular language in the world and the second most popular language in the Indian subcontinent. Bangla script has 50 basic characters (39 consonants and 11 vowels). There are more than 300 Bangla compound characters along with vowel modifiers. The present study deals with Bangla basic characters only.

Most handwritten character recognition problems are complex and deal with a large number of classes. A lot of research effort has been made in this direction for several scripts [2,3] and has been applied successfully to various real life applications such as postal automation, bank check verification, etc. [4,5]. Multilayer perceptrons (MLP) and hidden Markov models (HMM) have been used for classification purposes [6,7]. The support vector machine (SVM) has not yet been used much in handwritten character recognition problems. Dong et al. [8] applied an SVM classifier to improve the performance of a handwritten Chinese character recognition system. Camastra [9] applies SVM to English handwriting recognition. Liu et al. [10] have evaluated the performance of several classifiers for handwritten numeral recognition. In [11], Günter and Bunke combine three HMM classifiers for English handwritten word recognition. Bellili et al. [12] combine MLP and SVM classifiers for handwritten digit recognition. This hybrid architecture is based on the idea that the correct digit class mostly belongs to the two maximum MLP outputs. The classification algorithms can be separated into two main categories. Discriminative approaches try to find the best separation among all classes but, in general, they cannot deal with outliers. Model-based approaches, on the other hand, make outlier detection possible but are not sufficiently discriminative. Observing these characteristics, Milgram et al. [13] proposed a combination of a model-based approach with support vector classifiers in a two-stage classification system. A number of other schemes with combinations of different classifiers are available in the literature for this purpose [14,15].


There have been only a few studies on recognition of handwritten Bangla characters [16–18]. In [16], a multistage recognition system is applied to a small database collected in laboratory environments. Bhowmik et al. [17] proposed an MLP based recognition scheme using stroke features for Bangla basic characters. Bhattacharya et al. [18] also designed a two-stage MLP based recognition system using shape based features. These two studies are based on large databases. In the present study, we apply MLP, radial basis function (RBF) and SVM classifiers to the handwritten Bangla character recognition problem and compare their performances in terms of classification accuracy. The entire handwritten Bangla character database is partitioned into training, validation and test sets. The validation set is used to determine several parameters of the classifiers. A majority voting fusion scheme is used for this problem on the basis of the MLP, RBF and SVM classifiers. From the confusion matrices (on the basis of the validation set) of the different classifiers, it has been observed that there are groups of classes within each of which the misclassification rate is larger compared to the misclassification rate between such groups. Based on this observation, groups are identified using certain techniques. A hierarchical learning architecture (HLA) for handwritten Bangla character recognition is designed on the basis of these groups. In the first stage of the HLA, the group for an unknown sample is identified, and the following stage recognizes the sample as a class within this group. Three such different HLA schemes based on three different grouping algorithms are proposed.

In the present paper, we intend to explore how these classifiers compare with respect to their classification accuracies in a large class problem like the present one. In order to deal with a large number of classes, a two-stage classification architecture is proposed so that each stage involves a smaller number of classes. An SVM classifier is basically meant for two-class problems. However, there exist methods to use SVM classifiers for a larger number of classes. One-versus-all (OVA) and one-versus-one (OVO) are two such methods. OVO gives better classification accuracy than OVA [19], but OVO is practical only when the number of classes is small. Thus, a two-stage classification architecture is suitable for OVO to achieve better accuracy.

The main purpose of the present paper is to design a hierarchical classification architecture with SVM classifiers in order to achieve better accuracy. For a hierarchical classifier, formation of groups of similar classes is necessary. Formation of groups is normally done in an ad hoc manner [18]. To the best of the authors' knowledge, no formal methodology for forming such groups is available in the literature. We develop here several formal grouping schemes on the basis of the confusion matrix produced by supervised or unsupervised classification. Such a hierarchical classifier with such grouping schemes can be useful for any large class recognition problem.

The performance of a recognition system depends on the features being used by the classifiers. Different kinds of features have been proposed and their performances on standard databases have been reported [6]. The performance of recognition systems can still be improved by choosing better features. One of the popular and efficient features, namely, the wavelet features, has been used in many handwritten character recognition problems. Bhattacharya et al. [20,21] applied these features to handwritten Bangla and Devnagari numeral recognition and reported satisfactory results. We explore how effective these features are for the present recognition problem, where the number of classes is large.

2 Feature extraction

The wavelet transform [22] is a well-known tool that finds application in many areas including image processing. Due to its multi-resolution property, it decomposes a signal at different scales. For a given image, the wavelet transform produces one low frequency subband image reflecting an approximation of the original image and three high frequency components reflecting the detail. The approximation component is used here to generate the feature vector in the present recognition problem. In our experiment, we consider the Daubechies wavelet transform with four coefficients (l0, l1, l2, l3) forming the lowpass or smoothing filter and another four coefficients (h0, h1, h2, h3) forming the highpass filter, where

l0 = (1 + √3)/(4√2), l1 = (3 + √3)/(4√2), l2 = (3 − √3)/(4√2), l3 = (1 − √3)/(4√2),

and h0 = l3, h1 = −l2, h2 = l1, h3 = −l0.

For an input image, we first calculate the bounding box of the image, then normalize it to a square image of size 64 × 64 with an interpolation technique. The wavelet decomposition algorithm with the above lowpass and highpass filters is applied to this normalized image twice (row-wise and column-wise) to get four 32 × 32 images, shown in the third row of Fig. 1. The left-most of these is obtained by using the lowpass filter twice and is considered for the recognition task. From this image, four 16 × 16 images are obtained in a similar fashion and are shown in the fourth row of Fig. 1. Again, its left-most image is considered for the recognition task. The binarized versions of these two images are used as the input feature vectors for the classifiers. In Fig. 1, the image of the Bangla character "I", its 64 × 64 normalized form and the components produced by the wavelet transform are shown.

[Fig. 1 Original image "I" and the components produced by the wavelet transform]
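To make the feature pipeline concrete, the following sketch implements the separable lowpass decomposition described above. This is an illustrative reconstruction, not the authors' code: the use of NumPy, the periodic boundary handling and the binarization threshold are our assumptions.

```python
import numpy as np

# Daubechies-4 lowpass (smoothing) filter coefficients, as given above.
SQ3 = np.sqrt(3.0)
LOWPASS = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4 * np.sqrt(2.0))

def lowpass_downsample(rows: np.ndarray) -> np.ndarray:
    """Convolve each row with the lowpass filter and keep every second
    sample, halving the row length (periodic boundary assumed)."""
    n = rows.shape[1]
    out = np.zeros((rows.shape[0], n // 2))
    for k in range(n // 2):
        idx = (2 * k + np.arange(4)) % n
        out[:, k] = rows[:, idx] @ LOWPASS
    return out

def approximation(image: np.ndarray) -> np.ndarray:
    """One 2-D decomposition level: lowpass row-wise, then column-wise,
    giving the low-frequency subband at half the resolution."""
    rows = lowpass_downsample(image)
    return lowpass_downsample(rows.T).T

def wavelet_features(image64: np.ndarray, threshold: float = 0.5):
    """From a 64x64 normalized character image, return the binarized
    32x32 and 16x16 approximation images used as feature vectors."""
    a32 = approximation(image64)   # 32 x 32 approximation
    a16 = approximation(a32)       # 16 x 16 approximation
    binarize = lambda a: (a > threshold * a.max()).astype(np.uint8)
    return binarize(a32), binarize(a16)
```

Per Sect. 6, the binarized 16 × 16 approximation serves as the feature vector for the single stage classifiers and the first HLA stage, and the 32 × 32 approximation for the second stage.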

3 Classifiers

We will now briefly discuss the classifiers that we use to discriminate between the classes on the basis of the feature vector described above.

3.1 Neural networks

Neural networks (NNs) have been successfully used in various pattern classification applications [23]. Here, we use two different NN models, namely, the multilayer perceptron (MLP) and the radial basis function (RBF) network. Use of the MLP has become quite popular in handwriting recognition [20] because when a pattern has a lot of variations due to handwriting style, the MLP is quite effective for the recognition task. The well-known back-propagation (BP) algorithm is used to train the MLP. For the present task, we use the modified BP algorithm [24] with self-adaptive learning rate values.

The RBF network is another model of neural network commonly used in pattern classification problems [23]. In the RBF network architecture, we use the Gaussian activation function as the basis function [23]. Each output of the network is augmented by a sigmoid function. An unsupervised learning method is applied for the hidden units to estimate the initial basis function parameters. Gradient descent learning is applied to tune the network weights as well as the basis function parameters during supervised learning.
3.2 Support vector machine

The SVM [25,26] is a machine learning method basically used for two-class pattern recognition problems. Suppose a training data set D consists of pairs {(x_i, y_i), 1 ≤ i ≤ n}, where the input vectors are x_i ∈ R^p and each binary label y_i ∈ {−1, +1} corresponds to one of the two classes. The SVM minimizes the cost function (1/2) wᵀw subject to the constraints x_i · w + b ≥ +1 for y_i = +1 and x_i · w + b ≤ −1 for y_i = −1, where w is the weight vector and b is the bias. When the training points are not linearly separable, the cost function is reformulated by introducing slack variables ξ_i ≥ 0, i = 1, …, n. The SVM now finds the w that minimizes (1/2) wᵀw + C Σ_{i=1}^{n} ξ_i subject to x_i · w + b ≥ +1 − ξ_i for y_i = +1, x_i · w + b ≤ −1 + ξ_i for y_i = −1 and ξ_i ≥ 0 for all i. The constant C is a regularization parameter. When the decision function is non-linear, the above scheme cannot be used directly. The SVM then evaluates a function ϕ : R^p → H to map the training data from R^p to a higher dimensional feature space H. Since in H the data may be linearly separable, the linear formulation can be applied to these data. A kernel K(x, x_i) = ϕ(x) · ϕ(x_i) is used to construct the optimal hyperplane in H without considering it explicitly. The commonly used kernels are the polynomial kernel of degree d, K(x, x_i) = (x · x_i + 1)^d, and the radial basis function (RBF) kernel, K(x, x_i) = exp(−γ‖x − x_i‖²) for γ > 0. We use both linear and nonlinear SVMs. The RBF kernel with different γ values and the polynomial kernel with different degrees d are used for the nonlinear SVMs. We use the SVMlight [27] software for learning the SVM classifiers.
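The paper trains its SVMs with the SVMlight package; for illustration only, an equivalent single classifier with the polynomial kernel K(x, x_i) = (x · x_i + 1)^d of degree 3 can be set up as follows. This sketch assumes scikit-learn and synthetic stand-in data; the value of C is not reported in the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(200, 256)).astype(float)  # stand-in for 16x16 binary features
y_train = rng.integers(0, 45, size=200)                      # 45 character classes

# gamma=1.0 and coef0=1.0 reproduce K(x, x_i) = (x . x_i + 1)^3;
# C=1.0 is an illustrative default, not the paper's setting.
clf = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0, C=1.0)
clf.fit(X_train, y_train)
print(clf.predict(X_train[:5]))
```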


3.3 Support vector machine for multiclass problems

Multiclass SVMs are realized by combining several two-class SVMs. Two popular methods, namely, one-versus-one (OVO) [28] and one-versus-all (OVA) [25], are used for this combination. In the OVO method, c(c − 1)/2 binary classifiers are constructed for a c-class problem. The binary classifier C_ij, i < j, is trained using samples from class i and class j as positive and negative samples respectively. Max-wins voting (MWV_SVM) [28] is used to classify an unknown sample. In the MWV_SVM scheme, if the decision function value for an unknown sample x from classifier C_ij is greater than or equal to 0, the vote for class i is increased by one; otherwise, the vote for class j is increased by one. The sample x is assigned to the class k that has the largest number of votes among all c(c − 1)/2 classifiers. The OVA method, on the other hand, needs c binary classifiers for a c-class problem. The ith binary classifier constructs a decision boundary between class i and the other c − 1 classes. The winner-takes-all strategy (WTA_SVM) [25] is used to assign the class label to an unknown sample x. The WTA_SVM strategy assigns x to the class having the largest value of the decision function among all c binary classifiers, even when all the decision function values (d_i, i = 1, …, c) are negative.

Both the OVO and OVA methods have some disadvantages. For a large class (c-class, say) problem, the OVO method constructs c(c − 1)/2 binary classifiers, whereas OVA needs only c binary classifiers, so OVA incurs less overhead. Yet, in general, the OVO method gives better classification accuracy than the OVA method for large class problems [19].
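A minimal sketch of the max-wins voting rule described above (illustrative; the dictionary-of-pairs input format is an assumption):

```python
import numpy as np

def mwv_predict(decision_values: dict, num_classes: int) -> int:
    """Max-wins voting over c(c-1)/2 pairwise SVMs.

    decision_values maps a pair (i, j), i < j, to the decision function
    value of classifier C_ij for one unknown sample; a value >= 0 votes
    for class i, otherwise for class j."""
    votes = np.zeros(num_classes, dtype=int)
    for (i, j), d in decision_values.items():
        votes[i if d >= 0 else j] += 1
    return int(np.argmax(votes))
```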
4 Classification schemes

The classifiers described in Sect. 3 may be used in different ways for a pattern classification problem. Classifiers can be used separately or may be combined to form a fused classifier. A hierarchical architecture with the same or different classifiers may also be used. The following subsections detail the schemes adopted for the present problem.

4.1 Single stage classification scheme

Three pattern recognition tools, namely, MLP, RBF and SVM, are used for classification. In the single stage classification scheme, all the classifiers are employed separately.

4.2 Single stage classification with fusion scheme

It is possible that different classifiers have different separating hyper-surfaces in the feature space. Thus, a natural option is to fuse the three classifiers, namely, the MLP, RBF and SVM classifiers, so that the resulting fusion classifier captures the discriminating characteristics of all of them.

The fused classifier is implemented based on majority voting. The principle of majority voting is that if at least two of the classifiers classify x, an unknown input vector, into class C, then it is finally classified into C (note that we consider three classifiers here). However, a tie may occur when the three classifiers classify x into three different classes. For a large number of classes, the possibility of a tie increases, which degrades the recognition accuracy. To remedy this situation, several methods have been suggested. One such method is to break the tie by choosing the decision of an arbitrary classifier. This method is abbreviated as FMRS, indicating fusion of the MLP, RBF and SVM classifiers. Another way is to rely on the decision of a particular classifier to break the tie. We term these schemes FMRS_M, FMRS_R and FMRS_S when the ties are broken using the MLP, RBF and SVM classifiers respectively.
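The fusion rule with the SVM tie-break (FMRS_S) reduces to a few lines; the sketch below is illustrative:

```python
def fmrs_s(mlp_label: int, rbf_label: int, svm_label: int) -> int:
    """Majority vote over the three single-stage classifiers, with a
    three-way tie broken by the SVM decision (the FMRS_S scheme)."""
    if mlp_label == rbf_label:
        return mlp_label      # MLP and RBF agree (SVM may or may not)
    # Otherwise the SVM either agrees with one of the other two (majority)
    # or all three disagree (tie) -- in both cases the SVM label wins.
    return svm_label
```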
4.3 Hierarchical learning architectures (HLAs)

For a large class problem, it is possible to find groups of classes within each of which the misclassification rate is high compared to the misclassification rate between such groups. On the basis of this observation, several classes are merged to form groups. These groups may or may not be disjoint. A classification scheme based on the groups is termed a hierarchical learning architecture (HLA). Here, we use a two-stage classification scheme. The first stage identifies the correct group for an unknown sample. The second stage recognizes the sample as a member of a particular class in that group. The most important aspect of an HLA is to determine the groups and the classes they contain. We first propose two grouping schemes that build disjoint groups and overlapped groups, on the basis of the confusion matrix obtained from any supervised classification. Camastra [9] used the Neural Gas (NG) method to identify similar structures in the input patterns. We develop a third grouping scheme using the NG method. Finally, HLAs are constructed using all three grouping schemes.
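A sketch of the two-stage prediction (illustrative; the scikit-learn-style `predict` interface, and the `groups` and `within_clf` containers, are assumptions of this sketch, not the paper's implementation):

```python
def hla_predict(sample, group_clf, groups, within_clf):
    """Two-stage HLA prediction: the first-stage classifier assigns the
    sample to a group; the classifier trained for that group returns the
    final class label.  Groups with a single member need no second stage."""
    g = group_clf.predict([sample])[0]
    if len(groups[g]) == 1:
        return groups[g][0]
    return within_clf[g].predict([sample])[0]
```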
5 Grouping schemes

We describe here the grouping schemes used to identify the groups of similar characters. The proposed methods intend to merge the classes in such a way as to minimize the between-group misclassification rate. The groups obtained from these schemes may be disjoint or overlapped. Sections 5.1 and 5.2 describe the first two grouping schemes. The grouping scheme based on the NG method is described in Sect. 5.3.

5.1 Disjoint grouping scheme

Let the confusion matrix obtained from a supervised classifier with c classes be ((a_ij)), where a_ij (i, j = 1, …, c) is the number of samples belonging to class i that are classified as class j. We define the similarity between classes i and j as n_ij = a_ij + a_ji (i < j). We define the similarity between two groups of classes as follows. Suppose two groups G_p and G_q (having m and n classes, respectively) are represented by the index sets (i_1, …, i_m) and (j_1, …, j_n). The similarity between G_p and G_q (p < q) is defined as

s_pq = min_{i<j} n_ij, (i = i_1, …, i_m; j = j_1, …, j_n).   (1)

Note that grouping done on the basis of this similarity measure resembles complete linkage clustering [29]. If the operator "min" is replaced by "max", the resulting grouping resembles single linkage clustering [29]. In many cases, the latter operator leads to a situation where one group contains a large number of classes compared to the other groups. This is the reason why the similarity measure defined in Eq. 1 is used here.


Initially, each individual class is a group (i.e., c classes representing c groups). First, two classes i and j are found such that their similarity value n_ij is maximum among all pairs of classes. They are merged into one group, resulting in c − 1 groups. Subsequently, similarity values between groups are computed and the two most similar groups are merged into one. The process continues until we reach a stage when all the similarity values between groups are less than or equal to a certain non-negative threshold value thr.

Note that using the above algorithm, it is not possible to get fewer groups than what is achieved with thr = 0. However, it is possible to get fewer groups if the operator "min" in Eq. 1 is replaced by "jth min" (j = 2, 3, …).

As an illustration, consider the confusion matrix ((a_ij)) in Table 1 with 6 classes.

Table 1 Confusion matrix of 6 classes

     1   2   3   4   5   6
1   75  15   4   5   1   0
2   18  78   1   1   1   1
3    2   7  73  16   0   2
4    1   3  19  76   1   0
5    3   4   3   3  72  15
6    3   3   3   0  17  74

Let thr = 0. Note that n_34 = 35 is maximum among all n_ij, so classes 3 and 4 are merged into one group in the first iteration, resulting in 5 groups, namely, {(1), (2), (3, 4), (5), (6)}. The resulting groups are renamed as {(1), (2), (3), (4), (5)}. The similarity matrix of these groups is shown in Table 2a. Note that, according to the definition of similarity (Eq. 1), the lower triangle and the main diagonal of the similarity matrix are not defined. After the first merging, s_12 = 33 is the maximum. Hence, we merge groups 1 and 2 (Table 2b). Accordingly, in the next two iterations we merge groups 3 and 4 of Table 2b (i.e., classes 5 and 6) and then groups 1 and 2 (i.e., the class groups (1, 2) and (3, 4)) respectively. The similarity matrices after these two merges are shown in Table 2c, d. Note that now s_12 = 0 (≤ thr). Hence no further merging is performed, and in terms of the original classes we obtain the groups {(1, 2, 3, 4), (5, 6)}. The idea here is that two groups are not merged if there is a pair of classes (across these groups) which are very dissimilar; classes 4 and 6 form such a pair here. Note that if thr = 4, the algorithm terminates earlier and outputs a larger number of groups, namely, {(1, 2), (3, 4), (5, 6)}.
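The agglomerative procedure of Sect. 5.1 can be sketched as follows (illustrative code; 0-based class indices are assumed):

```python
import numpy as np

def disjoint_groups(conf: np.ndarray, thr: int = 0):
    """Agglomerative grouping: n_ij = a_ij + a_ji is the class similarity,
    group similarity is the minimum n_ij across the two groups
    (complete-linkage style), and merging stops once every inter-group
    similarity is <= thr."""
    c = conf.shape[0]
    n = conf + conf.T                       # symmetric similarity n_ij
    groups = [[i] for i in range(c)]

    def sim(gp, gq):
        return min(n[i, j] for i in gp for j in gq)

    while len(groups) > 1:
        pairs = [(p, q) for p in range(len(groups))
                 for q in range(p + 1, len(groups))]
        p, q = max(pairs, key=lambda pq: sim(groups[pq[0]], groups[pq[1]]))
        if sim(groups[p], groups[q]) <= thr:
            break                           # all similarities <= thr: stop
        groups[p] += groups.pop(q)
    return groups
```

With the confusion matrix of Table 1 and thr = 0, this returns [[0, 1, 2, 3], [4, 5]], i.e., the groups {(1, 2, 3, 4), (5, 6)} found above.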
5.2 Overlapped grouping scheme

In the grouping algorithm described above, we have used both the rows and the columns of the confusion matrix to determine a group. We now consider only a column for forming a group. For each column, we select the best (i.e., smallest) subset of classes (i.e., with size less than c) that incurs an error rate of ε or less. In this classification scheme, there are c classes or groups in the first stage. In the second stage, the individual groups will have varying numbers of classes. Note that samples from one character class may belong to more than one group. This is why we call the groups overlapped groups. Note that in the earlier scheme, samples from one character class belong to exactly one group.

Consider the confusion matrix ((a_ij)) defined in Sect. 5.1. Suppose the target error rate is ε. Let N_j = Σ_{i=1}^{c} a_ij and let thr_j be an integer such that Σ_{a_ij ≤ thr_j} a_ij = ε · N_j. However, owing to the discrete nature of the a_ij, the equality may not hold for any thr_j, and therefore we take the largest value of thr_j such that Σ_{a_ij ≤ thr_j} a_ij ≤ ε · N_j. Now, the jth group G_j (j = 1, …, c) is defined as {i : a_ij > thr_j}. For example, the fourth column of the confusion matrix in Table 1 is [5 1 16 76 3 0], where N_4 = 101. Suppose ε = 0.05. Note that Σ_{a_ij ≤ 4} a_ij < 0.05 N_j and Σ_{a_ij ≤ 5} a_ij > 0.05 N_j. Thus thr_4 = 4 and G_4 = {1, 3, 4}.

For a c class problem, we construct c groups in the first stage and, for each such group, a separate classifier is designed. Suppose a sample x is classified into class k in the first stage. Then the sample is fed to the classifier for G_k in the second stage. The final class is assigned to sample x by the second stage classifier. In this scheme, if a sample is misclassified in the first stage, it may sometimes still be correctly classified in the second stage. This is not possible in the disjoint grouping scheme.
(Eq. 1), the lower triangle and the main diagonal of the sim- 5.3 Grouping scheme with neural gas
ilarity matrix are not defined. After first merging we obtain
s12 = 33 being the maximum. Hence, we merge groups 1 Camastra [9] proposed a grouping scheme using the Neu-
and 2 (Table 2b). Accordingly, in the next two iterations we ral Gas (NG) method. Camastra used NG to verify whether
form groups with (3, 4) and (1, 2) respectively. The similar- the uppercase and the lowercase letters of English can be
ity matrices after these two merges are shown in Table 2c,d. merged into a group or not [9]. We use NG here for auto-
Note, now s12 = 0 (≤ thr ). Hence no further merging is per- matic grouping of classes. The following subsections give a
formed with the groups. Thus in terms of the original classes brief overview of the NG and the NG based grouping scheme.


Table 2 Similarity matrices between (a) 5 groups, (b) 4 groups, (c) 3 groups and (d) 2 groups

(a)     1    2    3    4    5
   1    -   33    6    4    3
   2    -    -    4    5    4
   3    -    -    -    3    0
   4    -    -    -    -   32
   5    -    -    -    -    -

(b)     1    2    3    4
   1    -    4    4    3
   2    -    -    3    0
   3    -    -    -   32
   4    -    -    -    -

(c)     1    2    3
   1    -    4    3
   2    -    -    0
   3    -    -    -

(d)     1    2
   1    -    0
   2    -    -

5.3.1 The neural gas

Vector quantization methods encode a set of data points in n-dimensional space with a smaller set of neurons described by their weight vectors W_k, k = 1, …, N. Neural Gas (NG) [30] is a vector quantization technique with soft competition between the neurons. In each training step, the squared Euclidean distances d²_ik = ‖x_i − W_k‖² between an input vector x_i and all neurons W_k are computed. The vector of these distances is d. The N distances are now given ranks r_k(d) = 0, …, N − 1 according to the ascending order of distances. The learning rule is

W_k = W_k + η h_ρ(r_k(d)) (x − W_k).   (2)

The function h_ρ(r_k(d)) = e^(−r_k(d)/ρ) is a monotonically decreasing function of the ranking. The width of h(·) is determined by the neighborhood range ρ. The learning rule is also affected by a global learning rate η. The values of ρ and η decrease exponentially from an initial positive value (ρ(0), η(0)) to a smaller final positive value (ρ(T), η(T)) according to

ρ(t) = ρ(0) [ρ(T)/ρ(0)]^(t/T)   (3)

and

η(t) = η(0) [η(T)/η(0)]^(t/T)   (4)

where t is the time step and T is the total number of training steps.
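One NG training step following Eqs. 2–4 may be sketched as follows. The schedule constants here are illustrative defaults; the paper does not report its values of ρ(0), ρ(T), η(0) and η(T).

```python
import numpy as np

def ng_step(W: np.ndarray, x: np.ndarray, t: int, T: int,
            rho0=10.0, rhoT=0.01, eta0=0.5, etaT=0.005) -> np.ndarray:
    """One Neural Gas update: rank all neurons by squared Euclidean
    distance to x and move each towards x with a step that decays
    exponentially in the rank and in time."""
    rho = rho0 * (rhoT / rho0) ** (t / T)      # Eq. 3
    eta = eta0 * (etaT / eta0) ** (t / T)      # Eq. 4
    d2 = ((x - W) ** 2).sum(axis=1)            # squared distances d_ik^2
    ranks = np.argsort(np.argsort(d2))         # rank 0 = closest neuron
    h = np.exp(-ranks / rho)                   # neighborhood function
    return W + eta * h[:, None] * (x - W)      # Eq. 2
```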
5.3.2 Group identification from NG

Suppose, after completion of the algorithm, the Voronoi region V_k (k = 1, …, N) corresponding to the kth neuron denotes the set of input vectors that are closest to the weight vector W_k. Clearly, the union of all the V_k's constitutes the whole dataset. Now, if the underlying classes in the dataset are quite far apart from each other in the feature space, each V_k is expected to contain samples from a single class. However, in problems like the present one, this is not the case and there will be some V_k's that contain samples from more than one class. If two classes i and j are similar, there will be neurons whose Voronoi regions contain samples from both classes i and j. The Voronoi regions that have samples from only one class are called pure and the others impure. Thus, a study of the composition of impure Voronoi regions can determine which classes are similar and hence should form a group. Based on this observation, a grouping mechanism is formalized as follows. Suppose an impure Voronoi region contains m classes, namely, C_1, C_2, …, C_m, having p_1, p_2, …, p_m samples respectively. Let q_jk be min(p_j, p_k), where j < k. Let ((n_ij)) be the upper triangular similarity matrix where n_ij is the sum of all the q_ij over all impure Voronoi regions. Here n_ij represents the similarity between classes i and j as in Sect. 5.1. The similarity s_pq between groups of classes is also defined in the same way. Finally, the formation of the groups of classes based on n_ij and s_pq is done using the algorithm in Sect. 5.1. The difference between the grouping schemes in Sects. 5.1 and 5.3 is that the former is based on the results of a supervised classification while the latter makes use of the results of an unsupervised classification.

Note that the groups formed by the NG based method are always disjoint. The similarity matrix obtained by this method is upper triangular, and hence the formation of overlapped groups on the basis of its columns cannot be considered as in Sect. 5.2.
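The accumulation of the similarity matrix ((n_ij)) from the impure Voronoi regions can be sketched as follows (illustrative; the list-of-label-lists input format is an assumption):

```python
import numpy as np
from collections import Counter

def ng_similarity(labels_per_region, num_classes: int) -> np.ndarray:
    """For every impure Voronoi region, each pair of classes (j, k)
    present there contributes q_jk = min(p_j, p_k) to n_jk, where p_j is
    the number of the region's samples that belong to class j."""
    n = np.zeros((num_classes, num_classes), dtype=int)
    for labels in labels_per_region:          # labels of samples in one V_k
        counts = Counter(labels)
        present = sorted(counts)
        if len(present) < 2:
            continue                          # pure region: no contribution
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                j, k = present[a], present[b]
                n[j, k] += min(counts[j], counts[k])
    return n
```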
6 Results and discussions

In this section, we present the results obtained by the different schemes described so far. In addition, we compare the performances of the single stage classification schemes with the proposed HLA schemes. In the single stage classification schemes, we have used the binary representation of the 16 × 16 decomposed image as the feature vector. The HLA schemes consist of two stages, as discussed earlier in Sect. 4.3. In the first stage, the binary representation of the 16 × 16 decomposed image is used as the feature vector, while the binary representation of the 32 × 32 decomposed image is used in the second stage. For all schemes, the parameters of the classifiers are determined on the basis of the validation sets.

6.1 Bangla handwritten database

Experiments have been carried out on a moderately large database of basic handwritten Bangla isolated characters. The database has been collected from different sections (different age groups, different educational levels, etc.) of the population in and around Kolkata, India, and is representative of a wide spectrum of handwriting styles.


The Bangla character set consists of 11 vowels and 39 consonants. However, two consonant characters have nearly the same shape (Column 1 in Fig. 2), and each pair of characters in the other four columns has the same shape except for the presence of a dot at the bottom of the character.

[Fig. 2 Five pairs of similar pattern Bangla characters]

Hence, we consider the handwritten Bangla character recognition problem as a 45 class problem here. The entire database has been divided into three sets, namely, a training set, a validation set and a test set. The number of samples in each set is shown in Table 3. The size of the whole database is 27,000. Some handwritten samples from the database, along with the class references, are shown in Fig. 3.

Table 3 Handwritten database statistics

No. of classes   Training samples/class   Validation samples/class   Test samples/class
45               400                      100                        100

[Fig. 3 Bangla symbol set with class references]

6.2 Performance of single stage classification scheme

Three pattern recognition tools, namely, MLP, RBF and SVM, are used for classification. The numbers of hidden nodes for the MLP and RBF are chosen by trial and error on the validation set. The optimal number of hidden nodes is found to be 200 for the MLP and 260 for the RBF. The OVA method is used to train 45 SVM classifiers. For the SVM, the linear kernel, the RBF kernel and the polynomial kernel are used. The hyperparameter γ = 0.10 for the RBF kernel and degree d = 3 for the polynomial kernel have resulted in the minimum validation error in the present problem. The recognition accuracies of the SVM classifiers on the test set are 56.88% using the linear kernel, 74.21% using the RBF kernel (γ = 0.10) and 79.47% using the polynomial kernel with d = 3. The SVM classifier with the polynomial kernel of degree 3 has the maximum validation accuracy and hence is used for the rest of the experiments. The overall test accuracies of the MLP, RBF and SVM classifiers are 71.44%, 74.56% and 79.47%, respectively.

6.3 Performance of single stage classification with fusion scheme

In Sect. 4, we discussed the fusion of several classifiers. We here design a fusion classifier using the MLP, RBF and SVM classifiers. We first make a study of the distribution of correctly and incorrectly classified samples in the validation set by the MLP, RBF and SVM classifiers in the single stage classification scheme (Fig. 4).

Let us now define the following sets:

M: Set of character images that are classified correctly by the MLP classifier.
R: Set of character images that are classified correctly by the RBF classifier.
S: Set of character images that are classified correctly by the SVM classifier.


D: Set of character images that are classified correctly by both the MLP and RBF classifiers, but not by the SVM classifier.
E: Set of character images that are classified correctly by both the RBF and SVM classifiers, but not by the MLP classifier.
F: Set of character images that are classified correctly by both the SVM and MLP classifiers, but not by the RBF classifier.
X: Set of character images that are classified correctly by only the MLP classifier.
Y: Set of character images that are classified correctly by only the RBF classifier.
Z: Set of character images that are classified correctly by only the SVM classifier.
U: Validation set.

[Fig. 4 Distribution of correctly and incorrectly classified samples by the MLP, RBF and SVM classifiers]

On the basis of our validation set, the sizes of the sets shown in Fig. 4 are as follows: #(M) = 3210, #(R) = 3465, #(S) = 3586, #(D) = 132, #(E) = 418, #(F) = 229, #(X) = 124, #(Y) = 190, #(Z) = 214, #(M ∩ R) = 2857, #(R ∩ S) = 3143, #(S ∩ M) = 2954, #(M ∩ R ∩ S) = 2725, #(M ∪ R ∪ S) = 4032 and #(U \ (M ∪ R ∪ S)) = 468. Mathematically, the accuracy of the majority voting scheme always lies within the range (#[(M ∩ R ∩ S) ∪ D ∪ E ∪ F]/#(U)) × 100 to (#(M ∪ R ∪ S)/#(U)) × 100, which is 77.87–89.60% in our experiment (3504/4500 and 4032/4500, respectively). In fact, there does not exist any fusion scheme that exceeds the accuracy of 89.60% on the validation set. Earlier, we pointed out the possibility of a tie and proposed certain methods to deal with it. The present problem has 45 classes, so the occurrence of a tie is not infrequent. We use the four schemes, namely, FMRS, FMRS_M, FMRS_R and FMRS_S, to break the tie. The recognition accuracies of the fusion classifier on the validation and test sets are shown in Table 4. From Table 4 it can be seen that FMRS_S gives the best result among all four schemes.

Table 4 Results of the fusion scheme

                 FMRS    FMRS_M   FMRS_R   FMRS_S
Validation set   81.21   80.62    81.91    82.44
Test set         80.20   79.73    81.31    81.84
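The quoted bounds follow directly from the set sizes; a quick check:

```python
# Bounds on any fusion of the three classifiers (validation set, Sect. 6.3).
U = 45 * 100                           # 4500 validation samples
lower = (2725 + 132 + 418 + 229) / U   # #[(M n R n S) u D u E u F] / #(U)
upper = 4032 / U                       # #(M u R u S) / #(U)
print(f"{lower:.2%} - {upper:.2%}")    # 77.87% - 89.60%
```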
6.4 Performances of hierarchical learning architectures (HLAs)

From the results of the single stage classification schemes, it is clear that the SVM classifier individually gives the best result among the three classifiers under consideration. On the basis of this observation, we choose the SVM classifier from now on, and the results of the HLA schemes as well as the comparisons are reported with respect to SVM only. We now construct two-stage HLAs for the problem. The construction of an HLA requires grouping of classes. We employ the first two grouping schemes with the confusion matrix based on the validation set. A third HLA is designed based on the groups obtained using the NG based approach. The following subsections describe the results obtained from the three HLA schemes.

6.4.1 HLA with disjoint groups

On the basis of our validation set, the groups obtained by the algorithm described in Sect. 5.1 with thr = 0 are shown in Table 5.

Table 5 Groups of classes with similar patterns obtained using the disjoint grouping scheme

Group id.   Groups with character references     Group id.   Groups with character references
A           {1, 2, 8, 21}                        H           {17, 25}
B           {3, 4, 9, 11}                        I           {18, 19, 28, 41}
C           {5, 6, 22}                           J           {23, 29}
D           {7, 12, 20, 32, 38}                  K           {30, 37}
E           {10, 16, 24, 26, 34, 45}             L           {33}
F           {13, 15, 27, 35, 36, 39, 40}         M           {42, 43, 44}
G           {14, 31}

With these groups, we construct a two-stage HLA using SVM. Since the performance of the OVO method is better than that of the OVA method and the number of classes is not too large in any single stage, we choose the OVO method for both stages of the two-stage HLA. In the first stage of this scheme, the 13 groups are considered as 13 different classes; thus, we train SVM classifiers for 13 classes, constructing 78 binary classifiers using the OVO method. To classify an unknown sample, the MWV_SVM scheme is used. In the second stage, 12 SVM classifiers are designed for the 12 groups (for the single element group L, no second stage classifier is needed). The results of applying SVM on the test set in the first and second stages are shown in Table 6. Note that in this scheme, if a sample is misclassified in the first stage as a character belonging to a group other than its own, then the character can never be classified correctly in the second stage. The average recognition accuracy obtained on the test set is 85.22%.

Table 6 Classification accuracy within groups and between groups using the disjoint grouping scheme

Group id.   Number of test samples   Inter-group classification accuracy (%)   Intra-group classification accuracy (%)
A           400                      93.50                                     93.32
B           400                      94.25                                     94.69
C           300                      96.00                                     93.40
D           500                      92.20                                     89.80
E           600                      99.33                                     86.58
F           700                      96.86                                     81.27
G           200                      88.50                                     88.14
H           200                      99.00                                     95.96
I           400                      89.50                                     92.46
J           200                      88.50                                     96.05
K           200                      87.00                                     95.40
L           100                      80.00                                     –
M           300                      98.33                                     96.95

6.4.2 HLA with overlapped groups

We now identify the overlapped groups using the algorithm proposed in Sect. 5.2 and design the corresponding HLA. The algorithm of Sect. 5.2 is applied after setting the error rate ε = 0.05. We end up with the groups given in Fig. 5. Clearly, the ith class is present in the ith group. The ticked entries in a row of Fig. 5 indicate the classes that form the group corresponding to that row. We design the HLA using the overlapped groups and SVM classifiers. An average accuracy of 88.13% is obtained on the test set.

[Fig. 5 Groups obtained from the overlapped grouping scheme]


6.4.3 HLA with groups using the NG based approach

After construction of the matrix ((n_ij)) on the basis of the training set, we proceed with the grouping algorithm described in Sect. 5.3 with N = 1000. Using thr = 0, a total of 13 groups are obtained. The groups are different from the earlier groups (Table 5), though their number is the same by coincidence. In Table 7, we list all the groups obtained from the grouping scheme using NG.

Table 7 Groups of classes obtained from Neural Gas

Group id.   Groups with character references     Group id.   Groups with character references
A           {1, 2, 19}                           H           {13, 27, 35, 40}
B           {3, 4, 5, 22}                        I           {14, 31, 38}
C           {6, 9, 11}                           J           {15, 36, 39}
D           {7, 23, 29}                          K           {17, 25}
E           {8, 21, 30, 37}                      L           {18, 28, 41}
F           {10, 16, 24, 26, 34, 45}             M           {42, 43, 44}
G           {12, 20, 32, 33}

The classification scheme is similar to the HLA for disjoint groups (see Sect. 5.1). Test set accuracies for the individual groups are shown in Table 8. The average recognition accuracy on the test set is 84.05%.

Table 8 Within groups and between groups classification accuracy for NG

Group id.   Number of test samples   Inter-group classification accuracy (%)   Intra-group classification accuracy (%)
A           300                      89.00                                     92.51
B           400                      94.25                                     88.86
C           300                      96.33                                     95.50
D           300                      86.67                                     91.39
E           400                      91.75                                     94.55
F           600                      99.17                                     84.37
G           400                      90.00                                     94.17
H           400                      88.25                                     80.45
I           300                      91.33                                     88.32
J           300                      79.33                                     92.86
K           200                      99.00                                     91.41
L           300                      96.33                                     94.81
M           300                      99.00                                     97.64


6.5 Comparative study of different schemes

We now compare the performance of the single stage classification schemes with the proposed HLA schemes. The test accuracies for each of the 45 classes obtained by the five different classification schemes are given in Fig. 6. The performance of the SVM based single stage classification scheme is marked as SSC. The fusion scheme FMRS_S is taken for comparison as it outperforms the other three (Table 4). The proposed HLA schemes are marked HLA_DG and HLA_OG for the disjoint group and overlapped group HLA schemes respectively. The NG based HLA is termed HLA_NG. It is clear from Fig. 6 that the performances of the proposed HLA schemes are better than those of the single stage classification schemes for most of the classes. For 12 classes, the disjoint group HLA scheme performs better than the other schemes. The performance of the overlapped group HLA scheme is better than that of the other schemes for 26 classes. The NG based HLA scheme is the best performer for the other 7 classes.

In order that the recognition results are not biased by a particular choice of the training-validation-test set combination, we have randomly created six different combinations of training-validation-test sets and repeated each of the five classification schemes six times for these six combinations (the earlier results were based on one such combination). In Table 9, the test accuracies (averaged over all six combinations) for the five schemes are shown. The scheme is mentioned in the first column of the table and the average test accuracy in the second column. In the SVM based single stage classification scheme, the test accuracy lies between 78.31 and 80.58%. Similarly, for FMRS_S the accuracy lies between 79.02 and 82.16%. The classification accuracy for the disjoint group HLA lies between 83.80 and 86.80%. For the overlapped group HLA, the accuracy is between 86.47 and 88.38%, and for HLA_NG the accuracy is between 82.69 and 85.56%.

Table 9 Average classification accuracies on six test sets

Scheme    Average test accuracy (%)
SSC       79.33 ± 1.25
FMRS_S    80.37 ± 1.79
HLA_DG    84.78 ± 2.02
HLA_OG    88.02 ± 1.55
HLA_NG    83.58 ± 1.98

6.6 Comparison of proposed HLA schemes with an existing two-stage scheme

As pointed out earlier, Bhattacharya et al. [18] designed a two-stage MLP based recognition system for Bangla basic characters. The features are obtained by computing local chain code histograms of the contour of the character images. For comparison, Bhattacharya et al. also computed the same features from the skeleton of the character images. Different groups of the character classes are made by manual inspection of the confusion matrix obtained from the 45 class MLP classifier. Results using the contour based features show better recognition accuracy. Hence, for the comparative study, we compute features only from the contour. We take the following comparison strategy. We compute the contour based chain code features from our database and execute the MLP based recognition scheme of Bhattacharya et al. In this course, we adopt the groups as designed by Bhattacharya et al. When working with an MLP, one needs to specify the number of hidden layers and hidden nodes. Bhattacharya et al. defined only the first stage architecture of the MLP. We adopt the given architecture and use 40 nodes arranged in a single hidden layer. However, for the second stage MLP, we determine the number of hidden nodes by trial and error on the validation set used in the previous experiments. Here we have obtained 82.27% test accuracy using our database and the scheme of Bhattacharya et al. On the other hand, in a second experiment, we take the same features and execute the MLP based HLA schemes. Using the MLP based disjoint group HLA we get 83.42% and with the overlapped group HLA we get 85.09% test accuracies. Since we have the same features and classification tool (MLP) for all the experiments, evidently our automatic grouping schemes outperform the manual grouping of classes by Bhattacharya et al. However, with the wavelet features, we have earlier obtained the best results with the SVM based HLA schemes (Sect. 6.5). Hence, we perform our experiments with the SVM based disjoint group HLA and overlapped group HLA using the contour based chain code features. The recognition accuracy obtained by HLA_DG is 86.07% and that by HLA_OG is 89.22%.


[Fig. 6 Comparison of single stage and two-stage classification schemes (SSC, FMRS_S, HLA_DG, HLA_OG, HLA_NG): (a) for classes 1–15, (b) for classes 16–30 and (c) for classes 31–45]

7 Conclusions and future scope

In this paper, SVM based hierarchical learning architecture (HLA) schemes for Bangla handwritten character recognition have been proposed. A comparative study among MLP networks, RBF networks and SVMs has been made with respect to classification accuracy. Such a comparative study among different classifiers has not so far been done in the context of handwriting recognition of any Indian script.

Three new schemes for grouping similar classes have been developed on the basis of the confusion matrix using certain objective criteria. However, to determine the number of groups, user intervention is necessary here. Automatically finding the optimal number of groups in terms of classification accuracy can be an issue for further study. Also, devising an overlapped grouping scheme on the basis of the a_ij + a_ji of Sect. 5.1 can be studied.

In the proposed HLA schemes, the features used in both stages of classification are essentially the same. Better classification may be achieved if different feature sets are used in different stages. For example, if features based on spectral characteristics are used in one stage, shape based or structural features can be used in the other. In fact, we have obtained better results by using the chain code based features at both stages. Other features may produce still better results.


References

1. Chaudhuri, B.B., Pal, U.: A complete printed Bangla OCR system. Pattern Recognit. 31(5), 531–549 (1998)
2. Plamondon, R., Srihari, S.N.: On-line and off-line handwriting recognition: a comprehensive survey. IEEE Trans. PAMI 22(1), 63–84 (2000)
3. Arica, N., Vural, F.Y.: An overview of character recognition focused on off-line handwriting. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 31(2), 216–233 (2001)
4. Setlur, S., Lawson, A., Govindaraju, V., Srihari, S.: Large scale address recognition systems truthing, testing, tools, and other evaluation issues. Int. J. Doc. Anal. Recognit. (IJDAR) 4(3), 154–169 (2001)
5. Gorski, N., Anisimov, V., Augustin, E., Baret, O., Maximov, S.: Industrial bank check processing: the A2iA CheckReader. IJDAR 3(4), 196–206 (2001)
6. Oh, I.S., Suen, C.Y.: Distance features for neural network-based recognition of handwritten characters. IJDAR 1(2), 73–88 (1998)
7. Nopsuwanchai, R., Povey, D.: Discriminative training for HMM-based offline handwritten character recognition. In: Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR), Scotland, pp. 114–118 (2003)
8. Dong, J.X., Krzyzak, A., Suen, C.Y.: An improved handwritten Chinese character recognition system using support vector machine. Pattern Recognit. Lett. 26(12), 1849–1856 (2005)
9. Camastra, F.: A SVM-based cursive character recognizer. Pattern Recognit. 40, 3721–3727 (2007)
10. Liu, C.L., Nakashima, K., Sako, H., Fujisawa, H.: Handwritten digit recognition: benchmarking of state-of-the-art techniques. Pattern Recognit. 36(10), 2271–2285 (2003)
11. Günter, S., Bunke, H.: Combination of three classifiers with different architectures for handwritten word recognition. In: Proceedings of the 9th IWFHR, pp. 63–68 (2004)
12. Bellili, A., Gilloux, M., Gallinari, P.: An MLP-SVM combination architecture for offline handwritten digit recognition: reduction of recognition errors by Support Vector Machines rejection mechanisms. IJDAR 5, 244–252 (2003)
13. Milgram, J., Sabourin, R., Cheriet, M.: Two-stage classification system combining model-based and discriminative approaches. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR), pp. 152–155 (2004)
14. Chellapilla, K., Shilman, M., Simard, P.: Combining multiple classifiers for faster optical character recognition. In: Proceedings of the 7th International Workshop on Document Analysis Systems (DAS), New Zealand, pp. 358–367 (2006)
15. Rahman, F.R., Fairhurst, M.C.: Multiple classifier decision combination strategies for character recognition: a review. IJDAR 5(4), 166–194 (2003)
16. Rahman, F.R., Rahman, R., Fairhurst, M.C.: Recognition of handwritten Bengali characters: a novel multistage approach. Pattern Recognit. 35(3), 997–1006 (2002)
17. Bhowmik, T.K., Bhattacharya, U., Parui, S.K.: Recognition of Bangla handwritten characters using an MLP classifier based on stroke features. In: Proceedings of the 11th International Conference on Neural Information Processing (ICONIP), India, pp. 814–819 (2004)
18. Bhattacharya, U., Shridhar, M., Parui, S.K.: On recognition of handwritten Bangla characters. In: Kalra, P., Peleg, S. (eds.) Proceedings of the 5th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Springer Lecture Notes in Computer Science, vol. 4338, pp. 817–828 (2006)
19. Hsu, C.W., Lin, C.J.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)
20. Bhattacharya, U., Chaudhuri, B.B.: Fusion of combination rules of an MLP classifiers for improved recognition accuracy of handprinted Bangla numerals. In: Proceedings of the 8th ICDAR, Korea, vol. I, pp. 322–326 (2005)
21. Bhattacharya, U., Parui, S.K., Shaw, B., Bhattacharya, K.: Neural combination of ANN and HMM for handwritten Devnagari numeral recognition. In: Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR), France, pp. 613–618 (2006)
22. Daubechies, I.: Ten Lectures on Wavelets. CBMS-NSF Regional Conf. Series in Appl. Math. 4, 909–996 (1998)
23. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall, Englewood Cliffs (1999)
24. Bhattacharya, U., Parui, S.K.: Self-adaptive learning rates in backpropagation algorithm improve its function approximation performance. In: Proceedings of the IEEE International Conference on Neural Networks, Australia, pp. 2784–2788 (1995)
25. Vapnik, V.: Statistical Learning Theory. Wiley, NY (1998)
26. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
27. Joachims, T.: SVMlight: Support Vector Machine (2002)
28. Kreßel, U.: Pairwise classification and support vector machines. In: Advances in Kernel Methods: Support Vector Learning, pp. 255–268. MIT Press, Cambridge (1999)
29. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, NJ (1988)
30. Martinetz, T.M., Berkovich, S.G., Schulten, K.J.: "Neural-gas" network for vector quantization and its application to time-series prediction. IEEE Trans. Neural Netw. 4(4), 558–569 (1993)
