International Journal on Document Analysis and Recognition (IJDAR) 12:97–108, June 2009
DOI: 10.1007/s10032-009-0084-x
ORIGINAL PAPER

SVM-based hierarchical architectures for handwritten Bangla character recognition

T. K. Bhowmik · P. Ghanty · A. Roy · S. K. Parui

Received: 19 March 2008 / Revised: 11 November 2008 / Accepted: 26 February 2009 / Published online: 26 March 2009
© Springer-Verlag 2009
Abstract  We propose support vector machine (SVM) based hierarchical classification schemes for recognition of handwritten Bangla characters. A comparative study is made among multilayer perceptron, radial basis function network and SVM classifier for this 45 class recognition problem. SVM classifier is found to outperform the other classifiers. A fusion scheme using the three classifiers is proposed which is marginally better than SVM classifier. It is observed that there are groups of characters having similar shapes. These groups are determined in two different ways on the basis of the confusion matrix obtained from SVM classifier. In the former, the groups are disjoint while they are overlapped in the latter. Another grouping scheme is proposed based on the confusion matrix obtained from neural gas algorithm. Groups are disjoint here. Three different two-stage hierarchical learning architectures (HLAs) are proposed using the three grouping schemes. An unknown character image is classified into a group in the first stage. The second stage recognizes the class within this group. Performances of the HLA schemes are found to be better than single stage classification schemes. The HLA scheme with overlapped groups outperforms the other two HLA schemes.

Keywords  SVM · RBF · MLP · Handwritten character recognition · Bangla · Fusion · Grouping of classes · Hierarchical learning architectures

T. K. Bhowmik
Read-Ink Technologies Pvt. Ltd., Indiranagar, Bangalore 560 008, India
e-mail: tkbhowmik@gmail.com

P. Ghanty
Praxis Softek Solutions Pvt. Ltd., Module 616, SDF Building, Sector V, Salt Lake City, Kolkata 700 091, India
e-mail: pradip.ghanty@gmail.com

A. Roy · S. K. Parui (corresponding author)
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata 700 108, India
e-mail: swapan@isical.ac.in

A. Roy
e-mail: roy.anandarup@gmail.com

1 Introduction

Though optical character recognition (OCR) systems for some Indian scripts are available [1], there has not been much work on recognition of handwritten Indian scripts. The present paper deals with recognition of handwritten Bangla characters. Bangla is the fifth most popular language in the world and the second most popular language in the Indian subcontinent. Bangla script has 50 basic characters (39 consonants and 11 vowels). There are more than 300 Bangla compound characters along with vowel modifiers. The present study deals with Bangla basic characters only.
Most of the handwritten character recognition problems are complex and deal with a large number of classes. A lot of research effort has been made in this direction for several scripts [2,3] and has been applied successfully to various real life applications such as postal automation, bank check verification, etc. [4,5]. Multilayer perceptrons (MLP) and hidden Markov models (HMM) have been used for classification purpose [6,7]. Support vector machine (SVM) has not yet been used much in handwritten character recognition problems. Dong et al. [8] applied SVM classifier to improve the performance of a handwritten Chinese character recognition system. Camastra [9] applies SVM for English handwriting recognition. Liu et al. [10] have evaluated the performance of several classifiers for handwritten numeral recognition. In [11], Günter and Bunke combine three HMM
classifiers for English handwritten word recognition. Bellili et al. [12] combine MLP and SVM classifiers for handwritten digit recognition. This hybrid architecture is based on the idea that the correct digit class mostly belongs to the two maximum MLP outputs. The classification algorithms can be separated into two main categories. Discriminative approaches try to find the better separation among all classes. But, in general, they cannot deal with outliers. Besides, model-based approaches make the outlier detection possible but are not sufficiently discriminative. Observing these characteristics, Milgram et al. [13] proposed a combination of model-based approach with support vector classifiers in a two-stage classification system. A number of other schemes with combination of different classifiers are available in the literature for this purpose [14,15].

There have been only a few studies in recognition of handwritten Bangla characters [16–18]. In [16], a multistage recognition system is applied to a small database collected in laboratory environments. Bhowmik et al. [17] proposed an MLP based recognition scheme using stroke features for Bangla basic characters. Bhattacharya et al. [18] also designed a two-stage MLP based recognition system using shape based features. These two studies are based on large databases. In the present study, we apply MLP, radial basis function (RBF) and SVM classifiers to the handwritten Bangla character recognition problem and compare their performances in terms of classification accuracy. The entire handwritten Bangla character database is partitioned into training, validation and test sets. The validation set is used to determine several parameters of the classifiers. A majority voting fusion scheme is used for this problem on the basis of MLP, RBF and SVM classifiers. From the confusion matrices (on the basis of validation set) of different classifiers, it has been observed that there are groups of classes within each of which the misclassification rate is larger compared to the misclassification rate between such groups. Based on this observation, groups are identified using certain techniques. A hierarchical learning architecture (HLA) for handwritten Bangla character recognition is designed on the basis of these groups. In the first stage of HLA, the group for an unknown sample is identified and the following stage recognizes the sample into a class within this group. Three such different HLA schemes based on three different grouping algorithms are proposed.

In the present paper, we intend to explore how these classifiers compare with respect to their classification accuracies in a large class problem like the present one. In order to deal with a large number of classes, a two-stage classification architecture is proposed so that each stage involves a smaller number of classes. An SVM classifier is basically meant for two-class problems. However, there exist methods to use SVM classifiers for a larger number of classes. One-versus-all (OVA) and one-versus-one (OVO) are two such methods. OVO gives better classification accuracy than OVA [19]. But OVO is applicable when the number of classes is smaller. Thus, a two-stage classification architecture is suitable for OVO to achieve better accuracy.

The main purpose of the present paper is to design a hierarchical classification architecture with SVM classifiers in order to achieve better accuracy. For a hierarchical classifier, formation of groups of similar classes is necessary. Formation of groups is normally done in an ad hoc manner [18]. To the best of the authors' knowledge, no formal methodology for forming such groups is available in the literature. We develop here several formal grouping schemes on the basis of the confusion matrix produced by supervised or unsupervised classification. Such a hierarchical classifier with such grouping schemes can be useful for any large class recognition problem.

The performance of a recognition system depends on the features being used for the classifiers. Different kinds of features have been proposed and their performances on standard databases have been reported [6]. The performance of recognition systems can still be improved by choosing better features. One of the popular and efficient features, namely, the wavelet features, has been used in many handwritten character recognition problems. Bhattacharya et al. [20,21] applied these features for handwritten Bangla and Devnagari numeral recognition and reported satisfactory results. We explore how effective these features are for the present recognition problem where the number of classes is large.

2 Feature extraction

The wavelet transform [22] is a well-known tool that finds application in many areas including image processing. Due to the multi-resolution property, it decomposes the signal at different scales. For a given image, the wavelet transform produces one low frequency subband image reflecting an approximation of the original image and three high frequency components of the image reflecting the detail. The approximation component is used here to generate the feature vector in the present recognition problem. In our experiment, we consider the Daubechies wavelet transform with four coefficients (l0, l1, l2, l3) forming the lowpass or smoothing filter and another four coefficients (h0, h1, h2, h3) forming the highpass filter where

l0 = (1 + √3)/(4√2),  l1 = (3 + √3)/(4√2),  l2 = (3 − √3)/(4√2),  l3 = (1 − √3)/(4√2),

and h0 = l3, h1 = −l2, h2 = l1, h3 = −l0.

For an input image, we first calculate the bounding box of the image, then normalize it to a square image of size 64 × 64 with an interpolation technique. Wavelet decomposition algorithm with the above lowpass and highpass filters is applied to this
used to classify an unknown sample. In MWV_SVM scheme if the decision function value for an unknown sample x is greater than or equal to 0 from classifier C_ij then the vote for class i is increased by one. Otherwise, the vote for class j is increased by one. The sample x is assigned to class k where class k has the largest number of votes among all c(c − 1)/2 classifiers. The OVA method on the other hand needs c binary classifiers for a c-class problem. The ith binary classifier constructs a decision boundary between class i and the other c − 1 classes. The winner-takes-all strategy (WTA_SVM) [25] is used to assign the class label to an unknown sample x. WTA_SVM strategy assigns x to the class having the largest value of the decision function from all c binary classifiers, even when all decision function values (d_i, i = 1, . . . , c) are negative.

Both the methods OVO and OVA have some disadvantages. For a large class (c-class, say) problem the OVO method constructs c(c − 1)/2 binary classifiers, whereas OVA needs only c binary classifiers. So, OVA incurs less overhead. Yet, in general, the OVO method gives better classification accuracy than the OVA method for large class problems [19].
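The max-wins voting rule of MWV_SVM can be summarized by the following sketch (our illustration; `pairwise_decision` is a hypothetical mapping from a class pair (i, j), i < j, to the decision function of the corresponding binary SVM).

    import numpy as np

    def mwv_predict(x, pairwise_decision, n_classes):
        """Max-wins voting over the c(c-1)/2 pairwise SVM decision functions."""
        votes = np.zeros(n_classes, dtype=int)
        for i in range(n_classes):
            for j in range(i + 1, n_classes):
                if pairwise_decision[(i, j)](x) >= 0:
                    votes[i] += 1        # classifier C_ij votes for class i
                else:
                    votes[j] += 1        # otherwise the vote goes to class j
        return int(np.argmax(votes))     # class with the largest number of votes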
4 Classification schemes

The classifiers described in Sect. 3 may be used in different ways for a pattern classification problem. Classifiers can be used separately or may be combined to form a fused classifier. A hierarchical architecture with the same or different classifiers may also be used. The following subsections detail the schemes adopted for the present problem.

degrades the recognition accuracy. To remedy this situation, several methods have been suggested. One such method is to break the tie by choosing the decision of any arbitrary classifier. This method is abbreviated as FMRS, indicating fusion of MLP, RBF and SVM classifiers. Another way is to rely on the decision of a particular classifier to break the tie. We term the schemes FMRS_M, FMRS_R and FMRS_S when the ties are broken using the MLP, RBF and SVM classifiers respectively.

4.3 Hierarchical learning architectures (HLAs)

For a large class problem, it is possible to find some groups of classes within each of which the misclassification rate is high compared to the misclassification rate between such groups. On the basis of this observation, several classes are merged to form groups. These groups, however, may or may not be disjoint. A classification scheme based on the groups is termed a hierarchical learning architecture (HLA). Here, we use a two-stage classification scheme. The first stage identifies the correct group of an unknown sample. The second stage recognizes the sample as a member of a particular class in that group. The most important aspect of HLA is to determine the groups and the classes they contain. We first propose two grouping schemes to build disjoint groups and overlapped groups, on the basis of the confusion matrix obtained from any supervised classification. Camastra [9] used the Neural Gas method to identify similar structures in the input patterns. We develop a third grouping scheme using the NG method. Finally, the HLA is constructed using all three grouping schemes.
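A minimal sketch of the two-stage prediction is given below (illustrative names only): the first-stage classifier returns a group label, and the classifier trained for that group returns the final class; a singleton group needs no second stage.

    def hla_predict(x, group_clf, within_group_clfs, group_members):
        """Two-stage HLA: stage 1 picks a group, stage 2 picks a class inside it."""
        g = group_clf.predict([x])[0]                 # stage 1: group label
        if len(group_members[g]) == 1:                # singleton group: class is decided
            return group_members[g][0]
        return within_group_clfs[g].predict([x])[0]   # stage 2: class within group g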
the index sets (i1, . . . , im) and (j1, . . . , jn). The similarity between G_p and G_q, (p < q), is defined as:

s_pq = min_{i<j} n_ij,  (i = i1, . . . , im ; j = j1, . . . , jn).  (1)

Note that grouping done on the basis of this similarity measure has resemblance to complete linkage clustering [29]. If operator "min" is replaced by operator "max", the resultant grouping has resemblance to single linkage clustering [29]. In many cases, the latter operator leads to a situation where one group contains a large number of classes compared to other groups. This is the reason why the similarity measure defined in Eq. 1 is used here.

Initially, each individual class is a group (i.e., c classes representing c groups). First, two classes i and j are found such that their similarity value n_ij is maximum among all pairs of classes. They are merged into one group, resulting in c − 1 groups. Subsequently, similarity values between groups are computed and the two most similar groups are merged into one. The process continues until we reach a stage when all the similarity values between groups are less than or equal to a certain non-negative threshold value thr.

Note that using the above algorithm, it is not possible to get fewer groups than what is achieved with thr = 0. However, it will be possible to get fewer groups if the operator "min" in Eq. 1 is replaced by the "jth min", (j = 2, 3, . . .).

As an illustration, consider the confusion matrix ((a_ij)) in Table 1 with 6 classes.

Table 1  Confusion matrix of 6 classes

        1    2    3    4    5    6
   1   75   15    4    5    1    0
   2   18   78    1    1    1    1
   3    2    7   73   16    0    2
   4    1    3   19   76    1    0
   5    3    4    3    3   72   15
   6    3    3    3    0   17   74

Let thr = 0. Note that n_34 = 35 is maximum among all n_ij and the classes 3, 4 are merged into one group in the first iteration resulting in 5 groups, namely, {(1), (2), (3, 4), (5), (6)}. The resulting groups are renamed as {(1), (2), (3), (4), (5)}. The similarity matrix of these groups is shown in Table 2a. Note, according to the definition of similarity (Eq. 1), the lower triangle and the main diagonal of the similarity matrix are not defined. After the first merging we obtain s_12 = 33 being the maximum. Hence, we merge groups 1 and 2 (Table 2b). Accordingly, in the next two iterations we form groups with (3, 4) and (1, 2) respectively. The similarity matrices after these two merges are shown in Table 2c, d. Note, now s_12 = 0 (≤ thr). Hence no further merging is performed with the groups. Thus in terms of the original classes we obtain the groups as {(1, 2, 3, 4), (5, 6)}. Here the idea is that two groups are not merged if there is a pair of classes (across these groups) which are very dissimilar. The classes 4 and 6 form such a pair here. Note that if thr = 4 the algorithm terminates earlier and outputs more groups, namely, {(1, 2), (3, 4), (5, 6)}.
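A minimal sketch of this disjoint grouping procedure follows (our implementation, 0-based class indices; n_ij = a_ij + a_ji is inferred from the worked example, e.g. n_34 = 16 + 19 = 35). On the Table 1 matrix it reproduces the example: thr = 0 yields {(1, 2, 3, 4), (5, 6)} and thr = 4 yields {(1, 2), (3, 4), (5, 6)} in the paper's 1-based numbering.

    import numpy as np

    def disjoint_groups(conf, thr=0):
        """Merge groups while the largest inter-group similarity exceeds thr."""
        conf = np.asarray(conf, dtype=float)
        n = conf + conf.T                          # n_ij = a_ij + a_ji (symmetric copy)
        groups = [[i] for i in range(conf.shape[0])]

        def sim(gp, gq):                           # Eq. 1: minimum n_ij across two groups
            return min(n[i, j] for i in gp for j in gq)

        while len(groups) > 1:
            p, q = max(((p, q) for p in range(len(groups))
                        for q in range(p + 1, len(groups))),
                       key=lambda pq: sim(groups[pq[0]], groups[pq[1]]))
            if sim(groups[p], groups[q]) <= thr:
                break                              # every inter-group similarity is <= thr
            groups[p] += groups.pop(q)             # merge the two most similar groups
        return groups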
5.2 Overlapped grouping scheme

In the grouping algorithm described above, we have used both the rows and columns of the confusion matrix to determine a group. We now consider only a column for forming a group. For each column, we select the best (i.e., smallest) subset of classes (i.e., with size less than c) that incurs an error rate ε or less. In this classification scheme, there are c classes or groups in the first stage. In the second stage, individual groups will have varying numbers of classes. Note that samples from one character class may belong to more than one group. This is why we call the groups overlapped groups. Note that in the earlier scheme, samples from one character class belong to exactly one group.

Consider the confusion matrix ((a_ij)) defined in Sect. 5.1. Suppose the target error rate is ε. Let N_j = Σ_{i=1}^{c} a_ij and thr_j be an integer such that Σ_{a_ij ≤ thr_j} a_ij = ε · N_j. However, for the discrete nature of a_ij, the equality may not hold for any thr_j and, therefore, we take the largest value of thr_j such that Σ_{a_ij ≤ thr_j} a_ij ≤ ε · N_j. Now, the jth group G_j, (j = 1, . . . , c), is defined as {i : a_ij > thr_j}. For example, the fourth column in the confusion matrix in Table 1 is [5 1 16 76 3 0] where N_4 = 101. Suppose ε = 0.05. Note that Σ_{a_i4 ≤ 4} a_i4 < 0.05 N_4 and Σ_{a_i4 ≤ 5} a_i4 > 0.05 N_4. Thus thr_4 = 4 and G_4 = {1, 3, 4}.

For a c class problem, we construct c groups in the first stage and for each such group, a separate classifier is designed. Suppose a sample x is classified in class k in the first stage. Then the sample is fed to the classifier for G_k in the second stage. The final class is assigned to sample x by the second stage classifier. In this scheme, if a sample is misclassified in the first stage, it may sometimes be correctly classified in the second stage. This is not possible in the disjoint grouping scheme.
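The column-wise rule above can be sketched as follows (our implementation): for each column j the largest integer thr_j is taken such that the entries not exceeding it sum to at most ε · N_j, and G_j keeps the classes whose entries exceed thr_j. On the fourth column of Table 1 with ε = 0.05 it returns {1, 3, 4} in 1-based numbering, matching the example.

    import numpy as np

    def overlapped_groups(conf, eps=0.05):
        """One (possibly overlapping) group per column of the confusion matrix."""
        conf = np.asarray(conf)
        groups = []
        for j in range(conf.shape[1]):
            col = conf[:, j]
            n_j = col.sum()                                     # N_j: column total
            thr_j = max(t for t in range(int(col.max()) + 1)
                        if col[col <= t].sum() <= eps * n_j)    # largest feasible thr_j
            groups.append([i for i in range(conf.shape[0]) if col[i] > thr_j])
        return groups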
(5), (6)}. The resulting groups are renamed as {(1), (2), (3),
(4), (5)}. The similarity matrix of these groups is shown
in Table 2a. Note, according to the definition of similarity
(Eq. 1), the lower triangle and the main diagonal of the sim- 5.3 Grouping scheme with neural gas
ilarity matrix are not defined. After first merging we obtain
s12 = 33 being the maximum. Hence, we merge groups 1 Camastra [9] proposed a grouping scheme using the Neu-
and 2 (Table 2b). Accordingly, in the next two iterations we ral Gas (NG) method. Camastra used NG to verify whether
form groups with (3, 4) and (1, 2) respectively. The similar- the uppercase and the lowercase letters of English can be
ity matrices after these two merges are shown in Table 2c,d. merged into a group or not [9]. We use NG here for auto-
Note, now s12 = 0 (≤ thr ). Hence no further merging is per- matic grouping of classes. The following subsections give a
formed with the groups. Thus in terms of the original classes brief overview of the NG and the NG based grouping scheme.
5.3.1 The neural gas

Vector quantization methods encode a set of data points in n-dimensional space with a smaller set of neurons described by their weight vectors W_k, k = 1, . . . , N. Neural Gas (NG) [30] is a vector quantization technique with soft competition between the neurons. In each training step, the squared Euclidean distances d_ik² = ||x_i − W_k||² between an input vector x_i and all neurons W_k are computed. The vector of these distances is d. The N distances are now given ranks r_k(d) = 0, . . . , N − 1 according to the ascending order of distances. The learning rule is

W_k = W_k + η h_ρ(r_k(d)) (x − W_k).  (2)

The function h_ρ(r_k(d)) = e^(−r_k(d)/ρ) is a monotonically decreasing function of the ranking. The width of h(·) is determined by the neighborhood range ρ. The learning rule is also affected by a global learning rate η. The values of ρ and η decrease exponentially from an initial positive value (ρ(0), η(0)) to a smaller final positive value (ρ(T), η(T)) according to

ρ(t) = ρ(0) [ρ(T)/ρ(0)]^(t/T)  (3)

and

η(t) = η(0) [η(T)/η(0)]^(t/T)  (4)

where t is the time step and T the total number of training steps.
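A minimal sketch of NG training following Eqs. (2)–(4) is given below (our illustration; the initialization from random samples, the sampling scheme and the parameter values are assumptions): every neuron moves towards the current input with a strength that decays exponentially with its distance rank, while ρ and η are annealed from initial to final values.

    import numpy as np

    def train_neural_gas(data, n_neurons=1000, T=10000,
                         rho0=10.0, rhoT=0.01, eta0=0.5, etaT=0.005, seed=0):
        """Neural Gas codebook training; assumes len(data) >= n_neurons."""
        data = np.asarray(data, dtype=float)
        rng = np.random.default_rng(seed)
        W = data[rng.choice(len(data), n_neurons, replace=False)].copy()
        for t in range(T):
            x = data[rng.integers(len(data))]
            rho = rho0 * (rhoT / rho0) ** (t / T)          # Eq. (3)
            eta = eta0 * (etaT / eta0) ** (t / T)          # Eq. (4)
            d2 = ((x - W) ** 2).sum(axis=1)                # squared distances d_ik^2
            ranks = np.argsort(np.argsort(d2))             # r_k(d) = 0, ..., N-1
            W += eta * np.exp(-ranks / rho)[:, None] * (x - W)   # Eq. (2)
        return W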
5.3.2 Group identification from NG

Suppose, after completion of the algorithm, the Voronoi region V_k (k = 1, . . . , N) corresponding to the kth neuron denotes the set of input vectors that are closest to the weight vector W_k. Clearly, the union of all V_k's constitutes the whole dataset. Now, if the underlying classes in the dataset are quite apart from each other in the feature space, each V_k is expected to contain samples from a single class. However, in problems like the present one, this is not the case and there will be some V_k's that will contain samples from more than one class. Now if two classes i and j have similarity, there will be neurons whose Voronoi region V_k will contain samples from both the classes i and j. The Voronoi regions that have samples from only one class are called pure and the others impure. Thus, a study of the composition of impure Voronoi regions can determine which classes are similar and hence should form a group. Based on this observation, a grouping mechanism is formalized as follows. Suppose an impure Voronoi region contains m classes, namely, C_1, C_2, . . . , C_m having p_1, p_2, . . . , p_m samples respectively. Let q_jk be min(p_j, p_k) where j < k. Let ((n_ij)) be the upper triangular similarity matrix where n_ij is the sum of all q_ij over all impure Voronoi regions. Here n_ij represents similarity between classes i and j as in Sect. 5.1. The similarity s_pq between groups of classes is also defined in the same way. Finally, the formation of the groups of classes based on n_ij and s_pq is done using the algorithm in Sect. 5.1. The difference between the grouping schemes in Sects. 5.1 and 5.3 is that the former is based on the results of a supervised classification while the latter makes use of the results of an unsupervised classification.

Note that the groups formed by the NG based method are always disjoint. The similarity matrix obtained by this method is upper triangular and hence formation of overlapped groups on the basis of the columns cannot be considered as in Sect. 5.2.
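The accumulation of the similarity matrix from impure Voronoi regions can be sketched as follows (our illustration; W is a trained NG codebook, labels are 0-based class indices, and the brute-force nearest-neuron search is for clarity only).

    import numpy as np

    def ng_similarity_matrix(data, labels, W, n_classes):
        """n_ij = sum over impure Voronoi regions of q_ij = min(p_i, p_j)."""
        data, labels = np.asarray(data, dtype=float), np.asarray(labels)
        nearest = ((data[:, None, :] - W[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        n = np.zeros((n_classes, n_classes))
        for k in range(len(W)):
            counts = np.bincount(labels[nearest == k], minlength=n_classes)
            present = np.flatnonzero(counts)
            if len(present) < 2:
                continue                            # pure (or empty) region: no contribution
            for a in range(len(present)):
                for b in range(a + 1, len(present)):
                    i, j = present[a], present[b]
                    n[i, j] += min(counts[i], counts[j])
        return n                                    # upper triangular, as in Sect. 5.1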
6 Results and discussions

In this section, we present the results obtained by different schemes described so far. In addition, we compare the performances of the single stage classification schemes with the proposed HLA schemes. In the single stage classification schemes, we have used binary representation of 16 × 16 decomposed image as the feature vector. The HLA schemes consist of two stages as discussed earlier in Sect. 4.3. In the first stage, the binary representation of 16 × 16 decomposed image is used as the feature vector while the binary representation of 32 × 32 decomposed image is used in the second stage. For all schemes, the parameters of the classifiers are determined on the basis of the validation sets.

6.1 Bangla handwritten database

Experiments have been carried out on a moderately large database of basic handwritten Bangla isolated characters. The database has been collected from different sections (like different age groups, different educational levels, etc.) of the population in and around Kolkata, India, and is a
Fig. 2  Five pairs of similar pattern Bangla characters

Table 3  Handwritten database statistics

No of classes   Training samples/class   Validation samples/class   Test samples/class
45              400                      100                        100

Three pattern recognition tools, namely, MLP, RBF and SVM are used for classification. The numbers of hidden nodes for MLP and RBF are chosen based on a trial and error method on the validation set. The optimal number of hidden nodes for MLP is found to be 200 and for RBF to be 260. The OVA method is used to train 45 SVM classifiers. While using SVM, linear kernel, RBF kernel and polynomial kernel are used. The hyper parameter γ = 0.10 for RBF kernel and degree d = 3 for polynomial kernel have resulted in the minimum validation error in the present problem. The recognition accuracies of SVM classifiers on the test set are 56.88% using linear kernel, 74.21% using RBF kernel (γ = 0.10) and 79.47% using polynomial kernel with d = 3. The SVM classifier with polynomial kernel of degree 3 has the maximum validation accuracy and hence is used for the rest of the experiments. Overall test accuracies of MLP, RBF and the SVM classifiers are 71.44%, 74.56% and 79.47%, respectively.
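The reported single-stage settings can be written down as the following sketch; scikit-learn is our assumption for illustration (it is not the implementation used in the paper), the RBF network is omitted, and X_train, y_train are placeholders.

    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    mlp = MLPClassifier(hidden_layer_sizes=(200,))                      # 200 hidden nodes
    svm_poly_ova = OneVsRestClassifier(SVC(kernel='poly', degree=3))    # 45 OVA SVMs, d = 3
    svm_rbf_ova = OneVsRestClassifier(SVC(kernel='rbf', gamma=0.10))    # RBF-kernel variant
    # mlp.fit(X_train, y_train); svm_poly_ova.fit(X_train, y_train)     # placeholders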
Fig. 4  Distribution of correctly and incorrectly classified samples by MLP, RBF and SVM classifiers
Table 4  Results of fusion scheme

                  FMRS    FMRS_M   FMRS_R   FMRS_S
Validation set    81.21   80.62    81.91    82.44
Test set          80.20   79.73    81.31    81.84

D: Set of character images that are classified correctly by both MLP and RBF classifiers, but not by SVM classifier.
E: Set of character images that are classified correctly by both RBF and SVM classifiers, but not by MLP classifier.
F: Set of character images that are classified correctly by both SVM and MLP classifiers, but not by RBF classifier.
X: Set of character images that are classified correctly by only MLP classifier.
Y: Set of character images that are classified correctly by only RBF classifier.
Z: Set of character images that are classified correctly by only SVM classifier.
U: Validation set.

On the basis of our validation set, the sizes of the sets shown in Fig. 4 are as follows: #(M) = 3210, #(R) = 3465, #(S) = 3586, #(D) = 132, #(E) = 418, #(F) = 229, #(X) = 124, #(Y) = 190, #(Z) = 214, #(M ∩ R) = 2857, #(R ∩ S) = 3143, #(S ∩ M) = 2954, #(M ∩ R ∩ S) = 2725, #(M ∪ R ∪ S) = 4032 and #(U − (M ∪ R ∪ S)) = 468. Mathematically, the accuracy of the majority voting scheme always lies within the range (#[(M ∩ R ∩ S) ∪ D ∪ E ∪ F]/#(U)) × 100 to (#(M ∪ R ∪ S)/#(U)) × 100 (which is 77.87–89.60% in our experiment). In fact, there does not exist any fusion scheme which exceeds the accuracy of 89.60% on the validation set. Earlier, we have pointed out the possibility of a tie and proposed certain methods to deal with it. The present problem has 45 classes, so the occurrence of a tie is not infrequent. We use the four schemes, namely, FMRS, FMRS_M, FMRS_R and FMRS_S to break the tie. The recognition accuracies of the fusion classifier on the validation and test sets are shown in Table 4. From Table 4 it can be seen that FMRS_S gives the best result among all the four schemes.
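The best-performing FMRS_S rule reduces to the following sketch (our illustration): the three class labels are majority-voted, and a three-way disagreement, i.e. a tie, is resolved by the SVM decision.

    def fmrs_s(mlp_label, rbf_label, svm_label):
        """Majority vote of MLP, RBF and SVM; ties are broken by the SVM decision."""
        if mlp_label == rbf_label:
            return mlp_label       # at least two classifiers agree on this label
        return svm_label           # SVM either forms the majority or breaks the tie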
6.4 Performances of hierarchical learning architectures (HLAs)

From the results of the single stage classification schemes, it is clear that the SVM classifier individually gives the best result among the three classifiers under consideration. On the basis of this observation, we choose the SVM classifier from now on. So, the results of HLA schemes as well as the comparisons are made with respect to SVM only. Now, we construct a two stage HLA for the problem. The construction of HLA requires grouping of classes. We employ the first two grouping schemes with the confusion matrix based on the validation set. A third HLA is designed based on the groups obtained using the NG based approach. The following subsections describe the results obtained from the three HLA schemes.

6.4.1 HLA with disjoint groups

On the basis of our validation set, the groups obtained by the algorithm described in Sect. 5.1 with thr = 0 are shown in Table 5.

Table 5  Groups of classes with similar patterns obtained using disjoint grouping scheme

Group id.   Groups with character references
A           {1, 2, 8, 21}
B           {3, 4, 9, 11}
C           {5, 6, 22}
D           {7, 12, 20, 32, 38}
E           {10, 16, 24, 26, 34, 45}
F           {13, 15, 27, 35, 36, 39, 40}
G           {14, 31}
H           {17, 25}
I           {18, 19, 28, 41}
J           {23, 29}
K           {30, 37}
L           {33}
M           {42, 43, 44}

With these groups, we construct a two-stage HLA using SVM. Since the performance of the OVO method is better than the OVA method and the number of classes is not too large in any single stage, we choose the OVO method for both the stages of the two-stage HLA. In the first stage of this scheme, all the 13 groups are considered as 13 different classes. Thus, we train SVM classifiers for 13 classes. Here we construct 78 binary classifiers using the OVO method. To classify an unknown sample the MWV_SVM scheme is used. In the second stage, 12 SVM classifiers for the 12 groups are designed (for the single element group L, no second stage classifier is needed). The results of applying SVM on the test set in the first and second stages are shown in Table 6. Note that in this scheme, if a sample is misclassified in the first stage as a character belonging to a group other than its own, then the character will never be classified correctly in the second stage. The average recognition accuracy obtained on the test set is 85.22%.

6.4.2 HLA with overlapped groups

We now identify the overlapped groups using the algorithm proposed in Sect. 5.2 and design the corresponding HLA. The algorithm of Sect. 5.2 is applied after setting the error rate ε = 0.05. We end up getting the groups given in Fig. 5. It is clear that the ith class is present in the ith group. The ticked
entries in a row indicate the classes that form the group corresponding to the row. We design the HLA using overlapped groups and SVM classifier. An average accuracy of 88.13% is obtained on the test set.

Table 6  Classification accuracy within groups and between groups using disjoint grouping scheme

Group id.   Number of      Inter-Group                    Intra-Group
            test samples   classification accuracy (%)    classification accuracy (%)
A           400            93.50                          93.32
B           400            94.25                          94.69
C           300            96.00                          93.40
D           500            92.20                          89.80
E           600            99.33                          86.58
F           700            96.86                          81.27
G           200            88.50                          88.14
H           200            99.00                          95.96
I           400            89.50                          92.46
J           200            88.50                          96.05
K           200            87.00                          95.40
L           100            80.00                          –
M           300            98.33                          96.95

Table 7  Groups of classes obtained from Neural Gas

Group id.   Groups with character references
A           {1, 2, 19}
B           {3, 4, 5, 22}
C           {6, 9, 11}
D           {7, 23, 29}
E           {8, 21, 30, 37}
F           {10, 16, 24, 26, 34, 45}
G           {12, 20, 32, 33}
H           {13, 27, 35, 40}
I           {14, 31, 38}
J           {15, 36, 39}
K           {17, 25}
L           {18, 28, 41}
M           {42, 43, 44}

6.4.3 HLA with groups using NG based approach

After construction of the matrix ((n_ij)) on the basis of the training set, we proceed with the grouping algorithm described in Sect. 5.3 with N = 1000. Using thr = 0, a total of 13 groups are obtained. The groups are different from the earlier groups (Table 5) though their number is the same by coincidence. In Table 7, we list all the groups obtained from the grouping scheme using NG.

The classification scheme is similar to the HLA for disjoint groups (see Sect. 5.1). Test set accuracies for individual groups are shown in Table 8. The average recognition accuracy on test set is obtained as 84.05%.
Table 8  Within groups and between groups classification accuracy for NG

Group id.   Number of test samples   Inter-Group classification accuracy (%)   Intra-Group classification accuracy (%)

for disjoint group HLA lies between 83.80 and 86.80%. For overlapped group HLA, the accuracy is in between 86.47 and 88.38% and for HLA_NG accuracy is in between 82.69 and 85.56%.
[Figure: per-class performance of the SSC, FMRS_S, HLA_DG, HLA_OG and HLA_NG schemes; panels (a)–(c) cover Class # 1–15, 16–30 and 31–45, with values plotted on a 40–100 scale.]
Table 9  Average classification accuracies on six test sets

Scheme     Average test accuracy (%)
SSC        79.33 ± 1.25
FMRS_S     80.37 ± 1.79
HLA_DG     84.78 ± 2.02
HLA_OG     88.02 ± 1.55
HLA_NG     83.58 ± 1.98

networks, RBF networks and SVMs has been made with respect to classification accuracy. Such a comparative study among different classifiers has not so far been done in the context of handwriting recognition of any Indian script.

Three new schemes for grouping similar classes have been developed on the basis of the confusion matrix using certain objective criteria. However, to determine the number of groups, user intervention is necessary here. To automatically find the optimal number of groups in terms of classification accuracy can be an issue of further study. Also, devising an overlapped grouping scheme on the basis of a_ij + a_ji of Sect. 5.1 can be studied.

In the proposed HLA schemes, the features used in both the stages of classification are essentially the same. Better classification may be achieved if different feature sets are used in different stages. For example, if features based on spectral characteristics are used in one stage, shape based or structural features can be used in the other. In fact, we have
obtained better results by using the chain code based feature at both stages. Other features may produce still better results.

References

1. Chaudhuri, B.B., Pal, U.: A complete printed Bangla OCR system. Pattern Recognit. 31(5), 531–549 (1998)
2. Plamondon, R., Srihari, S.N.: On-line and off-line handwriting recognition: a comprehensive survey. IEEE Trans. PAMI 22(1), 63–84 (2000)
3. Arica, N., Vural, F.Y.: An overview of character recognition focused on off-line handwriting. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 31(2), 216–233 (2001)
4. Setlur, S., Lawson, A., Govindaraju, V., Srihari, S.: Large scale address recognition systems: truthing, testing, tools, and other evaluation issues. Int. J. Doc. Anal. Recognit. (IJDAR) 4(3), 154–169 (2001)
5. Gorski, N., Anisimov, V., Augustin, E., Baret, O., Maximov, S.: Industrial bank check processing: the A2iA CheckReader™. IJDAR 3(4), 196–206 (2001)
6. Oh, I.S., Suen, C.Y.: Distance features for neural network-based recognition of handwritten characters. IJDAR 1(2), 73–88 (1998)
7. Nopsuwanchai, R., Povey, D.: Discriminative training for HMM-based offline handwritten character recognition. In: Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR), Scotland, pp. 114–118 (2003)
8. Dong, J.X., Zak, A.K., Suen, C.Y.: An improved handwritten Chinese character recognition system using support vector machine. Pattern Recognit. Lett. 26(12), 1849–1856 (2005)
9. Camastra, F.: A SVM-based cursive character recognizer. Pattern Recognit. 40, 3721–3727 (2007)
10. Liu, C.L., Nakashima, K., Sako, H., Fujisawa, H.: Handwritten digit recognition: benchmarking of state-of-the-art techniques. Pattern Recognit. 36(10), 2271–2285 (2003)
11. Günter, S., Bunke, H.: Combination of three classifiers with different architectures for handwritten word recognition. In: Proceedings of the 9th IWFHR, pp. 63–68 (2004)
12. Bellili, A., Gilloux, M., Gallinari, P.: An MLP-SVM combination architecture for offline handwritten digit recognition: reduction of recognition errors by Support Vector Machines rejection mechanisms. IJDAR 5, 244–252 (2003)
13. Milgram, J., Sabourin, R., Cheriet, M.: Two-stage classification system combining model-based and discriminative approaches. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR), pp. 152–155 (2004)
14. Chellapilla, K., Shilman, M., Simard, P.: Combining multiple classifiers for faster optical character recognition. In: Proceedings of the 7th International Workshop on Document Analysis Systems (DAS), New Zealand, pp. 358–367 (2006)
15. Rahman, F.R., Fairhurst, M.C.: Multiple classifier decision combination strategies for character recognition: a review. IJDAR 5(4), 166–194 (2003)
16. Rahman, F.R., Rahman, R., Fairhurst, M.C.: Recognition of handwritten Bengali characters: a novel multistage approach. Pattern Recognit. 35(3), 997–1006 (2002)
17. Bhowmik, T.K., Bhattacharya, U., Parui, S.K.: Recognition of Bangla handwritten characters using an MLP classifier based on stroke features. In: Proceedings of the 11th International Conference on Neural Information Processing (ICONIP), India, pp. 814–819 (2004)
18. Bhattacharya, U., Shridhar, M., Parui, S.K.: On recognition of handwritten Bangla characters. In: Kalra, P., Peleg, S. (eds.) Proceedings of the 5th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Springer Lecture Notes in Computer Science, vol. 4338, pp. 817–828 (2006)
19. Hsu, C.W., Lin, C.J.: A comparison of methods for multiclass Support Vector Machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)
20. Bhattacharya, U., Chaudhuri, B.B.: Fusion of combination rules of MLP classifiers for improved recognition accuracy of hand-printed Bangla numerals. In: Proceedings of the 8th ICDAR, Korea, vol. I, pp. 322–326 (2005)
21. Bhattacharya, U., Parui, S.K., Shaw, B., Bhattacharya, K.: Neural combination of ANN and HMM for handwritten Devnagari numeral recognition. In: Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR), France, pp. 613–618 (2006)
22. Daubechies, I.: Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia (1992)
23. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall, Englewood Cliffs (1999)
24. Bhattacharya, U., Parui, S.K.: Self-adaptive learning rates in backpropagation algorithm improve its function approximation performance. In: Proceedings of the IEEE International Conference on Neural Networks, Australia, pp. 2784–2788 (1995)
25. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
26. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
27. Joachims, T.: SVMlight: Support Vector Machine (2002)
28. Kreßel, U.: Pairwise classification and support vector machines. In: Advances in Kernel Methods: Support Vector Learning, pp. 255–268. MIT Press, Cambridge (1999)
29. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, NJ (1988)
30. Martinetz, T.M., Berkovich, S.G., Schulten, K.J.: "Neural-gas" network for vector quantization and its application to time-series prediction. IEEE Trans. Neural Netw. 4(4), 558–569 (1993)