
2006 International Joint Conference on Neural Networks

Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada


July 16-21, 2006

Using Accuracy and Diversity to Select Classifiers to Build Ensembles
Rodrigo G F Soares, Alixandre Santana, Anne M P Canuto and Marcilio C P de Souto

Informatics and Applied Mathematics Department – Federal University of Rio Grande do Norte (UFRN)
Natal, RN - BRAZIL, 59072-970

Abstract—Ensembles of classifiers are an effective way of improving the performance of individual classifiers. However, the task of selecting the ensemble members is often a non-trivial one. For example, in some cases, a bad selection strategy can lead to ensembles with no performance improvement. Thus, many researchers have put a lot of effort into finding effective methods for selecting classifiers for building ensembles. In this context, a Dynamic Classifier Selection (DCS) method is proposed, which takes into account both the accuracy and the diversity of the classifiers.

I. INTRODUCTION

IN an attempt to improve the recognition performance of individual classifiers, a common approach is to combine multiple classifiers, forming Multi-Classifier Systems (MCSs). MCSs, also known as ensembles or committees, exploit the idea that a pool of different classifiers, individually referred to as experts or recognition modules, can offer complementary information about the patterns to be classified, improving the effectiveness of the overall recognition process [11].

In the literature, MCSs have been widely used in several pattern recognition tasks. In the last decade, for instance, a large number of papers have proposed the combination of classifiers for designing high-performance classification systems, mainly in terms of classification accuracy, in areas such as alphanumeric character recognition [4] and face recognition [6,14,24], among others.

In the context of MCSs, another aspect that has been acknowledged as very important is the diversity of the MCS [12,18,19,21]. For example, there is clearly no accuracy gain in an ensemble that is composed of a set of identical classifiers. Thus, if there are many different classifiers to be combined, one would expect an increase in the overall accuracy when combining them, as long as they are diverse (i.e., the errors produced by these classifiers are uncorrelated). In addition, in some real-world applications, the number of classifiers required to form an ensemble with a reasonable accuracy can be enormously large [1,20]. In this case, one could use a method for selecting classifiers in order to decrease the number of ensemble members and, at the same time, keep the diversity among the selected members.

Currently, most of the existing classifier selection methods for MCSs use either accuracy or diversity as the choice criterion. In contrast, the selection criterion proposed in this paper is based on both the accuracy and the diversity of the classifiers. Also, the current methods are often applied only during the training phase: once the member classifiers are chosen, they are used to build a given ensemble, which will always be the same during the test (use) phase. Differently, in this paper, a dynamic classifier selection procedure is used: the member classifiers that form the ensemble are chosen at the test (use) phase. That is, different testing patterns can be classified by different ensemble configurations.

In order to do so, two different versions of the dynamic selection procedure are presented. The first one uses a clustering algorithm (k-means) to group the patterns of a validation set. Likewise, in the second version, a k-NN (k Nearest Neighbor) classifier is built with the validation set. In both cases, the diversity and the accuracy of each classifier are calculated based on the results provided by these methods. For example, in the k-means case, when a testing pattern is received as input, the distances from this pattern to the centroids of the clusters produced by the k-means are calculated (Euclidean distance). The pattern is assigned to the cluster with the closest centroid. Then, the ensemble is formed based on the most accurate and diverse classifiers associated with the chosen cluster. That is, the ensemble is formed dynamically according to the test pattern. A similar reasoning is used in the case of the k-NN: given a testing pattern, the k closest patterns (Euclidean distance) in the validation set are recovered, and the accuracy and diversity are calculated based on them.
This paper is divided into seven sections, organized as follows. Section 2 describes some related works. Multi-classifier systems are briefly described in Section 3, focusing on the combination methods and the diversity of the ensembles. Section 4 presents the two versions of the proposed method. The experimental analysis is introduced in Section 5, while a discussion of the obtained results is presented in Section 6. Section 7 presents the final remarks of this work.
II. RELATED WORKS

There has been a great deal of work dealing with methods for producing a pool of classifiers and, then, selecting the classifiers that are most diverse and accurate [1,2,7,13,15,20,23]. The selection procedure is, in general, based on either accuracy [13,15,23] or diversity [2,7,20]. For instance, when using diversity, classifiers can be clustered based on the diversity that they produce [7]. In this case, as classifiers that belong to the same group tend to make correlated errors, one classifier of each group is selected to be a member of the ensemble. Differently, in [2], an ensemble diversity procedure based on uncertain points (patterns) is introduced. Uncertain patterns are the ones for which the proportion of correct votes of a pool is between 10% and 90%. These uncertain points are considered to deliver diversity to the ensembles, since there is no general agreement among the classifiers about the correct output for these points. In this context, the classifiers that have higher accuracy on the uncertain points (diversity) are chosen to be part of the ensemble.

For the selection process based on accuracy, for example, a new algorithm based on Bagging and a genetic algorithm was proposed in [23]. Still in this context, a novel method for pruning classifiers from ensembles, based on a clustering approach, was presented in [13]. In order to do so, a method of distributing the voting weights of the classifiers is implemented. In addition, a depth-first technique for building a tree with the most independent classifiers is used. Finally, in [1], six selection criteria were evaluated in the setting of combining classifiers for isolated handwritten character recognition. However, only one of them, the exponential error count criterion, uses both accuracy and diversity to choose the ensemble members.

Most of the works reviewed in the previous paragraphs use a selection procedure that is performed prior to the testing phase. In other words, all testing patterns will be classified according to the same ensemble configuration. However, the testing patterns can be very diverse, requiring different ensemble configurations to classify them more accurately. This kind of problem could be minimized, as previously mentioned, by means of a Dynamic Classifier Selection (DCS) procedure in which the member classifiers that form the ensemble are chosen at the test (use) phase. That is, different testing patterns can be classified by different ensemble configurations. In general, the selection criterion of this kind of method is based on accuracy, and only a single classifier is chosen [9,10,22]. In contrast, the DCS method proposed in this paper uses both accuracy and diversity to choose not only the best classifier, but a set of member classifiers to form the ensemble that classifies the testing pattern. As in the other DCS methods in the literature, in this paper, k-means and k-NN will be used as aids to calculate the accuracy and diversity [11].

III. MULTI-CLASSIFIER SYSTEMS

The study of Multi-Classifier Systems (MCSs), also known as ensembles or committees of classifiers, has emerged from the need for computational systems that perform pattern recognition in an efficient way [4,17]. The goal of using MCSs is to improve the performance of a pattern recognition system in terms of better generalization and/or in terms of increased efficiency and clearer design. There are three main choices in the design of an MCS: the organization of its components, the system components and the combination methods that will be used. In terms of the organization of its components, an MCS can be defined as modular or ensemble. In the modular approach, each classifier becomes responsible for a part of the whole system, and the classifiers are usually linked in a serial way. In contrast, in the ensemble approach, all classifiers are able to answer the same task in a parallel or redundant way. Moreover, there exists a combination module that is responsible for providing the overall system output [11]. In this paper, the kind of MCS analyzed will be of the ensemble type. Thus, hereafter, the terms ensembles and MCSs will be used interchangeably.

With respect to the choice of the components of the MCS, the correct choice of the set of classifiers is fundamental to the overall performance of a multi-classifier system. As already mentioned, the main aim of combining classifiers is to improve their generalization ability and recognition performance. However, the combination of a set of identical classifiers will not outperform the individual members. The ideal situation would be a set of classifiers with uncorrelated errors; they would be combined in such a way as to minimize the effect of these failures. A set of distinct classifiers can be achieved by varying the classifier's structure (topology and parameters), varying the data and/or using different types of classifiers as components of an ensemble. In terms of using different types of classifiers, ensembles can be classified as hybrid or non-hybrid. When an ensemble is composed of classifiers of the same type, it is called a non-hybrid or homogeneous ensemble. On the other hand, an ensemble composed of more than one type of classifier is called a hybrid or heterogeneous ensemble.

Once a set of classifiers has been created and the strategy for organizing them has been defined, the next step is to choose an effective way of combining their outputs. There are a great number of combination methods reported in the literature [6,11,17,18,24]. According to their functioning, three main strategies of combination methods are discussed in the literature on classifier combination: selection-based, combination-based and hybrid methods.

The assumption in selection-based methods is that each classifier has expertise in some local area of the feature space. When an input pattern is submitted for classification, the classifier responsible for the vicinity of the input pattern is given the highest authority to label it. Combination-based methods assume that all classifiers are equally experienced over the whole feature space, and the decisions of all classifiers are taken into account for any input pattern. There are some combination models halfway between these two extremes, which are the hybrid methods.

A. Combination-based Methods

There are a vast number of combination-based methods reported in the literature. They can be classified according to their characteristics as linear or non-linear (a sketch of the two simplest rules follows the list).
• Linear combination methods. Currently, the simplest ways to combine multiple neural networks are the sum and the average of the neural networks' outputs [11].
• Non-linear methods. This class includes rank-based combiners, such as Borda Count [17], majority voting strategies [11], the Dempster-Shafer technique [16], fuzzy integral [4], neural networks [4] and genetic algorithms [11].
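To make the two extremes concrete, the sketch below (Python with NumPy; the array layout and function names are our own illustration, not the implementation used in this paper) shows the average (equivalently, sum) rule over posterior outputs and majority voting over crisp labels.

```python
import numpy as np

def average_combiner(posteriors):
    """Sum/average rule: posteriors has shape (L, n_classes), one
    probability vector per classifier; the class with the highest
    mean (equivalently, summed) support wins."""
    return int(np.argmax(posteriors.mean(axis=0)))

def majority_vote(labels):
    """Majority voting: labels is a length-L array of crisp class
    labels, one per classifier; the most frequent label wins."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]
```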
B. Selection-based Methods

Unlike in combination-based methods, only one classifier is needed to correctly classify the input pattern in selection-based methods. In order to do so, it is important to define a process for choosing the member of the ensemble that makes the decision, which is usually based on the input pattern to be classified. The choice of a classifier to label the pattern is made during the operation phase. This choice is typically based on the certainty of the current decision: preference is given to more certain classifiers. One of the main methods in classifier selection was proposed in [22] and is called Dynamic Classifier Selection (DCS).

Dynamic Classifier Selection: Woods et al. [22] use local analysis of competence to nominate a classifier to label an input. According to them, the main steps to calculate the Local Class Accuracy (LCA) for a test pattern xt can be defined as follows (a code sketch follows the list).
1. Take the class labels provided by all classifiers;
2. For each classifier (Di, i = 1,...,L), find the k (k = 10 is recommended) points closest to xt for which Di has provided the same label;
3. Calculate the proportion of these points for which Di has provided the true label, and let it be the local class accuracy of this classifier;
4. Choose the classifier with the highest LCA. Three main situations may occur:
4.1. If there is only one winner, let it label xt.
4.2. If two classifiers are tied, choose a third classifier with the second highest LCA.
4.3. If all classifiers are tied, pick a random class label among the tied labels.

It is important to emphasize that the selection of the most suitable classifier is made during the test phase, and it is performed only when there is a disagreement among the classifiers.
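The LCA computation above can be condensed into a short sketch (Python with NumPy; the data layout is assumed for illustration and is not taken from [22]):

```python
import numpy as np

def local_class_accuracy(x_t, X_val, y_val, val_preds, pred_t, k=10):
    """LCA of one classifier for test pattern x_t. val_preds holds
    this classifier's predictions on the validation set X_val, and
    pred_t is its prediction for x_t.
    Step 2: keep the k validation points closest to x_t (Euclidean)
    among those the classifier labeled with pred_t.
    Step 3: return the fraction of them whose true label is pred_t."""
    same = np.flatnonzero(val_preds == pred_t)
    if same.size == 0:
        return 0.0
    dists = np.linalg.norm(X_val[same] - x_t, axis=1)
    nearest = same[np.argsort(dists)[:k]]
    return float(np.mean(y_val[nearest] == pred_t))
```

DCS then labels x_t with the prediction of the classifier that maximizes this value, applying the tie-breaking rules of step 4.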
C. Hybrid Methods

Hybrid methods are the ones in which selection and fusion techniques are both used in order to provide the most suitable output for classifying the input pattern. Usually, there is a criterion to decide whether to use the selection or the combination method. The main idea is to use selection only if the best classifier is really good at classifying the testing pattern; otherwise, a combination method is used. Two main examples of hybrid methods are Dynamic Classifier Selection based on multiple classifier behavior (DCS-MCS) [9] and Dynamic Classifier Selection using Decision Templates (DCS-DT) [10].

D. Diversity in Ensembles

As already mentioned, there is no gain in a combination (MCS) that is composed of a set of identical classifiers. The ideal situation, in terms of combining classifiers, would be a set of classifiers that present uncorrelated errors. In other words, the ensemble must show diversity among its members in order to improve on the performance of the individual classifiers. Diversity can be reached in three different ways:
• Variation of the parameters of the classifiers (e.g., varying the initial parameters, such as the weights and topology, of a neural network model [21]).
• Variation of the training dataset of the classifiers (e.g., the use of learning strategies such as Bagging and Boosting [11]).
• Variation of the type of classifier (e.g., the use of different types of classifiers, such as neural networks and decision trees, as members of an ensemble: hybrid ensembles).

There are different diversity measures available from different fields of research. In [12,18], for instance, ten diversity measures are defined. In this paper, one widely applied measure will be used.

The double-fault measure [8]. This measure uses the proportion of cases that have been misclassified by both classifiers, and it is defined as follows:

    DF_{i,k} = N^{00} / (N^{11} + N^{10} + N^{01} + N^{00})    (1)

where N^{00} is the number of patterns that both classifiers classified wrongly; N^{01} is the number of patterns that the first classifier classified correctly and the second classified wrongly; N^{10} is the number of patterns that the first classifier classified wrongly and the second classified correctly; and N^{11} is the number of patterns that both classifiers classified correctly.
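As a concrete reading of Eq. (1), the following minimal sketch (Python with NumPy, our own illustration) computes the double-fault measure for a pair of classifiers from boolean vectors marking which patterns each classified correctly. Lower values indicate a more diverse pair, since the two classifiers rarely fail on the same patterns.

```python
import numpy as np

def double_fault(correct_i, correct_k):
    """Eq. (1): proportion of patterns misclassified by BOTH
    classifiers. correct_i and correct_k are boolean arrays marking
    which patterns each classifier got right, so the denominator
    N11 + N10 + N01 + N00 is simply the total number of patterns."""
    n00 = np.sum(~correct_i & ~correct_k)  # both wrong
    return n00 / len(correct_i)
```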

IV. THE PROPOSED METHOD

As already mentioned, the classifier selection method proposed in this paper is based on both the accuracy and the diversity of the classifiers. Such a method uses a dynamic classifier selection procedure, in which different testing patterns can be classified by different ensemble configurations. Two different versions of this method are proposed, and they will be described in the next two sections.

A. Cluster and select version

The selection procedure of this version is similar to the clustering and selection method proposed in [10]. The main differences between the method in [10] and the one proposed in this paper are the following. In the method proposed here, a set of classifiers is chosen based on both accuracy and diversity, whereas in [10] only a single classifier is selected, based only on accuracy. Also, in [10] the best classifier is only chosen if it is statistically better than the other classifiers; otherwise, a fusion method is used with all the classifiers.

The clustering and selection procedure proposed can be described as follows (a code sketch follows the list).
1. Cluster the validation patterns into k groups. This can be accomplished by using k-means [11].
2. For each cluster produced, rank the classifiers in decreasing order of accuracy and increasing order of diversity.
2.1. In order to rank the classifiers in an increasing order of diversity, any pairwise diversity measure can be used. In this paper, the double-fault diversity measure is used (Eq. 1);
3. For each testing pattern, do:
3.1. Assign the testing pattern to the cluster with the nearest centroid (Euclidean distance).
3.2. Choose the N most accurate classifiers of this cluster.
3.3. From the N most accurate classifiers, select the J most diverse classifiers to compose the ensemble members to classify the testing pattern, where J <= N;
3.4. Classify the testing pattern using a combination-based method.
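A minimal sketch of this version is given below (Python with NumPy and scikit-learn's KMeans; the data layout, function names, the sum rule and the use of scikit-learn are our own assumptions, not the paper's Weka/Java implementation). It relies on the greedy_diverse helper sketched later, after the description of step 3.3.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_cluster_select(X_val, y_val, val_preds, n_clusters, N, J):
    """Steps 1-2 (done before testing): cluster the validation set
    and, per cluster, pre-select the J most diverse of the N most
    accurate classifiers. val_preds: (n_classifiers, n_val) array of
    pool predictions on the validation set."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X_val)
    members = {}
    for c in range(n_clusters):
        idx = np.flatnonzero(km.labels_ == c)
        correct = val_preds[:, idx] == y_val[idx]       # per-classifier hits in cluster c
        top_n = np.argsort(correct.mean(axis=1))[::-1][:N]
        members[c] = greedy_diverse(top_n, correct, J)  # step 3.3, sketched below
    return km, members

def classify(x, km, members, predict_fns):
    """Steps 3.1-3.4 (test time): route x to the nearest centroid and
    combine the pre-selected members with the sum rule.
    predict_fns[i](x) returns classifier i's posterior vector."""
    c = int(km.predict(x.reshape(1, -1))[0])
    posteriors = np.array([predict_fns[i](x) for i in members[c]])
    return int(np.argmax(posteriors.sum(axis=0)))
```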
B. k-NN and selection version

In this version of the method, instead of a clustering method, a k-NN (k Nearest Neighbor) classifier is built with the validation set. In this version, the selection procedure can be described as follows for each testing pattern (a code sketch follows the list).
1. Based on the k-NN built with the validation set, find the k neighbors of the testing pattern.
2. Based on the k neighbors, rank the classifiers in decreasing order of accuracy and increasing order of diversity.
2.1. In order to rank the classifiers in an increasing order of diversity, any pairwise diversity measure can be used. In this paper, the double-fault diversity measure is used (Eq. 1);
3. Select the N most accurate classifiers on the k neighbors;
4. From the N most accurate classifiers, select the J most diverse classifiers to compose the ensemble members to classify the testing pattern, where J <= N;
5. Classify the testing pattern using a combination-based method.
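A corresponding sketch of the k-NN version (same assumptions and layout as the previous sketch; the sum rule is again used for concreteness, and greedy_diverse is the helper sketched after the step 3.3 description):

```python
import numpy as np

def knn_and_select(x, X_val, y_val, val_preds, predict_fns, k, N, J):
    """All steps happen at test time: find the k nearest validation
    patterns, rank the pool on them, select N accurate then J diverse
    members, and combine with the sum rule. val_preds and predict_fns
    are as in the previous sketch."""
    dists = np.linalg.norm(X_val - x, axis=1)           # Euclidean distances
    nbrs = np.argsort(dists)[:k]                        # k nearest validation patterns
    correct = val_preds[:, nbrs] == y_val[nbrs]         # local correctness matrix
    top_n = np.argsort(correct.mean(axis=1))[::-1][:N]  # N locally most accurate
    members = greedy_diverse(top_n, correct, J)         # J most diverse (sketch below)
    posteriors = np.array([predict_fns[i](x) for i in members])
    return int(np.argmax(posteriors.sum(axis=0)))       # sum-rule combination
```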
As already mentioned, the main difference between this version and the previous one is the method used to calculate the similarity of the validation patterns: k-NN or the clustering method. The other difference is when the selection procedure is performed. When using the clustering method, all the rankings and selections are calculated before the testing phase. For each group, the best configuration is chosen a priori. On the other hand, in the k-NN method, the choice of the closest validation patterns, as well as the ranking and selection processes, are done during the testing phase. This may lead to different configurations for classifying the different testing patterns. However, this means more complex processing than using the clustering method, since these processes have to be performed for every testing pattern.

Intuitively, step 3.3 of the algorithm (step 4 in the k-NN version) works as follows (a code sketch follows the example below).
1. Pick the two most diverse classifiers and assign them to the set of selected classifiers (I = 2);
2. While I < J:
2.1. The next classifier to be selected is the one that is most diverse with respect to all classifiers in the set of selected classifiers;
2.2. I = I + 1.

An example to illustrate the functioning of step 3.3 is as follows. Suppose an ensemble composed of five classifiers (A, B, C, D and E). In this example, suppose that the first seven elements of the decreasing rank of diversity are D(B,C), D(A,E), D(B,E), D(C,E), D(C,D), D(A,B) and D(A,C). Based on this, choose the three most diverse classifiers, which would be B, C and E. B and C are chosen because they have the highest diversity measure. Although the second highest diversity is between A and E, the third chosen classifier is E because it is the most diverse with respect to both B and C. If a fourth classifier were chosen, it would be classifier A.
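Under the same assumptions as the previous sketches, and taking "most diverse" to mean "lowest pairwise double-fault" (Eq. 1), step 3.3 can be sketched as the following greedy procedure. One reasonable reading of "most diverse to all classifiers in the set" (our interpretation, not spelled out in the text) is the lowest summed double-fault against the already selected members.

```python
import numpy as np

def double_fault(ci, ck):
    """Eq. (1) on boolean correctness vectors; lower = more diverse."""
    return np.mean(~ci & ~ck)

def greedy_diverse(candidates, correct, J):
    """Step 3.3: start from the most diverse pair and repeatedly add
    the candidate most diverse with respect to the selected set.
    candidates: indices of the N most accurate classifiers;
    correct: (n_classifiers, n_points) boolean correctness matrix."""
    cand = [int(c) for c in candidates]
    pair = min(((i, j) for i in cand for j in cand if i < j),
               key=lambda p: double_fault(correct[p[0]], correct[p[1]]))
    selected = list(pair)
    cand = [c for c in cand if c not in selected]
    while len(selected) < J and cand:
        nxt = min(cand, key=lambda c: sum(double_fault(correct[c], correct[s])
                                          for s in selected))
        selected.append(nxt)
        cand.remove(nxt)
    return selected
```

On the five-classifier example above, the pair (B, C) would be picked first, and E would follow because its combined diversity with respect to B and C beats that of A and D; A would be the fourth choice.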
V. EXPERIMENTAL WORK

In this investigation, an empirical comparison of the proposed method is performed. For this investigation, ten different classifiers are used: two MLP (Multi-Layer Perceptron) neural networks, two RBF (Radial Basis Function) neural networks, two naive Bayesian classifiers, two SVMs (Support Vector Machines) and two JRIP (Optimized IREP) classifiers.

The choice of the aforementioned classifiers was due to their different learning biases. JRIP, for instance, is a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP [5].

All the learning methods used in this study were implemented in the Java language and were exported from the Weka machine learning package (http://www.cs.waikato.ac.nz/~ml/weka/). The values for the parameters of a classifier were chosen as follows. For example, for an algorithm with only one parameter, an initial value for the parameter was chosen, followed by a run of the algorithm. Then, experiments with a larger and a smaller value were also performed. If the classifier obtained with the initially chosen value had the best results (in terms of validation error), then no more experiments were performed. Otherwise, the same process was repeated for the parameter value with the best result so far. As a consequence, this procedure becomes more time consuming as the number of parameters to be investigated increases.
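The parameter search described above amounts to a simple greedy local search, sketched below (all names are hypothetical, and we score by validation accuracy rather than error; the paper gives no code for this procedure):

```python
def tune_parameter(validation_accuracy, initial, step):
    """Greedy search over one parameter, as described above.
    validation_accuracy(v) trains the classifier with parameter
    value v and returns its validation accuracy. Stop as soon as the
    current value beats both of its neighbors."""
    current, best = initial, validation_accuracy(initial)
    while True:
        scores = {v: validation_accuracy(v)
                  for v in (current - step, current + step)}
        cand, cand_score = max(scores.items(), key=lambda kv: kv[1])
        if cand_score <= best:      # current value is (still) the best: stop
            return current
        current, best = cand, cand_score
```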
Also, four different ensemble methods were used. The first one is the first version of the proposed method (cluster and select), using a sum combination-based method to combine the outputs of the classifiers. The second method is the second version of the proposed method (k-NN and select), also using a voting method.

The third one is the static selection method, in which a classifier selection method based on accuracy and diversity is performed, followed by the sum combination method. The fourth method is the original DCS (dynamic classifier selection) method [22], using a k-NN method to define the neighbors of the testing pattern and choosing the classifier with the highest accuracy on those neighbors.

The selection procedure used in the third method is applied in a static way, in which the ensemble members are chosen before the testing phase starts, based on the whole validation set. It is aimed at investigating whether the use of dynamic selection is valuable for the accuracy of a classifier selection method. Additionally, the use of the original DCS method aims to investigate the importance of basing the decision on both accuracy and diversity, choosing more than one classifier and applying a combination method.

For the classifier selection methods, three classifiers are chosen out of the ten classifiers that compose the pool. In both versions of the proposed method, six classifiers are first chosen based on their accuracy on the chosen group. After that, the three most diverse classifiers are chosen.
A. Databases

Two different databases are used in this investigation, which are described as follows.
• Database A: an image database, where instances were drawn randomly from a database of 7 outdoor images (the segmentation dataset from the UCI repository [3]). The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. Nineteen continuous attributes were extracted from each region, and there are 7 different classes of regions: brickface, sky, foliage, cement, window, path and grass;
• Database B: Proteins: a protein database which represents a hierarchical, manually detailed classification of known protein structures. The proteins are organized according to their evolutionary and structural relationships. The main protein classes are all-α, all-β, α/β, α+β and small. It is an unbalanced database with a total of 582 patterns, of which 111 patterns belong to class all-α, 177 patterns to class all-β, 203 patterns to class α/β, 46 patterns to class α+β and 45 patterns to class small.

B. Cross-Validation

In order to evaluate the robustness of a classifier, a common methodology is to perform cross validation. Ten-fold cross validation has been shown to be statistically adequate for evaluating the performance of a classifier [16]. In ten-fold cross validation, the training set is equally divided into 10 different subsets. Nine out of the ten subsets are used to train the learner and the tenth subset is used as the test set. The procedure is repeated ten times, with a different subset being used as the test set each time. In fact, in this investigation, each method was run with 10 replications of 10-fold cross validation (generating 100 classifiers). Applying the distinct algorithms to the same folds, with k at least equal to 10, the statistical significance of the differences between the methods can be measured, based on the mean of the error rates on the test sets (paired t-test [16]).
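A sketch of this protocol (Python with scikit-learn and SciPy, our own choice of libraries; build_and_test is a hypothetical callback that trains and evaluates one method on one fold):

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.model_selection import RepeatedStratifiedKFold

def fold_error_rates(build_and_test, X, y, seed=0):
    """10 replications of 10-fold cross validation (100 train/test
    pairs, as in the text). build_and_test(Xtr, ytr, Xte, yte) trains
    one method on a fold and returns its test error rate."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10,
                                 random_state=seed)
    return np.array([build_and_test(X[tr], y[tr], X[te], y[te])
                     for tr, te in cv.split(X, y)])

# Two methods evaluated on the SAME folds can then be compared with a
# paired t-test at the 95% confidence level (alpha = 0.05):
#   t_stat, p_value = ttest_rel(errors_a, errors_b)
#   significant = p_value < 0.05
```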
VI. AN ANALYSIS OF THE RESULTS

A. Individual Classifiers

Before starting the investigation of the performance of the ensembles, it is important to analyze the performance of the individual classifiers. Table I shows the correct mean (CM) and standard deviation (SD) of the individual classifiers employed in the ensembles for databases A and B. The configurations of all classifiers were chosen taking into account the highest correct mean and the lowest standard deviation.

TABLE I: Correct mean and standard deviation of the individual classifiers applied to databases A and B

Classifier   Database A        Database B
             CM      SD        CM      SD
JRIP-1       95.48   4.10      77.92   6.29
JRIP-2       95.29   4.30      76.23   5.50
MLP-1        95.67   4.12      80.38   3.90
MLP-2        94.33   3.85      79.06   5.22
Naive-1      87.90   3.86      77.55   4.66
Naive-2      89.10   5.57      78.49   3.80
RBF-1        95.33   3.48      76.23   6.55
RBF-2        93.71   2.95      76.79   4.79
SVM-1        95.76   3.43      81.51   4.94
SVM-2        95.19   3.11      81.13   5.26

According to the correct mean provided by all classifiers, one can see that all classifiers delivered a similar pattern of performance on both databases. Considering the correct mean of the classifiers, for both databases, SVM-1 delivered the highest correct mean of all classifiers. In relation to the lowest standard deviation, for database A, RBF-2 presented the lowest standard deviation of all classifiers, while Naive-2 provided the lowest standard deviation for database B. The highest difference among the classifiers (between the highest and the lowest correct mean) is lower for database B (5.25) than for database A (7.86).
B. Ensembles

Table II shows the correct mean and standard deviation for all four ensemble methods applied to databases A and B. For the cluster version of the proposed method, a k-means clustering algorithm was used, with k = 6 for database A and k = 10 for database B. Differently, for the k-NN version of the proposed method, a k-NN method was used, with k = 10 for database A and k = 18 for database B. For the original DCS, a k-NN method has also been used, with k = 10 for both databases.

TABLE II: Performance of the ensemble methods applied to databases A and B

Method              Database A        Database B
                    CM      SD        CM      SD
K-NN/Selection      95.90   3.00      83.77   5.42
Cluster/Selection   96.81   2.68      84.52   5.81
Static              96.14   2.77      79.24   4.71
O-DCS               95.79   4.21      80.00   4.10
As can be observed from Table II, all ensemble methods, apart from the static method on database B, delivered a higher correct mean than all individual classifiers. Moreover, the two versions of the proposed method delivered the highest and second highest correct means of all methods. For database B, for instance, the improvement of the proposed method was of 5.28 percentage points when compared with the static classifier selection method.

In order to verify whether the improvement in the correct mean obtained by the proposed methods is statistically significant, hypothesis tests (t-tests) were performed, using a confidence level of 95% (α = 0.05). In the first test, both versions of the proposed method were compared with the static selection procedure. Based on the hypothesis test, it is possible to conclude that the improvement in the correct mean of both versions of the proposed method was statistically significant for database B (p-value = 0.019 for the cluster/selection version and p-value = 0.031 for the k-NN/selection version). On the other hand, for database A, there is no statistical evidence to state that the improvement in the correct mean of the proposed methods is significant.

In the second test, hypothesis tests (t-tests) were performed to compare the correct mean of both versions of the proposed method with the original DCS, again using a confidence level of 95% (α = 0.05). It could be seen that the improvement in the correct mean of both versions of the proposed method was statistically significant for database B (p-value = 0.029 for the cluster/selection version and p-value = 0.048 for the k-NN/selection version). On the other hand, for database A, there is no statistical evidence to state that the improvement in the correct mean of the proposed methods is significant.

As could be seen, for database A, the improvement reached by the proposed method was not statistically significant with respect to either of the other two methods. This is because the correct means of the individual classifiers and of the ensembles are already really high. In this sense, an improvement that is statistically significant is difficult to reach. On the other hand, for database B, which is an unbalanced database, the improvement reached by both versions of the proposed method was statistically significant when compared with the static selection and the original DCS methods.

VII. FINAL REMARKS

In this paper, a DCS (dynamic classifier selection) procedure was proposed, which takes into account the accuracy and the diversity of the classifiers in order to choose the ensemble members. Moreover, two different versions of the proposed method were presented. The first one uses a clustering algorithm to group patterns of a validation set. In the second version, a k-NN method is used to calculate similarity among patterns of a validation set.

In order to evaluate the performance of the proposed method, an empirical comparison was performed. Four different ensemble methods were analyzed: both versions of the proposed method, a static classifier selection method also using diversity and accuracy, and the original DCS method. These ensembles were applied to two different databases.

Through this analysis, it could be observed that both versions of the proposed method reached an improvement in the correct mean when compared with the other two methods. Nevertheless, this improvement was statistically significant only for database B. It is believed that this is because the correct mean of the classification systems was already really high for database A. In this sense, an improvement that is statistically significant is difficult to reach.

The results obtained in this paper are very encouraging, since they show that the use of a dynamic classifier selection procedure that takes into account diversity and accuracy is positive for the performance of the ensembles.

ACKNOWLEDGMENT

This work has the financial support of CNPq (Brazilian Research Council), under process numbers 471309/2004-3 and 307236/2003-0.

REFERENCES
[1] M. Aksela, "Comparison of classifier selection methods for improving committee performance," in Multiple Classifier Systems—4th Int. Workshop, MCS 2003, vol. 2709 of Lecture Notes in Computer Science, T. Windeatt and F. Roli, Eds., Guildford, U.K., 2003, pp. 306-316.
[2] R. Banfield, L. Hall, K. Bowyer, and W. Kegelmeyer, "A new ensemble diversity measure applied to thinning ensembles," in Multiple Classifier Systems—4th Int. Workshop, MCS 2003, vol. 2709 of Lecture Notes in Computer Science, T. Windeatt and F. Roli, Eds., Guildford, U.K., 2003, pp. 306-316.
[3] C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science, 1998.
[4] A. Canuto, Combining Neural Networks and Fuzzy Logic for Applications in Character Recognition. PhD thesis, University of Kent, 2001.
[5] W. W. Cohen, "Fast effective rule induction," in Machine Learning: Proceedings of the Twelfth International Conference (ML), pp. 115-123, 1995.
[6] J. Czyz, M. Sadeghi, J. Kittler and L. Vandendorpe, "Decision fusion for face authentication," in Proc. First Int. Conf. on Biometric Authentication, pp. 686-693, 2004.
[7] G. Giacinto and F. Roli, "An approach to the automatic design of multiple classifier systems," Pattern Recognition Letters, 22:25-33, 2001.
[8] G. Giacinto and F. Roli, "Design of effective neural network ensembles for image classification," Image and Vision Computing Journal, 19(9-10):697-705, 2001.
[9] G. Giacinto and F. Roli, "Dynamic classifier selection based on multiple classifier behaviour," Pattern Recognition, 34:1879-1881, 2001.
[10] L. Kuncheva, "Switching between selection and fusion in combining classifiers: An experiment," IEEE Trans. on Systems, Man and Cybernetics – Part B, 32(2):146-155, 2002.
[11] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. Wiley, 2004.
[12] L. Kuncheva and C. Whitaker, "Measures of diversity in classifier ensembles," Machine Learning, 51:181-207, 2003.
[13] A. Lazarevic and Z. Obradovic, "Effective pruning of neural network classifier ensembles," in Proc. of the International Joint Conference on Neural Networks, vol. 2, pp. 796-801, 2001.
[14] A. Lemieux and M. Parizeau, "Flexible multi-classifier architecture for face recognition systems," in The 16th International Conference on Vision Interface, 2003.
[15] D. Margineantu and T. Dietterich, "Pruning adaptive boosting," in Proc. 14th Int. Conf. on Machine Learning, San Francisco, CA, 1997, pp. 378-387.
[16] T. Mitchell, Machine Learning. McGraw-Hill, 1997.
[17] A. J. C. Sharkey, "Multi-net systems," in Combining Artificial Neural Nets: Ensemble and Modular Multi-net Systems, A. J. C. Sharkey, Ed., Springer-Verlag, pp. 1-30, 1999.
[18] C. A. Shipp and L. I. Kuncheva, "Relationships between combination methods and measures of diversity in combining classifiers," Information Fusion, 3(2):135-148, 2002.
[19] A. Tsymbal, M. Pechenizkiy and P. Cunningham, "Diversity in search strategies for ensemble feature selection," Information Fusion, Special Issue on Diversity in Multiple Classifier Systems, 6(1):83-98, 2005.
[20] G. Valentini, "An experimental bias-variance analysis of SVM ensembles based on resampling techniques," IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, 35(6):1252-1271, 2005.
[21] T. Windeatt, "Diversity measures for multiple classifier system analysis and design," Information Fusion, Special Issue on Diversity in Multiple Classifier Systems, 6(1):21-36, 2005.
[22] K. Woods, W. Kegelmeyer and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(4):405-410, 1997.
[23] X. Wu and Z. Chen, "Recognition of exon/intron boundaries using dynamic ensembles," in Proceedings of the Computational Systems Bioinformatics Conference (CSB), 2004.
[24] J. Zhou and D. Zhang, "Face recognition by combining several algorithms," Pattern Recognition, 3(3):497-500, 2002.

