You are on page 1of 6

2014 Brazilian Conference on Intelligent Systems

Improving Classifiers and Regions of Competence in
Dynamic Ensemble Selection
Tiago P. F. Lima, Anderson, T. Sergio, and Teresa B. Ludermir
Centro de Informática
Universidade Federal de Pernambuco, UFPE
Recife, Brasil
{tpfl2,ats3,tbl}@cin.ufpe.br

One of the most important issues surrounding ensembles of
classifiers is ensemble selection. Given a pool of classifiers
࡯ ൌ  ሼ࡯૚ ǡ ǥ ǡ ࡯ࡸ ሽ , the ensemble selection has focused on
finding the most relevant subset of classifiers ࡱ, rather than
combining all available ࡸ classifiers, where ȁࡱȁ ൑ ȁ࡯ȁ. We can
perform this task either by static selection, i.e., selecting only
one ensemble for all query patterns, or by dynamic selection,
i.e., selecting different ensembles for different query patterns.

Abstract — This paper evaluates some strategies to
approximate the performance of dynamic ensembles based on
NN-rule to the oracle performance. For this purpose, we use a
multi-objective optimization algorithm, based on Differential
Evolution, to generate automatically a pool of accurate and
diverse classifiers in the form of Extreme Learning Machines.
However, the rule defined for selecting the classifiers depends on
the quality of the information obtained from regions of
competence. Thus, we also improve the regions of competence in
order to avoid noise and create smoother class boundaries.
Finally, we employ a dynamic ensemble selection method. The
performance of the proposed method was experimentally
investigated using 12 benchmark datasets and results of
comparative analysis are presented.

A way to define the upper limit of ensemble selection
performance is through the concept of oracle. If at least one
classifier ࡯࢏ ‫ ࡯ א‬can correctly classify a given pattern ࢄ࢛ࢗࢋ࢘࢟ ,
then the oracle can correctly classify ࢄ࢛ࢗࢋ࢘࢟ . The objective of
this work is to use the nature oracle, so we can only select
those classifiers which might be able to correctly classify a
given sample. This is accomplished in a dynamic fashion, since
different patterns might require different ensembles.

Keywords — Dynamic ensembles; oracle; multi-objective
optimization; differential evolution; extreme learning machine.

I.

INTRODUCTION

To understand how we can explore the oracle concept to
improve a Dynamic Ensemble Selection (DES) process, first
we have to look three basic steps in Fig. 1: (1) generate a
classifier pool using the training dataset ञ; (2) produce the
region of competence by the training dataset ञ or an
independent validation dataset ठ ; and finally (3) select the
classifier(s) based on the information extracted from the
regions of competence. The classifier(s) selected is(are)
combined to classify the query pattern ࢄ࢛ࢗࢋ࢘࢟ .

The research field of ensembles of classifiers become very
popular after the half of the 1990 decade, with many papers
published on the creation of ensemble methods that provide
some theoretical insights of why combining different classifiers
could be interesting. According to Dietterich [1], there are
three main motivations to combine multiple classifiers, the best
case, the worst case, and the computational motivations:
Representational (or best case) motivation: combination of
multiple classifiers may have a better performance than the
single best classifier among them. There are many theoretical
and experimental evidences that it is possible if the classifiers
in an ensemble make different errors on a query pattern.

Fig. 1. Dynamic ensemble selection process

Statistical (or worst case) motivation: it is possible to avoid
the worst classifier by averaging several classifiers. This
simple combination was demonstrated to be efficient in many
applications. There is no guarantee, however, that the ensemble
will outperform a single classifier.
Computational motivation: some algorithms perform an
optimization task in order to learn and suffer from local
minima. In this case it is a difficult task to find the best
classifier, and it is often used several (hundreds or even
thousands) initializations in order to find a presumable optimal
classifier. Combination of such classifiers showed to stabilize
and improve the best single classifier result.

978-1-4799-5618-0/14 $31.00 © 2014 IEEE
DOI 10.1109/BRACIS.2014.14

Despite the large number of selection methods available in
the literature, the classifier generation and region of
competence in DES process have not been given much
attention. In addition, due to the high computational cost
usually observed in the dynamic selection solutions, its
application is often criticized. In fact, the decision as to
whether or not use dynamic selection is still an open question.
13

As an important branch of neural network. Firstly. the trial vector) is generated through both mutation and crossover. Section V describes the idea of proposed method. a number of individuals generated uniformly from the whole search space form the initial population. While there is still no one EA that is universally regarded as better than others. For each individual in the population (i. learning ability to handle imprecise information. Finally. This is equivalent to ࡴࢼ ൌ ࢀ. (1) IV. yielding evolutionary ANNs. ࡯࢘ is the crossover probability. The training process is described briefly as follows. The main two steps (i. σࡸ࢐ୀ૚ ࢼ࢐ ࢌ൫࡭࢐ ǡ ࡮࢐ ǡ ࡼ࢐ ൯ ൌ ࢀ࢐ ǡ ࢐ ൌ ૚ǡ ǥ ǡ ࡺ DIFFERENTIAL EVOLUTION SELECTION METHODS Let ࡯ ൌ ሼ࡯૚ ǡ ǥ ǡ ࡯ࡸ ሽ be a pool composed of ࡸ classifiers and ࡱ ൌ ሼࡱ૚ ǡ ǥ ǡ ࡱࡹ ሽ be a pool composed of ࡹ ensembles formed from ࡯ . and ࡺ is the number of distinct input or output data. Denote the regions of competence by ࡾ૚ ǡ ǥ ǡ ࡾࡷ . ࡷ ࡼࢉ ൌ σࡷ ࢐ୀ૚ ࡼ൫ࡾ࢐ ൯ࡼࢉ ሺࡾ࢐ ሻ ൌ σ࢐ୀ૚ ࡼ൫ࡾ࢐ ൯ࡼሺࡱ࢏ሺ࢐ሻ ȁࡾ࢐ ሻ (4) To maximize ࡼࢉ . In DES process. introduced by Huang et al. ࢼ࢐ is the output weight that connects the ࢐࢚ࢎ hidden node and output node. mutation and crossover) are described briefly as follows. ࡸ is the number of hidden neurons.e. The main characteristic of ELMs is learning without iterative training. high parallelism. where ࡴ is the Moore-Penrose (MP) generalized inverse of ࡴ. mutation. the target vector). ࢁ࢐ǡ࢏ǡࢍ ൌ ቊ ǥ ‫ڰ‬ ǥ ࢂ࢐ǡ࢏ǡࢍ ܑ܎࢘ࢇ࢔ࢊ࢏ǡ࢐ ሾ૙ǡ ૚ሿ ൑ ࡯࢘‫ ࢐ܚܗ‬ൌ ࢐࢘ࢇ࢔ࢊ ࢄ࢐ǡ࢏ǡࢍ ‫܍ܛܑܟܚ܍ܐܜܗ‬ (3) ࢁ࢐ǡ࢏ǡࢍ represents the ࢐࢚ࢎ component of ࢁ࢏ǡࢍ . EXTREME LEARNING MACHINES DE is one of the most powerful stochastic real-parameter optimization algorithms of current interest [6.e. There are many types of machine learning classification techniques. and its capability to generalize well on unseen [3]. robustness. ࢌሺǤ ሻ is the activation function. and ࢐࢘ࢇ࢔ࢊ is a randomly chosen index which can ensure at least one parameter be copied from ࢂ࢏ǡࢍ . Extreme Learning Machine (ELM). ࡮࢐ is the hidden layers biases. DE operates through similar computational steps as employed by a standard EA: initialization. III. where ࢌሺ࡭૚ ǡ ࡮૚ ǡ ࡼ૚ ሻ ‫ڭ‬ ࡴൌ൥ ࢌሺ࡭૚ ǡ ࡮૚ ǡ ࡼࡺ ሻ (2) Note that ࢏ ് ࢘૚ ് ࢘૛ ് ࢘૜. Mutation: The mutant of ࢄ࢏ǡࢍ obtained by the mutation operation is represented by ࢂ࢏ǡࢍ . a new individual (i. fault and failure tolerance. Denote by ࡼሺࡱ࢏ ȁࡾ࢐ ሻ the probability of correct classification by ࡱ࢏ in region ࡾ࢐ . Let ञ ൌ ሼሺࡼ࢏ ǡ ࢀ࢏ ሻȁࡼ࢏ ‫ א‬ব࢔ ǡ ࢀ࢏ ‫ א‬ব࢓ ǡ ࢏ ൌ ૚ǡ ǥ ǡ ࡺሽ be the training dataset. The ࢏࢚ࢎ target vector in the ࢍ࢚ࢎ generation of the population is denoted as ࢄ࢏ǡࢍ . The search process of near-optimal ANNs is widely explored using Evolutionary Algorithms (EAs) [5].This paper is organized as follows: Section II defines terms and provides the main concepts of the chosen base classifier (Extreme Learning Machines). [4]. Artificial Neural Networks (ANNs) [2] are one of the most popular employed. Section VI presents the experimental results. we assign ࡱ࢏ሺ࢐ሻ so that ࡼ൫ࡱ࢏ሺ࢐ሻ ȁࡾ࢐ ൯ ൒ ࡼ൫ࡱ࢚ ȁࡾ࢐ ൯ǡ ‫ ࢚׊‬ൌ ૚ǡ ૛ǡ ǥ ǡ ࡹ From (4) and (5). Section III presents the origins of enhanced evolutionary algorithm proposed. Differential Evolution (DE) is considered to have some advantages due to its simplicity. we have that 14 (5) . plays an important role in the fields of pattern classification. while the gradient-based algorithms focus on the later property only. where ࡼ࢏ is a ࢔-dimensional input pattern and ࢀ࢏ is a ࢓ -dimensional target. ࢘ࢇ࢔ࢊ࢏ǡ࢐ ሾ૙ǡ ૚ሿ is a uniformly distributed random number. section VII gives some final considerations about the main topics covered in this work. and the corresponding trial vector is represented by ࢁ࢏ǡࢍ . Step 2: The output weights are analytically determined through the generalized inverse operation of the hidden-layer matrices. where ࡭࢐ is the input weights. This particular type of classifier has been extensively used due to its inherent characteristics: nonlinearity. 7]. ࡲ is a scaling factor. we decide which ensemble from ࡱ we should nominate for each region ࡾ࢐ ǡ ࢐ ൌ ૚ǡ ǥ ǡ ࡷ.e. However. In a basic DE. II. Let ࡱ࢏ሺ࢐ሻ ‫ ࡱ א‬be the ensemble responsible for region ࡾ࢐ Ǥ The overall probability of correct classification ࡼࢉ can be computed by (4). Let ࡱ‫ כ‬be the ensemble with the highest average accuracy amongst the ensembles of ࡱ over the whole feature space. Section IV shows a brief review of selection methods. EAs and ANNs are combined to produce a hybrid model with low error and high generalization. In the selection step. where ࡼ൫ࡾ࢐ ൯ is the probability that an input drawn from the distribution of the problem falls in ࡾ࢐ . ࢂ࢏ǡࡳ ൌ ࢄ࢘૚ǡࢍ ൅ ࡲሺࢄ࢘૛ǡࢍ െ ࢄ࢘૜ǡࢍ ሻ Step 1: Randomly assign values to the input weights and the hidden neuron biases. the randomness of weights and biases may lead to non-optimal performance. ࢄ࢏ǡࢍ will be replaced by ࢁ࢏ǡࢍ if ࢁ࢏ǡࢍ is better than ࢄ࢏ǡࢍ . ା ELM can reach good generalization performance by ensuring two properties of learning: the smallest norm of weights besides the smallest squared error within the training samples. crossover and selection. Crossover: ࢁ࢏ǡࢍ is generated from ࢂ࢏ǡࢍ and ࢄ࢏ǡࢍ . ࢌሺ࡭ࡸ ǡ ࡮ࡸ ǡ ࡼࡸ ሻ ‫ڭ‬ ൩ ࢌሺ࡭ࡸ ǡ ࡮‫ ۺ‬ǡ ࡼࡺ ሻ ‫ۼ‬ൈ‫ۺ‬ ࢼલ૚ ࢀલ૚ ࢼ ൌ቎ ‫ ڭ‬቏ and ࢀ ൌ ቎ ‫ ڭ‬቏ ࢼલࡸ ‫ۺ‬ൈ‫ܕ‬ ࢀલࡸ ‫ۼ‬ൈ‫ܕ‬ Step 3: Calculate the output weights by ࢼ ൌ ࡴା ࢀ. as in (1).

The population size used was ࡺࡼ ൌ ૛૙. The training stage generates a pool of classifiers࡯ ൌ  ሼ࡯૚ ǡ ǥ ǡ ࡯ࡸ ሽ using the training dataset ञ . a noise reduction filter is applied to the validation dataset ठ to remove noise patterns. The fourth and fifth parts correspond to the input weights and hidden biases (obtained in the range ሾെ૚ǡ ૚ሿ). the local region is computed using the patterns of the filtered dataset ठԢ . After the structure is set. Denote ࡿ࡯࢘ as the set of all successful crossover probabilities࡯࢘࢏'s at generation ࢍ. the objective function is computed. regardless of the way the feature space has been partitioned. The second part contains information on the hidden neurons. The information of each part is decoded to form an ELM. The third part encodes the activation function. With the use of EA. and triangular basis function. In (8). The only condition is to ensure that ࡱ࢏ሺ࢐ሻ is the best among the ࡹ ensembles in ࡱ for region ࡾ࢐ . as in (11). basically. A. For instance. sine function. the MP generalized inverse is used to analytically calculate the output weights. i. In encoding schema. as in (9). The location parameter Ɋࡲ of the Cauchy distribution is initialized to be ૙Ǥ ૞ and then updated at the end of each generation as in (12). the crossover probability ࡯࢘ of each individual ࢏ is independently generated according to a normal distribution of mean Ɋ࡯࢘ and standard deviation ૙Ǥ ૚. thereby avoiding difficulties stemming from a non automatic trial-and-error search. The mean Ɋ࡯࢘ is initialized to be ૙Ǥ ૞ and then updated at the end of each generation as in (10). 3. and oracle information [15. We used many generations because we wanted to provide enough time for a satisfactory fitness level to be achieved for the population. Denote ࡿࡲ as the set of all successful mutation factors ࡲ࢏ 's in generation ࢍ . Moreover. In (7). and then truncated to ሾ૙ǡ ૚ሿ. Thus. an individual contains the ELM information organized in five parts. The enhanced DE was executed for ࡳ࢓ࢇ࢞ ൌ ૚૙૙૙ generations. such as diversity [17-19]. Having too many hidden neurons is analogous to a system of equations with more equations than free variables: the system is over specified. Feature Selection ࢓ ૛ ࡾࡹࡿࡱ ൌ ටσࡺ ࢏ୀ૚ σ࢐ୀ૚ሺࢀ࢐࢏ െ ࡻ࢐࢏ ሻ Τࡺ ൈ ࢓ Ɋ࡯࢘ ൌ ሺ૚ െ ࢉሻ ȉ Ɋ࡯࢘ ൅ ࢉ ȉ ࢓ࢋࢇ࢔࡭ ሺࡿ࡯࢘ ሻ (10) ࡲ࢏ ൌ ࢘ࢇ࢔ࢊࢉ࢏ ሺɊࡲǡ ૙Ǥ ૚ሻ (11) Ɋࡲ ൌ ሺ૚ െ ࢉሻ ȉ Ɋࡲ ൅ ࢉ ȉ ࢓ࢋࢇ࢔ࡸ ሺࡿࡲ ሻ Hidden Biases ૛ ࢓ࢋࢇ࢔ࡸ ሺࡿࡲ ሻ ൌ σࡲ‫ ࡲ ࡲࡿא‬ൗσࡲ‫ࡲ ࡲࡿא‬ 15 (9) (12) (13) . Similarly. Moreover. the root mean square error (RMSE) and classification error (CE). and incapable of generalization. we adopt the most known error functions (in training and validation datasets). Improving Classifiers The hybridization of an enhanced DE and ELM was performed to build an automatic method capable of seeking a pool of ELMs. ࢓ is the number of output units. defined respectively in (7) and (8). ࢉ is the number of classes and ࡯࢏ is the number of errors per class. and then truncated to be ૚ if ࡲ࢏ ൒ ૚ or regenerated if ࡲ࢏ ൑ ૙. we may find measures that consider interactions among classifiers.e. In the query stage. sigmoid function. 2 shows the architecture of the proposed approach.21] or other grouping approaches [22]. the mutation factor ࡲ of each individual ࢏ is independently generated according to a Cauchy distribution with location parameter Ɋࡲ and scale parameter ૙Ǥ ૚. We use the KNORA-Eliminate [16] to select the dynamic ensemble. 3. PROPOSED METHOD Fig. ambiguity [20. However. ࡻ࢐࢏ is the output obtained to the pattern ࢏ in the output ࢐ and ࡺ is the number of samples. as illustrated in Fig. V..‫כ‬ ‫כ‬ ࡼࢉ ൒ σࡷ ࢐ୀ૚ ࡼ൫ࡾ࢐ ൯ࡼሺࡱ ȁࡾ࢐ ሻ ൌ ࡼሺࡱ ሻ The first part of the individual is responsible for the feature selection. the measures are conducted in different ways. Finally.16]. Proposed method flowchart Activation Function Input Weights ࡯ࡱ ൌ σࢉ࢏ୀ૚ ࡯࢏ (8) ࡯࢘࢏ ൌ ࢘ࢇ࢔ࢊ࢔࢏ ሺɊ࡯࢘ǡ ૙Ǥ ૚ሻ Fig. the individualbased measures most often take into account the classifier accuracy. The enhanced DE takes into account multiple objectives (instead of using only the training dataset error to avoid overfitting [23]). We use the minimum number of neurons ࡺ࢓࢏࢔ ൌ ૚૙ and the maximum number of neurons ࡺ࢓ࢇ࢞ ൌ ૜૙. where ࢉ is a positive constant between ૙ and ૚ and ࢓ࢋࢇ࢔࡭ ሺǤ ሻ is the traditional arithmetic mean. 2. respectively. We used the Gaussian radial basis function. Composition of an individual in ELM optimization Hidden Neurons (7) In the enhanced DE. ranking of classifiers [9]. ࢀ࢐࢏ is the target to pattern ࢏ in the output ࢐ . an encoding schema and objective function were defined. in order to reduce the complexity of ELMs generated. probabilistic information [10. one may find measures based on pure accuracy (overall local accuracy or local class accuracy) [8].11]. where ࢓ࢋࢇ࢔ࡸ ሺǤ ሻ is the Lehmer mean that is calculated as in (13). at each generation ࢍ. It is worth noting that. hyperbolic tangent function. (6) Equation (6) shows that selection methods perform equal of better than the best ensemble ࡱ‫ כ‬. classifier behavior calculated on output profiles [12-14]. Fig. at each generation ࢍ.

B. as summarized in Table I. the selection operation of enhanced DE selects the best from the parent vector ࢄ࢏ǡࢍ and the trial vector ࢁ࢏ǡࢍ according to their objective functions ࢌሺǤ ሻ. ࢂ࢏ǡࢍ ࢘࢖ࢌ is generated as in (14).55 (0.e. when compared to the current population. All inputs (patterns) have been normalized into the range [૙ǡ ૚].89) 14. The final pool was statistically better in all tasks. ࢘࢖ࢌ ࢂ࢏ǡࢍ ൌ ࢄ࢏ǡࢍ ൅ ࡲ࢏ ቀࢄ࢐ǡࢍ െ ࢄ࢏ǡࢍ ቁ ൅ ࡲ࢏ ൫ࢄ࢘૚ǡࢍ െ ࢄ࢘૛ǡࢍ ൯ (14) ࢘࢖ࢌ In (14).25) Waveforms 20. If the archive size exceeds a threshold.23) 14.66 (2. Based on this. while the outputs (targets) have been normalized into [െ૚ǡ ૚]. in enhanced DE. except in Glass task (statistically equivalent).53) Pendigits 3. ࡼ ‫࡭ ׫‬. then ࢄ࢏ is a noise. Improving Regions of Competence Although DES methods can usually achieve better classification performance than the single best classifier among them. with Pareto dominance forming the basis of solution quality.93 (1. The mutant vector.53) 33. The Pareto front denoted by ࢆ‫ כ‬is the set of individuals ࢆ‫ כ‬ൌ ሼࢆ‫ ࢐כ‬ȁࢆ࢐‫ ࢏ࢆ ط כ‬ǡ ‫ࢆ א ࢏ࢆ׊‬ሽ.52) 3.47 (1..66) 4. is described as in (15).87) Vehicle 25. where ࢄ࢏ǡࢍ .75 (2. i. then ࢄ࢏ is a noise. ERROR RATES IN IMPROVING CLASSIFIERS Task Abalone Fig.15) Page 5.80 (1. TABLE II.89) 06. at ࢍ ൌ ࢍ ൅ ૚.01 (0.52) Ecoli 17. attributes and classes. (15) Examples Attributes Classes Abalone 4177 8 3 Cancer 699 9 2 Car 1728 6 4 2 Diabetes 694 8 Ecoli 336 7 8 Glass 214 9 6 Page 5473 10 5 Pendigits 10992 16 10 Sat 6435 36 6 Vehicle 846 18 3 Waveforms 5000 40 3 Yeast 1484 8 10 Table II presents the error rates in percentage.34 (1.e. The parent solutions that fail in the selection process are added to the archive. the member for the next generation.01) Car 13. Thus. The second technique is based on the majority vote concept. The experiments were executed with a ૜૙-dimension search space. can provide additional information about the promising progress direction. denoted as ࢅ૚ ‫ࢅ ط‬૛ if ࢟૚࢏ ൑ ࢟૛࢏ ‫ א ࢏׊‬ሼ૚ǡ ǥ ǡ ࢓ሽ and ࢟૚࢐ ൏ ࢟૛࢐ ‫ א ࢐׌‬ሼ૚ǡ ǥ ǡ ࢓ሽ. ࢄ࢐ǡࢍ and ࢄ࢘૚ǡࢍ are selected from ࡼ . For example.86) 33.26 (0. Each task was randomly divided into ૞૙Ψ for training. The current DES systems end up selecting the wrong classifiers when there are noise patterns near the query pattern. ࢄ࢏ǡࢍା૚ ൌ ቊ EXPERIMENTS AND RESULTS For evaluating our method. i. denote ࡭ as the set of archived inferior solutions and ࡼ as the current population.14 (1.70 (2. recently explored inferior solutions.50) 19.66) Cancer 3. The concept of Pareto dominance and Pareto optimality are fundamental in multi-objective optimization.62 (1.42) Sat 15.76) Diabetes 27.86) Yeast 45. if all classifiers from ࡯ misclassify ࢄ࢏ . the experiments were conducted using ૚૛ benchmarks classification tasks found in the UCI Machine Learning Repository [25]. 4. it has been showed that there is still a large performance gap between DES and the oracle [24].95 (0.37) 40.24) 23. then some solutions are randomly removed from the archive to keep the archive size at ࡺࡼ. TABLE I.92 (2. then ࢅ૚ dominates ࢅ૛ . Improving regions of competence flowchart 16 Improving Classifiers Before After 37.03) .92 (2. as shows the flowchart in Fig. of the current population and the archive.. In DE.24 (0.57 (0. while ࢄ࢘૛ǡࢍ is randomly chosen from the union.75) 13.25 (4. comparing the initial pool (before classifier optimization) and final pool (after classifier optimization) without improving regions of competence.23 (0. ࢄ࢐ǡࢍ is randomly chosen as one of the individuals in Pareto front. The first technique use the oracle concept. The best results are emphasized in bold.63) 2. if the majority vote of all classifiers from ࡯ misclassify ࢄ࢏ .77 (2. according to paired ࢚ -test ሺࢻ ൌ ૙Ǥ ૙૞ሻǡ and the standard deviation in brackets.27 (7. Given the objective vectors ࢅ૚ ǡ ࢅ૛ ‫ א‬ব࢓ . 4.50 (4. we propose two techniques that remove samples that are considered noise.25) Glass 34.VI. ૛૞Ψ for validation and ૛૞Ψ for test. say ࡺࡼ.15 (3. These tasks present different degrees of difficulties and different number of examples.97 (6. ࢁ࢏ǡࢍǡ ࢏ࢌࢌሺࢁ࢏ǡࢍ ሻ ‫ࢌ ط‬ሺࢄ࢏ǡࢍ ሻ ࢄ࢏ǡࢍ ǡ ࢕࢚ࢎࢋ࢘࢝࢏࢙ࢋ SPECIFICATION OF THE TASKS USED IN THE EXPERIMENTS Number of Task Finally.

38) 14.29 (0.97 (2. ૡ against ૝ tasks.8. This becomes a problem when there are many noise patterns in dataset where the regions of competence are computed.10) 18.using the filter based on majority vote.89 35.19 (1. executed in Weka 3.53 - - - - Glass 32.78 (0. and Random Subspace Method [28] (RSM).35 (2. In most number of tasks.70 (2. and the standard deviation in brackets.46 (5.51) 2.95 (6.97 (2.88) 19. Figs. other techniques were used for comparison.74 (0.00) 35.53 (3.6.99 3. as well the original dimensionality in validation dataset ठ.73 (0.66) 56.37 (0.20 (3.49) 32. 5 and 6 show the minimum and maximum reduction rates obtained in filtered validation dataset ठԢ.68 (1.92 (6.19 (2.80 - 21. RSM was better in three tasks (Page.72 (3.TABLE III.16) 19.84) 59.72 (0.29 (4. Page. 5. Nevertheless ȁठԢȁ ൏ ȁठȁ .38 Task To access the accuracy of the proposed method.71 (1.86 (0. The parameters values were chosen as default from Weka 3.55) 2.08 (4. Reduction rates using the filter based on majority vote achieved the biggest absolute reductions considering both maximum and minimum values. the paired ࢚ -test ሺࢻ ൌ ૙Ǥ ૙૞ሻshowed that there is no statistically significant difference.96 (1.96 23. 6.16 (1.37 (0. It is important to mention that one of the main problems in selection methods based on the NN-rule is the computational cost to define the neighborhood of the query pattern.97 (1. Task Abalone Cancer Car Diabetes Ecoli Fig.42) 12.00) 6.84 (0.44) 28. because the results are obtained with different experimental model setups as well as with different learning approaches.45 (1.30 3.41) 5. However.78 (2.52 Diabetes 23.73 Vehicle 19. in an empirical analysis. the dynamic selection using ठԢ presented better results with lower error rates. COMPARISON AMONG METHODS FROM LITERATURE [22] [29] [30] [31] Ecoli 13. the proposed method achieved one of the best results.6.21) 4. it was possible to observe that the 17 .21 (2.20 (1.58) 3.60 (2. according to the empirical analysis.16) 8.83) 25.44) 11.01) ADBO BAG RMS 43.53 (3.15) Traditional Ensemble Methods 33. Thus the boldfaced values indicate the method that has the lowest error for each task.92 (0. The paired ࢚ -tests ( ࢻ ൌ ૙Ǥ ૙૞ ) showed that the proposed method was better than ADBO in all tasks.33 (4.99) 41. The algorithm needs to reduce the neighborhood often. Reduction rates (in Ψ) using the filter based on oracle concept Glass Page Pendigits Sat Vehicle Waveforms Yeast COMPARISON AMONG TRADITIONAL ENSEMBLE METHODS Proposed Method Filter Based on Majority Oracle Vote 33.60 3.72) 80.93) 46.13 22.77 (1.36) 3. Bagging [27] (BAG).88) 40.13) 44. Pendigits .83 (2.82 VII.26) 35.59) 33.50 - 3.50) 2.99 (1. Reduction rates (in Ψ) using the filter based on majority vote Table IV presents comparisons among some methods from the literature. Pendigits using the filter based on majority vote. FINAL REMARKS This work is concerned with the development of a method that improve classifiers and regions of competence in DES. which contributes in decreasing the cost of computing the nearest neighbors in the dynamic selection.33) 14. and Sat) and equivalent in only one task (Glass).62) 4.09) 57.22 (0.30) Table III shows that.11) 24.64 (0. which increases considerably the computational time.31) 28. This type of comparison must be made with caution.89 (5.50) 6. In an empirical analysis.46 (0. the proposed method have the lowest error rates for most of tasks.31) 13. The best results are emphasized in bold.98) 25.06 - 23.21) 3.87 (5.03) 3.30 (1.46) 13.46) 18.59) 18.88 (0.79 (2.47) 32.88 15.35) 11.80 23.89 (2. Proposed Method [11] Cancer 2. the neighborhood is reduced and the algorithm computes again. When none of the classifiers correctly classifies all the neighbors.75) 19.18) 14. The method is based on an enhanced DE and techniques that remove samples that are considered noise.33 35. Fig.73) 23.94 (3.32 - - 31.89 - - 19.99) 23.78 (0.81 (1.72 (2.27) 36.06) 4. Table III presents the performance of some traditional ensemble methods.60) 3.8: AdaBoost [26] (ADBO).81) 40.11 (0. and Sat) and equivalent in two tasks (Glass and Yeast).48) 3.79) 25.84 (5.36 (1. TABLE IV.78 (1. BAG was better in four tasks (Car.43 (0.27) 33.67) 31.07 (0.06 (2.90) 25. Through experimental results.96 (7.64) 3.

pp. vol. M. 66–75. 2000. G. The Random Subspace Method for Constructing Decision Forests. A study on the performances of dynamic classifier selection based on local accuracy estimation. Haykin. R. 1108-1113. International Joint Conference on Computational Sciences and Optimization. W. A multi-objective memetic and hybrid methodology for optimizing the parameters and performance of artificial neural networks. R. Ko. Nadir.method reached a good improvement. I. L. 341-359. I. Sabourin. Roli. 14381450. 127. no. Britto Jr. Thomas. pp. New dynamic ensemble of classifiers selection approach based on confusion matrix for arabic handwritten recognition. International Joint Conference on Neural Networks. 832-844. J. Roli. 1998. 2. Ludermir. T. Ho. Pattern Recognition vol. Lecture Notes in Computer Science. de Souto.uci. B. 1st Int. Neural Comput. 2010. Cavalin. IEEE Transactions on Pattern Analysis and Machine Intelligence vol. and T. pp. the use of this approach can still reduce the computational cost because the filter eliminates some patterns. Sabourin. L. Shin. Pattern Anal. Huang. Giacinto. 38. 4. Storn. Intell. 22. Santana. Ninth Brazilian Symposium on Neural Networks. M. Furthermore. 70. Machine Learning. pp 123-140. 229–236. 1-15. M. pp. Soares. M. 1126-1133. Maupin. G. Kurzynski. S. Dynamic classifier ensemble selection based on GMDH. pp. E. T. 2008. pp. A. F. Neurocomputing. On two measures of classifier competence for dynamic ensemble selection . 2009. pp. 10. pp. Figueiredo. Zhu. vol. C. 11. and K. vol. Q. Lysiak. Combining both ensemble and dynamic classifier selection schemes for prediction of mobile internet subscribers. 2010. vol. because reduced the cost of computing the nearest neighbor rule which can be high in some cases. 1. C. M.1. T. 1997. A. International Symposium on Communications and Information Technologies. 9. 25. 1718-1731. [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] 18 International Conference on Multimedia Computing and Systems. M. Suen. P. [14] [15] [16] [17] [18] ACKNOWLEDGMENT [19] The authors thank FACEPE. Nagy. 1997. S. Dietterich. K. any other classifier can be used. pp. 489-501. Cruz. no.ics. vol. C. Y. F. Woods. no. vol. 38. Hull. Srihari. Second Internation Conference on Document Analysis and Recognition.: UCI machine learning repository. Lysiak. vol 16. Q. 2013. K. IEEE Trans. 1990. D. F. pp. 6679. Differential Evolution – A survey of the stateof-the-art. Work. no. 1993. 2004. the use of techniques that remove samples that are considered noise decreased the computational cost. M. 63–68. 2993–3009. Ho. G. Ludermir. B. no. Breiman. H. Journal of Global Optimization. Wiley Interscience. N. Sabourin. 2009. Classifier combination for handprinted digit recognition.. Giacinto. and T. School of Information and Computer Science. Y. Kuncheva. Das. Woloszynski. 19. vol. Kurzynski. L. and C. vol. P. IEEE Transactions Pattern Analysis and Machine Intelligence. CA (2013). 34. pp. Bache. vol. Ensemble Methods in Machine Learning. Siew. vol. pp. P. 1994. Zhu. R. F. Hybrid Artificial Intelligent Systems. P. Almeida. 1. especially in improving classifiers. 8. R. 2006. Irvine. Probabilistic approach to the dynamic ensemble selection using measures of competence and diversity of base classifiers. 659664. W.41. 2011. K. E. vol. Decision combination in multiple classifier systems. 308-313.36– 41. Woloszynski. 2007. 4. Didaci. pp. From dynamic classifier selection to dynamic ensemble selection. G. and K. L. O. Nabiha. vol. 5. no. Machine Learning. no. Appl. G. Mach. J. Expert Syst.vol. M. we choose ELM as our base classifier but. Canuto.edu/ml R. Sohn. R. J. A. Suganthan. Moreover. Investigating the use of Alternative topologies on Performance of the PSO-ELM. Extreme Learning Machine: Theory and Applications. W. 15. Neurocomputing. 673688. R. 10. T. A. Neurocomputing. Prince. Roli. Knowl. no. K. Dynamic classifier selection based on multiple classifier behaviour. IEEE International Conference on Tools with Artificial Intelligence. L. Cavalcanti. C.L. 10th International Conference on Image Analysis and Processing. 405-410. S. A.. Evolutionary Extreme Learning Machine. 2005. 2011. Bagging predictors. R. 2003. 2012. P. Methods for dynamic classifier selection. Pattern Recognition vol. and P. B. 2188–2191. Rodriguez. J. Even for the methods that have a fixed neighborhood size and therefore does not need to recompute. 2001. J. A dynamic overproduce-andchoose strategy for the selection of classifier ensembles. Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. vol. Giacinto. dos Santos. Appl. . 41. Combining pattern classifiers: methods and algorithms.experimental comparative analysis. R. Prentice Hall. no. H. A dynamic classifier selection method to build ensembles using accuracy and diversity. pp. no.University of California. A. pp. 2014. K. 163-166. pp. Y. Qin. no. Neural Networks and Learning Machines. F. Lichman. He. 4. 1996. E. Mitiche. 546-552. CNPq and CAPES (Brazilian Research Agencies) for their financial support. Combination of multiple classifiers using local and accuracy estimates. 197-227. vol.. pp. The Strength of Weak Learn Ability. 1999. Pattern Recognit. no. Bowyer. 2011. S. M. Schapire. Sabourin. I. Dynamic selection approaches for multiple classifier systems. vol. S.731–734. B. Marcialis. Kuncheva. 1. 24. no. 2013. pp. For this work. 2006. and T. A method for dynamic ensemble selection based on a filter and an adaptive distance to improve the quality of the regions of competence. other optimization algorithms could be applied. 11. Pattern Recognition. Huang. REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [20] T. IEEE Trans. Suganthan. B. G. Ludermir. Pattern Recognit. pp. Y. M. Ren. G. on Multiple Classifier Systems. Xiao. Classifier ensembles with a random linear oracle. 1759–1763. A. P. pp. http://archive. in principle. D. pp. 2008. pp. 4-12. P. Kegelmeyer. IEEE Transactions on Evolutionary Computation. pp. no. 4-31. 19. 20. 2. pp. 73. 5. R. Optimizing Dynamic Ensemble Selection Procedure by Evolutionary Extreme Learning Machines and a Noise Reduction Filter. pp. 500508. R. pp. Data Eng. and T. T. 1879-1881. N. pp. F. G. G. It is relevant to mention that the evolutionary optimization avoids considerable human effort and difficulties stemming from a non-automatic trial-and-error search. G. 2005. Lima.