You are on page 1of 11

1394 IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, VOL. 7, NO.

5, OCTOBER 2023

Neural Network Ensemble With Evolutionary


Algorithm for Highly Imbalanced Classification
Poly Z. H. Sun , Member, IEEE, Tian-Yu Zuo , Rob Law , Edmond Q. Wu , Senior Member, IEEE,
and Aiguo Song , Senior Member, IEEE

Abstract—Imbalanced data is a major challenge in classification unseen samples (i.e., test sets). In general, the rules hidden in
tasks. Most classification algorithms tend to be biased toward the the majority class can be easily learned by the classifier given
samples in the majority class but fail to classify the samples in numerous training samples. However, the rules hidden in the
the minority class. Recently, ensemble learning, as a promising
method, has been rapidly developed in solving highly imbalanced minority class are quite difficult to learn because the classifier
classification. However, the design of the base classifier for the easily regards the sample belonging to the minority class as
ensemble is still an open question because the optimization problem unimportant outliers or noise. The patterns of the minority class
of the base classifier is gradientless. In this study, the evolutionary may then be ignored by the classifier. As a result, the classifier
algorithm (EA) technique is adopted to solve a wide range of may achieve a high classification accuracy for the majority
optimization design problems in highly imbalanced classification
without gradient information. A novel EA-based classifier opti- class but an extremely low accuracy for the minority class. In
mization design method is proposed to optimize the design of mul- fact, in many engineering applications, those minority classes
tiple base classifiers automatically for the ensemble. In particular, typically need more attention than the majority classes. Most
an EA method with a neural network (NN) as the base classifier existing classification methods cannot provide reliable classi-
termed NN ensemble with EA (NNEAE) is developed for highly fication performance when dealing with (highly) imbalanced
imbalanced classification. To verify the performance of NNEAE,
extensive experiments are designed for testing. Results illustrate data. Therefore, exploring an effective classification method for
that NNEAE outperforms other compared methods. imbalanced data is of great significance.
The solutions developed for class-imbalanced classification
Index Terms—Network architecture search, evolutionary
algorithm, classification, imbalanced data.
can roughly be divided into three approaches:
1) sampling-based approach, such as undersampling [1] and
I. INTRODUCTION oversampling [2];
2) cost-sensitive learning-based approach [3], [4], which
ATA with imbalanced classes, such as credit card fraud-
D ulent transaction identification, machine fault diagnosis,
and cancer diagnosis, exist widely in the real world. In such
considers the influence of sample misclassification on
model construction;
3) ensemble learning-based approach. The latter deploys
cases, the number of samples in one class considerably exceeds
several simple classifiers (base classifiers) and constructs
that in other classes. Machine learning-based classification is a
exclusive training sets for them. Each exclusive training set
popular paradigm for solving the problem of class-imbalanced.
is used for training each base classifier. The classification
The core idea of machine learning is to find underlying patterns
results of all base classifiers are collected and then the
and patterns in a large set of training samples (i.e., training
label prediction of the test sample is accomplished through
sets), which makes the established classifier able to identify
an ensemble strategy. Generally, an ensemble classifier
can achieve higher classification accuracy and sample
Manuscript received 8 August 2022; revised 7 November 2022, 20 December generalization ability than the single classifier, although
2022, and 26 January 2023; accepted 22 February 2023. Date of publication
28 March 2023; date of current version 25 September 2023. This work was the performance of the base classifier in the ensemble is
supported by the Zhejiang Provincial Natural Science Foundation under Grant much worse than that of a single complex classifier. In [5],
LY23F030001. (Corresponding authors: Aiguo Song; Tian-Yu Zuo.) the author provided evidence for this point of view.
Poly Z. H. Sun is with the Department of Industrial Engineering, School
of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, Bagging and boosting are two basic ideas in ensemble learn-
China (e-mail: zh.sun@sjtu.edu.cn). ing. Although the two have been proposed decades ago, they
Tian-Yu Zuo is with the School of Automation, Nanjing University of still influence the development of ensemble learning methods
Information Science & Technology, Nanjing 210044, China (e-mail: tian_
yu_zuo@163.com). in recent years, even the latest technology borrows from one or
Rob Law is with the Asia-Pacific Academy of Economics and Management; both of these core ideas. Moreover, the sampling-based approach
Department of Integrated Resort and Tourism Management, Faculty of Busi- and cost-sensitive learning-based approach are used to im-
ness Administration, University of Macau, Macau 0999078, China (e-mail:
roblaw@um.edu.mo). prove the performance of ensemble learning-based algorithms.
Edmond Q. Wu is with the Department of Automation, Shanghai Jiao Tong Wu et al. [6] proposed an uncorrelated cost-sensitive multiset
University, Shanghai 200240, China (e-mail: edmondqwu@163.com). learning (UCML) method. This is a bagging method, which
Aiguo Song is with the School of Instrument Science and Engineering,
Southeast University, Nanjing 210096, China (e-mail: a.g.song@seu.edu.cn). is based on the idea of multiset feature learning. A balanced
Digital Object Identifier 10.1109/TETCI.2023.3251400 set construction strategy cost-sensitive technology are used to

2471-285X © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on April 17,2024 at 07:47:31 UTC from IEEE Xplore. Restrictions apply.
SUN et al.: NEURAL NETWORK ENSEMBLE WITH EVOLUTIONARY ALGORITHM FOR HIGHLY IMBALANCED CLASSIFICATION 1395

provide uncorrelated balanced sample sets for the base classifier. For this motivation, this paper proposes designing a strategy to
Furthermore, Jing et al. [7] considered the influence of the evolve the structural parameters of base classifiers automatically
distribution difference between multiple balanced sets and the to obtain enhanced ensemble model generalization performance.
original training set on the model performance. Then, a balanced Our ensemble optimization is developed on the basis of the EA
set construction method based on generative adversarial network paradigm, and the optimization object is each base classifier in
was proposed. The performance of UCML was further improved the ensemble. Specifically, we design an evolutionary algorithm
through combining it with an ensemble depth measurement (EA) mechanism for the imbalanced classification (especially
(DM) network. The cost of misclassification was included in for highly imbalanced classification), including the encoding
the loss function of the prediction network (i.e., ensemble DM mode of the base classifier and the specific implementation
network), so the method [4] can be considered a combination of evolutionary operators. Through selection, crossover, and
of bagging boosting. Density-based undersampling strategy and mutation, the structure of each base classifier and the ensemble
cost-sensitive classification method are combined to provide a of multiple base classifiers can be optimized in the iterative
solution algorithm for class-imbalanced classification [8]. In [9], process. The crossover operation cannot only realize local search
a multiobjective optimization method based on evolutionary at the structure level of the base classifier but also realize
algorithms (EAs) [10] was developed by Fernandes et al. to the optimization at the structure level of the ensemble. In the
construct optimally balanced sets. The core goal of this work iterative process of evolution, those base classifiers with lower
was to allocate the best-balanced training set for a set of base performance will be replaced with new and more promising base
classifiers to improve the overall prediction performance. There- classifiers. The evolutionary objective is to improve the perfor-
fore, how to select samples and then form balanced training mance of the ensemble in the imbalanced classification problem.
sets can be regarded as the optimization object. The balanced A balanced-multiset construction strategy is also developed to
set is encoded using genetic algorithm (GA) individuals. An assign training data to the base classifier. Lastly, a weighted
ensemble consisting of multiple multilayer perceptrons (MLPs) voting strategy is developed for the ensemble. The basic neural
was trained using multiple balanced sets. The method [9] can network (NN) is adopted as the base classifier in this study.
also be considered a bagging method. Therefore, our proposed method is termed NN evolutionary
How to construct an effective ensemble is an open question. based on EA for the ensemble (NNEAE).
Diversity and accuracy are two criteria to evaluate the effec- It should be noted that the general EA framework proposed in
tiveness of the ensemble. Considerable research is focused on this paper requires a flexible variable-length representation in an
developing optimization algorithms or new objective functions EA algorithm such as genetic programming and variable-length
to improve the diversity and accuracy of the ensemble. The real-code GAs to meet the system requirements.
difficulty of ensemble optimization lies not only in the existence This paper also provides a contribution to the development
of many crucial factors related to diversity and accuracy but of EA in the field of neural architecture search (NAS) [18].
also in the extremely difficult construction of a differentiable Ensemble optimization can be considered a special case of NAS
loss function to guide the optimization process of the ensemble. in the field of ensemble learning. Specifically, in ensemble opti-
Ensemble optimization is a gradient-free optimization problem. mization, not only the structure of individual networks (i.e., base
Naturally, EA technology is expected to provide a solution to the classifier) needs to be optimized, but also the ensemble structure.
classifier ensemble optimization problem, given that it has the The solutions for NAS include gradient-based methods as well
ability to solve a wide range of optimization problems without as EA-based methods. The former requires artificially defined
gradient information [11], [12], [13]. The ensemble optimization loss functions related to the expected performance. For example,
method based on objective function is the main application of [19] defines a loss function related to the network inference time
EA technology in the ensemble optimization problem. Exist- in order to deploy lightweight networks on mobile. However, in
ing ensemble optimization methods for classification problems, ensemble learning, it is difficult to define loss functions related
whether or not developed on the basis of the EA paradigm, to the diversity of base classifiers. Therefore, gradient-based
focus on constructing multiple balanced sets for training base solutions are hindered in implementing ensemble optimization.
classifiers [14], [15]. The basic idea of these methods is to elim- From this perspective, the importance of using EA-based meth-
inate the correlation of the training sets of each base classifier ods for ensemble optimization is also demonstrated. Besides, the
to generate a group of diverse classifiers. Using diverse training fact that ensemble optimization is a gradient-free optimization
sets to train different classifiers could help the base classifier problem makes it natural to introduce EA technology.
focus on a part of the entire available training set, then the The contributions of this paper are summarized as follows.
diversity of the ensemble can be guaranteed by the integration 1) In this paper, we develop a novel ensemble learning-based
of multiple base classifiers. Such an idea is intuitive, and the framework for imbalanced classification problems. An
validity was also confirmed by [16], [17]. However, this idea optimization strategy based on the EA paradigm is pro-
is to reflect on classification problems only in terms of dataset posed to optimize both the base classifier structure and
construction. For the model design of an ensemble classification the ensemble structure. NN is chosen as the base classifier,
problem, we believe that to make the classification performance hence our method is called Neural Network Ensemble with
more generalizable, the design of the structural parameters of Evolutionary Algorithm (NNEAE for short).
the base classifier itself is a more important issue. Given that 2) In detail, we develop feasible evolutionary operators for
the number of base classifiers in the ensemble is typically large, the base classifier individual representation and population
artificially designing each base classifier is impossible for us. optimization process, including the selection operator,
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on April 17,2024 at 07:47:31 UTC from IEEE Xplore. Restrictions apply.
1396 IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, VOL. 7, NO. 5, OCTOBER 2023

crossover operator, and mutation operator. The evolution- determined by domain knowledge [28]. Datta et al. [29] in-
ary operators can also optimize the ensemble structure. vestigated near-Bayesian support vector machines for imbal-
The objective function drives the classification perfor- anced classification with the cost-sensitive mechanism. Krempl
mance of the ensemble to become more accurate. et al. [30] proposed OPAL, which is a cost-sensitive proba-
3) To ensure the diversity of the ensemble, an ensemble bilistic active learning method. Castro et al. [3] developed a
strategy based on weighted voting is developed. It can dis- cost-sensitive algorithm (i.e., CSMLP), which uses a single
tinguish the status of different evolved classifiers. Exper- cost parameter to differentiate misclassification errors to im-
iments show the contribution of each part of the NNEAE prove the performance of MLPs in binary imbalanced class
algorithm and its superiority over other comparison algo- distributions.
rithms in solving the imbalanced classification problem. 2) Algorithm-Level Methods: The idea of the algorithm-level
4) This work enriches the application of EA techniques methods is to design an imbalanced learning algorithm that could
in the field of ensemble learning. In previous works, obtain a better classification accuracy for imbalanced classifica-
EA techniques have been used to construct training tion problems than directly performing classical classifiers such
sets for base classifiers. Different from the previous as KNN, SVM, NN, and decision trees (DTs). Two strategies
work, our work focuses on exploiting EA techniques have been widely used for solving imbalanced classification:
to optimize the base classifier structure its ensemble ensemble-based strategy [31], [32] and algorithmic classifier
structure. modification [33].
The rest of this paper is organized as follows. In Section II, we Ensemble-based Strategy: The ensemble learns data features
briefly review the related research on imbalanced classification by deploying multiple base classifiers. Then, the classification
methods in recent years. Our proposed method NNEAE is result is determined by the fusion of multiple classifiers. Bag-
described in Section III. Extensive experiments are shown in ging [34] and boosting [35] are the two most basic forms of
Section IV. Lastly, the conclusions and future work are indicated ensemble learning. Excellent algorithms, such as EasyEnsem-
in Section V. ble [1] and EUSBoost [36], are developed on the basis of the
idea of either bagging or boosting, or both.
II. RELATED WORK Algorithmic Classifier Modification: Improving or modifying
an existing classification algorithm to obtain a classifier with
A. Imbalanced Classification better classification performance for the imbalanced classifica-
Generally, the solutions to data classification with class- tion problem is another research direction. In previous research,
imbalanced could roughly be divided into two categories: data- SVM, extreme learning machine, and NN were optimized for
level methods and algorithm-level methods [20]. imbalanced classification problems [20], [37], [38], [39], [40],
1) Data-Level Methods: To address imbalanced classifica- [41].
tion problems, two strategies are introduced in the data-level
methods: data preprocessing and cost-sensitive learning.
Data Preprocessing: Data preprocessing of imbalanced clas- B. Technique of EA
sification in data-level methods includes resampling and feature Many ensemble optimization methods based on EA technol-
selection techniques. Resampling techniques aim to rebalance ogy are developed specifically for class-imbalanced classifica-
the sample space and alleviate the negative effect of skewed class tion problems. The most common method is to use EA as an
distribution for the classifier learning process [21]. Three com- optimization technique to optimize hyperparameters. The most
mon methods can be used to implement resampling techniques, commonly used optimization method is GA or multiobjective
which are the oversampling [22], undersampling [1], [23], and GA. The manifold clustering-based resampling technique was
hybrid ways [24]. The basic idea of oversampling is to create proposed in [11]. The optional operations and key parameters
or replicate a certain number of minority samples to eliminate were optimized by using GA. In addition, GA-based methods are
the imbalanced data distribution in the training dataset [25]. By used to reconstruct imbalanced datasets [12], [13]. Chromosome
contrast, undersampling discards samples from majority classes, coding is adopted to indicate whether the majority of class
while leaving minority classes unchanged. Data may be lost due samples are sampled in the training set. This method is difficult
to the drop in a large amount of data in the majority class. to converge when the dataset is large. It is not data-level, but
The hybrid way combines oversampling and undersampling algorithm-level. The purpose of this method is to create class-
simultaneously. The goal of feature selection methods is to imbalanced learning with optimal classification performance
remove irrelevant features in the original feature space. Then, a adaptively. Given the wide applicability of the EA method to
subset of streamlined features could be obtained, which allows optimization problems, there are some interesting works. In [14],
the classifier to achieve better classification performance. In particle swarm optimization was used to optimize the prediction
general, feature selection could be divided into filters, wrappers, results of the base classifier. In [15], the surrogate model was
and embedded ways [26]. used to speed up the evaluation of evolutionary undersampling.
Cost-sensitive Learning: Cost-sensitive learning [3], [4], [27] This effort, to some extent, alleviates the shortcomings of EA-
is a learning paradigm, which gives higher costs for the mis- based dataset construction methods. However, its classification
classification of minority samples compared with that of ma- accuracy will decrease when the surrogate gives the wrong
jority samples. In actual applications, the cost matrices are evaluation.

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on April 17,2024 at 07:47:31 UTC from IEEE Xplore. Restrictions apply.
SUN et al.: NEURAL NETWORK ENSEMBLE WITH EVOLUTIONARY ALGORITHM FOR HIGHLY IMBALANCED CLASSIFICATION 1397

In recent years, genetic programming (GP) technology has


also helped the development of ensemble learning. The GP OMajority
class ES] I
program is mainly used to construct base classifiers. Bhowan
“Minority
class SS wabeeeeneee
Subset
‘—
1 Balanced
set
1
et al. [42] proposed a multiobjective GP method to evolve
accurate and diverse ensembles of GP classifiers. On the basis
: | = SS Balanced set 2
sohenenenes
Subset n :
of previous work, they also developed a two-stage GP-based
method for evolving ensemble for imbalanced data [43]. Given E35 — ae
that the cost-sensitive learning method has shown to be effec- Training
set E33 seirreneness
Balanced setn
eee

tive in addressing the imbalanced classification problem, the


(a) Balanced-multisets construction strategy (b) Training set allocation
cost matrix may not be well constructed by manual design be-
cause people can master limited domain knowledge of complex Fig. 1. Whole process of data partitioning.
real-world applications. A GP method was developed for the
construction of classifiers in the cost-sensitive learning method,
in which the cost matrix could automatically be learned through classifiers in the evolving ensemble structure will be lower than
the training of the GP process instead of requiring it from domain that in the initialization, which enables the ensemble structure
knowledge [44]. Different from previous research, to avoid the to be pruned. Furthermore, different from [47], NN, rather than
influence of biased classifiers in evolved nondominated front, SVM, MLP, and DT, is selected as the base classifier in this study.
GP was reused to search for a group of evolved GP classifiers in
their research. It exhibited excellent cooperation in the ensemble. III. METHODOLOGY
Other developments concerning EA techniques in ensemble
learning can be seen [14], [15], [16], [45], [46]. A. Overview
References [17] and [47] are two works similar to this paper. NNEAE includes three stages. In the first stage, a balanced-
In [17], the authors focused on the weight parameter opti- multiset construction strategy is used to search for suitable data
mization and topological structure setting problems for the NN partitioning. In the second stage, classifier evolution based on
classifier. NN, as a powerful classifier, is sensitive to weight EA iteratively optimizes the base classifiers until reaching the
parameters and network topology. A coupling relationship exists termination condition. In the last stage, evolved NN classifiers
between weight parameters and network topology. To solve the are voting for the ensemble to solve imbalanced classification.
above problem, a cooperative coevolutionary algorithm is used
to optimize the weight parameter optimization and topology B. Balanced-Multiset Construction Strategy
structure of the NN classifier simultaneously. Niche technology
Similar to [7], this study divides the majority class into multi-
is adopted, a set of candidate networks with a higher level
ple subsets to solve highly imbalanced classification problems.
of output diversity is obtained. Different from [17], our work
The number of samples in each subset is the same as that in
focuses on the correspondence between training data and net-
the minority class. If the samples in the majority class cannot
work structure. The reason for giving up the concern about
be divided exactly by the number of samples in the minority
weight parameters is that in the proposed NNEAE, the depth
class, some samples in the majority class will be left over. In
and neuron numbers of the NN classifier are strictly limited. As
this case, we randomly select several samples in the majority
a result, the parameter training of the NN classifier is not difficult.
class, then combine them with the leftover samples to construct
Weight parameters can be well learned using general optimiza-
a new subset. Note that the subsets contain only the samples
tion algorithms, such as Adam. When the network complexity
in the majority class. Next, we combine each subset with the
is limited, the importance of weight parameter optimization
samples in the minority class to obtain multiple balanced sets.
is reduced. Determining what kind of network structure can
Fig. 1(a) illustrates the process of the balanced-multiset con-
generalize and understand the rule inside given data is a more
struction strategy. First, majority class samples are divided into
worthy task to explore. In [47], the author claimed that similar
n subsets by the number of minority class samples. Then, all
classification performance may be achieved using more or fewer
minority class samples are concatenated with each subset to
or equal numbers of base classifiers. Excessive classifiers do
form n balanced sets. Each balanced set is bound to a unique
not provide useful information for classification but increase
classifier as the training material.
computational costs. They called this property multimodal. To
The reasons for assigning different balanced subsets to each
achieve the most compact and accurate ensemble structure, a
base classifier are as follows:
dual evolutionary bagging framework was proposed. However,
1) encourage the base classifier to learn the patterns of differ-
in such a framework, the number of base classifiers in a compact
ent substructures in the original dataset and then improve
ensemble still needs to be determined manually in advance.
the classifier diversity in the ensemble;
What we want is an algorithm that automatically “compacts”
2) avoid the trained model overfitting to the minority class
the ensemble structure without introducing human experience.
caused by a unified balanced training set.
The crossover operation proposed in this paper can achieve
optimization at the level of ensemble architecture. Classifiers
that lead to low performance and excess will eventually be C. Classifier Evolution Based on EA
discarded in the course of evolution. Ultimately, the number of Classifier evolutionary based on EA includes the following:

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on April 17,2024 at 07:47:31 UTC from IEEE Xplore. Restrictions apply.
1398 IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, VOL. 7, NO. 5, OCTOBER 2023

Parent classifier 1 Offspring classifier 1 classifier 2 to form offspring classifier 1. The red blocks from
\IFVP
</ parent classifier 2 are combined with the green blocks from
ier parent classifier 1 to form offspring classifier 2. If we chain
Nel parent classifiers 1 and 2 together to form a “big network”, the
offspring classifier is essentially a big network with a residual
layer added. For example, parent classifier 1 is the result of
M9 adding a jump connection between the first layer and the second
L
OY
\/e.\
KK) SOK
YS 7
K)
Ki
Van)
to the last layer in a big network. This gives the crossover the
RON Oe) ability to avoid overfitting because the structure of the offspring
classifier must inherit from the big network. The evolutionary
Parent classifier 2 Offspring classifier 2
learning process of NNEAE can also be viewed as how to weave
Fig. 2. Crossover operation: The third and second layers of parent classifiers the optimal residual structure from the initial multiple base
1 and 2, respectively, are chosen as the crossover layer. The first and second classifiers.
layers of parent classifier 1 are combined with the first layer of parent classifier
2 to generate offspring classifier 1. Then, the third layer of parent classifier 1
There are two restrictions in the process of crossover
is combined with the second and third layers of parent classifier 2 to generate operation:
offspring classifier 2. 1) To avoid overfitting and long training time consumption
caused by crossover operation, the network depth of off-
spring classifiers is limited to a predetermined range.
1) base classifier initialization; 2) The offspring classifiers must be generated by the two-
2) selection operation; parent classifiers together; i.e., the offspring classifier must
3) crossover operation; include one part of the network structure from both parent
4) mutation operation; classifiers.
5) classifier environmental selection; Finally, one of the two generated offspring classifiers (whether
6) terminal condition. generated by copying the structure of parent classifiers or com-
1) Base Classifier Initialization: The design of the base clas- bining subpart structure from the two-parent classifiers) is ran-
sifier in this paper is for the NN classifier. Therefore, initial NN domly selected as the output of the crossover operation.
base classifiers are created first. The number of hidden layers 4) Mutation Operation: For the offspring classifier gener-
of each classifier and the number of neurons in each hidden ated by the crossover operation, whether to carry out the muta-
layer are assigned randomly within a predetermined range. The tion operation by probability will be decided. For the offspring
number of NN classifiers is equal to the number of balanced classifier that performs mutation operation, one of the following
sets. Each balanced set is assigned to each NN classifier as the three mutation ways will be chosen randomly to perform: (a)
training set. Fig. 1(b) illustrates the process of the training set adding one hidden layer (the number of neurons is randomly
allocation. Given that our method is an ensemble method, the generated), which is limited to a predetermined range; (b) delet-
shallow base classifier is advantageous to avoid overfitting. ing one hidden layer, and (c) modifying the number of neurons
2) Selection Operation: The selection operation randomly of one hidden layer.
selects a certain (even) number of classifiers from all classifiers Likewise, there are two restrictions in mutation operation: 1)
and stores them in a classifier set termed evolution pool (pool If the number of hidden layers in the offspring classifier is greater
for short). than or equal to the predetermined maximum number of layers,
The pool size is a key parameter of NNEAE. It represents the only (b) or (c) can be performed to prevent the classifier from
number of selected classifiers in a once-selection operation. The overfitting. 2) On the contrary, if the number of hidden layers in
impact of the pool size will be further discussed in Section IV. the classifier is less than or equal to the predetermined minimum
3) Crossover Operation: All classifiers in pool will be paired number of layers, only (a) or (c) can be performed to prevent the
randomly to form multiple paired parent classifiers, and then we classifier from underfitting.
judge whether to perform crossover operation by probability. 5) Classifier Environmental Selection: Each paired parent
For those paired parent classifiers not participating in the classifier generates one offspring classifier by crossover oper-
crossover, the structure of two-parent classifiers in each pair ation (the offspring may also mutate through mutation opera-
will be copied directly to generate two offspring classifiers. tion). Classifier evolution occurs between the three classifiers
For those paired parent classifiers who participate in the (two-parent and one-offspring classifiers).
crossover, a crossover operation occurs on each pair of partici- The classifier environmental selection includes classifier evo-
pating parent classifiers. The rule of crossover operation in our lution and classifier fusion. In the process of classifier environ-
method is similar to that of the one-point crossover operation mental selection, the offspring classifier is judged first whether
in EA. Fig. 2 shows an example of how crossover operation it replaces the two-parent classifiers (i.e., classifier fusion). If
is conducted on paired parent classifiers. Parent classifier 1 is not, the offspring classifier will be judged on whether it can
regarded as composed of yellow and green blocks in series, evolve into the parent classifier (i.e., classifier evolution). The
parent classifier 2 is regarded as composed of red and blue blocks two operations are explained as follows.
in series. Through the crossover operator, the yellow blocks of Classifier Fusion: The offspring classifier takes the union
parent classifier 1 are cascaded with the blue blocks of parent training sets of the two-parent classifiers as its training set and

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on April 17,2024 at 07:47:31 UTC from IEEE Xplore. Restrictions apply.
SUN et al.: NEURAL NETWORK ENSEMBLE WITH EVOLUTIONARY ALGORITHM FOR HIGHLY IMBALANCED CLASSIFICATION 1399

performs training. If its classification performance is better than


Algorithm 1: Classifier Evolutionary Based on EA.
that of the two-parent classifiers, then the two-parent classifiers
are replaced with their offspring classifier. Input: Initial NN classifier: X = {X1 , X2 , . . ., Xn };
Classifier Evolution: The offspring classifier uses the training Training set (i.e., Balanced set of each classifier):
set of each parent classifier as its training set and performs D = {D1 , D2 , . . ., Dn };

training. If the performance of the offspring classifier is better Testing set: T = {Tmaj Tmin };
than that of its parent classifier, then means the structure of the Size of pool: P ;
offspring classifier is better than that of its parent classifier. Minimum number of remaining classifiers: N ;
Therefore, the parent classifier is replaced with the offspring Crossover probability: μ;
classifier. During iterations, classifier evolution allows each Mutation probability: ξ.
classifier to improve. Output: A group of evolved NN classifiers.
6) Terminal Condition: In our method, there are two terminal 1: while iteration < M ax_iteration && the current
conditions of classifier evolution. number of base classifiers ≥ N do
Condition 1: The maximum number of iterations M is met; 2: Randomly select P classifiers from
Condition 2: The current number of base classifiers is less than X = {X1 , X2 , . . ., Xn } and pair them up into N
N . In this paper, N is set to the maximum value between 20% groups: {Xa , Xb }1 , . . ., {Xc , Xd }N according to the
of the total number of initial classifiers and the predetermined rule of selection operation;
minimum number of classifiers. For example, if the total num- 3: for i ← 1 to N do
ber of initial classifiers is 10 and the predetermined minimum 4: if μ < random(0, 1) then
number of classifiers is 3, then N is equal to 3. Condition 2 is 5: Generate one offspring classifier Ẋi from the
designed to ensure that a certain number of classifiers remain two-parent classifiers {Xv , Xw }i according to
when the algorithm is terminated. It is conducive to voting for the rule of crossover operation;
the ensemble by multiple evolved classifiers at the final stage. 6: end if
The algorithm is terminated when one of the two conditions 7: if ξ < random(0, 1) then
is met. The pseudocode of classifier evolution is shown in 8: Generate one offspring classifier Ẋi from the
Algorithm 1. one parent classifier according to the rule of
mutation operation;
9: end if 
D. Ensemble Strategy 10: Gi ← G-mean of the trained Ẋi on {Dv } {Dw };
Given that some offspring classifiers replace their two-parent 11: if G1 > G-mean of Xv && G1 > G-mean of Xw
classifiers through classifier fusion, after classifier evolution, the then 
number of final evolved NN classifiers for the ensemble will be 12: X ← (X − {Xv , Xw })  Ẋi ; 
less than the number of initial NN classifiers. 13: D ← (D − {Dv , Dw }) ({Dv } {Dw });
Our ensemble strategy adopts the voting method. Before the 14: else
classifier evolution begins, each classifier has one ballot. Once 15: (G2 , G3 ) ← G-mean of the trained Ẋi on {Dv }
the (parent) classifier is replaced with its offspring classifier and {Dw }, respectively;
through classifier evolution, the ballot of this parent classifier 16: end if
is transferred to its offspring classifier. If two paired parents 17: if G2 > G-mean of X v then

are replaced with their offspring classifier through classifier 18: X ← (X − {Xv }) Ẋi ;
fusion, the ballots of these two parent classifiers are transferred to 19: end if
their offspring classifier (i.e., the offspring classifier acquires all 20: if G3 > G-mean of Xw  then
ballots of its two parent classifiers). Finally, the remaining classi- 21: X ← (X − {Xw }) Ẋi ;
fiers are voting for the ensemble in accordance with their ballots. 22: end if
As shown in Fig. 3, in generation 1, five trained NN classifiers 23: end for
are initialized. The pool size is set to 4. Then, initial classifiers 24: end while
1 and 2 are paired off into group 1; initial classifiers 3 and
4 are paired off into group 2; initial classifier 5 is preserved
into the next generation. Through the crossover operation, initial classifier 3 (one of its parent classifiers). The offspring
mutation operation, and classif ier f usion, if the classifica- classifier (i.e., evolved classif ier 1) replaces initial classifier
tion performance of the offspring classifier is better than that of 3 and inherits the ballot owned by initial classifier 3. Then,
the two-parent classifiers respectively, the two-parent classifiers the offspring classifier with one ballot is preserved in the
will be fused into one classifier. For example, initial classifiers next generation together with the initial classifier 4. Along
1 and 2 (i.e., group 1) are fused into f used classif ier 1. this line of thought, in generation 2, f used classif ier 1 and
F used classif ier 1 replaces its parent classifiers and in- evolved classif ier 1 are paired off into group 3, and initial
herits their ballots. Therefore, f used classif ier 1 gener- classifiers 4 and 5 are paired off into group 4. According
ates two ballots. Similarly, in group 2, according to the to the operation of classifier fusion, f used classif ier 2 is
classif ier evolution, the offspring classifier just outperforms generated from group 3, and it inherits three ballots from

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on April 17,2024 at 07:47:31 UTC from IEEE Xplore. Restrictions apply.
1400 IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, VOL. 7, NO. 5, OCTOBER 2023

TABLE I
THE PARAMETERS SETTING IN NNEAE

Parameter Type Value Range


The number of initial NN classifiers (classi fier) Integer Number of balanced sets -
The number of hidden layers of each NN classifier Integer [3, 4] -
The number of neural of each hidden layer Integer [5, 10] -
The maximum number of hidden layers of each offspring classifier Integer 6 -
The minimum number of hidden layers of each offspring classifier Integer 2 -
The number of pools (#pool) Integer #classifier*20% #classi fier*20-30% (recommended)
Terminal condition: the maximum number of iterations Integer 30 -
Terminal
condition:
theminimum
number
of remained
classifiers
N Integer mazx{5,#classifier*20%} —-
Crossoverprobability Real 0.95 [0, 1]
Mutationprobability Real 0.30 [0, 1]

TABLE II
Ee
EE
EE
aco Generation 4
SUMMARY OF THE DATASETS USED IN EXPERIMENT A

Datasets
| #Feature #Total #Training
| #Test IR
Generation
3 ShuttleOvs4 9 1,706,123
| 853,61| 853,62
| 13.87
Glass5 9 205,9 102,4 103,5| 22.78
Yeast6 8 1447,37 725,18| 724,17
| 41.40
Generation 2
Abalone19 8 4142,32
| 2072,16
| 2073,1
| 129.50

Generation 1
In imbalanced classification, G − mean is the most com-
monly used evaluation indicator, given that it evaluates the
degree of inductive bias in terms of true positive rate and true
Fig. 3. Example of an ensemble strategy. The blue, green, and orange boxes negative rate [27]. Moreover, AU C is a reference indicator for
denote the initial classifier generated at the initialization stage, the evolved
classifier generated by the operation of classifier evolution, and the fused
classification. The experimental comparison designed in this
classifier generated by the operation of classifier fusion, respectively. The red paper is mainly based on G − mean, supplemented by AU C
circle at the top right of each box indicates its corresponding ballots. as a reference. G − mean can be calculated as follows:

TP TN
parent classifiers (i.e., f used classif ier 1 with two ballots and G − mean = · (1)
TP + FN TN + FP
evolved classif ier 1 with 1 ballot). The f used classif ier 3 is
generated group 4 and it inherits 2 ballots from parent classifiers. where, T P , F P , F N , and T N represent true positive, false
In generation 3, f used classif iers 2 and f used classif iers 3 positive, false negative, and true negative, respectively.
are fused to form f used classif ier 4, which inherits five ballots
from f used classif iers 2 and f used classif iers 3. C. Experimental Result Analysis
1) Experiment A: In Experiment A, NNEAE is compared
IV. EXPERIMENTS with DM-UCML [7], EUSBoost [36], DBSMOTE [49], CoSen-
A. Competing Methods CNN [4], LMLE-kNN [50], and UCML [6]. The training and
test sets are generated by stratified sampling. The partition ratio
To verify the performance of NNEAE in solving highly im-
on the training and test sets (#training : #test) is 1:1, which
balanced classification, two recent papers [7] and [48] are used
follows the settings in DM-UCML. The basic information of
as the baseline. Given that the two papers used different partition
imbalanced datasets and the corresponding partition ratio of each
ratios on the training and test sets, two experiments are designed
dataset are shown in Table II. The representation from columns
to compare. All datasets used in the experiments are downloaded
3–5 means (#majority, #minority).
from the KEEL website 1 .
Table III shows the classification results on four highly imbal-
anced datasets. The second row of Table III gives the imbalance
B. Experimental Settings and Indicators
ratio (IR) of each dataset which is the ratio of the number
The hardware configuration for our experiments is: Core i9- of samples in the majority class to that in the minority class.
10900 K 3.70 GHz; the software configuration is Ubuntu 18.04, NNE is the classification result of directly using NN classifiers
Python 3.7, Tensorflow-CPU-2.2.0. The classifier evolution in for the ensemble with plurality voting. NNEAE w/o ES is the
NNEAE can be parallel computing. Therefore, the classifier classification result of evolved NN classifiers for the ensemble
evolution in our experiments adopts CPU parallel computing. without ensemble strategy (i.e., weighted voting). NNEAE is the
The value types, the value settings, and the ranges of all the classification result of evolved NN classifiers for the ensemble
parameters NNEAE are reported in Table I. Additional informa- with weighted voting. From the comparison between NNE and
tion is available in the GitHub link: https://github.com/polysun/ NNEAE in three experiments (Glass5, Yeast6, and Abalone19),
NNEAE. EA brings an average of 7.38% improvement rate in G − mean
and 5.83% in AU C. This result shows the effectiveness of the
1 https://sci2s.ugr.es/keel/imbalanced.php?order=ir#sub10 classifier evolution. The performance of NNE on Shuttle0vs4 is

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on April 17,2024 at 07:47:31 UTC from IEEE Xplore. Restrictions apply.
SUN et al.: NEURAL NETWORK ENSEMBLE WITH EVOLUTIONARY ALGORITHM FOR HIGHLY IMBALANCED CLASSIFICATION 1401

TABLE III
COMPARISON RESULTS (30 RUNS) IN EXPERIMENT A ON THE GLASS5, SHUTTLE0VS4, YEAST6, AND ABALONE19 WITH THE PARTITION
RATIO EQUAL TO 1:1. BOLD VALUES CORRESPOND TO THE BEST G − M ean OR AU C (%)

Datasets ShuttleOvs4 Glass5 Yeast6 Abalone1l9


IR 13.87 22.78 41.40 129.50
Indicator G-mean AUC G-mean AUC G-mean AUC G-mean AUC
EUSBoost 89 90 86 86 70 72 64 65
DBSMOTE 86 87 82 85 69 74 63 65
Baseline CoSen-CNN 92 92 89 90 77 80 71 75
LMLE-kNN 94 95 90 91 78 80 71 76
UCML 93 93 88 91 75 77 70 73
DM-UCML 98 99 97 97 86 87 78 82

NNE 100 100 89.72 90.07 90.13 90.28 69.33 74.03


Our NNEAEw/o ES 100 100 94.44 94.64 92.38 92.41 81.82 82.34
NNEAE 100 100 96.45 96.53 92.81 92.82 82.06 82.53

TABLE IV TABLE V
AVERAGE TIME CONSUMPTION OF NNEAE (30 RUNS) WITH COMPARED SUMMARY OF THE DATASETS USED IN EXPERIMENT B
METHODS ON FOUR HIGHLY IMBALANCED DATASETS
Datasets #Feature
| #Total |#Training|#Test
| IR
ShuttleOvs4
|Glass5| Yeast6
|Abalonel9 Glass4 9 201,13 160,10 41,3| 15.47
DBSMOTE 12.70s 3.20s 7.60s 15.50s Pageblock13vs4 10 444,28 355,22 89,6| 15.86
CoSen-CNN 125.50s 32.40s | 83.10s 254.30s Abalone
19vs18 8 1433,51| 1146,40
| 287,11
| 16.40
LMLE-kNN 200.50s 46.80s | 116.40s
| 325.80s GlassO1l6vs5 9 175,9 140,7 35,2| 19.44
UCML 12.80s 3.70s 6.30s 15.80s Shuttle2vs4 9 123,6 98,4 25,2| 20.50
DM-UCML 80.40s 20.50s | 50.70s 150.30s Yeast1458vs7 8 663,30
| 530,24 | 133,6| 22.10
NNEAE 3.84s 44.45s
| 52.22s 92.61s Glass5 9 205,9 164,7 41,2| 22.78
Yeast2vs8 8 462,20
| 369,16 93,4| 23.10
Yeast1289vs7 8 917,30
| 733,24 184,6 | 30.57
Yeast5 8 1440,44| 1152,35
| 288,9| 32.73
Ecoli0137vs26 7 274,7 219,5 55,2| 39.14
already 100%. To NNEAE, there is no room for improvement Yeast6 8 1449,35
| 1159,28| 290,7
| 41.40
on this dataset. Therefore, the dataset is not considered when the
average improvement in EA is calculated. In addition, NNEAE TABLE VI
COMPARISON RESULTS (30 RUNS) IN EXPERIMENT B ON FOUR IMBALANCED
w/o ES is compared with NNEAE. The latter outperforms the DATASETS (IR BETWEEN 15 AND 20) WITH THE PARTITION RATIO EQUAL TO
former in all test cases. Hence, the ensemble strategy can effec- 4:1. BOLD VALUES CORRESPOND TO THE BEST G − M ean (%)
tively improve the algorithm performance.
The classification results of the compared methods come Dataset Glass4 |Pageblock13vs4|Abalone19vs18]GlassO16vs5
IR 15.47 15.86 16.40 19.44
from [7]. NNEAE outperforms these compared methods on SMOTE-DBN
— |59.09+40.41|
55.91+0.33 63.87+40.05
| 45.98+0.41
most datasets. Especially on Abalone19, the performance of Baseline
SMOTE-SVM-DBN|12.34+0.06]
72.11+0.0259.89+0.04
| 0.00+40.00
ADASYN-DBN_§
| 5.94+0.03
| 65.04+0.23 14.2240.14
| 12.01+0.05
NNE is much lower than that of DM-UCML, but after classifier EAS-DBN 94.86+0.03|98.37+0.02 72.3640.01
|67.85+0.01
evolution, NNEAE outperforms DM-UCML even without the NNE 93.96+0.01|69.29+0.00 63.97+40.03
|91.05+0.01
Our NNEAEw/o ES_|99.67+0.01]
98.52+0.02 88.52+0.01
| 97.76+40.02
weighted voting mechanism. This result shows that classifier NNEAE 99.63+0.01|98.69+0.02 88.66+0.01
| 98.45+0.02
evolution is effective. It significantly improves the performance
of NNE. Moreover, the performance of NNEAE has improved
again by adding a weighted voting mechanism. This proves Table VI shows the classification results on four highly im-
once again that the ensemble strategy is effective in improving balanced datasets whose IR is between 15 and 20. On Glass4,
the algorithm performance. We also conduct a significance Pageblock13vs4, and Abalone19vs18, the G − mean value of
analysis of the performance of NNE and NNEAE on Glass5, NNE is lower than that of compared methods, but NNEAE w/o
Yeast6, and Abalone19. The p-values are 5.9517e-13, 2.3934e- ES outperforms all compared methods via EA. The advantages
09 and 1.7158e-47, respectively. Accordingly, the advantage of of NNEAE are highlighted by the weighted voting mechanism.
NNEAE over NNE is statistically significant. On Glass4, the introduction of the ensemble strategy slightly
For NNEAE, its evolutionary process can be regarded as its weakens the algorithm performance. Overall, NNEAE improves
training process. We also record the evolutionary time (training the algorithm performance by 16.79% compared with NNE.
time) of NNEAE. The time consumption of NNEAE and that of Table VII shows the classification results on four highly im-
the compared methods are shown in Table IV. Although NNEAE balanced datasets whose IR is between 20 and 30. In Shuttle2vs4
is an EA-based method, its time consumption in solving highly and Yeast2vs8, NNEAE outperforms compared methods. How-
imbalanced classification is still acceptable. This is due to the ever, on Yeast1457vs8 and Glass5, the results of NNEAE are
compatibility of NNEAE with parallel computing. lower than those of EAS-DBN. The ensemble strategy has a
2) Experiment B: In Experiment B, our method is compared slight negative impact on the experimental performance of the al-
with that of [48]. The partition ratio on the training and test gorithm on Glass5 and Yeast2vs8. Nevertheless, it significantly
sets (#training : #test) is 4:1. The basic information of the improves the performance of NNE on Yeast1458vs7.
imbalanced datasets and the corresponding partition ratio of each Table VIII shows the classification results on four highly im-
dataset are shown in Table V. balanced datasets whose IR is higher than 30. The classification

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on April 17,2024 at 07:47:31 UTC from IEEE Xplore. Restrictions apply.
1402 IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, VOL. 7, NO. 5, OCTOBER 2023

TABLE VII 1.00


0.98
COMPARISON RESULTS (30 RUNS) IN EXPERIMENT B ON FOUR IMBALANCED 0.96
DATASETS (IR BETWEEN 20 AND 30) WITH THE PARTITION 0.94

RATIO EQUAL TO 4:1 0.92


0.90
= 0.88
Dataset Shuttle2vs4|Yeast1458vs7}
Glass5| Yeast2vs8 3 0.86
IR 20.50 22.10 22.78 23.10 = 0.84
SMOTE-DBN
— |42.49+0.38]27.94+40.21
|33.32+0.33]57.92+0.31 0.82
0.80
Baseline
SMOTE-SVM-DBN|
ADASYN-DBN_
0.0040.00
|
| 7.75+0.04
0.00+0.00
|
| 0.0040.00
0.00+40.00
| 0.00+0.00
|57.53+0.21
|5.19+0.03 0.78

EAS-DBN 75.84+0.00|} 81.01+0.00 |99.51+0.00/89.44+0.03 0.76 [| ——Ablaone -=-Glass4


0.74
NNE 97.0840.04| 33.97+40.04 |93.70+0.00/82.24+0.03 [ te Glass5 = Yeast4
0.72 t fiRYeast5 —®Yeast6
Our NNEAEw/o ES |99.80+0.00]
54.20+0.08|98.22+0.02/97.6540.05 0.70
NNEAE 99.80+0.01|56.62+0.05|97.74+40.05]/97.5640.06 1:1 1:2 1:3 1:4 1:5 1:6 1:7

Bold values correspond to the best G-mean (%). Spilt Ratio

TABLE VIII Fig. 4. G − mean value curves of NNEAE on five imbalanced datasets with
COMPARISON RESULTS (30 RUNS) IN EXPERIMENT B ON FOUR IMBALANCED different data partition ratios (#test : #training).
DATASETS (IR HIGHER THAN 30) WITH THE PARTITION RATIO EQUAL TO 4:1
0.850
Dataset Yeast1289vs7}Yeast5 |Ecoli0137vs26|Yeast6 0.830
IR 30.57 32.73 39.14 41.40 0.810
SMOTE-DBN 51.2040.35|77.1940.21}74.7440.19 |79.79+0.18
0.790
Baseline
SMOTE-SVM-DBN|_
0.00+0.00 |87.34+0.12}
66.2640.33 |95.30+0.00
ADASYN-DBN 0.00+0.00 /96.54+0.00}69.70+40.28|55.51+0.38 = 0.770

EAS-DBN 69.75+0.00 |85.3740.02} 99.2740.01 |71.88+0.04


é0.750
NNE 74.4840.01 |95.3740.00} 97.7040.01 |93.94+0.00 © 0.730
Our NNEAE w/o ES_
| 80.56+0.03|96.7640.01}98.13+0.00 |96.72+0.01 0.710
NNEAE 81.7740.04|96.8740.01}98.1340.00 |97.0140.01 —*—EUSBoost —=—DBSMOTE
0.690 + —#—CoSen-CNN —=- LMLE-kNN
Bold values correspond to the best G-mean (%).
0.670 —<—UCML —eDM-UCML
—#-—
NN Ensemble —*—NNEAEnsemble
0.650
20 30 40 50 60 70 80 90 ~=100
110= 120
results show that NNEAE outperforms the compared methods IR

on Yeast1289vs7, Yeast5, and Yeast6. On Ecoli0137vs26, the


Fig. 5. G − mean value curves of NNEAE and compared methods on
G − mean value of NNEAE is slightly lower than that of EAS- Abalone19 while IR increases from 10:1 to 120:1.
DBN. In this experiment, the ability of the ensemble strategy to
improve the algorithm performance successfully is proved again.
Although the ensemble strategy may have a negative impact on experiments tends to have more samples in the training set
the algorithm performance, the impact is insignificant, i.e., less and fewer samples in the test set. To test the performance of
than 0.10%. Meanwhile, the ensemble strategy can improve the NNEAE further, Abalone19 is taken as an experimental dataset
algorithm performance by more than 1% and this positive impact to investigate the influence of data partition ratio on the training
is significant. and test sets. Table IX gives the results under different data
The classification results of comparison experiments in partition ratios (#training : #test changes from 1:1 to 7:1) to
Tables VI–VIII come from [48]. Experiment B shows that investigate the classification performance and time consumption
NNEAE outperforms others on 9 out of 12 highly imbalanced of NNEAE. #pool is set to 30 in this experiment.
datasets. Except for Yeast1458vs7, NNEAE outperforms or is From Table IX, when the data partition ratio tends to be
close to SOTA. more training data, the performance of NNEAE is improving.
Therefore, in a scenario with sufficient data, i.e., as much data
D. Parameter Analysis as possible are used for training, NNEAE will obtain better
1) Influence of Pool: Pool size is a key factor in our method. performance. The four other imbalanced datasets are also used
Generally, the pool size is set in accordance with a certain for testing. The experimental results are shown in Fig. 4. The
percentage of the total number of initial classifiers. The total G − mean value curves of all datasets are increasing when the
number of initial classifiers is calculated as follows: data partition ratio tends to be more training data.
 
#classif ier = #majority of training set
#minority of training set (2) E. Influence of IR
Abalone19 is taken as an experimental dataset to investigate To analyze the performance of NNEAE under different IR
the influence of pool on classification performance and time of imbalanced data, Abalone19 is selected as the experimen-
consumption. The pool size changes from 10% to 100% of tal dataset to investigate. In this experiment, #pool is set to
#classif ier. The partition ratio on the training and test sets 30. Fig. 5 gives the results under different IRs (#majority :
is 1:1. The results are shown in Table IX. With the increase #minority changes from 10:1 to 120:1) to investigate the
in #pool, the classification performance improvement is not change in G − mean value. The G − mean value curve of
obvious. Empirically, #pool is appropriate to take 20%-30% of NNEAE is always higher than that of all compared methods,
#classif ier. and it is stable when IR is increasing. Therefore, NNEAE has
2) Influence of the Amount of Training Data: The data par- superiority and robustness under different IR of imbalanced data.
tition ratio on the training and test sets in most research and Compared with the G − mean curves of other methods, that of

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on April 17,2024 at 07:47:31 UTC from IEEE Xplore. Restrictions apply.
SUN et al.: NEURAL NETWORK ENSEMBLE WITH EVOLUTIONARY ALGORITHM FOR HIGHLY IMBALANCED CLASSIFICATION 1403

TABLE IX
RESULT OF PARAMETER ANALYSIS EXPERIMENTS (30 RUNS); THE FIRST PIECE IS #POOL RELATED EXPERIMENT, AND THE LAST PIECE ONE IS
DATA PARTITION RATIO (#training : #test) RELATED EXPERIMENT

#Pool 10% 20% 30% 40% 60% 80% 100%


G— mean 81.44+0.01 82.15-£0.01 81.97+0.01 82.18+0.01 81.56-£0.01 81.99-+£0.01 81.87+0.01
AUC 83.10+0.01 82.66£0.01 82.39+0.02 82.600.02 81.95-£0.01 82.43+0.01 82.300.02
Partition Ratio 1:1 2:1 3:1 4:1 5:1 6:1 7:1
G —mean 81.96+0.01 84.32-£0.00 84.56+0.01 85.14+0.00 85.70-£0.00 86.33£0.00 87.20+0.00
AUC 82.42+0.02 85.55-£0.00 85.70+0.01 86.25+0.00 86.72-£0.00 87.27+0.00 88.020.00

Compared with the G-mean curves of the other methods, the curve of NNE is lower in most cases; however, NNEAE achieves an obvious improvement by adding the classifier evolutionary mechanism. This finding shows the effectiveness of our method for different degrees of imbalance.

Nonetheless, our method is not suitable for problems with a low IR. When the IR is too low, #classifier becomes small, which makes it difficult to guarantee both a sufficient number of classifiers and the diversity among the classifiers in the ensemble. On the basis of the experiments and analysis, we suggest applying NNEAE to imbalance problems in which the IR is greater than 10 (i.e., highly imbalanced classification).

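To make the IR sweep of this subsection concrete, the sketch below subsamples the majority class to a target #majority : #minority ratio before each run; this is our illustration of such an experimental setup, not the authors' code.

```python
import numpy as np

def subsample_to_ir(X, y, target_ir, seed=0):
    """Keep all minority samples (label 1) and draw target_ir times as many majority
    samples (label 0), giving an imbalance ratio of approximately target_ir : 1."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n_major = min(len(majority), target_ir * len(minority))
    keep = np.concatenate([minority, rng.choice(majority, size=n_major, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Sweep the IR from 10:1 to 120:1, as in Fig. 5 (synthetic placeholder data).
X = np.random.randn(5000, 8)
y = np.array([0] * 4960 + [1] * 40)
for ir in range(10, 121, 10):
    X_ir, y_ir = subsample_to_ir(X, y, ir, seed=ir)
    # ...train NNEAE and the compared methods on (X_ir, y_ir) and record the G-mean...
```
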

F. Discussion

This section discusses how the balanced-multiset construction strategy and the ensemble model improve the classification accuracy of the proposed method.

In Experiment B, the training sets of the four compared methods are constructed by oversampling the minority class of the original data set, which ensures that all compared methods can be trained on a single balanced training set. Meanwhile, the deep belief network is adopted as the classification model for all compared methods. From Tables VI–VII, it can be seen that the classification accuracy of the compared methods is extremely unstable across data sets. For example, the classification accuracy of ADASYN-DBN on Yeast5 is 96.54%, whereas that on Yeast1289vs7 is 0%. We believe this instability is caused by the randomness of constructing a single balanced training set by oversampling: the sparse distribution of the minority class makes it difficult to create a single balanced training set that is consistent with the distribution of the original minority class. This is where the advantage of constructing multiple balanced training sets becomes apparent. NNEAE not only outperforms the compared methods in 9 of the 12 instances but also obtains an average accuracy of 92.59% over the 12 instances, which shows that the proposed balanced-multiset construction helps keep the algorithm numerically stable.

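The stability argument above rests on building several balanced subsets rather than one oversampled set. A minimal sketch of such a balanced-multiset construction follows; it is our simplified reconstruction (split the majority class into #classifier chunks and pair each chunk with all minority samples), and the exact partitioning used by NNEAE may differ.

```python
import numpy as np

def balanced_multiset(X, y, seed=0):
    """Build one roughly balanced training subset per base classifier (cf. Eq. (2))."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = rng.permutation(np.flatnonzero(y == 0))
    n_sets = len(majority) // len(minority)                  # #classifier from Eq. (2)
    chunks = np.array_split(majority[: n_sets * len(minority)], n_sets)
    return [(X[np.concatenate([c, minority])],
             y[np.concatenate([c, minority])]) for c in chunks]

# Each returned (X_i, y_i) would train one base NN of the ensemble, e.g.:
# subsets = balanced_multiset(X_train, y_train)
```
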
Both UCML and DM-UCML can be regarded as ensemble learning-based methods, and their ensemble model construction is consistent with that of this paper. The classifier of UCML is the nearest-neighbor classifier; DM-UCML is an optimized version of UCML whose classifier is a deep metric learning network. From Table VII, it can be found that NNE outperforms UCML in three of the four instances and is only 0.67% worse than UCML on Abalone19, which shows that the NN is a promising base classifier for the ensemble. After introducing the evolutionary mechanism, NNEAE, as an optimized version of NNE, outperforms UCML in all instances. NNEAE also outperforms DM-UCML in three of the four instances and is only slightly inferior to DM-UCML on Glass5, by 0.55%. This indicates that the designed evolutionary mechanism can effectively optimize both the base classifier structure and the ensemble structure, leading to a significantly higher classification accuracy than that of the initial ensemble. Moreover, unlike DM-UCML, which requires manual tuning of the hyperparameters of the classifiers in the ensemble (i.e., the number of layers of the deep metric learning network), the proposed EA-based ensemble model does not require manual tuning of the parameters of the base classifiers.

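To indicate what such automatic structure optimization can look like, here is a toy sketch that evolves an MLP hidden-layer configuration with mutation and truncation selection; the encoding, operators, and stand-in fitness are illustrative assumptions and are not the operators used by NNEAE.

```python
import random

WIDTHS = (8, 16, 32, 64)          # candidate hidden-layer widths (our assumption)

def random_individual(max_layers=3):
    # Individual = variable-length list of hidden-layer widths, e.g. [32, 16].
    return [random.choice(WIDTHS) for _ in range(random.randint(1, max_layers))]

def mutate(ind):
    ind, op = list(ind), random.choice(["resize", "add", "drop"])
    if op == "resize":
        ind[random.randrange(len(ind))] = random.choice(WIDTHS)
    elif op == "add" and len(ind) < 3:
        ind.insert(random.randrange(len(ind) + 1), random.choice(WIDTHS))
    elif op == "drop" and len(ind) > 1:
        ind.pop(random.randrange(len(ind)))
    return ind

def evolve(fitness, generations=20, pop_size=10):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        pop = parents + [mutate(random.choice(parents)) for _ in range(pop_size - len(parents))]
    return max(pop, key=fitness)

# In practice, fitness(ind) would train an MLP with hidden layers tuple(ind) on one
# balanced subset and return its validation G-mean; a stand-in is used here for demonstration.
print(evolve(lambda ind: -abs(sum(ind) - 48)))
```
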
V. CONCLUSION

This paper proposes NNEAE, an EA-based evolutionary ensemble method that uses NNs as base classifiers to solve highly imbalanced classification problems. Extensive experiments have been carried out to demonstrate the promising performance and robustness of our method. Because the evolutionary processes are relatively independent, the time consumption of NNEAE, as an ensemble method, can be greatly reduced by parallel computing, which provides a good prospect for its application.

However, NNEAE still has a limited application scope, mainly because its individual encoding is restricted to the structure of the MLP. As a result, the method is only suitable for imbalanced classification problems in which the data are organized as 1D vectors. It is therefore worthwhile to develop an encoding for NNEAE that supports high-dimensional feature extractors such as convolutional and pooling layers. Furthermore, NNEAE is expected to be extended to multiclass and image classification problems.



