ABSTRACT Classification of class-imbalanced data has drawn significant interest in medical applications. Most existing methods are prone to categorizing samples into the majority class, resulting in bias and, in particular, insufficient identification of the minority class. A novel approach, the class weights random forest, is introduced to address this problem by assigning individual weights to each class instead of a single weight. Validation tests on UCI data sets demonstrate that, for imbalanced medical data, the proposed method enhances the overall performance of the classifier while producing high accuracy in identifying both the majority and minority classes.
INDEX TERMS Class imbalance, random forest, weighted voting, class weights voting.
2169-3536 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 6, 2018
M. Zhu et al.: Class Weights Random Forest Algorithm for Processing Class Imbalanced Medical Data
built by Bayesian and novel naive Bayes methods is proposed [58]. A combination method based on the Dempster–Shafer theory of evidence is built [59], and an evidence-based combining classifiers method is proposed for brain signal analysis [60]. Dempster's rule is used to combine multiple classifiers for text categorization [61], and Dempster–Shafer fusion is used to combine models of prostate cancer [62]. Fuzzy integrals and genetic algorithms are used to combine multiple neural network classifiers [63]. However, Bayesian and Dempster–Shafer methods require a priori knowledge, and the calculation of the fuzzy measure function is very complex [58]–[63]. To build the model in a more convenient way, we chose the voting method, which uses a distinctive mechanism to integrate classifiers. Furthermore, the calculation is simple and does not require an auxiliary combiner or preset function; it is not limited to decision trees with axis-parallel splits, and it is applicable to any type of classifier [64].

The most popular voting method is majority voting, on which the random forest (RF) algorithm is based. Random forest classifiers can achieve high accuracy in data classification compared with many standard classification methods [65]–[68]; RF can minimize the overall classification error rate and has the ability to handle class-imbalanced data [4], [56], [69]. However, when the imbalance rate increases (e.g., to 15%), the classification ability weakens [56], [70], [71], because each classifier carries the same weight when the classifiers are combined [70], [72]. Therefore, methods that combine classifiers with different weights have been proposed to solve this problem. In weighted voting random forest (WRF), the decision of each classifier is multiplied by a weight that reflects the individual confidence of that decision [73]–[75]. A weighted majority voting method based on class-conditional independence of the classifier outputs is proposed [76]. Endogenous voting weights are discussed for elected representatives and redistricting [64]. Power in weighted voting games with super-increasing weights is analyzed [77]. An RF method with weighted voting for the task of anomaly detection is presented [72], [78]. The internal out-of-bag (OOB) error metric is used as a tree weight in RF [79]. However, these approaches have not dramatically improved predictive ability. Each classifier still has only a single weight when the classifiers are combined [80], [81], which does not adequately distinguish between the majority class and the minority class; the class weights (CWs), which refer to the different weights per class of each classifier, are not obtained. Based on this observation, classifiers require multiple weights, and introducing the CWs of each classifier has the potential to improve the overall predictive performance [82]. Therefore, CWs should be assigned to better represent the classifier's ability to distinguish between the majority class and the minority class [21], [83]–[86].

Therefore, we propose a class weights random forest (CWsRF) algorithm based on the RF algorithm, which contains a class weights voting approach (CWsV) and trains a collection of classifiers with different weights per class to combine the output of each classifier into an ultimate prediction [87]. The different weights per class are obtained from the empirical error of the classifiers. The algorithm assigns individual weights to each class instead of a single weight and focuses on the problem of effectively identifying the minority class. It can improve recognition performance for the minority class while maintaining that for the majority class.

II. METHOD
The proposed algorithm, CWsRF, comprises three procedures, as shown in Fig. 1: building the RF model, building CWsV, and classifying votes.

FIGURE 1. The framework: CWsRF.

A. THE FRAMEWORK OF CWsRF
RF assigns the same weight to different classes of samples and combines them by majority voting, which makes the classifiers sensitive to the majority class (MAJ), so that classification performance on the minority class (MIN) decreases when RF faces imbalanced data. Therefore, the CWsV approach is designed to distinguish MAJ and MIN.

There are three procedures. 1) Building the RF model: the votes of the RF are obtained. 2) Building the CWsV method: this is the key procedure of the proposed algorithm (CWsRF). It has two steps: a) the most important step, in which the different weights per class are calculated, so that each classifier obtains two weights (a minority weight and a majority weight); b) the votes of the samples are calculated. 3) Classifying votes: the improved votes are classified using a threshold, the aggregating probability (AP).

B. THE PROCEDURES OF CWsRF
1) BUILDING RF MODEL
A traditional RF is built to obtain the votes of the classifiers. First, v_i,j,c and v_te_i,j,c, the labels assigned by the jth classifier to the ith sample of the training set and the test set, respectively, need to be first
obtained; they have one state, either MIN or MAJ, as shown in TABLE 1 and TABLE 2. Here, j indexes the classifiers, i indexes the samples, c is either MIN or MAJ, and x_i is the original label of the ith sample of the training set.

TABLE 1. v_i,j,c: The labels assigned by the jth classifier to the ith sample (training set).

TABLE 2. v_te_i,j: The labels assigned by the jth classifier to the ith sample (test set).

2) BUILDING CWsV
CWsV, the key procedure of CWsRF, is presented in two steps, as shown in Fig. 2.

FIGURE 2. Different weights per class of classifiers.

The classification capability of each classifier is often used to evaluate its weight; therefore, the classifier's prior accuracy (ACC) is used to measure the different weight per class (W ∝ ACC), and the weights are then used to calculate the votes.

(1) Calculating Different Weights per Class
Step 1 (Calculating score_i,j,c): score_i,j,c, the score of each classifier for each sample, takes one of two values, 1 or 0. The equation is given as follows:

score_i,j,c = 1 if v_i,j,c = x_i = c, else 0 (1)

Step 2 (Calculating ACC_j,c): ACC_j,c, the accuracy of the cth class in the jth classifier, captures the different accuracy of each class per classifier; thus, each classifier has two ACC_j,c values. n_MAJ is the number of MAJ samples; n_MIN is the number of MIN samples. H_MIN are the classifiers for which v_i,j,c ∈ MIN; H_MAJ are the classifiers for which v_i,j,c ∈ MAJ.

ACC_j,MIN = ( Σ_{i=1..n_MAJ} Σ_{j=1..H_MAJ} score_i,j,MAJ ) / n_MAJ (2)

ACC_j,MAJ = ( Σ_{i=1..n_MIN} Σ_{j=1..H_MIN} score_i,j,MIN ) / n_MIN (3)

Step 3 (Calculating w_j,c): w_j,c, the weights of each classifier per class, are calculated from ACC_j,c to obtain new voting results. They also take two values:

w_j,MIN = ACC_j,MIN (4)

w_j,MAJ = ACC_j,MAJ (5)

(2) Calculating Votes
The votes of the training set and the test set are obtained, and each sample gets two votes. There are two steps: calculating vtr_i,j and vte_i,j, and calculating vtrain_i,c and vtest_i,c.

vtr_i,j and vte_i,j correspond to the training set and the test set, respectively; they are the votes of the jth classifier for the ith sample after the weights w_j,c are applied. They have one state, either MIN or MAJ:

vtr_i,j,c = v_i,j,c × w_j,c (6)

vte_i,j,c = v_te_i,j,c × w_j,c (7)

vtrain_i,c and vtest_i,c correspond to the training set and the test set, respectively; they are the total votes for the ith sample in MIN and MAJ:

vtrain_i,c = Σ_{j=1..H_c} vtr_i,j,c (8)

vtest_i,c = Σ_{j=1..H_c} vte_i,j,c (9)

3) CLASSIFYING VOTES
Threshold voting is used instead of the majority voting of traditional RF. There are two steps.

(1) Calculating vtr_new_i,j,c
vtr_new_i,j,c is the vote of the ith sample under the different thresholds j; it has one state, either MIN or MAJ.

Algorithm 1 Pseudo-Code of Aggregating Probability
for i ∈ [1 . . . n]
    for j ∈ [−H . . . H]
        if vtrain_i,MIN − vtrain_i,MAJ > j
            Output: vtr_new_i,j,c ← MIN (c ∈ MIN)
        else
            Output: vtr_new_i,j,c ← MAJ (c ∈ MAJ)
    end for
end for

vtr_new_i,j,c is shown in TABLE 3.

TABLE 3. vtr_new_i,j,c: The result of the vote for the ith sample under different j.

(2) Obtaining AP

TABLE 4. Data basic information.
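Under one natural reading of Steps 1-3 and Algorithm 1, the weight and vote calculations can be sketched on a tiny made-up ensemble. The data, the classifier outputs, and the use of balanced accuracy to select the threshold (the paper selects it via AUC) are all illustrative assumptions, not the authors' code:

```python
# Sketch of the CWsV procedure on a toy ensemble (illustrative data only).
MIN, MAJ = 1, 0

# v[i][j]: label given by classifier j to training sample i; x[i]: true label.
x = [MIN, MIN, MAJ, MAJ, MAJ, MAJ]
v = [
    [MIN, MIN, MAJ],
    [MIN, MAJ, MAJ],
    [MAJ, MAJ, MAJ],
    [MAJ, MAJ, MIN],
    [MAJ, MIN, MAJ],
    [MAJ, MAJ, MAJ],
]
n, H = len(x), len(v[0])

# Step 1: score[i][j] = 1 iff classifier j labels sample i correctly (Eq. (1)).
score = [[1 if v[i][j] == x[i] else 0 for j in range(H)] for i in range(n)]

# Step 2: ACC_{j,c} read here as the accuracy of classifier j on class c.
def class_acc(j, c):
    idx = [i for i in range(n) if x[i] == c]
    return sum(score[i][j] for i in idx) / len(idx)

# Step 3: w[j][c] = ACC[j][c] (Eqs. (4)-(5)); two weights per classifier.
w = [{c: class_acc(j, c) for c in (MIN, MAJ)} for j in range(H)]

# Calculating votes (Eqs. (6)-(9)): total weighted vote per class per sample.
def totals(i):
    vmin = sum(w[j][MIN] for j in range(H) if v[i][j] == MIN)
    vmaj = sum(w[j][MAJ] for j in range(H) if v[i][j] == MAJ)
    return vmin, vmaj

# Classifying votes (Algorithm 1): sweep thresholds and keep the one (AP)
# that maximizes balanced accuracy on the training set (a stand-in for the
# paper's AUC-based selection).
def labels_at(t):
    return [MIN if totals(i)[0] - totals(i)[1] > t else MAJ for i in range(n)]

def balanced_acc(pred):
    per_class = []
    for c in (MIN, MAJ):
        idx = [i for i in range(n) if x[i] == c]
        per_class.append(sum(pred[i] == c for i in idx) / len(idx))
    return sum(per_class) / len(per_class)

thresholds = [t / 10 for t in range(-10 * H, 10 * H + 1)]
AP = max(thresholds, key=lambda t: balanced_acc(labels_at(t)))
print("AP =", AP, "prediction:", labels_at(AP))
```

Because the weighted vote totals are real-valued, the sweep uses a fractional grid in place of Algorithm 1's integer range [−H . . . H]; the chosen AP recovers both minority samples that plain majority voting would miss.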
TABLE 6. The results of AUC, F1-score, and Recall for different IRs, where IR = 25%, 20%, 15%, and 10%.
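As a reminder of how the scores reported in TABLE 6 are defined, they can be computed from first principles on a small made-up prediction set (the labels and scores below are illustrative, not the paper's data). AUC is computed via its rank interpretation: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one:

```python
# Recall, F1 and AUC computed from first principles on made-up predictions.
# y: true labels (1 = minority/positive), p: hard predictions, s: scores.
y = [1, 1, 1, 0, 0, 0, 0, 0]
p = [1, 1, 0, 0, 0, 1, 0, 0]
s = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.35]

tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi == 1)
fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi == 1)
fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi == 0)

recall = tp / (tp + fn)            # completeness: share of positives found
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# AUC as the probability that a random positive outranks a random negative.
pos = [si for yi, si in zip(y, s) if yi == 1]
neg = [si for yi, si in zip(y, s) if yi == 0]
auc = sum((sp > sn) + 0.5 * (sp == sn)
          for sp in pos for sn in neg) / (len(pos) * len(neg))

print(round(recall, 3), round(f1, 3), round(auc, 3))  # → 0.667 0.667 0.933
```

Note how the hard predictions give identical recall and F1 here, while AUC, being threshold-free, rewards the fact that most positives are scored above most negatives.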
than 20-25% [90], [91]. Therefore, the considerations above were combined to show the versatility of the algorithm, and the datasets were altered to different imbalance rates (IRs): the minority class was set to 25%, 20%, 15%, or 10% of the majority class, respectively. Moreover, the IRs and the incidences of the datasets used in this paper were matched. These findings show that the algorithm presented here has high practical significance.

B. PARAMETER SETUP
The number of runs of the algorithm (t) is set to 50: the data are randomly selected to construct different IR datasets 50 times, and the average results are taken as the final outcome. Random forest is an ensemble algorithm with good performance, so the running parameters are set in accordance with tradition: the number of classifiers (H) is set to 300 [92], [93]. The number of classes is 2, with 1 representing MIN and 0 representing MAJ. Our algorithm is implemented in C++ and Matlab. The classification accuracy, AUC, F1-score, Recall, and the accuracies on MIN and MAJ are used to analyze effectiveness, and the results of RF, WRF, and CWsRF are compared.

IV. RESULTS
A. AUC, F1-SCORE, RECALL
AUC, F1, and Recall are commonly used when the performance of a classifier must be evaluated on a dataset with a high proportion of minority instances. AUC is beneficial because it is independent of class distribution and cost. Recall is a quality measure of completeness, which intuitively reflects the proportion of positive samples that are correctly identified. The F-score (F1) is the harmonic mean of precision and recall, which can be interpreted as a weighted average of precision and recall. These measures can distinguish the performances of classifiers when processing imbalanced data [94]–[96]. The scores reach their optimum value at 1 and their worst value at 0.

As shown in TABLE 6, the AUC, F1, and Recall of CWsRF are higher than those of RF and WRF. Taking CO as an example, the AUC score of CWsRF performs increasingly better as the IR grows. When IR = 25%, RF achieves an AUC of 0.66, an F1 of 0.76, and a Recall of 0.69; WRF achieves an AUC of 0.73, an F1 of 0.81, and a Recall of 0.75; whereas CWsRF achieves an AUC of 0.83, an F1 of 0.85, and a Recall of 0.88. Although CWsRF improves on the others, it does not yet show a particular advantage. When IR = 20%, the AUC score of CWsRF is 0.17 and 0.09 higher than those of RF and WRF, the F1 score is 0.06 and 0.02 higher, and the Recall score is 0.21 and 0.13 higher. When IR = 15% and 10%, CWsRF has marked advantages: its AUC and Recall scores are approximately 0.30 higher than those of RF and WRF, and its F1 score is nearly 0.20 higher. Therefore, CWsRF has clear advantages over RF and WRF in dealing with imbalanced data.

B. ACCURACY OF THE MINORITY AND MAJORITY SAMPLES
In medical diagnostic classification, the classes of interest are often scarce; in such an unbalanced situation, the accuracy on the minority samples (ACC_MIN) and the accuracy on the majority samples (ACC_MAJ) are more important than the overall accuracy. Therefore, we observed the changes in these values, especially in ACC_MIN.

As shown in TABLE 7, when the imbalance increases, ACC_MAJ, the ability to recognize the majority class, changes little, whereas ACC_MIN, the ability to recognize the minority class, decreases. However, the ACC_MIN of CWsRF is less affected and is better than those of RF and WRF for every IR. Considering the CO samples as an example, as the IR increases, the ACC_MIN of CWsRF shows a distinct advantage. When IR = 25%, the ACC_MIN of RF and WRF are 0.68 and 0.76, respectively, whereas that of CWsRF is 0.87.

TABLE 7. ACC_MIN and ACC_MAJ at different IRs, where IR = 25%, 20%, 15%, and 10%.

FIGURE 4. Δ_CWsRF−RF (ACC_MIN improvement (%) between CWsRF and RF): Δ_CWsRF−RF = (ACC_MIN_CWsRF − ACC_MIN_RF)/ACC_MIN_RF.

FIGURE 5. Δ_CWsRF−WRF (ACC_MIN improvement (%) between CWsRF and WRF): Δ_CWsRF−WRF = (ACC_MIN_CWsRF − ACC_MIN_WRF)/ACC_MIN_WRF. (a) SPE. (b) WD. (c) MA. (d) CO. (e) OST.

When IR = 20%, the ACC_MIN
of RF and WRF are 0.59 and 0.67, respectively, whereas that of CWsRF is 0.79. When IR = 15% and 10%, the ACC_MIN of RF and WRF decrease more obviously, to approximately 0.40, whereas that of CWsRF remains near 0.70. With increasing imbalance, CWsRF is far less sensitive than RF and WRF. Therefore, CWsRF can better identify the minority class.

Fig. 4 and Fig. 5 show that all the results are positive, so CWsRF performed better than RF and WRF. Additionally, with increasing imbalance, especially for IR = 15% and 10%, Δ_CWsRF−RF and Δ_CWsRF−WRF increase; thus, CWsRF is more advantageous than RF and WRF.

C. ACC
As shown in Fig. 6, the accuracy of all the algorithms remains approximately 80%-90% as the imbalance increases, even when the performance is not sufficient to identify the data well; the imbalance itself is reflected outwardly as high accuracy. For example, when there are 100 data points, 90 of which belong to the majority class and 10 to the minority class, a classifier that assigns all 100 points to the majority class still achieves a correct rate of 90%. Hence, high accuracy does not mean good performance, and it is necessary to also consider the classification accuracies of the majority class and the minority class. An algorithm can be considered a good class-imbalanced classification algorithm if it fulfills the following conditions: accuracy without loss (or with little loss), increased AUC, and accurate classification of both the minority and majority samples. Since the accuracy remains high (greater than 80%) combined with high AUC and ACC_MIN, CWsRF achieves better performance.

FIGURE 6. Accuracy of different algorithms per IR: (a) IR = 25%, (b) IR = 20%, (c) IR = 15% and (d) IR = 10%.

V. DISCUSSION
The performance and the complexity are discussed in this section.

A. PERFORMANCE
The performance on MIN increases while the accuracy on MAJ is maintained (shown in TABLE 6, Section IV). Since the performance on MIN improves significantly, it is discussed here. (Due to limited space, the performance on MAJ is not discussed.)

According to classification theory, the distance (D) of a sample to the classification line is used to evaluate the performance of the algorithms. D is calculated from the measured point to the threshold line; larger distances lead to less misclassification.

Consider a sample P(x0, y0), whose location is determined by (X, Y), where X is the number of votes for MAJ and Y is the number of votes for MIN. The line L is Ax + By + C = 0. The sample P(x0, y0) can be characterized by its distance D from (x0, y0) to the line L. Q, the foot of the perpendicular from P to L, can be expressed as

Q = ( (B²x0 − ABy0 − AC) / (A² + B²), (A²y0 − ABx0 − BC) / (A² + B²) ) (11)

|PQ|² = ( (B²x0 − ABy0 − AC)/(A² + B²) − x0 )² + ( (A²y0 − ABx0 − BC)/(A² + B²) − y0 )² (12)

D = |PQ| = (Ax0 + By0 + C) / √(A² + B²) (13)

Generally, L divides the samples into two classes: if D > 0, the sample is classified as MIN; otherwise, it is classified as MAJ. Larger distances between the samples and L lead to less misclassification. Therefore, if D_CWsRF − D_RF > 0, CWsRF performs better than RF on the minority class. The equation is:

D_CWsRF = (−1) × vtrain_i,MAJ + (1) × vtrain_i,MIN + AP (14)

D is calculated from the training set, which is used to build the model; x0 is vtrain_i,MAJ, y0 is vtrain_i,MIN, C is AP, A is −1, and B is 1.

Substituting (8) into (14) gives:

D_CWsRF = − Σ_{j=1..H_MAJ} v_i,j,MAJ × w_j,MAJ + Σ_{j=1..H_MIN} v_i,j,MIN × w_j,MIN + AP (15)

D_RF = − Σ_{j=1..H_MAJ} v_i,j,MAJ + Σ_{j=1..H_MIN} v_i,j,MIN (16)

D_WRF = − Σ_{j=1..H_MAJ} v_i,j,MAJ × w_j + Σ_{j=1..H_MIN} v_i,j,MIN × w_j (17)

Comparing D_CWsRF and D_RF, we write:

D_CWsRF − D_RF = − Σ_{j=1..H_MAJ} v_i,j,MAJ × w_j,MAJ + Σ_{j=1..H_MIN} v_i,j,MIN × w_j,MIN + AP + Σ_{j=1..H_MAJ} v_i,j,MAJ − Σ_{j=1..H_MIN} v_i,j,MIN

Since H_c denotes the classifiers that vote for class c, v_i,j,MAJ and v_i,j,MIN can be replaced by 1, leading to:

D_CWsRF − D_RF = Σ_{j=1..H_MAJ} 1 − Σ_{j=1..H_MAJ} w_j,MAJ + Σ_{j=1..H_MIN} w_j,MIN − Σ_{j=1..H_MIN} 1 + AP (18)

Determining whether n_MIN is greater than Σ_{i=1..n_MIN} Σ_{j=1..H_MIN} score_i,j,MIN leads to:

Σ_{i=1..n_MIN} Σ_{j=1..H_MIN} score_i,j,MIN − Σ_{i=1..n_MIN} 1 = Σ_{i=1..n_MIN} ( Σ_{j=1..H_MIN} score_i,j,MIN − 1 )
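The effect of Eqs. (14)-(16) can be checked on a toy vote configuration. The vote counts, the weights, and the AP value below are illustrative assumptions (a single shared weight per class is used for brevity, rather than per-classifier weights):

```python
# Toy check of D_RF vs. D_CWsRF (Eqs. (14)-(16)) for one minority sample.
MIN, MAJ = 1, 0
votes = [MIN, MIN, MAJ, MAJ, MAJ]   # 5 classifiers, only 2 vote MIN
w_min, w_maj = 0.9, 0.5             # shared per-class weights (illustrative)
AP = 0.4                            # threshold chosen by CWsRF (illustrative)

# D_RF: unweighted vote difference (Eq. (16)); every classifier counts as 1.
d_rf = -sum(1 for v in votes if v == MAJ) + sum(1 for v in votes if v == MIN)

# D_CWsRF: class-weighted vote difference plus AP (Eqs. (14)-(15)).
vtrain_maj = sum(w_maj for v in votes if v == MAJ)
vtrain_min = sum(w_min for v in votes if v == MIN)
d_cwsrf = -vtrain_maj + vtrain_min + AP

# D > 0 classifies the sample as MIN; a larger D means a safer margin.
print(d_rf, round(d_cwsrf, 2), d_cwsrf - d_rf > 0)  # → -1 0.7 True
```

Here majority voting misclassifies the minority sample (D_RF < 0), while the class weights and the AP threshold shift the margin positive (D_CWsRF > 0), which is exactly the sign condition D_CWsRF − D_RF > 0 used in the derivation above.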
FIGURE 9. Improved distance between the different algorithms per IR on the minority class samples of CO: Δ_CWsRF−RF = D_CWsRF − D_RF, Δ_CWsRF−WRF = D_CWsRF − D_WRF, where (a), (b), (c), and (d) represent the improved distances for IR = 25%, 20%, 15%, and 10%, respectively.

B. COMPLEXITY
One approach is added in WRF compared to RF: calculating the weight of each classifier, whose cost is mainly determined by the number of classifiers and the number of samples. Its complexity is O(Hn). Therefore, the complexity of WRF is O(tHfd · log n + tHn).

Two approaches are added in CWsRF compared to RF: (1) the weights of each classifier per class, and (2) the APs based on the number of classifiers.

(1) The first approach is mainly determined by the number of classifiers, the number of samples, and the number of classes, so its complexity is O(Hnc). (2) The second approach involves a sorting process that selects the maximum AUC value and is mainly determined by the number of classifiers, so its complexity is O(H · log H).

Therefore, the complexity of CWsRF is O(t · (Hfd · log n + Hnc + H · log H)). The values n, k, and t influence the efficiency of CWsRF.

In addition, to evaluate the algorithms' complexity, the average running time on the five different datasets was calculated; the results are shown in TABLE 8. The experiments were conducted on a PC (Intel Core i7-3537U, 2.5-GHz CPU, and 4 GB of memory).

TABLE 8. The time cost of the algorithms (seconds).

TABLE 8 shows similar runtimes on the different datasets for the same algorithm. The shortest running time of CWsRF is 2.20 s (on the dataset with 962 instances and 5 attributes), while the longest is 2.41 s (on the dataset with 313 instances and 85 attributes). These results are acceptable in the practice of medical diagnosis, and large complex data will be tested in a further study.

VI. CONCLUSION
The classification of class-imbalanced data is a new research topic and represents an urgent problem to be solved. An algorithm (CWsRF) that develops class weights for processing imbalanced medical data was proposed. In this study, the empirical error is taken as the measurement from which the class weights of the classifiers are obtained. The algorithm yields performance superior to that of the other schemes, achieving very high classification accuracy, AUC, F1, and Recall.

This paper is an attempt to improve the RF ensemble learning algorithm for binary classification; the approach could be extended to ensemble learning with other algorithms and to multi-class problems.

REFERENCES
[1] Y. Zhu and J. Fang, "Logistic regression-based trichotomous classification tree and its application in medical diagnosis," Med. Decision Making, vol. 36, no. 8, pp. 973–989, 2016.
[2] S. D. Zhao, "Integrative genetic risk prediction using non-parametric empirical Bayes classification," Biometrics, vol. 73, no. 2, pp. 582–592, 2017.
[3] E. Dong, C. Li, L. Li, S. Du, A. N. Belkacem, and C. Chen, "Classification of multi-class motor imagery with a novel hierarchical SVM algorithm for brain-computer interfaces," Med. Biol. Eng. Comput., vol. 55, no. 10, pp. 1809–1818, 2017.
[4] M. Zhu, J. Xia, M. Yan, G. Cai, J. Yan, and G. Ning, "Dimensionality reduction in complex medical data: Improved self-adaptive niche genetic algorithm," Comput. Math. Methods Med., vol. 2015, Oct. 2015, Art. no. 794586, doi: 10.1155/2015/794586.
[5] M. Durgadevi and R. Kalpana, "Medical distress prediction based on classification rule discovery using ant-miner algorithm," in Proc. 11th Int. Conf. Intell. Syst. Control (ISCO), Jan. 2017, pp. 88–92.
[6] M. Chen, Y. Hao, K. Hwang, L. Wang, and L. Wang, "Disease prediction by machine learning over big data from healthcare communities," IEEE Access, vol. 5, no. 1, pp. 8869–8879, 2017.
[7] M. Chen, X. Shi, Y. Zhang, D. Wu, and M. Guizani, "Deep features learning for medical image analysis with convolutional autoencoder neural network," IEEE Trans. Big Data, 2017, doi: 10.1109/TBDATA.2017.2717439.
[8] B. A. Bak and J. L. Jensen, "High dimensional classifiers in the imbalanced case," Comput. Stat. Data Anal., vol. 98, pp. 46–59, Jun. 2016.
[9] Y. Zhang, P. Fu, W. Liu, and G. Chen, "Imbalanced data classification based on scaling kernel-based support vector machine," Neural Comput. Appl., vol. 25, nos. 3–4, pp. 927–935, 2014.
[10] C. K. Maurya, D. Toshniwal, and G. V. Venkoparao, "Online sparse class imbalance learning on big data," Neurocomputing, vol. 216, pp. 250–260, Dec. 2016.
[11] S. Al-Stouhi and C. K. Reddy, "Transfer learning for class imbalance problems with inadequate data," Knowl. Inf. Syst., vol. 48, no. 1, pp. 201–228, 2016.
[12] M. El-Banna, "Modified Mahalanobis Taguchi system for imbalance data classification," Comput. Intell. Neurosci., vol. 2017, Jul. 2017, Art. no. 5874896.
[13] B. Mirza et al., "Efficient representation learning for high-dimensional imbalance data," in Proc. IEEE Int. Conf. Digit. Signal Process. (DSP), Oct. 2016, pp. 511–515.
[14] M. M. Al-Rifaie and H. A. Alhakbani, "Handling class imbalance in direct marketing dataset using a hybrid data and algorithmic level solutions," in Proc. SAI Comput. Conf. (SAI), Jul. 2016, pp. 446–451.
[15] H. Lu, K. Yang, and J. Shi, "Constraining the water imbalance in a land data assimilation system through a recursive assimilation scheme," in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2016, pp. 2993–2996.
[16] S. Pouyanfar and S.-C. Chen, "Automatic video event detection for imbalance data using enhanced ensemble deep learning," Int. J. Semantic Comput., vol. 11, no. 1, pp. 85–109, 2017.
[17] W. Mao, J. Wang, and Z. Xue, "An ELM-based model with sparse-weighting strategy for sequential data imbalance problem," Int. J. Mach. Learn. Cybern., vol. 8, no. 4, pp. 1333–1345, 2017.
[18] J. Zhai, S. Zhang, and C. Wang, "The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers," Int. J. Mach. Learn. Cybern., vol. 8, no. 3, pp. 1009–1017, 2017.
[19] A. Amin et al., "Comparing oversampling techniques to handle the class imbalance problem: A customer churn prediction case study," IEEE Access, vol. 4, pp. 7940–7957, 2016.
[20] K. Jiang, J. Lu, and K. Xia, "A novel algorithm for imbalance data classification based on genetic algorithm improved SMOTE," Arabian J. Sci. Eng., vol. 41, no. 8, pp. 3255–3266, 2016.
[21] X. Zhang, Q. Song, G. Wang, K. Zhang, L. He, and X. Jia, "A dissimilarity-based imbalance data classification algorithm," Appl. Intell., vol. 42, no. 3, pp. 544–565, 2015.
[22] J. Wang, J. Z. Feng, and Z. Han, "Discriminative feature selection based on imbalance SVDD for fault detection of semiconductor manufacturing processes," J. Circuits, Syst. Comput., vol. 25, no. 11, p. 1650143, 2016.
[23] P. Vorraboot, S. Rasmequan, K. Chinnasarn, and C. Lursinsap, "Improving classification rate constrained to imbalanced data between overlapped and non-overlapped regions by hybrid algorithms," Neurocomputing, vol. 152, pp. 429–443, Mar. 2015.
[24] P. Du, A. Samat, B. Waske, S. Liu, and Z. Li, "Random forest and rotation forest for fully polarized SAR image classification using polarimetric and spatial features," ISPRS J. Photogramm. Remote Sens., vol. 105, pp. 38–53, Jul. 2015.
[25] M. Belgiu and L. Drǎguţ, "Random forest in remote sensing: A review of applications and future directions," ISPRS J. Photogramm. Remote Sens., vol. 114, pp. 24–31, Apr. 2016.
[26] L.-I. Tong, K.-H. Chang, P.-Y. Wu, and Y.-C. Chan, "Using dual response surface methodology as a benchmark to process multi-class imbalanced data," J. Ind. Prod. Eng., vol. 34, no. 2, pp. 147–158, 2017.
[27] H. Lee, E. Kim, and S. Kim, "Anomalous propagation echo classification of imbalanced radar data with support vector machine," Adv. Meteorol., vol. 2016, pp. 1–13, Jan. 2016.
[28] M. J. Fernández-Gómez, G. Asencio-Cortés, A. Troncoso, and F. Martínez-Álvarez, "Large earthquake magnitude prediction in Chile with imbalanced classifiers and ensemble learning," Appl. Sci., vol. 7, no. 6, p. 625, 2017.
[29] J. Jia, Z. Liu, X. Xiao, B. Liu, and K. C. Chou, "iPPBS-Opt: A sequence-based ensemble classifier for identifying protein-protein binding sites by optimizing imbalanced training datasets," Molecules, vol. 21, no. 1, p. 95, 2016.
[30] U. R. Salunkhe and S. N. Mali, "Classifier ensemble design for imbalanced data classification: A hybrid approach," Procedia Comput. Sci., vol. 85, pp. 725–732, May 2016.
[31] J.-H. Xue and P. Hall, "Why does rebalancing class-unbalanced data improve AUC for linear discriminant analysis?" IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 5, pp. 1109–1112, May 2015.
[32] M. Thakong, S. Phimoltares, S. Jaiyen, and C. Lursinsap, "Fast learning and testing for imbalanced multi-class changes in streaming data by dynamic multi-stratum network," IEEE Access, vol. 5, pp. 10633–10648, 2017.
[33] B. Wang and J. Pineau, "Online bagging and boosting for imbalanced data streams," IEEE Trans. Knowl. Data Eng., vol. 28, no. 12, pp. 3353–3366, Dec. 2016.
[34] M. N. Haque, N. Noman, R. Berretta, and P. Moscato, "Heterogeneous ensemble combination search using genetic algorithm for class imbalanced data classification," PLoS ONE, vol. 11, no. 1, p. e0146116, 2016.
[35] P. Blonda, C. Tarantino, A. D'Addabbo, G. Satalino, and G. Pasquariello, "Combination of multiple classifiers by fuzzy integrals: An application to synthetic aperture radar (SAR) data," in Proc. IEEE Int. Fuzzy Syst. Conf., vol. 3, Dec. 2001, pp. 944–947.
[36] Q. Wang et al., "A novel ensemble method for imbalanced data learning: Bagging of extrapolation-SMOTE SVM," Comput. Intell. Neurosci., vol. 2017, pp. 1–11, Jan. 2017.
[37] H. Li, Y. Wang, H. Wang, and B. Zhou, "Multi-window based ensemble learning for classification of imbalanced streaming data," World Wide Web, vol. 20, no. 6, pp. 1507–1525, 2017.
[38] S. Ali, A. Majid, S. G. Javed, and M. Sattar, "Can-CSC-GBE: Developing cost-sensitive classifier with GentleBoost ensemble for breast cancer classification using protein amino acids and imbalanced data," Comput. Biol. Med., vol. 73, pp. 38–46, Jun. 2016.
[39] J. Kevric, S. Jukic, and A. Subasi, "An effective combining classifier approach using tree algorithms for network intrusion detection," Neural Comput. Appl., vol. 28, pp. 1051–1058, Dec. 2016.
[40] H. Noçairi, C. Gomes, M. Thomas, and G. Saporta, "Improving stacking methodology for combining classifiers; applications to cosmetic industry," Electron. J. Appl. Stat. Anal., vol. 9, no. 2, pp. 340–361, 2016.
[41] T.-M. Chan, Y. Li, C.-C. Chiau, J. Zhu, J. Jiang, and Y. Huo, "Imbalanced target prediction with pattern discovery on clinical data repositories," BMC Med. Inform. Decision Making, vol. 17, p. 47, Apr. 2017.
[42] Y. Zhang, J. Ren, and J. Jiang, "Combining MLC and SVM classifiers for learning based decision making: Analysis and evaluations," Comput. Intell. Neurosci., vol. 2015, pp. 1–8, May 2015.
[43] D. Zhou, Y. Tang, and W. Jiang, "A modified belief entropy in Dempster–Shafer framework," PLoS ONE, vol. 12, no. 5, p. e0176832, 2017.
[44] X. Deng, W. Jiang, and J. Zhang, "Zero-sum matrix game with payoffs of Dempster–Shafer belief structures and its applications on sensors," Sensors, vol. 17, no. 4, p. 922, 2017.
[45] W. Chen, H. R. Pourghasemi, and Z. Zhao, "A GIS-based comparative study of Dempster–Shafer, logistic regression and artificial neural network models for landslide susceptibility mapping," Geocarto Int., vol. 32, no. 4, pp. 367–385, 2017.
[46] A. M. Al-Abadi, "The application of Dempster–Shafer theory of evidence for assessing groundwater vulnerability at Galal Badra basin, Wasit governorate, east of Iraq," Appl. Water Sci., vol. 7, no. 4, pp. 1725–1740, 2017.
[47] J. Wang, K. Qiao, Z. Zhang, and F. Xiang, "A new conflict management method in Dempster–Shafer theory," Int. J. Distrib. Sensor Netw., vol. 13, no. 3, pp. 1–11, 2017.
[48] C.-D. Zheng, Y. Zhang, and Z. Wang, "Novel stability condition of stochastic fuzzy neural networks with Markovian jumping under impulsive perturbations," Int. J. Mach. Learn. Cybern., vol. 7, no. 5, pp. 795–803, 2016.
[49] P. Chen and D. Zhang, "Constructing support vector machines ensemble classification method for imbalanced datasets based on fuzzy integral," in Modern Advances in Applied Intelligence (Lecture Notes in Computer Science), vol. 7. Berlin, Germany: Springer, 2014, pp. 70–76.
[50] R. Tang, Y. Zhu, and G. Chen, "Imbalanced data classification method based on clustering and voting mechanism," in Proc. Int. Conf. Inf., 2013, pp. 667–674.
[51] R. Hidayati, K. Kanamori, L. Feng, and H. Ohwada, "Implementing majority voting rule to classify corporate value based on environmental efforts," in Data Mining and Big Data (Lecture Notes in Computer Science), vol. 211. Berlin, Germany: Springer, 2016, pp. 59–66.
[52] M. C. Çolaka, C. Çolakb, N. Erdila, and A. K. Arslan, "Investigating optimal number of cross validation on the prediction of postoperative atrial fibrillation by voting ensemble strategy," Turkiye Klinikleri J. Biostat., vol. 8, no. 1, pp. 30–35, 2016.
[53] A. Tamvakis, G. E. Tsekouras, A. Rigos, C. Kalloniatis, C. N. Anagnostopoulos, and G. Anastassopoulos, "A methodology to carry out voting classification tasks using a particle swarm optimization-based neuro-fuzzy competitive learning network," Evolving Syst., vol. 8, no. 1, pp. 49–69, 2017.
[54] S. Abbasi, A. Shahriari, and Y. Nemati, "Retracted: A novel voting mathematical rule classification for image recognition," in Computational Science and Its Applications—ICCSA (Lecture Notes in Computer Science), vol. 8. Berlin, Germany: Springer, 2016, pp. 257–270.
[55] T. Subbulakshmi and R. R. Raja, "An ensemble approach for sentiment classification: Voting for classes and against them," ICTACT J. Soft Comput., vol. 6, no. 4, pp. 1281–1286, 2016.
[56] B. Xia, H. Jiang, H. Liu, and D. Yi, "A novel hepatocellular carcinoma image classification method based on voting ranking random forests," Comput. Math. Methods Med., vol. 2016, Apr. 2016, Art. no. 2628463.
[57] A. Linden and P. R. Yarnold, "Using classification tree analysis to generate propensity score weights," J. Eval. Clin. Pract., vol. 23, no. 4, pp. 703–712, 2017.
[58] C. De Stefano, F. Fontanella, and A. S. di Freca, "A novel naive Bayes voting strategy for combining classifiers," in Proc. Int. Conf. Frontiers Handwriting Recognit., Sep. 2012, pp. 467–472.
[59] G. Rogova, "Combining the results of several neural network classifiers," in Classic Works of the Dempster-Shafer Theory of Belief Functions, vol. 219. Manchester, U.K.: IEEE, 2008, pp. 683–692.
[60] S. R. Kheradpisheh, A. Nowzari-Dalini, R. Ebrahimpour, and M. Ganjtabesh, "An evidence-based combining classifier for brain signal analysis," PLoS ONE, vol. 9, no. 1, p. e84341, 2014.
[61] Y. Bi, D. Bell, H. Wang, G. Guo, and K. Greer, "Combining multiple classifiers using Dempster's rule of combination for text categorization," in Proc. Int. Conf. Modeling Decisions Artif. Intell., 2004, pp. 127–138.
[62] S. Chandana, H. Leung, and K. Trpkov, "Staging of prostate cancer using automatic feature selection, sampling and Dempster–Shafer fusion," Cancer Inform., vol. 2009, no. 7, pp. 57–73, Feb. 2009.
[63] Y. Li and J. Jingping, "New algorithm for combining classifiers based on fuzzy integral and genetic algorithms," Proc. SPIE, vol. 4554, pp. 176–181, Sep. 2001.
[64] J. Svec and J. Hamilton, "Endogenous voting weights for elected representatives and redistricting," Constitutional Political Economy, vol. 26, no. 4, pp. 434–441, 2015.
[65] Q. Wu, Y. Ye, Y. Liu, and M. K. Ng, "SNP selection and classification of genome-wide SNP data using stratified sampling random forests," IEEE
[85] C. Tang, C. Hou, P. Wang, and Z. Song, "Salient object detection using color spatial distribution and minimum spanning tree weight," Multimed Tools Appl., vol. 75, no. 12, pp. 6963–6978, 2016.
[86] J. Chiquet, G. Rigaill, and P. Gutierrez, "Fast tree inference with weighted fusion penalties," J. Comput. Graph. Stat., vol. 26, no. 1, pp. 205–216, 2017.
[87] Z. Xu, C. Voichita, S. Drǎghici, and R. Romero, "Z-bag: A classification ensemble system with posterior probabilistic outputs," Comput. Intell., vol. 29, no. 2, pp. 310–330, 2013.
[88] C. E. DeSantis, F. Bray, J. Ferlay, J. Lortet-Tieulent, B. O. Anderson, and A. Jemal, "International variation in female breast cancer incidence
Trans. Nanobiosci., vol. 11, no. 3, pp. 216–226, Sep. 2012. and mortality rates,’’ Cancer Epidemiol. Biomarkers Prevention, vol. 24,
no. 10, pp. 1495–1506, 2015.
[66] Y. Ye, Q. Wu, J. Z. Huang, M. K. Ng, and X. Li, ‘‘Stratified sampling for
[89] O. Johnell and J. Kanis, ‘‘Epidemiology of osteoporotic fractures,’’ Osteo-
feature subspace selection in random forests for high dimensional data,’’
porosis Int., vol. 16, pp. S3–S7, Mar. 2005.
Pattern Recognit., vol. 46, no. 3, pp. 769–787, 2013.
[90] S. Janitza, C. Strobl, and A.-L. Boulesteix, ‘‘An AUC-based permutation
[67] J. Sun, G. Zhong, J. Dong, H. Saeeda, and Q. Zhang, ‘‘Cooperative profit
variable importance measure for random forests,’’ BMC Bioinf., vol. 14,
random forests with application in ocean front recognition,’’ IEEE Access,
pp. 119–130, Apr. 2013.
vol. 5, pp. 1398–1408, 2017.
[91] A. I. Marqués, V. García, and J. S. Sánchez, ‘‘On the suitability of
[68] W. Lin, Z. Wu, L. Lin, A. Wen, and J. Li, ‘‘An ensemble random resampling techniques for the class imbalance problem in credit scoring,’’
forest algorithm for insurance big data analysis,’’ IEEE Access, vol. 5, J. Oper. Res. Soc., vol. 64, no. 13, pp. 1060–1070, 2013.
pp. 16568–16575, 2017. [92] A. Cuzzocrea, S. L. Francis, and M. M. Gaber, ‘‘An information-theoretic
[69] L. Breiman, ‘‘Random forests,’’ Mach. Learn., vol. 45, no. 1, pp. 5–32, approach for setting the optimal number of decision trees in random
2001. forests,’’ in Proc. IEEE Int. Conf. Syst., Man, Cybern., vol. 177. Oct. 2013,
[70] M. E. H. Daho, N. Settouti, M. E. A. Lazouni, and M. E. A. Chikh, pp. 1013–1019.
‘‘Weighted vote for trees aggregation in random forest,’’ in Proc. Int. Conf. [93] P. Latinne, O. Debeir, and C. Decaestecker, ‘‘Limiting the number of trees
Multimedia Comput. Syst. (ICMCS), Apr. 2014, pp. 438–443. in random forests,’’ in Multiple Classifier Systems, vol. 2013. Manchester,
[71] T. Perry and M. Bader-El-Den, ‘‘Imbalanced classification using geneti- U.K.: IEEE, 2001, pp. 178–187.
cally optimized random forests,’’ in Proc. Companion Publication Annu. [94] R. K. Shahzad, M. Fatima, N. Lavesson, and M. Boldt, ‘‘Consensus deci-
Conf. Gen. Evol. Comput., 2015, vol. 15. no. 7, pp. 1453–1454. sion making in random forests,’’ in Machine Learning, Optimization, and
[72] C. A. Ronao and S.-B. Cho, ‘‘Random forests with weighted voting for Big Data (Lecture Notes in Computer Science). Berlin, Germany: Springer
anomalous query access detection in relational databases,’’ in Artificial 2015, pp. 347–358.
Intelligence and Soft Computing (Lecture Notes in Computer Science), [95] J. Hu, ‘‘Automated detection of driver fatigue based on AdaBoost classifier
vol. 2015. New York, NY, USA: ACM, 2015, pp. 36–48. with EEG signals,’’ Frontiers Comput. Neurosci., vol. 11, no. 72, pp. 1–10,
[73] S. A. Naghibi, K. Ahmadi, and A. Daneshi, ‘‘Application of support 2017.
vector machine, random forest, and genetic algorithm optimized random [96] J. Hu, ‘‘Automated detection of driver fatigue based on AdaBoost classifier
forest models in groundwater potential mapping,’’ Water Resour. Manage., with EEG signals,’’ Frontiers Comput. Neurosci., vol. 11, no. 8, pp. 1–10,
vol. 31, no. 9, pp. 2761–2775, 2017. 2017.
[74] A. M. Youssef, H. R. Pourghasemi, Z. S. Pourtaghi, and M. M. Al-Katheeri, [97] G. Biau, ‘‘Analysis of a random forests model,’’ J. Mach. Learn. Res.,
‘‘Landslide susceptibility mapping using random forest, boosted regression vol. 13, pp. 1063–1095, Apr. 2012.
tree, classification and regression tree, and general linear models and
comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi
Arabia,’’ Landslides, vol. 13, no. 5, pp. 839–856, 2016.
[75] S. A. Naghibi, H. R. Pourghasemi, and B. Dixon, ‘‘GIS-based groundwater
potential mapping using boosted regression tree, classification and regres-
sion tree, and random forest machine learning models in iran,’’ Environ.
Monitor. Assessment, vol. 188, no. 1, p. 44, 2016. MIN ZHU received the M.S. degree from the
[76] L. I. Kuncheva and J. J. Rodríguez, ‘‘A weighted voting framework for College of Computer Science and Technology,
classifiers ensembles,’’ Knowl. Inf. Syst., vol. 38, no. 2, pp. 259–275, 2014.
Guizhou University, Guiyang, China, in 2006. She
[77] Y. Bachrach, Y. Filmus, J. Oren, and Y. Zick, ‘‘Analyzing power in is currently working toward the Ph.D. degree at
weighted voting games with super-increasing weights,’’ in Proc. Int. Symp.
Zhejiang University, Hangzhou, China. She is cur-
Algorithmic Game Theory, 2012, pp. 169–181.
rently a Senior Engineer at the Guizhou Key Lab-
[78] T. Hayes, S. Usami, R. Jacobucci, and J. J. McArdle, ‘‘Using clas-
oratory of Agricultural Bioengineering, Guizhou
sification and regression trees (CART) and random forests to analyze
University. Her research interests include data
attrition: Results from two simulations,’’ Psychol. Aging, vol. 30, no. 4,
pp. 911–929, 2015. mining, pattern recognition, and classification.
[79] H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and H. Lauer, ‘‘Random
survival forests,’’ Ann. Appl. Stat., vol. 2, no. 3, pp. 841–860, 2008.
[80] N. Tóth and B. Pataki, ‘‘Classification confidence weighted majority voting
using decision tree classifiers,’’ Int. J. Intell. Comput. Cybern., vol. 1, no. 2,
pp. 169–192, 2008.
[81] A. F. R. Rahman and M. C. Fairhurst, ‘‘Multiple classifier decision com-
bination strategies for character recognition: A review,’’ Document Anal.
Recognit., vol. 5, no. 4, pp. 166–194, 2003. JING XIA received the B.S. degree in biomedi-
[82] A. Arnaiz-González, J. F. Díez-Pastor, C. García-Osorio, and cal engineering from Zhejiang University, China,
J. J. Rodríguez, ‘‘Random feature weights for regression trees,’’ Progr. in 2013. She is currently working toward the Ph.D.
Artif. Intell., vol. 5, no. 2, pp. 91–103, 2016. degree in biomedical engineering at Zhejiang Uni-
[83] S. J. Winham, R. R. Freimuth, and J. M. Biernacka, ‘‘A weighted random versity, China. Her research interests focus on
forests approach to improve predictive performance,’’ Stat. Anal. Data intelligent medical diagnosis.
Mining, vol. 6, no. 6, pp. 496–505, 2013.
[84] M. Das and S. Bhattacharya, ‘‘A modified history based weighted average
voting with soft-dynamic threshold,’’ in Proc. Int. Conf. Adv. Comput. Eng.,
Jun. 2010, pp. 217–222.
XIAOQING JIN received the Ph.D. degree from the Department of Acupuncture, Zhejiang Chinese Medical University, China. She is currently the Head of the Department of Acupuncture, Zhejiang Hospital, China.

JING YAN received the M.S. degree from the Department of Cardiology, Zhejiang University, China. He is currently the Dean of Zhejiang Hospital, China.

GUOLONG CAI received the M.S. degree from the Department of Cardiology, Zhejiang University, China. He is currently a Physician at the Department of ICU, Zhejiang Hospital, China.

GANGMIN NING received the Dr.-Ing. degree from the Department of Biomedical Engineering, TU Ilmenau, Germany. He is currently a Professor at the Department of Biomedical Engineering, Zhejiang University, China.