Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
ARTICLE INFO

Article history:
Received 22 December 2020
Received in revised form 15 February 2021
Accepted 23 February 2021
Available online 26 February 2021

Keywords:
Imbalanced datasets
Support vector machine
Support vector domain description
Relative density
Maximum soft margin

ABSTRACT

Imbalanced problems have recently attracted much attention due to their prevalence in numerous domains of great importance to the data mining community. However, conventional bi-class classification approaches, e.g., Support vector machine (SVM), generally perform poorly on imbalanced datasets as they are originally designed to generalize from the training data, and pay little attention to the minority class. In the paper, we extend traditional support vector domain description (SVDD) and propose a novel density-sensitive SVDD classifier based on maximum soft margin (DSMSM-SVDD) for imbalanced datasets. In the proposed approach, the relative density-based penalty weights are incorporated into the optimization objective function to represent the importance of the data samples. Through optimizing the objective function with the relative density-based penalty weights, the training majority samples with high relative densities are more likely to lie inside the hypersphere, thus eliminating noise effects on traditional SVDD. In addition, to make full use of the minority class samples to refine the boundary in training, the maximum soft margin regularization term is also introduced in the proposed technique inspired by the idea of maximizing soft margin of traditional SVM. This method allows the optimal domain description boundary to more skew toward the minority class than traditional SVDD and thus improves the classification accuracy. Eventually, AdaBoost ensemble version of DSMSM-SVDD is developed so as to further improve the generalization performance and stability in dealing with imbalanced datasets. The extensive experimental results on various datasets demonstrate that the proposed approach significantly outperforms other existing algorithms when dealing with the imbalanced classification problems in terms of G-Mean, F-Measure and AUC performance measures.
© 2021 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.knosys.2021.106897
0950-7051/© 2021 Elsevier B.V. All rights reserved.
X.M. Tao, W. Chen, X.K. Li et al. Knowledge-Based Systems 219 (2021) 106897
we provide a brief introduction of cost-sensitive SVDD (CSVDD). The relative density-based penalty weights and maximum soft margin regularization term are defined in Section 4, which also includes the detailed description of the proposed DSMSM-SVDD ensemble approach. In Section 5, the experimental results of the proposed method compared to other algorithms on various datasets selected from UCI repository (the UC Irvine Machine Learning Repository) are presented. Finally, we draw some conclusions in Section 6.

2. Related work

In order to solve the imbalanced classification issues, various solutions have been reported in the literature [27–29], which can roughly be categorized as data level and algorithm level. At the data level, the solution objective is to re-balance the data distribution by re-sampling in the data space and then use the rebalanced datasets to train a conventional bi-classifier. This mainly includes under-sampling the instances of the majority class [30,31] and over-sampling the instances of the minority class [32–34], and sometimes involves the combination of the above two techniques. When combined with bi-class classification algorithms, especially SVM, the under-sampling techniques often tend to cause the classification boundary skewing toward the majority class. This is due to the fact that under-sampling techniques only extract a subset of the majority class instances to train the classifier, and consequently neglect the whole structural distribution information of the majority class. On the other hand, traditional over-sampling techniques re-balance training instances only by randomly replicating the original minority class instances. However, these methods do not add any new useful information for the minority class during learning, which can lead to model over-fitting. As an extended variant of the traditional over-sampling technique, Synthetic Minority Over-sampling Technique (SMOTE) [35,36] and its improved versions [37–41] re-balance the data distribution by creating synthetic minority class instances among randomly selected minority class instances. Although it was reported that SMOTE-like versions show better performance in some imbalanced classification cases, the scope of the classifier decision domain would be reduced by the increase of the synthetic minority class instances, which thus results in the failure of avoiding model over-fitting [42]. Besides, the generation of synthetic instances is also likely to produce additional noisy overlapping samples [42], thus reducing classification accuracy.

At the algorithm level, the solutions try to adjust the parameters of existing classifier learning algorithms or modify the existing classifier learning framework so as to bias toward the minority class, such as cost-sensitive learning and ensemble schemes. Cost-sensitive learning methods [43–45] can reduce the overall misclassification rate by assigning a large misclassification cost to minority instances and a low one to majority instances during classifier learning. For instance, Zhou et al. [46] proposed a cost-sensitive neural network (NN) by means of sampling and threshold-moving. By combining the two above techniques via hard or soft voting schemes, the hard-ensemble and soft-ensemble methods were also developed. In order to classify non-stationary and imbalanced data streams, online cost-sensitive neural network classifiers are presented using one-layer NNs [47]. In the proposed classifiers, two separate cost-sensitive strategies, a fixed and an adaptive misclassification cost matrix, are used to handle class imbalance. In addition, autoencoder neural networks based on deep learning, which have gained a huge success in the machine learning domain, have also been attempted by some researchers to address imbalanced classification problems. For instance, Zhang et al. [48] proposed a stacked denoising autoencoder neural network (CSO-SDAENN) algorithm based on cost-sensitive oversampling by combining cost-sensitive learning with a denoising autoencoder neural network (DAENN). Although some successful results are reported in the literature, it is very difficult to learn and determine the value of cost in practical applications, such that cost-sensitive learning methods are application-dependent classification algorithms and are only applied in a certain context [49]. Ensemble schemes can effectively handle the imbalanced dataset problems by the final weighted voting from all classifiers learned by the incorporation of different resampling strategies such as bagging [50], boosting [51], voting [52], and stacking [53]. Among them, the boosting ensemble strategy has been indicated by some studies to be an effective technique for improving the generalization performance of existing learning algorithms [23]. Although various learning algorithms can be integrated into a boosting scheme, ensembles using support vector machines (SVM) as a base classifier have been reported to achieve good classification performance [23]. The simplest scheme of SVM ensembles is to embed SVMs into a standard Adaboost framework, which is the most popular boosting method, proposed by Freund and Schapire [54]. In order to boost more weights on minority class instances, researchers attempted to make the boosting framework cost-sensitive by adjusting weights of instances according to not only their classification outputs of the previous classifier but also their class labels, such as AdaCost [55], CSB1 and CSB2 [56], and a series of AdaC [57]. Although cost-sensitive Adaboost schemes were reported to perform relatively well and stably, they still require users to pre-specify misclassification costs and belong to application-dependent algorithms. In addition, in order to avoid the bias of SVM toward the majority class caused by its accuracy-oriented objective when addressing imbalanced datasets, Lee et al. [58] applied cost-sensitive SVMs as weak learners of the standard Adaboost scheme. However, such an approach, which only considers replacing original accuracy-oriented SVMs with cost-oriented SVMs in the boosting scheme, is inconsistent with the Adaboost strategies based on the exponential loss function and thus achieves no significant performance improvement [23].

From the above analysis, we can find that both data-level and algorithm-level approaches usually seek to serve or improve bi-class classification algorithms. In addition to bi-class classification, one-class classification algorithms have been attempted to deal with imbalanced problems by some researchers, especially for highly skewed imbalanced datasets [59,60], since they usually only depend on the majority instances, which are easy to obtain, and require no minority instances for training the classifier. Due to its special characteristics, one-class classification has been extensively applied in numerous real-world domains including machine fault detection [61], medical diagnosis [62], image retrieval [63,64], etc. One of the most widely used one-class classification approaches, the support vector domain description method (SVDD) [24,25], tries to construct a hypersphere surrounding most of the majority class data in the feature space. A test sample is labeled as majority class if it is enclosed by this hypersphere, and minority class otherwise. When dealing with imbalanced problems, as the extended variant of SVM for one-class classification, SVDD does not require any minority class data since the construction of its hypersphere is only dependent on majority class data, which makes it applicable to imbalanced problems, especially for the extreme case with no minority instances available [65]. However, in some real-world cases, there do exist, although very few, minority class instances. In order to utilize these rare minority class instances to refine the boundary of SVDD, Lazzaretti et al. [66] further put forward a cost-sensitive SVDD (CSVDD), which incorporates the total misclassification cost into optimization objective function,
Motivated by the above ideas, we incorporate relative density-based misclassification penalty weights into the optimization objective function of CSVDD, which can effectively reflect the importance of different instances in learning the hypersphere. Concretely, the proposed approach pays more attention to the instances with higher densities by assigning high penalty weights, so that those instances tend to be included in the optimal hypersphere as much as possible. In contrast, the proposed approach assigns low penalty weights to the instances with lower densities, which makes those instances more likely to be excluded from the optimal hypersphere, thus eliminating the effect of the outliers. In this study, we utilize the exponentially weighted Parzen-window density estimation technique to calculate the relative density for each training instance. Here, assume that $X = [x_1, x_2, \ldots, x_N]$ is a given training majority set, where $N$ is the number of majority training instances. The relative density $\rho_i \ge 1$ for $x_i$ is expressed as follows:

$$\rho_i = \exp\left\{ \omega \times \frac{\mathrm{Par}(x_i)}{\varsigma} \right\}, \quad \forall i = 1, 2, \ldots, N \tag{7}$$

$$\mathrm{Par}(x_i) = \frac{1}{N} \sum_{j=1}^{N} \left( \frac{1}{\sqrt{2\pi}\, s} \right)^{D} \exp\left( -\frac{1}{2s} \lVert x_i - x_j \rVert^2 \right) \tag{8}$$

where $\varsigma = (1/N) \sum_{i=1}^{N} \mathrm{Par}(x_i)$; $D$ is the feature dimension of input data; $\omega$ is the weight factor and $s$ is the smoothing parameter of the Parzen-window density estimation technique. Note that a higher $\rho_i$ for data $x_i$ means that the region where $x_i$ is located is more compact within its class. From the above definition, we can find that the density for each majority instance is only dependent on majority training instances rather than all training instances. This is because in this study it is only used to denote the densities of majority instances relative to each other, and it is therefore called relative density here. In addition, it is worth pointing out that in dealing with imbalanced problems, the minority class instances are rare, which makes the corresponding relative densities difficult to estimate accurately. Moreover, all minority instances are regarded as more valuable than majority instances in imbalance scenarios. Therefore, considering that the penalty constant $C_2$ for the minority class is significantly larger than the penalty constant $C_1$ for the majority class, we usually set the relative densities of all minority class instances uniformly to 1.

4.2. The introduction of maximum soft margin

Recall that CSVDD can utilize rare minority class instances to improve the classification performance by incorporating the different misclassification costs into the optimization objective function. However, through the definition of the optimization objective function, we can find that the improvement of classification performance is implicitly based on an assumption that the total misclassification cost of all training minority instances is not zero. In some cases, especially highly skewed imbalanced datasets with only a few minority instances available, due to the fact that the total misclassification cost of all training minority instances is often zero, the different misclassification costs fail to adjust the classification boundary as expected. However, in numerous real-world imbalanced problems, there do exist, although rare, minority class instances. For example, in machine fault detection, in addition to a large number of measurements under normal working conditions, there may also be some valuable measurements under faulty situations. Although they are not sufficient to be used to construct a bi-classifier, they can be incorporated into the training process of SVDD to refine the hypersphere enclosing the majority data.

Inspired by the idea of traditional SVM, we introduce a maximum soft margin regularization term to the optimization objective function of CSVDD, which enables it to make full use of the rare minority data to refine the classification boundary, thus improving the generalization performance when dealing with imbalanced problems. Combining the above two improvement strategies, we present a novel SVDD which incorporates the relative density-based penalty weights and the maximum soft margin regularization term into the optimization objective function to improve the generalization performance when dealing with imbalanced problems, especially with rare minority data available. The proposed approach is described in detail below.

Given a set of training instances $\{(x_i, y_i, \rho_i)\}$, $i = 1, 2, \ldots, N + \bar{N}$, where $N$ and $\bar{N}$ are the numbers of majority and minority training instances and $\rho_i$ is the estimated relative density of $x_i$, which is calculated by the exponentially weighted Parzen-window density estimation technique described in Section 4.1. To obtain a more flexible description of the majority class, we first transform the training samples into a high dimensional feature space $F$ using a nonlinear mapping function $\phi(\cdot)$, and then compute the smallest enclosing hypersphere $S$, which is characterized by its center $a \in F$ and radius $R > 0$. The above idea can be formulated into the following optimization problem:

$$\underset{a, R, d, \xi_i, \xi_j}{\arg\min} \left\{ R^2 - M d^2 + C_1 \sum_{i:\, y_i = 1} \rho_i \xi_i + C_2 \sum_{j:\, y_j = -1} \rho_j \xi_j \right\} \tag{9}$$

$$\text{s.t.} \quad y_i \left( R^2 - \langle \phi(x_i) - a, \phi(x_i) - a \rangle \right) \ge d^2 - \xi_i \tag{10}$$

$$\xi_i \ge 0, \quad \forall i = 1, 2, \ldots, N + \bar{N} \tag{11}$$

where $d$ is the distance between the hypersphere $S$ and the closest majority or minority class training instances identified correctly; $M \ge 1$ is the regularization parameter which controls the trade-off between the volume of the hypersphere $S$ and the margin between the majority class and the minority class.

By introducing Lagrange multipliers, the above constrained optimization problem can be formulated into:

$$L(a, R, d, \xi_i, \xi_j, \alpha, \beta) = R^2 - M d^2 + C_1 \sum_{i:\, y_i = 1} \rho_i \xi_i + C_2 \sum_{j:\, y_j = -1} \rho_j \xi_j + \sum_{i=1}^{N+\bar{N}} \alpha_i \left[ d^2 - \xi_i - y_i \left( R^2 - \langle \phi(x_i) - a, \phi(x_i) - a \rangle \right) \right] - \sum_{i=1}^{N+\bar{N}} \beta_i \xi_i \tag{12}$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_{N+\bar{N}})$, $\alpha_i \ge 0$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_{N+\bar{N}})$, $\beta_i \ge 0$ are the Lagrange multiplier vectors. According to the Karush–Kuhn–Tucker (KKT) conditions, we can obtain the dual formulation of the above optimization problem (the detailed process of derivation is described in Appendix):

$$\underset{\alpha}{\arg\max} \left\{ \sum_{i=1}^{N+\bar{N}} \alpha_i y_i k(x_i, x_i) - \sum_{i,j=1}^{N+\bar{N}} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \right\} \tag{13}$$

s.t.

$$\sum_{i=1}^{N+\bar{N}} \alpha_i y_i = 1 \tag{14}$$

$$\sum_{i=1}^{N+\bar{N}} \alpha_i = M \tag{15}$$

$$\alpha_i \ge 0, \quad \forall i = 1, 2, \ldots, N + \bar{N} \tag{16}$$

$$0 \le \alpha_i \le C_1 \rho_i, \quad \forall i,\ y_i = 1,\ i = 1, 2, \ldots, N \tag{17}$$

$$0 \le \alpha_j \le C_2 \rho_j, \quad \forall j,\ y_j = -1,\ j = 1, 2, \ldots, \bar{N} \tag{18}$$
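As a concrete illustration of the penalty weights that enter the objective and bound the multipliers in Eqs. (17) and (18), the relative density of Eqs. (7) and (8) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names, the toy data and the parameter values are hypothetical, and the kernel normalization follows one reading of Eq. (8), namely a factor $(1/(\sqrt{2\pi}\,s))^D$ and an exponent $-\lVert x_i - x_j\rVert^2/(2s)$.

```python
import math


def parzen_density(points, s):
    """Parzen-window estimate Par(x_i) for every point (assumed reading of Eq. (8))."""
    n = len(points)
    dim = len(points[0])
    # Normalization factor (1 / (sqrt(2*pi) * s)) ** D.
    norm = (1.0 / (math.sqrt(2.0 * math.pi) * s)) ** dim
    densities = []
    for xi in points:
        total = 0.0
        for xj in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            total += norm * math.exp(-sq_dist / (2.0 * s))
        densities.append(total / n)
    return densities


def relative_density(points, omega, s):
    """rho_i = exp(omega * Par(x_i) / mean(Par)), as in Eq. (7)."""
    par = parzen_density(points, s)
    mean_par = sum(par) / len(par)
    return [math.exp(omega * p / mean_par) for p in par]


# Hypothetical toy majority set: three clustered points and one outlier.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
rho = relative_density(majority, omega=1.0, s=1.0)

# Upper bounds C1 * rho_i on the majority multipliers, as in Eq. (17).
c1 = 0.1  # hypothetical penalty constant
upper_bounds = [c1 * r for r in rho]
```

In this toy set the isolated point receives the smallest relative density and therefore the smallest upper bound on its multiplier, which is the mechanism by which low-density instances are more easily left outside the hypersphere.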
As shown in Eq. (13), the optimization problem is a constrained convex quadratic optimization problem, so $\alpha$ can be calculated by convex quadratic programming methods.

Let $\gamma_i^* = \alpha_i y_i$, $\forall i = 1, 2, \ldots, N + \bar{N}$, where $N$ and $\bar{N}$ are the numbers of majority and minority training instances, and we obtain the center $a$ and radius $R$ of hypersphere $S$ as follows:

$$a = \sum_{i=1}^{N+\bar{N}} \gamma_i^* \phi(x_i) \tag{19}$$

$$R^2 = \frac{1}{2} \left[ \frac{1}{N_n} \sum_{l=1}^{N_n} \left( 1 - 2 \sum_{i=1}^{N+\bar{N}} \gamma_i^* k(x_l, x_i) + \sum_{i,j=1}^{N+\bar{N}} \gamma_i^* \gamma_j^* k(x_i, x_j) \right) + \frac{1}{N_a} \sum_{m=1}^{N_a} \left( 1 - 2 \sum_{i=1}^{N+\bar{N}} \gamma_i^* k(x_m, x_i) + \sum_{i,j=1}^{N+\bar{N}} \gamma_i^* \gamma_j^* k(x_i, x_j) \right) \right] \tag{20}$$

where $N_n$ represents the number of the support vectors of the majority class; $N_a$ represents the number of the support vectors of the minority class; $x_l$ is a support vector of the majority class; $x_m$ is a support vector of the minority class; $k(x_i, x_j)$ is the Gaussian kernel function.

To determine whether a test instance $x_{new}$ is within the hypersphere $S$, we first need to calculate the distance of $x_{new}$ to the center of the hypersphere $S$. The concrete formula of the distance can be expressed as:

$$d_{new}^2 = \left\lVert \phi(x_{new}) - \sum_{i=1}^{N+\bar{N}} \gamma_i^* \phi(x_i) \right\rVert^2 = 1 + \sum_{i,j=1}^{N+\bar{N}} \gamma_i^* \gamma_j^* k(x_i, x_j) - 2 \sum_{i=1}^{N+\bar{N}} \gamma_i^* k(x_{new}, x_i) \tag{21}$$

A test instance $x_{new}$ is accepted as the target class (majority class) if its distance to the center of the hypersphere $S$ is not larger than the radius $R$ of the hypersphere $S$, that is, $d_{new}^2 \le R^2$, and is otherwise rejected as an outlier (minority class).

For the convenience of analysis, we first give the following definitions:
Decision boundary: $\lVert \phi(x_i) - a \rVert^2 = R^2$ or $\lVert \phi(x_j) - a \rVert^2 = R^2$.
Majority class boundary: $\lVert \phi(x_i) - a \rVert^2 = R^2 - d^2$.
Minority class boundary: $\lVert \phi(x_j) - a \rVert^2 = R^2 + d^2$.

According to the KKT optimality conditions, we can obtain:

The majority training instances that lie inside of the majority class boundary satisfy:

$$\alpha_i = 0 \Rightarrow \lVert \phi(x_i) - a \rVert^2 < R^2 - d^2,\ \xi_i = 0$$

The minority training instances that lie inside of the minority class boundary satisfy:

$$\alpha_j = 0 \Rightarrow \lVert \phi(x_j) - a \rVert^2 > R^2 + d^2,\ \xi_j = 0$$

The majority training instances that lie on the majority class boundary satisfy:

$$0 < \alpha_i < C_1 \rho_i \Rightarrow \lVert \phi(x_i) - a \rVert^2 = R^2 - d^2,\ \xi_i = 0$$

The minority training instances that lie on the minority class boundary satisfy:

$$0 < \alpha_j < C_2 \rho_j \Rightarrow \lVert \phi(x_j) - a \rVert^2 = R^2 + d^2,\ \xi_j = 0$$

The majority training instances that lie outside of the majority class boundary satisfy:

$$\alpha_i = C_1 \rho_i \Rightarrow \lVert \phi(x_i) - a \rVert^2 > R^2 - d^2,\ \xi_i > 0$$

The minority training instances that lie outside the minority class boundary satisfy:

$$\alpha_j = C_2 \rho_j \Rightarrow \lVert \phi(x_j) - a \rVert^2 < R^2 + d^2,\ \xi_j > 0 \tag{22}$$

We can draw the following conclusions from Eq. (22): (1) For a given majority class training instance $x_i$, only when it is located on or outside of the majority class boundary is its corresponding $\alpha_i$ nonzero. (2) For a given minority class training instance $x_j$, only when it is located on or outside of the minority class boundary is its corresponding $\alpha_j$ nonzero. (3) For the other remaining instances, their corresponding $\alpha_i$, $\alpha_j$ are zero and have no effect on the final optimal solution.

4.3. Range determination of the regularization parameter M

For a given majority class instance $x_i$, if it lies outside of the majority class boundary, i.e., $\lVert \phi(x_i) - a \rVert^2 > R^2 - d^2$, its corresponding slack variable $\xi_i > 0$ (refer to Eq. (22)). According to the KKT conditions (more details can be found in Appendix), $-\beta_i \xi_i = 0$, thus $\beta_i = 0$, and then $\alpha_i = C_1 \rho_i$ (refer to Eq. (A.5) in Appendix). In the same manner, for a given minority class instance $x_j$, if it lies outside the minority class boundary, i.e., $\lVert \phi(x_j) - a \rVert^2 < R^2 + d^2$, its corresponding $\xi_j > 0$ (refer to Eq. (22)). According to the KKT conditions, $-\beta_j \xi_j = 0$, thus $\beta_j = 0$, and then $\alpha_j = C_2 \rho_j$ (refer to Eq. (A.6) in Appendix). Subsequently, according to the constraints $\sum_{i=1}^{N+\bar{N}} \alpha_i = M$ and $\alpha_i \ge 0$, $\forall i = 1, 2, \ldots, N + \bar{N}$, we can obtain:

$$M > C_1 \sum_{i=1}^{P_{m+}} \rho_i + C_2 \sum_{j=1}^{P_{m-}} \rho_j \tag{23}$$

where $P_{m+}$ is the number of the majority class instances which lie outside of the majority class boundary, and $P_{m-}$ is the number of the minority samples which lie outside the minority class boundary.

Furthermore, $0 \le \alpha_i \le C_1 \rho_i$, $\forall i$, $y_i = 1$, $i = 1, 2, \ldots, N$, and $0 \le \alpha_j \le C_2 \rho_j$, $\forall j$, $y_j = -1$, $j = 1, 2, \ldots, \bar{N}$. In addition, for a given majority class training instance $x_i$, only when it is located on or outside of the majority class boundary is its corresponding $\alpha_i$ nonzero; for a given minority class training instance $x_j$, only when it is located on or outside of the minority class boundary is its corresponding $\alpha_j$ nonzero. We can therefore also conclude:

$$M < C_1 \sum_{i=1}^{P_{m+} + Q_{m+}} \rho_i + C_2 \sum_{j=1}^{P_{m-} + Q_{m-}} \rho_j \tag{24}$$

where $Q_{m+}$ is the number of the majority class instances which lie on the majority class boundary, and $Q_{m-}$ is the number of the minority class samples which lie on the minority class boundary. In other words, $P_{m+} + Q_{m+}$ is the total number of the majority class support vectors and $P_{m-} + Q_{m-}$ is the number of the minority class support vectors.

Combining the above two inequality constraints, we can obtain the following expression:

$$C_1 \sum_{i=1}^{P_{m+}} \rho_i + C_2 \sum_{j=1}^{P_{m-}} \rho_j < M < C_1 \sum_{i=1}^{P_{m+} + Q_{m+}} \rho_i + C_2 \sum_{j=1}^{P_{m-} + Q_{m-}} \rho_j \tag{25}$$

4.4. The necessity of introducing relative density-based penalty weights

The inequality (24) constrains the upper bound of the regularization coefficient $M$. As $P_{m+} + Q_{m+} < N$ and $P_{m-} + Q_{m-} < \bar{N}$, inequality (24) can further be simplified as:

$$M < C_1 \sum_{i=1}^{P_{m+} + Q_{m+}} \rho_i + C_2 \sum_{j=1}^{P_{m-} + Q_{m-}} \rho_j < C_1 \sum_{i=1}^{N} \rho_i + C_2 \sum_{j=1}^{\bar{N}} \rho_j \tag{26}$$
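The bounds on $M$ derived in Sections 4.3 and 4.4 can be checked numerically once the position of each training instance relative to its class boundary is known. The sketch below is only illustrative: the function name `m_bounds`, the status labels and all numbers are hypothetical, and it assumes that each instance has already been tagged as lying inside, on, or outside its class boundary.

```python
def m_bounds(rho_major, status_major, rho_minor, status_minor, c1, c2):
    """Return (lower, upper, loose) bounds on M.

    lower: weighted sum over instances outside their class boundary (Eq. (23)).
    upper: weighted sum over all support vectors, i.e. outside or on (Eq. (24)).
    loose: weighted sum over all training instances (right-hand side of Eq. (26)).
    """
    def weighted(rhos, statuses, c, keep):
        return c * sum(r for r, st in zip(rhos, statuses) if st in keep)

    lower = (weighted(rho_major, status_major, c1, {"outside"})
             + weighted(rho_minor, status_minor, c2, {"outside"}))
    upper = (weighted(rho_major, status_major, c1, {"outside", "on"})
             + weighted(rho_minor, status_minor, c2, {"outside", "on"}))
    loose = c1 * sum(rho_major) + c2 * sum(rho_minor)
    return lower, upper, loose


# Hypothetical example: four majority and two minority instances.
rho_major = [1.5, 2.0, 1.2, 3.0]
status_major = ["outside", "on", "inside", "inside"]
rho_minor = [1.0, 1.0]
status_minor = ["on", "inside"]

lower, upper, loose = m_bounds(rho_major, status_major,
                               rho_minor, status_minor, c1=0.1, c2=0.2)

# With all relative densities set to 1 the loose bound reduces to C1*N + C2*N_bar.
_, _, loose_unit = m_bounds([1.0] * 4, status_major,
                            [1.0] * 2, status_minor, c1=0.1, c2=0.2)
```

Comparing `loose` with `loose_unit` shows how relative densities of at least 1 enlarge the admissible range of $M$ for fixed penalty constants.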
Assume that DSMSM-SVDD does not consider the relative density of each instance, i.e., let $\rho_i = 1$, $\rho_j = 1$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, \bar{N}$, where $N$ and $\bar{N}$ are the numbers of majority and minority training instances. Inequality (26) can then be formulated into an alternative expression:

$$M < C_1 N + C_2 \bar{N} \tag{27}$$

The inequality constraint (27) indicates that the upper bound of $M$ entirely depends on the value of the nonnegative penalty constant factors $C = [C_1, C_2]$. As mentioned in Section 3, for handling the imbalanced classification problems, the ratio of the penalty constant factor for the minority class to the one for the majority class is empirically set so that $C_2 = (N / \bar{N})\, C_1$, such that $M < 2 C_1 N$. When $C_1$ is set to be small for good generalization performance, the value of $M$ is constrained by its upper bound to be relatively small, and thus the effect of the maximum soft margin regularization term on the improvement of the generalization performance is not significant. Accordingly, $\rho_i \ge 1$, $\rho_j \ge 1$ would increase the upper bound of $M$ and thus make the value of $M$ more flexible, which provides theoretical evidence that introducing relative density-based penalty weights is necessary. To facilitate reading, the notation and symbols used in this paper are summarized in Table 1.

4.5. The ensemble scheme of DSMSM-SVDD

Recall that an ensemble of classifiers has been proven to be an effective strategy for improving generalization performance by combining the decisions of individual classifiers into a final voting result. When dealing with imbalanced datasets, standard bi-class learning methods pay less attention to the minority instances since they are designed with the aim of maximizing overall classification accuracy. Such an aim tends to result in poor performance on the minority class due to the introduction of bias error [23]. This introduced bias error can be successfully alleviated in the AdaBoost algorithm by focusing more on misclassified instances. Specifically, given an imbalanced dataset, the instances in the minority class are often misclassified by standard classification algorithms due to the effect of the overall accuracy-oriented optimization objective. In such a case, when the AdaBoost technique is applied, those often misclassified minority class instances can be assigned more weight to increase their chance of being selected into the next training dataset. Hence, the AdaBoost technique seems to have great potential in improving the classification performance on the minority class. In fact, the weighting strategy of AdaBoost can be seen as a resampling technique combining both over-sampling and under-sampling. Therefore, in essence, it also belongs to the data-level techniques, which makes it applicable to most classification methods without any modification of them. The above-mentioned advantages of AdaBoost make it an attractive technique in dealing with imbalanced datasets.

In addition, a one-class classifier usually has poor stability since it only depends on the majority instances to train the model and rarely utilizes the minority instances to refine the classification boundary. Therefore, in this study, we develop an ensemble scheme using DSMSM-SVDD as the basic classifier so as to further improve its generalization performance and stability in dealing with imbalanced datasets. The pseudocode for the proposed DSMSM-SVDD ensemble scheme is shown in the following.

Algorithm (Ensemble Scheme Using DSMSM-SVDD as Basic Classifier).
Input: the labeled training instances $\{(x_1, y_1, \rho_1), (x_2, y_2, \rho_2), \ldots, (x_m, y_m, \rho_m)\}$, where $x_i \in \mathbb{R}^D$ is an instance with a $D$-tuple of attribute values and $y_i \in Y = \{-1, 1\}$ is a label, $m = N + \bar{N}$ is the size of the whole training dataset, as well as the optimal nonnegative majority penalty constant factor $C_1$, the Gaussian kernel width $\sigma$, the regularization coefficient $M$ and the maximum number of iterations $T$.
The procedure:
1: Initialize the weight vector: $t = 1$, $D^{(t)}(i) = 1/m$, $i = 1, 2, \ldots, m$.
2: while $t \le T$ do
3: Use $D^{(t)}(i)$ to randomly select $m$ training instances with replacement from the original training dataset, and put the selected ones in the new current training dataset $S^{tr}$.
4: Train the $t$th DSMSM-SVDD classifier $h_t \to Y$ using $S^{tr}$ and the given parameters.
5: for $i = 1$ to $m$
6: Predict the class label of $x_i$ by the above-trained base classifier $h_t$, and let the predicted class label be $\hat{y}_{i,t}$.
7: end for
8: $\epsilon_t = \sum_{i=1}^{m} D^{(t)}(i) \times I[\hat{y}_{i,t} \ne y_i]$, where $I[\cdot]$ is an indicator function: $I[\hat{y}_{i,t} \ne y_i] = 1$ if $\hat{y}_{i,t} \ne y_i$, and $I[\hat{y}_{i,t} \ne y_i] = 0$ otherwise.
9: if $\epsilon_t \ge 1/2$ then
10: $T = t - 1$ and abort loop
11: else if $\epsilon_t = 0$ then $\alpha_t = 10$ (here $\epsilon_t = 0$ means that no training instance is misclassified, and the maximum value of $\alpha_t$ is set to be 10)
12: else if $0 < \epsilon_t < 1/2$ then $\alpha_t = \min\left(10, \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)\right)$
13: end if
14: Update and normalize the instance weight vector, then set $t = t + 1$:
$$D^{(t+1)}(i) = \frac{D^{(t)}(i) \exp\left(\alpha_t \left(I[\hat{y}_{i,t} \ne y_i] - I[\hat{y}_{i,t} = y_i]\right)\right)}{Z_t}, \quad i = 1, 2, \ldots, m$$
where $Z_t$ is a normalization factor chosen so that $\sum_{i=1}^{m} D^{(t+1)}(i) = 1$.
15: end while
16: Return
Output: $h_t$ and $\alpha_t$ for all built DSMSM-SVDD classifiers.

When the ensemble scheme stops, a number of built DSMSM-SVDD classifiers are obtained. We can use them to determine the class to which a test instance $x_p$ belongs. Suppose that DSMSM-SVDD$_1$, DSMSM-SVDD$_2$, $\cdots$, DSMSM-SVDD$_L$ are the $L$ base classifiers returned by the above procedure and their corresponding predicted results are $\hat{y}_{p,t}$, $t = 1, 2, \ldots, L$. The final class label of $x_p$ is then determined by

$$H(x) = \underset{y \in \{1, -1\}}{\arg\max} \left( \sum_{t=1}^{L} \alpha_t I(\hat{y}_{p,t} = y) \right) \tag{28}$$

4.6. The computation complexity of the proposed method

Recall that the proposed DSMSM-SVDD method consists of a training process and a prediction process. In the training phase of the model, the proposed approach primarily involves the calculation of relative densities for all training majority instances and SVDD model training using all training instances. The computation complexity of calculating relative densities for all training majority instances includes calculating Euclidean distances among all training majority instances, and all relative densities, which are respectively $O(N^2)$ and $O(N)$, where $N$ denotes the number of all training majority instances. The computational complexity of training the DSMSM-SVDD model is $O(D(N + \bar{N})^2)$, where $D$ is the feature dimension of input data, and $\bar{N}$ represents
Table 1
Notation and symbols used in this paper.

| Symbols | Description |
| --- | --- |
| $(x_i, y_i)$ | The $i$th training instance and its corresponding class label |
| $D$ | The feature dimension of input data |
| $N$, $\bar{N}$ | The numbers of training majority (minority) instances |
| $S$ | The obtained hypersphere |
| $a$ | The center of the obtained hypersphere |
| $R$ | The radius of the hypersphere |
| $\xi_i$ | The slack variable for the $i$th majority instance |
| $\xi_j$ | The slack variable for the $j$th minority instance |
| $\phi(\cdot)$ | The mapping function from the original space to the kernel space |
| $C_1$ | The penalty constant for majority class |
| $C_2$ | The penalty constant for minority class |
| IR | The imbalance ratio |
| $k(\cdot)$ | The kernel function |
| $\sigma$ | Gaussian kernel width |
| $\alpha_i$ | Lagrange multiplier of the $i$th majority instance |
| $\alpha_j$ | Lagrange multiplier of the $j$th minority instance |
| $\rho_i$ | The relative density of the $i$th majority instance |
| $\rho_j$ | The relative density of the $j$th minority instance |
| $\omega$ | The weight factor |
| $s$ | The smoothing parameter |
| $d$ | The distance |
| $M$ | The regularization parameter |
| $x_l$, $x_m$ | $x_l$ is SV of majority class; $x_m$ is SV of minority class |
| $N_n$, $N_a$ | $N_n$, $N_a$ are the numbers of majority and minority class SVs |
| $P_{m+}$ | The number of the majority instances which lie outside of the majority class boundary |
| $P_{m-}$ | The number of the minority instances which lie outside the minority class boundary |
| $Q_{m+}$ | The number of the majority instances which lie on the majority class boundary |
| $Q_{m-}$ | The number of the minority samples which lie on the minority class boundary |
the number of all training minority instances. Note that the calculation of Euclidean distances and relative densities for all majority instances belongs to preprocessing so that they can be pre-implemented before training. Therefore, the overall computational complexity of the proposed approach in training phase

Table 2
Confusion matrix.

| | Predict positive class | Predict negative class |
| --- | --- | --- |
| True positive class | TP | FN |
| True negative class | FP | TN |
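Given the four counts of the confusion matrix in Table 2, the threshold-based measures used in the evaluation (sensitivity, precision, specificity, G-mean and F-measure) can be computed as in the following minimal sketch. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch rather than a convention stated in the paper.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Compute confusion-matrix measures; undefined ratios fall back to 0.0 (assumption)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # TPR
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # TNR
    g_mean = math.sqrt(sensitivity * specificity)
    denom = sensitivity + precision
    f_measure = 2.0 * sensitivity * precision / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "g_mean": g_mean,
        "f_measure": f_measure,
    }


# Hypothetical counts: 10 positive and 40 negative test instances.
metrics = imbalance_metrics(tp=8, fn=2, fp=4, tn=36)
```

Because G-mean multiplies the two per-class rates, a classifier that ignores one class entirely scores zero on it regardless of its overall accuracy, which is why it is preferred over plain accuracy on imbalanced data.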
The five performance evaluation measures are defined as follows:
Sensitivity of positive class sample (Sensitivity):
\[ \mathrm{TPR} = \mathrm{Sensitivity} = \frac{TP}{TP + FN} \]
Precision of positive class sample (Precision):
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]
Specificity of negative class sample (Specificity):
\[ \mathrm{TNR} = \mathrm{Specificity} = \frac{TN}{TN + FP} \]
Geometric mean accuracy (G-M):
\[ \mathrm{G\text{-}M} = \sqrt{\mathrm{Sensitivity} \cdot \mathrm{Specificity}} \]
F-Measure metric of positive class sample (F-M):
\[ \mathrm{F\text{-}M} = \frac{2 \times \mathrm{Sensitivity} \times \mathrm{Precision}}{\mathrm{Sensitivity} + \mathrm{Precision}} \]
The area under the Receiver Operating Characteristic curve (AUC) is another widely used evaluation metric for the performance of classifiers, especially in imbalanced dataset scenarios. It is the area under the ROC graph and is not sensitive to the distribution of the two classes. The ROC graph is obtained by plotting the True Positive Rate (Sensitivity) against the False Positive Rate (1 − Specificity). To facilitate plotting the ROC curve, we use the positive class membership probabilities output by the proposed algorithm as scores, instead of the hard outputs obtained by the sign function, as suggested for probabilistic SVM in Ref. [68]. The positive class membership probability of x_i for the proposed algorithm is calculated by the sigmoid function:
\[ \mathrm{score}_i = \frac{1}{1 + \exp\left(-\left(d_{new}^2 - R^2\right)\right)} \tag{29} \]
where d_{new}^2 is the output of the proposed algorithm, and R is the radius of the hypersphere S obtained by the proposed algorithm. Similarly, the output of the proposed DSMSM-SVDD ensemble method is transformed into the positive class membership probability as follows:
\[ \mathrm{score}_i = \frac{\sum_{t=1}^{L} \alpha_t \, I\left(\hat{y}_{p,t} = \text{positive class}\right)}{\sum_{t=1}^{L} \alpha_t} \tag{30} \]
In this study, F-Measure, G-Mean, and AUC are used as the performance measures to compare different methods.
5.2. Performance comparison on artificial datasets
5.2.1. Influence of the relative density-based penalty weights
To intuitively illustrate the influence of the relative density-based penalty weights in the proposed DSMSM-SVDD approach on the classification boundary, we carried out the following comparative experiments with CSVDD on a two-dimensional artificial dataset, which contains 200 target class training instances generated from the Gaussian distribution with mean [−1.4, −1.4] and variance [0.6, 0.6], and five noisy samples. For the convenience of comparison, we empirically set the nonnegative penalty constant factor C1 = 0.1 and the Gaussian kernel width σ = 4.5 for both CSVDD and the proposed method, the only difference being that the proposed method has relative density-based penalty weights. For the proposed DSMSM-SVDD, we set the weight factor ω = 2 and the smoothing parameter s = 10. In addition, to evaluate the effectiveness of the relative density-based penalty weights independently, the DSMSM-SVDD model used here does not contain the maximum soft margin regularization term, so M is set to 0. Classification boundaries obtained by CSVDD and DSMSM-SVDD on the artificial dataset are shown in Fig. 1. The blue circle points denote the target class training instances, and five additional noise samples are labeled in the two-dimensional feature space. The points labeled by the black circle denote support vectors, and the black solid lines represent the classification boundaries of the resulting hyperspheres optimized by the CSVDD and DSMSM-SVDD techniques, which are shown in Fig. 1(a) and (b), respectively.
As shown in Fig. 1(a), due to the sensitivity of CSVDD to noise and the effect of the low nonnegative penalty constant factor on generalization performance, several target class training instances are also rejected as outliers, eventually producing an unreasonable classification boundary. On the other hand, as shown in Fig. 1(b), DSMSM-SVDD tends to cover most of the target class training instances with relatively high densities and simultaneously reject the noisy samples with low densities. Comparing the results in Fig. 1(a) and (b), we can conclude that CSVDD does not take into account the different distributions of the training data in its optimization objective function, which results in the misclassification of several target class training instances located in the edge region and thus in under-fitting. On the contrary, in the proposed DSMSM-SVDD approach, the relative density-based penalty weights are introduced into the optimization objective function, which enables the resulting hypersphere to include the region with relatively high densities and simultaneously reject the noise with low densities, thus avoiding under-fitting and improving the generalization capacity of DSMSM-SVDD.
5.2.2. Influence of the maximum soft margin regularization term
Subsequently, to intuitively illustrate the necessity of adding the maximum soft margin regularization term to the optimization objective function of the proposed approach, we conducted the following comparative experiments with CSVDD on a two-dimensional artificial dataset containing two-class imbalanced data. The majority class is the same as the previously used dataset and is again regarded as the target class, while the minority class contains only 20 training instances generated from the Gaussian distribution with mean [1, 1] and variance [0.5, 0.5]. The imbalance ratio is about 10:1. For CSVDD and the proposed method, we conducted 5-fold stratified cross-validation to determine the best parameter configurations: preliminary experiments showed that the proposed method reaches promising performance when the values of C1 and σ are chosen from a specific scope. Therefore, the majority class penalty constant C1 and σ were selected by grid search from the sets {10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}} and {2^{-3}, 2^{-2}, 2^{-1}, 2^{0}, 2^{1}, 2^{2}, 2^{3}}, respectively. The G-Mean metric is selected as the cross-validation criterion, as it is the only criterion that considers all values in the confusion matrix and thus can provide a more reliable measure. To highlight the necessity of introducing the maximum soft margin regularization term in the following comparison experiments, for DSMSM-SVDD we set the same optimal parameters as those of CSVDD: ρi = ρj = 1, C = [C1, C2] = [0.1, 1], and σ = 1, with the exception of M = 10. The classification results of CSVDD and DSMSM-SVDD are compared in Fig. 2. The blue circle points denote the target class training instances and the black plus points denote minority class instances in the two-dimensional feature space. The points labeled by the black circle denote support vectors. The black solid lines represent the resulting classification boundaries of the hyperspheres optimized by CSVDD and DSMSM-SVDD, which are shown in Fig. 2(a) and (b), respectively.
From the results shown in Fig. 2(a) and (b), we can find that although CSVDD can correctly classify the majority class training instances and the minority class training instances, its classification boundary still lies tightly around the majority class training instances. It is mainly due to the fact that
X.M. Tao, W. Chen, X.K. Li et al. Knowledge-Based Systems 219 (2021) 106897
Fig. 1. The effect of relative density-based penalty weights on the classification results.
Fig. 2. The effect of maximum soft margin regularization term on the classification results.
misclassification cost plays no role in improving the resulting hypersphere in this case, thus producing a poor generalization performance. In contrast, the maximum soft margin regularization term allows the proposed approach to use the rare minority training instances to refine the obtained hypersphere during training, so the classification boundary of DSMSM-SVDD tends to shift toward the middle of the two classes of training instances, thus producing a better generalization performance. From the results shown in Fig. 2(a) and (b), we can conclude that when dealing with some imbalanced problems, especially with only a few minority class instances, the sum of misclassification costs equals zero, so the misclassification penalty terms play no role in adjusting the classification boundary, which produces an undesirable hypersphere. On the other hand, because the proposed approach depends not only on the penalty terms but also on the margin between the two classes of training instances to refine the classification boundary, the problem of the classification boundary skewing toward the majority class can be effectively avoided. Consequently, the generalization capability can be significantly improved even in severely imbalanced cases. Compared to CSVDD, DSMSM-SVDD seems to behave similarly to SVM due to the introduction of the maximum soft margin regularization term. However, compared to SVM, the advantage of DSMSM-SVDD is that it is not significantly affected by a small number of minority class instances, which makes it more suitable for dealing with imbalanced datasets, especially highly imbalanced ones.
5.2.3. The relationship between M and C
In order to intuitively demonstrate the relationship between the parameters M and C = [C1, C2], we separately carried out the following two experiments on the artificial datasets used in Section 5.2.2. The goal of the first experiment is to verify the effect of C on the range of M. We set C = [0.1, 1] and observe how the position of the resulting classification boundary and the distribution of support vectors change as M is gradually increased from 1 to 1000. The other parameters of the proposed method are the same as before. In addition, to investigate the relationship between C and M independently, we let ρi = 1, ρj = 1 for all training instances in DSMSM-SVDD and only consider the influence of the maximum soft margin regularization term. In this case, according to Eq. (25), we can further obtain the following simpler expression:
\[ C_1 P_{m+} + C_2 P_{m-} < M < C_1 \left(P_{m+} + Q_{m+}\right) + C_2 \left(P_{m-} + Q_{m-}\right) \tag{31} \]
The experimental results are shown in Fig. 3.
As shown in Fig. 3, when the weight of the maximum soft margin regularization term is small, such as M = 1, the classification result of DSMSM-SVDD resembles that of CSVDD in terms of classification boundary position. This indicates that the maximum soft margin regularization term has little effect on classification performance when M is small. When M is increased from 1 to 10 or 30, the resulting classification boundary tends to shift toward the middle of the two classes of training instances and exhibits the same classification characteristics as those obtained by SVM on balanced datasets, reducing the possibilities of misclassifying majority class
Fig. 3. Classification results of DSMSM-SVDD for different M values with C fixed.
increased, such as C = [1, 10] or C = [10, 100], the resulting classification boundary is almost unmodified compared with that for C = [0.1, 1]. The corresponding G-Mean metrics under the two parameter configurations, obtained by 5-fold cross-validation, are the same as for C = [0.1, 1]. On the contrary, when M = 100, the classification boundary does not move toward the middle of the two classes until C = [1, 10]. This is because if M is relatively large and C is simultaneously small, more support vectors are required by the upper bound constraint on M, which can result in over-fitting and eventually deteriorate the generalization performance, as with C = [0.01, 0.1], M = 100 or C = [0.1, 1], M = 100. In summary, the theoretical and empirical results indicate that the adjustment range of C is influenced by the value of M. When M is set to be relatively large, C must be increased accordingly in order to reduce the number of support vectors and thus avoid over-fitting.
In order to avoid the choice of two separate parameters, we give a simple strategy to set M according to C, which usually has to be pre-specified along with the Gaussian width. To deal effectively with imbalanced datasets, we generally assign a relatively large value to M for DSMSM-SVDD so as to enhance the effect of the maximum soft margin regularization term. Therefore, we only consider the upper bound constraint of M. Since ρi > 1, ρj > 1 for all training instances, we let
\[ M = N C_1 \left(\frac{P_{m+} + Q_{m+}}{N}\right) + N \times IR \times C_1 \left(\frac{P_{m-} + Q_{m-}}{N}\right) \]
which can still satisfy the upper bound shown in Eq. (25). (P_{m+} + Q_{m+})/N and (P_{m-} + Q_{m-})/N denote the fractions of the majority class and minority class support vectors, respectively. Generally, in order to guarantee satisfactory generalization performance, we set both of them to 10%, such that M = 10% × 2 × C1 × N.
5.3. Performance comparison on real-world datasets
5.3.1. Experimental data configuration
To evaluate the proposed method, 15 imbalanced datasets available from the UCI Machine Learning Repository [69] (http://archive.ics.uci.edu/ml/datasets.html) are used in this study. More detailed information about the 15 datasets is shown in Table 3. Datasets with more than two classes were converted into bi-class datasets by means of the one-versus-others strategy, where one of the classes with a relatively small size is labeled the minority class and the remaining classes are merged into the majority class. In addition, in order to obtain imbalanced datasets with higher imbalance ratios, we select two UCI datasets with high dimensions and large sizes, Pageblock and Yeast, to generate some imbalanced datasets through different class combinations. The detailed class combinations of these generated datasets are indicated in the Category column in Table 3.
5.3.2. Influence of relative density parameters on DSMSM-SVDD
For the proposed DSMSM-SVDD, besides the M value associated with the maximum soft margin regularization term, two other parameters have significant influence on the performance of the proposed method: the smoothing parameter s and the weight factor ω, which need to be pre-specified for the relative density-based penalty weights. In the previous sections, we discussed the relation between the M value and the penalty constant C, and gave a simple strategy for setting M. In this section, in order to choose appropriate values for the two parameters, we investigate the effect of different smoothing parameters s and weight factors ω on the classification performance of the proposed approach. We separately performed two classification experiments on 5 datasets selected from Table 3: Wine, Iris, Abalone, Pima and Ecoli. The first analyzes the classification performance of the proposed DSMSM-SVDD under different smoothing parameters s ranging from 0.1 to 30. Similarly, the second analyzes the classification performance of the proposed DSMSM-SVDD when the weight factor ω is increased from 0 to 10 in steps of 1. Note that a weight factor equal to 0 means that no relative densities are introduced into the proposed DSMSM-SVDD. For each parameter setting, we conducted 5-fold stratified cross-validation to determine the best parameter configurations: the majority class penalty constant C1 and σ were selected by grid search from the sets {10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}} and {2^{-3}, 2^{-2}, 2^{-1}, 2^{0}, 2^{1}, 2^{2}, 2^{3}}, respectively. M is set to 10% × 2 × C1 × N according to the previously given setting strategy. In order to preserve the original between-class ratios as far as possible, 5-fold stratified cross-validation was used, and each experiment was repeated 3 times and the averaged metric values are reported to avoid the influence of randomness on the results. The results of the two classification experiments in terms of G-Mean, F-Measure and AUC are shown in Figs. 6 and 7, respectively.
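The parameter search described above combines two grids with the rule M = 10% × 2 × C1 × N, so that M never needs to be tuned separately. The sketch below enumerates the resulting candidate configurations. It is an illustration only: the names `C1_GRID`, `SIGMA_GRID`, `m_from_c1` and `candidate_configs` are hypothetical, and `n` stands for the quantity N in the paper's rule.

```python
from itertools import product

# Grids quoted in the text: C1 in {10^-2, ..., 10^2}, sigma in {2^-3, ..., 2^3}.
C1_GRID = [10.0 ** e for e in range(-2, 3)]
SIGMA_GRID = [2.0 ** e for e in range(-3, 4)]


def m_from_c1(c1, n, sv_fraction=0.10):
    """M = sv_fraction * 2 * C1 * N, with both support-vector fractions at 10%."""
    return sv_fraction * 2 * c1 * n


def candidate_configs(n):
    """Enumerate every (C1, sigma) pair together with the M derived from C1."""
    return [
        {"C1": c1, "sigma": sigma, "M": m_from_c1(c1, n)}
        for c1, sigma in product(C1_GRID, SIGMA_GRID)
    ]


configs = candidate_configs(n=100)  # hypothetical N
```

Each configuration would then be scored by stratified 5-fold cross-validation using G-Mean as the selection criterion, as stated in the text.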
Table 3
Description of the experimental datasets.
Dataset Attribute Minority class/Majority class Category Imbalance ratio
Wine 13 48/130 1:others 1:2.71
Iris 4 50/100 2:others 1:2.00
Abalone 8 67/259 16:6 1:3.86
Pima 8 268/500 1:0 1:1.87
Ecoli 7 52/284 ‘pp’:others 1:5.46
Libra 90 24/336 15:others 1:14.00
Vehicle 18 199/647 ‘van’:others 1:3.25
Balance 4 49/576 ‘B’:others 1:11.76
Haberman 3 81/225 2:1 1:2.78
Car 6 65/1663 ‘v-good’:others 1:25.58
Liver 6 145/200 1:2 1:1.38
Seed 7 70/140 1:others 1:2.00
Spect 22 55/212 1:2 1:3.85
Pageblock 10 560/4913 others:1 1:8.77
Pageblock1 10 329/5144 2:others 1:15.63
Yeast 8 429/1055 2:others 1:2.46
Yeast1 8 244/1240 3:others 1:5.08
Yeast2 8 163/1321 4:others 1:8.10
Yeast3 8 51/463 5:1 1:9.07
Yeast4 8 35/429 7:2 1:12.25
Yeast5 8 20/463 9:1 1:23.15
Yeast6 8 51/1433 5:others 1:28.09
Yeast7 8 44/1440 6:others 1:32.72
Yeast8 8 35/1449 7:others 1:41.40
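The one-versus-others conversion and the imbalance ratio column of Table 3 can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function names are hypothetical, and the toy label list only mirrors the class sizes of the Wine row (48 minority vs. 130 majority); the split of the 130 majority instances between two other classes is invented.

```python
from collections import Counter


def one_vs_others(labels, minority_label):
    """Relabel a multi-class problem: 1 for the chosen minority class, 0 otherwise."""
    return [1 if y == minority_label else 0 for y in labels]


def imbalance_ratio(binary_labels):
    """Return (#majority / #minority) for labels produced by one_vs_others."""
    counts = Counter(binary_labels)
    n_min, n_maj = counts[1], counts[0]
    if n_min == 0:
        raise ValueError("no minority instances found")
    return n_maj / n_min


# Toy labels: 48 instances of class "a" against 130 instances of other classes.
labels = ["a"] * 48 + ["b"] * 70 + ["c"] * 60
binary = one_vs_others(labels, minority_label="a")
ratio = imbalance_ratio(binary)  # reported in Table 3 in the form 1:ratio
```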
Fig. 6. The change of three metrics of DSMSM-SVDD with different smoothing parameters.
Fig. 7. The change of three metrics of DSMSM-SVDD with different weight factors.
From the results, we can find that the averaged G-Mean, F-Measure and AUC values on all selected UCI datasets do not change significantly as the smoothing parameter and the density weight increase, except when the density weight equals 0, where all averaged performance metrics are relatively lower than those obtained with relative densities. Since a weight factor of 0 means that no relative densities are used in the proposed DSMSM-SVDD model, the significant increase in the three metrics from 0 to 1 shows that introducing relative densities helps to improve the performance of the proposed algorithm. Moreover, the absence of significant change in performance under the other parameter settings is possibly because the smoothing parameters (magnification times) and the weight factors are both used only to magnify or reduce the absolute values of the relative densities of different samples, while their relative ratios remain unchanged. In addition, the change in absolute values can be compensated by the subsequently optimized penalty constant factors and thus has little effect on the classification performance. Therefore, according to the above results and discussion, to ensure good generalization performance for the proposed algorithm, the smoothing parameter can be set in the range from 2 to 10, and the density weight can be empirically set to 2 or 3.
5.3.3. Comparison of computational complexity with C-SVDD
In order to compare the computational complexity between the proposed DSMSM-SVDD and traditional C-SVDD, we provided
the averaged training time and the number of support vectors (SVs) in Table 4 on 15 selected UCI datasets. Since the class prediction of unknown instances by SVDD variants depends on the number of support vectors, as discussed in Section 4.6, this number can represent the computational complexity of the proposed DSMSM-SVDD in the execution phase. From the results, the averaged training time of the proposed DSMSM-SVDD is comparable to that of classical C-SVDD. However, the average number of support vectors in the proposed DSMSM-SVDD is significantly smaller than that of C-SVDD on all 15 datasets except for Wine. Therefore, we can conclude that the computational complexity of the proposed DSMSM-SVDD in the training phase is comparable to that of C-SVDD and is much lower than that of C-SVDD in the execution phase.
5.3.4. Experimental results and discussion on UCI datasets
In order to verify the effectiveness of our proposed ensemble DSMSM-SVDD approach in handling imbalanced datasets, we performed comparative experiments on all datasets listed in Table 3 with other imbalanced classification techniques: (1) SVM, (2) CSVDD, (3) random under-sampling based SVM (RUSVM), (4) random over-sampling based SVM (ROSVM), (5) SMOTE over-sampling based SVM (SMOTESVM), (6) BSMOTE over-sampling based SVM (BSMOTESVM), (7) WKSMOTE over-sampling based SVM (WKSMOTESVM) proposed in [41], (8) cost-sensitive SVM (CSVM), (9) CSO-SDAENN, (10) AdaBoost SVM ensemble (AdaSVM), (11) AdaCost SVM ensemble (AdaCSVM), and (12) SVM-based AdaBoost with weights determined by instance categorization (AdaBoostIC) presented in [58]. As a representative bi-class classifier, conventional SVM serves as the baseline method. CSVDD is the classical cost-sensitive one-class classifier and is more closely related to the proposed DSMSM-SVDD. Five data pre-processing methods, namely random under-sampling (RU), random over-sampling (RO), the synthetic minority over-sampling technique (SMOTE), borderline SMOTE (BSMOTE), and WKSMOTE, are selected to combine with an SVM algorithm. CSVM is a widely used algorithm-level imbalanced classification method based on a cost-sensitive strategy. Since the proposed ensemble DSMSM-SVDD approach belongs to ensemble learning, the benchmark methods must include ensemble-like methods. Consequently, we selected three ensemble methods: AdaBoost with SVM (AdaSVM), AdaCost with SVM (AdaCSVM), and AdaBoostIC with SVM (AdaBoostICSVM). AdaSVM serves as the baseline ensemble method. AdaCSVM is selected since it is the most useful cost-sensitive boosting variant of AdaBoost. In addition, AdaBoostICSVM is a hybrid ensemble method which uses cost-sensitive SVM based on instance categorization as the base learner. The techniques were implemented using Matlab on a PC with a 64-bit operating system, 4.00 GB RAM, and a 3.40 GHz CPU. In order to preserve the original between-class ratios as far as possible, 5-fold stratified cross-validation was used, and each experiment was repeated 3 times and the averaged metric values are reported to avoid the influence of randomness on the results. For SMOTE, the number of nearest minority neighbors (NN) to be found for each minority instance to generate synthetic instances is selected among the values (3, 5, 7, 9). Similarly, the number of nearest neighbors (k) used to identify borderline instances for BSMOTE was also selected among the values (3, 5, 7, 9). For CSVDD and CSVM, the misclassification cost values for the majority class and the minority class are set inversely proportional to the imbalance ratio. For AdaCost-SVM and AdaC2-SVM, the cost setting is the same as that for CSVM, and the only difference is that the cost values for AdaCost-SVM are normalized to lie within [0, 1]. For all ensemble versions, the maximum iteration number is set to 15 so as to avoid over-fitting the minority class due to the effect of different cost values [56]. The majority class penalty constant C1 and σ are selected by grid search from the sets {10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}} and {2^{-3}, 2^{-2}, 2^{-1}, 2^{0}, 2^{1}, 2^{2}, 2^{3}}, respectively. M is set to 10% × 2 × C1 × N according to the previously given setting strategy. The parameters for all compared methods are optimized using stratified cross-validation on the training dataset based on the G-Mean performance measure. Table 5 shows the mean and standard deviation of the experimental results for all compared techniques on 24 datasets. The best measures are highlighted in bold.
From the results shown in Table 5, we can find that conventional SVM performs poorly when dealing with imbalanced datasets, especially highly imbalanced ones such as Balance, Libra, Ecoli, Pageblock, Pageblock1, and all Yeast sub-datasets except for Yeast2 and Yeast5. In particular, for Yeast6, Yeast7 and Yeast8, with imbalance ratios greater than 28:1, the classification performance of conventional SVM deteriorates significantly due to the extreme domination of the majority class. CSVDD performs no better than conventional SVM on some imbalanced datasets with relatively small imbalance ratios, while it outperforms conventional SVM on some highly imbalanced datasets, including Pageblock1 and all Yeast sub-datasets except Yeast2 and Yeast5. This is mainly because, compared to conventional SVM, CSVDD is a one-class classifier and primarily depends on majority class instances to learn the classification boundary, which makes it not significantly sensitive to the small number of minority class instances. In addition, from the results shown in Table 5, we can also find that CSVDD fails to obtain satisfactory results in terms of AUC, which shows that the CSVDD classifier is no more robust than the SVM classifier. Unlike CSVDD, the proposed DSMSM-SVDD algorithm incorporates the relative density-based penalty weights and the maximum soft margin regularization term into the SVDD model, which enables it to effectively avoid the effect of noise and to make use of rare minority class instances to refine the resulting hypersphere, thus resulting in better performance in terms of G-Mean, F-Measure and AUC alike. Compared to other algorithms, the proposed DSMSM-SVDD ensemble method obtains the best results in at least one of the performance measures on all datasets. In addition, the performance measures for the proposed method do not vary significantly over the five-fold cross-validation and 3 iterations on all datasets except Yeast5, indicating that the proposed method has good stability. In order to evaluate the classification performance of different methods across multiple datasets, we calculate and analyze the mean rankings of the performance measures for the different methods on these datasets, instead of directly comparing the obtained performance measures, according to Ref. [70]. Fig. 8 shows the mean rankings of our proposed method and the other compared ones on 24 datasets in terms of G-Mean, F-Measure and AUC, respectively. For each dataset, the method with the best performance is assigned a rank of 1, while the worst performing method is assigned a rank of 12. From the results in Fig. 8, we can find that conventional SVM produces greater mean rankings in terms of G-Mean and F-Measure than the other techniques, indicating that the classification performance of traditional SVM deteriorates severely on imbalanced data. This is due to the fact that in the trained SVM model, the number of majority class support vectors is usually larger than that of minority class support vectors due to the domination of the majority class, so the classification boundary tends to skew toward the minority class or even over the region of the minority class, thus producing low classification accuracy on the minority class. From the results indicated in Table 5, we find that, as a representative cost-sensitive method, CSVM cannot show good performance when dealing with imbalanced datasets due to its application-dependence and the effect of improper misclassification cost values. In addition, compared to RUSVM, all oversampling methods such as ROSVM, SMOTESVM,
Table 4
The averaged training time (sec) and number of SVs (in parentheses) for DSMSM-SVDD and C-SVDD.
Datasets Wine Iris Abalone Pima Ecoli
C-SVDD 0.028(50) 0.011(35) 0.023(65) 0.126(235) 0.021(71)
DSMSM-SVDD 0.031(50) 0.012(21) 0.023(37) 0.123(110) 0.022(41)
Datasets Libra Vehicle Balance Haberman Car
C-SVDD 1.23(156) 0.567(133) 0.103(51) 0.017(37) 0.813(512)
DSMSM-SVDD 1.25(117) 0.512(67) 0.101(25) 0.017(15) 0.803(131)
Datasets Liver Seed Spect Pageblock Yeast
C-SVDD 0.123(48) 0.015(32) 0.092(56) 11.133(787) 3.357(313)
DSMSM-SVDD 0.121(16) 0.015(11) 0.093(23) 11.216(315) 3.279(115)
Fig. 8. Mean ranking of all compared imbalanced classification techniques on all tested datasets.
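The mean rankings plotted in Fig. 8 follow from ranking the methods on each dataset (rank 1 for the best score) and averaging the ranks over datasets. A minimal sketch of that calculation is given below; the function names are hypothetical, and sharing the average rank among tied methods is an assumption (the usual convention for Friedman-type rankings) that the text does not spell out. The example scores are the G-M values of three methods on Wine, Iris and Abalone taken from Table 5.

```python
def rank_descending(scores):
    """Rank scores so the highest gets rank 1; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mean_rankings(results):
    """results maps dataset -> {method: score}; returns {method: mean rank}."""
    methods = list(next(iter(results.values())))
    totals = {m: 0.0 for m in methods}
    for per_method in results.values():
        ranks = rank_descending([per_method[m] for m in methods])
        for m, r in zip(methods, ranks):
            totals[m] += r
    return {m: totals[m] / len(results) for m in methods}


g_mean = {
    "Wine": {"SVM": 0.813, "CSVDD": 0.674, "Our method": 0.873},
    "Iris": {"SVM": 0.969, "CSVDD": 0.746, "Our method": 0.975},
    "Abalone": {"SVM": 0.921, "CSVDD": 0.886, "Our method": 0.940},
}
ranking = mean_rankings(g_mean)
```

With only these three methods and three datasets, the proposed method ranks first on every dataset; the full comparison in Fig. 8 covers all methods and datasets.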
BSMOTESVM, and WKSMOTESVM obtain better results in the mean rankings of all three performance metrics. This is primarily because RU-SVM is an under-sampling technique and risks losing some informative majority instances, thus deteriorating subsequent classification performance. Compared to other methods, all ensemble variants show relatively good performance in dealing with imbalanced datasets due to the effect of the boosting scheme. This result indicates that the boosting strategy can improve the generalization performance of the SVM classifier. However, AdaSVM is accuracy-oriented and its weighting strategy may favor the majority class, since it contributes more to the overall classification accuracy, which can result in no significant improvement of the classification performance on some datasets. Although the AdaCSVM and AdaBoostICSVM methods can overcome the shortcomings of AdaSVM by adopting a cost-oriented boosting strategy, they still suffer from the inconsistency between the accuracy-oriented SVM base learner and the cost-oriented boosting scheme. Additionally, inappropriate cost-sensitive values also hinder the further improvement of their generalization performance [23]. Compared to other ensemble methods, the proposed method significantly outperforms them with regard to the mean rankings of all evaluation metrics. This is primarily because the base classifier used, DSMSM-SVDD, significantly outperforms SVM in dealing with imbalanced datasets and its classification boundary is not affected by rare minority instances. In addition, the proposed method obtains a satisfactory AUC mean ranking, which indicates that the robustness of the proposed approach is significantly improved by the boosting scheme.
In order to test whether the differences in mean rankings among the different methods are merely a matter of chance, we perform the Friedman test followed by Holm's test to verify the statistical significance of the proposed method compared to the other imbalanced classification methods with regard to the calculated mean rankings. The null hypothesis is that all compared methods perform similarly in mean rankings without significant difference. After the Friedman test, the p-values for the three performance measures are 1.0513e−26, 4.6187e−18, and 5.1331e−19, respectively, and are much smaller than the significance level of α = 0.05, which indicates that there is sufficient evidence to reject the hypothesis and thus that the compared methods do not all perform similarly. Since the null hypothesis is rejected for all performance measures, a post-hoc test is applied to make pairwise comparisons between the proposed method and the other imbalanced classification methods. Holm's test was used in this study, with the proposed method treated as the control method. Holm's test is a non-parametric equivalent of the multiple t-test that adjusts α to compensate for multiple comparisons in a step-down procedure. The null hypothesis is that the proposed method, as the control algorithm, does not perform better than the other methods. Table 6 shows all adjusted α values and the corresponding p-values for each comparison.
From the results, we can find that the null hypothesis is rejected for all pairwise comparisons at a significance level of α = 0.05 except for WKSMOTESVM regarding G-Mean, indicating that the proposed method outperforms the other compared methods with significant difference. Although the p-value of the pairwise comparison with WKSMOTESVM (0.189) is higher than 0.05, the proposed algorithm still outperforms it with weak predominance according to the above results. In addition, as shown in Table 6, CSVDD obtains higher adjusted α values and p-values in terms of G-Mean and F-Measure than SVM with the proposed method as the control method, which indicates that CSVDD can perform relatively better than SVM on the G-Mean and F-Measure evaluation metrics when dealing with imbalanced datasets, especially highly imbalanced ones. In addition, the adjusted α values and p-values obtained by all oversampling methods, including ROSVM, SMOTESVM, BSMOTESVM and WKSMOTESVM, are higher than those of RUSVM when the proposed method is used as the control method. The results demonstrate that oversampling techniques generally show superior performance in most cases compared to under-sampling methods, since the removal of majority instances may lead to the loss of some important information from the datasets, especially in cases where the dataset is small. As a classical algorithm-level imbalanced classification technique, CSVM needs to determine
Table 5
Results of different imbalanced classification methods on all used datasets using SVM.
Dataset Wine Iris Abalone
Measure G-M F-M AUC G-M F-M AUC G-M F-M AUC
SVM 0.813 ± 0.096 0.737 ± 0.119 0.884 ± 0.073 0.969 ± 0.035 0.960 ± 0.039 0.996 ± 0.006 0.921 ± 0.069 0.891 ± 0.080 0.985 ± 0.013
CSVDD 0.674 ± 0.076 0.558 ± 0.092 0.299 ± 0.061 0.746 ± 0.048 0.687 ± 0.051 0.224 ± 0.041 0.886 ± 0.039 0.741 ± 0.060 0.111 ± 0.038
DSMSM-SVDD 0.8512 ± 0.031 0.767 ± 0.035 0.905 ± 0.033 0.968 ± 0.025 0.961 ± 0.016 0.986 ± 0.033 0.928 ± 0.023 0.892 ± 0.056 0.986 ± 0.011
ROSVM 0.845 ± 0.063 0.746 ± 0.073 0.922 ± 0.056 0.966 ± 0.027 0.950 ± 0.037 0.997 ± 0.003 0.932 ± 0.041 0.863 ± 0.068 0.987 ± 0.012
SMOTESVM 0.826 ± 0.066 0.723 ± 0.080 0.902 ± 0.072 0.966 ± 0.022 0.949 ± 0.028 0.998 ± 0.002 0.942 ± 0.037 0.886 ± 0.065 0.987 ± 0.010
BSMOTESVM 0.843 ± 0.069 0.700 ± 0.110 0.905 ± 0.079 0.944 ± 0.034 0.949 ± 0.056 0.995 ± 0.005 0.932 ± 0.046 0.898 ± 0.073 0.980 ± 0.016
WKSMOTESVM 0.866 ± 0.076 0.778 ± 0.092 0.917 ± 0.061 0.976 ± 0.048 0.957 ± 0.051 0.997 ± 0.041 0.946 ± 0.039 0.871 ± 0.060 0.987 ± 0.038
RUSVM 0.805 ± 0.092 0.741 ± 0.091 0.858 ± 0.084 0.964 ± 0.040 0.913 ± 0.046 0.997 ± 0.005 0.935 ± 0.056 0.870 ± 0.046 0.980 ± 0.022
CSVM 0.847 ± 0.060 0.745 ± 0.080 0.903 ± 0.053 0.950 ± 0.036 0.923 ± 0.057 0.996 ± 0.005 0.931 ± 0.035 0.837 ± 0.073 0.983 ± 0.014
CSO-SDAENN 0.807 ± 0.025 0.725 ± 0.031 0.883 ± 0.015 0.955 ± 0.037 0.933 ± 0.028 0.992 ± 0.015 0.933 ± 0.027 0.841 ± 0.015 0.985 ± 0.028
AdaSVM 0.855 ± 0.075 0.755 ± 0.100 0.903 ± 0.070 0.968 ± 0.031 0.947 ± 0.052 0.998 ± 0.002 0.923 ± 0.041 0.852 ± 0.076 0.987 ± 0.010
AdaCSVM 0.849 ± 0.063 0.751 ± 0.090 0.913 ± 0.076 0.971 ± 0.027 0.958 ± 0.035 0.997 ± 0.003 0.935 ± 0.028 0.869 ± 0.051 0.987 ± 0.011
AdaBoostICSVM 0.863 ± 0.037 0.774 ± 0.054 0.906 ± 0.053 0.968 ± 0.029 0.956 ± 0.035 0.996 ± 0.005 0.925 ± 0.059 0.879 ± 0.085 0.980 ± 0.027
Our method 0.873 ± 0.032 0.789 ± 0.029 0.920 ± 0.033 0.975 ± 0.033 0.961 ± 0.021 0.998 ± 0.017 0.940 ± 0.023 0.895 ± 0.025 0.991 ± 0.017
Dataset Pima Ecoli Libra
Measure G-M F-M AUC G-M F-M AUC G-M F-M AUC
SVM 0.660 ± 0.042 0.575 ± 0.056 0.800 ± 0.024 0.902 ± 0.053 0.826 ± 0.061 0.964 ± 0.030 0.845 ± 0.103 0.813 ± 0.129 0.996 ± 0.005
CSVDD 0.697 ± 0.037 0.642 ± 0.048 0.301 ± 0.036 0.929 ± 0.026 0.847 ± 0.048 0.696 ± 0.026 0.679 ± 0.256 0.614 ± 0.270 0.240 ± 0.143
DSMSM-SVDD 0.751 ± 0.027 0.651 ± 0.015 0.805 ± 0.023 0.925 ± 0.012 0.851 ± 0.016 0.966 ± 0.017 0.918 ± 0.016 0.882 ± 0.021 0.991 ± 0.025
ROSVM 0.729 ± 0.040 0.651 ± 0.053 0.805 ± 0.025 0.930 ± 0.041 0.846 ± 0.071 0.960 ± 0.032 0.911 ± 0.104 0.869 ± 0.125 0.995 ± 0.006
SMOTESVM 0.729 ± 0.055 0.641 ± 0.069 0.807 ± 0.035 0.940 ± 0.039 0.859 ± 0.050 0.959 ± 0.042 0.839 ± 0.149 0.802 ± 0.188 0.995 ± 0.007
BSMOTESVM 0.735 ± 0.037 0.618 ± 0.065 0.808 ± 0.035 0.934 ± 0.041 0.863 ± 0.050 0.962 ± 0.038 0.916 ± 0.078 0.840 ± 0.155 0.994 ± 0.013
WKSMOTESVM 0.747 ± 0.037 0.650 ± 0.048 0.815 ± 0.036 0.945 ± 0.026 0.857 ± 0.048 0.966 ± 0.026 0.919 ± 0.256 0.894 ± 0.270 0.996 ± 0.143
RUSVM 0.701 ± 0.048 0.658 ± 0.052 0.815 ± 0.036 0.938 ± 0.026 0.838 ± 0.058 0.960 ± 0.031 0.877 ± 0.133 0.887 ± 0.087 0.993 ± 0.009
CSVM 0.726 ± 0.043 0.647 ± 0.066 0.808 ± 0.033 0.937 ± 0.030 0.862 ± 0.051 0.962 ± 0.030 0.885 ± 0.120 0.857 ± 0.150 0.994 ± 0.009
CSO-SDAENN 0.731 ± 0.031 0.627 ± 0.025 0.813 ± 0.022 0.935 ± 0.032 0.853 ± 0.021 0.963 ± 0.035 0.883 ± 0.019 0.851 ± 0.025 0.989 ± 0.022
AdaSVM 0.723 ± 0.035 0.644 ± 0.045 0.810 ± 0.031 0.939 ± 0.029 0.857 ± 0.061 0.961 ± 0.027 0.908 ± 0.115 0.871 ± 0.169 0.998 ± 0.003
AdaCSVM 0.739 ± 0.054 0.647 ± 0.065 0.811 ± 0.040 0.936 ± 0.043 0.856 ± 0.056 0.958 ± 0.042 0.908 ± 0.110 0.883 ± 0.134 0.997 ± 0.004
AdaBoostICSVM 0.733 ± 0.047 0.643 ± 0.068 0.813 ± 0.031 0.932 ± 0.043 0.851 ± 0.053 0.957 ± 0.039 0.925 ± 0.094 0.902 ± 0.117 0.997 ± 0.004
Our method 0.779 ± 0.015 0.657 ± 0.038 0.827 ± 0.021 0.949 ± 0.017 0.861 ± 0.031 0.975 ± 0.027 0.931 ± 0.013 0.915 ± 0.077 0.996 ± 0.015
Dataset Vehicle Balance Haberman
Measure G-M F-M AUC G-M F-M AUC G-M F-M AUC
SVM 0.968 ± 0.016 0.950 ± 0.026 0.998 ± 0.001 0.000 ± 0.000 0.000 ± 0.000 0.928 ± 0.030 0.460 ± 0.116 0.299 ± 0.125 0.657 ± 0.086
CSVDD 0.773 ± 0.041 0.604 ± 0.048 0.207 ± 0.035 0.000 ± 0.000 0.000 ± 0.000 0.500 ± 0.000 0.656 ± 0.053 0.504 ± 0.064 0.336 ± 0.048
DSMSM-SVDD 0.971 ± 0.032 0.945 ± 0.023 0.985 ± 0.011 0.728 ± 0.025 0.361 ± 0.016 0.852 ± 0.015 0.660 ± 0.031 0.508 ± 0.022 0.686 ± 0.018
ROSVM 0.978 ± 0.011 0.951 ± 0.019 0.998 ± 0.001 0.710 ± 0.064 0.296 ± 0.056 0.842 ± 0.052 0.630 ± 0.102 0.482 ± 0.120 0.716 ± 0.094
SMOTESVM 0.984 ± 0.008 0.963 ± 0.015 0.998 ± 0.001 0.733 ± 0.154 0.352 ± 0.127 0.849 ± 0.068 0.637 ± 0.060 0.494 ± 0.075 0.720 ± 0.057
BSMOTESVM 0.973 ± 0.013 0.948 ± 0.027 0.997 ± 0.001 0.690 ± 0.129 0.308 ± 0.077 0.834 ± 0.059 0.644 ± 0.054 0.454 ± 0.089 0.720 ± 0.053
WKSMOTESVM 0.985 ± 0.041 0.955 ± 0.048 0.998 ± 0.035 0.747 ± 0.042 0.379 ± 0.031 0.851 ± 0.035 0.656 ± 0.053 0.504 ± 0.064 0.726 ± 0.048
RUSVM 0.973 ± 0.013 0.939 ± 0.027 0.997 ± 0.002 0.727 ± 0.086 0.283 ± 0.104 0.864 ± 0.058 0.593 ± 0.074 0.500 ± 0.082 0.707 ± 0.073
CSVM 0.980 ± 0.010 0.955 ± 0.023 0.997 ± 0.001 0.717 ± 0.059 0.309 ± 0.057 0.831 ± 0.051 0.635 ± 0.073 0.490 ± 0.087 0.707 ± 0.066
CSO-SDAENN 0.981 ± 0.036 0.947 ± 0.025 0.993 ± 0.023 0.733 ± 0.028 0.333 ± 0.031 0.863 ± 0.015 0.641 ± 0.019 0.501 ± 0.023 0.719 ± 0.026
AdaSVM 0.980 ± 0.009 0.956 ± 0.019 0.997 ± 0.002 0.721 ± 0.073 0.304 ± 0.069 0.830 ± 0.063 0.645 ± 0.057 0.502 ± 0.068 0.726 ± 0.062
AdaCSVM 0.978 ± 0.015 0.952 ± 0.033 0.997 ± 0.001 0.737 ± 0.100 0.379 ± 0.083 0.844 ± 0.057 0.637 ± 0.103 0.497 ± 0.119 0.727 ± 0.066
AdaBoostICSVM 0.973 ± 0.012 0.958 ± 0.021 0.997 ± 0.003 0.737 ± 0.082 0.370 ± 0.061 0.846 ± 0.064 0.625 ± 0.111 0.489 ± 0.123 0.722 ± 0.067
Our method 0.988 ± 0.017 0.958 ± 0.025 0.998 ± 0.007 0.765 ± 0.035 0.381 ± 0.051 0.865 ± 0.055 0.667 ± 0.035 0.519 ± 0.055 0.727 ± 0.062
Dataset Car Liver Seed
Measure G-M F-M AUC G-M F-M AUC G-M F-M AUC
SVM 0.977 ± 0.039 0.945 ± 0.045 0.999 ± 0.000 0.666 ± 0.068 0.599 ± 0.070 0.749 ± 0.062 0.902 ± 0.038 0.870 ± 0.050 0.979 ± 0.012
CSVDD 0.934 ± 0.014 0.940 ± 0.067 0.063 ± 0.015 0.557 ± 0.080 0.595 ± 0.078 0.407 ± 0.065 0.894 ± 0.036 0.840 ± 0.068 0.104 ± 0.035
DSMSM-SVDD 0.979 ± 0.025 0.956 ± 0.033 0.999 ± 0.012 0.657 ± 0.017 0.641 ± 0.019 0.737 ± 0.021 0.910 ± 0.027 0.889 ± 0.013 0.977 ± 0.028
ROSVM 0.992 ± 0.013 0.939 ± 0.024 0.999 ± 0.000 0.687 ± 0.041 0.646 ± 0.049 0.749 ± 0.025 0.923 ± 0.051 0.889 ± 0.057 0.977 ± 0.017
SMOTESVM 0.977 ± 0.047 0.973 ± 0.051 0.999 ± 0.001 0.686 ± 0.050 0.647 ± 0.066 0.755 ± 0.052 0.936 ± 0.042 0.905 ± 0.056 0.980 ± 0.018
BSMOTESVM 0.990 ± 0.016 0.955 ± 0.055 0.999 ± 0.001 0.693 ± 0.045 0.636 ± 0.080 0.754 ± 0.053 0.918 ± 0.033 0.882 ± 0.103 0.977 ± 0.011
WKSMOTESVM 0.994 ± 0.014 0.973 ± 0.067 0.999 ± 0.015 0.695 ± 0.080 0.659 ± 0.078 0.755 ± 0.065 0.934 ± 0.036 0.903 ± 0.068 0.974 ± 0.035
RUSVM 0.990 ± 0.017 0.938 ± 0.065 0.999 ± 0.000 0.684 ± 0.065 0.654 ± 0.047 0.745 ± 0.068 0.913 ± 0.068 0.868 ± 0.063 0.972 ± 0.026
CSVM 0.991 ± 0.015 0.933 ± 0.051 0.999 ± 0.000 0.685 ± 0.047 0.649 ± 0.051 0.752 ± 0.059 0.915 ± 0.070 0.877 ± 0.091 0.978 ± 0.019
CSO-SDAENN 0.991 ± 0.023 0.941 ± 0.015 0.999 ± 0.000 0.692 ± 0.027 0.643 ± 0.011 0.753 ± 0.035 0.917 ± 0.018 0.880 ± 0.026 0.977 ± 0.033
AdaSVM 0.991 ± 0.015 0.942 ± 0.047 0.999 ± 0.000 0.685 ± 0.043 0.647 ± 0.053 0.740 ± 0.052 0.922 ± 0.053 0.881 ± 0.067 0.978 ± 0.021
AdaCSVM 0.988 ± 0.020 0.932 ± 0.055 0.999 ± 0.000 0.684 ± 0.036 0.643 ± 0.050 0.755 ± 0.044 0.913 ± 0.045 0.874 ± 0.055 0.976 ± 0.020
AdaBoostICSVM 0.983 ± 0.032 0.938 ± 0.049 0.999 ± 0.000 0.684 ± 0.041 0.646 ± 0.048 0.742 ± 0.047 0.928 ± 0.031 0.898 ± 0.037 0.978 ± 0.009
Our method 0.995 ± 0.015 0.977 ± 0.023 0.999 ± 0.000 0.697 ± 0.047 0.667 ± 0.025 0.753 ± 0.033 0.933 ± 0.035 0.915 ± 0.031 0.985 ± 0.023
Dataset Spect Pageblock Pageblock1
Measure G-M F-M AUC G-M F-M AUC G-M F-M AUC
SVM 0.626 ± 0.110 0.500 ± 0.115 0.633 ± 0.100 0.227 ± 0.038 0.099 ± 0.031 0.914 ± 0.020 0.049 ± 0.062 0.011 ± 0.015 0.977 ± 0.009
CSVDD 0.524 ± 0.209 0.427 ± 0.218 0.352 ± 0.096 0.250 ± 0.011 0.113 ± 0.037 0.915 ± 0.016 0.507 ± 0.012 0.621 ± 0.014 0.428 ± 0.013
DSMSM-SVDD 0.721 ± 0.018 0.586 ± 0.023 0.801 ± 0.017 0.857 ± 0.017 0.627 ± 0.021 0.907 ± 0.035 0.915 ± 0.023 0.609 ± 0.016 0.951 ± 0.015
ROSVM 0.704 ± 0.062 0.532 ± 0.097 0.814 ± 0.040 0.851 ± 0.023 0.600 ± 0.034 0.912 ± 0.018 0.882 ± 0.030 0.606 ± 0.035 0.945 ± 0.018
SMOTESVM 0.744 ± 0.070 0.587 ± 0.114 0.812 ± 0.085 0.850 ± 0.020 0.604 ± 0.034 0.913 ± 0.016 0.881 ± 0.030 0.603 ± 0.043 0.946 ± 0.018
BSMOTESVM 0.724 ± 0.081 0.545 ± 0.090 0.820 ± 0.043 0.861 ± 0.014 0.632 ± 0.039 0.923 ± 0.013 0.916 ± 0.016 0.630 ± 0.040 0.972 ± 0.009
WKSMOTESVM 0.752 ± 0.209 0.607 ± 0.218 0.812 ± 0.096 0.857 ± 0.011 0.637 ± 0.037 0.915 ± 0.016 0.917 ± 0.012 0.621 ± 0.014 0.948 ± 0.013
RUSVM 0.702 ± 0.076 0.550 ± 0.091 0.793 ± 0.062 0.814 ± 0.034 0.569 ± 0.030 0.902 ± 0.024 0.865 ± 0.031 0.596 ± 0.036 0.941 ± 0.015
CSVM 0.706 ± 0.085 0.540 ± 0.103 0.794 ± 0.051 0.849 ± 0.018 0.597 ± 0.031 0.911 ± 0.015 0.882 ± 0.022 0.605 ± 0.036 0.947 ± 0.019
CSO-SDAENN 0.711 ± 0.021 0.548 ± 0.017 0.806 ± 0.026 0.851 ± 0.023 0.603 ± 0.018 0.913 ± 0.025 0.897 ± 0.026 0.612 ± 0.017 0.946 ± 0.013
AdaSVM 0.727 ± 0.059 0.567 ± 0.096 0.816 ± 0.053 0.850 ± 0.020 0.601 ± 0.031 0.911 ± 0.017 0.881 ± 0.018 0.605 ± 0.032 0.945 ± 0.018
AdaCSVM 0.743 ± 0.086 0.584 ± 0.132 0.822 ± 0.065 0.847 ± 0.044 0.612 ± 0.0575 0.897 ± 0.024 0.908 ± 0.029 0.613 ± 0.044 0.953 ± 0.019
AdaBoostICSVM 0.729 ± 0.111 0.556 ± 0.126 0.836 ± 0.077 0.856 ± 0.022 0.621 ± 0.013 0.903 ± 0.033 0.909 ± 0.041 0.623 ± 0.027 0.963 ± 0.051
Our method 0.771 ± 0.053 0.625 ± 0.037 0.827 ± 0.021 0.860 ± 0.021 0.639 ± 0.027 0.921 ± 0.019 0.925 ± 0.035 0.627 ± 0.051 0.976 ± 0.026
Dataset Yeast Yeast1 Yeast2
Measure G-M F-M AUC G-M F-M AUC G-M F-M AUC
SVM 0.611 ± 0.041 0.510 ± 0.051 0.791 ± 0.033 0.663 ± 0.040 0.564 ± 0.048 0.858 ± 0.034 0.853 ± 0.049 0.776 ± 0.060 0.976 ± 0.009
CSVDD 0.673 ± 0.033 0.545 ± 0.041 0.325 ± 0.033 0.765 ± 0.032 0.569 ± 0.061 0.232 ± 0.031 0.893 ± 0.031 0.778 ± 0.040 0.106 ± 0.031
DSMSM-SVDD 0.712 ± 0.022 0.596 ± 0.019 0.805 ± 0.021 0.772 ± 0.023 0.601 ± 0.011 0.860 ± 0.021 0.925 ± 0.016 0.779 ± 0.025 0.971 ± 0.031
ROSVM 0.712 ± 0.024 0.589 ± 0.040 0.794 ± 0.022 0.778 ± 0.033 0.575 ± 0.062 0.858 ± 0.031 0.914 ± 0.037 0.746 ± 0.044 0.975 ± 0.007
SMOTESVM 0.716 ± 0.032 0.593 ± 0.041 0.798 ± 0.025 0.777 ± 0.045 0.575 ± 0.064 0.858 ± 0.031 0.923 ± 0.019 0.744 ± 0.055 0.976 ± 0.007
BSMOTESVM 0.710 ± 0.028 0.591 ± 0.063 0.796 ± 0.019 0.783 ± 0.037 0.634 ± 0.034 0.860 ± 0.033 0.926 ± 0.023 0.795 ± 0.048 0.973 ± 0.009
WKSMOTESVM 0.721 ± 0.033 0.595 ± 0.041 0.799 ± 0.033 0.785 ± 0.032 0.596 ± 0.061 0.858 ± 0.031 0.926 ± 0.031 0.788 ± 0.040 0.976 ± 0.033
RUSVM 0.708 ± 0.046 0.589 ± 0.032 0.796 ± 0.037 0.770 ± 0.029 0.557 ± 0.050 0.854 ± 0.030 0.890 ± 0.045 0.735 ± 0.052 0.973 ± 0.012
CSVM 0.706 ± 0.024 0.583 ± 0.040 0.797 ± 0.018 0.780 ± 0.038 0.576 ± 0.037 0.854 ± 0.030 0.914 ± 0.037 0.749 ± 0.055 0.974 ± 0.008
CSO-SDAENN 0.710 ± 0.013 0.588 ± 0.025 0.801 ± 0.012 0.781 ± 0.018 0.583 ± 0.029 0.855 ± 0.020 0.917 ± 0.017 0.752 ± 0.021 0.973 ± 0.023
AdaSVM 0.709 ± 0.025 0.586 ± 0.047 0.795 ± 0.021 0.782 ± 0.044 0.577 ± 0.050 0.860 ± 0.046 0.919 ± 0.032 0.765 ± 0.047 0.975 ± 0.012
AdaCSVM 0.714 ± 0.026 0.591 ± 0.034 0.795 ± 0.027 0.786 ± 0.036 0.597 ± 0.051 0.858 ± 0.029 0.918 ± 0.029 0.752 ± 0.048 0.975 ± 0.008
AdaBoostICSVM 0.717 ± 0.051 0.594 ± 0.051 0.799 ± 0.022 0.783 ± 0.029 0.588 ± 0.046 0.859 ± 0.043 0.919 ± 0.033 0.753 ± 0.056 0.974 ± 0.011
Our method 0.727 ± 0.025 0.619 ± 0.033 0.815 ± 0.027 0.789 ± 0.041 0.613 ± 0.063 0.867 ± 0.032 0.933 ± 0.017 0.781 ± 0.076 0.975 ± 0.015
Dataset Yeast3 Yeast4 Yeast5
Measure G-M F-M AUC G-M F-M AUC G-M F-M AUC
SVM 0.832 ± 0.092 0.766 ± 0.114 0.982 ± 0.010 0.923 ± 0.066 0.747 ± 0.073 0.957 ± 0.043 0.711 ± 0.235 0.663 ± 0.240 0.785 ± 0.157
CSVDD 0.868 ± 0.056 0.769 ± 0.104 0.129 ± 0.054 0.925 ± 0.051 0.841 ± 0.098 0.072 ± 0.049 0.721 ± 0.100 0.665 ± 0.144 0.221 ± 0.127
DSMSM-SVDD 0.882 ± 0.022 0.773 ± 0.018 0.971 ± 0.029 0.920 ± 0.037 0.851 ± 0.021 0.961 ± 0.015 0.725 ± 0.017 0.661 ± 0.025 0.819 ± 0.013
ROSVM 0.888 ± 0.057 0.706 ± 0.103 0.981 ± 0.010 0.910 ± 0.059 0.737 ± 0.118 0.969 ± 0.031 0.725 ± 0.226 0.460 ± 0.200 0.817 ± 0.152
SMOTESVM 0.898 ± 0.043 0.708 ± 0.097 0.981 ± 0.016 0.914 ± 0.067 0.708 ± 0.120 0.974 ± 0.026 0.710 ± 0.233 0.565 ± 0.231 0.837 ± 0.085
BSMOTESVM 0.903 ± 0.055 0.789 ± 0.116 0.974 ± 0.027 0.883 ± 0.048 0.832 ± 0.130 0.963 ± 0.015 0.663 ± 0.116 0.639 ± 0.290 0.774 ± 0.152
WKSMOTESVM 0.898 ± 0.056 0.769 ± 0.104 0.979 ± 0.054 0.925 ± 0.051 0.841 ± 0.098 0.972 ± 0.049 0.721 ± 0.100 0.665 ± 0.144 0.821 ± 0.127
RUSVM 0.873 ± 0.077 0.684 ± 0.109 0.958 ± 0.040 0.910 ± 0.081 0.484 ± 0.127 0.965 ± 0.036 0.679 ± 0.295 0.188 ± 0.137 0.768 ± 0.185
CSVM 0.888 ± 0.061 0.716 ± 0.072 0.979 ± 0.015 0.886 ± 0.114 0.728 ± 0.141 0.976 ± 0.022 0.658 ± 0.293 0.479 ± 0.237 0.798 ± 0.155
CSO-SDAENN 0.890 ± 0.021 0.718 ± 0.015 0.971 ± 0.013 0.891 ± 0.018 0.733 ± 0.029 0.875 ± 0.025 0.717 ± 0.015 0.496 ± 0.021 0.797 ± 0.023
AdaSVM 0.897 ± 0.081 0.725 ± 0.084 0.979 ± 0.021 0.918 ± 0.050 0.737 ± 0.087 0.974 ± 0.024 0.718 ± 0.320 0.492 ± 0.299 0.811 ± 0.194
AdaCSVM 0.894 ± 0.043 0.709 ± 0.105 0.982 ± 0.011 0.913 ± 0.050 0.786 ± 0.071 0.973 ± 0.026 0.712 ± 0.223 0.531 ± 0.194 0.804 ± 0.176
AdaBoostICSVM 0.886 ± 0.054 0.717 ± 0.064 0.977 ± 0.013 0.912 ± 0.084 0.757 ± 0.106 0.971 ± 0.036 0.720 ± 0.139 0.542 ± 0.122 0.816 ± 0.212
Our method 0.896 ± 0.013 0.797 ± 0.035 0.982 ± 0.031 0.927 ± 0.076 0.879 ± 0.051 0.977 ± 0.031 0.742 ± 0.190 0.667 ± 0.113 0.830 ± 0.168
Dataset Yeast6 Yeast7 Yeast8
Measure G-M F-M AUC G-M F-M AUC G-M F-M AUC
SVM 0.112 ± 0.170 0.068 ± 0.110 0.805 ± 0.069 0.766 ± 0.122 0.617 ± 0.035 0.986 ± 0.011 0.653 ± 0.131 0.493 ± 0.164 0.842 ± 0.107
CSVDD 0.822 ± 0.043 0.261 ± 0.061 0.175 ± 0.042 0.956 ± 0.009 0.631 ± 0.069 0.042 ± 0.009 0.784 ± 0.105 0.497 ± 0.118 0.191 ± 0.082
DSMSM-SVDD 0.823 ± 0.013 0.303 ± 0.011 0.891 ± 0.029 0.957 ± 0.037 0.621 ± 0.021 0.981 ± 0.015 0.855 ± 0.017 0.501 ± 0.021 0.947 ± 0.023
ROSVM 0.818 ± 0.092 0.291 ± 0.064 0.893 ± 0.066 0.944 ± 0.046 0.533 ± 0.107 0.985 ± 0.007 0.853 ± 0.081 0.324 ± 0.084 0.948 ± 0.038
SMOTESVM 0.809 ± 0.068 0.271 ± 0.058 0.888 ± 0.059 0.965 ± 0.020 0.543 ± 0.101 0.986 ± 0.006 0.855 ± 0.082 0.285 ± 0.074 0.938 ± 0.034
BSMOTESVM 0.811 ± 0.069 0.304 ± 0.067 0.891 ± 0.065 0.933 ± 0.049 0.587 ± 0.116 0.985 ± 0.006 0.841 ± 0.101 0.516 ± 0.192 0.938 ± 0.033
WKSMOTESVM 0.825 ± 0.043 0.309 ± 0.061 0.895 ± 0.042 0.956 ± 0.009 0.631 ± 0.069 0.985 ± 0.009 0.860 ± 0.105 0.507 ± 0.118 0.951 ± 0.082
RUSVM 0.768 ± 0.092 0.299 ± 0.070 0.872 ± 0.072 0.896 ± 0.072 0.520 ± 0.106 0.983 ± 0.008 0.802 ± 0.231 0.302 ± 0.131 0.935 ± 0.069
CSVM 0.819 ± 0.106 0.277 ± 0.082 0.877 ± 0.077 0.949 ± 0.044 0.539 ± 0.131 0.987 ± 0.007 0.858 ± 0.065 0.303 ± 0.062 0.943 ± 0.030
CSO-SDAENN 0.820 ± 0.017 0.289 ± 0.025 0.897 ± 0.013 0.951 ± 0.013 0.543 ± 0.029 0.983 ± 0.017 0.857 ± 0.025 0.416 ± 0.027 0.937 ± 0.013
AdaSVM 0.822 ± 0.079 0.291 ± 0.070 0.903 ± 0.051 0.950 ± 0.037 0.539 ± 0.071 0.986 ± 0.006 0.848 ± 0.087 0.314 ± 0.106 0.942 ± 0.035
AdaCSVM 0.829 ± 0.066 0.296 ± 0.055 0.893 ± 0.050 0.950 ± 0.035 0.541 ± 0.081 0.986 ± 0.006 0.854 ± 0.071 0.325 ± 0.067 0.937 ± 0.043
AdaBoostICSVM 0.827 ± 0.088 0.296 ± 0.084 0.891 ± 0.070 0.954 ± 0.047 0.538 ± 0.081 0.985 ± 0.007 0.859 ± 0.072 0.318 ± 0.112 0.940 ± 0.048
Our method 0.837 ± 0.013 0.335 ± 0.023 0.899 ± 0.053 0.959 ± 0.105 0.636 ± 0.163 0.987 ± 0.006 0.863 ± 0.032 0.513 ± 0.027 0.955 ± 0.015
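The rank-based comparison (Friedman test followed by Holm's test, with the proposed method as control) can be sketched on a small slice of Table 5. The block below is a minimal illustration, not the authors' evaluation script: it uses only the G-M values of three methods on six datasets, a subset chosen arbitrarily here, and it assumes the common normal-approximation z statistic for the pairwise comparisons against the control. The function names are hypothetical, and the resulting statistic is not comparable to the p-values reported for the full experiment.

```python
import math

# G-M values copied from Table 5 (rows: Wine, Iris, Abalone, Pima, Ecoli, Libra).
# Columns: SVM, CSVM, "Our method". This subset is illustrative only.
METHODS = ["SVM", "CSVM", "Our method"]
GM = [
    [0.813, 0.847, 0.873],
    [0.969, 0.950, 0.975],
    [0.921, 0.931, 0.940],
    [0.660, 0.726, 0.779],
    [0.902, 0.937, 0.949],
    [0.845, 0.885, 0.931],
]


def mean_ranks(scores):
    """Mean rank per method (rank 1 = highest score); ties share the average rank."""
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda m: row[m], reverse=True)
        start = 0
        while start < n_methods:
            stop = start
            while stop + 1 < n_methods and row[order[stop + 1]] == row[order[start]]:
                stop += 1
            shared_rank = (start + stop) / 2 + 1  # average of the 1-based positions
            for m in order[start:stop + 1]:
                totals[m] += shared_rank
            start = stop + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(ranks, n_datasets):
    """Friedman chi-square statistic computed from the mean ranks."""
    k = len(ranks)
    return 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4
    )


def holm_against_control(ranks, n_datasets, control, alpha=0.05):
    """Holm step-down test of every method against the control method.

    Assumption: pairwise p-values come from the normal approximation
    z = (R_i - R_control) / sqrt(k (k + 1) / (6 N)).
    Returns {method_index: True if the null hypothesis is rejected}.
    """
    k = len(ranks)
    std_err = math.sqrt(k * (k + 1) / (6 * n_datasets))
    p_values = []
    for m, r in enumerate(ranks):
        if m != control:
            z = (r - ranks[control]) / std_err
            p_values.append((math.erfc(abs(z) / math.sqrt(2)), m))  # two-sided
    p_values.sort()
    decisions, rejecting = {}, True
    for i, (p, m) in enumerate(p_values):
        # Holm: compare the i-th smallest p-value with alpha / (remaining tests).
        rejecting = rejecting and p <= alpha / (len(p_values) - i)
        decisions[m] = rejecting
    return decisions


ranks = mean_ranks(GM)
chi2 = friedman_statistic(ranks, len(GM))
decisions = holm_against_control(ranks, len(GM), control=2)
print(dict(zip(METHODS, ranks)), chi2, decisions)
```

The Friedman p-value itself would follow from a chi-square distribution with k − 1 degrees of freedom; it is left out here to keep the sketch dependency-free.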
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the Fundamental Research Funds for the Central Universities, China [grant number 2572017EB02], Innovative talent fund of Harbin science and technology Bureau, China [grant number 2017RAXXJ018], Double first-class scientific research foundation of Northeast Forestry University, China [grant number 411112438]. The authors are grateful to the anonymous reviewers for their valuable comments and suggestions, which were very helpful in improving the quality and presentation of this paper.

Appendix

By introducing the Lagrange multipliers, the primal problem of finding an optimal domain description can be formulated as:

$$
L(a, R, d, \xi_i, \xi_j, \alpha, \beta) = R^2 - M d^2 + C_1 \sum_{i,\, y_i = 1} \rho_i \xi_i + C_2 \sum_{j,\, y_j = -1} \rho_j \xi_j + \sum_{i=1}^{N+N} \alpha_i \left[ d^2 - \xi_i - y_i \left( R^2 - \langle \phi(x_i) - a,\ \phi(x_i) - a \rangle \right) \right] - \sum_{i=1}^{N+N} \beta_i \xi_i \tag{A.1}
$$

From Eqs. (A.5) and (A.8), we can obtain:

$$
0 \le \alpha_i \le C_1 \rho_i, \quad \forall i,\ y_i = 1 \tag{A.13}
$$

From Eqs. (A.6) and (A.8), we can obtain:

$$
0 \le \alpha_j \le C_2 \rho_j, \quad \forall j,\ y_j = -1 \tag{A.14}
$$

From Eqs. (A.4) and (A.11), we can obtain:

$$
a = \sum_{i=1}^{N+N} \alpha_i y_i \phi(x_i) \tag{A.15}
$$

Substituting Eqs. (A.5), (A.6), (A.11), (A.12), (A.15) into Eq. (A.1), the corresponding dual problem is expressed by:

$$
\begin{aligned}
L(a, R, d, \xi_i, \xi_j, \alpha, \beta)
&= \sum_{i=1}^{N+N} \alpha_i y_i \|\phi(x_i) - a\|^2
 = \sum_{i=1}^{N+N} \alpha_i y_i \Big\| \phi(x_i) - \sum_{j=1}^{N+N} \alpha_j y_j \phi(x_j) \Big\|^2 \\
&= \sum_{i=1}^{N+N} \alpha_i y_i \Big( \phi(x_i) \cdot \phi(x_i) - 2 \phi(x_i) \cdot \sum_{j=1}^{N+N} \alpha_j y_j \phi(x_j) + \sum_{j,k=1}^{N+N} \alpha_j \alpha_k y_j y_k\, \phi(x_j) \cdot \phi(x_k) \Big) \\
&= \sum_{i=1}^{N+N} \alpha_i y_i\, \phi(x_i) \cdot \phi(x_i) - 2 \sum_{i,j=1}^{N+N} \alpha_i \alpha_j y_i y_j\, \phi(x_i) \cdot \phi(x_j) + \sum_{i,j=1}^{N+N} \alpha_i \alpha_j y_i y_j\, \phi(x_i) \cdot \phi(x_j) \\
&= \sum_{i=1}^{N+N} \alpha_i y_i\, \phi(x_i) \cdot \phi(x_i) - \sum_{i,j=1}^{N+N} \alpha_i \alpha_j y_i y_j\, \phi(x_i) \cdot \phi(x_j)
\end{aligned}
\tag{A.16}
$$

According to the kernel trick, the inner product $\phi(x_i) \cdot \phi(x_j)$ between two input feature vectors can be replaced by a kernel function …

Through solving the dual quadratic programming problem, we can obtain the Lagrange multiplier vector $\alpha$. Thus the center $a$ and the radius $R$ of the minimum enclosing hypersphere $S$ can be expressed as follows:

$$
a = \sum_{i=1}^{N+N} \alpha_i y_i \phi(x_i) \tag{A.23}
$$

$$
\begin{aligned}
R^2 &= \frac{1}{2} \left[ \frac{1}{N_n} \sum_{l=1}^{N_n} \|\phi(x_l) - a\|^2 + \frac{1}{N_a} \sum_{m=1}^{N_a} \|\phi(x_m) - a\|^2 \right] \\
&= \frac{1}{2} \left[ \frac{1}{N_n} \sum_{l=1}^{N_n} \Big\| \phi(x_l) - \sum_{i=1}^{N+N} \alpha_i y_i \phi(x_i) \Big\|^2 + \frac{1}{N_a} \sum_{m=1}^{N_a} \Big\| \phi(x_m) - \sum_{i=1}^{N+N} \alpha_i y_i \phi(x_i) \Big\|^2 \right] \\
&= \frac{1}{2} \left[ \frac{1}{N_n} \sum_{l=1}^{N_n} \Big( k(x_l, x_l) - 2 \sum_{i=1}^{N+N} \alpha_i y_i k(x_l, x_i) + \sum_{i,j=1}^{N+N} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \Big) + \frac{1}{N_a} \sum_{m=1}^{N_a} \Big( k(x_m, x_m) - 2 \sum_{i=1}^{N+N} \alpha_i y_i k(x_m, x_i) + \sum_{i,j=1}^{N+N} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \Big) \right]
\end{aligned}
\tag{A.24}
$$

Let $\gamma_i^* = \alpha_i y_i$, $\forall i = 1, 2, \ldots, N + N$. We obtain:

$$
a = \sum_{i=1}^{N+N} \gamma_i^* \phi(x_i) \tag{A.25}
$$

(A.26)

where $k(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)$; $k(x_l, x_l) = \exp\left(-\|x_l - x_l\|^2 / 2\sigma^2\right) = 1$; $N_n$ represents the number of majority class support vectors; $N_a$ represents the number of minority class support vectors; $x_l$ denotes a majority class support vector; $x_m$ denotes a minority class support vector.

According to the complementary slackness condition of KKT, a majority (minority) class training instance $x_i$ ($x_j$) and its corresponding $\alpha_i$ ($\alpha_j$) satisfy the following conditions:

(1) If $\alpha_i = 0$, $\forall i$, $y_i = 1$: according to $C_1 \rho_i - \alpha_i - \beta_i = 0$ and $-\beta_i \xi_i = 0$, we obtain $\beta_i = C_1 \rho_i > 0$ and $\xi_i = 0$. According to $\alpha_i \left[ d^2 - \xi_i - y_i \left( R^2 - \langle \phi(x_i) - a,\ \phi(x_i) - a \rangle \right) \right] = 0$, we obtain $d^2 - \xi_i - y_i \left( R^2 - \langle \phi(x_i) - a,\ \phi(x_i) - a \rangle \right) < 0$, i.e., $\|\phi(x_i) - a\|^2 < R^2 - d^2$. That is, $x_i$ is classified correctly.
… 0, we obtain $d^2 - \xi_j - y_j \left( R^2 - \langle \phi(x_j) - a,\ \phi(x_j) - a \rangle \right) = 0$, i.e., $\|\phi(x_j) - a\|^2 = R^2 + d^2$. That is, $x_j$ is correctly classified, and it is also a minority class support vector.

(3) If $\alpha_i = C_1 \rho_i$, $\forall i$, $y_i = 1$: according to $C_1 \rho_i - \alpha_i - \beta_i = 0$ and $-\beta_i \xi_i = 0$, we obtain $\beta_i = 0$ and $\xi_i > 0$. That is, $x_i$ belongs to the majority class yet it is misclassified as the minority class.
If $\alpha_j = C_2 \rho_j$, $\forall j$, $y_j = -1$: according to $C_2 \rho_j - \alpha_j - \beta_j = 0$ and $-\beta_j \xi_j = 0$, we obtain $\beta_j = 0$ and $\xi_j > 0$. That is, $x_j$ belongs to the minority class yet it is misclassified as the majority class.

To determine whether an unseen test instance $x_{new}$ is within the hypersphere, the squared distance from $x_{new}$ to the center of the hypersphere $S$ should be calculated as follows:

$$
d_{new}^2 = \Big\| \phi(x_{new}) - \sum_{i=1}^{N+N} \gamma_i^* \phi(x_i) \Big\|^2 = 1 + \sum_{i,j=1}^{N+N} \gamma_i^* \gamma_j^* k(x_i, x_j) - 2 \sum_{i=1}^{N+N} \gamma_i^* k(x_{new}, x_i) \tag{A.27}
$$

An unseen test sample $x_{new}$ is accepted as majority class when its distance to the center of the hypersphere does not exceed the radius $R$, that is, $d_{new}^2 \le R^2$; otherwise it is assigned to the minority class.
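The decision rule built from Eqs. (A.24) and (A.27) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the training points, labels, multipliers `alpha`, the choice of support vectors and `sigma` are hypothetical toy values (the multipliers are not obtained by solving the dual problem), and all function names are introduced here only for the example.

```python
import math


def gaussian_kernel(x, z, sigma):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2)); note that k(x, x) = 1."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def squared_distance(x_new, train_x, gamma, sigma):
    """Eq. (A.27): 1 + sum_ij g_i g_j k(x_i, x_j) - 2 sum_i g_i k(x_new, x_i)."""
    quad = sum(
        gi * gj * gaussian_kernel(xi, xj, sigma)
        for gi, xi in zip(gamma, train_x)
        for gj, xj in zip(gamma, train_x)
    )
    cross = sum(gi * gaussian_kernel(x_new, xi, sigma) for gi, xi in zip(gamma, train_x))
    return 1.0 + quad - 2.0 * cross


def squared_radius(majority_svs, minority_svs, train_x, gamma, sigma):
    """Eq. (A.24): mean of the two per-class average squared distances to the center."""
    d_maj = [squared_distance(x, train_x, gamma, sigma) for x in majority_svs]
    d_min = [squared_distance(x, train_x, gamma, sigma) for x in minority_svs]
    return 0.5 * (sum(d_maj) / len(d_maj) + sum(d_min) / len(d_min))


def classify(x_new, r2, train_x, gamma, sigma):
    """Return +1 (majority) if d_new^2 <= R^2, otherwise -1 (minority)."""
    return 1 if squared_distance(x_new, train_x, gamma, sigma) <= r2 else -1


# Hypothetical toy values; alpha is NOT the solution of the dual problem.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
labels = [1, 1, 1, -1]            # +1 = majority, -1 = minority
alpha = [0.5, 0.4, 0.4, 0.3]
gamma = [a * y for a, y in zip(alpha, labels)]  # gamma_i* = alpha_i * y_i
sigma = 1.0

# Assumed support vectors: two majority points and the single minority point.
r2 = squared_radius([(1.0, 0.0), (0.0, 1.0)], [(3.0, 3.0)], train_x, gamma, sigma)

print("R^2 =", r2)
print("label of (0.3, 0.3):", classify((0.3, 0.3), r2, train_x, gamma, sigma))
print("label of (4.0, 4.0):", classify((4.0, 4.0), r2, train_x, gamma, sigma))
```

The constant 1 in `squared_distance` relies on the Gaussian kernel, for which k(x, x) = 1; with another kernel that term would have to be replaced by k(x_new, x_new).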
10.1016/j.jprocont.2016.01.001.
[25] S. Ye, D.M. Chen, J. Yu, A targeted change-detection procedure by com-
References
bining change vector analysis and post-classification approach, Isprs J.
Photogramm. Remote Sens. 114 (2016) 115–124.
[1] B. Gu, X. Sun, V.S. Sheng, Structural minimax probability machine, IEEE
[26] M. Cha, J.S. Kim, J.G. Baek, Density weighted support vector data descrip-
Trans. Neural Netw. Learn. Syst. 28 (7) (2016) 1646–1656, http://dx.doi.
tion, Expert Syst. Appl. 41 (7) (2014) 3343–3350, http://dx.doi.org/10.1016/
org/10.1109/TNNLS.2016.2544779.
j.eswa.2013.11.025.
[2] L. Zhang, D. Zhang, Evolutionary cost-sensitive extreme learning machine,
[27] X.M. Tao, Q. Li, C. Ren, et al., Affinity and class probability-based fuzzy
IEEE Trans. Neural Netw. Learn. Syst. 28 (12) (2017) 3045–3060, http:
support vector machine for imbalanced data sets, Neural Netw. 122 (2020)
//dx.doi.org/10.1109/TNNLS.2016.2607757.
289–307, http://dx.doi.org/10.1016/j.neunet.2019.10.016.
[3] M. Shafiq, Z. Tian, A.K. Bashir, Data mining and machine learning methods
[28] X.M. Tao, Q. Li, W.J. Guo, et al., Adaptive weighted over-sampling for imbal-
for sustainable smart cities traffic classification: a survey, Sustainable Cities
anced datasets based on density peaks clustering with heuristic filtering,
Soc. 60 (2020) http://dx.doi.org/10.1016/j.scs.2020.102177.
Inform. Sci. 519 (2020) 43–73, http://dx.doi.org/10.1016/j.ins.2020.01.032.
[4] M. Ruiz, L.E. Mujica, S. Alférez, et al., Wind turbine fault detection and clas-
[29] C. Jimenez-Castaño, A. Alvarez-Meza, A. Orozco-Gutierrez, Enhanced au-
sification by means of image texture analysis, Mech. Syst. Signal Process.
tomatic twin support vector machine for imbalanced data classification,
107 (2018) 149–167, http://dx.doi.org/10.1016/j.ymssp.2017.12.035.
Pattern Recognit. 107 (2020) 107442, http://dx.doi.org/10.1016/j.patcog.
[5] Q. Zhang, L.T. Yang, Z. Chen, et al., A survey on deep learning for big data,
2020.107442.
Inf. Fusion 42 (2018) 146–157, http://dx.doi.org/10.1016/j.inffus.2017.10.
[30] A. Roy, R.M.O. Cruz, R. Sabourin, et al., A study on combining dynamic
006.
selection and data preprocessing for imbalance learning, Neurocomputing
[6] S.K. Ghosh, A. Ghosh, Classification of gene expression patterns using a
286 (2018) 179–192, http://dx.doi.org/10.1016/j.neucom.2018.01.060.
novel type-2 fuzzy multigranulation-based SVM model for the recognition
of cancer mediating biomarkers, Neural Comput. Appl. (2) (2020) http: [31] Q. Kang, X.S. Chen, S.S. Li, et al., A noise-filtered under-sampling scheme for
//dx.doi.org/10.1007/s00521-020-05241-7. imbalanced classification, IEEE Trans. Cybern. 47 (12) (2017) 4263–4274,
http://dx.doi.org/10.1109/TCYB.2016.2606104.
[7] M. Elkano, M. Galar, J. Sanz, et al., CHI-PG: a fast prototype generation
algorithm for big data classification problems, Neurocomputing 287 (2018) [32] A. Amin, S. Anwar, A. Adnan, et al., Comparing oversampling techniques
22–33, http://dx.doi.org/10.1016/j.neucom.2018.01.056. to handle the class imbalance problem: a customer churn prediction case
[8] J. Gola, D. Britz, T. Staudt, et al., Advanced microstructure classification study, IEEE Access 4 (2016) 7940–7957, http://dx.doi.org/10.1109/ACCESS.
by data mining methods, Comput. Mater. Sci. 148 (2018) 324–335, http: 2016.2619719.
//dx.doi.org/10.1016/j.commatsci.2018.03.004. [33] T.F. Zhu, Y.P. Lin, Y.H. Liu, Synthetic minority oversampling technique
[9] J.P. Barddal, L. Loezer, F. Enembreck, et al., Lessons learned from data for multiclass imbalance problems, Pattern Recognit. 72 (2017) 327–340,
stream classification applied to credit scoring, Expert Syst. Appl. 162 (2020) http://dx.doi.org/10.1016/j.patcog.2017.07.024.
113899, http://dx.doi.org/10.1016/j.eswa.2020.113899. [34] L. Abdi, S. Hashemi, To combat multi-class imbalanced problems by means
[10] W. Chen, H.R. Pourghasemi, A. Kornejady, Landslide spatial modeling: of over-sampling techniques, IEEE Trans. Knowl. Data Eng. 28 (1) (2016)
introducing new ensembles of ANN, maxent, and SVM machine learning 238–251, http://dx.doi.org/10.1109/TKDE.2015.2458858.
techniques, Geofis. Int. 305 (2017) 314–327, http://dx.doi.org/10.1016/j. [35] N.V. Chawla, K.W. Bowyer, L.O. Hall, et al., SMOTE: Synthetic Minority
geoderma.2017.06.020. Over-sampling Technique, J. Artificial Intelligence Res. 16 (2002) 321–357.
[11] X. Yao, J. Crook, G. Andreeva, Enhancing two-stage modelling methodology [36] J. Sun, J. Lang, H. Fujita, et al., Imbalanced enterprise credit evaluation
for loss given default with support vector machines, European J. Oper. Res. with DTE-SBD: decision tree ensemble based on SMOTE and bagging
263 (2) (2017) 679–689, http://dx.doi.org/10.1016/j.ejor.2017.05.017. with differentiated sampling rates, Inform. Sci. 425 (2018) 76–91, http:
[12] A.A. Aburomman, M.B.I. Reaz, A novel weighted support vector machines //dx.doi.org/10.1016/j.ins.2017.10.017.
multiclass classifier based on differential evolution for intrusion detection [37] C. Bunkhumpornpat, K. Sinapiromsaran, C. Lursinsap, Safe-Level-SMOTE:
systems, Inform. Sci. 414 (2017) 225–246, http://dx.doi.org/10.1016/j.ins. Safe-level-synthetic minority over-sampling technique for handling the
2017.06.007. class imbalanced problem, in: Pacific-Asia Conference on Advances in
[13] F. Kang, J.S. Li, J.J. Li, System reliability analysis of slopes using least Knowledge Discovery & Data Mining, Springer-Verlag, 2009.
squares support vector machines with particle swarm optimization, Neu- [38] H. Han, W. Wang, B. Mao, Borderline-SMOTE: A new over-sampling
rocomputing 209 (2016) 46–56, http://dx.doi.org/10.1016/j.neucom.2015. method in imbalanced data sets learning, Adv. Intell. Comput. 17 (12)
11.122. (2005) 878–887, http://dx.doi.org/10.1007/11538059_91.
[14] J. Masino, J. Pinay, M. Reischl, et al., Road surface prediction from [39] S. Barua, M.M. Islam, X. Yao, et al., MWMOTE-Majority weighted minority
acoustical measurements in the tire cavity using support vector machine, oversampling technique for imbalanced data set learning, IEEE Trans.
Appl. Acoust. 125 (2017) 41–48, http://dx.doi.org/10.1016/j.apacoust.2017. Knowl. Data Eng. 26 (2) (2014) 405–425, http://dx.doi.org/10.1109/TKDE.
03.018. 2012.232.
20
X.M. Tao, W. Chen, X.K. Li et al. Knowledge-Based Systems 219 (2021) 106897
[40] J. Mathew, M. Luo, C.K. Pang, et al., Kernel-based SMOTE for SVM Classi- [55] W. Fan, S.J. Stolfo, J.X. Zhang, et al., AdaCost: misclassification cost-sensitive
fication of Imbalanced Datasets, in: IECON 2015-41ST Annual Conference boosting, in: Proceedings of the Sixteenth International Conference on
of the IEEE Industrial Electronics Society, 2005. Machine Learning, 1999.
[41] J. Mathew, C.K. Pang, M. Luo, et al., Classification of imbalanced data by [56] K.M. Ting, A comparative study of cost-sensitive boosting algorithms, in:
oversampling in Kernel Space of support vector machines, IEEE Trans. Proceedings of the 17th International Conference on Machine Learning,
Neural Netw. Learn. Syst. 29 (9) (2018) 4065–4076, http://dx.doi.org/10. Stanford University, CA, 2000.
1109/TNNLS.2017.2751612. [57] Y. Sun, M.S. Kamel, A.K.C. Wong, et al., Cost-sensitive boosting for classi-
[42] X.M. Tao, Q. Li, C. Ren, W.J. Guo, et al., Real-value negative selection over- fication of imbalanced data, Pattern Recognit. 40 (12) (2007) 3358–3378,
sampling for imbalanced data set learning, Expert Syst. Appl. 129 (2019) http://dx.doi.org/10.1016/j.patcog.2007.04.009.
118–134, http://dx.doi.org/10.1016/j.eswa.2019.04.011. [58] W. Lee, C.H. Jun, J.S. Lee, Instance categorization by support vector
[43] B. Gu, V.S. Sheng, K.Y. Tay, et al., Cross validation through two-dimensional machines to adjust weights in adaboost for imbalanced data classification,
solution surface for cost-sensitive SVM, IEEE Trans. Pattern Anal. Mach. Inform. Sci. 381 (2017) 92–103, http://dx.doi.org/10.1016/j.ins.2016.11.014.
Intell. 39 (6) (2017) 1103–1121, http://dx.doi.org/10.1109/TPAMI.2016. [59] C. Bellinger, S. Sharma, N. Japkowicz, One-class classification-From theory
2578326. to practice: A case-study in radioactive threat detection, Expert Syst. Appl.
[44] Q. Zhang, X.X. Chen, Z. Fang, et al., Reducing false arrhythmia alarm 108 (2018) 223–232, http://dx.doi.org/10.1016/j.eswa.2018.05.009.
rates using robust heart rate estimation and cost-sensitive support vector [60] I. Jeong, D.G. Kim, J.Y. Choi, et al., Geometric one-class classifiers using
machines, Physiol. Meas. 38 (2) (2017) 259–271, http://dx.doi.org/10.1088/ hyper-rectangles for knowledge extraction, Expert Syst. Appl. 117 (2019)
1361-6579/38/2/259. 112–124, http://dx.doi.org/10.1016/j.eswa.2018.09.042.
[45] F.Y. Cheng, J. Zhang, C.H. Wen, Cost-sensitive large margin distribution machine for classification of imbalanced data, Pattern Recognit. Lett. 80 (2016) 107–112, http://dx.doi.org/10.1016/j.patrec.2016.06.009.
[46] Z.H. Zhou, X.Y. Liu, Training cost-sensitive neural networks with methods addressing the class imbalance problem, IEEE Trans. Knowl. Data Eng. 18 (1) (2006) 63–77, http://dx.doi.org/10.1109/TKDE.2006.17.
[47] A. Ghazikhani, R. Monsefi, H.S. Yazdi, Online cost-sensitive neural network classifiers for non-stationary and imbalanced data streams, Neural Comput. Appl. 23 (5) (2013) 1283–1295, http://dx.doi.org/10.1007/s00521-012-1071-6.
[48] C. Zhang, W. Gao, J. Song, et al., An imbalanced data classification algorithm of improved autoencoder neural network, in: 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI), 2016.
[49] Y.H. Zhou, Z.H. Zhou, Large margin distribution learning with cost interval and unlabeled data, IEEE Trans. Knowl. Data Eng. 28 (7) (2016) 1749–1763, http://dx.doi.org/10.1109/TKDE.2016.2535283.
[50] G. Tuysuzoglu, D. Birant, Enhanced Bagging (eBagging): A novel approach for ensemble learning, Int. Arab J. Inf. Technol. 17 (4) (2020) 515–528, http://dx.doi.org/10.34028/iajit/17/4/10.
[51] H.R. Kadkhodaei, A.M.E. Moghadam, M. Dehghan, HBoost: A heterogeneous ensemble classifier based on the boosting method and entropy measurement, Expert Syst. Appl. 157 (2020) 113482, http://dx.doi.org/10.1016/j.eswa.2020.113482.
[52] C.J. Tsai, New feature selection and voting scheme to improve classification accuracy, Soft Comput. 23 (22) (2019) 12017–12030, http://dx.doi.org/10.1007/s00500-019-03757-2.
[53] N. Mahendran, P.M.D.R. Vincent, K. Srinivasan, et al., Realizing a stacking generalization model to improve the prediction accuracy of major depressive disorder in adults, IEEE Access 8 (2020) 49509–49522, http://dx.doi.org/10.1109/access.2020.2977887.
[54] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. System Sci. 55 (1997) 119–139, http://dx.doi.org/10.1006/jcss.1997.1504.
[61] V. Camerini, G. Coppotelli, S. Bendisch, Fault detection in operating helicopter drivetrain components based on support vector data description, Aerosp. Sci. Technol. 73 (2018) 48–60, http://dx.doi.org/10.1016/j.ast.2017.11.043.
[62] S.C. Pang, M.A. Orgun, Z.Z. Yu, A novel biomedical image indexing and retrieval system via deep preference learning, Comput. Methods Programs Biomed. 158 (2018) 53–69, http://dx.doi.org/10.1016/j.cmpb.2018.02.003.
[63] G.G. Cabral, A.L.I. Oliveira, One-class classification based on searching for the problem features limits, Expert Syst. Appl. 41 (16) (2014) 7182–7199, http://dx.doi.org/10.1016/j.eswa.2014.05.037.
[64] X.Q. Wang, D. Wei, H. Cheng, et al., Multi-instance learning based on representative instance and feature mapping, Neurocomputing 216 (2016) 790–796, http://dx.doi.org/10.1016/j.neucom.2016.07.055.
[65] A. Belghith, C. Bowd, F.A. Medeiros, et al., Learning from healthy and stable eyes: a new approach for detection of glaucomatous progression, Artif. Intell. Med. 64 (2) (2015) 105–115, http://dx.doi.org/10.1016/j.artmed.2015.04.002.
[66] A.E. Lazzaretti, D.M.J. Tax, H.V. Neto, et al., Novelty detection and multi-class classification in power distribution voltage waveforms, Expert Syst. Appl. 45 (2016) 322–330, http://dx.doi.org/10.1016/j.eswa.2015.09.048.
[67] C.F. Zhang, K.X. Peng, J. Dong, A novel plant-wide process monitoring framework based on distributed Gap-SVDD with adaptive radius, Neurocomputing 350 (2019) 1–12, http://dx.doi.org/10.1016/j.neucom.2019.04.026.
[68] H.T. Lin, C.J. Lin, R.C. Weng, A note on Platt’s probabilistic outputs for support vector machines, Mach. Learn. 68 (2007) 267–276, http://dx.doi.org/10.1007/s10994-007-5018-6.
[69] [dataset], Machine Learning Repository UCI, 2015, http://archive.ics.uci.edu/ml/datasets.html.
[70] J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30, http://dx.doi.org/10.1007/s10846-005-9016-2.