
2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

Banff Center, Banff, Canada, October 5-8, 2017

Imbalanced Data Classification using Complementary Fuzzy Support Vector Machine Techniques and SMOTE
Ratchakoon Pruengkarn, Kok Wai Wong, Chun Che Fung
School of Engineering and Information Technology
Murdoch University
{r.pruengkarn, k.wong, l.fung} @murdoch.edu.au

Abstract— A hybrid sampling technique is proposed by combining the Complementary Fuzzy Support Vector Machine (CMTFSVM) and the Synthetic Minority Oversampling Technique (SMOTE) for handling the imbalanced classification problem. The proposed technique uses an optimised membership function to enhance classification performance and is compared with three different classifiers. The experiments consisted of four standard benchmark datasets and one real world dataset of plant cells. The results revealed that applying CMTFSVM followed by SMOTE provided better results than the other FSVM classifiers for the benchmark datasets. Furthermore, it achieved the best result on the real world dataset, with a G-mean of 0.9589 and an AUC of 0.9598. It can be concluded that the proposed techniques work well with imbalanced benchmark and real world data.

Keywords— Imbalanced data; Complementary Fuzzy Support Vector Machine; Synthetic Minority Oversampling Technique

I. INTRODUCTION

The problem of imbalanced distribution in real world datasets is common. It has drawn significant attention from researchers in the machine learning, data mining and pattern classification disciplines [1]. This issue refers to a situation where the class distribution of the data is unequal between the data classes. A class is considered the majority class, or negative class, when it overwhelms the other class, the minority or positive class. Too few samples in the minority class could lead to false detection or to the data being ignored as noise [2]. These problems exist in various fields such as information retrieval, medical diagnosis and text classification [3]. Data level solutions proposed to handle the class imbalance problem generally undersample negative class instances, oversample positive class instances, or use a combination of the two approaches in order to deal with the skewed class distribution of the training data.

This study focuses on applying the Complementary technique (CMT), the Fuzzy Support Vector Machine (FSVM) and the Synthetic Minority Oversampling Technique (SMOTE) to address the imbalanced classification problem. Three classifiers are implemented to compare the classification performance of the proposed approach. The performance is evaluated in terms of the Geometric mean (G-mean) and the Area Under the receiver operating characteristic Curve (AUC).

The structure of the paper is as follows. Section II reviews previous research on learning from imbalanced data. Section III presents the concept of the proposed technique and the evaluation measures of the classifiers. The characteristics of the data used in the experiments and the experimental process are presented in Section IV. Section V contains the classification performance results for the benchmark and real world datasets. Section VI provides the conclusion of this study.

II. RELATED WORK

Modifying the data distribution of the training set is one of the solutions for handling imbalanced data classification problems. Undersampling and oversampling approaches are commonly used to deal with the distribution of the datasets. Undersampling methods decrease the number of majority class instances, while oversampling methods replicate minority class instances from existing instances in order to achieve an equitable class distribution. Hybrid methods are alternatives which combine both undersampling and oversampling techniques in order to balance the dataset.

Regarding undersampling, Random Undersampling (RU) is a simple approach in which majority samples are randomly selected for use as training data. Wilson's Edited Nearest Neighbor rule (ENN) [4] removes a majority sample when two of its three nearest neighbors are minority samples. Tomek Links [5] eliminates pairs of samples which belong to different classes and are each other's nearest neighbors. Inverse Random Under Sampling (IRUS) [6] undersamples the majority class in order to create a large number of distinct training sets, each of which can then find a decision boundary separating the minority class from the majority class.

Among oversampling techniques, the Synthetic Minority Oversampling Technique (SMOTE) [7] generates synthetic minority samples without taking neighbouring examples of other classes into consideration. The Random Walk Over-Sampling approach (RWO-Sampling) [8] creates synthetic samples by randomly walking from the real data, expanding the minority class boundary after the synthetic samples have been generated. Borderline-SMOTE [9] generates minority class instances near the class boundaries.

In general, oversampling techniques perform better than undersampling techniques in terms of the AUC measurement. However, combining both sampling techniques produces better results than applying undersampling or oversampling alone [10][11][12].
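As a concrete illustration of the interpolation step that SMOTE [7] performs, the following minimal sketch generates synthetic minority samples between each selected minority instance and one of its k nearest minority neighbours. It is written in Python with NumPy; the function name smote_oversample and its parameters are illustrative, not taken from any particular SMOTE implementation.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE-style oversampling sketch.

    X_min : array of minority-class samples, shape (n_min, n_features)
    n_new : number of synthetic samples to generate
    k     : number of nearest minority neighbours to interpolate towards
    """
    rng = np.random.default_rng(rng)
    n_min = len(X_min)
    # Pairwise distances between minority samples (brute force, for clarity).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    # Indices of the k nearest minority neighbours of each minority sample.
    nn = np.argsort(d, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n_min)           # pick a minority sample
        nb = X_min[rng.choice(nn[i])]     # pick one of its minority neighbours
        gap = rng.random()                # interpolation factor in [0, 1)
        synthetic[j] = X_min[i] + gap * (nb - X_min[i])
    return synthetic
```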



It was observed in past research that the SVM technique is particularly sensitive to noise and abnormal values, which can indirectly affect the classification accuracy. For this reason, FSVM has been introduced to eliminate the impact of noise and abnormal values [13]. Several studies have applied fuzzy theory to the classifier in order to obtain a better recognition rate for the minority classes [14][15]. Furthermore, assigning different fuzzy membership values to the training instances can handle both imbalanced outlier and noise problems [16]. In previous work by the authors, fuzzy theory was combined with the complementary technique, called CMTFSVM, in order to eliminate outliers and noise, and the technique was also compared with the CMTNN technique [17]. The results indicated that CMTFSVM was robust and outperformed the CMTNN technique. Therefore, this study continues the investigation of the CMTFSVM and SMOTE techniques on skewed imbalanced data and works with a larger dataset. CMTFSVM is applied as an undersampling technique, and SMOTE is used as an oversampling technique. In order to evaluate the classification performance on the imbalanced datasets, the G-mean and AUC are used to assess the performance of the classifier on both the minority and the majority class.

III. PROPOSED METHOD

This section describes the Complementary Fuzzy Support Vector Machine, the proposed model and the classification performance evaluation.

A. Complementary Fuzzy Support Vector Machine (CMTFSVM) Concept

The Complementary Fuzzy Support Vector Machine (CMTFSVM) applies the complementary (CMT) concept [18] to the truth target outputs, using a Fuzzy Support Vector Machine (FSVM) as the classifier to identify uncertain data. CMTFSVM consists of a truth model and a falsity model. The architecture of the falsity model is the same as that of the truth model, except that the target outputs of the falsity model are the complement of those of the truth model. The truth model is trained with truth memberships, whereas the falsity model is trained to predict with falsity memberships. The memberships are exponentially decaying fuzzy membership values based on the distance from the actual hyperplane [16]. The optimised truth and falsity fuzzy membership values which provide the best results are chosen automatically. The prediction outputs on the training data obtained with the truth fuzzy membership values and with the falsity fuzzy membership values are then compared. The difference between the two predictions indicates the uncertain instances, and this can be used as an undersampling technique that eliminates the misclassified instances (M_Truth, M_Falsity) from the training dataset (T), as shown in Fig. 1.

There are two variants of the undersampling technique: CMTFSVM1 and CMTFSVM2. The CMTFSVM1 training dataset is constructed by eliminating every sample misclassified by either the truth or the falsity model, whereas the misclassified instances in CMTFSVM2 are eliminated only if they appear in both the truth and the falsity model. The new CMTFSVM1 and CMTFSVM2 training datasets are defined as follows:

$CMTFSVM_1 = T - (M_{Truth} \cup M_{Falsity})$ (1)

$CMTFSVM_2 = T - (M_{Truth} \cap M_{Falsity})$ (2)

The pseudocode of the CMTFSVM technique is shown in Fig. 2.

Fig. 1. CMTFSVM structure design.

Fig. 2. Pseudocode of the CMTFSVM undersampling technique.
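To make the two elimination rules in (1) and (2) concrete, the following sketch shows one possible realisation of the CMTFSVM undersampling step, assuming scikit-learn's SVC with per-sample weights standing in for a full FSVM; the function name cmt_undersample and its arguments are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def cmt_undersample(X, y, truth_weights, falsity_weights, variant=1):
    """Sketch of CMT undersampling following equations (1) and (2).

    X : training features; y : 0/1 integer labels.
    The truth model is trained on the original labels y, the falsity model
    on the complemented labels (1 - y).  Instances misclassified by the
    models are removed: by either model (variant 1, the union of eq. 1)
    or by both models (variant 2, the intersection of eq. 2).
    """
    truth = SVC(kernel="rbf").fit(X, y, sample_weight=truth_weights)
    falsity = SVC(kernel="rbf").fit(X, 1 - y, sample_weight=falsity_weights)

    # Misclassified index sets M_Truth and M_Falsity on the training data.
    m_truth = truth.predict(X) != y
    m_falsity = (1 - falsity.predict(X)) != y   # map falsity output back

    remove = (m_truth | m_falsity) if variant == 1 else (m_truth & m_falsity)
    keep = ~remove
    return X[keep], y[keep]
```

Whether the weighted SVC above adequately stands in for the FSVM of [16] depends on the membership values supplied; the set logic of (1) and (2) is the part this sketch is meant to illustrate.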
B. Assigning Fuzzy Membership Values

Membership values, or weights, are assigned to each training example to represent its importance to its own class, i.e. to reflect the within-class importance, in order to suppress the effect of outliers and noise [16][19].

Let m_i^+ represent the membership value of a positive class example x_i^+, and m_i^- the membership value of a negative class example x_i^- in its own class. The membership functions are defined as follows:

$m_i^+ = f(x_i^+)\, r^+$ (3)

$m_i^- = f(x_i^-)\, r^-$ (4)

where f(x_i) reflects the importance of x_i in its own class and takes a value between 0 and 1, and r^+ and r^- reflect the class imbalance, such that r^+ > r^-. The membership value of a positive class example therefore lies in the interval [0, r^+], while the membership value of a negative class example lies in the interval [0, r^-], where r < 1.

In order to reflect the class imbalance, r^+ = 1 and r^- = r are assigned to the fuzzy memberships of the training examples.

The within-class importance function f(x_i) is defined by an exponentially decaying function based on the distance from the actual hyperplane, as follows:

$f_{exp}(x_i) = \dfrac{2}{1 + \exp(\beta\, d_i^{hyp})}$ (5)

where β ∈ [0, 1] is the steepness of the decay and d_i^hyp is the functional margin of each example x_i, which is equivalent to the absolute value of the SVM decision value and is defined in (6):

$d_i^{hyp} = y_i\,(\omega \cdot \Phi(x_i) + b)$ (6)

Higher membership values are assigned to examples closer to the actual separating hyperplane, which are treated as more informative, while lower membership values are assigned to the less informative examples that lie far away from the separating hyperplane.

In this study, the decay value β steps from 0 to 1 in increments of 0.1, so there are 10 possible fuzzy membership functions. The optimised fuzzy membership function is chosen as the one that gives the highest value on the G-mean measurement.
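As an illustration of how (5), (6) and the β search could be realised, the sketch below computes exponentially decaying memberships from SVM decision values using scikit-learn. The function name exp_decay_memberships, the label convention (1 = positive/minority) and the default r are assumptions for the sketch only, and g_mean_score in the commented grid search is a hypothetical stand-in for the evaluation of Section III.D.

```python
import numpy as np
from sklearn.svm import SVC

def exp_decay_memberships(X, y, beta, r=0.5):
    """Memberships from equations (3)-(6), sketched with scikit-learn.

    A plain SVM is first fitted to obtain the separating hyperplane; the
    functional margin d_i^hyp is its decision value (eq. 6), and the
    within-class importance f_exp decays exponentially with that distance
    (eq. 5).  Positive (minority) examples are scaled by r+ = 1 and
    negative (majority) examples by r- = r < 1 (eqs. 3 and 4).
    """
    d_hyp = np.abs(SVC(kernel="rbf").fit(X, y).decision_function(X))
    f_exp = 2.0 / (1.0 + np.exp(beta * d_hyp))          # equation (5)
    return np.where(y == 1, f_exp * 1.0, f_exp * r)     # equations (3), (4)

# Grid search over the decay steepness beta, selecting by G-mean:
# best_beta = max(np.arange(0.0, 1.01, 0.1),
#                 key=lambda b: g_mean_score(X, y, exp_decay_memberships(X, y, b)))
```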
C. Proposed Method

In previous studies [20][12], the complementary technique was applied to neural networks in order to perform data cleaning and handle imbalanced data on benchmark datasets. In addition, the application of the fuzzy concept has been investigated [17], and the results showed that combining the complementary technique with the fuzzy technique outperformed the neural network. In this study, the combination of the complementary fuzzy support vector machine and SMOTE oversampling is investigated, with a focus on addressing the imbalanced problem in both benchmark data and real world data.

CMTSMT1 and CMTSMT2 are proposed by applying the CMTFSVM1 and CMTFSVM2 undersampling techniques respectively, followed by the SMOTE oversampling technique, as shown in Fig. 3.

SMTCMT1 and SMTCMT2 are another proposed approach, in which CMTFSVM1 or CMTFSVM2 undersampling is executed after the SMOTE oversampling technique, as shown in Fig. 4.

Fig. 3. Undersampling with CMTFSVM then oversampling with SMOTE.

Fig. 4. Oversampling with SMOTE then undersampling both classes with CMTFSVM.

D. Classification Performance Evaluation

The evaluation criterion is a key factor in assessing classification performance. The confusion matrix is a typical measurement based on recording the correctly and incorrectly recognised examples of each class in a binary classification problem, and the accuracy rate is commonly used for measurement [2]. However, predictive accuracy may not be appropriate for imbalanced data; for example, the predictive accuracy would be 98% when the minority class holds only 2% of the data. There are other measurements that balance the accuracy between the minority and majority classes. Hence, the G-mean and the AUC are applied to deal with the class imbalance problem.

The G-mean [3] is a standard measure for a classifier on an imbalanced dataset. It balances the performance between the majority and minority classes, and is defined as the square root of the product of the true positive rate and the true negative rate, as shown in equation (7):

$G\text{-}mean = \sqrt{\dfrac{TP}{TP+FN} \times \dfrac{TN}{TN+FP}}$ (7)

where the True Positives (TP) and the True Negatives (TN) are the correctly classified positive and negative instances, respectively, while the False Positives (FP) and the False Negatives (FN) are the misclassified negative and positive instances.

The AUC [2] is a well-known approach for evaluating classification performance as a trade-off between the benefit (TPrate = TP/(TP+FN)) and the cost (FPrate = FP/(FP+TN)). A classifier whose ROC curve lies above the diagonal performs better than a random classifier, and it cannot increase the number of true positives without also increasing the number of false positives.
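For reference, the sketch below computes the G-mean of (7) from a confusion matrix and the AUC from predicted scores using scikit-learn; it is a straightforward restatement of the formulas above rather than anything specific to the proposed method, and assumes 0/1 labels with 1 as the positive (minority) class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def g_mean(y_true, y_pred):
    """G-mean of equation (7): sqrt(TP/(TP+FN) * TN/(TN+FP))."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    tnr = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    return np.sqrt(tpr * tnr)

def auc(y_true, y_score):
    """Area under the ROC curve from continuous decision scores."""
    return roc_auc_score(y_true, y_score)
```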
IV. EXPERIMENTAL DESIGN

This section describes the dataset characteristics and the experimental steps used in this study.

A. Data Characteristics

This study focuses on binary classification. Four benchmark datasets obtained from the KEEL [21] and UCI [22] repositories are used. A real world dataset on the detection of DNA methylation in plant cells [23] is also used to assess the performance of the proposed methodology. The attributes of the datasets are mainly numeric and discrete.

Some of the nominal attributes are converted to numeric type during the preprocessing phase.

Table I summarises the characteristics of each dataset: the number of instances (#instances), number of attributes (#attributes), number of minority samples (#minority), number of majority samples (#majority), imbalance ratio (#IR), percentage of minority class (%minority), percentage of majority class (%majority), number of training instances (#training) and number of testing instances (#testing).

TABLE I. SUMMARY OF IMBALANCED DATASETS IN THE EXPERIMENTS

Characteristics | German | Adult  | Yeast3 | Glass5 | Gnome
Source          | UCI    | UCI    | KEEL   | KEEL   | REAL-WORLD
#instances      | 1,000  | 45,222 | 1,484  | 214    | 3,375
#attributes     | 21     | 15     | 9      | 10     | 10
#minority       | 300    | 11,208 | 163    | 9      | 20
#majority       | 700    | 34,014 | 1,321  | 205    | 3,355
#IR             | 2.33   | 3.03   | 8.10   | 22.78  | 167.75
%minority       | 30.00  | 24.78  | 10.98  | 4.21   | 0.59
%majority       | 70.00  | 75.22  | 89.02  | 95.79  | 99.41
#training       | 800    | 36,178 | 1,187  | 171    | 2,700
#testing        | 200    | 9,044  | 297    | 43     | 675
B. Experimental Processes

The experiments are conducted using a 10-fold cross-validation strategy. Each dataset consists of ten folds with the same number of instances, and the instances are rearranged in each fold to obtain reliable results. The dataset is divided into an 80% training set and a 20% testing set.

A set of classifiers used in the experiments is defined in the first phase of the research (a rough scikit-learn equivalent of these settings is sketched after the list):

• Since Neural Networks (NN) can accurately predict outcomes for complex problems [24], a feed-forward backpropagation network is created with one hidden layer of 12 nodes, a log-sigmoid transfer function in the hidden layer, the gradient descent with momentum backpropagation training function, 5000 training epochs and a learning rate of 0.5.

• The Support Vector Machine (SVM) is primarily set to maximise the margin so that input patterns are classified correctly [25]. An SVM with a Radial Basis Function (RBF) kernel is used, with a degree of 3 in the kernel function.

• The Fuzzy Support Vector Machine (FSVM) has been proposed to handle outlier and noise problems [16]. Fuzzy memberships, or weights, are computed using exponential decay, and the optimal memberships are selected using the highest G-mean value.
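A rough equivalent of the three classifier configurations above is sketched below using scikit-learn for the NN and SVM baselines. The paper does not state which toolbox the authors used, so the hyperparameter mapping (notably the momentum value and the FSVM weighting) is an assumption; exp_decay_memberships refers to the earlier sketch in Section III.B.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Feed-forward backpropagation NN: one hidden layer of 12 log-sigmoid nodes,
# gradient descent with momentum (0.9 assumed), 5000 epochs, learning rate 0.5.
nn = MLPClassifier(hidden_layer_sizes=(12,), activation="logistic",
                   solver="sgd", momentum=0.9, learning_rate_init=0.5,
                   max_iter=5000)

# SVM with an RBF kernel; the 'degree' parameter is kept at the stated value,
# although scikit-learn ignores it for the RBF kernel.
svm = SVC(kernel="rbf", degree=3)

# FSVM stand-in: a weighted SVM whose per-sample weights come from the
# exponential-decay memberships of Section III.B, e.g.
# fsvm = SVC(kernel="rbf").fit(
#     X_train, y_train,
#     sample_weight=exp_decay_memberships(X_train, y_train, beta))
```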
The four major sampling techniques are applied to the datasets: 1) undersampling, 2) undersampling then oversampling, 3) oversampling, and 4) oversampling then undersampling. The details of each sampling technique are described as follows (a code sketch of the two hybrid pipelines is given after Fig. 5):

• Undersampling: the original training datasets are divided into majority sets and minority sets. The CMT technique is applied to the majority instances only in order to undersample the training dataset. The CMT1 and CMT2 majority instances are generated and then combined with the minority instances to create the new training dataset.

• Oversampling: the original training datasets are divided into majority sets and minority sets. The minority instances are oversampled to the number of majority instances using the SMOTE technique; this configuration is called SMT.

• Undersampling then oversampling: the CMT undersampling technique is first applied to the majority instances in the training data, and SMOTE oversampling is then applied to the minority instances up to the number of undersampled majority instances. These configurations are called the CMTSMT1 and CMTSMT2 approaches.

• Oversampling then undersampling: the minority instances are first oversampled with the SMOTE algorithm to the number of original majority instances. The oversampled minority instances are then combined with the original majority instances, and the CMTFSVM undersampling technique is subsequently applied to both classes of the new dataset. These configurations are called the SMTCMT1 and SMTCMT2 approaches.

Finally, the four major sampling techniques are classified by the set of classifiers and evaluated using the G-mean and AUC measurements. The overall experimental process is shown in Fig. 5.

Fig. 5. The overall experimental processes.
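Putting the pieces together, the two hybrid strategies can be sketched as below, reusing the hypothetical cmt_undersample, smote_oversample and exp_decay_memberships helpers from the earlier sketches (weights_fn could be, for example, lambda X, y: exp_decay_memberships(X, y, beta=0.5)). The plain undersampling and oversampling strategies correspond to the first and second halves of these pipelines taken alone; the 10-fold loop and classifier training are omitted.

```python
import numpy as np

def cmtsmt_pipeline(X_maj, X_min, weights_fn, variant=2):
    """CMTSMT: CMT undersampling first, then SMOTE oversampling (Fig. 3)."""
    X = np.vstack([X_maj, X_min])
    y = np.concatenate([np.zeros(len(X_maj), int), np.ones(len(X_min), int)])
    w = weights_fn(X, y)
    # Clean the training set with the CMT rule of Section III.A.
    X_c, y_c = cmt_undersample(X, y, w, w, variant=variant)
    maj_clean, minority = X_c[y_c == 0], X_c[y_c == 1]
    # SMOTE the minority class up to the size of the cleaned majority class.
    synth = smote_oversample(minority, len(maj_clean) - len(minority))
    X_new = np.vstack([maj_clean, minority, synth])
    y_new = np.concatenate([np.zeros(len(maj_clean), int),
                            np.ones(len(minority) + len(synth), int)])
    return X_new, y_new

def smtcmt_pipeline(X_maj, X_min, weights_fn, variant=2):
    """SMTCMT: SMOTE first, then CMT undersampling of both classes (Fig. 4)."""
    synth = smote_oversample(X_min, len(X_maj) - len(X_min))
    X = np.vstack([X_maj, X_min, synth])
    y = np.concatenate([np.zeros(len(X_maj), int),
                        np.ones(len(X_min) + len(synth), int)])
    w = weights_fn(X, y)
    return cmt_undersample(X, y, w, w, variant=variant)
```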

V. RESULTS AND DISCUSSION

Experiments are conducted to test the classification performance of the proposed approaches based on the G-mean and AUC. The proposed techniques are compared across three major classifiers: NN, SVM and FSVM. The four datasets from KEEL and UCI, and a real world dataset, are used to evaluate the proposed methods. The results are shown below.

A. Benchmark datasets

As can be seen from Table II, all classifiers classify reasonably well on the original datasets which have a low IR, such as German and Adult, and the FSVM classifier performs better than NN and SVM. On the other hand, the classification performance is affected by the increase of the IR. For Yeast3, NN is only able to label a few of the minority instances compared to SVM and FSVM; however, no significant difference is found between SVM and FSVM. For the Glass5 dataset, with an IR of 22.78, the classification performance of the NN and SVM classifiers is the same, with a G-mean of 0.0000 and an AUC of 0.5000. It is apparent that the NN and SVM classifiers cannot recognise the minority instances and identify majority instances only. In contrast, FSVM can classify both the majority and the minority instances on this imbalanced dataset, with a G-mean of 0.1000 and an AUC of 0.5500. In brief, the FSVM classifier performs better than the NN and SVM classifiers on the original imbalanced datasets.

TABLE II. COMPARING THREE DIFFERENT CLASSIFIERS FOR HANDLING ORIGINAL IMBALANCED DATASETS

Original datasets | NN G-mean | NN AUC | SVM G-mean | SVM AUC | FSVM G-mean | FSVM AUC
German | 0.6311 | 0.6717 | 0.6371 | 0.6820 | 0.7328 | 0.7386
Adult  | 0.7058 | 0.7318 | 0.7183 | 0.7443 | 0.7364 | 0.7414
Yeast3 | 0.2584 | 0.5935 | 0.7773 | 0.8009 | 0.7749 | 0.7994
Glass5 | 0.0000 | 0.5000 | 0.0000 | 0.5000 | 0.1000 | 0.5500

For the German dataset, CMT is implemented as the undersampling technique and it improves the classification performance of the NN and SVM classifiers. By combining the CMT and SMOTE techniques, CMTSMT2 performs better for all classifiers. The FSVM classifier performs best on the German testing data, with a G-mean of 0.7367 and an AUC of 0.7388.

TABLE III. EXPERIMENTAL RESULTS OF GERMAN DATASET

Approaches | NN G-mean | NN AUC | SVM G-mean | SVM AUC | FSVM G-mean | FSVM AUC
CMT1    | 0.7129 | 0.7242 | 0.7080 | 0.7202 | 0.7244 | 0.7313
CMT2    | 0.7177 | 0.7294 | 0.7090 | 0.7226 | 0.7262 | 0.7358
SMOTE   | 0.4525 | 0.5346 | 0.7244 | 0.7299 | 0.7294 | 0.7358
CMTSMT1 | 0.6191 | 0.6385 | 0.7223 | 0.7249 | 0.7141 | 0.7187
CMTSMT2 | 0.6486 | 0.6605 | 0.7227 | 0.7253 | 0.7367 | 0.7388
SMTCMT1 | 0.5559 | 0.5919 | 0.7282 | 0.7312 | 0.7331 | 0.7354
SMTCMT2 | 0.6508 | 0.6692 | 0.7278 | 0.7309 | 0.7338 | 0.7361

For the case of the Adult dataset in Table IV, the CMT undersampling technique enhances the classification performance by approximately 6% in G-mean for all classifiers. All classifiers also improve when only the SMOTE oversampling technique is applied. Furthermore, CMTSMT2 generates the best result over the other methods, by approximately 8% in G-mean and AUC, and FSVM provides the best results of the three classifiers, with a G-mean of 0.8040 and an AUC of 0.8052.

As regards the Yeast3 dataset in Table V, SMOTE oversampling provides the best results for SVM and FSVM, and FSVM performs better than the NN and SVM classifiers, with a G-mean of 0.9032 and an AUC of 0.9043. A comparison of the combination techniques reveals that CMTSMT1 gives a higher G-mean and AUC than the CMTSMT2, SMTCMT1 and SMTCMT2 approaches for the SVM and FSVM classifiers, while SMTCMT2 generates good results for the NN classifier. Furthermore, the CMTSMT1 approach increases the classification performance in terms of the G-mean by approximately 13% over the original dataset.

TABLE IV. EXPERIMENTAL RESULTS OF ADULT DATASET

Approaches | NN G-mean | NN AUC | SVM G-mean | SVM AUC | FSVM G-mean | FSVM AUC
CMT1    | 0.7891 | 0.7895 | 0.7993 | 0.7998 | 0.8010 | 0.8017
CMT2    | 0.7785 | 0.7808 | 0.7896 | 0.7923 | 0.7987 | 0.7991
SMOTE   | 0.4511 | 0.5729 | 0.7989 | 0.7993 | 0.7995 | 0.7999
CMTSMT1 | 0.6854 | 0.7121 | 0.8013 | 0.8034 | 0.8012 | 0.8034
CMTSMT2 | 0.7271 | 0.7343 | 0.8039 | 0.8050 | 0.8040 | 0.8052
SMTCMT1 | 0.5100 | 0.5425 | 0.7964 | 0.7966 | 0.7979 | 0.7980
SMTCMT2 | 0.3393 | 0.4465 | 0.7965 | 0.7967 | 0.7982 | 0.7983

TABLE V. EXPERIMENTAL RESULTS OF YEAST3 DATASET

Approaches | NN G-mean | NN AUC | SVM G-mean | SVM AUC | FSVM G-mean | FSVM AUC
CMT1    | 0.4235 | 0.6507 | 0.8119 | 0.8279 | 0.8244 | 0.8380
CMT2    | 0.4563 | 0.6690 | 0.8241 | 0.8263 | 0.8241 | 0.8380
SMOTE   | 0.6708 | 0.7295 | 0.8971 | 0.8981 | 0.9032 | 0.9043
CMTSMT1 | 0.4509 | 0.6548 | 0.9004 | 0.9014 | 0.9026 | 0.9034
CMTSMT2 | 0.6787 | 0.7475 | 0.8966 | 0.8973 | 0.8997 | 0.9006
SMTCMT1 | 0.4666 | 0.6903 | 0.8953 | 0.8962 | 0.9008 | 0.9017
SMTCMT2 | 0.7113 | 0.7882 | 0.8971 | 0.8979 | 0.8988 | 0.8998

TABLE VI. EXPERIMENTAL RESULTS OF GLASS5 DATASET

Approaches | NN G-mean | NN AUC | SVM G-mean | SVM AUC | FSVM G-mean | FSVM AUC
CMT1    | 0.0000 | 0.5000 | 0.0000 | 0.5000 | 0.0000 | 0.5000
CMT2    | 0.0000 | 0.5000 | 0.0000 | 0.5000 | 0.1000 | 0.5500
SMOTE   | 0.5465 | 0.7173 | 0.9625 | 0.9634 | 0.9625 | 0.9634
CMTSMT1 | 0.5811 | 0.7521 | 0.9637 | 0.9646 | 0.9625 | 0.9634
CMTSMT2 | 0.5274 | 0.6830 | 0.9625 | 0.9634 | 0.9638 | 0.9646
SMTCMT1 | 0.2887 | 0.6319 | 0.9474 | 0.9490 | 0.9475 | 0.9490
SMTCMT2 | 0.5483 | 0.7478 | 0.9474 | 0.9490 | 0.9475 | 0.9490

Table VI presents the experimental results on the Glass5 dataset, where the undersampling CMT1 and CMT2 techniques are applied to the original data. With the CMT1 technique, the classifiers can identify only the majority class, which implies that removing every majority sample misclassified by either the truth or the falsity model is not appropriate for highly imbalanced data. In contrast, eliminating only the misclassified majority samples that appear in both the truth and falsity models, as in the CMT2 technique, helps to recognise the minority class. The experiments using the SMOTE oversampling technique show that all classifiers recognise both the majority and the minority samples, with SVM and FSVM giving similar results on G-mean and AUC of 0.9625 and 0.9634, respectively. CMTSMT1 and CMTSMT2, which apply SMOTE oversampling after the CMT undersampling technique, give better results; the best classification performance in this experiment is generated by CMTSMT2, with a G-mean of 0.9638 and an AUC of 0.9646. In contrast, there is no significant difference in the results between SMTCMT1 and SMTCMT2 for FSVM and SVM.

B. Real World dataset

The Gnome dataset is real world data with a highly skewed class distribution, used to identify hypermethylated and hypomethylated genes in plant cells [23]. It can be observed from Table VII that the classifiers can only recognise the majority instances in the original dataset, with a G-mean of 0.0000 and an AUC of 0.5000.

The proposed method is applied to the Gnome dataset. By executing only the CMT undersampling technique, all classifiers can label only the majority class, the same as for the original dataset, whereas using SMOTE oversampling generates reasonable results for all classifiers. On further investigation of applying SMOTE after CMT, the highest classification performance belongs to CMTSMT2 with the FSVM classifier, with a G-mean of 0.9589 and an AUC of 0.9598. However, applying CMT after SMOTE shows similar results for both the FSVM and SVM classifiers.

TABLE VII. EXPERIMENTAL RESULTS OF GNOME DATASET

Approaches | NN G-mean | NN AUC | SVM G-mean | SVM AUC | FSVM G-mean | FSVM AUC
Original | 0.0000 | 0.5000 | 0.0000 | 0.5000 | 0.0000 | 0.5000
CMT1     | 0.0000 | 0.5000 | 0.0000 | 0.5000 | 0.0000 | 0.5000
CMT2     | 0.0000 | 0.5000 | 0.0000 | 0.5000 | 0.0000 | 0.5000
SMOTE    | 0.6569 | 0.7598 | 0.9470 | 0.9482 | 0.9588 | 0.9596
CMTSMT1  | 0.5633 | 0.7075 | 0.9368 | 0.9381 | 0.9589 | 0.9598
CMTSMT2  | 0.5950 | 0.7128 | 0.9470 | 0.9482 | 0.9589 | 0.9598
SMTCMT1  | 0.6859 | 0.7462 | 0.9577 | 0.9586 | 0.9576 | 0.9585
SMTCMT2  | 0.7866 | 0.7921 | 0.9579 | 0.9588 | 0.9577 | 0.9586

VI. CONCLUSION

This study proposed a combination of the Complementary Fuzzy Support Vector Machine (CMTFSVM) and the Synthetic Minority Oversampling Technique (SMOTE) to address the imbalanced classification problem. The combination was compared across three different classifiers: NN, SVM and FSVM. The four benchmark datasets from the KEEL and UCI repositories are characterised by various imbalance ratios, and the proposed technique was also applied to a real world dataset used to identify hypermethylated and hypomethylated genes in plant cells. The results revealed that implementing only undersampling techniques on this dataset cannot improve the classifier performance much when the imbalance ratios are high. On the other hand, applying only the oversampling technique could identify both the majority and the minority class for all classifiers. However, the CMT undersampling technique followed by the oversampling technique presented the best results over the other techniques on most of the datasets. Furthermore, the FSVM classifier is robust for handling the imbalanced problem, as shown in this study, and it performs well with almost all the sampling techniques.

REFERENCES

[1] S. M. A. Elrahman and A. Abraham, "A Review of Class Imbalance Problem," Journal of Network and Innovative Computing, vol. 1, pp. 332-340, 2013.
[2] V. López, A. Fernández, S. García, V. Palade, and F. Herrera, "An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics," Information Sciences, vol. 250, pp. 113-141, 2013.
[3] D. Ramyachitra and P. Manikandan, "Imbalanced Dataset Classification and Solutions: A Review," International Journal of Computing and Business Research (IJCBR), vol. 5, 2014.
[4] D. L. Wilson, "Asymptotic Properties of Nearest Neighbor Rules Using Edited Data," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-2, pp. 408-421, 1972.
[5] I. Tomek, "Two Modifications of CNN," IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-6, pp. 769-772, 1976.
[6] M. A. Tahir, J. Kittler, and F. Yan, "Inverse random under sampling for class imbalance problem and its application to multi-label classification," Pattern Recognition, vol. 45, pp. 3738-3750, 2012.
[7] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
[8] H. Zhang and M. Li, "RWO-Sampling: A random walk over-sampling approach to imbalanced data classification," Information Fusion, vol. 20, pp. 99-116, 2014.
[9] H. Han, W.-Y. Wang, and B.-H. Mao, "Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning," presented at the International Conference on Intelligent Computing (ICIC 2005), Hefei, China, 2005.
[10] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets, vol. 6, pp. 20-29, 2004.
[11] S. Cateni, V. Colla, and M. Vannucci, "A method for resampling imbalanced datasets in binary classification tasks for real-world problems," Neurocomputing, vol. 135, pp. 32-41, 2014.
[12] P. Jeatrakul, K. W. Wong, and C. C. Fung, "Classification of Imbalanced Data by Combining the Complementary Neural Network and SMOTE Algorithm," in 17th International Conference on Neural Information Processing (ICONIP), Sydney, 2010.
[13] X. Fan and Z. He, "A Fuzzy Support Vector Machine for Imbalanced Data Classification," presented at the 2010 International Conference on Optoelectronics and Image Processing (ICOIP), Haikou, 2010.
[14] A. Ralescu and S. Visa, "Fuzzy classifiers versus cost-based Bayes classifiers," Montreal, Que., 2006, pp. 302-305.
[15] S. Visa and A. Ralescu, "Fuzzy Classifiers for Imbalanced, Complex Classes of Varying Size," in The International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, 2004, pp. 393-400.
[16] R. Batuwita and V. Palade, "FSVM-CIL: Fuzzy Support Vector Machines for Class Imbalance Learning," IEEE Transactions on Fuzzy Systems, vol. 18, pp. 558-571, 2010.
[17] R. Pruengkarn, K. W. Wong, and C. C. Fung, "Data Cleaning Using Complementary Fuzzy Support Vector Machine Technique," presented at the 23rd International Conference on Neural Information Processing (ICONIP 2016), Kyoto, Japan, 2016.
[18] P. Kraipeerapun, C. C. Fung, and S. Nakkrasae, "Porosity Prediction Using Bagging of Complementary Neural Networks," presented at the Sixth International Symposium on Neural Networks (ISNN 2009), Wuhan, China, 2009.
[19] S. Abe and T. Inoue, "Fuzzy Support Vector Machines for Multiclass Problems," in European Symposium on Artificial Neural Networks, Bruges, Belgium, 2002, pp. 113-118.
[20] P. Jeatrakul, K. W. Wong, and C. C. Fung, "Data cleaning for classification using misclassification analysis," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 14, pp. 297-302, 2010.
[21] J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L. Sánchez, et al., "KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework," Journal of Multiple-Valued Logic and Soft Computing, vol. 17, pp. 255-287, 2011.
[22] M. Lichman, "UCI Machine Learning Repository," University of California, Irvine, School of Information and Computer Sciences, 2013.
[23] R. Lister, R. C. O'Malley, J. Tonti-Filippini, B. D. Gregory, C. C. Berry, A. H. Millar, and J. R. Ecker, "Highly integrated single-base resolution maps of the epigenome in Arabidopsis," Cell, vol. 133, pp. 523-536, 2008.
[24] J. V. Tu, "Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes," Journal of Clinical Epidemiology, vol. 49, pp. 1225-1231, 1996.
[25] S. Karamizadeh, S. M. Abdullah, M. Halimi, J. Shayan, and M. J. Rajabi, "Advantage and drawback of support vector machine functionality," presented at the International Conference on Computer, Communications, and Control Technology (I4CT 2014), Langkawi, Kedah, Malaysia, 2014.

