Figure 1: Illustration of the framework of our proposed iterative metric learning for imbalance data classification.
Experimental results and discussions are provided in Section 4. Finally, we conclude this paper in Section 5.

2 Related Works

Most traditional classification works assume that the data space is balanced and sufficient. Thus, many traditional machine learning algorithms have been widely employed to construct classifiers, such as Support Vector Machines (SVM) [Yan et al., 2010] and Bayesian networks [Jeet et al., 2011]. In practice, however, the imbalance issue exists in many classification applications where the amount of minority-class data is limited, such as medical diagnosis and fraud detection. Due to the specific characteristics of imbalanced data, traditional classifiers need to be further modified.

In order to increase the accuracy of imbalanced data classifiers, many research works have focused on the imbalanced classification field. For instance, Mesquita et al. [Mesquita et al., 2016] incorporated a reject option into the weighted extreme learning machine and proposed rejoELM. Based on rejoELM, they further proposed its variant, IrejoELM, which allows the model to postpone the final decision on non-classified modules. Moreover, Sun et al. [Sun et al., 2012] proposed a method that transforms the binary classification task into a multi-class classification task and can effectively handle highly imbalanced data. Recently, cost-sensitive learning has been widely applied to the imbalance problem. Tomar and Agarwal [Tomar and Agarwal, 2016] proposed a method named WLSTSVM, which improves performance by assigning a higher misclassification cost to the minority class. Zhang et al. [Zhang et al., 2017] developed a nonnegative sparse graph by label propagation and also took information from copious unlabeled data into consideration.

The performance of a classifier largely depends on the quality of the data space, and some research works concerned with data distribution demonstrate that multi-metric approaches limit the performance of the classifier due to their abundant unrelated metrics. Thus, it is meaningful to construct a more effective data space for classification. Wu et al. [Wu et al., 2017] introduced the idea of multiset feature learning (MFL) into imbalanced learning and provided a multiple-sets construction strategy that incorporates a cost-sensitive factor into MFL. Bader-El-Den et al. [Bader-El-Den et al., 2016] took advantage of the local neighborhood of the minority instances to identify and treat difficult examples belonging to the same classes in order to solve the imbalance issue. Bennin et al. [Bennin et al., 2017] interpreted two distinct sub-classes as parents and generated new instances that inherit different traits from each parent, contributing to the diversity within the data distribution.

Although there are many works concentrating on imbalanced data classification, most existing methods fail to take the limited size of the training data into account. Limited training data may hinder the development of an imbalanced data classifier. Therefore, many research works focus on obtaining more information from limited training data for classification models. For instance, a semi-supervised learning method was introduced in [Zhang et al., 2017], which utilized the relationships among training data and then constructed a label propagation method based on these relationships. Yang et al. [Yang et al., 2015] proposed a learning-to-rank method which directly optimizes the ranking performance and constructs software defect prediction models according to the ranking information.

3 Iterative Metric Learning for Imbalance Data Classification

3.1 Overview of the Method

In this section, we introduce our iterative metric learning method for imbalanced data classification. The main objective of our work is to explore a stable neighborhood space for a testing sample under the scenario of imbalanced training data. In our method, we propose to find a more stable neighborhood space for the testing data using an iterative metric learning strategy, which consists of three components: iterative metric learning, iterative sample selection, and training sample matching.
In the first stage, due to the absence of prior knowledge, most classifiers, such as KNN, use the Euclidean distance to measure the similarities between samples and construct their models. In reality, however, the Euclidean distance metric is incapable of describing the statistical regularities of the training data [Weinberger and Saul, 2009a]. To solve this problem, we utilize metric learning, which has the ability to reduce distances between samples of the same category and increase distances between samples of different categories. In the new data space, nearby samples are then more likely to have the same labels.

In the second stage, considering that the training data is imbalanced and usually unrelated, we conduct a sample selection process to select the most relevant data according to the testing data. After the sample selection process, a subset of training data that lies nearest to the testing data is selected to construct the training data space. In this way, the neighborhood training samples are more relevant to the testing data, and the imbalance issue can be alleviated in this step.

Considering that a single metric learning pass may not transform the data into the most effective data space, we optimize the neighborhood space for the testing samples by iteratively executing the metric learning process and the sample selection process. We stop the iteration by comparing the currently selected training samples with the previously selected ones: if the selected samples are relatively stable, for instance, if the samples selected in two adjacent iterations are similar, we interrupt the iterative procedure; otherwise, we repeat the above steps.
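To make the alternation between metric learning and sample selection concrete, the following sketch outlines the first two components of the loop described above (the training-sample matching step is omitted). It is only a minimal illustration, not the paper's implementation: the helper functions `learn_metric` and `select_neighbors`, the Jaccard-overlap stability test, and the iteration cap are our own illustrative assumptions.

```python
import numpy as np

def iterative_metric_learning(x_test, X_train, y_train,
                              learn_metric, select_neighbors,
                              stability_threshold=0.95, max_iter=10):
    """Alternate neighbor selection and metric learning until the
    selected training subset stabilizes (illustrative sketch only)."""
    L = np.eye(X_train.shape[1])          # start from the Euclidean metric
    prev_idx, idx = set(), set()
    for _ in range(max_iter):
        # 1) Select the most relevant training samples in the current space.
        idx = set(select_neighbors(x_test @ L.T, X_train @ L.T, y_train))
        # 2) Stop when two adjacent selections are (almost) identical.
        overlap = len(idx & prev_idx) / max(len(idx | prev_idx), 1)
        if overlap >= stability_threshold:
            break
        prev_idx = idx
        # 3) Re-learn the metric on the selected neighborhood only.
        sel = sorted(idx)
        L = learn_metric(X_train[sel], y_train[sel])
    return L, sorted(idx)
```

Any metric learner that returns a linear transformation (e.g., LMNN, as used in Section 3.2) could be plugged in for `learn_metric`, and the selection routine of Section 3.3 for `select_neighbors`.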
3.2 Iterative Metric Learning

In order to improve the accuracy of the imbalanced data classification model, we employ metric learning to modify the structure of the data space. The objective of metric learning is to separate the samples from different classes by a large margin and, simultaneously, to minimize the pairwise distances between samples from the same class. As a result, the newly transformed data space is more effective for label classification. The process is demonstrated in Figure 2. It is worth noting that conducting the metric learning process only once may not find the most effective and stable neighborhood for the testing data. Thus, we propose an iterative metric learning method for data space modification. As we can see in Figure 2, by repeating the metric learning process, the training data space becomes more related to the testing data.

Figure 2: An example of finding a stable neighborhood space for a testing sample.

As for the choice of metric learning algorithm, it is not vital for our method; we select one of the most widely used benchmark metric learning methods, the Large Margin Nearest Neighbor (LMNN) method [Weinberger and Saul, 2009b], to transform the data space. In order to select the appropriate training data space, LMNN constructs a loss function that pulls the samples with similar labels close to the testing sample and pushes training samples with different labels far away from the testing sample. The loss function of LMNN is defined as $\phi(L) = (1-\mu)\,\phi_{pull}(L) + \mu\,\phi_{push}(L)$. The first component of the loss function is defined as $\phi_{pull}(L) = \sum_{i,j} \| L(x_i - x_j) \|^2$, which penalizes large distances between the testing samples and the similarly labeled training samples. In this function, $L$ is the linear transformation of the input space, $x_i$ is the testing sample, and $x_j$ is a training sample with a similar label to the testing sample. The second component of the loss function penalizes small distances between the testing samples and training samples with different labels, and is defined as $\phi_{push}(L) = \sum_{i,j} \sum_{l} (1 - y_{il})\,[\,1 + \| L(x_i - x_j) \|^2 - \| L(x_i - x_l) \|^2\,]_{+}$. The term $[q]_{+} = \max(q, 0)$ denotes the standard hinge loss. In this formula, $y_{il} = 1$ only if $y_i = y_l$, and $y_{il} = 0$ otherwise. LMNN uses positive semidefinite matrices to optimize this objective. The loss can then be rewritten as $\phi(M) = (1-\mu)\sum_{i,j} D_M(x_i, x_j) + \mu \sum_{i,j,l} (1 - y_{il})\,[\,1 + D_M(x_i, x_j) - D_M(x_i, x_l)\,]_{+}$, where $M = L^{T}L$ and $D_M(x_i, x_j) = (x_i - x_j)^{T} M (x_i - x_j)$. The parameter $\mu$ is used to balance the two terms. After obtaining the distance matrix $M$, the data space can be transformed into a new, more distinguishable data space. In the new data space, the neighborhood samples around the testing samples become more related, and the nearby samples are more likely to have the same labels.
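As a concrete illustration of the objective above, the sketch below evaluates $\phi(M)$ for a given positive semidefinite matrix $M$. It is a minimal, unoptimized evaluation of the loss under our own simplifying assumptions (fixed target-neighbor lists, all differently labeled samples treated as impostors), not the LMNN solver itself; the function and variable names are ours.

```python
import numpy as np

def mahalanobis_sq(M, xi, xj):
    """Squared Mahalanobis distance D_M(xi, xj) = (xi - xj)^T M (xi - xj)."""
    d = xi - xj
    return float(d @ M @ d)

def lmnn_loss(M, X, y, target_neighbors, mu=0.5):
    """Evaluate phi(M) = (1 - mu) * pull + mu * push for fixed target neighbors.

    target_neighbors[i] lists the indices j of similarly labeled samples
    that sample i should be pulled towards."""
    pull, push = 0.0, 0.0
    n = len(X)
    for i in range(n):
        for j in target_neighbors[i]:
            d_ij = mahalanobis_sq(M, X[i], X[j])
            pull += d_ij                               # pull term
            for l in range(n):
                if y[l] != y[i]:                       # (1 - y_il) = 1
                    d_il = mahalanobis_sq(M, X[i], X[l])
                    push += max(0.0, 1.0 + d_ij - d_il)  # hinge loss [.]_+
    return (1.0 - mu) * pull + mu * push
```

In LMNN itself, $M$ is then optimized (e.g., by semidefinite programming) so that this loss is minimized; the sketch only evaluates the objective for a candidate $M$.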
3.3 Iterative Training Samples Selection

The second step of our method is iterative training sample selection. In this procedure, considering that neighborhood samples are more likely to have the same labels, for each testing sample we calculate the distances between the training samples and the testing sample and choose the closest training samples. Given a set of testing and training samples $\{\zeta_{train_1}, \zeta_{train_2}, \ldots, \zeta_{train_n}, \zeta_{test_1}, \zeta_{test_2}, \ldots, \zeta_{test_n}\}$, the samples can be represented by the feature vectors $\{x_{train_1}, x_{train_2}, \ldots, x_{train_n}, x_{test_1}, x_{test_2}, \ldots, x_{test_n}\}$. As shown in Figure 3, we use $\zeta$ to denote one of the testing samples, which is represented as a green circle. First, we calculate the distances between the testing sample $\zeta$ and all of the training samples. For the positive and negative training data classes, we choose the top $n_p$ and top $n_f$ training samples, respectively; the selected positive samples are represented as red circles and the negative samples as blue circles in Figure 3. By this step, we can focus only on the samples that are most likely to influence the classification result of the testing data, and the training sample selection process can significantly reduce the size of the training set, especially the defect-prone training samples.
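A minimal sketch of this per-class nearest-neighbor selection is given below, assuming binary labels (1 for the positive class, 0 for the negative class) and distances measured with a learned matrix $M$; the function name, the numpy-based implementation, and the default values of $n_p$ and $n_f$ are our own assumptions rather than details prescribed by the paper.

```python
import numpy as np

def select_neighbors(x_test, X_train, y_train, M, n_p=10, n_f=10):
    """Pick the n_p closest positive and n_f closest negative training samples
    to the testing sample x_test under the Mahalanobis metric M."""
    diffs = X_train - x_test                            # (n, d) differences
    dists = np.einsum('nd,de,ne->n', diffs, M, diffs)   # D_M for every training sample
    pos_idx = np.where(y_train == 1)[0]
    neg_idx = np.where(y_train == 0)[0]
    closest_pos = pos_idx[np.argsort(dists[pos_idx])[:n_p]]
    closest_neg = neg_idx[np.argsort(dists[neg_idx])[:n_f]]
    return np.concatenate([closest_pos, closest_neg])
```

Selecting a fixed number of neighbors from each class also rebalances the local training set, which is the point of this step.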
... method with a KNN classifier using data spaces produced by different numbers of iterations. The experimental results show that iterative metric learning is superior in finding a more stable neighborhood space for the testing data, and almost all of the selected samples become stable after two iterations.

Although the performance of the IML shows its advantage in classification, there are still several limitations. On the one hand, considering that misclassification of positive and negative modules is associated with different costs in real applications, the proposed method should incorporate this cost information into the learning process. On the other hand, it would be better to adjust the weights of the training data according to the structure of the data space.

Acknowledgments

This work was supported by the National Key R&D Program of China (Grant No. 2017YFC0113000), the National Natural Science Funds of China (U1701262, 61671267), the National Science and Technology Major Project (No. 2016ZX01038101), the MIIT IT funds (Research and application of TCN key technologies) of China, and the National Key Technology R&D Program (No. 2015BAG14B01-02).

References

[Aci et al., 2010] Mehmet Aci, Cigdem Inan Aci, and Mutlu Avci. A hybrid classification method of k nearest neighbor, Bayesian methods and genetic algorithm. Expert Systems with Applications, 37(7):5061–5067, 2010.

[Bader-El-Den et al., 2016] Mohamed Bader-El-Den, Eleman Teitei, and Adda Mo. Hierarchical classification for dealing with the class imbalance problem. In International Joint Conference on Neural Networks, pages 3584–3591, 2016.

[Barua et al., 2014] Sukarna Barua, Md. Monirul Islam, Xin Yao, and Kazuyuki Murase. MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering, 26(2):405–425, 2014.

[Bennin et al., 2017] K. Ebo Bennin, J. Keung, P. Phannachitta, A. Monden, and S. Mensah. MAHAKIL: diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction. IEEE Transactions on Software Engineering, pages 1–1, 2017.

[He and Garcia, 2009] Haibo He and Edwardo A. Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263–1284, 2009.

[Jeet et al., 2011] Kawal Jeet, Nitin Bhatia, and Rajinder Singh Minhas. A Bayesian network based approach for software defects prediction. ACM SIGSOFT Software Engineering Notes, 36(4):1–5, 2011.

[Jing et al., 2014] Xiao-Yuan Jing, Shi Ying, Zhi-Wu Zhang, Shanshan Wu, and Jin Liu. Dictionary learning based software defect prediction. In Proceedings of the International Conference on Software Engineering, pages 414–423, 2014.

[Lichman, 2013] M. Lichman. UCI machine learning repository, 2013.

[Liu et al., 2015] Wei Liu, Cun Mu, Rongrong Ji, Shiqian Ma, John R. Smith, and Shih-Fu Chang. Low-rank similarity metric learning in high dimensions. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 2792–2799, 2015.

[Menzies et al., 2007] Tim Menzies, Jeremy Greenwald, and Art Frank. Data mining static code attributes to learn defect predictors. IEEE Transactions on Software Engineering, 33(1):2–13, 2007.

[Mesquita et al., 2016] Diego Parente Paiva Mesquita, Lincoln S. Rocha, João P. P. Gomes, and Ajalmar R. da Rocha Neto. Classification with reject option for software defect prediction. Applied Soft Computing, 49:1085–1093, 2016.

[Sobhani et al., 2014] Parinaz Sobhani, Herna Viktor, and Stan Matwin. Learning from Imbalanced Data Using Ensemble Methods and Cluster-Based Undersampling. Springer International Publishing, 2014.

[Sun et al., 2012] Zhongbin Sun, Qinbao Song, and Xiaoyan Zhu. Using coding-based ensemble learning to improve software defect prediction. IEEE Transactions on Systems, Man, and Cybernetics, 42(6):1806–1817, 2012.

[Tomar and Agarwal, 2016] Divya Tomar and Sonali Agarwal. Prediction of defective software modules using class imbalance learning. Proceedings of the Annual Computer Software and Applications Conference, 2016:7658207:1–7658207:12, 2016.

[Weinberger and Saul, 2009a] Kilian Q. Weinberger and Lawrence K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207–244, 2009.

[Weinberger and Saul, 2009b] Kilian Q. Weinberger and Lawrence K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207–244, 2009.

[Wu et al., 2017] Fei Wu, Xiao-Yuan Jing, Shiguang Shan, Wangmeng Zuo, and Jing-Yu Yang. Multiset feature learning for highly imbalanced data classification. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pages 1583–1589, 2017.

[Yan et al., 2010] Zhen Yan, Xinyu Chen, and Ping Guo. Software defect prediction using fuzzy support vector regression. In Proceedings of the 7th International Conference on Advances in Neural Networks, pages 17–24, 2010.

[Yang et al., 2015] Xiaoxing Yang, Ke Tang, and Xin Yao. A learning-to-rank approach to software defect prediction. IEEE Transactions on Reliability, 64(1):234–246, 2015.

[Zhang et al., 2017] Zhi-Wu Zhang, Xiao-Yuan Jing, and Tiejian Wang. Label propagation based semi-supervised learning for software defect prediction. Automated Software Engineering, 24(1):47–69, 2017.