
CROSS-PROJECT DEFECT PREDICTION BY USING OPTIMIZED LIGHT GRADIENT BOOSTING MACHINE ALGORITHM


Shailza Kanwar¹ & Vivek Shrivastava²
¹CSED, NIT Delhi; ²EED, NIT Delhi
ABSTRACT

Software Defect Prediction (SDP) is a process for predicting software modules that are prone to failure. The use of SDP can reduce the number of required testing resources. We present an amended LightGBM algorithm as a model for the defect prediction process. The hyperparameters of the algorithm are optimized with Bayesian optimization to maximize the area under the curve (AUC). The proposed model uses the optimized hyperparameter values to train on the data and then classifies the defective and non-defective instances in the test data.

Keywords: Cross-project defect prediction, LightGBM, hyperparameter optimization

INTRODUCTION

Cross-project defect prediction (CPDP) is a process that trains the prediction model on a source project and predicts the defects of a target project.

RESULTS

The proposed model is compared with other ensemble algorithms, which are also optimized: Xtreme Gradient Boosting (XGBoost) and Random Forest. The hyperparameters of these algorithms are likewise optimized with Bayesian optimization.

Sr. No. | Algorithm     | Accuracy | AUC Score | Sensitivity | FPR    | FNR    | F1-Score
1       | LightGBM      | 0.9912   | 0.9913    | 0.9951      | 0.0048 | 0.0128 | 0.9911
2       | XGBoost       | 0.8354   | 0.8391    | 0.8037      | 0.2421 | 0.1016 | 0.8436
3       | Random Forest | 0.9013   | 0.9014    | 0.9061      | 0.0917 | 0.1057 | 0.9008

Table 1: Comparison of optimized LightGBM with optimized XGBoost and optimized Random Forest algorithms

[Figure 1: ROC curves. (a) Optimized LightGBM, (b) optimized XGBoost, (c) optimized Random Forest]
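For reference, the metrics reported in Table 1 can be computed from a classifier's test-set output. A minimal sketch, assuming scikit-learn and placeholder arrays y_true, y_pred (hard labels), and y_score (defect probabilities):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score

def defect_metrics(y_true, y_pred, y_score):
    # Unpack the binary confusion matrix: tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "AUC Score": roc_auc_score(y_true, y_score),
        "Sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "FPR": fp / (fp + tn),          # false-positive rate
        "FNR": fn / (fn + tp),          # false-negative rate
        "F1-Score": f1_score(y_true, y_pred),
    }
```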

MATERIALS & METHODS

The following methods and steps were required to complete the research:

• SMOTE
• LightGBM
• Bayesian hyperparameter optimization

Data Preprocessing
[Flowchart: Software Fault Dataset → Label Encoding → Feature Selection → Normalization → Resample Data → Train-Test Split → Training Data / Testing Data]
A sketch of this preprocessing pipeline is given below.
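The following is a minimal sketch of the preprocessing flow, assuming a pandas DataFrame loaded from a hypothetical software_fault_dataset.csv with numeric features and a string label column; the file name, column names, and k=20 feature count are illustrative placeholders, not the poster's actual settings:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

df = pd.read_csv("software_fault_dataset.csv")        # Software Fault Dataset
y = LabelEncoder().fit_transform(df["label"])         # Label Encoding
X = df.drop(columns=["label"])

X = SelectKBest(f_classif, k=20).fit_transform(X, y)  # Feature Selection
X = MinMaxScaler().fit_transform(X)                   # Normalization
X, y = SMOTE(random_state=42).fit_resample(X, y)      # Resample Data (SMOTE)

X_train, X_test, y_train, y_test = train_test_split(  # Train-Test Split
    X, y, test_size=0.2, stratify=y, random_state=42)
```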

Hyperparameter Optimization and Training
[Flowchart: Training Data → LightGBM Training → Bayesian Optimization → Mean AUC Score → Model Refinement → Optimized Reports; Testing Data → Evaluation]
We tune 8 hyperparameters: bagging_fraction, lambda_l1, lambda_l2, feature_fraction, max_depth, min_split_gain, min_child_weight, and num_leaves. A sketch of the tuning loop is given below.
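A minimal sketch of such a tuning loop over the 8 hyperparameters, assuming the bayes_opt package (whose optimizer uses a Gaussian-process surrogate) and the X_train/y_train arrays from the preprocessing sketch; the search bounds here are illustrative assumptions, not the poster's actual ranges:

```python
import lightgbm as lgb
from bayes_opt import BayesianOptimization
from sklearn.model_selection import cross_val_score

def mean_auc(num_leaves, max_depth, min_child_weight, min_split_gain,
             feature_fraction, bagging_fraction, lambda_l1, lambda_l2):
    # Map the LightGBM-native names onto the sklearn-API parameters.
    model = lgb.LGBMClassifier(
        num_leaves=int(num_leaves), max_depth=int(max_depth),
        min_child_weight=min_child_weight, min_split_gain=min_split_gain,
        colsample_bytree=feature_fraction,             # feature_fraction
        subsample=bagging_fraction, subsample_freq=1,  # bagging_fraction
        reg_alpha=lambda_l1, reg_lambda=lambda_l2)     # lambda_l1 / lambda_l2
    # Objective to maximize: mean AUC over 5-fold cross-validation.
    return cross_val_score(model, X_train, y_train,
                           cv=5, scoring="roc_auc").mean()

bounds = {"num_leaves": (16, 256), "max_depth": (3, 12),
          "min_child_weight": (1e-3, 10), "min_split_gain": (0.0, 1.0),
          "feature_fraction": (0.5, 1.0), "bagging_fraction": (0.5, 1.0),
          "lambda_l1": (0.0, 5.0), "lambda_l2": (0.0, 5.0)}

optimizer = BayesianOptimization(f=mean_auc, pbounds=bounds, random_state=42)
optimizer.maximize(init_points=5, n_iter=25)
best_params = optimizer.max["params"]  # values used to train the final model
```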
HYPERPARAMETERS

Hyper-parameter  | Default Value
application      | regression
num_iterations   | 100
num_leaves       | 31
device           | CPU
max_depth        | 6
min_data_in_leaf | 20
feature_fraction | 1
bagging_fraction | 1
min_split_gain   | 1
min_child_weight | 1
lambda_l1        | 0
lambda_l2        | 0
num_class        | 1

CONCLUSION

• The hyperparameters are optimized by the Bayesian optimization method, which utilizes a Gaussian process as a surrogate model.
• The proposed model achieves an accuracy of 99.12%, which is significantly better than Random Forest (90.13%) and XGBoost (83.54%).
• The proposed model's performance is significantly better than that of the optimized Random Forest and the optimized XGBoost.
• The observed results are explained by the proposed model's high-efficiency parallelization, fast speed, high model accuracy, and low FPR and FNR.

FUTURE RESEARCH

• In the future, instead of considering the source project based on the similarity score, a recommendation-based model can be proposed to select the most appropriate source project to train the defect prediction model.
• Domain adaptation can be utilized to improve the learning process of the prediction model.

CONTACT INFORMATION

Department: Computer Science and Engineering
Institute: National Institute of Technology Delhi
Email: shailza@nitdelhi.ac.in
Phone: +91 9717433020
