
Software Metrics for Fault Prediction Using Machine Learning Approaches
A Literature Review with PROMISE Repository Dataset

Meiliana, Syaeful Karim
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
meiliana@binus.edu, karim@binus.ac.id

Harco Leslie Hendric Spits Warnars, Ford Lumban Gaol, Edi Abdurachman
Computer Science Department, BINUS Graduate Program – Doctor of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
shendric@binus.edu, fgaol@binus.edu, edia@binus.edu

Benfano Soewito
Computer Science Department, BINUS Graduate Program – Master of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
bsoewito@binus.edu

Abstract— Software testing is an important and critical phase of the software development life cycle, carried out to find software faults or defects and then correct them. However, the testing process is time-consuming and requires good planning and substantial resources. Techniques and methodologies for predicting testing effort are therefore an important step prior to testing, as they can significantly increase the efficiency of time, effort, and cost. Just as software metrics are used to measure software quality, they can also be used to identify faulty modules in software. Furthermore, applying machine learning techniques allows a computer to "learn" and predict fault-prone modules. Research in this field has been active for more than ten years, and given the high importance of software quality and the continuing development of machine learning methods, this research area is still being highlighted today. In this paper, a survey of the various software metrics used for predicting software faults with machine learning algorithms is presented. To the best of our knowledge, this is the first study of software fault prediction that focuses on the use of the PROMISE repository datasets. Experiments conducted on PROMISE repository datasets are compared in order to contribute to a consensus on what constitutes effective software metrics and machine learning methods for software fault prediction.

Keywords—software metric; machine learning algorithm; software defect prediction; software testing

I. INTRODUCTION

Software testing is an important phase of the software development life cycle that verifies and validates software quality. One way to monitor software quality is to find software faults or defects and then correct them. However, testing is a time-consuming activity, which makes it impossible to find all faults within the given resources. Therefore, methodologies and techniques for predicting testing effort are an important step prior to testing and can significantly increase the efficiency of time, effort, and cost.

Just as software metrics are used to measure software quality, they can also be used to identify faulty modules in software. Deriving a fault prediction model from software metrics provides a prediction in the early phases of the software development process, so that software construction and testing can focus on the fault-prone modules to improve quality. The most widely used methods are statistical methods and machine learning, or a combination of both; in this paper, we focus on machine learning implementations of the prediction model. Applying machine learning techniques allows a computer to "learn" and predict fault-prone modules. The significant development of machine learning methods over the last decades has contributed substantial research in this field, such as the Least Squares Support Vector Machine (LSSVM) [1] and the conventional radial basis function neural network (RBFNN) [2].

Public datasets such as the open-source "Apache POI" [3] and Rhino (an open-source implementation of JavaScript written in Java) [4] are used as experimental data to examine models. However, the datasets most used in this field are the NASA datasets from the PROMISE repository, which are the focus of this review. The PROMISE repository is designed for predictive models and data analytics in software engineering [5].

The rest of this paper is organized as follows. The second section discusses software metrics and the possibilities of using software metrics for fault prediction, and then describes some machine learning approaches used to predict software faults. The fourth section presents several studies of software fault prediction based on software metrics using machine learning approaches. Conclusions and future research gaps are given in the last section.

II. STATE OF THE ART

A. Software Metrics

Software metrics are quantitative measurements that assign numbers or symbols to attributes of the measured entity [6]. An attribute is a property or feature of an entity that can be measured, e.g. length, age, and cost. An entity can be the source code of an application or a software development process activity. Measures obtained from entities that have been associated with a risk factor can be used to create defect

prediction models. There are many types of software metrics available for different purposes; these are discussed in the next subsection.

1) Software Metric Classification
Software metrics can be classified into static code metrics and process metrics. Static code metrics can be extracted directly from source code; examples are Lines of Code (LOC), the Cyclomatic Complexity Number (CCN), and other static code metrics that include compiler instruction and data declaration counts. Object-oriented metrics are a subcategory of static code metrics, such as Depth of Inheritance Tree (DIT), Coupling Between Objects (CBO), Number of Children (NOC), and Response for Class (RFC). Aggarwal et al. [7] observed that import coupling metrics are strongly associated with fault proneness and predict faulty classes with high accuracy in object-oriented systems.

Process metrics can be extracted from a source code management system based on historic changes to the source code over time. Moser et al. [8] concluded that change (process) metrics are more efficient than static code metrics for defect prediction, because process data contain more discriminatory information about the defect distribution than the source code itself. Other research by Nagappan et al. [9] used metrics concerning code churn, a measure of the changes in source code over a specified period of time. They concluded that relative code churn metrics predict defects per source file better than other metrics, and that code churn can be used to discriminate between faulty and non-faulty files.

Metrics can also be classified by the development phase of the software life cycle into source code level metrics, detailed design level metrics, and test level metrics.

2) Minimum Number of Software Metrics for a Fault Prediction Model
Excessive numbers of software metrics are often collected for different reasons during the software development life cycle and stored in software project repositories. The selection of software metrics is therefore another critical point in building a software defect prediction model. An empirical study [10] investigated five threshold-based feature selection techniques to remove irrelevant and redundant software metrics. The experiments demonstrated that 98.5% of the available software metrics can be removed while constructing a defect prediction model with better performance. The main conclusion of that study is that a combination of three software metrics is sufficient: on average, this number yields effective software prediction models, and it can be used as a reference for selecting the minimum number of software metrics for fault prediction.

B. Machine Learning Approaches for Fault Prediction

Software fault prediction is the process of predicting which parts of the software are fault prone. By focusing on the fault-prone files that are detected, testers can save testing effort. As mentioned before, it is important to find software faults or defects as early as possible to reduce testing cost. Therefore, using software metrics that are produced in the early stages of the software development process is a strength of this research. The idea behind software fault prediction is to use measurements extracted from the development process, e.g. from the source code, that can be found in software metrics. Many approaches have been studied for software fault prediction, starting with simple equations, statistical analysis, expert estimation, and machine learning. Among all approaches, machine learning has proved to be the most successful, based on the systematic review in [11].

To predict fault-prone modules automatically, a large variety of machine learning algorithms is used, most commonly classification algorithms. A classification procedure basically assigns a class label to a new sample based on a given set of samples that have already been labeled. Several machine learning algorithms are discussed in this subsection.

The Support Vector Machine (SVM) is a machine learning technique for classifying unseen data correctly by constructing an N-dimensional hyperplane. Finding the optimal hyperplane that separates the clusters of vectors is the main task of SVM modeling, so that cases with one category of the dependent variable lie on one side of the plane and cases with the other category lie on the other side. The term support vectors refers to the vectors near the hyperplane; SVM modeling finds the oriented hyperplane that maximizes the margin between the support vectors. SVM can handle a nonlinear separation between points by using a kernel function to map the data into a different space, in which a hyperplane is then used to separate the classes.

Random forest is another machine learning method, consisting of many classification trees. The classification trees are decision trees, which represent a major advance in knowledge discovery and data mining. The random forest classifier offers predictions with a high level of accuracy. The classification algorithm classifies a new object from an input vector: the input vector is put down each tree in the forest, each tree gives a classification result and votes for that class, and the forest chooses the classification with the most votes.

The next machine learning method is the bagging algorithm, also known as bootstrap aggregating. Bootstrap aggregating is a technique that repeatedly samples from a dataset according to a uniform probability distribution; each bootstrap sample has the same size as the original data. Because the sampling is done with replacement, an instance may occur several times in the same training set, while other instances may be omitted from it. Each instance has a probability of 1-(1-1/N)^N of being selected, so for sufficiently large N a bootstrap sample contains around 63% of the original training data. After training K classifiers, an instance is assigned to the class that receives the highest number of votes.

Artificial Neural Networks have become popular lately, in particular the Multilayer Perceptron (MLP). The MLP can be used to solve almost any problem, such as pattern recognition, interpolation, etc. A Multilayer Perceptron contains one or two hidden layers as an advancement over the perceptron neural network model. Such feedforward neural networks are trained with the back propagation algorithm, which consists of two passes (a forward and a backward pass). In the forward pass, an input is presented

to the neural network and propagated through the network.

An instance-based classifier, K-star, uses a similarity function to determine the class of a test instance according to the class of the most similar instances; using an entropic distance measure, this method retrieves the nearest stored examples.

Naive Bayes is a classification method that is quite efficient and easy to implement. It is one of the most used and well-known classifier methods and needs a smaller quantity of data to estimate its parameters. The basic algorithm is probabilistic, based on Bayes' theorem, and assumes that the presence of one class feature does not depend on the presence of any other feature.

C. PROMISE Repository Dataset

The PROMISE repository aims to provide verifiable and repeatable models that can be used for software engineering research. It is an online repository for sharing datasets among software engineering researchers and practitioners. A researcher or practitioner can store their own dataset or access models shared by others; users are thus able to compare their results and draw conclusions sustained by previous works [5]. PROMISE started as a project for predicting the effort and defects of software and then developed into model-based requirements engineering and value-based software engineering. The repository contains many datasets that can be used to refute, confirm, repeat, or improve previous results.

III. PREVIOUS RESEARCH

Considerable research has been performed on software metrics for the fault prediction process using machine learning algorithms. Most of this research used the NASA MDP (Metrics Data Program) repository as the experimental dataset. Each dataset lists all discovered faults of the system and the number of modules containing faults, while each module is described by a set of code-level or design-level attributes.

Catal et al. [12] used the Artificial Immune Recognition System (AIRS) algorithm to create a defect model based on method-level metrics and the Chidamber-Kemerer metrics suite. The experiments showed that the AIRS algorithm obtains the best fault prediction model by using a combination of the CK metrics and the Lines of Code metric. In other work, Catal et al. [13] examined nine classifiers on each of five public NASA datasets. According to that research, the Naive Bayes algorithm provides the best prediction performance for small datasets, while Random Forest is the best prediction algorithm for large datasets.
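To make the Naive Bayes approach above concrete, the class-conditional Gaussian variant can be sketched in a few lines of Python. This is our own minimal illustration on made-up module metrics (LOC and cyclomatic complexity), not code from any of the surveyed studies:

```python
import math

def gnb_train(X, y):
    """Estimate a Gaussian (mean, variance) per class and per feature, plus class priors."""
    params = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = len(rows) / len(X)
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # variance smoothing
            stats.append((mean, var))
        params[label] = (prior, stats)
    return params

def gnb_predict(params, x):
    """Choose the class with the highest log-posterior; features are assumed independent."""
    best_label, best_lp = None, float("-inf")
    for label, (prior, stats) in params.items():
        lp = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

# Hypothetical module-level metrics: [LOC, cyclomatic complexity]; 1 = faulty, 0 = clean.
X = [[120, 14], [300, 25], [45, 3], [60, 5], [250, 20], [30, 2]]
y = [1, 1, 0, 0, 1, 0]
model = gnb_train(X, y)
print(gnb_predict(model, [280, 22]))  # large, complex module -> 1
print(gnb_predict(model, [50, 4]))    # small, simple module  -> 0
```

The small parameter count (one mean and variance per feature per class, plus a prior) is consistent with the observation above that Naive Bayes performs well on small datasets.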

TABLE I. COMPARISON OF MACHINE LEARNING ALGORITHMS AND METRICS FROM SEVERAL RESEARCH WORKS

Catal et al. [13] — Data source: 5 public NASA datasets. Metrics: 13 method-level metrics. Algorithm/Method: Random Forest, Naive Bayes, J48, Immunos 1 & 2, CLONALG, AIRS1 & 2, AIRS2 Parallel. Result: the Naive Bayes algorithm is better for smaller datasets, while the Random Forest algorithm performs best for larger datasets.

Catal et al. [12] — Data source: 6 OO metrics from the Chidamber-Kemerer metrics suite, 4 metrics from KC1 projects, and Halstead & McCabe metrics. Metrics: WMC (Weighted Methods per Class), DIT (Depth of Inheritance Tree), RFC (Response for a Class), NOC (Number of Children), CBO (Coupling Between Object Classes), LCOM (Lack of Cohesion in Methods), Percent Pub Data, Access To Pub Data, Dep On Child, Fan In. Algorithm/Method: AIRS (Artificial Immune Recognition System). Result: the best fault prediction result is the combination of the CK metrics and the lines of code (LOC) metric.

Shanthini [14] — Data source: public domain KC1 NASA dataset, method-level and class-level metrics. Metrics: 21 method-level metrics proposed by Halstead and McCabe; 10 class-level object-oriented metrics. Algorithm/Method: Naive Bayes, SVM, K-Star, Random Forest. Result: precision, recall, and accuracy of SVM show better results than the other machine learning methods for both class- and method-level metrics.

Mundada et al. [15] — Data source: PROMISE repository (JM1/Software Defect Prediction). Metrics: object-oriented CK metrics. Algorithm/Method: Artificial Neural Network with Resilient Back Propagation. Result: the ANN shows better accuracy compared to previous research.

Bishnu et al. [16] — Data source: AR3, AR4, and AR5 from the PROMISE data repository and the Iris dataset. Metrics: Lines of Code (LOC), Unique Operator (UOp), Cyclomatic Complexity (CC), Total Operator (TOp), Unique Operand (UOpnd), Total Operand (TOpnd). Algorithm/Method: Quad Tree-based K-Means clustering algorithm. Result: the K-Means algorithm is more efficient in terms of the number of iterations (except for AR5) and the SSE.

Dejaeger et al. [17] — Data source: an open-source Eclipse Foundation dataset and a NASA IV&V facility dataset. Metrics: Halstead, Lines of Code, and McCabe complexity metrics. Algorithm/Method: fifteen Bayesian Network (BN) classifiers. Result: compared to the common Naive Bayes classifier, Augmented Naive Bayes classifiers show better performance.

Okutan et al. [18] — Data source: nine datasets from the PROMISE data repository. Metrics: NOD for the number of developers and LOCQ for the source code quality. Algorithm/Method: Bayesian networks. Result: LOC, LOCQ, and RFC are the most effective metrics.

Kumar et al. [19] — Data source: 45 real-life datasets from the PROMISE repository. Metrics: Chidamber and Kemerer Java metrics. Algorithm/Method: Majority Voting Ensemble (MVE) method. Result: the MVE approach performs best; the fault prediction model developed using the MVE method consumes less fault removal cost than the other techniques.

Wondaferaw et al. [20] — Data source: four NASA datasets (two based on SCM, the other two based on OOM). Metrics: Object Oriented Metrics (OOM) and Static Code Metrics (SCM). Algorithm/Method: 8 classifiers: NB, NN, SVM, RF, KNN, DTr, DTa, and RTr. Result: the combination of Random Forest (RF) with Information Gain (IG) feature selection yields the highest Receiver Operating Characteristic (ROC) curve value.

Kumudha et al. [2] — Data source: five datasets from the NASA data program repository (CM1, JM1, KC1, KC2, and PC1). Metrics: NECM is employed as a key performance metric. Algorithm/Method: conventional radial basis function neural network and a novel adaptive dimensional biogeography-based optimization model. Result: the proposed ADBBO-RBFNN classifier is more effective than existing predictors.

Gupta et al. [21] — Data source: PROMISE data repository. Metrics: CK metrics: DIT, WMC, CBO, NOC, RFC, LCOM. Algorithm/Method: derive threshold values and perform fault classification using the derived metrics. Result: the statistical comparison showed better fault classification after redundancy removal and log transformation of the data than otherwise.

Erturk et al. [22] — Data source: the Ant, jEdit, Camel, Xalan, Log4j, and Lucene projects from the PROMISE repository. Metrics: CBO, WMC, and RFC. Algorithm/Method: Fuzzy Inference Systems (FISs) and the Adaptive Neuro Fuzzy Inference System. Result: fault-prone modules can be located automatically with iterative software fault prediction.

Alighardashi et al. [23] — Data source: NASA and PROMISE, with ten datasets. Metrics: 20 object-oriented metrics. Algorithm/Method: a feature selection method using a combination of five filter feature selection methods. Result: the proposed fused weighted filter (WF) method can find the best features with the highest speed, improving fault prediction accuracy.
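Most results in Table I are reported as precision, recall, and accuracy. These follow directly from the confusion matrix of predicted versus actual fault labels; a minimal sketch (the labels below are made up for illustration, not taken from any surveyed dataset):

```python
def prf(y_true, y_pred, positive=1):
    """Precision, recall, and accuracy computed from the confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0  # of modules flagged faulty, how many were
    recall = tp / (tp + fn) if tp + fn else 0.0     # of truly faulty modules, how many were found
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy

# Hypothetical labels: 1 = faulty module, 0 = clean module.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(prf(y_true, y_pred))  # -> (0.75, 0.75, 0.75)
```

Recall is usually the measure that matters most in fault prediction, since a missed faulty module (a false negative) is typically costlier than an extra inspection (a false positive).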

Shanthini [14] evaluated 21 method-level metrics and 10 class-level metrics from the same dataset (the public domain KC1 NASA dataset) using four different classifiers: Naive Bayes, K-star, Random Forest, and SVM. Based on the conducted experiments, the Random Forest classifier showed the better results (precision, recall, F-measure, and accuracy) for the method-level metrics, while the best performance for the class-level metrics was produced by the SVM method.

In object-oriented software systems, object-oriented metrics are useful for predicting the fault proneness of classes, such as the object-oriented CK metrics and QMOOD metrics used by Malhotra [3]. That research applied a statistical method and six machine learning methods: Random Forest, AdaBoost, Bagging, Multilayer Perceptron, SVM, and Genetic Programming. The prediction models for estimating fault proneness were validated on the dataset of the open-source "Apache POI" (a pure Java library for manipulating Microsoft documents). The best results were shown by the random forest and bagging algorithms.

Table I presents a comparison of all the related works discussed in this paper. The comparison table identifies the software metrics used and the machine learning algorithm applied in each study, together with the result.

IV. CONCLUSION

In fault prediction studies, the most widely used datasets are the NASA datasets from the PROMISE repository, which provide many metrics to be investigated. The use of public datasets is recommended to make fault prediction models repeatable, refutable, and verifiable. Moreover, the use of the same public datasets simplifies comparison from one study to another. From the several studies reviewed in this paper, software metrics have proved to be an efficient source for building fault prediction models. Although every type of metric can be used to estimate fault-prone modules, class-level metrics show better prediction performance than method-level metrics. On average, three software metrics are sufficient to build effective software prediction models.
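The three-metric rule of thumb implies a ranking step over the candidate metrics. A minimal filter-style sketch — using absolute Pearson correlation with the fault label as the ranking score, which is our illustrative choice rather than the technique of [10], on made-up data:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return num / den if den else 0.0

def top_k_metrics(X, y, names, k=3):
    """Rank metric columns by |correlation with the fault label| and keep the top k."""
    scores = []
    for j, name in enumerate(names):
        col = [row[j] for row in X]
        scores.append((abs(pearson(col, y)), name))
    return [name for _, name in sorted(scores, reverse=True)[:k]]

# Hypothetical per-module metrics; 1 = faulty, 0 = clean.
names = ["LOC", "CCN", "comment_ratio", "name_length"]
X = [[120, 14, 0.10, 8], [300, 25, 0.05, 12], [45, 3, 0.30, 7],
     [60, 5, 0.25, 9], [250, 20, 0.08, 11], [30, 2, 0.35, 6]]
y = [1, 1, 0, 0, 1, 0]
print(top_k_metrics(X, y, names))  # -> ['comment_ratio', 'CCN', 'LOC']
```

Taking the absolute value lets negatively correlated metrics (here the comment ratio) rank just as highly as positively correlated ones.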

Machine learning algorithms are applied to automatically construct fault prediction models by classifying modules into defective and non-defective ones. Based on the performance (accuracy, sensitivity, specificity, and receiver operating characteristic analysis) shown in previous experiments, the SVM and Random Forest algorithms provide the best prediction models for two different kinds of datasets, and these two algorithms can be recommended as machine learning algorithms for fault prediction. Most of these studies have proved the usefulness of software metrics for fault prediction with most machine learning algorithms. However, research in this field remains open for the implementation of deep learning algorithms, as part of the broader family of machine learning, to obtain better prediction models, as started by Yang et al. [24].

REFERENCES

[1] L. Kumar, S. Krishna, A. Sureka, and S. Ku, "Effective fault prediction model developed using Least Square Support," vol. 0, 2017.
[2] P. Kumudha and R. Venkatesan, "Cost-Sensitive Radial Basis Function Neural Network Classifier for Software Defect Prediction," vol. 2016, 2016.
[3] R. Malhotra and A. Jain, "Fault prediction using statistical and machine learning methods for improving software quality," J. Inf. Process. Syst., vol. 8, no. 2, pp. 241–262, 2012.
[4] H. M. Olague, L. H. Etzkorn, S. Gholston, and S. Quattlebaum, "Empirical Validation of Three Software Metrics Suites to Predict Fault-Proneness of Object-Oriented Classes Developed Using Highly Iterative or Agile Software Development Processes," vol. 33, no. 6, pp. 402–419, 2007.
[5] G. Boetticher, T. Menzies, and T. Ostrand, PROMISE Repository of Empirical Software Engineering Data, West Virginia University, Department of Computer Science, 2007.
[6] N. E. Fenton and S. L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, 2009.
[7] K. K. Aggarwal, Y. Singh, A. Kaur, and R. Malhotra, "Investigating effect of design metrics on fault proneness in object-oriented systems," J. Object Technol., vol. 6, no. 10, pp. 127–141, 2007.
[8] R. Moser, W. Pedrycz, and G. Succi, "A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction," in Proc. 30th ACM/IEEE Int. Conf. on Software Engineering (ICSE 2008), pp. 181–190, 2008.
[9] N. Nagappan and T. Ball, "Use of relative code churn measures to predict system defect density," in Proc. 27th Int. Conf. on Software Engineering (ICSE 2005), pp. 284–292, 2005.
[10] H. Wang, "How Many Software Metrics Should be Selected for Defect Prediction?," pp. 69–74, 2005.
[11] C. Catal and B. Diri, "A systematic review of software fault prediction studies," Expert Systems with Applications, vol. 36, no. 4, pp. 7346–7354, 2009.
[12] C. Catal, B. Diri, and B. Ozumut, "An artificial immune system approach for fault prediction in object-oriented software," in Proc. 2nd Int. Conf. on Dependability of Computer Systems (DepCoS-RELCOMEX'07), pp. 238–245, 2007.
[13] C. Catal and B. Diri, "Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem," Information Sciences, vol. 179, no. 8, pp. 1040–1058, 2009.
[14] A. Shanthini, "Applying Machine Learning for Fault Prediction Using Software Metrics," vol. 2, no. 6, pp. 274–278, 2012.
[15] D. Mundada, A. Murade, O. Vaidya, and J. N. Swathi, "Software Fault Prediction Using Artificial Neural Network And Resilient Back Propagation," Int. J. Comput. Sci. Eng., vol. 5, no. 3, pp. 173–179, 2016.
[16] P. S. Bishnu and V. Bhattacherjee, "Software Fault Prediction Using Quad Tree-Based K-Means Clustering Algorithm," vol. 24, no. 6, pp. 1146–1150, 2012.
[17] K. Dejaeger, T. Verbraken, and B. Baesens, "Prediction Models Using Bayesian Network Classifiers," vol. 39, no. 2, pp. 237–257, 2013.
[18] A. Okutan and O. Taner, "Software defect prediction using Bayesian networks," no. 2, pp. 154–181, 2014.
[19] L. Kumar, S. Rath, and A. Sureka, "Using Source Code Metrics and Ensemble Methods for Fault Proneness Prediction," arXiv preprint arXiv:1704.04383, 2017.
[20] C. W. Yohannese and T. Li, "A Combined-Learning Based Framework for Improved Software Fault Prediction," vol. 10, pp. 647–662, 2017.
[21] S. Gupta and D. L. Gupta, "Fault Prediction using Metric Threshold Value of Object Oriented Systems," vol. 7, no. 6, pp. 13629–13643, 2017.
[22] E. Erturk and E. Akcapinar, "Iterative software fault prediction with a hybrid approach," Appl. Soft Comput., vol. 49, pp. 1020–1033, 2016.
[23] F. Alighardashi, M. Ali, and Z. Chahooki, "The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction," no. 8, 2016.
[24] X. Yang, D. Lo, X. Xia, Y. Zhang, and J. Sun, "Deep Learning for Just-In-Time Defect Prediction," no. 1.
