
International Conference on Computing, Communication and Automation (ICCCA2015)

Analysis of Approach for Predicting Software Defect Density using Static Metrics
Neeraj Mandhan, Dinesh Kumar Verma, Shishir Kumar
Department of Computer Science & Engineering
Jaypee University of Engineering & Technology, Guna (M.P.), India
neerajmandhan5@gmail.com, dinesh.hpp@gmail.com, dr.shishir@yahoo.com

ISBN: 978-1-4799-8890-7/15/$31.00 ©2015 IEEE

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY KURUKSHETRA. Downloaded on May 05,2020 at 18:29:20 UTC from IEEE Xplore. Restrictions apply.

Abstract: Software development is growing rapidly, and with it the number of defects. In this paper, defect density is predicted by applying the linear regression method to static metrics. This helps to determine which modules need more reliability techniques. Static metrics are used for defect prediction, which requires extracting abstract information from the code. The relationship between the static metrics and defect density is established both individually and jointly, and this relationship is used to predict the number of defects. Simple and multiple linear regression have been used for the analysis. The results reveal which static metrics are more useful and which are less useful for predicting defect density, and which metrics have a positive or a negative correlation with defects.

Keywords: Defect Density, Static Metrics, Simple Linear Regression, Multiple Linear Regression.

I. INTRODUCTION

Software products are multiplying day by day and now cover almost every field, such as nuclear power plant control, infrastructure, mobile applications and PC applications. As software products have multiplied, their complexity has increased [19]. The need has therefore grown to solve the problems of this industry, i.e. to complete projects on time and within budget with fewer errors. Defect prediction is generally used to guide software testing [19]. Predicting the defects in a piece of software helps to maintain the quality of its later versions. The metrics used here to predict defects are static metrics. Chidamber and Kemerer [3] define the object-oriented static metrics of software. In this paper, the metrics lines of code, coupling, cohesion, lines of comments, response, weighted methods and depth are used together with the number of defects. These metrics have been taken from public data sets. The relationship between the different metrics and the number of defects, i.e. defect density, is established. This relationship is used to predict defect density from the metrics, which eventually helps the developer to maintain the quality of the next version of the software. First, a normality test is conducted to check whether the data is normally distributed. Then the prediction of defects with individual metrics, followed by the prediction with all metrics, is discussed. In the proposed approach, simple linear regression is used to predict defect density from each metric individually, and multiple regression is used to predict defect density from the metrics jointly.

II. RELATED WORK

Software fault prediction generally uses software metrics and fault data from previous releases to predict fault-prone modules in the next release of the software [1][2]. For fault prediction, software metrics are used as independent variables and defect data as the dependent variable.

Olague et al. [14] investigated three software metrics suites in the Rhino project for fault prediction. They reported that the CK metrics are very useful for fault prediction and that, among the CK metrics, WMC (weighted methods per class) and RFC (response for a class) are the most useful.

Ostrand et al. [15] predicted faults with a negative binomial regression model using static metrics such as file size, programming language, number of changes to the file and file status. They concluded that the negative binomial regression model is very useful in terms of accuracy.

Hassan et al. [11] identified the top ten fault-prone components in six open source projects using static metrics such as change frequency and size. They proposed techniques such as most frequently modified (MFM), most recently modified (MRM), most frequently fixed (MFF) and most recently fixed (MRF). MFM and MFF were more successful than the other methods.

Denaro [4] used antenna configuration software to estimate software fault-proneness with a logistic regression model. He showed that there is a correlation between static metrics and fault-proneness.

Denaro et al. [5], in further research, used logistic regression with method-level metrics on an antenna configuration system. They showed that logistic regression with cross-validation is an effective approach for software fault prediction.

Tomaszewski et al. [20] applied a regression model to a large telecommunication system. They showed that models built after the system is implemented are 34% more accurate than models built before the system is implemented.

Gyimothy et al. [10] predicted fault-proneness using linear regression on the Mozilla open source project. They showed that the coupling metric is very useful
for fault prediction. Multivariate models were found to be more useful, and the number of children metric should not be used.

In other research on static metrics, Emam et al. [8] performed fault prediction using logistic regression on a commercial Java application. Two metrics from the Chidamber-Kemerer suite were used. According to their study, coupling is more useful than the depth metric.

Dinesh Verma et al. [6] formulated a model for optimizing overall defect density using the distribution of module sizes. Their study shows that smaller modules can be effective in minimizing the overall defect density.

III. STATIC METRICS

In the proposed approach, static metrics are used to predict defect density. Static metrics are generally preferred for defect prediction because they are extracted directly from the source code [1]. Eight measures (seven static metrics and the number of defects) have been taken from the NASA PROMISE data sets [12] to establish the relationship between the different metrics and the defect count. The conceptual definition of each is as follows:

a. Lines of Code: Size measures of software have direct application to planning, tracking and estimating software projects [17]. Lines of code measure the size of the project. This paper shows how lines of code affect the number of defects in a project.

b. Cohesion: Cohesion refers to the degree to which the elements of a module belong together [21]. It thus measures how strongly one piece of functionality in the software is related to the other functionality grouped with it.

c. Weighted Methods: This metric is equal to the number of methods in a class when each method is given unit weight [3]. The complexity of a class grows with its number of methods. The results show the impact of weighted methods on defect prediction.

d. Depth: This metric is defined as the maximum depth of the inheritance graph of each class [3].

e. Coupling: A class is considered to be coupled to another class if it uses methods and/or instance variables of that other class. CBO (coupling between objects) counts the number of couplings between classes [3].

f. Response: This metric gives the number of methods that can be executed in response to a message received by an object of a class [3].

g. Lines of Comment: A comment is a programming language construct [9] that increases the readability of the source code of a computer program. Comments are of great importance to programmers but are generally ignored by interpreters and compilers. The rules for writing comments may vary from developer to developer.

h. Defects: According to the IEEE standard classification for software anomalies [13], a defect is an imperfection or deficiency in a work product where that work product does not meet its requirements or specifications and needs to be either repaired or replaced. The total number of software defects has been taken from the project statistics page for this study. The total number of defects is the number recorded at the time of data collection.

IV. RESEARCH METHODOLOGY

The linear regression statistical method has been used to predict defect density from the static metrics, using the SPSS tool [18]. As mentioned in Section I, simple linear regression is used to predict defect density from each static metric individually, and multiple regression is used to predict defect density from the static metrics jointly.

Before predicting defect density, a normality test [16] was performed with the SPSS tool [18]. The data collected from the NASA PROMISE data sets [12] did not pass the normality test. The logarithm of each metric was therefore taken, and the normality test was applied again.

After the normality test was passed, simple regression and multiple regression were performed to obtain the prediction with individual and joint static metrics, respectively.

In this method, the dependent variable is the number of defects and the independent variables are the static metrics, individually and jointly.

In this way, the change in the dependent variable (defects) due to the independent variables (the static metrics) is shown with the help of the IBM SPSS tool [18].

V. EXPERIMENTAL RESULTS

The NASA PROMISE data sets [12] contain around twenty different metrics, but only eight are used here: Coupling, Depth, Cohesion, Response, Lines of Comments, Weighted Methods, Lines of Code and number of defects.

The data set is shown in Fig 5.1.

Fig 5.1 NASA KC1 Promise Datasets

As discussed in the previous section, the metrics are first checked for normality with the IBM SPSS tool [18]. If the data do not pass the normality test, the logarithm of each metric is taken and the normality test is applied again with SPSS. The normality tests are supplementary to the graphical assessment of normality [7].
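The check-then-transform procedure described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS procedure itself: the skewness and standard-error formulas are the usual sample-adjusted ones that statistical packages report, `log1p` (log of 1 + x) is used so that zero counts stay defined (the paper does not say how zeros were handled), the function names are hypothetical, and the data are made up rather than taken from KC1.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (sample-corrected form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def skewness_standard_error(n):
    """Standard error of the sample skewness for n observations."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skewness_z(values):
    """z-value: skewness statistic divided by its standard error."""
    return sample_skewness(values) / skewness_standard_error(len(values))


def passes_normality(values, limit=1.96):
    """True when the skewness z-value lies inside (-limit, +limit)."""
    return abs(skewness_z(values)) < limit


# Hypothetical, strongly right-skewed metric values (not KC1 data).
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]
logged = [math.log1p(v) for v in raw]  # assumption: log(1 + x)

print("raw passes:", passes_normality(raw))
print("log-transformed passes:", passes_normality(logged))
```

The skewness check is only one part of a normality assessment; as the text notes, it supplements the graphical assessment rather than replacing it.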
The result of the normality test for Coupling is given in Figure 5.2.

Figure 5.2 (a) Normality Test descriptive for Coupling, (b) Normality Graph for Coupling

Figure 5.2 (a) shows the descriptive result of the normality test. In this result the skewness statistic is 0.377 and its standard error is 0.201. The skewness z-value is the statistic divided by its standard error:

\[ z_{\text{skewness}} = \frac{\text{Statistic}}{\text{Standard Error}} = \frac{0.377}{0.201} \approx 1.88 \]

This value should lie between -1.96 and 1.96 [7], which it does. Figure 5.2 (b) shows the normal plot for Coupling; most of the points coincide with the line, which indicates that Coupling is normally distributed.

The result of the normality test for Weighted Methods is shown in Figure 5.3.

Figure 5.3 (a) Normality Test descriptive for Weighted Methods (b) Normality Graph for Weighted Methods.

The result of the normality test for Response is shown in Figure 5.4.

Figure 5.4 (a) Normality Test descriptive for Response (b) Normality Graph for Response.

The result of the normality test for Lines of Comments is shown in Figure 5.5.

Figure 5.5 (a) Normality Test descriptive for Lines of Comments (b) Normality Graph for Lines of Comments.

The result of the normality test for Depth is shown in Figure 5.6.

Figure 5.6 (a) Normality Test descriptive for Depth (b) Normality Graph for Depth.

The result of the normality test for Lines of Code is shown in Figure 5.7.

Figure 5.7 (a) Normality Test descriptive for Lines of Codes (b) Normality Graph for Lines of Codes.

The result of the normality test for Defects is shown in Figure 5.8.

Figure 5.8 (a) Normality Test descriptive for Defects (b) Normality Graph for Defects.

The result of the normality test for Cohesion is shown in Figure 5.9.

Figure 5.9 (a) Normality Test descriptive for Cohesion (b) Normality Graph for Cohesion

The regression tool in SPSS reports the result of a linear regression in terms of the R-squared value and the significance of the observed regression line (p-value). The p-value is compared with the chosen significance level to accept or reject the hypothesis [7]. It is the probability of error involved in accepting the observed result as valid, i.e. the probability of obtaining a relationship at least this strong in the sample when no relationship exists. The R-squared value represents the percentage of variation in the dependent variable that is explained by the independent variable [7]. For example, in a linear regression result, a p-value of 0.03 indicates that, if the independent variable had no effect, a relationship as strong as the observed one would be expected in only 3% of samples. An R-squared value of 0.16 indicates that 16% of the variability of the dependent variable is explained by the independent variable.

Defect prediction due to Cohesion is shown in Figure 5.10.

Fig 5.10 (a) Defect Prediction due to Cohesion (b) Correlation of Defects with Cohesion
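The slope B and the R-squared value reported for each individual metric in this section come from an ordinary least-squares fit of defects on one metric. The following is a minimal Python sketch of that computation, not the SPSS output itself: the function name `simple_linear_regression` and the sample values are assumptions for illustration, and the numbers are not taken from KC1.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R-squared equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical (log-transformed) metric values and defect counts.
metric = [1.0, 2.0, 3.0, 4.0, 5.0]
defects = [1.0, 3.0, 2.0, 5.0, 4.0]

slope, intercept, r_squared = simple_linear_regression(metric, defects)
print(f"B = {slope:.2f}, intercept = {intercept:.2f}, R-squared = {r_squared:.2f}")
# prints "B = 0.80, intercept = 0.60, R-squared = 0.64"
```

A positive slope corresponds to the "positive relation" reading used in the text, and R-squared of 0.64 would be read as 64% of the variability explained. The p-value additionally needs the t distribution; a statistics package such as SciPy reports it alongside the slope (`scipy.stats.linregress`).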
According to Figure 5.10 (a), the R-squared value indicates that Cohesion explains 1.6% of the variability of defect density; other factors account for the remaining 98.4%. In Figure 5.10 (b), the positive value of B for Cohesion indicates that defect density has a positive relation with Cohesion. So when cohesion increases, defect density also increases.

Defect prediction due to Coupling is shown in Figure 5.11.

Fig 5.11 (a) Defect Prediction due to Coupling (b) Correlation of Defects with Coupling

According to Figure 5.11 (a), the R-squared value indicates that Coupling explains 13.4% of the variability of defect density; other factors account for the remaining 86.6%. In Figure 5.11 (b), the positive value of B for Coupling indicates that defect density has a positive relation with Coupling. So when coupling increases, defect density also increases.

Defect prediction due to Comments is shown in Figure 5.12.

Fig 5.12 (a) Defect Prediction due to Comments (b) Correlation of Defects with Comments

According to Figure 5.12 (a), the R-squared value indicates that Comments explain 16.9% of the variability of defect density; other factors account for the remaining 83.1%. In Figure 5.12 (b), the positive value of B for Comments indicates that defect density has a positive relation with Comments. So when the number of comment lines increases, defect density also increases.

Defect prediction due to Depth is shown in Figure 5.13.

Fig 5.13 (a) Defect Prediction due to Depth (b) Correlation of Defects with Depth

According to Figure 5.13 (a), the R-squared value indicates that Depth explains 0.1% of the variability of defect density; other factors account for the remaining 99.9%. In Figure 5.13 (b), the positive value of B for Depth indicates that defect density has a positive relation with Depth. So when depth increases, defect density also increases.

Defect prediction due to Lines of Code is shown in Figure 5.14.

Fig 5.14 (a) Fault Prediction due to Lines of Codes (b) Correlation of Defects with Lines of Codes

According to Figure 5.14 (a), the R-squared value indicates that Lines of Code explain 47.6% of the variability of defect density; other factors account for the remaining 52.4%. In Figure 5.14 (b), the positive value of B for Lines of Code indicates that defect density has a positive relation with Lines of Code. So when the lines of code increase, defect density also increases.

Defect prediction due to Response is shown in Figure 5.15.

Fig 5.15 (a) Fault Prediction due to Response (b) Correlation of Defects with Response

According to Figure 5.15 (a), the R-squared value indicates that Response explains 7.4% of the variability of defect density; other factors account for the remaining 92.6%. In Figure 5.15 (b), the positive value of B for Response indicates that defect density has a positive
relation with Response. So when Response increases, defect density also increases.

Defect prediction due to Weighted Methods is shown in Figure 5.16.

Fig 5.16 (a) Fault Prediction due to Weighted Methods (b) Correlation of Defects with Weighted Methods

According to Figure 5.16 (a), the R-squared value indicates that Weighted Methods explain 21.5% of the variability of defect density; other factors account for the remaining 78.5%. In Figure 5.16 (b), the positive value of B for Weighted Methods indicates that defect density has a positive relation with Weighted Methods. So when Weighted Methods increases, defect density also increases.

Defect prediction due to all static metrics jointly is shown in Figure 5.17.

Fig 5.17 (a) Defect Prediction due to all metrics (b) Correlation of Defects with all metrics

Figure 5.17 (a) indicates that all the metrics jointly explain 53.3% of the variability of defect density. From Figure 5.17 (c), the linear relation between these static metrics and defect density is given by

\[
\text{Defect Density} = 2.72 + 0.110\,(\text{Coupling}) - 2.29\,(\text{Depth}) + 0.000\,(\text{Cohesion}) + 0.031\,(\text{Response}) - 0.051\,(\text{Weighted Methods}) - 0.063\,(\text{Comments}) + 0.021\,(\text{LOC}) \tag{1}
\]

Equation (1) shows that Cohesion has a small or negligible impact on the predicted variation in defect density. According to Figure 5.17 (a), the R-squared value indicates that 53.3% of the variability of defect density is explained by all seven metrics jointly, and the relationship of these seven metrics with defect density is given by equation (1).

VI. CONCLUSION AND FUTURE WORK

In this work, the relationship of defect density with seven different static metrics has been established using the NASA KC1 PROMISE data sets. The seven static metrics are coupling, depth, cohesion, response, weighted methods, comments and lines of code. Simple and multiple regression have been used for the analysis with the IBM SPSS tool. The results show a significant level of acceptance for the prediction of defect density with these static metrics, individually and jointly. These data sets are class-level data sets; future work should examine whether the results are similar for method-level data sets, and should also determine the effort needed to extract these static metrics and its impact on the prediction.

REFERENCES

[1] Cagatay Catal, A systematic review of software fault prediction studies, Elsevier, 2008.
[2] Cagatay Catal, Software fault prediction: A literature review and current trends, Elsevier, 2010.
[3] Chidamber, S. and C. Kemerer, "A metrics suite for object oriented design", Software Engineering, IEEE Transactions on, Vol. 20, No. 6, pp. 476-493, Jun 1994, http://ieeexplore.ieee.org/search/srchabstract.jsp?
[4] Denaro, G. (2000). Estimating software fault-proneness for tuning testing activities. In Twenty-second international conference on software engineering (pp. 704–706). New York, NY: ACM.
[5] Denaro, G., Pezzè, M., & Morasca, S. (2003). Towards industrially relevant fault proneness models. International Journal of Software Engineering and Knowledge Engineering, 13(4), 395–417.
[6] Dinesh Verma, Shishir Kumar (2014). An Improved Approach for Reduction of Defect Density Using Optimal Module Sizes. Hindawi Publishing Corporation, Advances in Software Engineering, Volume 2014, Article ID 803530.
[7] Elliot AC, Woodward WA, Statistical analysis quick reference guidebook with SPSS examples. 1st ed. London: Sage Publications, 2007.
[8] Emam, K. E., Melo, W., & Machado, J. C. (2001). The prediction of faulty classes using object-oriented design metrics. Journal of Systems and Software, 56(1), 63–75.
[9] Ganguli, Madhushree (2002). Making Use of Jsp. New York: Wiley. ISBN 0-471-21974-6
[10] Gyimothy, T., Ferenc, R., & Siket, I. (2005). Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Transactions on Software Engineering, 31(10), 897–910.
[11] Hassan, A. E., & Holt, R. C. (2005). The top ten list: Dynamic fault prediction. In Twenty-first IEEE international conference on software maintenance (pp. 263–272). Budapest, Hungary: IEEE Computer Society.
[12] http://tunedit.org/repo/PROMISE/DefectPrediction
[13] IEEE std. 1044-2009: IEEE Standards Classification for Software Anomalies (2010). Available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5399061
[14] Olague, H. M., Gholston, S., & Quattlebaum, S. (2007). Empirical validation of three software metrics suites to predict fault-proneness of object-oriented classes developed using highly iterative or agile software development processes. IEEE Transactions on Software Engineering, 33(6), 402–419.
[15] Ostrand, T. J., Weyuker, E. J., & Bell, R. M. (2007). Automating algorithms for the identification of fault-prone files. In International symposium on software testing and analysis, London, United Kingdom (pp. 219–227).
[16] Poole A. Michel et al., The assumptions of the linear regression model. Inst. Brit. Geogr., Trans., No. 52, p. 145–158. (1971)
[17] Robert E. Park: Software Size Measurement: A framework for Counting Source Statements. Technical Report CMU/SEI-92-TR-020. Available at www.sei.cmu.edu/reports/92tr020.pdf.
[18] SPSS. Available at http://www.spss.com/statistics. IBM SPSS Statistics Version 20 64-bit.
[19] "The Standish Group Report: Chaos", 1995, available from http://www4.in.tum.de/lehre/vorlesungen/vse/WS2004/1995_Standish_Chaos.pdf.
[20] Tomaszewski, P., Lundberg, L., & Grahn, H. (2005). The accuracy of early fault prediction in modified code. In Fifth conference on software engineering research and practice in Sweden, Västerås, Sweden (pp. 57–63).
[21] Yourdon, Edward; Constantine, Larry L. (1979) [1975]. Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design. Yourdon Press. ISBN 0-13-854471-9
