VOICE OF EVIDENCE

Editors: Tore Dybå, SINTEF; tore.dyba@sintef.no
Helen Sharp, The Open University, London; h.c.sharp@open.ac.uk

Developing Fault-Prediction Models:
What the Research Can Show Industry

Tracy Hall, Sarah Beecham, David Bowes, David Gray, and Steve Counsell

Analysis of 206 fault-prediction models reported in 19 mature research studies reveals key features to help industry developers build models suitable to their specific contexts.

CODE FAULTS ARE a daily reality for software development companies. Finding and removing them costs the industry billions of dollars each year. The lure of potential cost savings and quality improvements has motivated considerable research on fault-prediction models, which try to identify areas of code where faults are most likely to lurk. Developers can then focus their testing efforts on these areas. Effectively focused testing should reduce the overall cost of finding faults, eliminate them earlier, and improve the delivered system's quality. Problem solved.

So why do so few companies seem to be developing fault-prediction models? One reason is probably the sheer number and complexity of studies in this field. Before companies can start to develop the models, they must understand questions such as which metrics to include, which modeling techniques perform best, and how context affects fault prediction.

To answer these questions, we conducted a systematic review of the published studies of fault prediction in code from the beginning of 2000 to the end of 2010.1 On the basis of this review and subsequent analysis of 206 models, we present key features of successful models here.

Context Is Key
We found 208 studies published on predicting code faults over the 11-year period covered in our review. These studies contain many hundreds of individual models, whose construction varies considerably.

Although it would be nice if companies could simply select one of these published models and use it in their own environment, the evidence suggests that fault-prediction models perform less well when transferred to different contexts.2 This means that practitioners must understand how existing models might or might not relate to their own model development. In particular, they must understand the specific context in which an existing model was developed, so they can base their own models on those that are contextually compatible.

However, we found three challenges to this apparently simple requirement. First, many studies presented insufficient information about the development context. Second, most studies reported models built using open source data, which can limit their compatibility with commercial systems. Third, it remains unclear which context variables (application domain, programming language, size, system maturity, and so on) are tied to a fault-prediction model's performance.

96 IEEE SOFTWARE | PUBLISHED BY THE IEEE COMPUTER SOCIETY | 0740-7459/11/$26.00 © 2011 IEEE

Establishing Confidence in Existing Models
Practitioners need a basic level of confidence in an existing model's performance. Such confidence is based on understanding how well the model has been constructed, its development context, and its performance relative to other models.
Our review suggests that few of the published models provide sufficient information to adequately support this understanding. We developed a set of criteria based on research (for example, see Kai Petersen and Claes Wohlin3) to assess whether a fault-prediction study reports the basic information to support confidence in a model. Figure 1 shows a checklist based on these criteria (details are available elsewhere1).

When we applied these criteria to the 208 studies we reviewed, only 36 passed all criteria. Each of these 36 studies presents clear models and provides the information necessary to understand the model's relevance to a particular context.

We quantitatively analyzed the performance of the models presented in 19 of the 36 studies. These 19 studies all report categorical predictions, such as whether a code unit was likely to be faulty or not (see the sidebar). Such predictions use performance measures that usually stem from a confusion matrix (see Figure 2). This matrix supports a performance comparison across categorical studies.

We omitted 13 studies from further analysis because they reported continuous predictions, such as how many faults are likely to occur in each code unit. Such studies employ a wide variety of performance measures that are difficult to compare. Likewise, we omitted four categorical studies that reported performance data based on the area under a curve.

Phase One: Prediction Criteria
• Is a prediction model reported?
• Is the prediction model tested on unseen data?

Phase Two: Context Criteria
• Is the source of data reported?
• Is the maturity of data reported?
• Is the size of data reported?
• Is the application domain of data reported?
• Is the programming language of data reported?

Phase Three: Model Criteria
• Are the independent variables clearly reported?
• Is the dependent variable clearly reported?
• Is the granularity of the dependent variable reported?
• Is the modeling technique used reported?

Phase Four: Data Criteria
• Is the fault data acquisition process described?
• Is the independent variables data acquisition process described?
• Is the faulty/nonfaulty balance of data reported?

FIGURE 1. Checklist of criteria for establishing confidence in a model. Only 36 of 208 studies from the systematic literature review met all criteria.

Number of code units:
                       Predicted faulty      Predicted not faulty
Actually faulty        True positive (TP)    False negative (FN)
Actually not faulty    False positive (FP)   True negative (TN)

FIGURE 2. Confusion matrix. The two columns and two rows capture the ways in which a prediction can be correct or incorrect.
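To make the matrix concrete, its four cells can be tallied from a model's categorical predictions. The sketch below is illustrative only; the fault labels are invented and come from no reviewed study.

```python
def confusion_matrix(actual, predicted):
    """Tally TP, FP, FN, TN for categorical predictions (1 = faulty, 0 = not faulty)."""
    cells = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            cells["TP" if a == 1 else "FP"] += 1  # predicted faulty: hit or false alarm
        else:
            cells["FN" if a == 1 else "TN"] += 1  # predicted clean: miss or correct rejection
    return cells

# Invented labels for six code units:
print(confusion_matrix(actual=[1, 1, 0, 0, 1, 0], predicted=[1, 0, 0, 1, 1, 0]))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```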
The 19 studies reporting categorical predictions contained 206 individual models. For each model, we extracted performance data for

• precision = TP/(TP + FP): proportion of units predicted as faulty that were faulty;
• recall = TP/(TP + FN): proportion of faulty units correctly classified; and
• f-measure = (2 × recall × precision)/(recall + precision): the harmonic mean of precision and recall.

For studies that didn't report these measures, we recomputed them from the available confusion-matrix-based data. This let us compare the performance of all 206 models across the 19 studies and draw quantitative conclusions.
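The three measures above follow directly from the confusion-matrix counts. A minimal sketch, with invented counts for illustration (the true-negative cell is not needed for these measures):

```python
def prediction_metrics(tp, fp, fn):
    """Precision, recall, and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)  # predicted-faulty units that really were faulty
    recall = tp / (tp + fn)     # actually faulty units the model caught
    f_measure = (2 * recall * precision) / (recall + precision)  # harmonic mean
    return precision, recall, f_measure

# Invented counts: 70 hits, 30 false alarms, 30 misses.
p, r, f = prediction_metrics(tp=70, fp=30, fn=30)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.7 0.7 0.7
```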


FAULT-PREDICTION MODEL ELEMENTS

There are three essential elements to a fault-prediction model.

PREDICTOR VARIABLES
These independent variables are usually metrics based on software artifacts, such as static code or change data. Models have used a variety of metrics—from simple LOC metrics to complex combinations of static-code features, previous fault data, and information about developers.

OUTPUT VARIABLES
A model's output, or dependent variable, is a prediction of fault proneness in terms of faulty versus nonfaulty code units. This output typically takes the form of either categorical or continuous output variables. Categorical outputs predict code units as either faulty or nonfaulty. Continuous outputs usually predict the number of faults in a code unit. Predictions can address varying units of code—from high-granularity units, such as plug-in level, to low-granularity units, such as method level.

MODELING TECHNIQUES
Model developers can use one or more techniques to explore the relationship between the predictor (or independent) variables and the output (or dependent) variables. Among the many available techniques are statistically based regression techniques and machine-learning techniques such as support vector machines. Ian Witten and Eibe Frank provide an excellent guide to using machine-learning techniques.1

Reference
1. I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2005.

What Works in Existing Models
When compared with each other, most of the 206 models peaked at about 70 percent recall—that is, they correctly predict about 70 percent of actual faults. Some models performed considerably higher (for example, see Shivkumar Shivaji and his colleagues4), while others performed considerably lower (see Erik Arisholm and his colleagues5).

Models based on techniques such as naïve Bayes and logistic regression seemed to perform best. Such techniques are comparatively easy to understand and simple to use.

Additionally, the models that performed relatively well tended to combine a wide range of metrics, typically including metrics based on the source code, change data, and data about developers (see Christian Bird and his colleagues6). Often, the models performing best had optimized this set of metrics (for example, by using principal component analysis or feature selection, as in Shivaji4). Models using source-code text directly as a predictor yielded promising results (see Osamu Mizuno and Tohru Kikuno7). Models using static-code or change-based metrics alone performed least well. Models using LOC metrics performed surprisingly competitively.

Our systematic literature review suggests first that successful fault-prediction models are built or optimized to specific contexts. The 36 mature studies we identified can support this task with clearly defined models that include development contexts and methodologies. Our quantitative analysis of the 19 categorical studies from this set further suggests that successful models are based on both simple modeling techniques and a wide combination of metrics. Practitioners can use these results in developing their own fault-prediction models.

Acknowledgments
We thank the UK's Engineering and Physical Science Research Council, which supported this research at Brunel University under grant EPSRC EP/E063039/1, and the Science Foundation Ireland, which partially supported this work at Lero under grant 3/CE2/I303_1. We are also grateful to Sue Black and Paul Wernick for providing input to the early stages of the work reported.

References
1. T. Hall et al., "A Systematic Review of Fault Prediction Performance in Software Engineering," accepted for publication in IEEE Trans. Software Eng.; preprint available at http://bura.brunel.ac.uk/handle/2438/5743.
2. N. Nagappan, T. Ball, and A. Zeller, "Mining Metrics to Predict Component Failures," Proc. 28th Int'l Conf. Software Eng. (ICSE 06), ACM Press, 2006, pp. 452–461.
3. K. Petersen and C. Wohlin, "Context in Industrial Software Eng. Research," Proc. 3rd Int'l Symp. Empirical Software Eng. and Measurement (ESEM 09), IEEE CS Press, 2009, pp. 401–404.
4. S. Shivaji et al., "Reducing Features to Improve Bug Prediction," Proc. 24th IEEE/ACM Int'l Conf. Automated Software Eng. (ASE 09), IEEE CS Press, 2009, pp. 600–604.
5. E. Arisholm, L.C. Briand, and E.B. Johannessen, "A Systematic and Comprehensive Investigation of Methods to Build and Evaluate Fault Prediction Models," J. Systems and Software, vol. 83, no. 1, 2010, pp. 2–17.
6. C. Bird et al., "Putting It All Together: Using Socio-Technical Networks to Predict Failures," Proc. 20th Int'l Symp. Software Reliability Eng. (ISSRE 09), IEEE CS Press, 2009, pp. 109–119.
7. O. Mizuno and T. Kikuno, "Training on Errors Experiment to Detect Fault-Prone Software Modules by Spam Filter," Proc. 6th Joint Meeting European Software Eng. Conf. and ACM SIGSOFT Symp. Foundations of Software Eng. (ESEC-FSE 07), ACM Press, 2007, pp. 405–414.
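The simple techniques the review highlights, naïve Bayes among them, are straightforward to apply. Below is a minimal, self-contained sketch of a Gaussian naïve Bayes fault predictor; the code units, metric values, and fault labels are invented for illustration and come from no reviewed study.

```python
import math

def train_gaussian_nb(rows, labels):
    """Fit per-class feature means/variances and class priors (Gaussian naive Bayes)."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for col in zip(*members):  # one column per metric
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # smoothed
            stats.append((mean, var))
        model[cls] = (labels.count(cls) / len(labels), stats)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one code unit."""
    best_cls, best_score = None, float("-inf")
    for cls, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Hypothetical training data: each row is (LOC, change count) for a code unit.
units = [(500, 40), (620, 35), (480, 50), (90, 3), (120, 5), (75, 2)]
faulty = [1, 1, 1, 0, 0, 0]
model = train_gaussian_nb(units, faulty)
print(predict(model, (550, 45)))  # large, frequently changed unit -> 1
print(predict(model, (100, 4)))   # small, stable unit -> 0
```

In practice the predictor set would be optimized to the local context, for example with the feature-selection or principal-component approaches the review found effective.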
TRACY HALL is a reader in software engineering at Brunel University, UK. Contact her at tracy.hall@brunel.ac.uk.

SARAH BEECHAM is a research fellow at Lero, the Irish Software Engineering Research Centre. Contact her at sarah.beecham@lero.ie.

DAVID BOWES is a senior lecturer at the University of Hertfordshire, UK. Contact him at d.h.bowes@herts.ac.uk.

DAVID GRAY is a PhD student at the University of Hertfordshire, UK. Contact him at d.gray@herts.ac.uk.

STEVE COUNSELL is a reader in software engineering at Brunel University, UK. Contact him at steve.counsell@brunel.ac.uk.

