
Empirical Evaluation of Defect Projection Models
for Widely-deployed Production Software Systems
FSE 2004
Paul Li, Mary Shaw, Jim Herbsleb
Institute for Software Research Intl., School of Computer Science
Carnegie Mellon University, Pittsburgh, PA 15213

Bonnie Ray, P. Santhanam
Center for Software Engineering, IBM T.J. Watson Research Center
Hawthorne, NY 10532
Overview one
 Defect occurrences are problems
 Methods that deal with the economic
consequences require accurate defect
occurrence rate projections
 Defect occurrence rate projection for
widely-deployed production software
systems has novel problems
Overview two
 We have two empirical results that can
help defect occurrence rate
projections:
 Part 1: The Weibull model is better than the other
previously published models
 Part 2: Naïve parameter extrapolation methods
that do not consider changes in characteristics
are inadequate
The real world problem
 Methods to deal with the economic
consequences:
 Maintenance resource allocation
 Service contracts
 Software insurance
 All require accurate defect occurrence
rate projections:
 Projections of the rate of user reported defect
occurrences after a software release becomes
available for each distinct release
The context
 Widely-deployed production software
systems:
 Many software and hardware configurations in use
 Unknown deployment and usage patterns
 Constrained development process
 Evolving contents over time over multiple releases
The research problem
 How do you predict the rate of defect
occurrences?
[Figure: defect occurrences (vertical axis) over months (horizontal axis) for Release N-2, Release N-1, Release N, and Release N+1, with "Now" marked on the time axis.]
The research questions
 Is there a model that describes the
defect occurrence pattern?
 Given this model, how can we predict the
model parameters for the next release?
[Figure: defect occurrences over months, with "Now" marked and the future portion of the curve shown as "?".]
The research approach
 In the context of widely-deployed
production software:
 Perform analysis to develop hypotheses
concerning models/methods
 Use real world data to empirically test
hypotheses
The data
 User-reported defects in 22 releases of
four widely-deployed production
software systems:
 8 releases of a commercial operating system
 3 releases of a commercial middleware system
 8 releases of an open source operating system
(OpenBSD)
 3 releases of an open source middleware system
(Jakarta Tomcat)
Relation to prior work
 Software reliability modeling and software
certification:
 Assume software and hardware configurations and
deployment and usage patterns are known
 Prediction of the total number of defects and
identification of defect-prone modules:
 Produce results that are insufficient for maintenance
planning and software insurance
 No work on projecting defect occurrence
rates for open source software systems
Part 1: which model to use?

[Figure: defect occurrences over months, with "Now" marked and the future portion of the curve shown as "?".]
Previously published models

Model type    Source                          Model form
Exponential   Goel & Okumoto [1979]           λ(t) = N α e^{-α t}
Weibull       Schick-Wolverton [1978]         λ(t) = N α β t^{α-1} e^{-β t^{α}}
Gamma         Yamada, Ohba, & Osaki [1983]    λ(t) = N β^{α} t^{α-1} e^{-β t}
Power         Duane [1964]                    λ(t) = α β e^{-β t}
Logarithmic   Musa-Okumoto [1975]             λ(t) = α (α β t + 1)^{-1}

[Model shape column: plots not recoverable.]

Callouts on the model form: N is the total number of defect occurrences; the power-of-t factor is the increasing component, which dominates when t is small; the exponential factor is the decreasing component, which dominates when t is large.
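
As a minimal sketch, the Exponential and Weibull forms listed above can be written directly as functions. The function names and the parameter values in the usage lines are illustrative assumptions, not values from the study. The cumulative form N(1 - e^{-β t^{α}}) is the integral of the Weibull rate from 0 to t.

```python
import math

def exponential_rate(t, N, alpha):
    """Exponential model: lambda(t) = N * alpha * exp(-alpha * t)."""
    return N * alpha * math.exp(-alpha * t)

def weibull_rate(t, N, alpha, beta):
    """Weibull model: lambda(t) = N*alpha*beta * t^(alpha-1) * exp(-beta * t^alpha).

    Assumes t > 0 (t = 0 with alpha < 1 is undefined).
    """
    return N * alpha * beta * t ** (alpha - 1) * math.exp(-beta * t ** alpha)

def weibull_cumulative(t, N, alpha, beta):
    """Integral of weibull_rate from 0 to t: N * (1 - exp(-beta * t^alpha))."""
    return N * (1.0 - math.exp(-beta * t ** alpha))

# Hypothetical parameter values, chosen only to show the rise-then-fall shape.
N, alpha, beta = 100.0, 2.0, 0.05
for month in (1, 3, 6, 12):
    rate = weibull_rate(month, N, alpha, beta)
    total = weibull_cumulative(month, N, alpha, beta)
    print(f"month {month:2d}: rate {rate:6.2f}, cumulative {total:6.1f}")
```

With α = 1 the Weibull form reduces to the Exponential form with rate β, and the cumulative count approaches N as t grows.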
Model comparison
[Figure: defect occurrences over months.]

Model                 AIC score
Exponential model     110
Power model           113
Logarithmic model     112
Gamma model           90
Weibull model         83
Conclusion: Weibull is better
 Has the best AIC score in 73% of the
releases
 Is within the 95% C.I. of the best AIC
score in 95% of the releases
 Is good despite differences in the type of
system, style of development, and the
kind of data
Part 2: How to extrapolate
model parameters?
Weibull: λ(t) = N α β t^{α-1} e^{-β t^{α}}
[Figure: defect occurrences over months, with "Now" marked.]
Parameter extrapolation methods

Historical values of β:
Tomcat 3.3: β = 15.4439
Tomcat 4.0: β = 16.8946

Moving averages (2 releases), weights .5 and .5:
estimate of Tomcat 4.1 β = 16.16925
Exponential smoothing (2 releases), weights .41 and .59:
estimate of Tomcat 4.1 β = 16.29725

 No consideration of similarities and
differences in characteristics between
historical releases and current release.
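
The two extrapolations above amount to weighted sums of the historical β values. The sketch below assumes the weights apply in release order (Tomcat 3.3 first, Tomcat 4.0 second) and takes the exponential smoothing weights as displayed on the slide; how those weights were derived is not shown here, and the function names are illustrative.

```python
def moving_average(values):
    """Equal-weight average of the historical parameter values."""
    return sum(values) / len(values)

def weighted_estimate(values, weights):
    """Weighted sum of historical parameter values (weights should sum to 1)."""
    return sum(w * v for w, v in zip(weights, values))

# Beta values for Tomcat 3.3 and Tomcat 4.0, as given on the slide.
betas = [15.4439, 16.8946]

ma = moving_average(betas)                  # weights .5 and .5
es = weighted_estimate(betas, [0.41, 0.59])  # weights as displayed (rounded)

print(f"moving average estimate for Tomcat 4.1 beta:        {ma:.5f}")
print(f"exponential smoothing estimate for Tomcat 4.1 beta: {es:.5f}")
```

The moving average reproduces 16.16925. With the two-decimal weights the weighted sum comes out near 16.30, slightly above the slide's 16.29725; the gap is presumably due to rounding of the displayed weights. Neither calculation uses any information about how the releases differ.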
Extrapolation process
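[Figure: defect occurrences over months, with "Now" marked. Curves are annotated with the parameter values (α=2.79, β=6.83), (α=2.28, β=4.66), and (α=2.51, β=5.69); for the release being projected, N is known and α and β are projected. Between times t1 and t2 the figure marks the previous, projected, and actual values, the projection difference, and the baseline difference of the uninformed guess.]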
Defect projection evaluation
System / Releases     one      two       three     four      five      six       seven
                      release  releases  releases  releases  releases  releases  releases
Open source OS R2.8   1.06     0.70      -         -         -         -         -
Open source OS R2.9   1.32     0.93      1.04      -         -         -         -
Open source OS R3.0   0.87     0.42      0.43      0.44      -         -         -
Open source OS R3.1   0.72     0.70      0.73      0.71      0.73      -         -
Open source OS R3.2   0.76     0.91      0.87      0.99      0.97      1.02      -
Open source OS R3.3   1.56     1.10      0.85      0.86      0.66      0.66      0.57

Theil statistics for forecasting experiments using the moving averages method


Conclusion: Naïve methods
are inadequate
 In 50% of forecasting experiments, more
information did not improve projections
 In 44% of forecasting experiments, Theil
statistics are greater than or equal to 1
 Methods that consider changes in
characteristics of widely-deployed
production software systems should be
considered
Summary
 Results
 Weibull model is the preferred model:
 May allow us to quantify effects of changes in
characteristics by examining changes in parameter values
 Naïve parameter extrapolation methods are
inadequate:
 Motivates further work to capture and account for changes
in characteristics to improve projections
 Accurate defect occurrence rate
projections may aid better planning and
may enable software insurance
The end

Questions, suggestions, comments


Email: Paul.Li@cs.cmu.edu
The AIC model selection
criterion
AIC = n log σ^{2} + 2|S|

n: number of observations
σ: residual standard error
|S|: number of model parameters
The first term (n log σ^{2}) reflects variance; the second term (2|S|) reflects bias.
 Compares model fits with different numbers
of parameters.
 Accounts for variance and bias.
 Approximately follows a χ^{2} (chi-squared) distribution.
 A difference of 4 corresponds to approximately a 95% confidence interval.
The Theil forecasting statistic
[Diagram: historical releases feed the parameter extrapolation method, which yields the predicted value P2; A1 and A2 are actual values for the current release.]

Theil forecasting statistic:
√(Σ(Actual – Predicted)^{2}) / √(Σ(Actual)^{2})
Actual = (A2 – A1)
Predicted = (P2 – A1)

Special cases:

Perfect forecast: P2 = A2
(Actual – Predicted) = ((A2 – A1) – (P2 – A1)) = ((A2 – A1) – (A2 – A1)) = 0 → Theil statistic of 0
Uninformed forecast: P2 = A1
(Actual – Predicted) = ((A2 – A1) – (P2 – A1)) = ((A2 – A1) – (A1 – A1)) = ((A2 – A1) – 0) = Actual → Theil statistic of 1
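
The statistic and its two special cases can be checked with a short sketch. The function name and the sample numbers are hypothetical; A1, A2, and P2 are passed as equal-length sequences, one entry per forecast interval.

```python
import math

def theil_statistic(a1, a2, p2):
    """sqrt(sum((Actual - Predicted)^2)) / sqrt(sum(Actual^2)),
    with Actual = A2 - A1 and Predicted = P2 - A1.

    Undefined when A2 equals A1 everywhere (zero denominator).
    """
    actual = [y - x for x, y in zip(a1, a2)]
    predicted = [p - x for x, p in zip(a1, p2)]
    numerator = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)))
    denominator = math.sqrt(sum(a ** 2 for a in actual))
    return numerator / denominator

# Hypothetical defect occurrence values at the start (A1) and end (A2) of each interval.
a1 = [10, 20, 30]
a2 = [12, 25, 33]

print("perfect forecast (P2 = A2):   ", theil_statistic(a1, a2, a2))
print("uninformed forecast (P2 = A1):", theil_statistic(a1, a2, a1))
print("some other forecast:          ", round(theil_statistic(a1, a2, [11, 24, 35]), 3))
```

A value below 1 means the projection beats the uninformed guess that nothing changes; a value at or above 1 means it does no better.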
