Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Download
Standard view
Full view
of .
Look up keyword
Like this
10Activity
0 of .
Results for:
No results containing your search query
P. 1
A Study of Estimation Methods for Defect Estimation

A Study of Estimation Methods for Defect Estimation

Ratings: (0)|Views: 979|Likes:
Published by sumitkalra3962

More info:

Published by: sumitkalra3962 on Nov 29, 2008
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

11/10/2011

pdf

text

original

 
A Study of Estimation Methods for Defect Estimation
Syed Waseem Haider , Jo˜ao W. CangussuDepartment of Computer ScienceThe University of Texas at Dallas
 
waseem,cangussu
 
@utdallas.edu
Abstract
 Accurate defect prediction is useful for planning and management of software testing. We discuss here the statis-tical characteristics and requirements of estimation meth-ods available for developing the defect estimators. We havealso analyzed the models, assumptions and statistical per- formance of three defect estimators available in the field using data from the literature. The study of the estimationmethods here is general enough to be applied to other es-timation problems in software engineering or other related  field.
1. Introduction
The major goal of a software testing process is to findand fix, during debugging, as many defects as possible andrelease a product with a reasonable reliability. A trade-off between releasing a product earlier or investing more timeon testing is always an issue for the organization. The clearviewofthestatusofthetestingprocessiscrucial tocomputethe pros and cons of possible alternatives. Time to achievethe established goal and percentage of the goal achieved upto the moment are important factors to determine the sta-tus of a software testing process. Many defect predictiontechniques [1, 3, 6, 11, 14, 15, 16] have addressed this im-portant problem by estimating the total number of defects,which then provide the status of a testing process in termsof remaining number of defects in a software product. Theavailability of an accurate estimation of the number of de-fects at early stages of the testing process allows for properplanning of resource usage, estimation of completion time,and current status of the process. Also, an estimate of thenumber of defects in the product by the time of release, al-lows the inference of required customer support.Our goal in this paper is to discuss various estimationmethods which are used to develop these defect estimationtechniques. We would discuss the assumptions that eachmethod makes about the data model, probability distribu-tion, and mean and variance of data and estimator. Wewill also discuss the statistical efficiency of the estima-tors developed from these estimation methods. There aretwo broad groups of estimation methods called ClassicalApproach and Bayesian Approach as shown in Figure 1.If there is prior statistical knowledge about the parameterwhichwe are interestedtoestimate, then an estimator basedon Bayesian Approach can be found. Note that the priorknowledge can be in the form of prior mean and varianceand probabilitydistributionof the parameter. There are sev-eral Bayesian estimators available [4, 5, 9]. Main advantageof Bayesian estimators is that they provide better early esti-mates compared toestimatorsdevelopedfromClassical Ap-proach. In the initial phase of software testing not enoughtest data is available, consequently in the absence of priorknowledge initial performance of estimator suffers.Organizationofthepaper isas follows,due tothe brevityof space we will limit the discussion of estimation methodsto Classical Approaches only in Section 2. We discuss inSection 3 three defect estimators available in the field. Weconclude the paper in Section 4.
ClassicalApproachBayesianApproach
NoNoPriorKnoweldgeYesYesPriorKnoweldgeNoYesget more Dataor change ModelOver TimeParameters Varying
Figure 1. Estimation Approaches.
2. Classical Approach
In this Section we discuss which estimation methods areavailable and what are the requirements of each method.
 
MomentsEstimatorLSEBLUEMVU EstimatorMLE
Distribution KnownProbabilityCRLB SatisfiedComplete SufficientStatistic ExistYesYesYesNoYesYesNoEvaluate Method of Moments EstimatorNoMean and Varianceof Noise KnownYesNoLinear ModelNoYesNoMake it UnbiasedYesFind MLENoNoget more Dataor Change Model
Figure 2. Hierarchical classification of classical approaches for estimators.
When the requirements are met, then the issue to be ad-dressed is how to develop an estimator using this method.We will also compare the performance of each method inorder to provide the merits and demerits of the estimatorsbased on these methods. Figure 2 provides a hierarchicalorganization of these methods. Collectively these estima-tion methods as shown in Figure 2 are called Classical Ap-proach [7].During system testing we are interested in finding in ad-vance the total number of defects which will be discoveredwhen the system testing is completed. As pointed out inSection 1 the knowledge of the total number of defects inadvance will help us in scheduling testing tasks, manage-ment of teams and allocation of resources to meet the targetvalue (total number of defects) in assigned time. Let us usethe variable
 
to represent the total number of defects tobe estimated. Estimation techniques need samples or datafrom the ongoing system testing process. Sample can be inthe form of the number of defects found each day or week or any other time unit. Time unit can be either executiontime or calendar time. Sample can also be the total num-ber of defects found by any time instant. In reliabilitymod-els [6, 12, 11, 15] samples are usuallythe number of defectsdiscovered per execution time but models for calendar timehave also been proposed [6, 12]. Authors have proposedthe
 
¿ 
Å 
(Estimation of defects based on Defect DecayModel)[3] model which can be used for bothexecution andcalendar time without any change.These samples must be related to the parameter
 
whichis to be estimated. Generally a data model is developedwhichrelates dataorsamples to
 
. Data model must alsoac-count for random noise or unforeseen perturbations whichnaturally occur in any real life phenomenon. In softwaretesting random behavior can be due to work force reloca-tion, noise in the testing data collection process, testingof “complex or poorly implemented” parts of the system,among others. In order to explain the estimation methodsgiven in Figure 2 we provide a simple general data model.Assume that we takean
ÒØ 
sample
Ü 
 
Ò 
whichcontains theparameter of interest
 
corrupted by random noise
Û 
 
Ò 
asgiven by
Ü 
 
Ò 
 
 
· 
Û 
 
Ò 
(1)Note that in Eq. 1
Ü 
 
Ò 
is linearly related to
 
. The obser-vations of 
 
made at
Æ 
intervals is given by the data set
Ü 
¼
Ü 
½
Ü 
 
Æ 
 
½
. A more generalized linear datamodel is
Ü 
 
 
 
· 
Û 
(2)where
Ü 
Æ 
¢ 
½ 
isthedatavector,
 
Æ 
¢ 
½ 
istheobservationvec-tor,
 
is the parameter of interest, and
Û 
Æ 
¢ 
½ 
is the noisevector. Note that
 
can contain information such as num-ber of testers, failure intensity rate, number of rediscovered
 
faults, etc. for each sample. A linear data model is recom-mended for two reasons: first it is more likely to provide aclosed form solution and second its more efficient as willbe discussed later. The joint probabilitydistributionof datais given by
Ô 
´ 
Ü 
¼
Ü 
½
Ü 
 
Æ 
 
½
 
µ 
or simply in vec-tor form
Ô 
´ 
Ü 
 
µ 
. This PDF (Probability Density Function)is a function of both data
Ü 
and the unknown parameter
 
.For example if unknownparameter
 
changes to
 
½ 
then thevalue of 
Ô 
´ 
Ü 
 
½ 
µ 
for the given data set will change. WhenPDF is seen as the function of unknown parameter
 
it iscalled likelihood function. Intuitively PDF provides howaccurately we can estimate
 
.If the probability distributionof the data is known, thenthe problem of finding an estimator
 
 
is simply finding afunction of data (as given by Eq. 6) which gives estimatesof the parameter. Many defect estimators [1, 3, 6, 11, 14,15, 16] assume the probabilitydistributionfor the data. Thevariability of estimates determines the efficiency of the es-timator. The higher the variance of the estimator the lesseffective (or reliable) are the estimates. Hence various esti-mators can be found for the data but the one with the low-est variance is the best estimator. Another important char-acteristic of the estimator is that it must be unbiased, i.e.
 
  
 
 
 
.There are various methods available to determine thelower bound on the variance of the estimators, e.g. [8, 10],but Cramer-Rao Lower Bound (CRLB) [7] is easier to de-termine. CRLB Theorem states:
it is assumed that the PDF 
Ô 
´ 
Ü 
 
µ 
satisfies the regularity condition
 
 
ÐÒÔ 
´ 
Ü 
 
µ 
 
¼ 
 
 
(3)
where theexpectationis takenwithrespect to
Ô 
´ 
Ü 
 
µ 
. Then,the variance of any unbiased estimator 
 
 
must satisfy
ÎÊ 
´  
 
µ 
 
½ 
 
 
 
 
¾ 
ÐÒÔ 
´ 
Ü 
 
µ 
 
¾ 
(4)
£ 
An estimator which is unbiased, satisfies the CRLB the-orem, and is based on a linear model is called an effi-cient Minimum Variance Unbiased (MVU) estimator [7].In other words it efficiently uses the data to find the esti-mates. An estimator whose variance is always minimumwhen compared to other estimators but is not less thanCRLB is simply called MVU estimator. For a linear datamodel (Eq. 2) it is easier to find the efficient MVU estima-tor as given by Eq. 5. The efficient MVU estimator and itsvariance bounded by CRLB are given by Eqs 6 and 7 re-spectively.
ÐÒÔ 
´ 
Ü 
 
µ 
 
 
Á 
´ 
 
µ´ 
 
´ 
Ü 
µ 
 
 
µ 
(5)
 
 
 
 
´ 
Ü 
µ 
(6)
ÎÊ 
´  
 
µ ½ 
Á 
´ 
 
µ 
(7)A defect estimatorwhichis an efficient MVUestimator willprovide very accurate estimates. In other fields such as sig-nal processing and communication systems where the sys-tem or model under investigationis well defined in terms of physical constraints, it is possible to find an efficient MVUestimator. Butinsoftwareengineeringnomodel completelycaptures all the aspects of a software testing process. Dif-ferent models [1, 3, 6, 11, 14, 15, 16] are based on differentassumptions and this lack of consistency hints towards theabsence of a mature testingmodel. Therefore its unlikelytofind an efficient MVU estimator.It is possible that for a given data model CRLB can-not be achieved. In other words a solution similar to theone given by Eq. 5 may not exist. Another approach tofind an MVU estimator is based on finding sufficient statis-tic such that minimal data is required to make PDF of data
Ô 
´ 
Ü 
 
µ 
independent of unknown parameter
 
. As dis-cussed earlier
Ô 
´ 
Ü 
 
Ò 
 
µ 
is dependent on both data
Ü 
 
Ò 
and
 
. For example a simple estimator
 
 
 
Ü 
 
Ò 
willhave high variance in estimating
 
. On the other handif sufficient data
Ü 
¼
Ü 
½
Ü 
 
Æ 
 
½
is available thennew data points will not provide additional informationabout
 
. For example if 
Ü 
 
Æ 
is a new sample after suffi-cient statistic has been acquired then the conditional PDF
Ô 
´ 
Ü 
 
Æ 
Ü 
¼
Ü 
½
Ü 
 
Æ 
 
½
 
µ 
is independent of 
 
asgiven by 8.
Ô 
´ 
Ü 
 
Æ 
Ü 
 
¼ 
Ü 
 
½ 

Ü 
 
Æ 
 
½ 
 
µ 
Ô 
´ 
Ü 
 
Æ 
Ü 
 
¼ 
Ü 
 
½ 

Ü 
 
Æ 
 
½ 
µ 
(8)If the sufficient statistic exist then according to Neyman-Fisher Factorization Theorem [7]
Ô 
´ 
Ü 
 
µ 
 
´ 
Ì 
´ 
Ü 
µ 
 
µ 
 
´ 
Ü 
µ 
(9)where
Ì 
´ 
Ü 
µ 
is the sufficient statistic. If 
Ô 
´ 
Ü 
 
µ 
can be fac-torized as given by Eq. 9 then sufficient statistic
Ì 
´ 
Ü 
µ 
ex-ists. Then we have to find a function of 
Ì 
´ 
Ü 
µ 
which is theestimator
 
 
such that
 
 
is unbiased
 
  
 
 
 
.
 
 
 
 
´ 
Ì 
´ 
Ü 
µµ 
(10)The
 
 
is also an MVU estimator. We can intuitively seethat
 
 
will have lower bound on the variance as more newdata is not contributing any new information about
 
. ThePDF for
 
¿ 
Å 
[3] can be factorized as given by Eq. 9.The estimator of 
 
¿ 
Å 
is an unbiased function of 
Ì 
´ 
Ü 
µ 
as can be seen from Eq. 25, but we do not claim that theestimator of 
 
¿ 
Å 
is based on sufficient statistic for thefollowing reason. Software testing is a process which pro-duces new defects along with rediscovered defects. An ex-ample of sufficient statistic is that we want to estimate theaccuracy of a surgical precision laser so samples of the de-vice are taken. These samples will be ‘around’ the speci-fied desired value. We take sufficient samples to estimate

Activity (10)

You've already reviewed this. Edit your review.
1 hundred reads
1 thousand reads
Terry L Cowart liked this
dudedudydude liked this
abhishekqai liked this
chandrashekar liked this
Nguyen Duc Duy liked this
dhaval4hamesha liked this
Dhananjaya1984 liked this
flyhead liked this

You're Reading a Free Preview

Download
scribd
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->