Landmines of Software Testing Metrics
1. Abstract
It is not only desirable but also necessary to assess the quality of
testing delivered by a vendor. Specific to software testing, there are
some discerning metrics one can look at; however, it must be kept in
mind that multiple factors affect these metrics, not all of which are
under the control of the testing team. The SLAs for testing
initiatives can, and should, be committed only after a detailed
understanding of the customer's IT organization in terms of culture
and process maturity, and after analyzing the trends among these
metrics. This white paper lists some of the popular testing metrics
and the factors one must keep in mind while reading into their
values.
2. Introduction
This white paper discusses some of the popular metrics for testing
outsourcing engagements and the factors one must keep in mind
while looking at the values of these metrics.
Metric 1. Residual defects after a testing stage
• Definition
The absolute number of defects detected after the testing stage
(owned by the vendor's testing team).
The lower the number of defects found after the current testing
stage, the better the quality of testing is considered to be.
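As a rough illustration of how this count can be derived, the minimal Python sketch below tallies defects from a defect log that records the stage at which each defect was detected. The log structure and stage names are hypothetical, not taken from this paper.

# Hypothetical defect log: each record notes the stage that detected the defect.
defect_log = [
    {"id": "D-101", "detected_in": "System Test"},
    {"id": "D-102", "detected_in": "UAT"},
    {"id": "D-103", "detected_in": "Production"},
    {"id": "D-104", "detected_in": "UAT"},
]

# Stages in the order they occur; residual defects for a stage are those
# detected in any later stage.
STAGE_ORDER = ["System Test", "UAT", "Production"]

def residual_defects(stage):
    later = STAGE_ORDER[STAGE_ORDER.index(stage) + 1:]
    return sum(1 for d in defect_log if d["detected_in"] in later)

print(residual_defects("System Test"))  # 3 defects slipped past System Test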
Factors to consider
• Quality of requirements
Ambiguity in the requirements results in misinterpretations and
misunderstandings, leading to ineffective defect detection. The
clearer the requirements, the higher the chances of the testing team
understanding them correctly and hence noticing deviations or
defects in the system under test (SUT).
• Quality of development
Planning for testing is usually done on the assumption that the
system will be thoroughly unit tested prior to handing it over to the
testing team. However, if the quality of the development process is
poor and unit testing is not thorough, the testing team is likely to
encounter more unit-level defects and might be left with less time
to find deeper, system-level defects.
• Availability of business users
If the business users are not available to answer queries in time, the
testing team is likely to struggle to establish the intended behavior
of the system under test (whether an observation is a defect or not,
etc.). Some productivity is lost as a result, and more defects are
likely to remain by the time the testing window is used up.
• Incomplete test cycles - delays in defect fixes, higher defect
volumes, environment unavailability
Estimates and plans for testing are based on certain assumptions
and on available historical data. However, if there are more
disruptions than anticipated, in terms of environment unavailability
or a higher number of defects being found and fixed, less quality
time is available for testing the system, and a higher number of
defects slip through the testing stage.
Metric 2. Test effectiveness of a testing stage (or) Defect
Containment Effectiveness
• Definition
The percentage of system-level defects that are detected in the
testing stage (owned by the vendor's testing team), as opposed to
slipping through and being detected at a later testing stage or in
Production.
The higher the effectiveness, the lower the chances of defects being
found downstream.
Test Stage Effectiveness =
(Defects detected in the current testing stage)
-------------------------------------------------------------------------------
(Defects detected in the current testing stage + defects detected in
all subsequent stages)
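Expressed as code, the ratio is straightforward. The following minimal Python sketch (function and variable names are illustrative, not from the paper) guards against the degenerate case of no defects found at all:

def test_stage_effectiveness(current_stage_defects, subsequent_stage_defects):
    # Percentage of all detected defects that were caught in the current stage.
    total = current_stage_defects + subsequent_stage_defects
    if total == 0:
        return None  # undefined when no defects were detected anywhere
    return 100.0 * current_stage_defects / total

# e.g. 45 defects caught in system test, 5 found in UAT and Production combined
print(test_stage_effectiveness(45, 5))  # 90.0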
Factors to consider
• Availability of accurate defect data at all stages
We must ensure that defect data from all subsequent stages is also
available and accurate. Production defects are usually handled by a
separate Production support team, and the testing team is at times
not given much insight into this data. Also, since multiple projects
and/or programs go live one after another, there are usually
challenges in identifying which defects in Production can be
attributed to which project or program. Inaccuracies in this
assignment lead to an inaccurate measure of test stage effectiveness.
All the factors listed above for residual defects also apply to this
metric.
Metric 3. % improvement in test case productivity
• Definitions
Some people might consider test data set-up as part of test case
preparation effort.
% improvement =
(Testing cost per unit dev effort in current year - Testing cost per
unit dev effort last year)
-------------------------------------------------------------------------------
(Testing cost per unit dev effort last year)
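Assuming the reconstruction of the ratio above, the calculation can be sketched in Python as follows (names are illustrative); a negative result means testing became cheaper per unit of development effort, i.e. an improvement:

def cost_efficiency_change(cost_per_dev_unit_current, cost_per_dev_unit_last):
    # Year-on-year % change in testing cost per unit of development effort.
    return 100.0 * (cost_per_dev_unit_current - cost_per_dev_unit_last) / cost_per_dev_unit_last

# e.g. testing cost per unit of dev effort fell from 100 to 90
print(cost_efficiency_change(90, 100))  # -10.0, i.e. a 10% improvement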
Factors to consider
• Lack of accurate measurement of development effort
This metric depends heavily on the measurement of actual
development effort, which might not be accurate. Relatively few
projects use formal measurement units such as function points (FP).
• Variance of testing effort with development effort
There may be a lot of structural changes to the existing code, with
high development effort (for performance enhancement or other
refactoring needs), but the (black box) testing effort might not be as
high if the end functionality is not undergoing a drastic change.
• Scope of outsourcing
• Feasibility of automation
• Legacy systems with no documented requirements
Many legacy systems have no documentation of even their basic
functionality, making the reference point for expected behavior
difficult to establish.
• Configuration Management
Where people over-emphasize this metric and refrain from writing
more test cases, there is a risk of not detecting some defects we
could otherwise find (resulting in poor test stage effectiveness).
Conversely, in order to reduce this risk, people might write more
and more test cases even when the possibility of finding a defect is
low.
Metric 8. Defect Rejection Ratio
A defect initially raised by a tester could later be rejected for
multiple reasons. The main objective of this metric is to ensure that
testers correctly understand the application/requirements and do
their groundwork well before everyone's time and attention is
invested in solving the defect. Too many rejected defects result in
inefficiency (due to time and effort spent on something that wasn't
a problem).
• Definitions
At times the behavior of the system changes from the time a defect
is raised to the time the verification of the defect is done ("it didn't
work when I raised the defect, but now it is working; don't know
why").
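The ratio is conventionally computed as rejected defects over total defects raised; the minimal Python sketch below assumes that convention (names are illustrative, not from the paper):

def defect_rejection_ratio(rejected_defects, total_defects_raised):
    # Assumed conventional definition: share of raised defects later rejected.
    if total_defects_raised == 0:
        return None
    return 100.0 * rejected_defects / total_defects_raised

print(defect_rejection_ratio(8, 100))  # 8.0 -> 8% of raised defects rejected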
3. Conclusion
There are some discerning metrics for testing engagements that can
be considered for drafting SLAs. However, for each metric one must
take into consideration the factors (including other metrics) that
influence its value, and the extent to which the testing team can
control those factors. This white paper has listed some of the
popular metrics and discussed the factors that affect each of them.
One shouldn't underestimate the fact that it usually takes some time
(a few months to over a year) to establish consistent trends on these
metrics. It is thus recommended that enough time is given to assess
these trends before an SLA is worked out for the engagement.
Where applicable, assumptions must be clearly documented to
reflect the factors influencing the SLAs being committed.