
WHEN SHOULD YOU STOP TESTING?

A Presentation for CD-adapco Jim Giangrande September 27, 2012

Topics
- Why Do We Test Software?
- When Should We Stop Testing?
- Software Coverage
- Is Code Coverage Enough?
- What's Missing?
- What Else Is Missing?
- The Two Key Problems of Program Testing
- Properties of an Ideal Test Set
- Categories of Test Data Adequacy Criteria
- Functional Coverage
- Business Usage Coverage
- Using Metrics as a Guide
- There Are No Silver Bullets

Why Do We Test Software?


- To prove the software works?
  - Can't prove it works; you can only show it doesn't work (as expected)
- To show it is a quality product?
  - You can't test quality into a product; plus, quality is much more than showing it works correctly
- To show the product complies with its design specifications
  - It works as expected, or per design
- To find as many important defects as possible
  - Defects important to whom (business/end users)?
  - How many defects? All, most, some? How do we know?
- To increase our confidence that the software will satisfy business needs (can accomplish core business functions)

When Should We Stop Testing?


- When we have found all the defects
  - We don't know how many there are, so this is not a practical rule
- When we have executed all our test cases (successfully)
- When we run out of time based on the testing or delivery schedule
  - An arbitrary rule, with no guarantee that the most important defects have been uncovered or that most defects have been found
- When we have found the most important defects
  - How do we know we have found the most important defects, and that we have found all of them?
- What other rule could we use that would allow us to achieve our testing objective?

Software Coverage
One concept that could guide our decision to stop testing is some form of code coverage. There are a few variants of code coverage, including:
- Statement coverage (all executable statements are executed)
- Branch coverage (all branches are executed at least once)
- Path coverage (all paths are executed at least once)
- Loop coverage (all loops are executed)

Code coverage that provides a complete cover is a good necessary criterion for enough testing. A complete cover is one where:
1) Every instruction in a module or program has been executed at least once
2) Every decision (branch or case statement) has been taken at least once in each possible way
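The gap between statement and branch coverage can be seen in a minimal sketch (the function and values are illustrative): an `if` without an `else` still has an implicit false branch that statement coverage alone can miss.

```python
# Hypothetical example: statement coverage alone can miss a branch.
def apply_discount(price, is_member):
    # The 'if' has an implicit empty else-branch: skipping the discount.
    if is_member:
        price = price * 0.9
    return price

# This single test executes every statement (100% statement coverage)...
assert apply_discount(100, True) == 90.0
# ...but branch coverage also requires exercising the false branch:
assert apply_discount(100, False) == 100
```

With only the first call, a statement-coverage tool reports 100%, yet the no-discount path was never tested.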

Is Code Coverage Enough?


- Complete code coverage is good since it guarantees the basic paths through the software have been exercised (all statements and decision points have been exercised)
- If loop coverage is included, then all basic loops are exercised, too
  - A basic loop is one we exercise 0, 1, max-1, and max times (but do we always know the maximum number of iterations for a loop?)
- Code coverage is more a white-box testing technique and does not guarantee that all functions work properly or that typical business scenarios can be performed
- So, what is missing? A LOT of critical test cases!
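The basic-loop criterion above can be sketched with a hypothetical function whose loop bound happens to be known:

```python
# Hypothetical sketch of basic loop coverage: exercise a loop
# 0, 1, max-1, and max times.
def sum_first(values, n):
    total = 0
    for i in range(n):     # the loop under test runs n times
        total += values[i]
    return total

data = [5, 5, 5, 5]
MAX = len(data)            # here the maximum iteration count is known
results = [sum_first(data, n) for n in (0, 1, MAX - 1, MAX)]
print(results)             # [0, 5, 15, 20]
```

When the maximum iteration count is data-dependent or unbounded, picking the `max` and `max-1` cases is exactly the hard part the slide points at.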

What's Missing?
Test scenarios not covered:
- Various combinations of path execution (basic path execution is not the same as all paths, or even the most-used business paths)
- Combinations of logical decisions, in addition to just taking each branch in all possible ways
  - Example: 10 yes/no branches can be covered in two paths, one taking all Yes branches and one taking all No branches; covering all combinations of 10 yes/no branches would require 2**10 paths
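The two-paths-cover-all-branches point can be shown with a scaled-down sketch using 2 decisions instead of 10 (the function and inputs are illustrative):

```python
# Illustrative sketch: full branch coverage from only two paths still
# misses decision combinations.
def classify(a, b):
    flags = []
    if a:                      # yes/no branch 1
        flags.append("A")
    if b:                      # yes/no branch 2
        flags.append("B")
    return flags

# Two paths cover every branch in both directions:
assert classify(True, True) == ["A", "B"]   # all Yes branches
assert classify(False, False) == []         # all No branches
# ...yet the mixed combinations (True/False and False/True) went untested.
# For 10 independent yes/no branches, all combinations need 2**10 paths:
assert 2 ** 10 == 1024
```

A defect that only appears when branch 1 is taken and branch 2 is not would survive a test set with 100% branch coverage.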

What Else Is Missing?


- Loop coverage for all possible loops
  - There are an infinite number of possible loops, and only a finite number of these can be executed
- Functional testing
  - Path testing does not guarantee a good set of functional tests, nor does it ensure that typical variants of basic functional tests are executed
- Business scenario testing
  - Functional testing gets us part of the way there, but business scenarios are a richer set of tests: they are goal-based and involve many functions being performed to reach an ultimate business goal or need
- Other types of testing (performance, stress, volume, etc.), but here I concentrate on functional over non-functional testing

The Two Key Problems of Program Testing


According to Elaine Weyuker, the two key problems of program testing are:
1) Given a program and a specification, how to select data which test the program most effectively
2) Given a program, a specification, and test data which are processed correctly, how to determine whether or not the testing has been sufficient to justify a claim that the program has been adequately tested; this is referred to in the literature as the test data adequacy problem

According to Goodenough and Gerhart (G&G), an ideal set of tests would have the property that the tests are capable of exposing all errors in the program.

Properties of an Ideal Test Set


As described by G&G, an ideal test set should have the following properties represented in the tests:
1) Every individual branching condition in the program (all branches are covered)
2) Every potential termination condition in the program (all the ways of terminating the program)
3) Every decision variable is correctly classed and treated the same in the program
4) Every condition relevant to the correct operation of the program implied by the specification, knowledge of the program's data structures, or knowledge of the general method implemented

Categories of Test Data Adequacy Criteria


By information source:
1) Specification-based (referred to as black-box testing)
2) Program-based (referred to as white-box testing)
3) Interface-based (input-domain driven; often achieved via random or statistical testing)

By test approach:
1) Structural (coverage of elements in the program or in the specification)
2) Fault-based (test efforts directed at detecting software faults)
3) Error-based (test efforts directed at historical patterns of human errors, typically made when developing/coding software)

Functional Coverage
- To use all the information about a program's intended use, you must test it from a functional perspective
  - Guided by a formal spec, use cases, user stories, or whatever is available to say how the program should behave
- A minimal necessary condition is to test all functions completely
  - "Complete" meaning all expected paths through the function, plus alternative paths
  - Need to translate paths into input values to drive the paths
- Testing all functions is not sufficient
  - Does not cover interface testing (how the program works with other systems, programs, and data sources)
  - Does not address dependencies between parts of the program and between various functions
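Translating paths into input values can be sketched as follows, with a hypothetical function that has one expected path and two alternative paths, and one input chosen to drive each:

```python
# Hypothetical sketch: one test input per path through a function.
def withdraw(balance, amount):
    if amount <= 0:
        raise ValueError("amount must be positive")   # alternative path 1
    if amount > balance:
        return balance, "declined"                    # alternative path 2
    return balance - amount, "ok"                     # expected path

# Inputs chosen to drive each path:
assert withdraw(100, 40) == (60, "ok")          # expected path
assert withdraw(100, 200) == (100, "declined")  # overdraft path
try:
    withdraw(100, -5)                           # invalid-input path
except ValueError:
    pass
```

Each path's entry condition (`amount <= 0`, `amount > balance`, neither) dictates the input values; this is the translation step the slide calls out.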

Business Usage Coverage


- All commercial software programs are built with some business need(s) in mind. If test case coverage from a business usage perspective is not done, test data adequacy will probably not be achieved, and the test cases will not be seen as providing assurance that the business needs will be met
- Test case coverage of critical business usage (achieving typical business goals as a result of program execution) must be one of the test data adequacy criteria used in deciding what to test
- Business scenario testing (exercising the program's functionality in support of business goals) is unlikely to catch all defects without supplementary testing from a structural perspective
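A business-scenario test, as opposed to a per-function test, chains several functions toward one business goal. A minimal sketch, with an entirely hypothetical order-placement scenario:

```python
# Hypothetical business-scenario sketch: several functions exercised in
# sequence to reach one business goal (place an order).
def add_item(cart, sku, price):
    cart.append((sku, price))
    return cart

def cart_total(cart):
    return sum(price for _, price in cart)

def checkout(cart, payment):
    return "confirmed" if payment >= cart_total(cart) else "rejected"

# Scenario: a customer fills a cart and pays the exact total.
cart = []
add_item(cart, "SKU-1", 20.0)
add_item(cart, "SKU-2", 15.0)
assert checkout(cart, payment=35.0) == "confirmed"
```

Each function here might pass its own unit tests, yet only the end-to-end scenario verifies that the business goal (a confirmed order) is actually reachable.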

Using Metrics as a Guide to Test Stoppage


- One input to the decision on when to stop testing is the metrics on defects found by testing so far
- High test case failure rates, substantial defect run rates per test day, high densities or defect clustering within a module or functional area, and defect counts that remain constant per N test cases all point to a need for more testing
  - Preferably with variants of the test cases already run, and with new test cases that exercise areas of functionality that currently have few or no test cases, or where significant errors have been uncovered
- Use the existing metrics as a guide to where, and how much, additional testing is needed during the test cycle
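One way to operationalize the clustering signal is defect density per module. A sketch, where the module names, counts, and threshold are all assumed for illustration:

```python
# Hypothetical sketch: using defect density per module to decide where
# additional testing is needed (all names and numbers are illustrative).
defects = {"billing": 42, "reporting": 5, "auth": 18}
kloc    = {"billing": 10, "reporting": 8, "auth": 4}   # thousands of lines

density = {m: defects[m] / kloc[m] for m in defects}
THRESHOLD = 3.0   # defects per KLOC; an assumed project norm
needs_more_testing = sorted(m for m, d in density.items() if d > THRESHOLD)
print(needs_more_testing)   # ['auth', 'billing']
```

The high-density modules are exactly the clusters the slide suggests targeting with variant and additional test cases.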

There Are No Silver Bullets


- Developing a set of test data that provides or approximates adequacy is no easy task
- A combination of approaches should be used to reach the test data adequacy goal on a program-by-program basis
- When to stop testing is a multi-dimensional decision based on satisfying various types of coverage and making appropriate inferences from the results of testing (so far)
- Determine the need for incremental testing (beyond what is considered necessary) based on the trade-off between the cost and the expected benefit of each additional test
- Use defect prediction models where practical, and existing metrics on defects (both counts and clustering), to estimate the benefits of additional testing (defect discovery and removal)
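The cost/expected-benefit trade-off can be sketched as a simple stopping rule; the probabilities and values below are illustrative placeholders, not outputs of any real defect prediction model:

```python
# Hypothetical cost/benefit stopping rule: run one more test only while
# its expected benefit exceeds its cost.
def worth_running(cost, p_find_defect, defect_value):
    """Expected benefit = P(test finds a defect) * value of removing it."""
    return p_find_defect * defect_value > cost

# Early in the cycle, defects are still likely to be found...
assert worth_running(cost=50, p_find_defect=0.30, defect_value=1000) is True
# ...late in the cycle, the same test is no longer worth its cost.
assert worth_running(cost=50, p_find_defect=0.02, defect_value=1000) is False
```

In practice, `p_find_defect` would be estimated from the defect metrics and prediction models mentioned above, which is why those metrics matter to the stopping decision.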
