
How Do You Know When You Are Done Testing?

By Richard Bender

When you ask testers how they know they are done testing, the most common responses are:

We test until we are out of time and resources;

We test until all of the test cases we created ran successfully at least once and there are no outstanding severe defects.

I admire the honesty of the first answer, which comes from the "clean conscience" school of testing: "I did all the testing I could under the constraints management gave me and my conscience is clear". The obvious question that follows the second answer is how much function and code were actually tested? In the vast majority of cases the team has no quantitative measure of their level of testing.

Stepping back, testing is divided into the following eight activities:

1. Define Test Completion Criteria: The test effort has specific, quantifiable goals. Testing is completed only when the goals have been reached (e.g., testing is complete when the tests that address 100% functional coverage of the system all have executed successfully).

2. Design Test Cases: Logical test cases are defined by five characteristics: the initial state of the system prior to executing the test; the data in the system (e.g., data base values); the inputs; the expected results; and the system state after the test executes.

3. Build Test Cases: There are two parts needed to build test cases from logical test cases: creating the necessary data; and building the components to support testing (e.g., build the navigation to get to the portion of the program being tested).

4. Execute Tests: Execute the test case steps against the system being tested and document the results.

5. Verify Test Results: Verify that the expected test results match the observed results. [Note: This presupposes that the specifications are clear enough and detailed enough to actually calculate the expected answer ahead of time. Testing specifications to ensure that they are correct, unambiguous, logically consistent, and written in sufficient detail is a non-trivial issue and the subject of another paper.]

6. Verify Test Coverage: Track the amount of coverage achieved by the successful execution of each test.

7. Manage the Test Library: Maintain the relationships between the test cases and the programs being tested. Keep track of what tests have/have not been executed, and whether the executed tests have passed or failed.

8. Manage the Resolution of Identified Defects: Track the status of defects and retest as needed.

The first two steps are totally intertwined. Testing, by definition, is comparing an expected answer to the observed answer. You need to define quantitatively and qualitatively how much testing is enough and then design tests that will ensure that those criteria are met. You must do this for each type of testing: functional, performance, usability, security, etc. Given the space constraints, we will only address functional testing in this paper.

The first thing we need to understand is that you cannot exhaustively test any software system. The upper limit to the total number of tests for a program is:

$$\left[\,2^{n}\,(L_1 \cdot L_2 \cdots L_x)\,(V_1 \cdot V_2 \cdots V_y)\,\right]!$$

where $n$ is the number of decisions, $L_i$ is the number of times a given decision can loop, $x$ is the number of decisions which cause loops ($x \le n$), $V_i$ is the number of all of the possible values that each input variable can have, and $y$ is the number of input variables. The factorial ("!") is because the order in which the set of tests is executed does make a difference as to the results. This number is absolutely meaningless mathematically as well as being practically impossible to achieve. In many programs this number exceeds the number of molecules in the universe [$10^{80}$ according to Stephen Hawking].

The goal of test case design is to identify an extremely small subset of the possible combinations of data that will give you
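To get a feel for how quickly the upper-limit formula above grows, here is a minimal Python sketch that evaluates it for a toy program. The function names and the sample numbers (3 decisions, one loop that can run 2 times, two inputs with 2 values each) are illustrative assumptions, not figures from this paper. The factorial is taken in log space so that the result stays printable.

```python
from math import lgamma, log, prod


def base_count(n, loops, values):
    """Quantity inside the factorial: 2^n * (L1*...*Lx) * (V1*...*Vy)."""
    return 2 ** n * prod(loops) * prod(values)


def log10_upper_bound(n, loops, values):
    """log10 of base_count(...)!, using lgamma(k + 1) = ln(k!)."""
    return lgamma(base_count(n, loops, values) + 1) / log(10)


# Hypothetical toy program: 3 decisions, 1 loop (2 iterations),
# 2 input variables with 2 possible values each.
toy_base = base_count(3, [2], [2, 2])            # 8 * 2 * 4 = 64
toy_exponent = log10_upper_bound(3, [2], [2, 2])  # about 89, i.e. 64! ~ 10^89

print(f"inside the factorial: {toy_base}")
print(f"upper limit is roughly 10^{toy_exponent:.1f}")
```

Even this tiny hypothetical program lands above the $10^{80}$ figure quoted above, which is the point of the formula: exhaustive testing is not a usable completion criterion.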

it now fails (see Figure 4). C and F are not The design of the set of tests must be run and we are ready for production. The code is shipped into production. able. When we run test variation 1 the that two or more defects can sometimes Therefore.. Figure 3 Figure 4 B is always true. Let us further assume fixed. externally observable. Therefore. The first is that software. However. This should result in C. second test variation we enter B true When you run a test how do you know it The constraints between the data attrib- which the software always thinks is the worked? You look at the outputs. When this test is “untestable”. defect scenarios will be observable. The issue of The relations between the variables (e. the only one that failed. data on two to be true at the same time). This meets the common test completion criteria that every test has run correctly at least once and no severe defects are unresolved. and data in communications The functional variations to test (i. People look at the test results and Figure 5 just see if they look “reasonable”. Sadly. and. When test variation four failed it lead to identifying the B stuck true defect.g.. primitives to test for each logical the right answer. There is no Geneva can now see the A defect. However. Only one more test to the observable output. the software thinks both B and C are true. The true. When any as we expected because the D. That means there is no way expecting and compare it to the answer entered the software says A is not true. F leg Convention for software which limits us defect is detected all of the related tests worked. not). The result is we get the right answer problem is worse than that. is also says B is not false. and G some variations get flagged as do not pre-calculate the answer you were being set to true. The second thing is that if you to true. even when it is riddled with defects. will what the input is. databases. data on screens. 
A fairly obvious test still produce correct results for many of case would be to have all of the inputs set A by product of these algorithms is that the tests. In this case we did not see the to one defect per function.. When we most systems these are updates to the impossible for variables one and enter the third variation with just C true. when we enter the fourth test deduce that the A. is rerun. Figure 3 shows the results of running the The above example addresses the issue tests. These are all externally observ. defect at C because it was hidden by the F leg working correctly. and ing to management that we are three Node observability. When we run the observability must be taken into account. is false. E. expected true value but is set to false. the test case design algorithms software says A is not true. F function worked observable point. cancel each other out giving the right must factor in: However. C function worked ent. it is answers for the wrong reasons. when we get to G it is still true is caused by a combination of constraints teria. relationship). For utes (e. always assumes that A is true no matter observable point. When the B defect is fixed you . if you rerun test variation one. B.g. It now gives the correct results. then one or more tests will fail at an example so far. if any additional defects are pres- There are two key things about this there is a defect at A where the code ent. F. We are by now report. We will indirectly that at least one test case will fail at an that we know we have a problem. must be rerun. you are mathematically guaranteed with all inputs false and still get D true by looking at G. Part of the problem is that the specifications are not in sufficient detail to meet the most basic definition of testing. The code is fixed and test variation four. it to design a set of tests which include this you got you are not really testing. deduce that the D. it is false. E. reports. When that defect is by looking at G. 
We will indirectly such that if one or more defects are pres- However. This ing in our industry does not meet this cri.. for the wrong reason. quarters done our testing and everything In Figure 5 let us assume that node G is is looking great.e. C is not set to the variation and still guarantee that all the majority of what purports to be test. The “A stuck true” defect was not caused by fixing the B defect. it is physically case – we get the right answer. the Since this is an inclusive “or” we still get packets. or.
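The observability point made above (a defect hidden at an intermediate node because another leg produces the expected value at the observable node) can be illustrated with a small Python sketch. The graph used here is an assumption for illustration only and is not necessarily the one in Figure 5: C = A and B, F = D and E, G = C or F, with G the only observable node and an injected defect that always reads A as false.

```python
from itertools import product


def spec(a, b, d, e):
    """Intended logic (hypothetical): C = A and B, F = D and E, G = C or F."""
    c = a and b
    f = d and e
    return {"C": c, "F": f, "G": c or f}


def buggy(a, b, d, e):
    """Same logic with an injected defect: A is always read as false."""
    return spec(False, b, d, e)


all_true = (True, True, True, True)

# With all inputs true, the internal node C is wrong, but the observable
# node G still matches because the D, E, F leg sets it to true.
masked = (
    spec(*all_true)["G"] == buggy(*all_true)["G"]
    and spec(*all_true)["C"] != buggy(*all_true)["C"]
)

# Input combinations for which the defect does propagate to G.
exposing = [
    t for t in product([False, True], repeat=4)
    if spec(*t)["G"] != buggy(*t)["G"]
]

print("all-true test hides the defect:", masked)
print("tests that expose it at G:", exposing)
```

Under these assumptions only the tests where A and B are true and the D, E, F leg is false make the defect visible at G, which is why the test set has to be designed with observability in mind rather than chosen because the inputs look "obvious".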

year Earl Pottorff and I took on the problem and discovered the need for a higher level of test coverage: data flow based testing.

Data flow coverage adds a higher level of rigor to the testing process. For each variable used as input to a statement, it determines if each possible source of the data has been tested. For example, let's say we have a statement that adds A to B to get C. There might be three different places in the program that modify A where there is a path from that place to the statement being tested where A is not overridden in between. Each of these three data flow relationships must be tested. [Note: data flows are sometimes called "set-use pairs" in the literature.] Testing to the C1 level does not guarantee that each of these data flows will be executed. In fact, when you reach C1 coverage you usually still have 20% to 40% of these data flows not tested. This is a significant amount of function untested. The basic data flow coverage is called D1 and includes the C1 coverage.

Looking at Figure 6 again we see that segment "8" uses variable X to determine how many times to loop. X is modified by segments "1", "3", and "7". The logic is that segment "8" subtracts "1" from X and checks to see if X is now "0". If it is not "0" it loops again; if it is "0" it terminates the loop and continues on to the next statement.

When we executed Test 1 above, X was last set at segment "3". This sets X to "10". Segment "8" therefore loops 10 times. When we executed Test 2, X was last set by segment "7". This sets X to "20". Segment "8" will loop 20 times. Remember that these two tests satisfied 100% C1 coverage. However, we never executed a path where segment "1" was the last place to set X prior to executing segment "8". We need to add Test 3, which follows the path 1, 2 (false), 4, 5 (true), 6, 8.

[Figure 8]

For example, let us assume that segment "1" sets X to minus 1. Segment "8" loops and subtracts 1, so the variable is now minus 2, and so on. This path causes a nearly infinite loop (actually $2^{32}-1$ iterations if X is a four byte field).

[Figure 9]

When we increased the test criteria from C1 to D1 we did find 25% more code-based defects. These also tended to be types that would have been difficult to debug. Most non-reproducible defects have spurious data flows at their root. Static data flow analysis actually is able to predict where many of these will occur before even running the tests.

There is another interesting insight from data flow analysis about test suite design, i.e., packaging the test cases into sets for execution. In order to test a given data flow, the test case which includes it might have to be in a certain position in the test suite. The most common requirement is for the test to be the first one executed. It is not unusual to have multiple data flows each requiring their test to be first. This requires the test suite to be broken into smaller execution packets, each with the right tests first. I have even seen data flows which would be tested only if included in a test which happened to be in just the sixteenth through the nineteenth position in the test suite. It would work fine if placed anywhere from the first to the fifteenth position or in the twentieth position or later.

Yet another interesting by-product is that in order to execute certain data flows, multiple transactions must be executed in a particular sequence.
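The segment "8" example can be sketched in Python as a check of which set-use pairs for X a test suite exercises. Only the path for Test 3 is given in the text; the paths used below for Test 1 and Test 2 are assumptions chosen so that X is last set at segments "3" and "7" respectively, and the function names are hypothetical. The sketch also computes the loop count for the minus 1 case, assuming X is held in a four byte field that wraps around.

```python
# Segments that set X and the segment that uses it, as described in the text.
DEFS_OF_X = {1, 3, 7}
USE_OF_X = 8

# Test 3's path comes from the text; Test 1 and Test 2 paths are assumed.
TESTS = {
    "Test 1": [1, 2, 3, 4, 5, 6, 8],
    "Test 2": [1, 2, 4, 5, 7, 8],
    "Test 3": [1, 2, 4, 5, 6, 8],
}


def reaching_def(path):
    """Return the last segment that set X before the use at segment 8."""
    last = None
    for segment in path:
        if segment == USE_OF_X:
            return last
        if segment in DEFS_OF_X:
            last = segment
    return None


def covered_pairs(test_names):
    """Set-use pairs (definition segment, use segment) exercised by the tests."""
    pairs = {(reaching_def(TESTS[name]), USE_OF_X) for name in test_names}
    return {pair for pair in pairs if pair[0] is not None}


def loop_iterations(x, bits=32):
    """Iterations of 'subtract 1, stop when 0' on a counter that wraps at `bits`."""
    remainder = x % (1 << bits)
    return remainder if remainder else 1 << bits


required = {(d, USE_OF_X) for d in DEFS_OF_X}
missing_after_c1 = required - covered_pairs(["Test 1", "Test 2"])
missing_after_d1 = required - covered_pairs(["Test 1", "Test 2", "Test 3"])

print("untested data flows with Tests 1-2:", missing_after_c1)
print("untested data flows with Tests 1-3:", missing_after_d1)
print("iterations when X starts at -1:", loop_iterations(-1))
```

Under these assumptions, Tests 1 and 2 leave the pair (1, 8) untested, and that is exactly the data flow on which the loop runs $2^{32}-1$ times.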