Software Test Data Generation using the Chaining Approach

Roger Ferguson, Department of Computer Science, Lawrence Technological University
Bogdan Korel, Department of Computer Science, Illinois Institute of Technology

INTERNATIONAL TEST CONFERENCE, Paper 31.1, 0-7803-2001-0/95 $4.00 © 1995 IEEE

ABSTRACT
Software testing, and specifically test data generation, is very labor-intensive and expensive; as a result, it accounts for a significant portion of software system development cost. In this paper we present a chaining approach for automated software test data generation. The chaining approach uses data dependence analysis to guide the test data generation process. Our experiments have shown that the chaining approach may significantly improve the chances of finding test data compared to existing methods of automated test data generation.

1. INTRODUCTION
Software testing is a very labor-intensive and hence very expensive process [1,2,3]. If the testing process could be automated, significant reductions in the cost of software development could be achieved. Test data generation in software testing is the process of identifying program inputs that satisfy a selected testing criterion. A structural testing coverage criterion is the requirement that certain program elements, or combinations thereof, be exercised (e.g., statement and data-flow coverage).

The problem: After initial testing, programmers face the problem of finding additional test data to exercise the program elements not yet covered, e.g., branches. Finding input data to exercise those remaining statements (also referred to as nodes) requires a good working knowledge of the program, which takes a significant amount of time and increases the overall cost of the software. This is because programmers do not always work on their own code. For example, in software maintenance, programmers modify someone else's programs, which are often poorly or only partially understood. A test data generator is a tool that assists a programmer in generating input data to exercise the desired statements.

Existing research: There are two different approaches taken by test data generators, namely, the path-oriented approach [4,5] and the goal-oriented approach [7]. The path-oriented approach is the process of selecting a program path and then generating input data for that path. The path to be tested can be generated automatically or provided by the user. Two methods have been proposed to find a program input x that executes the selected path, namely, symbolic execution [4,6] and execution-based test data generation [5,7]. Symbolic execution generates path constraints, which consist of a set of equalities and inequalities on the program's symbolic input variables. A number of algorithms have been used for solving these constraints [8,9]. Symbolic execution is a promising approach; however, it has several weaknesses, including difficulty with dynamic data structures, arrays, and procedures.

The other method for finding program input, proposed by Korel [5,7], is an execution-based technique. In this approach, the goal of finding a program input is achieved by solving a sequence of subgoals, where function minimization methods [8,9] are used to solve these subgoals. This approach helps to overcome several problems of symbolic execution methods, i.e., the handling of arrays, procedures, and dynamic data structures.

There are some weaknesses [3] with the existing path-oriented test data generators; those weaknesses are mainly associated with the path selection process. Because it is not known in advance whether a selected path is feasible, the path-oriented approach is of limited use: very often infeasible paths are selected, and significant computational effort can be wasted in analyzing them.

    program sample;
    var target, i : integer;
        a, b : array [1..10] of integer;
        fa, fb : boolean;
    begin
    1      input (a, b, target);
    2      i := 1;
    3      fa := false;
    4      fb := false;
    5      while (i < 10) do begin
    6         if (a[i] = target) then
    7            fa := true;
    8         i := i + 1;
           end;
    9      if (fa = true) then begin
    10        i := 1;
    11        fb := true;
    12        while (i < 10) do begin
    13           if (b[i] <> target) then
    14              fb := false;
    15           i := i + 1;
              end;
           end;
    16     if (fb = true) then
    17        writeln ('message 1')
    18     else writeln ('message 2');
    end.

Figure 1. A simple program with corresponding control flow graph.

The other approach to test data generation is the goal-oriented approach. In this approach, input test data to exercise a goal statement (or node) is generated irrespective of the path taken. That is, the general idea of the goal-oriented approach proposed by Korel [7] is to concentrate on "essential" branches, i.e., branches that influence the execution of the goal node g, and to ignore "non-essential" branches, i.e., branches that in no way influence the execution of the goal node g. This approach overcomes many limitations of the path-oriented methods. However, the goal-oriented approach is based purely on the flow graph of the program. This limited amount of information may make some nodes g very difficult to reach in some programs, because execution of a particular node g may require the prior execution of some other nodes in the program. For example, in the sample program in Figure 1, node 7 must be executed before node 10 can be reached: node 10 lies on the TRUE branch of node 9, which requires fa = true, and node 7 is the only statement that sets fa to true.

In this paper, we extend the goal-oriented approach. Our approach, referred to as the chaining approach, uses program dependence concepts combined with the flow graph of the program to identify a sequence of "essential" nodes to be executed prior to execution of node g. As a result, the chances of reaching node g are significantly increased.
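To make the Figure 1 example concrete, the program can be transcribed into Python and instrumented to record the numbered nodes it executes. This is a hedged sketch, not part of the original paper: the function name run_sample and its (trace, message) return value are our own choices, and the bodies of nodes 2, 4 and 10 (i := 1, fb := false, i := 1) are assumptions, since those statements are only partly legible in the scanned figure.

```python
def run_sample(a, b, target):
    """Run the Figure 1 program and return (trace, message).

    trace lists Figure 1 node numbers in execution order. The Pascal
    arrays are indexed 1..10, so a[i] there is a[i - 1] here.
    """
    trace = []
    visit = trace.append

    visit(1)                      # node 1: input(a, b, target)
    visit(2)                      # node 2 (assumed body): i := 1
    i = 1
    visit(3)                      # node 3: fa := false
    fa = False
    visit(4)                      # node 4 (assumed body): fb := false
    fb = False
    while True:
        visit(5)                  # node 5: while (i < 10)
        if i >= 10:
            break
        visit(6)                  # node 6: if (a[i] = target)
        if a[i - 1] == target:
            visit(7)              # node 7: fa := true
            fa = True
        visit(8)                  # node 8: i := i + 1
        i += 1
    visit(9)                      # node 9: if (fa = true)
    if fa:
        visit(10)                 # node 10, the goal (assumed body): i := 1
        i = 1
        visit(11)                 # node 11: fb := true
        fb = True
        while True:
            visit(12)             # node 12: while (i < 10)
            if i >= 10:
                break
            visit(13)             # node 13: if (b[i] <> target)
            if b[i - 1] != target:
                visit(14)         # node 14: fb := false
                fb = False
            visit(15)             # node 15: i := i + 1
            i += 1
    visit(16)                     # node 16: if (fb = true)
    if fb:
        visit(17)                 # node 17: writeln('message 1')
        return trace, "message 1"
    visit(18)                     # node 18: writeln('message 2')
    return trace, "message 2"


if __name__ == "__main__":
    a = list(range(1, 11))        # A = (1, 2, 3, ...)
    b = list(range(2, 21, 2))     # B = (2, 4, 6, ...)
    trace, _ = run_sample(a, b, -1)
    print(trace[-3:])             # [9, 16, 18]: branch (9,16) is taken
    trace, _ = run_sample(a, b, 3)
    print(trace.index(7) < trace.index(10))   # True
```

With target = -1, the arbitrary input used in the example of Section 2, no element of A matches, node 7 never executes, and the run leaves node 9 through branch (9,16); with target = 3, node 7 executes and node 10 follows.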
Our experiments show that this approach is a more efficient strategy for generating test data compared to existing methods.

2. GENERAL DESCRIPTION OF THE CHAINING APPROACH
The approach is presented for the node problem; however, it can be used for other types of problems, e.g., the branch problem, the definition-use chain problem, etc. The node problem is stated as follows: given a node g (referred to as the goal node) in a program Q, the objective is to find a program input x on which node g will be executed.

The basic idea of the chaining approach is to identify a sequence of "essential" nodes (n1, n2, ...) to be executed prior to execution of node g. The approach starts by executing the program for an arbitrary program input x. During program execution, for each executed branch (p,q), a search process [5,7] decides whether execution should continue through this branch or whether an alternative branch should be taken (because, for instance, the current branch does not lead to the goal node g). If the search process decides that the alternative branch of (p,q) should be taken, then program execution is suspended and a new program input x is determined to change the flow of execution at this branch. If, at this point, the search process cannot find a program input x that changes the flow of execution at branch (p,q), then the chaining approach attempts to alter the flow at branch (p,q) by identifying "essential" nodes in the program using data dependence concepts and requiring that these nodes be executed first, i.e., prior to reaching branch (p,q).

Figure 2. An event sequence for problem node 9 (legend: definition node, problem node, goal node).

To illustrate the chaining approach, the program FIND shown in Figure 1 is used. Let the goal node be node 10. The chaining approach starts by executing the program for an arbitrary input x (e.g., A = (1, 2, 3, ...), B = (2, 4, 6, ...), target = -1). The execution starts at the entry node s and proceeds to node 9.
At node 9, a test node, the flow of execution proceeds down the FALSE (right) branch (9,16). At this point execution halts, since branch (9,16) is a critical branch, i.e., branch (9,16) never leads to node 10. Assuming the search process [5,7] (Section 3) cannot find a new program input x to alter the flow at node 9, the chaining approach finds the definitions of the variables used at node 9 to determine the "essential" nodes that are to be reached prior to reaching node 9. Those definitions are nodes 3 and 7. These "essential" nodes are used to determine the order of execution of nodes in the program; we refer to this ordering of nodes as an event sequence. An example of an event sequence using node 7 is shown in Figure 2.

The chaining approach uses this event sequence to control the execution of the program. In other words, the chaining approach requires that node 7 be reached first, then node 9, and finally node 10. In order to reach node 7, any path from s (the entry node) to node 7 can be traversed. To reach node 9 from node 7, only those subpaths that do not modify the variable fa can be traversed. Once problem node 9 is reached, the new definition of the variable fa assigned at node 7 may cause the flow of execution to proceed down the TRUE branch at test node 9, i.e., branch (9,10). Finally, any subpath can be traversed to reach node 10. In this way, the chaining approach increases the chances of finding an input x that executes the goal node.

Finally, when the chaining approach is attempting to reach a particular node and a different critical branch [7] is traversed, the chaining approach expands the event sequence (potentially several times) in an attempt to alter the flow of execution at this new critical branch. This process continues until the goal node is reached or the computing resources are exhausted.
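The data dependence step and the event sequence constraint described above can also be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph CFG is our own reading of the program in Figure 1 (with the entry node s omitted), the bodies of nodes 2, 4 and 10 are assumed, and the helper names reaching_definitions and follows_event_sequence are hypothetical. reaching_definitions returns the nodes that define a variable and reach a problem node along a definition-clear path, which for fa at node 9 yields nodes 3 and 7. follows_event_sequence checks whether a node trace realizes a list of (node, constraint set) pairs such as (7, {fa}), (9, {}), (10, {}), where no node strictly between two consecutive events may redefine a variable in the earlier event's constraint set. The search process itself and the expansion of event sequences are not modeled.

```python
from collections import deque

# Hypothetical control flow graph of the Figure 1 program (entry node s omitted).
CFG = {
    1: [2], 2: [3], 3: [4], 4: [5],
    5: [6, 9],                    # while (i < 10): enter body / exit loop
    6: [7, 8], 7: [8], 8: [5],
    9: [10, 16],                  # if (fa = true): TRUE / FALSE branch
    10: [11], 11: [12],
    12: [13, 16],                 # while (i < 10): enter body / exit loop
    13: [14, 15], 14: [15], 15: [12],
    16: [17, 18], 17: [], 18: [],
}

# Variables defined at each node (bodies of nodes 2, 4 and 10 are assumed).
DEFS = {1: {"a", "b", "target"}, 2: {"i"}, 3: {"fa"}, 4: {"fb"},
        7: {"fa"}, 8: {"i"}, 10: {"i"}, 11: {"fb"}, 14: {"fb"}, 15: {"i"}}


def predecessors(cfg):
    """Invert the successor lists of cfg."""
    preds = {n: set() for n in cfg}
    for node, succs in cfg.items():
        for s in succs:
            preds.setdefault(s, set()).add(node)
    return preds


def reaching_definitions(cfg, defs, node, var):
    """Nodes defining var that reach node along a definition-clear path."""
    preds = predecessors(cfg)
    found, seen = set(), set()
    work = deque(preds[node])
    while work:
        p = work.popleft()
        if p in seen:
            continue
        seen.add(p)
        if var in defs.get(p, set()):
            found.add(p)          # a definition: stop walking back on this path
        else:
            work.extend(preds[p]) # var untouched here: keep walking back
    return found


def follows_event_sequence(trace, events, defs):
    """True if trace matches events [(n_1, S_1), ..., (n_k, S_k)] in order and
    no node strictly between n_j and n_(j+1) defines a variable in S_j."""
    if not events:
        return True
    k = len(events)
    live = set()                  # j in live: events[0..j] matched, S_j intact
    for node in trace:
        killed = defs.get(node, set())
        new_live = {j for j in live if not (events[j][1] & killed)}
        for j in live:            # node may complete the next event
            if j + 1 < k and node == events[j + 1][0]:
                new_live.add(j + 1)
        if node == events[0][0]:  # node may (re)start the chain
            new_live.add(0)
        if k - 1 in new_live:
            return True
        live = new_live
    return False


if __name__ == "__main__":
    print(sorted(reaching_definitions(CFG, DEFS, 9, "fa")))   # [3, 7]
    events = [(7, {"fa"}), (9, set()), (10, set())]
    # Prefix of the node trace for A = (1, 2, ..., 10), target = 3.
    trace = ([1, 2, 3, 4] + [5, 6, 8] * 2 + [5, 6, 7, 8]
             + [5, 6, 8] * 6 + [5, 9, 10, 11])
    print(follows_event_sequence(trace, events, DEFS))          # True
    # Artificial sequence in which fa is redefined (node 3) after node 7.
    print(follows_event_sequence([7, 3, 9, 10], events, DEFS))  # False
```

Tracking a set of live prefixes, rather than committing to the first occurrence of each event node, lets a later occurrence of node 7 restart the chain after fa has been redefined.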
The experimental results have shown that for many programs, especially complex programs, the chaining approach outperformed the existing methods of test data generation.

3. THE SEARCH PROCESS
Before we describe the search process, we formally introduce the event sequence. An event sequence $E$ is a sequence of events where each event is a tuple $e_i = (n_i, S_i)$, where $n_i \in N$ is a node and $S_i$ is a set of variables referred to as a constraint set. For every two adjacent events, $e_i = (n_i, S_i)$ and $e_{i+1} = (n_{i+1}, S_{i+1})$, there exists a definition-clear path with respect to $S_i$ from $n_i$ to $n_{i+1}$. Informally, a