
# Survey on Automatic Test Data Generation

2002. 4. 16 Han, Seung-hee

Contents
Introduction
Automation of testing

Problems of test data generation
Automatic test data generation techniques

Conclusion
Future challenges

Introduction
Software testing accounts for 50% of the total cost of software development
This cost could be reduced if the process of testing is automated

Testing techniques
Functional testing
Structural testing

Testing Process
[Figure: testing process, with boxes for specification or software, selection criteria, test case design, test cases, software under test, test oracles, and test verdicts.]

Automation of Testing
Test data generation tasks
Still performed manually
Laborious, time consuming, costly

Mechanical tasks
Running & monitoring for coverage analysis
Recording test outcomes
Test report generation
Served by commercial tools

Definition of Test Data Generation
The process of identifying program inputs that satisfy the selected testing criterion
Given a program $P$ and a path $u$, generate an input $x \in S$ so that $x$ traverses $u$.
$P : S \to R$
$S$ : the set of all possible inputs, i.e. the set of all vectors $x = (d_1, d_2, \ldots, d_n)$ such that $d_i \in D_{x_i}$, where $D_{x_i}$ is the domain of input variable $x_i$
$R$ : the set of all possible outputs

1) Find the path predicate for the path
2) Solve the path predicate in terms of input variables

Process of Test Data Generation
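1. Select a path for the testing criterion
2. Extract the constraints (a path condition) of the path
3. Solve the constraints → test data

[Figure: control flow graph of the example program. Node 1: x = y*2; node 2: if (x>=10), with T and F branches; nodes 3 and 4: x = x+10, y = y+1; node 5: if (x+y<110), with T and F branches; node 6: printf("OK"); node 7: exit.]

Path condition of the selected path: $(x \ge 10) \wedge (x + y < 110)$
Which x, y satisfy it?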
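The three steps can be sketched in C for the example program. This is a minimal illustration, not a real constraint solver: it assumes y is the only input (x is assigned at node 1), and the names `path_predicate` and `solve_path_predicate` are hypothetical. The "solver" simply scans a bounded input domain.

```c
#include <stdbool.h>

/* Path predicate for the path that takes T at node 2 and T at node 5,
   evaluated by replaying the assignments along the path for input y. */
bool path_predicate(int y) {
    int x = y * 2;                 /* node 1 */
    if (!(x >= 10)) return false;  /* node 2 must take the T branch */
    x = x + 10;                    /* node 3 */
    y = y + 1;                     /* node 4 */
    return x + y < 110;            /* node 5 must take the T branch */
}

/* Naive solver (an assumption for illustration): scan [lo, hi] and
   store the first input that satisfies the path predicate in *out. */
bool solve_path_predicate(int lo, int hi, int *out) {
    for (int y = lo; y <= hi; y++) {
        if (path_predicate(y)) {
            *out = y;
            return true;
        }
    }
    return false;
}
```

Over the domain -100..100 the scan stops at y = 5, the smallest input for which both branch conditions hold; inputs from 33 upward fail the second condition.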

Complexity of Test Data Generation
The problem of determining whether a solution exists to a system of inequalities is undecidable
The path feasibility problem is undecidable

→ Most work has been performed on toy programs, i.e. small and less complex programs

Problems of Test Data Generation
Arrays, Pointers
Ambiguity
Complex heap
Array example: input(i,j); a[j] = 2; a[i] = 0; a[j] = a[j]+1; if (a[j]==3) …
if i = j, a[j] = 1; if i ≠ j, a[j] = 3
Pointer example: *x = 20; *y = 50; c = *x;
if x ≠ y, c = 20; if x = y, c = 50

Objects
Dynamically allocated
Inheritance, Polymorphism

Loops
Not having a constant number of iterations
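The array and pointer ambiguity can be reproduced directly. The sketch below wraps the two fragments from this slide in hypothetical functions (`array_fragment`, `pointer_fragment`); the array size of 4 is an assumption made only so the fragment compiles.

```c
/* Array fragment from the slide; returns the final a[j].
   Assumes 0 <= i < 4 and 0 <= j < 4. */
int array_fragment(int i, int j) {
    int a[4] = {0};
    a[j] = 2;
    a[i] = 0;          /* overwrites a[j] only when i == j */
    a[j] = a[j] + 1;
    return a[j];
}

/* Pointer fragment from the slide; returns c. */
int pointer_fragment(int *x, int *y) {
    *x = 20;
    *y = 50;           /* overwrites *x only when x and y alias */
    return *x;
}
```

Whether the branch `a[j] == 3` is taken depends on whether i and j name the same element, which a generator cannot tell from the program text alone. The same holds for `c` when x and y point to the same object.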

Problems of Test Data Generation (cont.)
Modules
Source code is not accessible

Infeasible paths
Solving an arbitrary system of equations is undecidable

Oracle
The only way of achieving an oracle is to supply extra information
→ requirement/design spec, assertion…

Architecture of Test Data Generator
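[Figure: architecture of a test data generator. The program analyzer produces the control flow graph and the data dependence graph. The path selector uses the control flow graph to produce test paths. The test data generator turns the test paths into test data and reports path info concerning infeasible paths.]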

Automatic Test Data Generation Techniques
Functional testing (spec-based)
Structural testing (code-based)

Approaches of test data generator
(1) Random
(2) Path-oriented
(3) Goal-oriented

Implementation methods
(1) Static
• Using symbolic execution
(2) Dynamic
• Using function minimization
(3) Hybrid of 2 methods

Random Test Data Generation

[Figure: the test data generator emits a stream of bits (01110011…) that is fed to the program execution.]

It is used as a benchmark, since it is considered to have the lowest acceptance rate
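A minimal sketch of random generation, assuming the branch of interest is the `a == b` test of the `foo` example on the comparison slide. The small linear congruential generator and all function names are hypothetical choices made for reproducibility; any random source would do.

```c
#include <stdint.h>

/* Tiny linear congruential generator, used so that runs are reproducible. */
typedef struct { uint32_t state; } Rng;

static uint32_t rng_next(Rng *r) {
    r->state = r->state * 1664525u + 1013904223u;
    return r->state >> 16;               /* keep the higher-order bits */
}

/* Draws an int in [0, range); range must be positive. */
static int rng_below(Rng *r, int range) {
    return (int)(rng_next(r) % (uint32_t)range);
}

/* The branch to cover: true when the a == b branch would be taken. */
static int takes_equal_branch(int a, int b) {
    return a == b;
}

/* Generates `trials` random input pairs from [0, range) and counts
   how many of them cover the a == b branch. */
int count_hits(uint32_t seed, int trials, int range) {
    Rng r = { seed };
    int hits = 0;
    for (int i = 0; i < trials; i++) {
        int a = rng_below(&r, range);
        int b = rng_below(&r, range);
        hits += takes_equal_branch(a, b);
    }
    return hits;
}
```

With a large `range` the expected hit rate is roughly 1/range, which is why random generation tends to miss branches guarded by equalities.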

Path-Oriented Test Data Generation
int triType(int a, int b, int c) {
 1    int type = PLAIN;
 1    if (a > b)
 2        swap(a,b);
 3    if (a > c)
 4        swap(a,c);
 5    if (b > c)
 6        swap(b,c);
 7    if (a==b) {
 8        if (b==c)
 9            type = EQUILATERAL;
          else
10            type = ISOSCELES;
      }
11    else if (b==c)
12        type = ISOSCELES;
13    return type;
}

[Figure: control flow graph of triType from start node s to end node e. Nodes 1 to 13 follow the statement numbers above; the edges are labelled with the branch outcomes $a > b$ / $a \le b$, $a > c$ / $a \le c$, $b > c$ / $b \le c$, $a = b$ / $a \ne b$, and $b = c$ / $b \ne c$.]

Path Condition $P' = (a > b) \wedge (a \le c) \wedge (b > c) \wedge (a = b) \wedge (b \ne c)$
Applying data dependency gives the
Valid Path Condition $P = (a > b) \wedge (b \le c) \wedge (a > c) \wedge (b = c) \wedge (c \ne a)$

Solve using? CLP, IRM, MILP…
Test Data (a, b, c) = (5, 4, 4)
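The valid path condition can be checked against an instrumented version of triType. The sketch below is an illustration under assumptions: `swap` is implemented with pointers (the slide's `swap(a,b)` is pseudocode), the enum values are placeholders, and the path <1,2,3,5,6,7,8,10,13> is my reading of which path the condition P describes.

```c
#include <stdbool.h>
#include <stddef.h>

enum { PLAIN, ISOSCELES, EQUILATERAL };  /* placeholder values */

#define MAX_TRACE 16

/* Node numbers visited during one run, using the slide's numbering. */
typedef struct {
    int nodes[MAX_TRACE];
    size_t len;
} Trace;

static void visit(Trace *t, int node) {
    if (t->len < MAX_TRACE) t->nodes[t->len++] = node;
}

static void swap_ints(int *x, int *y) { int tmp = *x; *x = *y; *y = tmp; }

/* triType with a probe at every numbered node. */
int tri_type_traced(int a, int b, int c, Trace *t) {
    int type = PLAIN;
    t->len = 0;
    visit(t, 1);
    if (a > b) { visit(t, 2); swap_ints(&a, &b); }
    visit(t, 3);
    if (a > c) { visit(t, 4); swap_ints(&a, &c); }
    visit(t, 5);
    if (b > c) { visit(t, 6); swap_ints(&b, &c); }
    visit(t, 7);
    if (a == b) {
        visit(t, 8);
        if (b == c) { visit(t, 9); type = EQUILATERAL; }
        else        { visit(t, 10); type = ISOSCELES; }
    } else {
        visit(t, 11);
        if (b == c) { visit(t, 12); type = ISOSCELES; }
    }
    visit(t, 13);
    return type;
}

/* Valid path condition P from the slide, over the original inputs. */
bool path_condition(int a, int b, int c) {
    return a > b && b <= c && a > c && b == c && c != a;
}

/* True when the input drives execution along exactly the given path. */
bool follows_path(int a, int b, int c, const int *path, size_t n) {
    Trace t;
    tri_type_traced(a, b, c, &t);
    if (t.len != n) return false;
    for (size_t i = 0; i < n; i++)
        if (t.nodes[i] != path[i]) return false;
    return true;
}
```

For the slide's test data (5, 4, 4), `path_condition` holds and the recorded trace is 1, 2, 3, 5, 6, 7, 8, 10, 13. The textual condition P' cannot be satisfied as written, because it requires both a > b and a = b before the swaps are taken into account.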

Goal-Oriented Test Data Generation
int triType(int a, int b, int c) {
 1    int type = PLAIN;
 1    if (a > b)
 2        swap(a,b);
 3    if (a > c)
 4        swap(a,c);
 5    if (b > c)
 6        swap(b,c);
 7    if (a==b) {
 8        if (b==c)
 9            type = EQUILATERAL;
          else
10            type = ISOSCELES;
      }
11    else if (b==c)
12        type = ISOSCELES;
13    return type;
}

[Figure: control flow graph of triType, nodes 1 to 13 up to end node e, with edges labelled by the branch outcomes $a > b$ / $a \le b$, $a > c$ / $a \le c$, $b > c$ / $b \le c$, $a = b$ / $a \ne b$, and $b = c$ / $b \ne c$.]

Unspecific path <3,10,13> : <3> and <10,13>
Set of paths
<3,5,7,8,10,13>
<3,4,5,7,8,10,13>
<3,5,6,7,8,10,13>
<3,4,5,6,7,8,10,13>
Choose one path → Test Data
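A goal-oriented generator only needs some input that reaches the goal node, whichever path gets there. The sketch below illustrates that idea with a plain exhaustive scan over a small domain, which is a stand-in for the real search techniques; `visited_nodes`, `find_input_reaching`, and `TestInput` are hypothetical names.

```c
#include <stdbool.h>

static void swap_ints(int *x, int *y) { int tmp = *x; *x = *y; *y = tmp; }

/* Runs the triType logic and returns a bitmask of visited nodes:
   bit n is set when node n of the control flow graph was executed. */
unsigned visited_nodes(int a, int b, int c) {
    unsigned seen = 1u << 1;
    if (a > b) { seen |= 1u << 2; swap_ints(&a, &b); }
    seen |= 1u << 3;
    if (a > c) { seen |= 1u << 4; swap_ints(&a, &c); }
    seen |= 1u << 5;
    if (b > c) { seen |= 1u << 6; swap_ints(&b, &c); }
    seen |= 1u << 7;
    if (a == b) {
        seen |= 1u << 8;
        seen |= (b == c) ? (1u << 9) : (1u << 10);
    } else {
        seen |= 1u << 11;
        if (b == c) seen |= 1u << 12;
    }
    return seen | (1u << 13);
}

typedef struct { bool found; int a, b, c; } TestInput;

/* Scans all (a, b, c) in [lo, hi]^3 and returns the first input that
   reaches the goal node, regardless of the path taken to get there. */
TestInput find_input_reaching(int goal, int lo, int hi) {
    TestInput result = { false, 0, 0, 0 };
    if (goal < 1 || goal > 13) return result;
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            for (int c = lo; c <= hi; c++)
                if (visited_nodes(a, b, c) & (1u << goal)) {
                    result.found = true;
                    result.a = a; result.b = b; result.c = c;
                    return result;
                }
    return result;
}
```

Asking for node 10 over the domain 0..3 yields (0, 0, 1), which happens to follow <3,5,7,8,10,13>; the generator never had to commit to one of the four candidate paths in advance.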

Goal-Oriented Test Data Generation
Find-any-path concept
Related work (Bogdan Korel)
Chaining approach (IEEE TSE, 1995)
→ Data dependence analysis

Assertion-oriented approach (IEEE TSE, 1996)
→ Assertions are inserted → the oracle is given in the code

Tool
TESTGEN system

Comparison of Approaches
Random approach
Low probability of finding semantically small faults
void foo(int a, int b) { if (a == b) write(1); else write(2); }

Path-oriented approach
Better prediction of coverage
Harder to find test data
Often selects infeasible paths

Goal-oriented approach
Hard to predict the coverage
More flexible in finding test data
Relatively reduces the selection of infeasible paths

Static Test Data Generation
Symbolic execution is representative
Proposed by James C. King in 1976
Executing a program symbolically means that variable substitution is used instead of actual values
Related work
“Test-Case Generation with IOGen” (Timothy E. Lindquist, IEEE Soft., 1988)
“Automatic Unit Test Data Generation using Mixed-Integer Linear Programming and Execution Trees” (S. Lapierre, ICSM, 1999)
“ATGen: Automatic Test Data Generation using Constraint Logic Programming and Symbolic Execution” (Christophe Meudec, STVR, 2001)

Merits & Demerits of Static Approach
Merits
No consistency checks of branch predicates since all can be solved at once

Demerits
Difficulty with dynamic data structures, arrays, procedures, and loop conditions
Overheads of repeated algebraic manipulation and simplification of variable and path expressions
Not applicable to real-time software systems

Dynamic Test Data Generation
Execution-based approach
“Automated software test data generation” (Bogdan Korel, IEEE TSE, 1990)
Function minimization search algorithm
Dynamic data flow analysis

Related Work of Dynamic Approach
“ADTEST: A Test Data Generation Suite for Ada Software Systems” (Matthew J. Gallagher, IEEE TSE, 1997)
Program instrumentation
Numerical optimization problem

“Generating Software Test Data by Evolution” (Christoph Michael & Gary McGraw, (1997), IEEE TSE 2001)
“Automated Software Test Data Generation for Complex Programs” (Christoph Michael & Gary McGraw, IEEE ASEC, 1998)
Genetic algorithm
GADGET System

Merits & Demerits of Dynamic Approach
Merits
Dynamic data structures (pointers, arrays) can be handled
Function calls can be handled

Demerits
Expensive: requires many iterations before a suitable input is found
Monitoring by instrumentation
Generation speed is limited by the execution speed of the program
Inefficient in handling infeasible paths

Hybrid of Two Forms
“Automated Test Data Generation using an Iterative Relaxation Method” (Gupta et al., ACM SIGSOFT, 1998)
Techniques
→ Relaxation technique
→ Predicate slice and input dependency set

Solved problems
→ General approach for nonlinear expressions
→ Dynamic data structures
→ Effective handling of infeasible paths

Hybrid of Two Forms (cont.)
“The Dynamic Domain Reduction Procedure for Test Data Generation”, (A. Jefferson Offutt et al., SP&E, 1999)
Techniques
→ Constraint-based testing: domain reduction procedure
→ Symbolic evaluation
→ Dynamic test data generation approach

Solved problems
→ Arrays, loops

Limitations
→ Only applicable to numerical software
→ Unit-level testing (not inter-procedural)

Conclusion
Most research work has been performed on toy programs
The programs used are very short, less complex, and use a subset of language features
Test data generators handle only integers, Booleans, real numbers, arrays, and pointers, not general data objects
Some work supports inter-procedural testing, oracles, and effective detection of infeasible paths
No papers concern objects

Combining static and dynamic methods is useful
Static: cost reduction
Dynamic: dynamic data structures

Future Challenges
Constraint-satisfaction techniques
Pointers and shapes
Object-oriented programs

Path selection
Oracle problem

Symbolic Execution
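input variable: y → Y

[Figure: flow graph of the example program. Node 1: x = y*2; node 2: if (x>=10), T/F; nodes 3 and 4: x = x+10, y = y+1; node 5: if (x+y<110), T/F; node 6: printf("OK"); node 7: exit.]

Node   Path condition               Path action
1                                   x = Y*2
2      Y*2 >= 10
3                                   x = Y*2+10
4                                   y = Y+1
5      (Y*2+10) + (Y+1) >= 110

$Y \ge 5 \wedge Y \ge 33 \;\Rightarrow\; Y \ge 33$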
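The symbolic table can be reproduced with a very small symbolic evaluator. The sketch below assumes every expression stays linear in the symbolic input Y, so a value is just a pair (coefficient, constant); the type `Lin` and the helper names are hypothetical, and a real symbolic executor would need a general expression representation and a constraint solver.

```c
/* A symbolic value of the form coef*Y + konst, where Y is the symbolic input. */
typedef struct { int coef; int konst; } Lin;

static Lin lin_scale(Lin e, int k)     { return (Lin){ e.coef * k, e.konst * k }; }
static Lin lin_add_const(Lin e, int k) { return (Lin){ e.coef, e.konst + k }; }
static Lin lin_add(Lin p, Lin q)       { return (Lin){ p.coef + q.coef, p.konst + q.konst }; }

/* Smallest integer Y with e.coef*Y + e.konst >= bound, assuming e.coef > 0. */
static int lower_bound(Lin e, int bound) {
    int num = bound - e.konst;
    int q = num / e.coef;                      /* truncates toward zero */
    if (num % e.coef != 0 && num > 0) q++;     /* round up for positive num */
    return q;
}

/* Symbolically executes the path that takes T at node 2 and the
   ">= 110" outcome at node 5, then returns the smallest Y that
   satisfies both path conditions. */
int smallest_input_for_path(void) {
    Lin Y = { 1, 0 };
    Lin x = lin_scale(Y, 2);               /* node 1: x = Y*2 */
    int bound1 = lower_bound(x, 10);       /* node 2: Y*2 >= 10 */
    x = lin_add_const(x, 10);              /* node 3: x = Y*2 + 10 */
    Lin y = lin_add_const(Y, 1);           /* node 4: y = Y + 1 */
    Lin sum = lin_add(x, y);               /* x + y = 3Y + 11 */
    int bound2 = lower_bound(sum, 110);    /* node 5: 3Y + 11 >= 110 */
    return bound1 > bound2 ? bound1 : bound2;
}
```

The two bounds come out as 5 and 33, so the function returns 33, matching the simplification on the slide.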

Dynamic Test Data Generation
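input variable: y

[Figure: flow graph of the example program. Node 1: x = y*2; node 2: if (x>=10), T/F; nodes 3 and 4: x = x+10, y = y+1; node 5: if (x+y<110), T/F; node 6: printf("OK"); node 7: exit.]

Branch function for the T branch of node 2, where $x_2(y)$ is the value of x at node 2:
$F(y) = 10 - x_2(y)$ if $x_2(y) < 10$, $0$ otherwise

y → 3: F(y) = 4
y → 4: F(y) = 2
y → 10: F(y) = 0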
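The branch function and its minimization can be sketched as follows. The search shown is a simple direct search with a doubling step, chosen for illustration; it is an assumption and not necessarily the exact search procedure of the cited work. Function names are hypothetical.

```c
/* Value of x when execution reaches node 2, for input y. */
static int x_at_node2(int y) {
    return y * 2;
}

/* Branch function for the T branch of node 2: positive while the
   branch is not yet taken, 0 once x >= 10 holds. */
int branch_function(int y) {
    int x2 = x_at_node2(y);
    return x2 < 10 ? 10 - x2 : 0;
}

/* Direct search: probe y+1 and y-1 to find an improving direction,
   then keep moving that way with a doubling step while F decreases.
   Returns an input with F == 0, or the best input found. */
int minimize_branch_function(int y, int max_rounds) {
    for (int round = 0; round < max_rounds && branch_function(y) > 0; round++) {
        int dir = 0;
        if (branch_function(y + 1) < branch_function(y)) dir = 1;
        else if (branch_function(y - 1) < branch_function(y)) dir = -1;
        if (dir == 0) break;               /* no neighbour improves F */
        int step = 1;
        while (branch_function(y + dir * step) < branch_function(y)) {
            y += dir * step;
            step *= 2;
        }
    }
    return y;
}
```

Each evaluation of `branch_function` stands for one real execution of the instrumented program, which is where the cost of the dynamic approach comes from. Starting from y = 3 the search visits y = 4 and then y = 6, where F is already 0.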

Iterative Relaxation Method
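initial input $I_0 = (x_0, y_0)$
predicate function $F = x + y - 10$
linear function $f(x, y) = ax + by + c$

[Figure: flow graph of the example program, nodes 1 to 5, with the T/F branches of node 2 and the statements x = x+10, y = y+1.]

$ax + by + c = 0$
$ax_0 + by_0 + c = r$
$a(x_0 + \Delta x) + b(y_0 + \Delta y) + c = 0$
$a\,\Delta x + b\,\Delta y = -r \;\rightarrow\; \Delta x, \Delta y$

$F(x_0 + \Delta x, y_0) - F(x_0, y_0) = a\,\Delta x$
$F(x_0, y_0 + \Delta y) - F(x_0, y_0) = b\,\Delta y$
$ax_0 + by_0 + c = F(I_0)$

Test Data $(x_0 + \Delta x, y_0 + \Delta y)$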
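One relaxation step can be sketched for a single predicate function of two inputs. The slopes a and b are estimated by finite differences and the residual r is F at the initial input, as in the equations on this slide. Since one equation in two unknowns has many solutions, the sketch picks the smallest correction, which is my assumption and not something the slide specifies; `relax_once`, `Point`, and `example_predicate` are hypothetical names.

```c
typedef double (*PredicateFn)(double x, double y);

typedef struct { double x; double y; } Point;

/* Example predicate function taken from the slide: F = x + y - 10. */
double example_predicate(double x, double y) {
    return x + y - 10.0;
}

/* One relaxation step from p0 towards F(x, y) = 0. */
Point relax_once(PredicateFn F, Point p0) {
    const double h = 1.0;                        /* probe increment */
    double r = F(p0.x, p0.y);                    /* residual at the initial input */
    double a = (F(p0.x + h, p0.y) - r) / h;      /* slope with respect to x */
    double b = (F(p0.x, p0.y + h) - r) / h;      /* slope with respect to y */
    double norm2 = a * a + b * b;
    Point p = p0;
    if (norm2 > 0.0) {
        /* Smallest (dx, dy) with a*dx + b*dy = -r. */
        p.x = p0.x - r * a / norm2;
        p.y = p0.y - r * b / norm2;
    }
    return p;
}
```

For a linear predicate function a single step lands exactly on F = 0: from (0, 0) the residual is -10, both slopes are 1, and the step gives (5, 5). For nonlinear functions the step would be repeated, which is the iterative part of the method.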

Dynamic Domain Reduction
Find test data?

1. Assume initial input domain
x: <-10 .. 10>, y: <-10 .. 10>, z: <-10 .. 10>
2. Split at 0
y: <-10 .. 0>, z: <1 .. 10>
3. Split at -5
x: <-5 .. 0>, y: <-10 .. -5>
4. Split at 2
x: <-5 .. 2>, z: <3 .. 10>

A test case: x = 0, y = -10, z = 8