Professional Documents
Culture Documents
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/236255552
CITATIONS READS
12 64
2 authors:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Mike Papadakis on 23 April 2014.
Abstract
Mutation testing is usually regarded as an important method towards fault revealing. Despite this advantage, it has
proved to be impractical for industrial use because of its expenses. To this extend, automated techniques are
needed in order to apply and reduce the method’s demands. Whilst there is much evidence that automated test
data generation techniques can effectively automate the testing process, there has been little work on applying
them in the context of mutation testing. In this paper, search-based testing is used in order to effectively generate
test inputs capable of revealing mutants. To this end, a dynamic execution scheme capable of introducing and
guiding the search towards the sought mutants is proposed. Experimentation with the proposed approach reveals
its superiority from the previously proposed methods. Additionally, the framework’s feasibility and practicality of
producing mutation based test cases are also demonstrated.
Keywords: Test case generation, Search based testing, Mutation testing
© 2013 Papadakis and Malevris; licensee Springer. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Papadakis and Malevris SpringerPlus 2013, 2:121 Page 2 of 12
http://www.springerplus.com/content/2/1/121
automated generation of test cases according to its re- known as the alternating variable method (AVM) pro-
quirements. This can prove to be a very laborious task posed by Korel (1990) for searching and producing the
(Offutt and Untch 2001) while, little has been done into sought test cases for strong mutation. The choice of the
automating effectively the mutation-based test produc- AVM method was due to its simplicity and the high
tion. Additionally, little has also been done into auto- expected effectiveness in the context of structural testing
mating effectively the production of the sought mutation (Harman and McMinn 2007) and (Lakhotia et al. 2010).
based test cases. This constitutes the main issue of the The origins of the present approach are due to the uti-
present research. lized dynamic fitness scheme. Thus, it becomes possible
Automating mutation testing requires the production to effectively direct the search process towards reaching,
of the candidate mutant programs and their execution infecting and impacting the targeted mutants. A performed
with the candidate test cases. This can be efficiently case study suggests that it can be more effective than ran-
automated with the mutant schemata approach (Untch dom testing and a previously proposed approach (Ayari
et al. 1993), (Papadakis et al. 2010), (Ma et al. 2005) et al. 2007).
which embeds all the candidate mutants into one sche- The contribution of the present work can be summa-
matic meta-program and thus, all tests are executed rized into the following points:
against this schematic program. Test execution poses an
additional barrier to mutation analysis as it requires test A novel scheme able to perform both mutation and
cases to be executed against all live mutants. To effect- search based testing.
ively reduce the execution time required for mutation, A novel fitness function for strongly killing mutants.
alternative methods called weak or firm mutation (Jia A novel dynamically adjusted fitness scheme able to
and Harman 2010), (Howden 1982), (Offutt and Untch improve the effectiveness of search based
2001) have been proposed. According to these methods approaches.
the program execution may stop after the mutated or a
succeeding program expression. Evaluation of the mutant The rest of this paper is organized as follows: Section
can be performed by checking the program state at the 2 presents the proposed system by detailing the pro-
stopping execution statement. Thus, remarkable execution posed approach. In Section 3 presents and analyzes the
savings can be achieved. Additionally, by utilizing weak obtained results from the application of the proposed
mutation and mutant schemata techniques all the weakly approach. Sections 4 and 5 discuss the relevance and the
killable mutants can be recorded with only one execution benefits of the application of the proposed system with
run (Papadakis et al. 2010), (Papadakis and Malevris previously proposed systems and approaches. Finally in
2011). The framework proposed in this paper takes advan- section 6 conclusions and future directions are given.
tage of this fact and executes only the weakly killed mu-
tants in order to determine the strongly killed ones. Thus, Framework description
the mutant execution cost is kept to a low level. The proposed framework tries to effectively automate
The practical use of an adequacy criterion requires the the production and evaluation of mutation based test
automated generation of test cases according to its re- data. The framework embeds all the candidate mutants
quirements. This can be a very laborious task (Offutt into one schematic meta-program suitable for both exe-
and Untch 2001) for any selected criterion including cuting mutants and recording the required test cases fit-
mutation. Recently, search based optimization techniques ness calculations. Then it produces test cases according
and tools have succeeded in automating the test case to the alternating variable method (Korel 1990) guided
generation activity effectively i.e. (Harman and McMinn by the schematic program. In the succeeding subsections
2007) and (Lakhotia et al. 2010). This paper introduces an details of the framework are given.
automated framework that produces test cases based on
strong mutation testing. In the proposed framework, the Generating mutants
mutants are automatically generated based on a novel ver- Dynamic approaches are based on the information
sion of the mutant schemata technique (Untch et al. 1993) gained through dynamic program runtime execution. In
for performing both mutation and search based testing. the context of structural testing the programs under test
The use of mutant schemata for mutation test data gener- host all the needed information in their structure and
ation purposes has also been investigated by (Papadakis thus, it is straightforward to implement a monitoring
and Malevris 2011), (Papadakis et al. 2010) in the context mechanism for the data evolution purpose. In the con-
of weak mutation, utilizing existing structural testing tools, text of mutation, there is a special need for unifying both
and in the context of strong mutation using dynamic sym- the original’s and the mutants’ runtime information. The
bolic execution (Papadakis and Malevris 2010a). Here the difficulties originate from the mutations’ method nature,
proposed approach incorporates a hill climbing algorithm as the needed information is spanned across the original
Papadakis and Malevris SpringerPlus 2013, 2:121 Page 3 of 12
http://www.springerplus.com/content/2/1/121
and the various mutants versions (Papadakis et al. 2010), Malevris 2011). Additionally, the fitness function calcula-
(Papadakis and Malevris 2009) programs. To efficiently tions have also been embedded into the schematic classes.
overcome this difficulty a special form of the mutant Thus, test execution requires only an initialization of the
schemata technique was employed in order to unify all mutant schemata class at the beginning of the execution
the mutation analysis requirements into a unique version and an additional call to the calculation function at the end
suitable for the test evolution representation. This ap- in order to produce all the required fitness calculations.
proach was initially introduced in (Papadakis et al. 2010) The proposed system employs the reflection mechan-
in the context of using existing test data generation tools ism of the Java language in order to execute the meta-
for performing mutation. Here the technique has been ex- program with the produced test cases and extracts the
tended in order to effectively guide the test data gener- fitness function calculations. As the proposed system
ation process. This is achieved by embedding a fitness performs a different search per live mutant, it executes
guide and evaluation inside the schematic functions. all the live mutants only when a search has come to a
The Mutant Schemata Generation (MSG) (Untch et al. success (kills the targeted mutant). This approach might
1993) technique encodes into one meta-program all the be less effective than executing all mutants against all
introduced mutants. This is achieved by appropriately produced tests (fitness evaluations) but keeps mutation
replacing each pair of operands participating in an execution overheads at a low level. Additionally, for fur-
operation with a call to a schematic function, with this ther savings it determines the strongly killed mutants by
pair of operands as parameters (e.g. a > b becomes executing only those that have been previously weakly
RelationalGT (a, b)). Expanding the suggestions of the killed, as pointed out in the introduction section.
MSG approach, the evaluation of the mutants’ execution
and all the required fitness calculations are performed Search based testing
within the schematic function. This as it is shown in Search based testing (Harman and McMinn 2007),
(Papadakis et al. 2010) reflects the killing mutants prob- (Wegener et al. 2001) formulates the test case gener-
lem to a path - branch coverage problem. By incorporat- ation problem to a search problem and tries to tackle it
ing the mutant evaluation into the schematic function, using search based optimization techniques. The search
the necessary conditions for killing each considered mu- process is guided by an appropriate fitness function
tant are also embedded. These conditions, which take which indicates how close the tests are in covering the
the following expression, are formed as decisions in the aimed program elements. To achieve this, a separate
schematic function. These conditions are of the follow- search is performed according to each live mutant. To
ing form: keep mutation execution overheads at a low level the
framework executes all the live mutants only when a
Original expression ≠ Mutated expression ð1Þ search has come to a success (kills the targeted mutant),
as suggested in Fraser and Zeller (2010).
The above decision expression (1) consists of two pos- The proposed framework uses the alternating variable
sible outcomes (i.e. the original is either equal to the method, proposed by Korel (1990). This method forms a
mutated or not). Thus, entailing the introduction of true hill climbing algorithm which has been shown to be
(mutant is killed) and false (mutant is alive) cases. Meas- quite effective compared with other search algorithms in
uring the closeness of making the above decision (1) the context of structural testing (Harman and McMinn
true, results in an effective measure of the test case fit- 2007). Hence, it forms an ideal choice as it is a quite
ness according to the weak mutation testing criterion simple to implement method and as expected, a quite
(Papadakis et al. 2010) and forms a part of the proposed powerful one. Here, it should be noted that mutation, in
fitness function (section 2.4). After the mutants’ evalu- particular weak mutation, can be transformed to branch
ation point, the program execution continues in order to testing (Papadakis and Malevris 2011), (Papadakis et al.
evaluate the mutant’s output and its propagation fitness, 2010) and since hill climbing performs similarly to its
as it is required by strong mutation. rivals, in the structural testing context (Harman and
McMinn 2007), there is no reason why this should not
Executing mutants with tests hold for mutants too. Nevertheless, this is beyond the
The present approach takes advantage of the unified scope of the present paper and is left open for future
representation of all mutants and their killing conditions research.
into one meta-program. Based on the use of the intro- The method starts by randomly initializing the input
duced schemata technique, mutant execution can be program variable values. Then it selects repeatedly and
performed straightforwardly utilizing only one program. adjusts one of those values by alternating it. This is
This is a direct consequence of the parameterized intro- performed until no further fitness improvement can be
duced mutants (Untch et al. 1993), (Papadakis and obtained i.e. no further alternations are fruitful. In this
Papadakis and Malevris SpringerPlus 2013, 2:121 Page 4 of 12
http://www.springerplus.com/content/2/1/121
case the method switches to the next input variable. The signify the True and False branch fitness of clause x re-
algorithm stops when no further fitness improvement spectively. Fulfilling the necessity constraints has been
can be recorded by selecting and alternating any of the found to be relatively ineffective at killing mutants that
input variables. Details of the method can be found in involve changes to predicate expressions (DeMillo and
Korel (1990). Offutt 1991). Thus, mutation fitness calculations should
quantify the distance of making changes to the mutant
and original program predicates (at the mutated state-
Fitness function ment). To achieve this it is needed to quantify the dis-
Search based testing requires the employment of an ap- tance of fulfilling the following expression:
propriate fitness function in order to be effective. The
present framework utilizes a fitness function composed ðOriginal pred ¼¼ T Mutated pred ¼¼ F Þk
of four parts. The first two are known as the approach ðOriginal pred ¼¼ FMutated pred ¼¼ T Þ
level and the branch distance introduced by Wegener
et al. (2001) in the context of structural testing. The Following the branch fitness calculations of Table 2, the
third one is named mutation distance and the fourth fitness of the above expression named predicate mutation
one is named impact distance. distance (pmd), is defined according to expression 2.
The approach level measures the closeness, of a candi- pmd ¼ min½Tfit ðOÞ þ Ffit ðMÞ; Tfit ðMÞ þ Ffit ðOÞ
date test case, for executing a target mutant statement.
It is calculated by counting the number of the target ð2Þ
mutant’s control dependent nodes missed by the candi- The O and M denote the original and the mutant
date input test. The branch distance quantifies the dis- predicates fitness calculations.
tance from flipping a branch i.e. making it from true to Impact distance tries to approximate the mutant suffi-
false or the opposite. It is computed using the runtime ciency condition (DeMillo and Offutt 1991), which is a
values of the branch expression of interest. The expres- difficult task to formalize (DeMillo and Offutt 1991).
sion of interest is the topmost of the missed ones from Following the suggestions made in Fraser and Zeller
the mutant control dependencies. This measure is calcu- (2010), one way of approximating this condition is to
lated based on the expression formulas of Table 1 which measure the impact on the mutant program execution.
was taken from the Awedikian et al. (2009) study on As this approach does not guide the search towards
MC/DC testing. Mutation distance as introduced in this some specific program parts, able to expose the intro-
paper reflects the branch distance measure on mutants. duced mutants, it was found to be ineffective with the
This approach is in line with the suggestions made by AVM method. Thus, the proposed fitness function tries
Bottaci (2001) for the mutation testing fitness calcula- to guide the search towards some specific program ele-
tions. It should be noted that these three measures guide ments which will hopefully be capable of exposing the
the search towards fulfilling the reachability and mutant mutants. To achieve this, it is suggested to record the
necessity constraints proposed by Demillo and Offutt impact (differences on the execution paths between the
(1991). Table 2 presents the expression formulas based original and mutant program versions) (Fraser and
on which mutation distance fitness calculations were Zeller 2010) of each mutant during the test generation
made. These formulas were obtained by simplifying and process. For each program node that has been impacted
reducing the necessity constraints and provide useful in- the ratio of the killed over the total number of mutants
formation for killing the considered mutants (based on is recorded. Informally, as tests are produced and exe-
the expression 1). In Table 2 the Ffit(x) and Tfit(x) cuted with mutants the nodes are ranked according to
their ability to expose mutants, when they are impacted.
Impact distance reflects the approach level and the
Table 1 Branch fitness branch distance on the mutant program towards cover-
Expression True branch False branch ing a selected top ranked node.
a == b abs( a - b) a == b?k : 0 Conclusively, the proposed fitness function guides the
a !=b a ! = b? 0 : k abs (a ! = b?a - b : 0) search towards reaching (approach level + branch dis-
a<b abs (a < b?0 : a - b + k) abs (a < b?a - b + k : 0) tance) a mutant, causing a discrepancy at the mutation
a<= b abs (a < = b?0 : a - b) abs (a < = b?a - b : 0) point (mutation distance), propagate it to the outcome
a>b abs (a > b?0 : a - b + k) abs (a > b?a - b + k : 0) of the mutant statement (predicate mutation distance)
a>= b abs (a > = b?0 : a - b) abs (a > = b?a - b : 0)
and impact specific likely to expose mutants, program
nodes (here referred to as impact nodes). Computing the
a || b min[fit(a), fit(b)] fit(a) + fit(b)
overall fitness of the test cases requires a unification of
a && b fit(a) + fit(b) min[fit(a), fit(b)]
the three used measures. This is done based on the
Papadakis and Malevris SpringerPlus 2013, 2:121 Page 5 of 12
http://www.springerplus.com/content/2/1/121
following equation where branch and mutation distances It was observed that trying to kill them, results into cov-
are normalized as in (Arcuri 2010): ering - reaching many other mutants (possible hard to
reach) collaterally i.e. without aiming at them. It is noted
fitness¼reach disþmutation disþimpact dis that many mutants are equivalent and thus by their def-
reach dis¼2approach levelþnormalizedðbranch distanceÞ;
mutation dis¼normalizedðmutation distanceÞ
inition aiming at them will result in a waste of effort. In
þnormalizedðpmdÞ practice, these two characteristics of mutation can pro-
impact dis¼approach levelþnormalizedðbranch distanceÞ vide useful information to assist the killing of some
ð3Þ other mutants. This paper proposes the concept of dy-
namic approach level that will serve as a yardstick for
improving the search process.
Dynamic approach level Search based approaches utilizing approach level and
Mutation testing introduces a vast number of mutants branch distance fitness functions have the drawback of
which are spanned across the whole program structure. leading to random testing (McMinn and Holcombe
Papadakis and Malevris SpringerPlus 2013, 2:121 Page 6 of 12
http://www.springerplus.com/content/2/1/121
2006) for certain types of programs. This is due to the particular weak mutation, can be transformed to branch
use of certain programming constructs such as the use testing (Papadakis and Malevris 2009), (Papadakis et al.
of flags, enumeration types and various data dependen- 2010), (Papadakis and Malevris 2011), (Papadakis and
cies (Awedikian et al. 2009). To avoid such a situation, Malevris 2012) and since hill climbing performs similarly
the dynamic approach level tries to include some data to its rivals, in the structural testing context (Harman
dependencies in the fitness evaluation. These data de- and McMinn 2007), there is no reason why this should
pendencies are not included in the “static” predefined not hold for mutants too. Nevertheless, this is beyond
approach level. The need for such an inclusion has also the scope of the present paper and is left open for future
been pointed out by Awedikian et al. (2009), who also research.
argued that doing so is quite easy and leads to an im- The method starts by randomly initializing the input
proved performance. program variable values. Then it selects repeatedly and
The rationale behind the use of the standard approach adjusts one of those values by alternating it. This is
level (Wegener et al. 2001) is to include only the struc- performed until no further fitness improvement can be
tural elements (control dependencies) that must be tra- obtained i.e. no further alternations are fruitful. In this
versed by any of the possible sought test cases. Consider case the method switches to the next input variable. The
a case where in order to traverse a targeted branch re- algorithm stops when no further fitness improvement
quires the program execution to execute a specific pro- can be recorded by selecting and alternating any of the
gram statement (data dependency) that is not part of the input variables. Consider the example of Figure 1. To
control dependencies of the targeted branch. Then, all make this example more understandable, let us assume
the possible test cases that traverse this branch also that when a mutant is weakly killed, it is also strongly
traverse the specific program statement. The dynamic killed. The same approach holds and in the opposite
approach level identifies all the common structural ele- case with the difference in the fitness calculations. In the
ments that traverse the produced test cases and thus, left part of Figure 1 the original sample program is
necessary data dependencies too. This way the path in- presented. In its right part the mutated meta-program is
formation gained during the whole search process can detailed. The introduction of the mutants is recorded in
be used and reproduced for infecting and eventually kill- the alterations made to the original program e.g. the
ing the aimed mutants. statement if ( i < k ) has become if (RelationalGT(i, k,
The dynamic approach level is defined as the intersec- 15)). The variables i and j are the two original operand
tion of all nodes that are contained in all the encoun- variables while 15 signifies that this expression contains
tered execution paths that reach a targeted node. Thus, the mutants identified by the relational operator (7 mu-
for example if a target node is x and during the search tants) with identification numbers from 15 to 21. Let the
process 5 different execution paths have been encoun- initial random inputs be: i = 150, j = 400, k = 300 and the
tered that lead to node x, then the dynamic approach target mutant the 15th one i.e. ( i < k to i < = k with mu-
level is formed as the common nodes of these 5 paths. If tant fitness abs(i–k)). The process at first selects the i in-
there is no path leading to node x, then the standard ap- put variable and performs exploratory steps (small
proach level is used. It is noted that in the absence of increases and decreases say p - here 1 for integer and
data dependencies the dynamic approach level could ap- 0.1 for real variables - of the input variable). These steps
proximate the standard approach level if most of the en- indicate the search direction. In the example here, i
countered paths have been executed. This approach should be increased as it results in better fitness values.
relies on the excessive search performed to kill all the in- After the determination of the search direction the
troduced mutants.
process continues with pattern steps (these steps are RQ 3: What is the impact of the dynamic approach
computed based on the formula: 2^n*direction*p, where level on the effectiveness of the examined
direction is 1 for increase or −1 for decrease). Thus, in approaches?
the above example the next obtained input values (pat-
tern steps) will be for the i variable 152, 154, 158, 166, To answer the above questions, the proposed frame-
182, 214, 278, 406. At this point the fitness function can- work was employed to generate test cases for a set of
not be further improved by altering the i input variable programs based on the mutation operators presented in
as the fitness also relies on the second branch point ( j < Table 2. It is noted that the results reported here are the
k ). The process continues with input variable j, it per- average values obtained from applying the examined ap-
forms exploratory steps and starts to decrease the j value proaches 10 times independently. In order to answer
as follows: 398, 396, 392, 384, 368, 336, 272. At that RQ1 the number of killed mutants was measured. With
point the process chooses the k input variable and starts respect to RQ2 the required fitness evaluations to pro-
increasing its value accordingly to 302, 304, 308, 316, duce the sought test data were measured. With respect
332, 364, 428. After value 428 it performs exploratory to RQ3 the experiment was repeated by utilizing the dy-
steps again and starts to decrease its value to 426, 424, namic approach level. Specifically, in the conducted ex-
420, 412, 396. Here it changes direction again and con- periments random testing and three fitness functions
tinues to 398, 400, 404, 412 where it decreases to 410, were utilized. The first fitness function named “Reach”
408, 404 and finally finds the required value 406 that uses only the reach distance of expression 3 and corre-
kills the mutant. The process has effectively achieved to sponds to the fitness function suggested by Ayari et al.
produce the test case ( i = 406, j = 272, k = 406 ) that kills (2007). The second one called “Infect” uses the Reach
the mutant ( i < k to i < = k ). If this procedure fails to and Infect distances of expression 3 and the third one
kill the required mutant the process restarts by using named “Impact” utilizes expression 3. In the second ex-
new randomly selected inputs for i, j and k. Of course, periment the followed process was to iteratively and
this could be a consequence of hitting a local minimum continuously perform one attempt to kill all the live mu-
or a consequence of an equivalent mutant. tants using the standard approach level and one using
the dynamic approach level. For each test subject the
Evaluation test generation process considered up to 50,000 fitness
This section empirically evaluates the effectiveness of evaluations or random test generations (for random test-
the propositions made in this paper based on two ex- ing) for all the introduced mutants. This is a consider-
periments. The first experiment compares the effective- ably high number of evaluations but it is forced due to
ness of the proposed framework to perform mutation the existence of equivalent mutants.
using three fitness functions and random testing. The
second experiment, examines the impact on the frame- Results and analysis
work’s effectiveness to when utilizing the dynamic The experimental evaluation of the proposed system was
approach level. performed based on the use of 8 program units. The se-
lected programs have been used in various studies
Experimental design (DeMillo and Offutte1991), (Papadakis et al. 2010),
The experiment described in this section uses the pro- (Sthamer 1996), (Murrill 2008). The test objects are pro-
posed mutation testing framework on a set of Java pro- grams with various characteristics such as mathematical
grams using the mutation operators presented in Table 2 computations, array manipulations, state based behavior
along with the incorporated fitness necessity formulas. and complex branching conditions. A considerable num-
The framework works on Java programs (produces mu- ber of mutants, 2,759, were produced based on the use
tation operators) with a primary target at the intra of all the operators employed by the framework. Particu-
method level (Ma et al. 2005), similar to non Object- lars of all the used programs are given in Table 3. Table 3
Oriented languages (see section V.A for details about records details about the test objects’ lines of code, input
the framework capabilities and limitations). The experi- settings (input domain search space) and the number of
ment described in this section empirically investigates produced mutants.
the following Research Questions (RQs): The experiment tries to reveal the ability of the frame-
work to perform mutation testing and its effectiveness
RQ 1: How effective are the adopted fitness compared to a previously proposed approach without
functions compared to a previously proposed one any particular assistance. That is, none of the equivalent
(Ayari et al. 2007) and random testing? mutants was eliminated from the candidate mutant set,
RQ 2: What is the relative efficiency of the adopted fact that allows considerable overheads to the conducted
fitness functions? experiment. Furthermore, no data dependencies, state
Papadakis and Malevris SpringerPlus 2013, 2:121 Page 8 of 12
http://www.springerplus.com/content/2/1/121
related information or flag removal approaches were Compared with random testing it can be observed that
employed in order to make the test search more effi- in general it performs worse than the Infect and Impact
cient. The effective incorporation of such approaches is fitness irrespective of the use or not of the dynamic ap-
considered out of the scope of the present paper and proach level. However, it performs similarly to Reach
thus, has been left for future work. without the use of the dynamic approach level and
Results of the performed experiments are presented in worsens when Reach utilizes it. Here it must be noted
Table 4. Table 4 records per test subject the number of that under the framework’s process of executing mu-
killed mutants by the produced test data according to tants, which determines the collaterally killed ones when
random testing (Random) and the three employed fit- and only when it has killed a targeted mutant, the com-
ness functions utilizing the standard (Reach, Infect and parison made is in favor of random testing (in random
Impact) and the dynamic approach levels (DReach, testing all tests are executed against all mutants). Never-
DInfect and DImpact). Additionally, Figure 2 reports the theless, even in such a case the proposed approach out-
sum of the killed mutants for various fitness evaluation performs random testing.
limits when using either static or dynamic approach level. By employing the proposed framework with 150,000
The obtained results provide evidence in support of fitness evaluations on the Trityp program (DeMillo and
the proposed fitness functions (Infect and Impact) which Offutt 1991), which is a well established benchmark in
outperform a previously proposed one (Reach) and ran- both search based and mutation testing studies, the re-
dom testing (RQ1). Additionally, the use of dynamic sults presented in Figure 3 can be obtained. From this
approach level improves the effectiveness of all the ex- figure it becomes evident that the Impact fitness utilizing
amined fitness functions (RQ3). Specifically, the Infect the dynamic approach level can lead to a considerably
and Impact fitness functions kill on average 113 and 122 high number of killed mutants. In this case it manages
more mutants than the Reach one respectively, using the to kill all but two of the killable mutants. This fact sug-
standard approach level. The use of the dynamic ap- gests that the proposed approach can be quite powerful
proach level results in an increase with all three exam- in producing mutation based test cases.
ined functions by killing 71, 40 and 87 more mutants for
Reach, Infect and Impact fitness functions respectively. Related work
Additionally, the convergence of all the examined fitness The alternating variable method was initially proposed
functions is higher for the high number of evaluations. by Korel (1990) which was adopted by the present
This is due to the fact that for higher number of execu- framework for finding the appropriate tests. Daimler
tions more paths are included in the dynamic approach (Wegener et al. 2001) developed an automated tool for
level. Recall that the dynamic approach level is adopted testing C programs based on various structural testing
according to all the encountered execution paths. criteria. It is this tool’s fitness function that is extended
Considering the approach efficiency (RQ2) the number by the present research.
of mutant evaluations should be examined. From Figure 2 Test case generation techniques for mutation testing
it can be observed that both the Infect and Impact fitness have received little attention in the literature. The most
are more efficient than the Reach one even for a small fundamental attempt is the one due to DeMillo and
number of evaluations (approximately over 4,500 evalua- Offutt who developed the constraint based testing
tions). For less than 4,500 evaluations all the approaches Technique (DeMillo and Offutt 1991). This technique
share a similar effectiveness and efficiency. The use of dy- introduced the concept of reachability, necessity and
namic approach level generally improves the efficiency of sufficiency conditions which have been embodied in a
the utilized fitness functions as for the same number of tool called Godzilla. Godzilla contains the first two of
evaluations it kills more mutants. these conditions, formulating and resolving them as
mathematical systems of constraints. Formulating and gain and dynamically adopt the required fitness informa-
resolving reachability and necessity constraints forms a tion through mutant schemata is presented by the present
difficult task. In order to efficiently handle this task, in paper.
(Papadakis and Malevris 2012) and (Papadakis and The idea of utilizing mutant schemata in order to help
Malevris 2009) it is suggested to use a path selection automated tools to perform mutation was initially intro-
strategy that reduces the effects of infeasible paths. duced in (Papadakis et al. 2010) with the aim of reusing
Bottaci (2001) proposed a fitness function composed of existing structural automated tools for performing muta-
the reachability distance (measures the closeness of the tion. The underlying technique to achieve this was to re-
test data and the mutant statement) of the produced duce the weakly killing mutant problem to the covering
tests and the necessity distance (measures the closeness branches one. In (Papadakis and Malevris 2010a) the
to kill the mutant statement). In (Ayari et al. 2007) a schemata approach was extended to utilize dynamic
search based approach for the generation of mutation symbolic execution for producing strong mutation based
test data was proposed by implementing only the tests. In the present paper, mutant schemata were used
reachability part of the Bottaci (2001) fitness function. in order to enable the fitness guidance towards killing
More recently, Fraser and Zeller (2010) proposed an- mutants for strong mutation.
other evolutionary based approach to automate the pro-
duction of mutation tests. This approach uses the Discussion
rechability part of the Bottaci’s fitness function (Bottaci The origin of the proposed framework is due to the inte-
2001) and approximates the necessity and sufficiency grated use of mutant schemata and evolutionary testing
conditions by measuring the mutant’s impact (Fraser techniques utilizing a novel fitness function. This inte-
and Zeller 2010). They argue that producing tests with gration helps to extract dynamic program information
higher mutants’ impact, results in tests closer to kill concerning the introduced mutants and fitness calcula-
those mutants. The above two approaches are the clos- tions efficiently. The conducted case study indicates the
est ones to the present proposed framework. The main ability of the proposed method to produce high quality
differences are that the proposed framework extends test cases from scratch (starting from random inputs).
the fitness function to effectively direct the search to- Additionally, it indicates the improved performance over
wards necessity and sufficiency conditions (DeMillo and random testing and a previously proposed approach. In
Offutt 1991). Additionally, a novel technique to efficiently fact the proposed fitness functions are compared with
Figure 2 Mutants killed by the utilized approaches for various fitness evaluation limits.
Papadakis and Malevris SpringerPlus 2013, 2:121 Page 10 of 12
http://www.springerplus.com/content/2/1/121
the one proposed in (Ayari et al. 2007). A similar to Tool characteristics and limitations
(Ayari et al. 2007) approach has also been proposed by The proposed framework in this paper has several spe-
Fraser and Zeller (2010) who extend it by including the cial characteristics and limitations which are currently
mutants impact in their fitness. Such an inclusion was under research. Generally, it can handle Java programs
attempted in the conducted case study. The obtained re- only at the intra method level. Thus, it does not handle
sults were similar to the ones obtained by the reach fit- method sequences or Object Oriented features. Its appli-
ness function. This is due to the use of hill climbing and cation treats one method at a time using predefined
the absence of effective guidance towards some specific method sequences. The inability of the mutant schemata
program statements. technique to handle certain Object Oriented mutants as
Equivalent mutants help the proposed approach to identified in (Ma et al. 2005) limits the propositions
build the dynamic approach level as they force the made in this paper to the intra method level. Here it
search towards various and different program statements must be noted that in the case of Logical operators, a
and conditions. Despite this, from the conducted case necessary special handling was enforced. This is due to
study it becomes evident that equivalent mutants pose the short circuit evaluation mechanism performed by
an additional burden to test case evolution as they force the Java language. In order to keep the program execu-
the method not only to search for non killable mutants tion paths unaffected with the presence of mutants, the
(huge effort) but also by misleading the mutation score logical operator’s evaluations were performed when both
calculated due to their presence. This fact explains why logical operands were executed.
the proposed approach spends so many fitness evalua-
tions in order to kill additional mutants. Perhaps the use Threats to validity
of some heuristic approaches such as the one suggested The present paper focuses on presenting an automated
in (Kintis et al. 2012) for isolating equivalent mutants, mutation testing system. One possible threat to the val-
could be employed in order to overcome this problem. idity of the results reported here may be related to the
This paper also reveals that simple dynamic approaches generalization of the obtained results. Thus, the frame-
can be quite effective for the production of high quality work’s effectiveness may vary in other cases. However, as
test cases. Based on the dynamic nature of the adopted the proposed method utilizes and extends the sugges-
approach, the problems caused by pointers and non lin- tions made by DeMillo and Offutt (1991), Bottaci (2001)
ear expressions are limited. and Fraser and Zeller (2010), their use is expected to in-
The framework described here uses a quite simple but crease the effectiveness of the search based approaches.
practical approach, based on the mutant schemata tech- Additionally, the results obtained may serve as a yard-
nique (Untch et al. 1993), in order to perform the test stick towards the employment of mutation testing in an
data generation process. This approach is an incremental automated fashion and in order to indicate that it is pos-
approach that targets first on reaching, then weakly kill- sible to adopt mutation for the testing activity, resulting
ing and then strongly killing the introduced mutants. in the production of high quality tests.
This incremental process helps on producing some tests The proposed framework uses weak/strong mutation
capable to weakly kill some strongly equivalent mutants. and mutants’ impact for guiding and evaluating its test
These tests should be valuable and should increase the production effectiveness utilizing the AVM method. This
performed testing quality. does not necessarily mean that the results here can be
Papadakis and Malevris SpringerPlus 2013, 2:121 Page 11 of 12
http://www.springerplus.com/content/2/1/121
doi:10.1186/2193-1801-2-121
Cite this article as: Papadakis and Malevris: Searching and generating
test inputs for mutation testing. SpringerPlus 2013 2:121.