
Engineering Applications of Artificial Intelligence 106 (2021) 104454

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence


journal homepage: www.elsevier.com/locate/engappai

Automated test case generation for path coverage by using grey prediction
evolution algorithm with improved scatter search strategy
Gaocheng Cai, Qinghua Su ∗, Zhongbo Hu ∗
School of Information and Mathematics, Yangtze University, Jingzhou, Hubei, China

ARTICLE INFO

Keywords:
Automated test case generation
Grey prediction evolution algorithm
Improved scatter search strategy
Path coverage

ABSTRACT

Automated test case generation for path coverage (ATCG-PC), an important task in software testing, aims to achieve the highest path coverage of a tested program with as little computational overhead as possible. In ATCG-PC, “similar paths are usually executed by similar test cases” is a piece of problem-specific knowledge which has been touched on by a handful of researchers but is still underutilized. Inspired by this problem-specific knowledge, this paper designs a local search strategy by improving a scatter search strategy, and then proposes a grey prediction evolution algorithm with the improved scatter search strategy for ATCG-PC. Here, the improved scatter search strategy obtains two feasible test cases by exploiting a dimension of a test case covering a certain path. The proposed algorithm is constructed by importing the improved scatter search strategy to the end of the reproduction operation of the grey prediction evolution algorithm, which has strong exploration ability. This is the first time the grey prediction evolution algorithm has been applied to solve ATCG-PC. The performance of the proposed algorithm is evaluated on six fog computing benchmark programs and six natural language processing benchmark programs. The experimental results demonstrate that the proposed algorithm can achieve the highest path coverage with fewer test cases and less running time than some state-of-the-art algorithms.

1. Introduction

With the rapid development of computer technology, software testing, which aims at finding software errors, is becoming increasingly important in the software industry. The ever-increasing size of software is leading to an increasing complexity of software testing. As a result, automated software testing (AST) (Anand et al., 2013; Xing et al., 2015) has received growing attention. The main task of AST is to automatically generate test cases which meet a certain coverage criterion, e.g., the condition coverage, statement coverage, branch coverage, or path coverage criterion. In particular, automated test case generation for path coverage (ATCG-PC) has become the hottest topic in the AST community since researchers realized that the condition coverage, statement coverage, and branch coverage criteria can be transformed into the path coverage criterion (Clarke, 1976; Sun et al., 2019; Gong et al., 2020).

ATCG-PC aims to generate a test case set which obtains the highest path coverage of a tested program. This is a difficult problem with a highly non-linear structure. The technologies for solving ATCG-PC mainly fall into three categories, i.e., random testing, symbolic execution, and search-based testing (Anand et al., 2013; Zamli et al., 2016; Nosrati et al., 2020). Random testing generates test cases by random sampling in the entire input domain of a tested program. It is easy to implement and runs fast, but generally has low path coverage. Symbolic execution, in which symbolic values rather than actual data are used as input data of a tested program, generates test cases by solving related constraints. It can check the behavior of a tested program along specific paths, but is less effective in handling arrays, procedure calls, pointer references, and infinite loops. Search-based testing converts ATCG-PC into an optimization problem and then generates test cases according to the solution of the optimization problem via search-based algorithms. Because search-based algorithms do not depend on gradient information and can automatically retrieve the solution in a certain search space, search-based testing has become a development trend for solving ATCG-PC. Many search-based algorithms, e.g., hill climbing (HC) (Khari et al., 2020), tabu search (TS) (Bhattacharjee and Pati, 2014), alternating variable method (AVM) (Korel, 1990), genetic algorithm (GA) (Yao and Gong, 2014), particle swarm optimization (PSO) (Sahoo and Ray, 2020), artificial bee colony (ABC) (Mala et al., 2010), differential evolution (DE) (Huang et al., 2017), ant colony optimization (ACO) (Srivastava et al., 2009), negative selection algorithm (NSA) (Mohi-Aldeen et al., 2016), or their improved versions, have obtained competitive simulation results for ATCG-PC.

It is well known that application-oriented algorithms should be designed to take advantage of problem-specific knowledge as much as possible to guide algorithmic search. The search-based algorithms which have been applied to solving ATCG-PC adopt two kinds of approaches to gain searching power from the specific knowledge of ATCG-PC. One

∗ Corresponding authors.
E-mail addresses: caigcs@126.com (G. Cai), suqhdd@126.com (Q. Su), huzbdd@126.com (Z. Hu).

https://doi.org/10.1016/j.engappai.2021.104454
Received 26 February 2021; Received in revised form 28 August 2021; Accepted 28 August 2021
Available online xxxx
0952-1976/© 2021 Published by Elsevier Ltd.
G. Cai, Q. Su and Z. Hu Engineering Applications of Artificial Intelligence 106 (2021) 104454

approach is to obtain hidden knowledge of ATCG-PC by a certain means of data mining. The other is to directly apply a certain piece of prior knowledge of ATCG-PC. A piece of prior knowledge which has been noticed by a handful of researchers is: similar paths are usually executed by similar test cases. Considering that a test case is actually a high-dimensional vector composed of all input variable values of a tested program, it can be inferred that there is a certain relevance between the dimensions of test cases covering similar paths. We define this prior knowledge as “dimensional relevance of test cases” (DRTC). A detailed review of the two kinds of approaches can be found in the related work subsection of the next section.

Inspired by DRTC, we found that the test case of an uncovered path may differ from the test cases of its similar paths in only some dimensions. If this principle could be used to design a local search strategy for assisting a search-based algorithm with strong global search ability, then a competitive performance for ATCG-PC can be expected. Following this idea, this paper proposes a grey prediction evolution algorithm with improved scatter search strategy (GPE-IS). Here, the grey prediction evolution algorithm (GPE) is a competitive search-based algorithm with strong global search ability, and the improved scatter search (IS) strategy is a local search strategy designed according to the DRTC rule.

Unlike the classical scatter search (SS) strategy, the IS strategy obtains two feasible test cases by exploiting a dimension of a test case covering a certain path. In fact, as the related work in the next section shows, this is the first time that the DRTC rule has been applied to design a local search strategy for assisting a search-based algorithm. It is also the first time that the search-based algorithm GPE has been applied to solve ATCG-PC.

In sum, this paper mainly makes the following contributions.

• The prior knowledge that “similar paths are usually executed by similar test cases” is analyzed in depth and defined.
• A local search strategy (IS strategy) inspired by the prior knowledge is designed.
• GPE-IS is proposed to solve ATCG-PC.
• The performance of the proposed algorithm is investigated on six fog computing benchmark programs and six natural language processing (NLP) benchmark programs.

The remainder of this paper is organized as follows. Section 2 gives preliminaries, including the related work on search-based algorithms with problem-specific knowledge, several basic concepts of ATCG-PC, and a fitness function. The original GPE, the proposed IS strategy, and GPE-IS are introduced in Section 3. The application of GPE-IS in ATCG-PC is described in Section 4. In Section 5, three groups of numerical experiments are conducted on six fog computing and six NLP benchmark programs to investigate the performance of GPE-IS. Finally, a conclusion section is given.

2. Preliminary

The application of specific knowledge can improve the performance of search-based algorithms for ATCG-PC. The specific knowledge used in the search-based algorithms includes the hidden knowledge of ATCG-PC obtained by certain data mining methods and prior knowledge of ATCG-PC. In this section, the related work on search-based algorithms with specific knowledge for ATCG-PC is first introduced. Then, some basic concepts of ATCG-PC and the fitness function used in this paper are described.

2.1. Related work

Some search-based algorithms have been widely used in ATCG-PC due to their generality. In general, they can obtain desirable performance when specific knowledge of ATCG-PC is utilized. As far as we know, only a few researchers have utilized the specific knowledge of ATCG-PC to assist search-based algorithms. The search-based algorithms which obtain their search power from the specific knowledge of ATCG-PC fall into the following two categories.

The first category of algorithms improves search power with the hidden knowledge of ATCG-PC obtained by certain data mining methods. Huang et al. (2018) proposed an improved DE with a relationship matrix (RP-DE), in which the relationship matrix between the dimensions of test cases and covered paths guides DE to perform more searches on the dimensions relevant to the target path. Dai et al. (2021) proposed a DE with node branch archive (NBAr-DE), in which a three-dimensional matrix is used to record the relationship between the values of test case variables and node branches according to the information of the paths covered by generated test cases, and the values covering node branches recorded in the relationship matrix are assigned to the corresponding dimensions of the current test case.

The second category of algorithms improves search power by directly using the prior knowledge of ATCG-PC (i.e., DRTC). Bueno and Jino (2002) proposed an improved GA with a path-test-cases database, which stores generated test cases and the paths covered by them. Test cases covering the similar paths of an uncovered path are selected from the database to establish the initial population of GA, and then GA is used to generate a test case covering the uncovered path. Cao et al. (2009) introduced an improved GA with an overlapped path similarity-based multi-path fitness function (OPS_M), which can guide GA to generate test cases for multiple paths in one run by making use of discovered test cases covering some paths. More recently, Gong et al. (2020) proposed an improved multi-population GA with a target path grouping method based on path similarity, in which all target paths are divided into multiple groups and any pair of paths in the same group is highly similar. When test cases covering some paths are generated in a group, these test cases are used by the multi-population GA to quickly generate similar test cases covering other paths in the same group.

As can be seen from the above, the first category of algorithms may incur a certain computational cost to obtain sufficient hidden knowledge of ATCG-PC by data mining methods. In addition, the data mining methods also increase the complexity of these algorithms. Therefore, this paper focuses on developing the second category of algorithms. To the best of our knowledge, DRTC has not yet been utilized to design a local search strategy for improving the search power of search-based algorithms. If a local search strategy that takes full advantage of DRTC is designed to assist a search-based algorithm with strong global search ability, it is likely to yield an effective technique with both strong local and global search abilities. In this paper, DRTC is utilized to design such a local search strategy. This is the first time that DRTC has been applied to design a local search strategy to assist a search-based algorithm.

2.2. Related concepts of ATCG-PC

This subsection introduces four basic concepts of ATCG-PC: control flow graph (CFG), test case, path, and path coverage (Korel, 1990; Saadatjoo and Babamir, 2019).

Definition 1 (CFG). A CFG $G = (N, E, s, e)$, which can be used to visualize the structure of a tested program simply, is a directed graph, where $N$ is a set of nodes with an entry node ($s$) and an exit node ($e$), and $E$ is a set of edges. Each node $n_f \in N$ corresponds to a programming statement in a program, and an edge $(n_f, n_g) \in E$ is a transfer of control from $n_f$ to $n_g$. Each outgoing edge of a conditional or decision node is known as a branch, which can be marked with a predicate that describes the conditions under which the branch will be traversed.

Definition 2 (Test Case). A test case usually denotes a vector in which each dimension is a feasible value in the corresponding input domain of a tested program.


Table 1
The branch functions of branch predicates.

No.   Branch predicate   Branch distance function
1     Boolean            If true then 0 else K
2     a > b              If b − a < 0 then 0 else (b − a) + K
3     a ≥ b              If b − a ≤ 0 then 0 else (b − a) + K
4     a < b              If a − b < 0 then 0 else (a − b) + K
5     a ≤ b              If a − b ≤ 0 then 0 else (a − b) + K
6     a = b              If |a − b| = 0 then 0 else |a − b| + K
7     a ≠ b              If |a − b| ≠ 0 then 0 else K
8     a ∧ b              BD(a) + BD(b)
9     a ∨ b              Min(BD(a), BD(b))
10    ¬a                 Negation is propagated over a

Note: K is a constant greater than 0, and set to 1 in this paper.
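The branch distance functions of Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`bd_eq`, `bd_ge`, and so on) are hypothetical, and negation (row 10) is assumed to be handled by rewriting the predicate beforehand. The last lines reproduce the worked example of Section 2.3, where the input (5, 110) is evaluated against the predicates `x1 == 0` and `x2 >= 10`.

```python
K = 1.0  # constant K > 0 from the note under Table 1 (set to 1 in the paper)


def bd_bool(value):
    """Row 1: Boolean predicate."""
    return 0.0 if value else K


def bd_gt(a, b):
    """Row 2: a > b."""
    return 0.0 if b - a < 0 else (b - a) + K


def bd_ge(a, b):
    """Row 3: a >= b."""
    return 0.0 if b - a <= 0 else (b - a) + K


def bd_lt(a, b):
    """Row 4: a < b."""
    return 0.0 if a - b < 0 else (a - b) + K


def bd_le(a, b):
    """Row 5: a <= b."""
    return 0.0 if a - b <= 0 else (a - b) + K


def bd_eq(a, b):
    """Row 6: a == b."""
    return 0.0 if abs(a - b) == 0 else abs(a - b) + K


def bd_ne(a, b):
    """Row 7: a != b."""
    return 0.0 if abs(a - b) != 0 else K


def bd_and(bd_a, bd_b):
    """Row 8: conjunction, sum of the two branch distances."""
    return bd_a + bd_b


def bd_or(bd_a, bd_b):
    """Row 9: disjunction, minimum of the two branch distances."""
    return min(bd_a, bd_b)


# Worked example from Section 2.3: predicates "x1 == 0" and "x2 >= 10",
# input test case x = (5, 110).
x = (5, 110)
total = bd_eq(x[0], 0) + bd_ge(x[1], 10)
print(total)  # 6.0, i.e. (|5 - 0| + 1) + 0
```

Only the first predicate is violated, so the whole distance comes from row 6 of the table.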


Fig. 1. An example program and its control flow graph.
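As a rough illustration of the CFG, path, and path coverage definitions in this section, the sketch below encodes the CFG of Fig. 1 as an adjacency list and enumerates its entry-to-exit paths. The edge list is an assumption: it is inferred from the four paths that the text lists for Fig. 1, since the figure itself is not reproduced here. The helper names `enumerate_paths` and `path_coverage` are hypothetical.

```python
# Adjacency list of the CFG in Fig. 1. The edges are inferred from the four
# paths listed in the text (an assumption, the figure is not available here).
CFG = {
    "s": ["1"],
    "1": ["2", "3"],   # decision node 1: branches <1, 2> and <1, 3>
    "2": ["3"],
    "3": ["4", "e"],   # decision node 3
    "4": ["e"],
    "e": [],
}


def enumerate_paths(cfg, entry="s", exit_node="e"):
    """Enumerate all entry-to-exit paths of an acyclic CFG by depth-first search."""
    paths = []
    stack = [(entry,)]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == exit_node:
            paths.append(path)
            continue
        for successor in cfg[node]:
            stack.append(path + (successor,))
    return sorted(paths)


def path_coverage(covered_paths, all_paths):
    """Definition 4: proportion of all possible paths that are covered."""
    return len(set(covered_paths) & set(all_paths)) / len(all_paths)


all_paths = enumerate_paths(CFG)
for p in all_paths:
    print(p)
print(path_coverage([("s", "1", "3", "e")], all_paths))  # 0.25
```

The depth-first enumeration only terminates on acyclic graphs; programs with loops would need a bound on path length or loop iterations.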

Definition 3 (Path). In a tested program, a path consists of a sequence of nodes with an entry node ($s$) and an exit node ($e$), such as $\langle s, n_1, \ldots, n_g, e \rangle$, where each edge formed by two adjacent nodes belongs to $E$.

Definition 4 (Path Coverage). As a coverage criterion, path coverage reflects the proportion of all possible paths covered by generated test cases.

To illustrate the above definitions, an example program is given in Fig. 1. In the CFG of Fig. 1, each circle denotes a node, such as ① and ③. ① and ③ are two decision nodes. $\langle 1, 2 \rangle$ and $\langle 1, 3 \rangle$ represent two edges. These two edges are outgoing edges of the decision node ①, so they are also called branches $\langle 1, 2 \rangle$ and $\langle 1, 3 \rangle$, and are associated with the branch predicates “$X = 97$” and “$X \ne 97$”, respectively. $\langle s, 1, 2, 3, 4, e \rangle$, $\langle s, 1, 2, 3, e \rangle$, $\langle s, 1, 3, 4, e \rangle$, and $\langle s, 1, 3, e \rangle$ denote all paths in the CFG.

2.3. Fitness function

A fitness function is an important part of a search-based algorithm, because it can effectively guide the algorithm to find the test cases that cover uncovered paths by evaluating the quality of each generated test case in each iteration (Lv et al., 2018). This paper addresses a single-path coverage problem, in which one path is considered in each iteration. Therefore, a fitness function considering a single path is employed in this paper.

Branch distance (BD) (Mala et al., 2010; Huang et al., 2017; Liu et al., 2019; Tracey et al., 1998) and approach level (AL) (Lin and Yeh, 2001; Sahin and Akay, 2016) are popular heuristics in the design of a fitness function considering a path. BD reflects the degree of deviation between a branch predicate actually traversed and a desired branch predicate, while AL reflects the number of mismatched branch predicates between a target path and the path covered by an input. In this paper, a widely used BD-based fitness function (Liu et al., 2019; Huang et al., 2018) is utilized to evaluate the quality of each generated test case. For an input $\boldsymbol{x}$, the BD-based fitness function can be expressed as follows (Liu et al., 2019).

$$fitness_{BD}(\boldsymbol{x}) = \sum_{n=1}^{\xi} \frac{1}{\varepsilon + BD(\boldsymbol{x}, p^t_n)} \tag{1}$$

where $p^t$ denotes the current target path; $\xi$ is the number of branch predicates of $p^t$; $\varepsilon$ is a small constant to avoid a zero denominator; and $BD(\boldsymbol{x}, p^t_n)$, the BD of $\boldsymbol{x}$ on the $n$th branch predicate of $p^t$, can be calculated by the branch distance functions in Table 1, which were introduced by Tracey et al. (1998). The second column of Table 1 lists typical branch predicates, and the third column gives their corresponding branch distance functions.

A simple example is given to illustrate the calculation process of the fitness function based on Table 1. Let there be two branch predicates in a target path, i.e., “$x_1 == 0$” and “$x_2 >= 10$”, and let an input test case be $\boldsymbol{x} = (5, 110)$. The test case $\boldsymbol{x}$ fails to satisfy only the branch predicate “$x_1 == 0$”. According to the branch distance functions of the sixth and third branch predicates in Table 1, $BD(\boldsymbol{x}, p^t) = (|5 - 0| + 1) + 0 = 6$. Thus, we can obtain the fitness function value of $\boldsymbol{x}$.

3. Grey prediction evolution algorithm with improved scatter search strategy

Inspired by DRTC, an improved scatter search (IS) strategy with strong local search ability is designed to assist the grey prediction evolution algorithm (GPE) with strong global search ability. In this section, the original GPE is first introduced. Then, an IS strategy inspired by DRTC and a grey prediction evolution algorithm with improved scatter search strategy (GPE-IS) are proposed successively.

3.1. Grey prediction evolution algorithm

GPE, proposed by Hu et al. (2020a), is a competitive global optimization algorithm with few parameters and simple coding. Several GPE variants have exhibited excellent performance and strong global search ability in solving problems such as environmental economic dispatch and constrained engineering optimization problems (Xu et al., 2020; Gao et al., 2020; Hu et al., 2020b; Dai et al., 2020; Zhou et al., 2021; Hu et al., 2021), but GPE and its variants have not been employed to solve ATCG-PC. In this paper, GPE is integrated with a local search strategy inspired by DRTC to construct an improved algorithm with both strong local and global search abilities. In this subsection, the evolution process of GPE (Hu et al., 2020a) is introduced, including an initialization operation, a reproduction operation, and a selection operation. Unlike other evolutionary algorithms, GPE uses the reproduction operation to generate the trial population instead of conventional mutation and crossover operations.

3.1.1. Initialization

In GPE, the $i$th individual of the population in the $g$th generation is expressed as $\boldsymbol{x}_i^g = (x_{i,1}^g, x_{i,2}^g, \ldots, x_{i,j}^g, \ldots, x_{i,D}^g)$, $i = 1, 2, \ldots, N$, and $g = 1, 2, \ldots, G$, where $D$ represents the dimension of an individual, $N$ represents the population size, and $G$ is the maximum number of generations. In the initialization process of GPE, it is essential that three initial populations are generated in a feasible region. The $j$th dimension of the $i$th individual in the first to the third generation of GPE can be generated by the following equation.

$$x_{i,j}^{g} = low_j + rand(0, 1) \cdot (up_j - low_j), \quad g = 1, 2, 3 \tag{2}$$

where $rand(0, 1)$ denotes a random number generated via a uniform distribution with a range of 0 to 1. $up_j$ and $low_j$ represent the upper and lower bounds of the $j$th dimension, respectively.
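The initialization in Eq. (2) can be sketched as follows. This is a minimal Python illustration rather than the authors' code; the function name `init_populations` and the use of nested lists are assumptions made for the example, and `rng.random()` samples from the half-open interval [0, 1).

```python
import random


def init_populations(n, low, up, rng=None):
    """Generate the three initial populations of GPE by Eq. (2).

    n   : population size N
    low : list of lower bounds low_j, one per dimension
    up  : list of upper bounds up_j, one per dimension
    rng : optional random.Random instance (hypothetical parameter, for reproducibility)
    """
    rng = rng or random.Random()
    d = len(low)
    populations = []
    for _ in range(3):  # generations g = 1, 2, 3
        population = [
            [low[j] + rng.random() * (up[j] - low[j]) for j in range(d)]
            for _ in range(n)
        ]
        populations.append(population)
    return populations


# Example: N = 4 individuals, D = 3 dimensions, each bounded by [0, 255].
pops = init_populations(4, [0, 0, 0], [255, 255, 255], random.Random(0))
print(len(pops), len(pops[0]), len(pops[0][0]))  # 3 4 3
```

Three populations are needed up front because the reproduction operator treats three consecutive generations as a time series.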


3.1.2. Reproduction operator

In the reproduction operation of GPE, a population series composed of three consecutive generations is regarded as a time series to construct an exponential function for predicting the next generation population. Let three individuals $\boldsymbol{x}_{r1}$, $\boldsymbol{x}_{r2}$, $\boldsymbol{x}_{r3}$ be randomly selected from three consecutive populations $X^{g-2}$, $X^{g-1}$, $X^{g}$ separately, and let $d_{12} = |x_{r1,j} - x_{r2,j}|$, $d_{13} = |x_{r1,j} - x_{r3,j}|$, $d_{23} = |x_{r2,j} - x_{r3,j}|$, $Maxd_r = \max\{d_{12}, d_{23}, d_{13}\}$, and $Mind_r = \min\{d_{12}, d_{23}, d_{13}\}$. Then the $j$th dimension of the $i$th individual $\boldsymbol{u}_i^g$ of a trial population $U^g$ can be generated by the following formulation.

$$u_{i,j}^{g} = \begin{cases} (1 - e^{a})\left(x_{r1,j} - \dfrac{b}{a}\right)e^{-3a}, & \text{if } Mind_r \ge \gamma \\ x_{r3,j} + w \cdot Maxd_r, & \text{elseif } Maxd_r < \gamma \\ \dfrac{4x_{r3,j} + x_{r2,j} - 2x_{r1,j}}{3}, & \text{otherwise} \end{cases} \tag{3}$$

$$\begin{cases} a = \dfrac{2(x_{r2,j} - x_{r3,j})}{x_{r2,j} + x_{r3,j}} \\ b = \dfrac{2\left((x_{r2,j})^2 + x_{r1,j} \cdot x_{r2,j} - x_{r1,j} \cdot x_{r3,j}\right)}{x_{r2,j} + x_{r3,j}} \\ w = rand(-1, 1) \cdot \gamma \end{cases} \tag{4}$$

where $\gamma$ is a threshold to control the forecast; $w$, $a$, and $b$ represent a disturbance coefficient, a grey developmental coefficient, and a grey control parameter, respectively. $rand(-1, 1)$ represents a random number sampled from a uniform distribution with a range of $[-1, 1]$.

3.1.3. Selection operator

Like most evolutionary algorithms, GPE applies a greedy selection mechanism in its selection process. The most promising individuals are retained in the next generation by comparing the fitness function values of the target individual $\boldsymbol{x}_i^g$ and the trial individual $\boldsymbol{u}_i^g$. For a minimization problem, if $\boldsymbol{x}_i^g$ obtains a better fitness function value than $\boldsymbol{u}_i^g$, $\boldsymbol{x}_i^g$ will be retained for the next generation. Otherwise, $\boldsymbol{u}_i^g$ will be retained for the next generation. The selection operator can be expressed by the following equation.

$$\boldsymbol{x}_i^{g+1} = \begin{cases} \boldsymbol{u}_i^g, & \text{if } f(\boldsymbol{u}_i^g) < f(\boldsymbol{x}_i^g) \\ \boldsymbol{x}_i^g, & \text{otherwise} \end{cases} \tag{5}$$

3.2. Improved scatter search strategy

Inspired by DRTC, an IS strategy that focuses on each dimension of an individual is designed. The IS strategy can generate feasible similar individuals by exploiting each dimension of individual $\boldsymbol{u}_i^g$. The calculation process of the IS strategy can be divided into the following two steps.

Step 1: Generating new candidate individuals

For the exploitation of the $j$th dimension of individual $\boldsymbol{u}_i^g$, two candidate individuals $\boldsymbol{v}_1$ and $\boldsymbol{v}_2$ are generated in each iteration according to an adaptive search step $s_j$, which iteratively decreases to a threshold based on the idea of binary search. $\boldsymbol{v}_1$, $\boldsymbol{v}_2$ and $\boldsymbol{u}_i^g$ differ in only one dimension. Suppose there are $m$ iterations in the exploitation of the $j$th dimension. That is, $s_j$ is equal to the preset threshold after $m$ iterations. In the $t$th iteration of the $j$th dimension of individual $\boldsymbol{u}_i^g$, two candidate individuals are generated by the following equations.

$$s_j = (up_j - low_j) \cdot 0.5^{t}, \quad t = 1, 2, \ldots, m \tag{6}$$

$$\boldsymbol{v}_k = \boldsymbol{u}_i^g \tag{7}$$

$$v_{k,j} = \begin{cases} u_{i,j}^{g} + s_j \cdot (2 \cdot k - 3), & \text{if } u_{i,j}^{g} \ne low_j \text{ and } u_{i,j}^{g} \ne up_j \\ u_{i,j}^{g} - k \cdot s_j, & \text{if } u_{i,j}^{g} = up_j \\ u_{i,j}^{g} + k \cdot s_j, & \text{if } u_{i,j}^{g} = low_j \end{cases} \tag{8}$$

where $k = 1, 2$.

For a given problem, $v_{k,j}$ may jump out of the constraint $[low_j, up_j]$. When this happens, $v_{k,j}$ will be re-assigned via the following equation.

$$v_{k,j} = \begin{cases} rand(low_j, u_{i,j}^{g}), & \text{if } v_{k,j} < low_j \\ rand(u_{i,j}^{g}, up_j), & \text{if } v_{k,j} > up_j \end{cases} \tag{9}$$

Step 2: Updating the individual $\boldsymbol{u}_i^g$

In each iteration of the IS strategy, the update of the individual $\boldsymbol{u}_i^g$ is based on a greedy selection mechanism. For the fitness function $f(\boldsymbol{x})$ in Eq. (1), the individual with the maximum fitness function value among $\boldsymbol{v}_1$, $\boldsymbol{v}_2$ and $\boldsymbol{u}_i^g$ is used to update $\boldsymbol{u}_i^g$. Since the three individuals differ in only the $j$th dimension, this process can be expressed as the following equation.

$$u_{i,j}^{g} = \begin{cases} v_{1,j}, & \text{if } f(\boldsymbol{v}_1) = \text{Max}\{f(\boldsymbol{v}_1), f(\boldsymbol{v}_2), f(\boldsymbol{u}_i^g)\} \\ v_{2,j}, & \text{if } f(\boldsymbol{v}_2) = \text{Max}\{f(\boldsymbol{v}_1), f(\boldsymbol{v}_2), f(\boldsymbol{u}_i^g)\} \\ u_{i,j}^{g}, & \text{otherwise} \end{cases} \tag{10}$$

The pseudocode of the IS strategy is shown in Algorithm 1. In lines 1–10, the IS strategy exploits each dimension of $\boldsymbol{u}_i^g$ in turn. To be specific, line 2 initializes the adaptive search step with $s_j = (up_j - low_j) \cdot 0.5$. Line 3 is the loop that runs while $s_j$ is greater than or equal to 1. In lines 4–6, $\boldsymbol{v}_1$ and $\boldsymbol{v}_2$ are generated based on Eqs. (6)–(9). Line 7 updates the individual $\boldsymbol{u}_i^g$ by Eq. (10). In line 8, the value of $s_j$ is halved.

Algorithm 1: Improved scatter search strategy
Input: $\boldsymbol{u}_i^g$;
Output: A better $\boldsymbol{u}_i^g$ than the previous one;
1  for $j = 1 : D$ do
2      Initialize adaptive search step: $s_j = (up_j - low_j) \cdot 0.5$;
3      while $s_j \ge 1$ do
4          for $k = 1 : 2$ do
5              Generate $\boldsymbol{v}_k$ by Eqs. (6)–(9);
6          end
7          Update the individual $\boldsymbol{u}_i^g$ by Eq. (10);
8          $s_j = s_j \cdot 0.5$;
9      end
10 end

The IS strategy, the SS strategy, and AVM all focus on each dimension of an individual and can enhance the local search ability of search-based algorithms, but the IS strategy has the following advantages. Specifically, the IS strategy exploits both the left and right regions of each dimension of an individual, while AVM always exploits the direction of fitness improvement. Thus, the IS strategy can reduce the probability of falling into a local optimum. Unlike in the SS strategy, values which violate constraints are reassigned in the IS strategy. In addition, the IS strategy takes into account the case in which the input value of the current dimension of an individual is a boundary value. When the input value is the lower bound and the optimal value is the upper bound, the IS strategy can consume fewer resources to jump to the upper bound.

3.3. Proposed algorithm: grey prediction evolution algorithm with improved scatter search strategy

Based on the above GPE with strong exploration capability and the IS strategy with strong exploitation capability, a grey prediction evolution algorithm with improved scatter search strategy (GPE-IS) is proposed in this paper to drive the cooperation of exploration and exploitation.

Like the original GPE, GPE-IS includes an initialization operation, a reproduction operation, and a selection operation. Different from the original GPE, GPE-IS imports the IS strategy at the end of the reproduction operation of GPE. This strategy further exploits each trial individual generated by the reproduction operation.

The pseudocode of GPE-IS is shown in Algorithm 2 (Java codes can be found at https://github.com/Zhongbo-Hu/Prediction-Evolutionary-Algorithm-HOMEPAGE). Line 1 initializes three populations $X^3$, $X^2$, $X^1$ by Eq. (2). Lines 3–11 complete one iteration of GPE-IS. More specifically, lines 4–8 perform the reproduction operator of GPE. Line 9 performs the IS strategy. In line 10, the selection operator of GPE is performed by Eq. (5).

Algorithm 2: The procedure of GPE-IS
Input: $N$, $D$, $low$, $up$.
Output: Test cases.
1  Three populations $X^3$, $X^2$, $X^1$ are initialized by Eq. (2);
2  while termination conditions are not satisfied do
3      for $i = 1 : N$ do
4          Three individuals $\boldsymbol{x}_{r1}$, $\boldsymbol{x}_{r2}$, $\boldsymbol{x}_{r3}$ are randomly selected from $X^{g-2}$, $X^{g-1}$, $X^{g}$ ($g \ge 3$), separately;
5          for $j = 1 : D$ do
6              $d_{12} = |x_{r1,j} - x_{r2,j}|$, $d_{13} = |x_{r1,j} - x_{r3,j}|$, $d_{23} = |x_{r2,j} - x_{r3,j}|$, $Maxd_r = \max\{d_{12}, d_{23}, d_{13}\}$, and $Mind_r = \min\{d_{12}, d_{23}, d_{13}\}$;
7              Reproduction operator is performed by Eqs. (3) and (4);
8          end
9          Algorithm 1 is performed;
10         Selection operator is performed by Eq. (5);
11     end
12 end

4. Application of GPE-IS in ATCG-PC

This paper mainly studies the application of GPE-IS to the single-path coverage problem in ATCG-PC. In this problem, uncovered paths are covered one by one via multiple iterations. Thus, the ordering of target paths generally affects the performance of GPE-IS. In this section, a dynamic target path sorting method based on approach level (AL) is first presented to help GPE-IS make full use of DRTC. Then, the overall process and an example of solving ATCG-PC based on GPE-IS, and a discussion on the effectiveness of GPE-IS for ATCG-PC, are described.

4.1. Dynamic target path sorting method based on approach level

Let the uncovered path set be $P^u = \{p^u_1, p^u_2, \ldots, p^u_\upsilon\}$, and let $p^t$ and $p^{t'}$ respectively denote the current target path and the next target path. According to DRTC, when $p^t$ is covered, an uncovered path that is more similar to $p^t$ is preferred as $p^{t'}$. AL, which reflects the number of mismatched branch predicates between two paths, is employed to measure the similarity of two paths. The smaller the AL value is, the more similar the two paths are. Then the next target path is determined by the following equation.

$$p^{t'} = \arg\min_{p^u_\kappa} \{AL(p^u_\kappa, p^t) \mid \kappa = 1, 2, \ldots, \upsilon\} \tag{11}$$

Note: If the current target path is not covered after all dimensions of $\boldsymbol{u}_i^g$ have been exploited by the IS strategy, the current target path is likely to be infeasible or GPE-IS may be trapped in a local optimum. When this occurs, a path is randomly selected from the set of remaining uncovered paths to replace the current target path.

4.2. The overall process of solving ATCG-PC based on GPE-IS

Fig. 2 shows the overall view of GPE-IS being applied to ATCG-PC, which can be described in the following six steps.

Step 1 In the “Static Analysis” part, the target path set, the input variables, and their corresponding input types and domains are extracted from a tested program. Meanwhile, the input variables and their corresponding input types and domains are input to GPE-IS. The search space of GPE-IS is the input domain of the tested program, and each individual generated by GPE-IS represents a test case. Let the tested program have $d$ input variables, and let $D_j$ represent the domain of the $j$th input variable; then the search space of GPE-IS is $D_1 \times D_2 \times \cdots \times D_d$.

Step 2 GPE-IS is executed to generate a test case set. Then this test case set is input into the tested program.

Step 3 The tested program is run with these test cases.

Step 4 The running result is recorded, and the path coverage information is updated. If the current target path is covered, then the next target path is selected according to Eq. (11).

Step 5 The most promising test cases are selected for the next generation by comparing the fitness function values of generated test cases.

Step 6 A termination condition is checked. If 100% path coverage or a maximum acceptable number of test cases is reached, the operation to solve ATCG-PC is terminated. Otherwise, return to Step 2.

4.3. An example of employing GPE-IS to solve ATCG-PC

This subsection gives an example to illustrate the process of using the proposed algorithm to generate test cases covering target paths. The instance shown in Fig. 1 serves as an illustration program. In Fig. 1, $\langle s, 1, 3, e \rangle$, $\langle s, 1, 2, 3, e \rangle$, $\langle s, 1, 3, 4, e \rangle$, and $\langle s, 1, 2, 3, 4, e \rangle$ are the paths of the program, and $X$ and $Y$ are the inputs: an integer and a two-character string, respectively. Each character corresponds to an integer ASCII code in the range [0, 255]. For example, the ASCII values 97, 98 and 99 represent the characters a, b and c, respectively. To calculate the fitness value of each input according to Table 1, the strings are encoded as real numbers based on the ASCII code. So the input range of each character variable is set to [0, 255]. For simplicity, the input range of $X$ is also set to [0, 255]. The test cases of the program can be expressed as $(X, Y_1, Y_2)$, where $Y_1$ and $Y_2$ represent the first and second characters of $Y$, respectively. Suppose that the test cases generated by the initialization and reproduction operations of GPE-IS in the first iteration can only cover the path $\langle s, 1, 3, e \rangle$. The IS strategy then exploits each dimension of each test case generated by the reproduction operation.

The IS strategy first exploits the first test case generated by the reproduction operation. Let the first test case be (40, 65, 71), which covers the path $\langle s, 1, 3, e \rangle$. According to Eq. (11), the path $\langle s, 1, 2, 3, e \rangle$ is first selected as the current target path. When the path $\langle s, 1, 2, 3, e \rangle$ is covered, the path $\langle s, 1, 2, 3, 4, e \rangle$ will be selected as the next target path. For simplicity and without loss of generality, this subsection only shows the process of generating test cases for these two paths. The process is shown in Fig. 3, in which b represents a random number generated by Eq. (9). The light red, light green, and light blue boxes in Fig. 3 represent the exploitation of the IS strategy in the first, second, and third dimensions, respectively. In each box, the value on the top right represents the $s_j$ value in this iteration, the test cases from left to right are candidate test case $\boldsymbol{v}_1$, test case $\boldsymbol{u}_i$, and candidate test case $\boldsymbol{v}_2$, and the test case marked in red is the best one of the three test cases. For example, in the first box in the light red column, the test cases (b, 65, 71) and (167, 65, 71) are the candidate test cases $\boldsymbol{v}_1$ and $\boldsymbol{v}_2$ generated by the IS strategy in the first iteration of the first dimension according to the test case $\boldsymbol{u}_i$ = (40, 65, 71) and $s_1 = 127$, and the test case (40, 65, 71) is the best one of the three test cases. In the exploitation of the 1st dimension, the adaptive search step $s_1$ is initialized to 127 (line 2 of Algorithm 1). Then, two candidate test cases, $\boldsymbol{v}_1$ = (b, 65, 71) and $\boldsymbol{v}_2$ = (167, 65, 71), are generated (lines 4–6 of Algorithm 1). $\boldsymbol{u}_i$ is updated to (40, 65, 71) by comparing the fitness values of $\boldsymbol{u}_i$, $\boldsymbol{v}_1$, and $\boldsymbol{v}_2$ (line 7 of Algorithm 1). The adaptive search step $s_1$ is updated to 63 (line 8 of Algorithm 1). In the following iterations, $s_1$ is updated to 31, 15, 7, 3, and 1 successively. The best test case generated in the exploitation of the 1st dimension is (97, 65, 71).

Subsequently, the 2nd dimension is exploited (lines 2–9 of Algorithm 1). In the exploitation of the 2nd dimension, $s_2$ is updated to 63, 31, 15, 7, 3, and 1 successively (line 8 of Algorithm 1). The test case

5
G. Cai, Q. Su and Z. Hu Engineering Applications of Artificial Intelligence 106 (2021) 104454

Fig. 2. The overall process of solving ATCG-PC based on GPE-IS.

(97, 98, 71) covering the target path <s, 1, 2, 3, e> is generated in the 7th iteration of the 2nd dimension. The current target path is replaced with the path <s, 1, 2, 3, 4, e>, and the 3rd dimension is exploited (lines 2–9 of Algorithm 1) to find the test case covering this target path. In the 6th iteration of the 3rd dimension, the test case (97, 98, 99) covering the target path <s, 1, 2, 3, 4, e> is generated.
The target paths <s, 1, 2, 3, e> and <s, 1, 2, 3, 4, e> are covered by the 28th and 39th test cases, respectively. This result indicates that the contribution of DRTC is considerable.

5. Simulation experiments

In this section, a series of experiments is conducted on six fog computing benchmark programs and six NLP benchmark programs to verify the performance of GPE-IS. First, the parameter sensitivity analysis of GPE-IS is presented to determine the value of the parameter 𝛾 of GPE-IS, and the parameter settings of the other compared algorithms are described in the same subsection. Secondly, the effectiveness of the IS strategy is verified by comparing the performance of GPE-IS, GPE and GPE-SS. Finally, to verify the superiority of GPE-IS, GPE-IS is compared with some basic algorithms (DE, PSO, ABC, and the crow search algorithm (CSA) (Jatana and Suri, 2020)), some state-of-the-art algorithms tested on the fog computing benchmark programs (RP-DE, RP-IGA, RP-ABC and RP-PSO) (Huang et al., 2018), and some state-of-the-art algorithms tested on the NLP benchmark programs (DE-SS, PSO-SS, ABC-SS, IGA-SS and CSO-SS) (Liu et al., 2019). After these three sets of experimental results are analyzed, a comprehensive analysis of the experimental results is presented. All experiments are executed on a personal computer with an Intel(R) Core(TM) i5-4210M CPU @ 2.60 GHz and 8 GB RAM. The algorithms are implemented in Java 8. The benchmark programs and the evaluation indicators used to assess each algorithm are introduced at the beginning of this section.

5.1. Benchmark programs and evaluation indicators

In all experiments, the following twelve benchmark programs and three well-known evaluation indicators are employed.

5.1.1. Benchmark programs

iFogSim (Gupta et al., 2017) and the Stanford CoreNLP toolkit (Manning et al., 2014) are widely used applications. iFogSim is a modeling and simulation toolkit for resource management technologies in edge computing, internet of things and fog computing environments. The Stanford CoreNLP toolkit is an extensible pipeline that provides core natural language analysis. In this paper, six fog computing benchmark programs from iFogSim (‘transmit’, ‘send’, ‘processEvent’, ‘executeTuple’, ‘checkCloudletCompletion’, and ‘getResultantTuple’) (Gupta et al., 2017) and six NLP benchmark programs from the Stanford CoreNLP toolkit (‘initFactory’, ‘cleanXmlAnnotator’, ‘wordsToSentenceAnnotator’, ‘annotate’, ‘nerClassifierCombiner’, and ‘setTrueCaseText’) (Manning et al., 2014), which have been used by Huang et al. (2018), Liu et al. (2019) and Dai et al. (2021), are employed for the experimental studies. These programs contain nested structures (such as nested IF statements), relational operators (=, !=, <, >, ≤, ≥) and logical operators (AND, OR). They also contain data types such as strings and integers. Tables 2 and 3 exhibit the basic information and the complexity of each benchmark program, respectively. Programs No.1 to No.6 are fog computing programs and programs No.7 to No.12 are NLP programs. In Table 2, the columns ‘Program’, ‘Loc’, ‘Dim’, ‘Path’ and ‘Description’ give the name, number of lines, number of dimensions, number of paths and brief description of each program, respectively. In Table 3, the column ‘Probability of covering the most difficult path’ gives the probability that the path with the fewest possible solutions is covered by a randomly generated test case, and the column ‘Search space size’ gives the number of possible solutions. From the column ‘Probability of covering the most difficult path’ of Table 3, it can be seen that the probability for program No.2 is 0, and that program No.1 and programs No.4 to No.12 have lower probabilities than program No.3. The reason is that program No.2 has some impossible paths, while program No.1 and programs No.4 to No.12 contain some paths that can only be covered by specific strings. As can be seen from the column ‘Search space size’ in Table 3, the search space of program No.1 is the smallest (1.68E+07), while that of program No.7 is the largest (6.04E+23).

5.1.2. Evaluation indicators

Three evaluation indicators, namely the average number of test cases (Ave.m), the average path coverage (Ave.c), and the average running time (Ave.t), have been widely employed by other scholars to evaluate the performance of algorithms in their research (Mohi-Aldeen et al., 2016; Liu et al., 2019; Huang et al., 2018; Dai et al., 2021). In this paper, these evaluation indicators are adopted to measure the performance of all the algorithms.

1. Ave.m, i.e. the average number of test cases generated by an algorithm over 30 independent runs. A t-test with a significance level of 0.05 is adopted to determine whether the Ave.m values of two algorithms differ significantly.
2. Ave.c, i.e. the average path coverage of an algorithm over 30 independent runs.
3. Ave.t, i.e. the average running time (ms) of an algorithm over 30 independent runs.

For the above evaluation indicators, smaller Ave.m, higher Ave.c, and shorter Ave.t values are desirable.
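The three indicators of Section 5.1.2 can be illustrated with a short sketch. It is not the authors' Java implementation: the record layout (`test_cases`, `covered_paths`, `time_ms`) and the function names are hypothetical, and the Welch form of the t statistic is an assumption, since the text only states that a t-test at the 0.05 level is used. Only the statistic is computed here; turning it into a significance decision would additionally need a critical value or p-value.

```python
from math import sqrt
from statistics import mean, variance


def summarize_runs(runs, total_paths):
    """Return (Ave.m, Ave.c, Ave.t) over a list of independent runs.

    Each run is a dict with the hypothetical keys 'test_cases',
    'covered_paths' and 'time_ms'.
    """
    ave_m = mean(r["test_cases"] for r in runs)
    ave_c = mean(r["covered_paths"] / total_paths for r in runs)
    ave_t = mean(r["time_ms"] for r in runs)
    return ave_m, ave_c, ave_t


def welch_t(sample_a, sample_b):
    """Welch t statistic for two samples of per-run test-case counts (assumed variant)."""
    spread = variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    if spread == 0:
        # Both samples are constant, so the statistic is undefined; report 0.0.
        return 0.0
    return (mean(sample_a) - mean(sample_b)) / sqrt(spread)


# Two made-up runs on a program with two target paths.
demo_runs = [
    {"test_cases": 180, "covered_paths": 2, "time_ms": 1.0},
    {"test_cases": 196, "covered_paths": 2, "time_ms": 1.4},
]
demo_summary = summarize_runs(demo_runs, total_paths=2)
```

In the paper's setting, `runs` would hold 30 records per algorithm and program, and `welch_t` would be applied to the per-run test-case counts of GPE-IS and one compared algorithm.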

Table 2
The basic information of benchmark programs.
Program type            No.  Program                   Loc  Dim  Path  Description
Fog computing programs  1    transmit                  30   3    2     transfer data from one sensor to another
                        2    send                      47   2    9     send packaged tuple from one fog-device or sensor to another fog device
                        3    processEvent              67   7    9     process events of fog-device
                        4    executeTuple              41   7    5     update device power consumption by using tuple processing logic
                        5    checkCloudletCompletion   43   5    6     be called on fog-device when tuple execution is complete
                        6    getResultantTuple         73   8    7     processed tuple is returned to application
NLP programs            7    initFactory               86   7    48    return the correct type of token according to options for properties file and typeclass
                        8    cleanXmlAnnotator         21   6    3     a new object of cleanXmlAnnotator is created
                        9    wordsToSentenceAnnotator  108  11   12    a text of natural language is converted to an annotator type
                        10   annotate                  30   4    3     convert a comment into a sentence
                        11   nerClassifierCombiner     30   11   4     create a new object of nerClassifierCombiner class
                        12   setTrueCaseText           37   6    10    set the attribute of class trueCaseText

Table 3
The complexity of the benchmark programs.
Program type                          No.  Probability of covering the most difficult path  Search space size
Fog computing programs                1    5.96E−08                                         1.68E+07
                                      2    0                                                4.00E+12
                                      3    1.25E−06                                         6.00E+19
                                      4    5.33E−15                                         8.44E+14
                                      5    4.47E−08                                         6.71E+07
                                      6    1.78E−15                                         1.69E+15
Natural language processing programs  7    1.65E−24                                         6.04E+23
                                      8    3.55E−15                                         1.84E+19
                                      9    2.33E−10                                         1.89E+22
                                      10   8.33E−17                                         3.60E+16
                                      11   3.55E−15                                         1.44E+17
                                      12   9.07E−13                                         2.20E+12
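The two columns of Table 3 can be related by a small calculation. Assuming test cases are drawn uniformly from the search space (an assumption, not stated in the table), the probability times the search space size estimates how many inputs cover the most difficult path, and the reciprocal of the probability is the expected number of purely random test cases needed to hit it. The function names below are illustrative, and the results are approximate because the table values are rounded.

```python
def implied_solutions(probability, search_space_size):
    """Estimated number of inputs that cover the hardest path (uniform sampling assumed)."""
    return probability * search_space_size


def expected_random_tests(probability):
    """Mean number of independent random test cases until the path is first covered."""
    if probability == 0:
        return float("inf")  # an infeasible path is never covered
    return 1.0 / probability


# Rounded values copied from Table 3: (probability, search space size).
table3 = {
    "No.1": (5.96e-08, 1.68e07),
    "No.2": (0.0, 4.00e12),
    "No.5": (4.47e-08, 6.71e07),
}

solutions = {name: implied_solutions(p, size) for name, (p, size) in table3.items()}
random_cost = {name: expected_random_tests(p) for name, (p, _) in table3.items()}
```

For program No.1 the estimate is close to a single covering input, which is consistent with the remark that some paths can only be covered by specific strings.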

Fig. 3. An example of employing GPE-IS to generate test cases for path coverage. (For interpretation of the references to color in this figure legend, the reader is referred to the
web version of this article.)
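The per-dimension exploitation that Fig. 3 depicts can be approximated by the sketch below. It is reconstructed from the walk-through only, not from Algorithm 1: the candidate construction (current value minus and plus the step), the replacement of an out-of-domain value by a random in-domain value, the integer halving of the step, and the convention that smaller fitness is better are all assumptions, and `exploit_dimension` is a hypothetical name.

```python
import random


def exploit_dimension(u, j, fitness, low, up, step, rng=random):
    """Refine dimension j of test case u with a shrinking search step.

    Hypothetical helper: the name and signature are illustrative only.
    """
    best = list(u)
    while step >= 1:
        candidates = [best]
        for delta in (-step, step):
            value = best[j] + delta
            if not low <= value <= up:
                # Assumption: an out-of-domain value is replaced by a random
                # in-domain value (the role played by b in the walk-through).
                value = rng.randint(low, up)
            candidate = list(best)
            candidate[j] = value
            candidates.append(candidate)
        # Assumption: smaller fitness is better; ties keep the current test case.
        best = min(candidates, key=fitness)
        step //= 2  # integer halving: 127 -> 63 -> 31 -> 15 -> 7 -> 3 -> 1 -> 0
    return tuple(best)


# Toy fitness (an assumption): distance of the first input from the value 97.
result = exploit_dimension((40, 65, 71), 0, lambda t: abs(t[0] - 97), 0, 255, 127)
```

Because the current test case always stays among the candidates, its fitness never gets worse during the loop, whatever random values are drawn.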

5.2. Parameters of the algorithms

Generally speaking, the parameter values of an algorithm affect its optimization performance. GPE has one vital parameter, 𝛾, which controls the probability with which each prediction formula in Eq. (3) is used. A smaller 𝛾 value leads to a greater diversity of the test cases generated by GPE-IS. The parameter 𝛾 is set in a problem-oriented way in this paper, because the domain intervals differ between problems: the 𝛾 value depends on the length of the domain interval (𝑢𝑝_𝑗 − 𝑙𝑜𝑤_𝑗). To analyze the influence of the parameter 𝛾 on GPE-IS, a sensitivity analysis of 𝛾 is performed. The effect of the 𝛾 values 0.005𝐿, 0.01𝐿, 0.02𝐿, 0.04𝐿 and 0.08𝐿, where 𝐿 = 𝑢𝑝_𝑗 − 𝑙𝑜𝑤_𝑗, on GPE-IS is analyzed on the twelve benchmark programs. Table 4 lists the Ave.c and Ave.m values obtained by the five GPE-IS variants, in which GPE-IS-0.005𝐿, GPE-IS-0.01𝐿, GPE-IS-0.02𝐿, GPE-IS-0.04𝐿 and GPE-IS-0.08𝐿 represent GPE-IS with 𝛾 = 0.005𝐿, 0.01𝐿, 0.02𝐿, 0.04𝐿 and 0.08𝐿, respectively. The values obtained by the five variants on the different benchmark programs are listed in the columns ‘Ave.c’ and ‘Ave.m’ of Table 4. The best results on each benchmark program are highlighted in bold. As can be seen from Table 4, the Ave.c values obtained by the five variants are all 66.7% for program No.2 and 100% for the remaining eleven programs, while GPE-IS-0.005𝐿 obtains the smallest Ave.m values for seven benchmark programs, i.e., programs No.1, No.2, No.5, No.6, No.8, No.11, and No.12. In summary, GPE-IS-0.005𝐿 is better than GPE-IS-0.01𝐿, GPE-IS-0.02𝐿, GPE-IS-0.04𝐿 and GPE-IS-0.08𝐿. Therefore, in the final two comparison experiments, the 𝛾 value of GPE-IS is set to 0.005 ⋅ (𝑢𝑝_𝑗 − 𝑙𝑜𝑤_𝑗).

The parameter settings of the other compared algorithms are shown in Table 5. These parameters are the same as those used by Liu et al. (2019), Huang et al. (2018), Jatana and Suri (2020) and Huang et al. (2017).

Fig. 4. The number of paths covered by generated test cases on different number of the generated test cases.

5.3. Verification of the proposed IS strategy

To verify the effectiveness of the IS strategy proposed in this paper, the performance of GPE-IS, GPE and GPE-SS is tested with the above-mentioned benchmark programs, parameter settings and evaluation indicators. Here, GPE-SS and GPE-IS combine GPE with the SS strategy and with the IS strategy inspired by DRTC, respectively. If GPE-IS outperforms GPE and GPE-SS, the IS strategy is an effective technique that takes full advantage of DRTC. The experimental results obtained by these algorithms in terms of Ave.c, Ave.m and Ave.t are tabulated in Table 6. The best results on each benchmark program are highlighted in bold. Columns ‘Ave.c’, ‘Ave.m’ and ‘Ave.t’ show the Ave.c, Ave.m and Ave.t values obtained by each algorithm on the different benchmark programs, respectively. Additionally, the results of the hypothesis test

Table 4
Sensitivity analysis for parameter 𝛾 of GPE-IS.
Program GPE-IS-0.005𝐿 GPE-IS-0.01𝐿 GPE-IS-0.02𝐿 GPE-IS-0.04𝐿 GPE-IS-0.08𝐿
Ave.c Ave.m Ave.c Ave.m Ave.c Ave.m Ave.c Ave.m Ave.c Ave.m
No.1 100% 1.88E+02 100% 1.88E+02 100% 1.89E+02 100% 1.89E+02 100% 1.89E+02
No.2 66.7% 3.00E+05 66.7% 3.00E+05 66.7% 3.00E+05 66.7% 3.00E+05 66.7% 3.00E+05
No.3 100% 3.91E+02 100% 3.63E+02 100% 3.48E+02 100% 4.77E+02 100% 4.70E+02
No.4 100% 3.57E+02 100% 3.53E+02 100% 3.54E+02 100% 3.45E+02 100% 3.62E+02
No.5 100% 1.95E+02 100% 1.95E+02 100% 1.95E+02 100% 1.95E+02 100% 1.95E+02
No.6 100% 3.34E+02 100% 3.43E+02 100% 3.47E+02 100% 3.63E+02 100% 3.50E+02
No.7 100% 1.05E+04 100% 1.01E+04 100% 1.13E+04 100% 1.15E+04 100% 9.31E+03
No.8 100% 2.59E+02 100% 2.84E+02 100% 2.67E+02 100% 2.78E+02 100% 2.60E+02
No.9 100% 9.01E+02 100% 8.89E+02 100% 8.72E+02 100% 8.57E+02 100% 8.65E+02
No.10 100% 2.74E+02 100% 2.90E+02 100% 2.61E+02 100% 2.72E+02 100% 3.02E+02
No.11 100% 2.49E+02 100% 2.49E+02 100% 2.49E+02 100% 2.49E+02 100% 2.49E+02
No.12 100% 5.06E+02 100% 5.12E+02 100% 5.09E+02 100% 5.14E+02 100% 5.24E+02
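The reading of Table 4 in Section 5.2 can be checked mechanically. The sketch below copies the Ave.m columns of Table 4 and lists the programs on which the first variant (𝛾 = 0.005𝐿) attains the smallest value, counting ties as wins, which is the convention the text appears to use. The dictionary layout and function name are illustrative.

```python
# Ave.m values from Table 4, ordered as gamma = 0.005L, 0.01L, 0.02L, 0.04L, 0.08L.
AVE_M = {
    "No.1": [1.88e2, 1.88e2, 1.89e2, 1.89e2, 1.89e2],
    "No.2": [3.00e5, 3.00e5, 3.00e5, 3.00e5, 3.00e5],
    "No.3": [3.91e2, 3.63e2, 3.48e2, 4.77e2, 4.70e2],
    "No.4": [3.57e2, 3.53e2, 3.54e2, 3.45e2, 3.62e2],
    "No.5": [1.95e2, 1.95e2, 1.95e2, 1.95e2, 1.95e2],
    "No.6": [3.34e2, 3.43e2, 3.47e2, 3.63e2, 3.50e2],
    "No.7": [1.05e4, 1.01e4, 1.13e4, 1.15e4, 9.31e3],
    "No.8": [2.59e2, 2.84e2, 2.67e2, 2.78e2, 2.60e2],
    "No.9": [9.01e2, 8.89e2, 8.72e2, 8.57e2, 8.65e2],
    "No.10": [2.74e2, 2.90e2, 2.61e2, 2.72e2, 3.02e2],
    "No.11": [2.49e2, 2.49e2, 2.49e2, 2.49e2, 2.49e2],
    "No.12": [5.06e2, 5.12e2, 5.09e2, 5.14e2, 5.24e2],
}


def programs_won_by(table, column):
    """Programs on which the given column attains the row minimum (ties count)."""
    return [name for name, row in table.items() if row[column] == min(row)]


wins_0005 = programs_won_by(AVE_M, 0)
```

With the values above, `wins_0005` contains the seven programs named in the text: No.1, No.2, No.5, No.6, No.8, No.11 and No.12.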

Table 5
Experimental parameter setting.
Algorithm            Parameter                                  Value
All algorithms       Maximum acceptable number of test cases 𝑀  3.00E+05
                     Run times for each algorithm               30
                     Population size                            50
GPE, GPE-SS, GPE-IS  𝛾                                          0.005 ⋅ (𝑢𝑝_𝑗 − 𝑙𝑜𝑤_𝑗)
PSO                  𝑐1, 𝑐2                                     1.6, 1.7
                     Weight value 𝑤                             0.8
ABC                  Limit                                      10 ⋅ 𝐷
                     Bee number                                 50
CSA                  𝐴𝑃                                         0.05
                     𝐹𝑙                                         2
DE, RP-DE, DE-SS     Factor parameter 𝐹                         0.5
                     Crossover probability 𝑃𝑐                   0.2
RP-PSO, PSO-SS       𝑐1, 𝑐2                                     1.5, 2
                     Weight value 𝑤                             0.4
RP-ABC, ABC-SS       Limit                                      2
                     Bee number                                 50
RP-IGA, IGA-SS       Crossover probability                      0.8
                     Mutation probability                       0.1
CSO-SS               𝑃 ℎ𝑖                                       0.8

for GPE-IS and each compared algorithm are presented in brackets, where ‘+’ (‘−’) indicates that GPE-IS performs significantly better (worse) than the other algorithm in Ave.m, and ‘=’ indicates that there is no significant difference between the Ave.m values of the two algorithms. The last row of the table shows the numbers of ‘+’, ‘=’ and ‘−’. Taking the sixth column of Table 6 as an example, ‘1.09E+04 (+)’ indicates that the Ave.m obtained by GPE is 1.09E+04 and that the Ave.m obtained by GPE-IS is significantly better than that obtained by GPE for program No.1. ‘11/1/0’ indicates that the Ave.m values obtained by GPE-IS are significantly better than those obtained by GPE for eleven benchmark programs, and that there is no significant difference between the Ave.m values of GPE-IS and GPE for one benchmark program.
In terms of Ave.c, as can be seen from Table 6, GPE-IS and GPE-SS obtain the highest Ave.c values for all benchmark programs (the highest Ave.c value for program No.2 is 66.7% and those for the remaining eleven programs are 100%), while GPE obtains the highest Ave.c values for only six of the twelve benchmark programs, i.e., programs No.1, No.3, No.5, No.6, No.8, and No.11. This shows that the IS strategy helps GPE obtain a higher Ave.c.
In terms of Ave.m, the results of the hypothesis test (+/=/−) show that GPE-IS significantly outperforms GPE and GPE-SS for eleven and ten of the twelve benchmark programs, respectively, and performs worse than GPE-SS only for benchmark program No.7. For program No.2, the Ave.m values obtained by all algorithms equal the maximum acceptable number of test cases 𝑀. The reason is that program No.2 contains three impossible paths, so no algorithm can obtain 100% path coverage, and every algorithm is terminated when the number of test cases it has generated reaches 𝑀.
Fig. 4 visualizes the number of paths covered by the test cases generated by each algorithm, in which the abscissa and the ordinate represent the number of generated test cases and the number of covered paths, respectively. For instance, ⧫, ▴ and ◦ in the upper part of Fig. 4(a) indicate that the 188th test case generated by GPE-IS, the 10900th test case generated by GPE, and the 260th test case generated by GPE-SS cover the second path of program No.1. As can be seen from Fig. 4, GPE-IS obtains 100% path coverage with the smallest number of test cases for all programs except programs No.2 and No.7. When these algorithms cover all six feasible paths of program No.2, the Ave.m obtained by GPE-IS is the smallest. Two reasons explain why GPE-IS performs best. First, the IS strategy makes full use of the specific knowledge of ATCG-PC to find undiscovered paths. In addition, the IS strategy takes into account the redistribution of the violated constraint value and the case in which the input value is a boundary value, which improves the efficiency of searching for the desired test cases.
With respect to Ave.t, GPE-IS outperforms GPE-SS and GPE for all benchmark programs except program No.2. For program No.2, the Ave.t of GPE-IS (6.06E+01) is slightly longer than that of GPE-SS (4.80E+01), but shorter than that of GPE (1.36E+02). The reason is that, for generating a test case, the time complexity of the IS strategy is slightly greater than that of the SS strategy and smaller than that of GPE.
According to the above observations, GPE-IS outperforms GPE and GPE-SS in terms of Ave.c, Ave.m and Ave.t. The IS strategy is therefore effective.
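Since the gains reported above rest on moving from a covered path to its most similar uncovered path, the selection rule of Eq. (11) can be sketched as follows. The encoding is an assumption: each path is represented as a fixed-length tuple of branch-predicate outcomes, and AL is taken to be the number of positions at which two such tuples differ. The function names are illustrative and do not come from the paper.

```python
def al(path_a, path_b):
    """Number of mismatched branch-predicate outcomes between two paths.

    Assumes both paths use the same fixed-length encoding.
    """
    return sum(a != b for a, b in zip(path_a, path_b))


def next_target(uncovered_paths, current_target):
    """Pick the uncovered path with the smallest AL to the current target (Eq. (11))."""
    return min(uncovered_paths, key=lambda path: al(path, current_target))


# Made-up example with three branch predicates per path.
covered = (True, True, False)
uncovered = [(False, False, True), (True, True, True), (False, True, True)]
chosen = next_target(uncovered, covered)
```

Here `chosen` is the path that differs from the covered one in a single predicate. When several uncovered paths share the smallest AL, `min` returns the first of them, which mirrors the deterministic behaviour of the ordering; the random fallback described in the note after Eq. (11) is not modelled here.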

Table 6
Comparison of GPE-IS, GPE and GPE-SS.
Program GPE-IS GPE GPE-SS
Ave.c Ave.m Ave.t Ave.c Ave.m Ave.t Ave.c Ave.m Ave.t
No.1 100% 1.88E+02 1.20E+00 100% 1.09E+04 (+) 1.62E+01 100% 2.60E+02 (+) 5.20E+00
No.2 66.7% 3.00E+05 6.06E+01 51.7% 3.00E+05 (=) 1.36E+02 66.7% 3.00E+05 (=) 4.80E+01
No.3 100% 3.91E+02 3.10E+00 100% 8.06E+02 (+) 1.42E+01 100% 7.91E+03 (+) 3.07E+01
No.4 100% 3.57E+02 2.57E+00 94.0% 1.24E+05 (+) 1.85E+02 100% 4.82E+02 (+) 6.27E+00
No.5 100% 1.95E+02 2.20E+00 100% 2.39E+04 (+) 4.27E+01 100% 2.74E+02 (+) 8.33E+00
No.6 100% 3.34E+02 4.27E+00 100% 1.83E+05 (+) 2.23E+02 100% 5.20E+02 (+) 7.77E+00
No.7 100% 1.05E+04 2.38E+01 10.3% 3.00E+05 (+) 4.69E+02 100% 7.52E+03 (−) 2.97E+01
No.8 100% 2.59E+02 2.17E+00 100% 7.06E+04 (+) 9.49E+01 100% 3.26E+02 (+) 4.67E+00
No.9 100% 9.01E+02 1.59E+01 81.2% 3.00E+05 (+) 6.92E+02 100% 1.84E+03 (+) 2.29E+01
No.10 100% 2.74E+02 1.57E+00 66.0% 3.00E+05 (+) 1.80E+02 100% 5.29E+02 (+) 7.80E+00
No.11 100% 2.49E+02 2.50E+00 100% 8.57E+04 (+) 1.27E+02 100% 3.61E+02 (+) 5.20E+00
No.12 100% 5.06E+02 5.73E+00 49.0% 3.00E+05 (+) 2.99E+02 100% 7.91E+02 (+) 1.10E+01
Sumup + /=/− 11/1/0 10/1/1
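The ‘Sumup’ row of Table 6 is a tally of the bracketed markers in each Ave.m column. A minimal sketch of that tally is given below; the marker lists are transcribed from Table 6 in program order No.1 to No.12, the ASCII hyphen stands in for the ‘−’ sign, and the function name is illustrative.

```python
# Hypothesis-test markers transcribed from the Ave.m columns of Table 6.
MARKS_VS_GPE = ["+", "=", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+"]
MARKS_VS_GPE_SS = ["+", "=", "+", "+", "+", "+", "-", "+", "+", "+", "+", "+"]


def sumup(marks):
    """Format the counts of '+', '=' and '-' as the table's +/=/- summary."""
    return "/".join(str(marks.count(symbol)) for symbol in ("+", "=", "-"))


summary_gpe = sumup(MARKS_VS_GPE)
summary_gpe_ss = sumup(MARKS_VS_GPE_SS)
```

The two summaries reproduce the last row of Table 6, 11/1/0 against GPE and 10/1/1 against GPE-SS.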

Table 7
Comparison of statistical results in Ave.c.
Program GPE-IS DE PSO ABC CSA RP-DE RP-PSO RP-ABC RP-IGA DE-SS PSO-SS ABC-SS IGA-SS CSO-SS
No.1 100% 100% 100% 100% 100% 100% 58.3% 56.7% 100% – – – – –
No.2 66.7% 43.3% 45.1% 46.2% 43.3% 66.7% 66.7% 66.7% 66.7% – – – – –
No.3 100% 79.2% 84.1% 80.0% 78.8% 100% 100% 100% 100% – – – – –
No.4 100% 100% 85.3% 100% 98.0% 100% 42.7% 44.0% 58.7% – – – – –
No.5 100% 100% 100% 94.3% 92.6% 100% 76.2% 75.6% 96.7% – – – – –
No.6 100% 100% 71.9% 72.0% 71.0% 97.5% 15.9% 16.8% 43.9% – – – – –
No.7 100% 6.4% 4.3% 8.0% 6.9% – – – – 100% 100% 100% 100% 100%
No.8 100% 100% 100% 100% 100% – – – – 100% 100% 100% 100% 100%
No.9 100% 100% 63.8% 73.4% 60.1% – – – – 100% 100% 100% 100% 100%
No.10 100% 100% 66.0% 100% 100% – – – – 100% 100% 100% 100% 100%
No.11 100% 100% 89.2% 91.7% 86.7% – – – – 100% 100% 100% 100% 100%
No.12 100% 100% 32.3% 43.7% 32.0% – – – – 100% 100% 100% 100% 100%

Note. ‘–’ means that the published algorithm has not been tested on the corresponding benchmark programs.

5.4. Comparison of GPE-IS and some other algorithms

The above experiments illustrate the effectiveness of the IS strategy for GPE. In this subsection, to further demonstrate the superiority of GPE-IS, its performance is compared with that of four basic algorithms and nine published state-of-the-art algorithms, i.e., DE, PSO, ABC, CSA, RP-DE, RP-IGA, RP-ABC, RP-PSO, DE-SS, IGA-SS, ABC-SS, PSO-SS and CSO-SS. The Ave.c, Ave.m and Ave.t values obtained by all algorithms are recorded in Tables 7, 8 and 9, respectively. The best results of these algorithms are highlighted in bold. The results of the hypothesis test, ‘+’, ‘=’ and ‘−’, are recorded in brackets and are summarized in the last row of Table 8.
With respect to Ave.c, it can be clearly seen from Table 7 that GPE-IS obtains the highest Ave.c values for all benchmark programs, and that DE-SS, IGA-SS, ABC-SS, PSO-SS and CSO-SS obtain the highest Ave.c values for programs No.7 to No.12, on which they were tested, while DE, PSO, ABC, CSA, RP-DE, RP-IGA, RP-ABC and RP-PSO fail to obtain the highest Ave.c values for some benchmark programs. For example, DE, PSO, ABC, and CSA fail to obtain the highest Ave.c values for programs No.2, No.3, and so on. RP-DE, RP-IGA, RP-ABC and RP-PSO cannot obtain the highest Ave.c values for program No.5 and so on. The reason is that DE, PSO, ABC, and CSA do not have the assistance of prior knowledge, and RP-DE, RP-IGA, RP-ABC and RP-PSO may not obtain sufficient hidden knowledge of ATCG-PC on some programs during the optimization process. These factors lead to their failure to obtain the highest Ave.c on some benchmark programs. In contrast, the prior knowledge of ATCG-PC helps GPE-IS obtain the highest Ave.c for all benchmark programs.
With respect to Ave.m, as shown by the results of the hypothesis test (+/=/−) in Table 8, the Ave.m values obtained by GPE-IS are significantly better than those of all the other compared algorithms for all benchmark programs except programs No.2, No.7, and No.8. This is because GPE-IS can use a discovered test case covering a path to quickly find undiscovered test cases covering similar paths, so that GPE-IS can obtain the highest path coverage with fewer test cases. The Ave.m value obtained by GPE-IS is significantly worse than those of DE-SS, PSO-SS, ABC-SS, IGA-SS and CSO-SS for program No.7, and significantly worse than those of IGA-SS and CSO-SS for program No.8. For programs with multiple similar paths, like program No.7, GPE-IS is more likely to fall into a local optimum, because its dynamic target path sorting method based on AL has no randomness, while the target path sorting method used by DE-SS, IGA-SS, ABC-SS, PSO-SS and CSO-SS has a certain randomness.
In terms of Ave.t, Table 9 shows that GPE-IS outperforms all the other compared algorithms for all benchmark programs except programs No.2 and No.7 to No.9. For programs No.7 and No.8, the Ave.t values obtained by GPE-IS are longer than those obtained by DE-SS, IGA-SS, ABC-SS, PSO-SS and CSO-SS. The reason is that the numbers of iterations required by GPE-IS are greater than or equal to those required by these five compared algorithms for programs No.7 and No.8, and the time complexity of GPE-IS is greater than that of these five compared algorithms.
The above analysis shows that, on most benchmark programs, GPE-IS obtains the highest path coverage with fewer test cases and less running time than the other algorithms. Therefore, the performance of GPE-IS is superior to that of the other algorithms.

5.5. Comprehensive analysis of experimental results

Based on the above experimental results and the design mechanism of GPE-IS, this subsection conducts an in-depth analysis of the parameter sensitivity of GPE-IS, the effectiveness of the IS strategy and the competitiveness of GPE-IS. The conclusions are as follows.

• About the parameter 𝛾 of GPE-IS. The smaller the 𝛾 value, the greater the probability with which the first formula in Eq. (3) is used, and the greater the diversity of the test cases generated by GPE. In this paper, five different 𝛾 values, i.e., 𝛾 = 0.005𝐿, 0.01𝐿, 0.02𝐿, 0.04𝐿 and 0.08𝐿, where 𝐿 = (𝑢𝑝_𝑗 − 𝑙𝑜𝑤_𝑗), are explored for their impact on the performance of GPE-IS. It can be seen from

Table 8
Comparison of statistical results in Ave.m.
Program GPE-IS DE PSO ABC CSA RP-DE RP-PSO RP-ABC RP-IGA DE-SS PSO-SS ABC-SS IGA-SS CSO-SS
No.1 1.88E+02 2.27E+03 (+) 5.79E+02 (+) 8.29E+02 (+) 1.61E+03 (+) 4.25E+03 (+) 2.73E+05 (+) 2.84E+05 (+) 3.91E+04 (+) – – – – –
No.2 3.00E+05 3.00E+05 (=) 3.00E+05 (=) 3.00E+05 (=) 3.00E+05 (=) 3.00E+05 (=) 3.00E+05 (=) 3.00E+05 (=) 3.00E+05 (=) – – – – –
No.3 3.91E+02 2.99E+05 (=) 2.85E+05 (+) 2.94E+05 (+) 3.00E+05 (+) 7.01E+03 (+) 5.80E+03 (+) 2.45E+03 (+) 1.34E+04 (+) – – – – –
No.4 3.57E+02 1.05E+04 (=) 1.15E+05 (+) 2.88E+03 (+) 6.32E+04 (+) 9.21E+04 (+) 3.00E+05 (+) 3.00E+05 (+) 2.63E+05 (+) – – – – –
No.5 1.95E+02 3.89E+03 (=) 1.24E+03 (+) 1.01E+05 (+) 1.32E+05 (+) 9.80E+03 (+) 3.00E+05 (+) 2.95E+05 (+) 9.23E+04 (+) – – – – –
No.6 3.34E+02 2.62E+04 (=) 3.00E+05 (+) 2.94E+05 (+) 3.00E+05 (+) 1.61E+05 (+) 3.00E+05 (+) 3.00E+05 (+) 2.81E+05 (+) – – – – –
No.7 1.05E+04 3.00E+05 (=) 3.00E+05 (+) 3.00E+05 (+) 3.00E+05 (+) – – – – 7.38E+03 (−) 8.59E+03 (−) 7.79E+03 (−) 7.89E+03 (−) 7.96E+03 (−)
No.8 2.59E+02 8.91E+03 (=) 1.63E+03 (+) 2.07E+03 (+) 6.49E+03 (+) – – – – 2.31E+02 (=) 2.30E+02 (=) 2.85E+02 (=) 2.04E+02 (−) 2.05E+02 (−)
No.9 9.01E+02 1.46E+04 (=) 3.00E+05 (+) 1.53E+05 (+) 3.00E+05 (+) – – – – 2.29E+03 (+) 3.32E+03 (+) 2.42E+03 (+) 1.94E+04 (+) 2.58E+03 (+)
No.10 2.74E+02 3.44E+04 (=) 3.00E+05 (+) 8.74E+03 (+) 1.67E+04 (+) – – – – 4.13E+02 (+) 4.78E+02 (+) 4.72E+02 (+) 6.99E+02 (=) 4.08E+02 (+)
No.11 2.49E+02 1.13E+04 (=) 1.31E+05 (+) 1.03E+05 (+) 1.71E+05 (+) – – – – 6.36E+02 (+) 7.61E+02 (+) 8.76E+02 (+) 2.22E+03 (+) 5.68E+02 (+)
No.12 5.06E+02 3.09E+04 (=) 3.00E+05 (+) 2.94E+05 (+) 3.00E+05 (+) – – – – 1.51E+03 (+) 1.96E+03 (+) 1.89E+03 (+) 1.10E+04 (+) 1.50E+03 (+)
Sumup + /=/− 11/1/0 11/1/0 11/1/0 11/1/0 5/1/0 5/1/0 5/1/0 5/1/0 5/0/1 5/0/1 5/0/1 4/0/2 4/0/2

Note. ‘–’ indicates that the published algorithm has not been tested on the corresponding benchmark programs.

Table 9
Comparison of statistical results in Ave.t.
Program GPE-IS DE PSO ABC CSA RP-DE RP-PSO RP-ABC RP-IGA DE-SS PSO-SS ABC-SS IGA-SS CSO-SS
No.1 1.20E+00 7.40E+00 3.40E+00 8.67E+00 8.23E+00 3.23E+00 9.16E+01 6.79E+01 1.43E+01 – – – – –
No.2 6.06E+01 6.97E+01 6.97E+01 1.82E+02 6.22E+01 5.65E+01 5.63E+01 6.59E+01 7.48E+01 – – – – –
No.3 3.10E+00 1.33E+02 1.73E+02 1.80E+02 1.17E+02 5.90E+00 4.27E+00 3.13E+00 7.43E+00 – – – – –
No.4 2.57E+00 2.49E+01 1.02E+02 2.01E+01 6.14E+01 5.38E+01 1.49E+02 1.80E+02 1.29E+02 – – – – –
No.5 2.20E+00 1.42E+01 1.20E+01 9.47E+01 7.84E+01 1.40E+01 1.10E+02 1.16E+02 4.90E+01 – – – – –
No.6 4.27E+00 3.82E+01 2.50E+02 2.92E+02 2.16E+02 9.94E+01 1.24E+02 1.52E+02 1.43E+02 – – – – –
No.7 2.38E+01 2.09E+02 2.53E+02 2.76E+02 2.01E+02 – – – – 1.77E+01 1.78E+01 1.75E+01 1.75E+01 1.76E+01
No.8 2.17E+00 1.92E+01 1.52E+01 2.10E+01 1.95E+01 – – – – 1.83E+00 2.17E+00 2.00E+00 2.07E+00 1.73E+00
No.9 1.59E+01 2.74E+01 3.32E+02 1.52E+02 2.47E+02 – – – – 1.34E+01 1.37E+01 1.44E+01 2.75E+01 1.45E+01
No.10 1.57E+00 2.74E+01 1.60E+02 2.39E+01 2.38E+01 – – – – 3.23E+00 3.67E+00 2.83E+00 4.83E+00 2.67E+00
No.11 2.50E+00 2.37E+01 9.99E+01 1.11E+02 1.19E+02 – – – – 4.50E+00 7.40E+00 6.03E+00 1.15E+01 5.97E+00
No.12 5.73E+00 3.48E+01 1.86E+02 2.50E+02 1.85E+02 – – – – 1.15E+01 1.23E+01 1.12E+01 1.90E+01 1.08E+01

Note. ‘–’ means that the published algorithm has not been tested on the corresponding benchmark programs.

Table 4 that the performance of the five GPE-IS variants with different 𝛾 values is the same in terms of Ave.c. In terms of Ave.m, the values obtained by the five variants differ very little and are even identical on several programs, such as programs No.2, No.5 and No.11. The Ave.m obtained by GPE-IS with 𝛾 = 0.005 ⋅ (𝑢𝑝_𝑗 − 𝑙𝑜𝑤_𝑗) is slightly better than the Ave.m values obtained by GPE-IS with the other four 𝛾 values. This shows that GPE-IS expects GPE to generate more diverse test cases to cover a variety of different paths, because the IS strategy already has strong local search capability.
• About the effectiveness of IS strategy. The effectiveness of the IS strategy is verified by comparing the performance of GPE-IS, GPE-SS and GPE. As can be seen from Table 6, the performance of GPE-IS is better than that of GPE in terms of Ave.c, Ave.m and Ave.t, and better than that of GPE-SS in terms of Ave.m and Ave.t. This shows that the IS strategy is an effective technique that takes full advantage of DRTC, and that it is more competitive than the SS strategy for ATCG-PC.
• About the competitiveness of GPE-IS. As can be seen from Tables 7–9, the overall performance of GPE-IS is better than that of DE, PSO, ABC, CSA, RP-DE, RP-IGA, RP-ABC and RP-PSO in terms of Ave.c, Ave.m and Ave.t, and better than that of DE-SS, PSO-SS, ABC-SS, IGA-SS and CSO-SS in terms of Ave.m and Ave.t. This shows that GPE-IS, which makes full use of the prior knowledge of ATCG-PC, is competitive. The performance of the basic algorithms (DE, PSO, ABC, and CSA), which do not make use of the specific knowledge of ATCG-PC, is mediocre, while the search-based algorithms with a relation matrix (RP-DE, RP-IGA, RP-ABC and RP-PSO) may fail to retrieve sufficient hidden knowledge of ATCG-PC during the optimization process to obtain better performance. Although GPE-IS is competitive, it has a drawback. For program No.7, which has multiple similar paths, GPE-IS may fall into a local optimum, because it adopts a dynamic target path sorting method that has no randomness.

6. Conclusions

Inspired by the prior knowledge of ATCG-PC, i.e., the dimension correlation of test cases (DRTC), this paper first designed an improved scatter search (IS) strategy with strong local search ability and then proposed a grey prediction evolution algorithm with the improved scatter search strategy (GPE-IS). In the proposed algorithm, the IS strategy explores each dimension of a discovered test case covering a definite path to quickly generate undiscovered test cases for uncovered paths, while GPE maintains the diversity of test cases to cover a variety of different paths. In addition, a dynamic target path sorting method based on AL is proposed to help GPE-IS utilize DRTC as much as possible. This is the first time that GPE has been applied to solve ATCG-PC, and in theory GPE-IS can effectively find undiscovered test cases covering multiple uncovered paths by exploring a discovered test case.
The performance of GPE-IS was verified on the six fog computing benchmark programs and six NLP benchmark programs. The experimental results are summarized as follows. (1) The IS strategy is an effective technique that takes full advantage of the prior knowledge DRTC. (2) In GPE-IS, GPE is expected to generate more diverse test cases. (3) GPE-IS can obtain the highest path coverage with the fewest test cases and the shortest running time among all compared algorithms. Therefore, GPE-IS is competitive for solving ATCG-PC.
The proposed algorithm is competitive, yet it has some defects. The dynamic target path sorting method based on AL in GPE-IS has no randomness. For programs with multiple similar paths, like program No.7, GPE-IS may fall into a local optimum. In addition, GPE-IS focuses on the single-path coverage problem, which considers one target path in one iteration, so GPE-IS is limited in solving the multi-path coverage problem, which considers multiple target paths in one iteration. The future research directions are as follows. (1) A target path sorting method with randomness will be studied. (2) The IS strategy will be combined with other search-based algorithms, such as the multi-objective scatter search approach (e Silva et al., 2013), to solve the multi-path coverage problem.
The proposed algorithm can be considered a case of using problem-specific knowledge to enhance the performance of GPE for solving ATCG-PC. DRTC and other problem-specific knowledge can also be used to enhance the performance of other search-based algorithms, such as the arithmetic optimization algorithm (Abualigah et al., 2021b) and the sine cosine algorithm (Abualigah and Diabat, 2021). In addition, it is hoped that GPE-IS or its design idea will provide a new perspective for solving other problems, such as mutation testing and the traveling salesman problem.

CRediT authorship contribution statement

Gaocheng Cai: Methodology, Validation, Writing - original draft, Software. Qinghua Su: Conceptualization, Writing - review & editing, Visualization. Zhongbo Hu: Data curation, Writing - review & editing, Investigation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the State Key Laboratory of Biogeology and Environmental Geology (China University of Geosciences, No. GBL21801) and the National Natural Science Foundation of China (No. 61972136).


References

Abualigah, L., Diabat, A., 2021. Advances in Sine cosine algorithm: A comprehensive survey. Artif. Intell. Rev. 54, 2567–2608.
Abualigah, L., Diabat, A., Mirjalili, S., Elsayed Abd Elaziz, M., Gandomi, A., 2021b. The arithmetic optimization algorithm. Comput. Methods Appl. Mech. Engrg. 376, 113609.
Anand, S., Burke, E.K., Chen, T.Y., Clark, J., Cohen, M.B., Grieskamp, W., Harman, M., Harrold, M.J., Mcminn, P., 2013. An orchestrated survey of methodologies for automated software test case generation. J. Syst. Softw. 86 (8), 1978–2001.
Bhattacharjee, G., Pati, P., 2014. A novel approach for test path generation and prioritization of uml activity diagrams using tabu search algorithm. Int. J. Sci. Eng. Res. 52, 1212–1217.
Bueno, P.M.S., Jino, M., 2002. Automatic test data generation for program paths using genetic algorithms. Int. J. Softw. Eng. Knowl. Eng. 12 (06), 691–709.
Cao, Y., Hu, C., Li, L., 2009. Search-based multi-paths test data generation for structure-oriented testing. In: Proceedings of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation. ACM, pp. 25–32.
Clarke, L.A., 1976. A system to generate test data and symbolically execute programs. IEEE Trans. Softw. Eng. 2 (3), 215–222.
Dai, X., Gong, W., Gu, Q., 2021. Automated test case generation based on differential evolution with node branch archive. Comput. Ind. Eng. 107290.
Dai, C., Hu, Z., Li, Z., Xiong, Z., Su, Q., 2020. An improved grey prediction evolution algorithm based on topological opposition-based learning. IEEE Access 8, 30745–30762.
Gao, C., Hu, Z., Xiong, Z., Su, Q., 2020. Grey prediction evolution algorithm based on accelerated even grey model. IEEE Access 8, 107941–107957.
Gong, D., Tian, T., Wang, J., Du, Y., Li, Z., 2020. A novel method of grouping target paths for parallel programs. Parallel Comput. 97, 102665.
Gupta, H., Vahid Dastjerdi, A., Ghosh, S.K., Buyya, R., 2017. IFogSim: A toolkit for modeling and simulation of resource management techniques in the Internet of Things, Edge and Fog computing environments. Softw. - Pract. Exp. 47 (9), 1275–1296.
Hu, Z., Gao, C., Su, Q., 2021. A novel evolutionary algorithm based on even difference grey model. Expert Syst. Appl. 176, 114898.
Hu, Z., Li, Z., Dai, C., Xu, X., Xiong, Z., Su, Q., 2020b. Multiobjective grey prediction evolution algorithm for environmental/economic dispatch problem. IEEE Access 8, 84162–84176.
Hu, Z., Xu, X., Su, Q., Zhu, H., Guo, J., 2020a. Grey prediction evolution algorithm for global optimization. Appl. Math. Model. 79, 145–160.
Huang, H., Liu, F., Yang, Z., Hao, Z., 2018. Automated test case generation based on differential evolution with relationship matrix for IFOGSIM toolkit. IEEE Trans. Ind. Inf. 14 (11), 5005–5016.
Huang, H., Liu, F., Zhuo, X., Hao, Z., 2017. Differential evolution based on self-adaptive fitness function for automated test case generation. IEEE Comput. Intell. Mag. 12 (2), 46–55.
Jatana, N., Suri, B., 2020. An improved crow search algorithm for test data generation using search-based mutation testing. Neural Process. Lett. 52 (1), 767–784.
Khari, M., Sinha, A., Verdu, E., Crespo, R.G., 2020. Performance analysis of six meta-heuristic algorithms over automated test suite generation for path coverage-based optimization. Soft Comput. 24 (12), 9143–9160.
Korel, B., 1990. Automated software test data generation. IEEE Trans. Softw. Eng. 16 (8), 870–879.
Lin, J., Yeh, P., 2001. Automatic test data generation for path testing using GAs. Inform. Sci. 131 (1–4), 47–64.
Liu, F., Huang, H., Yang, Z., Hao, Z., Wang, J., 2019. Search-based algorithm with scatter search strategy for automated test case generation of NLP toolkit. IEEE Trans. Emerg. Top. Comput. Intell. 1–13.
Lv, X., Huang, S., Hui, Z., Ji, H., 2018. Test cases generation for multiple paths based on PSO algorithm with metamorphic relations. IET Softw. 12 (4), 306–317.
Mala, D.J., Mohan, V., Kamalapriya, M., 2010. Automated software test optimisation framework–an artificial bee colony optimisation-based approach. IET Softw. 4 (5), 334–348.
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J.R., Bethard, S., McClosky, D., 2014. The stanford corenlp natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computer Linguistics, pp. 55–60.
Mohi-Aldeen, S.M., Mohamad, R., Deris, S., 2016. Application of negative selection algorithm (NSA) for test data generation of path testing. Appl. Soft Comput. 49, 1118–1128.
Nosrati, M., Haghighi, H., Asl, M.V., 2020. Using likely invariants for test data generation. J. Syst. Softw. 164, 110549.
Saadatjoo, M.A., Babamir, S.M., 2019. Test-data generation directed by program path coverage through imperialist competitive algorithm. Sci. Comput. Program. 184, 102304.
Sahin, O., Akay, B., 2016. Comparisons of metaheuristic algorithms and fitness functions on software test data generation. Appl. Soft Comput. 49, 1202–1214.
Sahoo, R.R., Ray, M., 2020. PSO-based test case generation: A fitness function based on value combined branch distance. In: Adv. Comput. Intell. Eng. Springer, pp. 589–598.
e Silva, M.d.A.C., Klein, C.E., Mariani, V.C., dos Santos Coelho, L., 2013. Multiobjective scatter search approach with new combination scheme applied to solve environmental/economic dispatch problem. Energy 53, 14–21.
Srivastava, P.R., Baby, K., Raghurama, G., 2009. An approach of optimal path generation using ant colony optimization. In: Tencon IEEE Region 10 Conference. IEEE, pp. 1–6.
Sun, B., Wang, J., Gong, D., Tian, T., 2019. Scheduling sequence selection for generating test data to cover paths of MPI programs. Inf. Softw. Technol. 114, 190–203.
Tracey, N., Clark, J., Mander, K., McDermid, J., 1998. An automated framework for structural test-data generation. In: Proceedings 13th IEEE International Conference on Automated Software Engineering. IEEE, pp. 285–288.
Xing, Y., Gong, Y., Wang, Y., Zhang, X., 2015. The application of iterative interval arithmetic in path-wise test data generation. Eng. Appl. Artif. Intell. 45, 441–452.
Xu, X., Hu, Z., Su, Q., Li, Y., Dai, J., 2020. Multivariable grey prediction evolution algorithm: A new metaheuristic. Appl. Soft Comput. 89, 106086.
Yao, X., Gong, D., 2014. Genetic algorithm-based test data generation for multiple paths via individual sharing. Comput. Intell. Neurosci. 2014, 591294.
Zamli, K.Z., Alkazemi, B.Y., Kendall, G., 2016. A tabu search hyper-heuristic strategy for t-way test suite generation. Appl. Soft Comput. 44, 57–74.
Zhou, T., Hu, Z., Zhou, Q., Yuan, S., 2021. A novel grey prediction evolution algorithm for multimodal multiobjective optimization. Eng. Appl. Artif. Intell. 100, 104173.
